Research Quick Start Guide

NON-NORMATIVE.

Goal: Run experiments to validate Morphism's governance claims scientifically

For benchmark suite and CI behavior, see benchmarks-and-experiments.md.


Prerequisites

# Install dependencies
pip install numpy pandas scipy matplotlib seaborn

# Verify Morphism installation
python scripts/maturity_score.py --ci --threshold 60

Experiment 1: Generate Synthetic Corpus

# Generate 100 small repositories with random drift
python scripts/research/generate_synthetic_repo.py \
  --output experiments/synthetic_repos \
  --count 100 \
  --size small \
  --seed 42

# Output:
# experiments/synthetic_repos/
#   synthetic_0000/
#     GROUND_TRUTH.json
#     AGENTS.md
#     SSOT.md
#     ...
#   synthetic_0001/
#   ...
#   MANIFEST.json

Expected: 100 repositories, ~200 drift injections total
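A quick sanity check on the generated corpus can be scripted. The MANIFEST.json schema below (a repo list with per-repo injection counts) is an assumption for illustration; adjust the field names to whatever the generator actually emits.

```python
# Sanity-check the corpus manifest after generation.
# NOTE: "repos" and "injected_drifts" are assumed field names.
import json

manifest = json.loads("""
{"repos": [
  {"name": "synthetic_0000", "injected_drifts": 2},
  {"name": "synthetic_0001", "injected_drifts": 1}
]}
""")

repo_count = len(manifest["repos"])
total_drifts = sum(r["injected_drifts"] for r in manifest["repos"])
print(repo_count, total_drifts)
# At full scale, expect ~100 repos and ~200 injections total.
```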


Experiment 2: Measure Detection Accuracy

# Run all enforcement scripts on corpus
python scripts/research/measure_detection_accuracy.py \
  --corpus experiments/synthetic_repos \
  --output experiments/detection_accuracy.md

# Output:
# experiments/detection_accuracy.md  (human-readable report)
# experiments/detection_accuracy.json (machine-readable results)

Expected results:

  • Precision ≥ 0.95
  • Recall ≥ 0.95
  • F1 ≥ 0.95

If targets are not met: investigate the false positives/negatives and improve the detection logic.
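The targets follow the standard precision/recall definitions. A minimal sketch of how they are computed from true-positive, false-positive, and false-negative counts (the counts below are placeholders, not measured results):

```python
# Precision, recall, and F1 from detection counts against GROUND_TRUTH.json.
def prf1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Placeholder counts for illustration.
p, r, f = prf1(tp=190, fp=4, fn=6)
print(round(p, 3), round(r, 3), round(f, 3))
```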


Experiment 3: Convergence Validation

python scripts/research/measure_convergence.py \
  --corpus experiments/synthetic_repos \
  --kappa 0.1 0.3 0.5 0.7 0.9 \
  --iterations 12 \
  --perturbation-trials 100 \
  --output experiments/convergence/results.json

Expected: Convergence iterations match the theoretical bound within 5%
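The theoretical bound comes from the Banach fixed-point theorem: d(xₙ, x*) ≤ κⁿ · d(x₀, x*), so reaching tolerance ε takes at most ⌈log(ε/d₀) / log κ⌉ iterations. A sketch, with illustrative d₀ and ε (not values taken from measure_convergence.py):

```python
# Iteration bound implied by the contraction factor kappa:
# kappa**n * d0 <= eps  =>  n = ceil(log(eps / d0) / log(kappa)).
import math

def iterations_to_converge(kappa: float, d0: float, eps: float) -> int:
    assert 0 < kappa < 1 and d0 > 0 and eps > 0
    if d0 <= eps:
        return 0
    return math.ceil(math.log(eps / d0) / math.log(kappa))

for kappa in (0.3, 0.5, 0.7, 0.9):
    print(kappa, iterations_to_converge(kappa, d0=1.0, eps=1e-3))
```

Smaller κ converges faster, which is what the κ = 0.3 vs. κ = 0.7 success criterion checks.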


Experiment 4: Performance Benchmarks

python scripts/research/benchmark_performance.py \
  --corpus experiments/synthetic_repos \
  --output experiments/benchmarks.json

# Generates timing data for complexity analysis

Expected: O(n log n) or better for all scripts
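One way to check the O(n log n) claim from the timing data is to fit measured times against n·log n and inspect the goodness of fit. The timings below are synthetic placeholders, not benchmark output:

```python
# Least-squares fit of t ~ c * n*log(n) through the origin; an R^2
# near 1.0 supports the n log n model. Timings are placeholders.
import math

sizes = [100, 1_000, 10_000, 100_000]
times = [2e-6 * n * math.log(n) for n in sizes]  # placeholder data

x = [n * math.log(n) for n in sizes]
c = sum(xi * ti for xi, ti in zip(x, times)) / sum(xi * xi for xi in x)
pred = [c * xi for xi in x]
mean_t = sum(times) / len(times)
ss_res = sum((t - p) ** 2 for t, p in zip(times, pred))
ss_tot = sum((t - mean_t) ** 2 for t in times)
r2 = 1 - ss_res / ss_tot
print(c, r2)
```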


Experiment 5: Comparative Evaluation

python scripts/research/compare_tools.py \
  --corpus experiments/synthetic_repos \
  --output experiments/tool_comparison.json

Expected: Morphism detects at least 3x more drift than baseline tools (Nx, Turborepo)
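Significance for the 3x claim needs a paired test over per-repo counts. The sketch below uses an exact sign test as a deliberately simple stand-in (not necessarily the test compare_tools.py applies), with made-up counts:

```python
# Exact two-sided sign test under H0: P(Morphism > baseline) = 0.5.
import math

def sign_test_p(wins: int, n: int) -> float:
    tail = sum(math.comb(n, k) for k in range(wins, n + 1)) * 0.5 ** n
    return min(1.0, 2 * tail)

# Placeholder per-repo drift counts, paired by repository.
morphism = [12, 9, 15, 11, 14, 10, 13, 12, 16, 11]
baseline = [4, 3, 5, 4, 5, 3, 4, 4, 6, 3]

wins = sum(m > b for m, b in zip(morphism, baseline))
ratio = sum(morphism) / sum(baseline)
p = sign_test_p(wins, len(morphism))
print(f"ratio={ratio:.1f}x, wins={wins}/{len(morphism)}, p={p:.4f}")
```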


Experiment 6: Real-World Case Studies

python scripts/research/analyze_real_repos.py \
  --input docs/research/real_repo_targets.json \
  --output experiments/real_repos/results.json \
  --markdown docs/research/reports/real-repo-analysis.md \
  --workdir /tmp/morphism-real-repos

# Use --limit N or per-entry "local_path" overrides for fast local/debug runs
# After each run, update docs/research/reports/real-repo-analysis.md with TP/FP notes.

Expected: Find actionable violations in 80%+ of repos


Longitudinal Drift Tracking

python scripts/research/longitudinal_tracker.py \
  --repo-config docs/research/longitudinal_targets.json \
  --output-root experiments/longitudinal \
  --summary experiments/longitudinal/summary.csv \
  --workdir /tmp/morphism-longitudinal

# Optional flags:
#   --date YYYY-MM-DD (backfill)
#   --dry-run (skip writes)
#   --continue-on-error

Expected: 90 consecutive days of κ trajectories with ≤3 missed runs and weekly annotations in docs/research/reports/longitudinal.md.
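Counting missed runs over the 90-day window is straightforward once completed-run dates are extracted from summary.csv (whose layout is assumed here). A stdlib sketch with two hypothetical outage days:

```python
# Count missed daily runs in a 90-day tracking window.
# The run dates below are hypothetical, not read from summary.csv.
from datetime import date, timedelta

start = date(2026, 1, 1)
window = [start + timedelta(days=i) for i in range(90)]

completed = set(window) - {date(2026, 1, 15), date(2026, 2, 3)}  # two outages

missed = sum(d not in completed for d in window)
print(missed)  # success criterion: missed <= 3
```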


Generate Publication Figures

# TODO: Implement generate_figures.py
# Creates plots for paper

python scripts/research/generate_figures.py \
  --data experiments/ \
  --output paper/figures/

# Generates:
# - convergence_rate.pdf (κ vs. iterations)
# - detection_accuracy.pdf (precision/recall bars)
# - performance_scaling.pdf (time vs. repo size)
# - comparison_chart.pdf (Morphism vs. baselines)

Write Paper

# Use template from docs/research/experimental-framework.md

# Structure:
# 1. Introduction (motivation, problem, contribution)
# 2. Background (category theory, Banach theorem)
# 3. Morphism Framework (7 invariants, architecture)
# 4. Experimental Design (synthetic corpus, metrics)
# 5. Results (RQ1-RQ4 with figures)
# 6. Case Studies (real-world validation)
# 7. Discussion (threats, limitations, future work)
# 8. Conclusion

# Target: ICSE 2027 (deadline ~August 2026)

Reproducibility Package

# Create Docker container
docker build -t morphism-experiments -f experiments/Dockerfile .

# Run all experiments
docker run morphism-experiments bash experiments/run_all.sh

# Expected output:
# experiments/
#   synthetic_repos/
#   detection_accuracy.md
#   convergence_*.json
#   performance.json
#   comparison.csv
#   case_studies/
#   figures/

Timeline

  Week    Task
  1-2     Tune script parameters and ensure reproducible outputs
  3-4     Run Experiments 1-2 (corpus generation, detection accuracy)
  5-6     Run Experiments 3-4 (convergence, performance)
  7-8     Run Experiments 5-6 (comparison, case studies)
  9-10    Statistical analysis, generate figures
  11-12   Write paper draft
  13      Internal review, revisions
  14      Submit to ICSE 2027

Success Criteria

Experiment 1: Synthetic Corpus

  • [ ] 100+ repositories generated
  • [ ] 8 drift types represented
  • [ ] Ground truth validated manually (sample 10%)

Experiment 2: Detection Accuracy

  • [ ] Precision ≥ 0.95 for all scripts
  • [ ] Recall ≥ 0.95 for all scripts
  • [ ] F1 ≥ 0.95 for all scripts

Experiment 3: Convergence

  • [ ] Actual convergence matches O(κⁿ) within 5%
  • [ ] Perturbation robustness ≥ 0.99
  • [ ] κ = 0.3 converges faster than κ = 0.7 (p < 0.05)

Experiment 4: Performance

  • [ ] Pre-commit overhead < 2s
  • [ ] CI overhead < 30s
  • [ ] Complexity O(n log n) or better

Experiment 5: Comparison

  • [ ] Morphism detects 3x more drift than Nx (p < 0.05)
  • [ ] Morphism detects 3x more drift than Turborepo (p < 0.05)
  • [ ] False positive rate < 5%

Experiment 6: Case Studies

  • [ ] 10 real-world repos analyzed
  • [ ] Violations found in 80%+
  • [ ] Developer feedback collected

Publication

  • [ ] Paper submitted to ICSE 2027
  • [ ] Artifact submitted to artifact track
  • [ ] Dataset published on Zenodo with DOI

Troubleshooting

"Detection accuracy too low"

  • Check ground truth generation logic
  • Verify detection parsing is correct
  • Add more test cases for edge cases

"Convergence doesn't match theory"

  • Verify κ calculation is correct
  • Check if remediation operation is truly a contraction
  • Measure actual distance to fixed point

"Performance overhead too high"

  • Profile scripts to find bottlenecks
  • Optimize hot paths
  • Consider caching/memoization

"Comparison shows no difference"

  • Ensure baseline tools are configured correctly
  • Verify drift types are detectable by baselines
  • Check if corpus is too easy/hard

Next Steps

  1. Implement remaining scripts (see TODOs above)
  2. Run pilot experiments on small corpus (n=10)
  3. Validate metrics match expectations
  4. Scale up to full corpus (n=100)
  5. Analyze results and iterate
  6. Write paper and submit

Resources

  • Experimental Framework: docs/research/experimental-framework.md
  • Synthetic Generator: scripts/research/generate_synthetic_repo.py
  • Detection Measurement: scripts/research/measure_detection_accuracy.py
  • Paper Template: See experimental-framework.md Section "Publication Strategy"

Questions? Open an issue or discussion on GitHub.

Last Updated: 2026-02-22