Research Quick Start Guide

NON-NORMATIVE.

Goal: Run experiments to validate Morphism's governance claims scientifically

For benchmark suite and CI behavior, see benchmarks-and-experiments.md.


Prerequisites

# Install dependencies
pip install numpy pandas scipy matplotlib seaborn

# Verify Morphism installation
python scripts/maturity_score.py --ci --threshold 60

Experiment 1: Generate Synthetic Corpus

# Generate 100 small repositories with random drift
python scripts/research/generate_synthetic_repo.py \
  --output experiments/synthetic_repos \
  --count 100 \
  --size small \
  --seed 42

# Output:
# experiments/synthetic_repos/
#   synthetic_0000/
#     GROUND_TRUTH.json
#     AGENTS.md
#     SSOT.md
#     ...
#   synthetic_0001/
#   ...
#   MANIFEST.json

Expected: 100 repositories, ~200 drift injections total
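A quick sanity check on the generated corpus can be scripted. The MANIFEST.json schema below (a repo list with per-repo injection counts) is an assumption for illustration; adjust the field names to whatever the generator actually emits.

```python
# Sanity-check the corpus manifest after generation.
# NOTE: "repos" and "injected_drifts" are assumed field names.
import json

manifest = json.loads("""
{"repos": [
  {"name": "synthetic_0000", "injected_drifts": 2},
  {"name": "synthetic_0001", "injected_drifts": 1}
]}
""")

repo_count = len(manifest["repos"])
total_drifts = sum(r["injected_drifts"] for r in manifest["repos"])
print(repo_count, total_drifts)
# At full scale, expect ~100 repos and ~200 injections total.
```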


Experiment 2: Measure Detection Accuracy

# Run all enforcement scripts on corpus
python scripts/research/measure_detection_accuracy.py \
  --corpus experiments/synthetic_repos \
  --output experiments/detection_accuracy.md

# Output:
# experiments/detection_accuracy.md  (human-readable report)
# experiments/detection_accuracy.json (machine-readable results)

Expected results:

  • Precision ≥ 0.95
  • Recall ≥ 0.95
  • F1 ≥ 0.95

If targets are not met: investigate the false positives/negatives and improve the detection logic.
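The targets follow the standard precision/recall definitions. A minimal sketch of how they are computed from true-positive, false-positive, and false-negative counts (the counts below are placeholders, not measured results):

```python
# Precision, recall, and F1 from detection counts against GROUND_TRUTH.json.
def prf1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Placeholder counts for illustration.
p, r, f = prf1(tp=190, fp=4, fn=6)
print(round(p, 3), round(r, 3), round(f, 3))
```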


Experiment 3: Convergence Validation

python scripts/research/measure_convergence.py \
  --corpus experiments/synthetic_repos \
  --kappa 0.1 0.3 0.5 0.7 0.9 \
  --iterations 12 \
  --perturbation-trials 100 \
  --output experiments/convergence/results.json

Expected: Convergence iterations match the theoretical bound within 5%
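The theoretical bound comes from the Banach fixed-point theorem: d(xₙ, x*) ≤ κⁿ · d(x₀, x*), so reaching tolerance ε takes at most ⌈log(ε/d₀) / log κ⌉ iterations. A sketch, with illustrative d₀ and ε (not values taken from measure_convergence.py):

```python
# Iteration bound implied by the contraction factor kappa:
# kappa**n * d0 <= eps  =>  n = ceil(log(eps / d0) / log(kappa)).
import math

def iterations_to_converge(kappa: float, d0: float, eps: float) -> int:
    assert 0 < kappa < 1 and d0 > 0 and eps > 0
    if d0 <= eps:
        return 0
    return math.ceil(math.log(eps / d0) / math.log(kappa))

for kappa in (0.3, 0.5, 0.7, 0.9):
    print(kappa, iterations_to_converge(kappa, d0=1.0, eps=1e-3))
```

Smaller κ converges faster, which is what the κ = 0.3 vs. κ = 0.7 success criterion checks.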


Experiment 4: Performance Benchmarks

python scripts/research/benchmark_performance.py \
  --corpus experiments/synthetic_repos \
  --output experiments/benchmarks.json

# Generates timing data for complexity analysis

Expected: O(n log n) or better for all scripts
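One way to check the O(n log n) claim from the timing data is to fit measured times against n·log n and inspect the goodness of fit. The timings below are synthetic placeholders, not benchmark output:

```python
# Least-squares fit of t ~ c * n*log(n) through the origin; an R^2
# near 1.0 supports the n log n model. Timings are placeholders.
import math

sizes = [100, 1_000, 10_000, 100_000]
times = [2e-6 * n * math.log(n) for n in sizes]  # placeholder data

x = [n * math.log(n) for n in sizes]
c = sum(xi * ti for xi, ti in zip(x, times)) / sum(xi * xi for xi in x)
pred = [c * xi for xi in x]
mean_t = sum(times) / len(times)
ss_res = sum((t - p) ** 2 for t, p in zip(times, pred))
ss_tot = sum((t - mean_t) ** 2 for t in times)
r2 = 1 - ss_res / ss_tot
print(c, r2)
```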


Experiment 5: Comparative Evaluation

python scripts/research/compare_tools.py \
  --corpus experiments/synthetic_repos \
  --output experiments/tool_comparison.json

Expected: Morphism detects at least 3x more drift than baseline tools (Nx, Turborepo)
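Significance for the 3x claim needs a paired test over per-repo counts. The sketch below uses an exact sign test as a deliberately simple stand-in (not necessarily the test compare_tools.py applies), with made-up counts:

```python
# Exact two-sided sign test under H0: P(Morphism > baseline) = 0.5.
import math

def sign_test_p(wins: int, n: int) -> float:
    tail = sum(math.comb(n, k) for k in range(wins, n + 1)) * 0.5 ** n
    return min(1.0, 2 * tail)

# Placeholder per-repo drift counts, paired by repository.
morphism = [12, 9, 15, 11, 14, 10, 13, 12, 16, 11]
baseline = [4, 3, 5, 4, 5, 3, 4, 4, 6, 3]

wins = sum(m > b for m, b in zip(morphism, baseline))
ratio = sum(morphism) / sum(baseline)
p = sign_test_p(wins, len(morphism))
print(f"ratio={ratio:.1f}x, wins={wins}/{len(morphism)}, p={p:.4f}")
```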


Experiment 6: Real-World Case Studies

python scripts/research/analyze_real_repos.py \
  --input docs/research/real_repo_targets.json \
  --output experiments/real_repos/results.json \
  --markdown docs/research/reports/real-repo-analysis.md \
  --workdir /tmp/morphism-real-repos

# Use --limit N or per-entry "local_path" overrides for fast local/debug runs
# After each run, update docs/research/reports/real-repo-analysis.md with TP/FP notes.

Expected: Find actionable violations in 80%+ of repos


Longitudinal Drift Tracking

python scripts/research/longitudinal_tracker.py \
  --repo-config docs/research/longitudinal_targets.json \
  --output-root experiments/longitudinal \
  --summary experiments/longitudinal/summary.csv \
  --workdir /tmp/morphism-longitudinal

# Optional flags:
#   --date YYYY-MM-DD (backfill)
#   --dry-run (skip writes)
#   --continue-on-error

Expected: 90 consecutive days of κ trajectories with ≤3 missed runs and weekly annotations in docs/research/reports/longitudinal.md.
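Counting missed runs over the 90-day window is straightforward once completed-run dates are extracted from summary.csv (whose layout is assumed here). A stdlib sketch with two hypothetical outage days:

```python
# Count missed daily runs in a 90-day tracking window.
# The run dates below are hypothetical, not read from summary.csv.
from datetime import date, timedelta

start = date(2026, 1, 1)
window = [start + timedelta(days=i) for i in range(90)]

completed = set(window) - {date(2026, 1, 15), date(2026, 2, 3)}  # two outages

missed = sum(d not in completed for d in window)
print(missed)  # success criterion: missed <= 3
```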


Generate Publication Figures

# TODO: Implement generate_figures.py
# Creates plots for paper

python scripts/research/generate_figures.py \
  --data experiments/ \
  --output paper/figures/

# Generates:
# - convergence_rate.pdf (κ vs. iterations)
# - detection_accuracy.pdf (precision/recall bars)
# - performance_scaling.pdf (time vs. repo size)
# - comparison_chart.pdf (Morphism vs. baselines)

Write Paper

# Use template from docs/research/experimental-framework.md

# Structure:
# 1. Introduction (motivation, problem, contribution)
# 2. Background (category theory, Banach theorem)
# 3. Morphism Framework (7 invariants, architecture)
# 4. Experimental Design (synthetic corpus, metrics)
# 5. Results (RQ1-RQ4 with figures)
# 6. Case Studies (real-world validation)
# 7. Discussion (threats, limitations, future work)
# 8. Conclusion

# Target: ICSE 2027 (deadline ~August 2026)

Reproducibility Package

# Create Docker container
docker build -t morphism-experiments -f experiments/Dockerfile .

# Run all experiments
docker run morphism-experiments bash experiments/run_all.sh

# Expected output:
# experiments/
#   synthetic_repos/
#   detection_accuracy.md
#   convergence_*.json
#   performance.json
#   comparison.csv
#   case_studies/
#   figures/

Timeline

  Week    Task
  1-2     Tune script parameters and ensure reproducible outputs
  3-4     Run Experiments 1-2 (corpus generation, detection accuracy)
  5-6     Run Experiments 3-4 (convergence, performance)
  7-8     Run Experiments 5-6 (comparison, case studies)
  9-10    Statistical analysis, generate figures
  11-12   Write paper draft
  13      Internal review, revisions
  14      Submit to ICSE 2027

Success Criteria

Experiment 1: Synthetic Corpus

  • [ ] 100+ repositories generated
  • [ ] 8 drift types represented
  • [ ] Ground truth validated manually (sample 10%)

Experiment 2: Detection Accuracy

  • [ ] Precision ≥ 0.95 for all scripts
  • [ ] Recall ≥ 0.95 for all scripts
  • [ ] F1 ≥ 0.95 for all scripts

Experiment 3: Convergence

  • [ ] Actual convergence matches O(κⁿ) within 5%
  • [ ] Perturbation robustness ≥ 0.99
  • [ ] κ = 0.3 converges faster than κ = 0.7 (p < 0.05)

Experiment 4: Performance

  • [ ] Pre-commit overhead < 2s
  • [ ] CI overhead < 30s
  • [ ] Complexity O(n log n) or better

Experiment 5: Comparison

  • [ ] Morphism detects 3x more drift than Nx (p < 0.05)
  • [ ] Morphism detects 3x more drift than Turborepo (p < 0.05)
  • [ ] False positive rate < 5%

Experiment 6: Case Studies

  • [ ] 10 real-world repos analyzed
  • [ ] Violations found in 80%+
  • [ ] Developer feedback collected

Publication

  • [ ] Paper submitted to ICSE 2027
  • [ ] Artifact submitted to artifact track
  • [ ] Dataset published on Zenodo with DOI

Troubleshooting

"Detection accuracy too low"

  • Check ground truth generation logic
  • Verify detection parsing is correct
  • Add more test cases for edge cases

"Convergence doesn't match theory"

  • Verify κ calculation is correct
  • Check if remediation operation is truly a contraction
  • Measure actual distance to fixed point

"Performance overhead too high"

  • Profile scripts to find bottlenecks
  • Optimize hot paths
  • Consider caching/memoization

"Comparison shows no difference"

  • Ensure baseline tools are configured correctly
  • Verify drift types are detectable by baselines
  • Check if corpus is too easy/hard

Next Steps

  1. Implement remaining scripts (see TODOs above)
  2. Run pilot experiments on small corpus (n=10)
  3. Validate metrics match expectations
  4. Scale up to full corpus (n=100)
  5. Analyze results and iterate
  6. Write paper and submit

Resources

  • Experimental Framework: docs/research/experimental-framework.md
  • Synthetic Generator: scripts/research/generate_synthetic_repo.py
  • Detection Measurement: scripts/research/measure_detection_accuracy.py
  • Paper Template: See experimental-framework.md Section "Publication Strategy"

Questions? Open an issue or discussion on GitHub.

Last Updated: 2026-02-22