# Research Quick Start Guide

NON-NORMATIVE.

**Goal:** Run experiments that scientifically validate Morphism's governance claims.

For the benchmark suite and CI behavior, see benchmarks-and-experiments.md.
## Prerequisites

```bash
# Install dependencies
pip install numpy pandas scipy matplotlib seaborn

# Verify Morphism installation
python scripts/maturity_score.py --ci --threshold 60
```
## Experiment 1: Generate Synthetic Corpus

```bash
# Generate 100 small repositories with random drift
python scripts/research/generate_synthetic_repo.py \
  --output experiments/synthetic_repos \
  --count 100 \
  --size small \
  --seed 42
```

Output layout:

```
experiments/synthetic_repos/
  synthetic_0000/
    GROUND_TRUTH.json
    AGENTS.md
    SSOT.md
    ...
  synthetic_0001/
    ...
  MANIFEST.json
```

**Expected:** 100 repositories, ~200 drift injections in total.
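The 10% manual ground-truth check listed under Success Criteria can be scripted. Below is a minimal sketch, assuming each repo directory contains a `GROUND_TRUTH.json` with an `"injections"` list — that field name is illustrative, so adjust it to the generator's actual schema:

```python
import json
import random
from pathlib import Path

def sample_ground_truth(corpus_dir: str, fraction: float = 0.1, seed: int = 42):
    """Pick a deterministic random sample of synthetic repos for manual review.

    Assumes GROUND_TRUTH.json has an "injections" list (hypothetical field
    name -- match it to what generate_synthetic_repo.py actually writes).
    """
    repos = sorted(Path(corpus_dir).glob("synthetic_*"))
    rng = random.Random(seed)  # fixed seed keeps the review sample reproducible
    sample = rng.sample(repos, max(1, int(len(repos) * fraction)))
    for repo in sample:
        truth = json.loads((repo / "GROUND_TRUTH.json").read_text())
        print(repo.name, len(truth.get("injections", [])), "injections")
    return sample
```

Reviewing a fixed-seed sample rather than ad hoc spot checks makes the validation step itself reproducible.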
## Experiment 2: Measure Detection Accuracy

```bash
# Run all enforcement scripts on the corpus
python scripts/research/measure_detection_accuracy.py \
  --corpus experiments/synthetic_repos \
  --output experiments/detection_accuracy.md
```

Outputs:

- `experiments/detection_accuracy.md` (human-readable report)
- `experiments/detection_accuracy.json` (machine-readable results)

**Expected results:**

- Precision ≥ 0.95
- Recall ≥ 0.95
- F1 ≥ 0.95

If the targets are not met, investigate false positives/negatives and improve the detection logic.
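The three targets can be recomputed from raw detection counts with a small helper; this is a generic sketch, not the measurement script's actual API:

```python
def detection_metrics(tp: int, fp: int, fn: int):
    """Compute (precision, recall, F1) from true/false positive and false
    negative counts, guarding against division by zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    return precision, recall, f1
```

For example, 95 true positives with 5 false positives and 5 false negatives yields precision, recall, and F1 of exactly 0.95 — right at the target threshold.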
## Experiment 3: Convergence Validation

```bash
python scripts/research/measure_convergence.py \
  --corpus experiments/synthetic_repos \
  --kappa 0.1 0.3 0.5 0.7 0.9 \
  --iterations 12 \
  --perturbation-trials 100 \
  --output experiments/convergence/results.json
```

**Expected:** Measured convergence iterations match the theoretical bound within 5%.
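The theoretical bound follows from the contraction property: after n remediation iterations the residual drift is at most κⁿ·d₀, so the iteration count needed to reach a tolerance ε is ⌈log(ε/d₀)/log κ⌉. A stdlib sketch of the 5% agreement check (variable names are illustrative, not the script's interface):

```python
import math

def theoretical_iterations(kappa: float, d0: float, eps: float) -> int:
    """Smallest n with kappa**n * d0 <= eps (Banach-style geometric bound)."""
    assert 0 < kappa < 1 and d0 > 0 and eps > 0
    if d0 <= eps:
        return 0  # already within tolerance
    return math.ceil(math.log(eps / d0) / math.log(kappa))

def within_tolerance(observed: int, kappa: float, d0: float, eps: float,
                     tol: float = 0.05) -> bool:
    """Check the 5% agreement criterion used in Experiment 3."""
    expected = theoretical_iterations(kappa, d0, eps)
    return abs(observed - expected) <= tol * expected
```

This also makes the κ = 0.3 vs. κ = 0.7 success criterion concrete: a smaller contraction factor needs strictly fewer iterations to reach the same tolerance.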
## Experiment 4: Performance Benchmarks

```bash
# Generates timing data for complexity analysis
python scripts/research/benchmark_performance.py \
  --corpus experiments/synthetic_repos \
  --output experiments/benchmarks.json
```

**Expected:** O(n log n) or better for all scripts.
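One way to test the O(n log n) claim against the timing data is to fit the slope of log(time) versus log(n log n): a slope near 1 is consistent with the bound, while a slope well above 1 suggests superlinearithmic growth. A stdlib sketch under that assumption (the benchmark script's real analysis may differ):

```python
import math

def scaling_exponent(sizes, times):
    """Least-squares slope of log(time) against log(n * log n).

    A slope near 1.0 is consistent with O(n log n) scaling.
    Pure-stdlib sketch; no numpy required.
    """
    xs = [math.log(n * math.log(n)) for n in sizes]
    ys = [math.log(t) for t in times]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den
```

Run it on (repo size, wall time) pairs from `experiments/benchmarks.json` once the field layout of that file is known.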
## Experiment 5: Comparative Evaluation

```bash
python scripts/research/compare_tools.py \
  --corpus experiments/synthetic_repos \
  --output experiments/tool_comparison.json
```

**Expected:** Morphism detects 3x more drift than the baseline tools.
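The "3x more drift" claim needs a significance test alongside the raw ratio (the success criteria below require p < 0.05). An illustrative stdlib sketch using a two-proportion z-test over the same injected-drift total — the actual comparison script may use a paired or exact test instead:

```python
import math

def two_proportion_z(detected_a: int, detected_b: int, total: int):
    """Two-proportion z-test for detection counts over the same drift total.

    Returns (ratio, z, p) where p is the two-sided p-value computed from the
    normal CDF via math.erf. Illustrative only.
    """
    p1, p2 = detected_a / total, detected_b / total
    pooled = (detected_a + detected_b) / (2 * total)
    se = math.sqrt(pooled * (1 - pooled) * (2 / total))
    z = (p1 - p2) / se
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    ratio = detected_a / detected_b if detected_b else float("inf")
    return ratio, z, p_value
```

For instance, 180 vs. 60 detections out of 200 injected drifts gives a 3.0x ratio with a z well past significance.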
## Experiment 6: Real-World Case Studies

```bash
python scripts/research/analyze_real_repos.py \
  --input docs/research/real_repo_targets.json \
  --output experiments/real_repos/results.json \
  --markdown docs/research/reports/real-repo-analysis.md \
  --workdir /tmp/morphism-real-repos

# Use --limit N or per-entry "local_path" overrides for fast local/debug runs.
# After each run, update docs/research/reports/real-repo-analysis.md with TP/FP notes.
```

**Expected:** Actionable violations found in 80%+ of the repositories.
## Longitudinal Drift Tracking

```bash
python scripts/research/longitudinal_tracker.py \
  --repo-config docs/research/longitudinal_targets.json \
  --output-root experiments/longitudinal \
  --summary experiments/longitudinal/summary.csv \
  --workdir /tmp/morphism-longitudinal

# Optional flags:
#   --date YYYY-MM-DD   (backfill)
#   --dry-run           (skip writes)
#   --continue-on-error
```

**Expected:** 90 consecutive days of κ trajectories with ≤3 missed runs, plus weekly annotations in docs/research/reports/longitudinal.md.
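The "≤3 missed runs over 90 days" criterion reduces to counting calendar days with no recorded κ measurement. A small sketch, assuming the run dates have already been parsed out of summary.csv into `datetime.date` objects:

```python
from datetime import date, timedelta

def missed_runs(run_dates, start: date, end: date) -> int:
    """Count days in the inclusive [start, end] window with no recorded run.

    run_dates is any iterable of datetime.date objects, e.g. parsed from
    the tracker's summary.csv.
    """
    have = set(run_dates)
    day, missed = start, 0
    while day <= end:
        if day not in have:
            missed += 1
        day += timedelta(days=1)
    return missed
```

Running this weekly over the trailing 90-day window gives an early warning before the missed-run budget is exhausted.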
## Generate Publication Figures

```bash
# TODO: Implement generate_figures.py
# Creates plots for the paper
python scripts/research/generate_figures.py \
  --data experiments/ \
  --output paper/figures/
```

Generates:

- `convergence_rate.pdf` (κ vs. iterations)
- `detection_accuracy.pdf` (precision/recall bars)
- `performance_scaling.pdf` (time vs. repo size)
- `comparison_chart.pdf` (Morphism vs. baselines)
## Write Paper

Use the template from docs/research/experimental-framework.md. Suggested structure:

1. Introduction (motivation, problem, contribution)
2. Background (category theory, Banach fixed-point theorem)
3. Morphism Framework (7 invariants, architecture)
4. Experimental Design (synthetic corpus, metrics)
5. Results (RQ1-RQ4 with figures)
6. Case Studies (real-world validation)
7. Discussion (threats, limitations, future work)
8. Conclusion

Target venue: ICSE 2027 (deadline ~August 2026).
## Reproducibility Package

```bash
# Build the Docker container
docker build -t morphism-experiments -f experiments/Dockerfile .

# Run all experiments
docker run morphism-experiments bash experiments/run_all.sh
```

Expected output layout:

```
experiments/
  synthetic_repos/
  detection_accuracy.md
  convergence_*.json
  performance.json
  comparison.csv
  case_studies/
  figures/
```
## Timeline
| Week | Task |
|---|---|
| 1-2 | Tune script parameters and ensure reproducible outputs |
| 3-4 | Run Experiments 1-2 (corpus generation, detection accuracy) |
| 5-6 | Run Experiments 3-4 (convergence, performance) |
| 7-8 | Run Experiments 5-6 (comparison, case studies) |
| 9-10 | Statistical analysis, generate figures |
| 11-12 | Write paper draft |
| 13 | Internal review, revisions |
| 14 | Submit to ICSE 2027 |
## Success Criteria

### Experiment 1: Synthetic Corpus

- [ ] 100+ repositories generated
- [ ] All 8 drift types represented
- [ ] Ground truth validated manually (10% sample)

### Experiment 2: Detection Accuracy

- [ ] Precision ≥ 0.95 for all scripts
- [ ] Recall ≥ 0.95 for all scripts
- [ ] F1 ≥ 0.95 for all scripts

### Experiment 3: Convergence

- [ ] Actual convergence matches the O(κⁿ) bound within 5%
- [ ] Perturbation robustness ≥ 0.99
- [ ] κ = 0.3 converges faster than κ = 0.7 (p < 0.05)

### Experiment 4: Performance

- [ ] Pre-commit overhead < 2s
- [ ] CI overhead < 30s
- [ ] Complexity O(n log n) or better

### Experiment 5: Comparison

- [ ] Morphism detects 3x more drift than Nx (p < 0.05)
- [ ] Morphism detects 3x more drift than Turborepo (p < 0.05)
- [ ] False positive rate < 5%

### Experiment 6: Case Studies

- [ ] 10 real-world repos analyzed
- [ ] Violations found in 80%+ of them
- [ ] Developer feedback collected

### Publication

- [ ] Paper submitted to ICSE 2027
- [ ] Artifact submitted to the artifact track
- [ ] Dataset published on Zenodo with DOI
## Troubleshooting

### "Detection accuracy too low"

- Check the ground-truth generation logic
- Verify the detection output parsing is correct
- Add more test cases covering edge cases

### "Convergence doesn't match theory"

- Verify the κ calculation is correct
- Check whether the remediation operation is truly a contraction
- Measure the actual distance to the fixed point

### "Performance overhead too high"

- Profile the scripts to find bottlenecks
- Optimize hot paths
- Consider caching/memoization

### "Comparison shows no difference"

- Ensure the baseline tools are configured correctly
- Verify the injected drift types are detectable by the baselines
- Check whether the corpus is too easy or too hard
## Next Steps
- Implement remaining scripts (see TODOs above)
- Run pilot experiments on small corpus (n=10)
- Validate metrics match expectations
- Scale up to full corpus (n=100)
- Analyze results and iterate
- Write paper and submit
## Resources

- Experimental Framework: docs/research/experimental-framework.md
- Synthetic Generator: scripts/research/generate_synthetic_repo.py
- Detection Measurement: scripts/research/measure_detection_accuracy.py
- Paper Template: see the "Publication Strategy" section of experimental-framework.md

Questions? Open an issue or discussion on GitHub.

Last Updated: 2026-02-22