Working Papers
Paper 1: A Benchmark for Few-Shot Catastrophe Prevention Techniques
Status: In development (v0.1)
Expected: 2026
A systematic evaluation framework for comparing catastrophe prevention techniques (probes, fine-tuning, monitoring). It builds on validated pilot studies to understand the fundamental tradeoffs between control and alignment.
Pilot Studies
Catastrophe Prevention Pilot (December 2025)
Result: GO decision - benchmark produces meaningful signal
Validated that the benchmark framework can distinguish between effective and ineffective prevention techniques:
- Baseline: 32% (control condition)
- Probe intervention: 32% (failed - no improvement)
- Fine-tuning intervention: 18% (succeeded - significant improvement)
Location: ~/Documents/catastrophe-prevention-pilot/
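To make the size of the gap between conditions concrete, the pilot figures can be compared against the baseline in both absolute and relative terms. This is a minimal sketch, not the pilot's own analysis code: the function name `compare_to_baseline` and the dictionary layout are hypothetical, and it assumes the percentages are rates where lower is better (consistent with 18% being reported as an improvement over 32%). It says nothing about statistical significance, since sample sizes are not given here.

```python
def compare_to_baseline(baseline_pct, treated_pct):
    """Return (absolute change in percentage points, relative change as a fraction).

    Negative values mean the treated condition is below the baseline.
    """
    absolute_pp = treated_pct - baseline_pct
    relative = absolute_pp / baseline_pct
    return absolute_pp, relative


# Figures copied from the pilot summary above (percentages).
BASELINE_PCT = 32
CONDITIONS = {"probe": 32, "fine_tuning": 18}

results = {
    name: compare_to_baseline(BASELINE_PCT, pct)
    for name, pct in CONDITIONS.items()
}

for name, (absolute_pp, relative) in results.items():
    print(f"{name}: {absolute_pp:+d} pp, {relative:+.1%} relative to baseline")
```

Under these assumptions, the probe condition shows no change, while fine-tuning is 14 percentage points below baseline, a relative reduction of roughly 44%.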
Psychometric Variance Pilot (December 2025)
Result: Hypothesis falsified (r = -0.29, p = 0.49)
The initial hypothesis about psychometric variance and AI capabilities was falsified. By testing assumptions early, this pilot potentially saved 4 years of research on a dead-end direction.
Location: 01-Research/experiments/archive/psychometric-variance-pilot/
Future Work
- Paper 2: How many caught examples do probes actually need?
- Paper 3: Does fine-tuning help alignment or teach hiding?
- Paper 4: What should AI labs do in practice?
All work is pre-publication and subject to revision. For daily progress updates, see my research blog.