Working Papers

Paper 1: A Benchmark for Few-Shot Catastrophe Prevention Techniques

Status: In development (v0.1)
Expected: 2026

A systematic evaluation framework for comparing catastrophe prevention techniques (probes, fine-tuning, monitoring). It builds on validated pilot studies to characterize the fundamental tradeoffs between control and alignment.


Pilot Studies

Catastrophe Prevention Pilot (December 2025)

Result: GO decision - benchmark produces meaningful signal

Validated that the benchmark framework can distinguish between effective and ineffective prevention techniques:

  • Baseline: 32% (control condition)
  • Probe intervention: 32% (failed - no improvement)
  • Fine-tuning intervention: 18% (succeeded - significant improvement)
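
To make the size of these differences concrete, the sketch below computes the absolute and relative reduction of each intervention against the baseline. It assumes the percentages are rates where lower is better, which the "succeeded" label on the 18% result suggests but the summary does not state outright. The function name and dictionary keys are illustrative, not part of the benchmark code.

```python
# Rates as reported in the pilot summary, expressed as fractions.
# Assumption: lower is better (the metric itself is not named in the summary).
rates = {"baseline": 0.32, "probe": 0.32, "fine_tuning": 0.18}


def reduction_vs_baseline(rate, baseline):
    """Return (absolute reduction in percentage points, relative reduction as a fraction)."""
    absolute_pp = (baseline - rate) * 100
    relative = (baseline - rate) / baseline
    return absolute_pp, relative


for name in ("probe", "fine_tuning"):
    abs_pp, rel = reduction_vs_baseline(rates[name], rates["baseline"])
    print(f"{name}: {abs_pp:.1f} pp absolute, {rel:.1%} relative")
```

Under that assumption, the probe shows no change from baseline, while fine-tuning shows a 14 percentage point drop. Whether that drop is statistically significant depends on the number of trials, which is not reported here.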

Location: ~/Documents/catastrophe-prevention-pilot/

Psychometric Variance Pilot (December 2025)

Result: Hypothesis falsified (r = -0.29, p = 0.49)

The initial hypothesis about psychometric variance and AI capabilities was falsified. By testing assumptions early, this pilot potentially saved four years of research on a dead-end direction.
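
One way to put the reported correlation in context is an approximate confidence interval from the Fisher z-transform, sketched below. The pilot's sample size is not stated in this summary, so `n_hypothetical` is a placeholder chosen purely for illustration, and `fisher_ci` is a helper written for this example, not part of the pilot code.

```python
import math
from statistics import NormalDist


def fisher_ci(r, n, confidence=0.95):
    """Approximate confidence interval for a Pearson r via the Fisher z-transform."""
    if n <= 3:
        raise ValueError("n must be greater than 3")
    z = math.atanh(r)                      # transform r to an approximately normal scale
    se = 1.0 / math.sqrt(n - 3)            # standard error on the z scale
    crit = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


# ASSUMPTION: the sample size is not given in the summary; 8 is a placeholder.
n_hypothetical = 8
low, high = fisher_ci(-0.29, n_hypothetical)
print(f"95% CI for r: [{low:.2f}, {high:.2f}]")
```

With a small sample like the placeholder above, the interval is wide and spans zero. The real interval should be recomputed with the pilot's actual sample size.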

Location: 01-Research/experiments/archive/psychometric-variance-pilot/


Future Work

  • Paper 2: How many caught examples do probes actually need?
  • Paper 3: Does fine-tuning help alignment or teach hiding?
  • Paper 4: What should AI labs do in practice?


All work is pre-publication and subject to revision. For daily progress updates, see my research blog.