I’m an AI safety researcher focused on few-shot catastrophe prevention — understanding what happens when we catch AI misbehavior and how best to use that information to prevent future catastrophes.
Research Focus
My research asks: When we catch an AI system doing something dangerous, what’s the most effective way to use that information?
I’m building systematic benchmarks to compare prevention techniques (probes, fine-tuning, and monitoring) and to understand the fundamental tradeoffs between control and alignment.
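For a flavor of what such a benchmark compares, here is a minimal, purely illustrative Python sketch; every name in it (`Incident`, `keyword_probe`, `evaluate`) is hypothetical rather than code from the actual framework. A toy "probe" is fit on a handful of caught incidents, then scored on how often it catches unseen unsafe outputs versus how often it falsely flags benign ones.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Incident:
    """One caught instance of misbehavior: the triggering prompt
    and the flagged unsafe response. (Hypothetical schema.)"""
    prompt: str
    unsafe_response: str


# An intervention consumes the few caught incidents and returns a
# detector that flags whether a new output looks unsafe.
Intervention = Callable[[List[Incident]], Callable[[str], bool]]


def keyword_probe(incidents: List[Incident]) -> Callable[[str], bool]:
    """Toy stand-in for a probe: flag outputs sharing at least two
    tokens with the caught unsafe responses."""
    vocab = {w for inc in incidents for w in inc.unsafe_response.lower().split()}
    return lambda text: len(vocab & set(text.lower().split())) >= 2


def evaluate(
    techniques: Dict[str, Intervention],
    caught: List[Incident],
    held_out_unsafe: List[str],
    held_out_safe: List[str],
) -> Dict[str, Dict[str, float]]:
    """Fit each technique on the few caught incidents, then report its
    catch rate on unseen unsafe outputs and its false-positive rate on
    benign outputs -- the basic tradeoff a benchmark like this measures."""
    results: Dict[str, Dict[str, float]] = {}
    for name, fit in techniques.items():
        detector = fit(caught)
        results[name] = {
            "catch_rate": sum(map(detector, held_out_unsafe)) / len(held_out_unsafe),
            "false_positive_rate": sum(map(detector, held_out_safe)) / len(held_out_safe),
        }
    return results


if __name__ == "__main__":
    caught = [Incident("deploy this", "rm -rf the production database now")]
    print(evaluate(
        {"keyword_probe": keyword_probe},
        caught,
        held_out_unsafe=["drop the production database immediately"],
        held_out_safe=["here is the weather forecast"],
    ))
```

The real techniques (probes, fine-tuning, monitoring) and metrics are far richer than this, but the interface, namely fit on a few caught incidents and measure generalization to unseen misbehavior, is the comparison the benchmark is built around.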
Current Work
I’m currently working on Paper 1, a benchmark framework for evaluating catastrophe prevention techniques. It builds on pilot studies that validated the research direction and identified a four-paper research arc.
Background
- AI Safety Research (Few-Shot Catastrophe Prevention)
- Teaching Fellow at the University of Warwick
- SFHEA (Senior Fellow of the Higher Education Academy) candidate
Contact
- Email: [email protected]
- GitHub: [Your GitHub username]
- Twitter/X: [Coming soon]
This site shares research progress, pilot studies, and mini-articles on AI safety. All work is pre-publication and subject to revision.