Explaining Safety Is Not Enforcing Safety

Evans Tovar
Independent Researcher – AI Safety & Governance
ORCID: 0009-0008-4388-1836
DOI: 10.17605/OSF.IO/AXBND

Abstract

This work reports longitudinal, cross-surface behavioral testing of major consumer AI assistants and documents systematic articulation–application gaps: cases in which a system clearly explains a safety rule and then violates that same rule under contextual drift.

The study proposes evaluating safety by constraint persistence across context rather than by refusal quality in isolation, and introduces epistemic honesty and epistemic friction as governance-relevant safety primitives.

Access

View on OSF
Download PDF

Contact

ai.safety.eftovar@gmail.com