Definition

LLM Audit Failures

LLM audit failures are the recurring ways an AI-generated audit looks convincing but isn't supported by evidence: hallucinated evidence, circular validation, confidence inflation, and agents acting on unverifiable claims.

How it's measured

Canary catches these by refusing to take the model's word: it observes the tool directly, matches claims to documentation, and adversarially refutes findings before asserting them.

Examples

An LLM 'verifies' a repository it never fetched (hallucinated evidence); the same model writes and grades the audit (circular validation); a 95% score sits atop zero captured evidence (confidence inflation).

FAQ

Why can't GPT or Claude do this themselves?
They can describe what an audit should check, but a language model cannot install a tool, intercept its TLS traffic, and prove what it sent. That requires execution and capture — an external observation layer.
Is this a security scanner?
No — it's an integrity layer: it measures whether an audit's claims are evidenced, across data-flow, disclosure and jurisdiction.

Related: Integrity Score · Evidence Coverage · Unsupported Claims · Methodology · Benchmarks