I Used to Count a Résumé as Proof. An AI Showed Me Why I Was Wrong.

Second in a series on the AI Pressure Doctrine. The first post ended with two AI models reaching opposite verdicts on the same proposal. This one is about why they split — and the single fix that made it stop.

When I built the first version of my evidence system, it ranked everything on a six-rung ladder. A résumé sat near the top.

That felt obviously right at the time. A vague assertion is weak. But a document in your hand — a résumé, a case study, a signed deliverable — that's real. Tangible. High on the ladder.

I was wrong. And I didn't figure it out by thinking harder. I figured it out by watching two AI models disagree about the exact same stack of evidence.

The divergence, and what was actually under it

In the first post, I showed you two leading models reading the identical federal proposal and returning opposite verdicts — one said proceed, the other said do not proceed. I told you their built-in risk appetites drove the split. That was true. It just wasn't the whole mechanism.

That proposal's entire case rested on the proposing team's own documents — résumés and an internal past-performance write-up. One model read those self-assertions and treated them as good enough to proceed. The other refused to let the team's own paperwork vouch for the team.

Same documents. Same claims. One model called them proven.

The other called them a guess in a suit.

So the disagreement wasn't really about risk appetite. It came down to a single question neither model could answer consistently: how much is a source allowed to vouch for itself?

And my ladder had left that door wide open. A "direct artifact" scored high — so the model that accepted the résumé wasn't malfunctioning. It was following my own rules. I'd built the loophole. The model just walked through it.

The fix: a source can't promote itself

So I tore the ladder down and rebuilt it around one principle: evidence is ranked by who is standing behind it, not by how solid it looks.

Four levels. That's the whole thing.

Level 0 — no evidence. Confident tone, fluent reasoning, "it stands to reason." This is where most AI certainty actually lives, dressed up as fact.
Level 1 — indirect. Circumstantial or adjacent. "Something similar worked before." Suggestive, not load-bearing.
Level 2 — a single artifact, or a self-assertion. The résumé says it. The vendor's own case study claims it. One testimonial. The document makes a claim about itself. This is where most "verified-sounding" evidence really sits — and where most bad decisions start. It is also the ceiling for anything a source says about its own work.
Level 3 — an independent external system of record. Confirmed by a source outside the claimant that the claimant doesn't control: audited financials instead of the company's pitch deck, a regulator or court filing, a credential registry, a background check, a result you measured yourself.

The line the whole system now turns on: a self-assertion can never exceed Level 2. Not if it's formal. Not if it's polished. Not if it's repeated ten times. Repetition and formatting don't promote evidence — only independent corroboration does.

Here's the test that makes it click. Nobody would accept "I hold a Top Secret clearance" just because a résumé says so. You'd check it against the system of record, full stop. That instinct is correct — and it applies to every claim, not only the ones that obviously need checking. "Ten years of experience" deserves the same skepticism as the clearance. We just don't apply it, because the résumé says it so confidently.

The three traps that fool you — and fooled the model

Once you rank evidence by who's behind it, the common ways people and models cheat the ladder become obvious.

The Same-Source Trap. Ten documents from one source are not ten pieces of evidence. They're one source, repeated. A résumé, the cover letter, and the reference the candidate hand-picked all trace back to the same interested party. Agreement within a source isn't corroboration — it's an echo. This is the exact move that split my two models: one mistook the echo for a chorus.

The Aggregation Trap. You can't assemble partial matches into a whole. Person A's experience plus Person B's credential is not one qualified person. Three near-misses don't add up to a hit. Each requirement has to be met by the thing actually responsible for it.

Default-Down. When you're unsure which level a claim belongs to, take the lower one. When the evidence is ambiguous, fail it — don't pass it. "Close enough" is not a match; a threshold missed by any margin is a miss, not a rounding error. Bias toward the stricter read, every time. The cost of wrongly trusting is almost always higher than the cost of wrongly checking.

Watch one claim walk the ladder

Take the real one from that proposal: "Ten years of DoD cyber experience."

The résumé asserts it. → Level 2. A self-assertion.
It's restated in the cover letter. → Still Level 2. Same source. (Same-Source Trap.)
A former manager writes a glowing letter. → Still essentially Level 2 — a source the candidate chose, vouching for the candidate.
An independent record confirms the dates. → Level 3. Now you have something.

Until that last step, you do not have ten years of experience. You have a claim of ten years. And in the actual run, the documents showed eight. The model that "verified" it never checked — it let the claim verify itself.

That silent promotion — a Level 2 quietly wearing a Level 3 costume — is the single move that burns you. It's what one model did and the other caught. It's why they disagreed.

And here's the payoff: once the ladder ranked by source independence instead of surface solidity, the divergence I opened the first post with disappeared. Run the same proposal again and both models land in the same place — every claim capped at Level 2, nothing promoted, the verdict no longer a coin flip. There was simply no longer any room to let a document vouch for itself.

The habit

You don't need my system to use this. You need one question, aimed at every confident answer an AI hands you:

What level is this resting on — and who is standing behind it?

Ask it relentlessly and you'll be startled how often the honest answer is Level 0, or a Level 2 in a Level 3 costume. The model won't flag the difference for you. It will state the self-assertion and the independently verified fact in precisely the same confident voice. That sameness is the whole danger.

Doing this by hand, on every claim, in a real decision, is exhausting — that's the part I automated. But the automation is worthless if you don't carry the instinct first. The instinct is free. Start there.

A document is not proof because you're holding it. It's proof when something the author doesn't control says the same thing.

Where in your own work do you treat a document as proof when it's really just a claim?