I Built a Fraud Detector. It Was Really a Verifiability Detector.
I set out to build something that could read a résumé and tell whether the person was inflating their experience. What I ended up with was a system that could tell whether a person had written down checkable details — and was quietly calling that the same thing as telling the truth.
It isn't. That gap is the whole story, and catching it is the reason the project stopped where it did.
What I was actually trying to do
The goal was a tool that analyzes a résumé or LinkedIn profile and does one of two jobs: help the candidate sharpen it toward their goal, or strip the artistic fluff so a recruiter can see what's underneath. That split mattered later more than I expected. It became two layers:
- A helper layer — parse the profile, isolate the concrete claims, reorganize, cut the padding.
- A classifier layer — judge whether a claim was authentic or inflated, and emit a verdict.
The helper does work you can check mechanically: did a named tool appear in the rewrite that wasn't in the source? Did a claim get split into its atomic parts? Those have right answers. The classifier was trying to do something else entirely — judge authenticity from the text itself. I didn't fully appreciate, going in, that those are different kinds of problem. One is checkable. The other needs a human who can verify the underlying reality. The project taught me the difference by failing along exactly that line.
I pre-registered the kill conditions before I ran anything
The one decision I'm most glad I made: before generating a single test profile, I wrote down what results would make me abandon the classifier. Not after seeing the data — before. The thresholds were explicit. If the system flagged ambiguous, unverifiable cases as fraudulent too often and too confidently, it was done.
This matters because the alternative is the default failure mode of anyone attached to their own build: you run the test, the numbers look rough, and you reach for reasons the test was unfair. Pre-registration takes that move off the table. You don't get to relitigate the threshold after you've seen which side of it you landed on.
The test
I built a small synthetic corpus — 20 profiles, structured as five matched 2×2 sets. Each set held one underlying accomplishment and varied two things independently:
- Substance: real vs. fabricated.
- Register: concrete vs. abstract — the same accomplishment written once with specific tools, systems, and action-to-outcome detail, and once in high-level strategic phrasing with the specifics stripped out.
The matched-pair design is the point. Because the same underlying accomplishment appeared in both a concrete and an abstract version, any difference in how the classifier treated them couldn't be explained by the substance. It could only be the writing.
What broke
Two things showed up.
The clean result, the one that tripped a line I'd drawn in advance: on the ambiguous, genuinely-unresolvable cases — the ones where no honest system should be confident — the classifier returned the negative verdict at high confidence 28% of the time, against a pre-registered ceiling of 20%. It wasn't just wrong on hard cases. It was confidently wrong on cases defined by the absence of enough information to be confident about.
The second thing was the one that named the real problem. On the matched real pairs — same accomplishment, two registers — the abstract version drew the negative verdict more often than its concrete twin. The system was penalizing real work for being described in high-level language, and it was doing it independent of whether the work was real.
A note on honesty here, because it's the kind of thing this whole project was about: an earlier summary of mine put a specific percentage on that register effect. When I checked the arithmetic against the sample size, the number couldn't be what the summary claimed — the denominator didn't support that precision. So I'm not going to print it. The direction is solid and it's backed by the design: abstract phrasing of real work got flagged more than concrete phrasing of the same work. The magnitude, at this sample size, is directional, not a clean measured rate. Saying that plainly costs me a crisp number and buys back the credibility of every other claim here.
For completeness: detection of the fabricated profiles itself held up on this synthetic set — the fully abstract fakes were caught every time, the concrete fakes most of the time. The system could spot invention. What it couldn't do was the thing I actually built it for.
The diagnosis: it was scoring the wrong variable
Put the two findings together and the mechanism is plain. The classifier rewarded language that exposed checkable detail — named systems, specific actions, traceable outcomes — and penalized language that didn't. That's why abstract real work got flagged and concrete fakes sometimes slid through. The system wasn't detecting fraud. It was detecting verifiability, and reporting it as truth. That's the same move I flagged in the evidence-ladder post — a claim ranked by how solid it looks instead of who stands behind it. The difference is that there I caught an AI doing it; here I'd built a system that did it by design.
Those aren't the same variable, and the cases where they diverge are exactly the cases that matter. A senior operator who describes ten years of work in strategic abstractions. A non-native English speaker who doesn't reach for the idiomatic concrete phrasing. Someone in a domain where the real work genuinely is high-level. All of them write "unverifiable-looking" prose while doing entirely real work — and a verifiability detector wearing a fraud-detector label flags every one of them. The bias against abstract register wasn't a bug layered on top of a working detector. It was the detector. The thing only ever measured how much checkable detail you surfaced.
This is the part I find most worth sharing, because it isn't specific to résumés. Any system that judges truth from text alone is constrained by what's checkable in the text. When it hits something it can't verify, it doesn't return "I can't verify this" — it returns its best guess dressed in confident prose, and the guess correlates with surface verifiability rather than underlying fact. The ceiling isn't a tuning problem. It's structural.
Why I retired it instead of fixing it
The tempting move was to tune. Adjust the thresholds, reweight the features, soften the confidence on ambiguous cases until the numbers came back inside the lines. I'd pre-registered against exactly that, and I'm glad I had, because the pull to do it was real.
Tuning wouldn't have fixed anything. It would have hidden the 28% breach without changing what the system was actually measuring. The classifier would still have been a verifiability detector; it would just have been a quieter one, harder to catch next time. The honest conclusion was that the core premise — that authenticity is reliably readable from language structure — hadn't survived its first controlled test. So the classifier got shelved.
The helper layer, the part that does mechanically checkable work, stands on its own and lost nothing. That's the clean half. Only the half that tried to judge the unverifiable went away.
The boundaries, stated plainly
Everything above rests on a deliberately modest evidence base, and the post is worth nothing if I don't draw that line where it actually sits:
- The corpus was synthetic — profiles I constructed, not real-world résumés with verified outcomes. This shows the classifier discriminated by register on profiles built to test that. It does not establish how résumé screeners behave in the wild.
- The sample was small — 20 profiles, five underlying real accomplishments. Small-n results are directional. The 28% breach is a clean count against a stated threshold; the register effect is a direction, not a precise rate.
- No significance test is established at this sample size. The pre-registration called for one; at this n, the raw threshold breach is what I have, and I'm not going to dress it up as more.
- It was a single validation pass. Reproduction across repeated runs wasn't established.
None of that weakens the finding I'm actually making, because the finding is modest: on a controlled test I designed, a system I built to detect fraud was demonstrably reacting to writing register instead — and that's enough to retire the premise and go re-test it properly, which is a different post than this one.
What I'd keep
The durable lesson isn't about résumés. It's a question to ask before building any AI feature that judges something: is the thing I want to measure actually checkable from the input, or am I about to build a detector for whatever proxy is checkable and call it the real thing?
My classifier could only see verifiability, so it measured verifiability and labeled it truth. The failure was honest and it was cheap, because the test was small, the kill conditions were set in advance, and the arithmetic got checked instead of trusted. That's the version of being wrong I'll take every time — the kind you catch on a 20-profile synthetic corpus, before it's deciding anything about a real person.