The few that acknowledge AI get it wrong in predictable ways. Here's what those ways are, and what we do instead.
The market
What they measure: Pattern recall, speed
Role of AI: Banned or ignored
Scoring: Automated test cases
Why it falls short
Like testing a carpenter by taking away their power tools and handing them a hand saw. You'll learn something, but not whether they can build your house. And once AI can solve any of these puzzles in seconds, the assessment loses its discriminating power entirely.
What they measure: Communication, real-time coding
Role of AI: Usually banned
Scoring: Interviewer judgment
Why it falls short
Subjective and uncalibrated. Burns expensive senior-engineer time on every candidate. Penalizes candidates who'd reach for AI on the real job. And compliance becomes the hidden signal: candidates who follow the no-AI rule lose to candidates who quietly work around it.
What they measure: Whether AI was used
Role of AI: An adversary to detect
Scoring: Stylometric heuristics
Why it falls short
An arms race the platform always loses. AI-generated code has no detectable accent that survives changes in prompting style and post-editing. False positives penalize fast typists; false negatives miss lightly edited AI output. And even if detection were perfect, the candidate used the same tools they'd use on the job. That's not cheating. That's engineering in 2026.
What they measure: Prompt quality as a proxy
Role of AI: Present but passively observed
Scoring: Qualitative rubrics on process
Why it falls short
Measures process, not output. Anyone can produce working code in 30 minutes with AI, so the rubric shifts to prompt style and conversational tone, neither of which reliably correlates with judgment. You end up hiring on vibes from the debrief.
The question isn’t “did they use AI?”
The question is: when the AI was wrong, did they catch it?
What we do instead
Give candidates a real IDE and a real AI assistant. Plant known flaws in the AI’s context. Measure who catches them.
What we measure: Whether the candidate caught what the AI got wrong
Role of AI: Deliberately adversarial; it defends the planted flaws
Scoring: Quantitative; every score traces to a specific planted flaw
Why this works
Because we planted the flaws, we have the answer key. The candidate either identified the flaw and articulated why it was wrong, or they didn't. No rubric interpretation. No interviewer calibration. Just evidence, graded against the engineering judgment that matters most on the job: pushing back on confident but wrong AI output.
Create an account and send your first assessment in minutes.