The measurement model
What we measure
Screen measures seven engineering competencies through four composable assessment components. Each competency is scored independently with per-flaw evidence. No single number. No pass/fail.
The seven competencies
Critique
Engaging deeply with code they didn't write.
Does the candidate read for intent, spot assumptions, and identify failure modes, or skim and accept? The first thing a senior engineer does with AI-generated code is push back on it. We measure whether the candidate does the same.
Measured by
How scoring works
Scoring runs in two passes once a candidate submits a session.
Candidate submits
Inline PR comments, flaw-detection findings, a design document, or a debugging writeup.
Per-component scoring
Codex-as-judge compares the submission against the ground-truth manifest — planted flaws, rubric criteria, or scenario probes.
Cross-component aggregation
Per-component measurements are projected onto the seven skill axes, yielding a weighted score and the collected evidence for each axis.
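The per-component pass can be pictured as matching a candidate's findings against a manifest of planted flaws. What follows is a minimal sketch under loudly stated assumptions. The `PlantedFlaw`, `Finding` and `FlawResult` types, the severity weights, and `judge_matches` are all hypothetical. In Screen the match decision is made by Codex-as-judge, and this location-only stub does not attempt to reproduce that judgment.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PlantedFlaw:
    """One ground-truth entry in a (hypothetical) component manifest."""
    flaw_id: str
    file: str
    lines: range      # line span that contains the planted flaw
    severity: float   # assumed weight; higher-severity flaws count more


@dataclass(frozen=True)
class Finding:
    """One candidate comment, e.g. an inline PR comment."""
    file: str
    line: int
    comment: str


@dataclass
class FlawResult:
    flaw_id: str
    caught: bool
    evidence: list[str] = field(default_factory=list)  # candidate comments that matched


def judge_matches(flaw: PlantedFlaw, finding: Finding) -> bool:
    # Placeholder for the LLM judge: only checks that the comment points at the
    # flaw's location, not whether it correctly explains the problem.
    return finding.file == flaw.file and finding.line in flaw.lines


def score_component(manifest: list[PlantedFlaw], findings: list[Finding]) -> tuple[float, list[FlawResult]]:
    """Severity-weighted share of planted flaws caught, plus per-flaw evidence."""
    results = []
    for flaw in manifest:
        hits = [f.comment for f in findings if judge_matches(flaw, f)]
        results.append(FlawResult(flaw.flaw_id, caught=bool(hits), evidence=hits))
    total = sum(flaw.severity for flaw in manifest)
    caught = sum(flaw.severity for flaw, r in zip(manifest, results) if r.caught)
    return (caught / total if total else 0.0), results


manifest = [
    PlantedFlaw("F1", "auth.py", range(10, 15), severity=1.0),
    PlantedFlaw("F2", "cache.py", range(40, 42), severity=0.5),
]
findings = [Finding("auth.py", 12, "Token comparison is not constant-time.")]
score, per_flaw = score_component(manifest, findings)
print(round(score, 3), [(r.flaw_id, r.caught) for r in per_flaw])
# → 0.667 [('F1', True), ('F2', False)]
```

Keeping a `FlawResult` for every planted flaw, caught or not, is what lets a missed flaw show up as evidence rather than silently lowering a number.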
Every score traces back to a specific planted flaw, rubric criterion, or scenario probe. You never see a number without being able to ask “where did this come from?” and get a concrete answer.
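One way to picture the aggregation pass and its traceability: each component result carries the IDs of the flaws, criteria, or probes behind it, and projecting onto skill axes carries those IDs along. The component names, the `PROJECTION` weights, and the weighted-average rule below are illustrative assumptions, not Screen's actual configuration. The axis names only echo competencies mentioned on this page.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ComponentResult:
    name: str
    score: float          # component score in [0, 1]
    evidence: list[str]   # IDs of flaws, rubric criteria, or probes behind the score


# Hypothetical projection: how strongly each component informs each axis.
# Weights must be positive; a component absent here contributes to no axis.
PROJECTION = {
    "pr_review": {"critique": 0.7, "skepticism": 0.3},
    "flaw_detection": {"critique": 0.4, "skepticism": 0.6},
    "design_doc": {"system_reasoning": 1.0},
}


def aggregate(results, projection=PROJECTION):
    """Weighted average per axis, keeping the evidence that fed each axis."""
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    evidence = defaultdict(list)
    for r in results:
        for axis, w in projection.get(r.name, {}).items():
            weighted_sum[axis] += w * r.score
            weight_total[axis] += w
            # Prefix each ID with its component so the trail stays unambiguous.
            evidence[axis].extend(f"{r.name}:{e}" for e in r.evidence)
    return {
        axis: (weighted_sum[axis] / weight_total[axis], evidence[axis])
        for axis in weighted_sum
    }


axes = aggregate([
    ComponentResult("pr_review", 0.8, ["F1"]),
    ComponentResult("flaw_detection", 0.5, ["F7"]),
])
score, trail = axes["critique"]
print(round(score, 3), trail)
# → 0.691 ['pr_review:F1', 'flaw_detection:F7']
```

Answering "where did this come from?" then amounts to reading the trail that was stored next to the score. An axis with no contributing component is simply absent rather than reported as zero.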
Our philosophy
Measurement, not verdict.
Screen does not produce a PASS/FAIL label. We surface raw measurements plus percentile context across sessions. You see where a candidate lands on each competency — and you decide what “good enough” means for your team, your role, and your level.
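Percentile context can be computed per competency without collapsing anything into a verdict. This is a minimal sketch that assumes a midrank definition of percentile, where ties count as half, and a hypothetical cohort of earlier session scores for the same competency. This page does not specify Screen's exact percentile method.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(score: float, cohort: list[float]) -> float:
    """Share of cohort scores below `score`, with ties counted as half (midrank)."""
    if not cohort:
        raise ValueError("cohort must contain at least one prior session")
    ordered = sorted(cohort)
    below = bisect_left(ordered, score)            # strictly lower scores
    ties = bisect_right(ordered, score) - below    # equal scores
    return 100.0 * (below + 0.5 * ties) / len(ordered)


# Hypothetical prior critique scores from other sessions.
cohort = [0.2, 0.4, 0.4, 0.6, 0.8]
report = {"critique": 0.4}
for competency, raw in report.items():
    # Raw measurement and percentile side by side; no label is derived.
    print(competency, raw, percentile_rank(raw, cohort))
# → critique 0.4 40.0
```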
Different teams value different things. A startup scaling fast might weight system reasoning heavily. A security-focused team might care most about skepticism. Screen gives you the data. You make the call.
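Because the call is yours, a team's priorities can live entirely outside the scoring. The sketch below uses a hypothetical `TeamBar` with a priority and a team-chosen minimum. It compares each competency against that minimum and orders the competencies by what the team cares about most, without combining them into one number or label. The score values and both team profiles are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TeamBar:
    priority: int    # higher means this competency matters more to the team
    minimum: float   # the team's own "good enough" on the same 0-1 scale


def review_against_bar(axis_scores: dict[str, float], bars: dict[str, TeamBar]) -> list[tuple[str, float, float, bool]]:
    """One row per competency the team cares about, most important first.

    Deliberately returns per-competency rows rather than a combined score.
    """
    rows = [
        (axis, axis_scores[axis], bar.minimum, axis_scores[axis] >= bar.minimum)
        for axis, bar in bars.items()
    ]
    rows.sort(key=lambda row: bars[row[0]].priority, reverse=True)
    return rows


scores = {"critique": 0.7, "skepticism": 0.9, "system_reasoning": 0.5}
startup = {"system_reasoning": TeamBar(priority=3, minimum=0.6), "critique": TeamBar(priority=1, minimum=0.5)}
security = {"skepticism": TeamBar(priority=3, minimum=0.8), "critique": TeamBar(priority=2, minimum=0.6)}

for team, bars in [("startup", startup), ("security", security)]:
    print(team, review_against_bar(scores, bars))
```

The same measurements fall short of the startup's system-reasoning bar while clearing every bar the security team set. A competency with no score raises `KeyError` instead of defaulting to zero, so missing data is never mistaken for a weak result.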