The measurement model

What we measure

Screen measures seven engineering competencies through four composable assessment components. Each competency is scored independently with per-flaw evidence. No single number. No pass/fail.

7 Competencies · 4 Components · 0 Pass/fail labels

The seven competencies

01 Competency

Critique

Engaging deeply with code they didn't write.

Reading for intent, spotting assumptions, identifying failure modes — or skimming and accepting. The first thing a senior engineer does with AI-generated code is push back on it. We measure whether the candidate does the same.

Measured by

PR Review, Bug Bash

How scoring works

Scoring runs in two passes after a candidate submits a session: per-component scoring, then cross-component aggregation.

Step 1

Candidate submits

Inline PR comments, flaw-detection findings, a design document, or a debugging writeup.

Step 2

Per-component scoring

Codex-as-judge compares the submission against the ground-truth manifest — planted flaws, rubric criteria, or scenario probes.

Step 3

Cross-component aggregation

Per-component measurements project onto the seven skill axes — weighted scores with collected evidence for each.

Every score traces back to a specific planted flaw, rubric criterion, or scenario probe. You never see a number without being able to ask “where did this come from?” and get a concrete answer.
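To make the aggregation step concrete, here is a minimal sketch of how per-component measurements might project onto a competency axis while keeping their evidence attached. It is an illustration, not Screen's actual implementation: the `Measurement` type, the `aggregate` function, the item IDs, and the weights are all hypothetical. Only the pairing of PR Review and Bug Bash with Critique comes from the description above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """One judged item from a single assessment component (hypothetical shape)."""
    component: str   # e.g. "PR Review"
    source_id: str   # planted flaw, rubric criterion, or scenario probe it traces to
    score: float     # judged result for this item, from 0.0 to 1.0
    evidence: str    # excerpt from the submission that supports the score


# Assumed weights: how much each component contributes to each competency.
WEIGHTS = {
    ("PR Review", "Critique"): 0.6,
    ("Bug Bash", "Critique"): 0.4,
}


def aggregate(measurements, weights):
    """Project per-component measurements onto competency axes.

    Returns {competency: {"score": weighted mean, "evidence": [Measurement, ...]}}.
    Components with no measurements are skipped and the remaining weights
    are renormalized, so a missing component does not drag a score to zero.
    """
    by_component = defaultdict(list)
    for m in measurements:
        by_component[m.component].append(m)

    totals = defaultdict(float)
    weight_sums = defaultdict(float)
    evidence = defaultdict(list)
    for (component, competency), weight in weights.items():
        items = by_component.get(component)
        if not items:
            continue
        component_mean = sum(m.score for m in items) / len(items)
        totals[competency] += weight * component_mean
        weight_sums[competency] += weight
        evidence[competency].extend(items)

    return {
        competency: {
            "score": totals[competency] / weight_sums[competency],
            "evidence": evidence[competency],
        }
        for competency in totals
    }


session = [
    Measurement("PR Review", "flaw-1", 1.0, "Flagged the unchecked null return."),
    Measurement("PR Review", "flaw-2", 0.0, "No comment on the race condition."),
    Measurement("Bug Bash", "flaw-3", 1.0, "Reproduced the off-by-one in paging."),
]
result = aggregate(session, WEIGHTS)
```

In this sketch each competency score carries the list of measurements that produced it, so the question "where did this come from?" resolves to a specific `source_id` and evidence excerpt rather than to a bare number.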

Our philosophy

Measurement, not verdict.

Screen does not produce a PASS/FAIL label. We surface raw measurements plus percentile context across sessions. You see where a candidate lands on each competency — and you decide what “good enough” means for your team, your role, and your level.

Different teams value different things. A startup scaling fast might weight system reasoning heavily. A security-focused team might care most about skepticism. Screen gives you the data. You make the call.
