Which NBME Form Is Most Accurate? A 5,039-Student Analysis
Every student asks the same question during dedicated: which practice exam should I actually trust? The answer matters — you make go/no-go decisions, push or pull your test date, and manage a lot of anxiety based on these numbers. So we went to the data.
We analyzed 5,039 verified, document-confirmed Step 2 CK score reports collected between 2022 and 2026, and measured how closely each major practice assessment tracks the real three-digit score. This is the kind of question that only a large, clean dataset can answer — and the results are clear.
The short answer
NBME Form 14 is the most accurate single predictor of Step 2 CK, with a Pearson correlation of r = 0.92 and 80% of predictions within ±5–7 points. UWSA 2 (r = 0.89) and NBME Form 13 (r = 0.88) follow closely. The Free 120 is the weakest of the major four (r = 0.85) — better used as a stamina check than a score oracle.
The accuracy ranking
“Accuracy” here means two things: correlation (how tightly the practice score moves with the real score) and typical error (how far off a single prediction usually is). A form can correlate well yet be biased — systematically reading a few points high or low — so we report both.
| Rank | Practice exam | Correlation (r) | Typical accuracy | Bias |
|---|---|---|---|---|
| 1 | NBME Form 14 | 0.92 | ±5–7 points | Neutral |
| 2 | UWSA 2 | 0.89 | ±6–8 points | Overpredicts ~3 pts |
| 3 | NBME Form 13 | 0.88 | ±6–9 points | Neutral |
| 4 | Free 120 | 0.85 | ±8–10 points | Slightly high near ceiling |
Why NBME Form 14 wins
NBME forms are written and equated by the same organization that builds the real exam, so the question style, difficulty calibration, and scoring are native to the USMLE. Form 14 in particular sits in the sweet spot of the current dataset: it is recent enough to reflect today’s content blueprint, and it has accumulated enough verified outcomes for the relationship to stabilize. In practice, a Form 14 score is the closest thing to a dress rehearsal of the real three-digit number.
It is not magic — it is calibration. When we hold out Form 14 results and predict the real score from them alone, the residuals are small and roughly symmetric. That is exactly what you want from a benchmark: no systematic lean, tight spread.
UWSA 2: accurate, but it leans high
UWSA 2 is the most-loved assessment in the community for good reason — it is brutal, long, and a superb test of whether you can hold focus across eight blocks. Its correlation (r = 0.89) is excellent. The catch is bias: in our data, UWSA 2 reads about three points above the real score on average. A 258 on UWSA 2 more often becomes a ~255 on test day than a 258.
That does not make it worse than an NBME — it makes it predictable. Once you know a tool overpredicts by a fixed amount, you can correct for it. Our predictor applies that bias correction automatically, so you do not have to do the subtraction in your head.
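As a minimal sketch of what a fixed bias correction looks like, the snippet below subtracts a per-form offset from a practice score. The `FORM_BIAS` table and the `bias_corrected` function are hypothetical names for illustration, not the predictor's actual code. Only the roughly 3-point UWSA 2 offset comes from the figures above, and the zeros stand in for the forms listed as neutral. The Free 120 is left out because its lean shows up only near the ceiling, so a single fixed offset would not describe it.

```python
# Hypothetical per-form mean bias in points (practice score minus real score).
# Only the UWSA 2 value (~3) is taken from the article; 0.0 means "neutral".
FORM_BIAS = {
    "NBME 14": 0.0,
    "NBME 13": 0.0,
    "UWSA 2": 3.0,
}


def bias_corrected(form: str, practice_score: float) -> float:
    """Subtract the form's average overprediction from a practice score."""
    return practice_score - FORM_BIAS[form]


print(bias_corrected("UWSA 2", 258))   # 255.0
print(bias_corrected("NBME 14", 250))  # 250.0
```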
The Free 120 is a stamina check, not a score
The official Free 120 is the most realistic preview of interface and question feel, which is why every student should do it. But as a score predictor it is the weakest of the major four (r = 0.85), and the percentage-correct-to-three-digit conversion gets noisy near the top end. Use it to confirm your timing and endurance in the final week — not to set your expectations to the decimal.
The real lesson: never trust one form
Here is the most important finding, and it is counterintuitive. No single form, not even Form 14, is as accurate as a combination of forms. Each assessment carries its own measurement noise — a lucky content draw, a bad night’s sleep, a form that ran easy. When you average two or three recent assessments, those form-specific quirks partially cancel, and the prediction tightens.
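A back-of-the-envelope way to see why averaging helps, under the simplifying assumption that each form's error is independent, has zero mean, and has the same standard deviation σ. The symbols and the value σ = 6 are illustrative and are not taken from the dataset.

```latex
\operatorname{SD}\!\left(\frac{1}{n}\sum_{i=1}^{n} e_i\right) = \frac{\sigma}{\sqrt{n}},
\qquad
\sigma = 6:\quad
\frac{6}{\sqrt{1}} = 6.0,\;\;
\frac{6}{\sqrt{2}} \approx 4.2,\;\;
\frac{6}{\sqrt{3}} \approx 3.5,\;\;
\frac{6}{\sqrt{4}} = 3.0
```

Real errors are not fully independent (the same student, the same week of prep), so the actual gain is smaller than this idealized figure. That is why the quirks cancel only partially, and why each extra form helps less than the one before it.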
This is exactly why our model is a 3-method ensemble rather than a lookup table. It blends K-nearest-neighbor matching against the 5,039-student dataset, bias-corrected weighted averaging, and per-form regression, so a single noisy data point cannot dominate your forecast. The practical takeaway for your own prep:
- Take more than one form. Two assessments beat one; three beat two (with diminishing returns after that).
- Weight recency. A form from two weeks out tells you more than one from two months out.
- Correct for known bias. Mentally shave a few points off UWSA 2; trust NBME forms closer to face value.
- Read the range, not the point. A prediction is a probability band — plan around the floor of it for competitive specialties.
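The first three takeaways can be sketched as a recency-weighted, bias-corrected average. This is an illustration under stated assumptions, not the ensemble described above: the `Assessment` class, the `BIAS` table, the 21-day half-life, and the example scores are all hypothetical, and only the UWSA 2 offset of about 3 points comes from the article.

```python
from dataclasses import dataclass


@dataclass
class Assessment:
    form: str
    score: float      # three-digit practice score
    days_before: int  # days between this assessment and test day


# Mean overprediction per form, in points. Only the UWSA 2 figure is from the
# article; forms missing from the table are treated as unbiased.
BIAS = {"NBME 14": 0.0, "NBME 13": 0.0, "UWSA 2": 3.0}

# Assumed for illustration: a form's weight halves every three weeks.
HALF_LIFE_DAYS = 21.0


def recency_weight(days_before: int) -> float:
    """Exponential decay: more recent assessments get larger weights."""
    return 0.5 ** (days_before / HALF_LIFE_DAYS)


def blended_estimate(assessments: list[Assessment]) -> float:
    """Recency-weighted mean of bias-corrected practice scores."""
    if not assessments:
        raise ValueError("need at least one assessment")
    weights = [recency_weight(a.days_before) for a in assessments]
    corrected = [a.score - BIAS.get(a.form, 0.0) for a in assessments]
    return sum(w * c for w, c in zip(weights, corrected)) / sum(weights)


# Made-up example: three forms taken 6, 2, and 1 weeks before test day.
forms = [
    Assessment("NBME 13", 246, 42),
    Assessment("UWSA 2", 258, 14),
    Assessment("NBME 14", 252, 7),
]
print(round(blended_estimate(forms), 1))
```

The older NBME 13 result still counts, but at roughly a third of the weight of the form taken one week out, so the estimate sits closer to the two recent scores.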
How we measured this
Every figure above comes from our internal dataset of 5,039 verified Step 2 CK outcomes — practice scores paired with the real reported score, anonymized at submission and screened for duplicates and outliers. Correlations are Pearson coefficients computed per assessment; “typical accuracy” is the band that contains roughly 80% of predictions. You can read the full data-handling and validation process on our methodology page and the per-exam correlation tables on our Step 2 accuracy page.
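For readers who want to run the same kind of check on their own paired scores, here is a minimal sketch of the three quantities reported above: Pearson correlation, mean bias, and an 80% error band. The function names and the ten score pairs are made up for illustration and are not drawn from the dataset. The band is computed here as the 80th percentile of absolute error, which is one reasonable reading of "the band that contains roughly 80% of predictions".

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def mean_bias(practice: list[float], real: list[float]) -> float:
    """Average of (practice - real); positive means the form overpredicts."""
    return sum(p - r for p, r in zip(practice, real)) / len(practice)


def band_80(practice: list[float], real: list[float]) -> float:
    """Smallest half-width that contains at least 80% of absolute errors."""
    errors = sorted(abs(p - r) for p, r in zip(practice, real))
    k = -(-4 * len(errors) // 5) - 1  # integer ceil(0.8 * n), then 0-based index
    return errors[k]


# Made-up paired scores for a single hypothetical form.
practice = [240, 250, 255, 262, 248, 236, 259, 245, 251, 266]
real = [238, 246, 253, 257, 247, 231, 258, 240, 250, 261]

print(round(pearson_r(practice, real), 3))
print(mean_bias(practice, real))
print(band_80(practice, real))
```

Reporting bias separately from correlation matters because correlation ignores a constant offset: a form that always reads exactly three points high would still show r = 1.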
Predict your real Step 2 CK score
Enter your NBME, UWSA, and Free 120 scores and let the 3-method ensemble do the bias correction and averaging for you — free, instant, no account.
Predict my score →
Frequently asked questions
What is the most accurate NBME form for Step 2 CK?
In our dataset of 5,039 verified outcomes, NBME Form 14 is the single most accurate predictor of the real Step 2 CK score, with a Pearson correlation of r = 0.92 and 80% of predictions landing within ±5–7 points. UWSA 2 (r = 0.89) and NBME Form 13 (r = 0.88) are close behind.
Is NBME Form 14 more accurate than UWSA 2?
Statistically, yes — Form 14 correlates slightly higher (r = 0.92 vs 0.89). The bigger difference is bias: UWSA 2 tends to overpredict the real score by about 3 points on average, while NBME forms track closer to the true value. UWSA 2 is still excellent for testing 8-block stamina.
How many practice forms should I take before Step 2 CK?
Accuracy improves when you combine forms. A single form has real measurement noise; averaging two or three recent assessments (e.g. NBME Form 14 + UWSA 2 + Free 120) partially cancels form-specific quirks and tightens the prediction. Take your most accurate forms last, within two weeks of test day.
Does a higher correlation mean my score is guaranteed?
No. Even r = 0.92 leaves residual variance. Treat every prediction as a range, not a point: a forecast of 250 with a ±6-point band means roughly 80% of students with your profile score between 244 and 256. Test-day factors (stress, fatigue, content luck) move individuals within that band, and occasionally outside it.