In the train/test tutorial we scored the home-win classifier with one number: accuracy at a 0.50 threshold. But that number hides a decision. “Call it a home win when the predicted probability clears 0.50” is a choice — slide that cutoff and you trade missed wins for false alarms. The ROC curve shows the whole trade-off at once, and the area under it (AUC) collapses the curve into a single, threshold-free score: the probability the model rates a random actual home win above a random home loss.

We build both from scratch in pure numpy, on the same tutorial-71 dataset (net-rating gap → home win), and cross-check AUC two independent ways. Offline, on bundled CSVs.

Go deeper with the free textbook: Chapter 29: Evaluating Models — Accuracy, Precision, Recall, and ROC at DataField.dev.

Train the classifier and get predicted probabilities

Reuse the tutorial-71 logistic regression and the tutorial-72 train/test split. The model outputs a probability for each held-out game; that probability — not a hard 0/1 label — is what the ROC curve sweeps over.

python

import numpy as np
# ... build x_all (standardized net-rating gap) and y_all (home win) as in tutorial 71 ...

rng = np.random.default_rng(74)
idx = rng.permutation(len(x_all))
cut = int(0.75 * len(x_all))
tr, te = idx[:cut], idx[cut:]

def sigmoid(z): return 1.0 / (1.0 + np.exp(-z))

w, b, lr = 0.0, 0.0, 0.3
for _ in range(600):                       # train on the training games only
    p = sigmoid(w*x_all[tr] + b)
    err = p - y_all[tr]
    w -= lr*np.mean(err*x_all[tr]); b -= lr*np.mean(err)

scores = sigmoid(w*x_all[te] + b)          # predicted P(home win) on held-out games
y_te = y_all[te]

Keep the raw scores — the ROC curve needs the probabilities, not thresholded labels.

Sweep the threshold to trace the ROC curve

A clever trick avoids looping over every possible cutoff: sort the games from highest predicted score to lowest, then walk down the list. Each step “accepts” one more game as a predicted win. The running count of real wins accepted is the true-positive tally; the running count of real losses accepted is the false-positive tally. Divide each by its total and you have the curve.
python
```
def roc_points(y_true, y_score):
    P = y_true.sum()                       # total real positives (home wins)
    N = len(y_true) - P                    # total real negatives (losses)
    order = np.argsort(-y_score)           # highest score first
    y_sorted = y_true[order]
    tp = np.cumsum(y_sorted)               # true positives caught so far
    fp = np.cumsum(1 - y_sorted)           # false positives so far
    tpr = np.concatenate([[0], tp / P])    # start the curve at (0, 0)
    fpr = np.concatenate([[0], fp / N])
    return fpr, tpr

fpr, tpr = roc_points(y_te, scores)
```
As the threshold drops from 1 toward 0, both rates climb from 0 to 1. A model that ranks wins above losses shoots the true-positive rate up first, bowing the curve toward the top-left corner.

Compute AUC two ways — and check they agree

The area under the curve is just the sum of trapezoid strips between successive points (the “from scratch” version of an integral). Then we verify it with a completely different identity: AUC equals the share of all win/loss pairs in which the model gave the win the higher score — the Mann-Whitney interpretation.
python
```
# trapezoid area under the ROC curve
auc_trap = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

# cross-check: P(score(win) > score(loss)) over every win/loss pair
pos = scores[y_te == 1]
neg = scores[y_te == 0]
wins = (pos[:, None] > neg[None, :]).sum()
ties = (pos[:, None] == neg[None, :]).sum()
auc_rank = float((wins + 0.5*ties) / (len(pos) * len(neg)))
```
Two unrelated routes to the same number is the kind of cross-check that catches bugs the moment they appear.
Read the score
ROC / AUC on held-out games
```
Held-out test games: 308  (174 home wins, 134 losses)
Accuracy at threshold 0.50: 68.2%

AUC (area under ROC, trapezoid): 0.755
AUC (rank identity, cross-check): 0.755

Reading it: 0.50 = coin flip, 1.00 = perfect ranking.
AUC 0.755 means a random actual home win outranks a random loss 76% of the time.
```
Data: Bundled Basketball-Reference net ratings + 1,231 game results; logistic regression, ROC/AUC on a held-out test set, retrieved June 2026

On 308 held-out games the curve bows well above the diagonal, and both methods return AUC = 0.755 — identical to three decimals, exactly as the theory promises. Read it plainly: a random actual home win outranks a random loss about 76% of the time. Accuracy at the 0.50 cutoff was 68.2%, but AUC says something the single number can't — the model orders games well across every threshold, not just at one.
Why AUC beats a single accuracy

Accuracy depends on the threshold and on how balanced the classes are; AUC depends on neither. It asks only whether the model ranks wins above losses, which is what you actually want when you'll later tune the cutoff for your own purpose — a confident favorites filter (high threshold) or a wide net (low threshold). 0.50 is a coin flip, 1.00 is perfect ranking, and 0.755 is a genuinely useful classifier built from a single feature. It's the standard headline score for a binary classifier for exactly this reason.

Troubleshooting

My AUC is below 0.5

That means your scores rank losses above wins — the labels or the sign of the score are flipped. An AUC of 0.2 is just a 0.8 model with its predictions inverted. Check that y_true is 1 for home wins and that you sorted by -y_score (descending).

The two AUC numbers don't match

Small gaps come from tied scores. The rank formula credits ties as half (the 0.5*ties term); the trapezoid handles them as vertical/horizontal steps. With continuous probabilities ties are rare, so the two should agree to several decimals — here they match exactly at 0.755.

My curve looks jagged, not smooth

That's correct — an empirical ROC on a few hundred games is a staircase, one step per game. It only looks smooth with huge samples. The area under the staircase is still the exact AUC.

Challenge yourself

Find the threshold that maximizes Youden's J (true-positive rate minus false-positive rate) — the point on the curve farthest from the diagonal — and compare it to the default 0.50. Then add a second feature (say, rest-day difference) to the logistic regression and see whether AUC climbs above 0.755; AUC is the fair way to judge whether a feature actually helps. Finally, average the AUC across the five folds from the cross-validation tutorial for a more stable estimate of how well this model really ranks.

Get the code

Here's the complete, working script for this tutorial. It runs exactly as shown.

This script imports a small shared helper (and reads any bundled sample data) that live next to it in /downloads/ — grab these into the same folder so it runs as-is: sdt_common.py, sdt_nba.py.

ROC Curves and AUC From Scratch

What you'll build

Train the classifier and get predicted probabilities

Sweep the threshold to trace the ROC curve

Compute AUC two ways — and check they agree

Read the score

Why AUC beats a single accuracy

Troubleshooting

Challenge yourself

Get the code

More Basketball tutorials

Pull Your First NBA Data with nba_api

Build a Team Net-Rating Dashboard Table

Draw an NBA Shot Chart with matplotlib