ROC Curves and AUC From Scratch

Basketball · Advanced · Python · ~6 min read

What you'll build

Build a ROC curve and its area (AUC) from scratch in pure numpy for the tutorial-71 home-win classifier: sweep the threshold to trace true-positive vs false-positive rate, then compute AUC two independent ways (trapezoid area and the rank identity) and watch them agree at 0.755.

Data: Bundled Basketball-Reference net ratings + 1,231 game results; logistic regression, ROC/AUC on a held-out test set, retrieved June 2026

In the train/test tutorial we scored the home-win classifier with one number: accuracy at a 0.50 threshold. But that number hides a decision. “Call it a home win when the predicted probability clears 0.50” is a choice — slide that cutoff and you trade missed wins for false alarms. The ROC curve shows the whole trade-off at once, and the area under it (AUC) collapses the curve into a single, threshold-free score: the probability the model rates a random actual home win above a random home loss.

We build both from scratch in pure numpy, on the same tutorial-71 dataset (net-rating gap → home win), and cross-check AUC two independent ways. Everything runs offline on the bundled CSVs.

Go deeper with the free textbook: Chapter 29: Evaluating Models — Accuracy, Precision, Recall, and ROC at DataField.dev.

  1. Train the classifier and get predicted probabilities

    Reuse the tutorial-71 logistic regression and the tutorial-72 train/test split. The model outputs a probability for each held-out game; that probability — not a hard 0/1 label — is what the ROC curve sweeps over.

    python
    import numpy as np
    # ... build x_all (standardized net-rating gap) and y_all (home win) as in tutorial 71 ...
    
    rng = np.random.default_rng(74)
    idx = rng.permutation(len(x_all))
    cut = int(0.75 * len(x_all))
    tr, te = idx[:cut], idx[cut:]
    
    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))
    
    w, b, lr = 0.0, 0.0, 0.3
    for _ in range(600):                       # train on the training games only
        p = sigmoid(w*x_all[tr] + b)
        err = p - y_all[tr]
        w -= lr*np.mean(err*x_all[tr])
        b -= lr*np.mean(err)
    
    scores = sigmoid(w*x_all[te] + b)          # predicted P(home win) on held-out games
    y_te = y_all[te]

    Keep the raw scores — the ROC curve needs the probabilities, not thresholded labels.

  2. Sweep the threshold to trace the ROC curve

    A clever trick avoids looping over every possible cutoff: sort the games from highest predicted score to lowest, then walk down the list. Each step “accepts” one more game as a predicted win. The running count of real wins accepted is the true-positive tally; the running count of real losses accepted is the false-positive tally. Divide each by its total and you have the curve.

    python
    def roc_points(y_true, y_score):
        P = y_true.sum()                       # total real positives (home wins)
        N = len(y_true) - P                    # total real negatives (losses)
        order = np.argsort(-y_score)           # highest score first
        y_sorted = y_true[order]
        tp = np.cumsum(y_sorted)               # true positives caught so far
        fp = np.cumsum(1 - y_sorted)           # false positives so far
        tpr = np.concatenate([[0], tp / P])    # start the curve at (0, 0)
        fpr = np.concatenate([[0], fp / N])
        return fpr, tpr
    
    fpr, tpr = roc_points(y_te, scores)

    As the threshold drops from 1 toward 0, both rates climb from 0 to 1. A model that ranks wins above losses shoots the true-positive rate up first, bowing the curve toward the top-left corner.
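
    To see the sweep on numbers small enough to check by hand, here is a minimal sketch on five made-up games. The arrays y_toy and s_toy are hypothetical illustration data, not drawn from the tutorial dataset, and roc_points is repeated only so the snippet runs on its own.

```python
import numpy as np

def roc_points(y_true, y_score):
    # Same sweep as in the tutorial, repeated so this snippet is self-contained.
    P = y_true.sum()                       # total real positives
    N = len(y_true) - P                    # total real negatives
    order = np.argsort(-y_score)           # highest score first
    y_sorted = y_true[order]
    tp = np.cumsum(y_sorted)
    fp = np.cumsum(1 - y_sorted)
    return np.concatenate([[0], fp / N]), np.concatenate([[0], tp / P])

# Hypothetical toy data: 1 = home win, 0 = loss (no tied scores)
y_toy = np.array([1, 0, 1, 1, 0])
s_toy = np.array([0.9, 0.8, 0.7, 0.4, 0.2])

fpr_toy, tpr_toy = roc_points(y_toy, s_toy)
for f, t in zip(fpr_toy, tpr_toy):
    print(f"FPR={f:.2f}  TPR={t:.2f}")
```

    With three wins and two losses, each accepted win lifts the true-positive rate by 1/3 and each accepted loss moves the false-positive rate right by 1/2, so the printed points trace a small staircase from (0, 0) to (1, 1).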

  3. Compute AUC two ways — and check they agree

    The area under the curve is just the sum of trapezoid strips between successive points (the “from scratch” version of an integral). Then we verify it with a completely different identity: AUC equals the share of all win/loss pairs in which the model gave the win the higher score — the Mann-Whitney interpretation.

    python
    # trapezoid area under the ROC curve
    auc_trap = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))
    
    # cross-check: P(score(win) > score(loss)) over every win/loss pair
    pos = scores[y_te == 1]
    neg = scores[y_te == 0]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    auc_rank = float((wins + 0.5*ties) / (len(pos) * len(neg)))

    Reaching the same number by two unrelated routes is the kind of cross-check that catches bugs the moment they appear.
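
    The pair-counting identity is easier to trust once it has been spelled out one pair at a time. The sketch below does that on five made-up games (y_toy and s_toy are hypothetical, not the tutorial data) and compares the result with the trapezoid area. The explicit double loop is for illustration only; the broadcast version above does the same count.

```python
import numpy as np

# Hypothetical toy data: 1 = home win, 0 = loss (no tied scores)
y_toy = np.array([1, 0, 1, 1, 0])
s_toy = np.array([0.9, 0.8, 0.7, 0.4, 0.2])

# Route 1: sweep the sorted games, then sum trapezoid strips
order = np.argsort(-s_toy)
y_sorted = y_toy[order]
tpr = np.concatenate([[0], np.cumsum(y_sorted) / y_toy.sum()])
fpr = np.concatenate([[0], np.cumsum(1 - y_sorted) / (1 - y_toy).sum()])
auc_trap_toy = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

# Route 2: visit every (win, loss) pair and credit the correctly ordered ones
credit, pairs = 0.0, 0
for s_win in s_toy[y_toy == 1]:
    for s_loss in s_toy[y_toy == 0]:
        pairs += 1
        if s_win > s_loss:
            credit += 1.0        # win ranked above loss
        elif s_win == s_loss:
            credit += 0.5        # a tie earns half credit
auc_rank_toy = credit / pairs

print(f"trapezoid: {auc_trap_toy:.4f}   pairs: {auc_rank_toy:.4f}")
```

    Of the six win/loss pairs here, four have the win on top, so both routes should land on 4/6.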

  4. Read the score

    ROC / AUC on held-out games
    Held-out test games: 308  (174 home wins, 134 losses)
    Accuracy at threshold 0.50: 68.2%
    
    AUC (area under ROC, trapezoid): 0.755
    AUC (rank identity, cross-check): 0.755
    
    Reading it: 0.50 = coin flip, 1.00 = perfect ranking.
    AUC 0.755 means a random actual home win outranks a random loss 76% of the time.
    
    Figure: ROC curve for the home-win classifier on held-out games. The curve bows above the diagonal coin-flip line; the area under it is shaded (AUC = 0.755).

    On 308 held-out games the curve bows well above the diagonal, and both methods return AUC = 0.755 — identical to three decimals, exactly as the theory promises. Read it plainly: a random actual home win outranks a random loss about 76% of the time. Accuracy at the 0.50 cutoff was 68.2%, but AUC says something the single number can't — the model orders games well across every threshold, not just at one.

  5. Why AUC beats a single accuracy

    Accuracy depends on the threshold and on how balanced the classes are; AUC depends on neither. It asks only whether the model ranks wins above losses, which is what you actually want when you'll later tune the cutoff for your own purpose — a confident favorites filter (high threshold) or a wide net (low threshold). 0.50 is a coin flip, 1.00 is perfect ranking, and 0.755 is a genuinely useful classifier built from a single feature. It's the standard headline score for a binary classifier for exactly this reason.
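
    A small sketch of the threshold point, on hypothetical toy data rather than the tutorial games: accuracy moves as the cutoff moves, and it also moves when the scores are rescaled without changing their order, while the pair-based AUC stays put. The helper names auc_pairs and accuracy_at are made up for this illustration, and treating "clears the threshold" as >= is an assumption.

```python
import numpy as np

def auc_pairs(y_true, y_score):
    # Share of win/loss pairs with the win scored higher; ties count as half.
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((wins + 0.5 * ties) / (len(pos) * len(neg)))

def accuracy_at(y_true, y_score, threshold):
    # Assumption: a game is called a home win when score >= threshold.
    return float(np.mean((y_score >= threshold) == (y_true == 1)))

# Hypothetical toy data: 1 = home win, 0 = loss
y_toy = np.array([1, 0, 1, 1, 0])
s_toy = np.array([0.9, 0.8, 0.7, 0.4, 0.2])
s_squared = s_toy ** 2                     # same ordering, different scale

for thr in (0.3, 0.5, 0.75):
    print(f"threshold {thr:.2f}: accuracy {accuracy_at(y_toy, s_toy, thr):.2f}")

print("accuracy at 0.50, squared scores:", accuracy_at(y_toy, s_squared, 0.5))
print("AUC, original scores:", auc_pairs(y_toy, s_toy))
print("AUC, squared scores: ", auc_pairs(y_toy, s_squared))
```

    Squaring keeps every game in the same rank position, so the AUC cannot change, yet the accuracy at 0.50 does because different games now clear the cutoff.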

Troubleshooting

My AUC is below 0.5

That means your scores rank losses above wins — the labels or the sign of the score are flipped. An AUC of 0.2 is just a 0.8 model with its predictions inverted. Check that y_true is 1 for home wins and that you sorted by -y_score (descending).
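
A quick way to confirm this diagnosis is to flip the scores (or the labels) and see whether the AUC mirrors around 0.5. The sketch below uses hypothetical toy data and a made-up helper name, auc_pairs, which applies the same pair-counting formula as step 3.

```python
import numpy as np

def auc_pairs(y_true, y_score):
    # Share of positive/negative pairs with the positive scored higher; ties count as half.
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((wins + 0.5 * ties) / (len(pos) * len(neg)))

# Hypothetical toy data: 1 = home win, 0 = loss
y_toy = np.array([1, 0, 1, 1, 0])
s_toy = np.array([0.9, 0.8, 0.7, 0.4, 0.2])

auc_good = auc_pairs(y_toy, s_toy)
auc_flipped_scores = auc_pairs(y_toy, 1.0 - s_toy)   # sign of the score inverted
auc_flipped_labels = auc_pairs(1 - y_toy, s_toy)     # 0/1 labels swapped

print(auc_good, auc_flipped_scores, auc_flipped_labels)
```

If your own AUC and its flipped version add up to 1, the model is ranking fine and only the orientation is wrong.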

The two AUC numbers don't match

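Small gaps come from tied scores. The rank formula credits each tied win/loss pair as half (the 0.5*ties term), while the sweep in roc_points steps through tied games one at a time, in whatever order the sort leaves them, so the staircase can land slightly above or below the half-credit value. With continuous probabilities ties are rare, so the two should agree to several decimals; here they match at 0.755 to three decimals.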
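
If ties turn out to matter in your data, one possible fix is to emit a single ROC point per distinct score, so a group of tied games becomes one diagonal segment instead of several arbitrary steps. The sketch below is an illustration under that assumption, not the tutorial's script: roc_points_tied and auc_trapezoid are hypothetical helper names, and the four games are made up so that one win and one loss share a score.

```python
import numpy as np

def auc_trapezoid(fpr, tpr):
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

def roc_points_stepwise(y_true, y_score):
    # One point per game, as in step 2 (stable sort keeps the demo deterministic).
    y_sorted = y_true[np.argsort(-y_score, kind="stable")]
    P = y_true.sum()
    N = len(y_true) - P
    return (np.concatenate([[0], np.cumsum(1 - y_sorted) / N]),
            np.concatenate([[0], np.cumsum(y_sorted) / P]))

def roc_points_tied(y_true, y_score):
    # One point per distinct score: keep only the last game in each tied run.
    order = np.argsort(-y_score, kind="stable")
    y_sorted, s_sorted = y_true[order], y_score[order]
    last = np.r_[np.nonzero(np.diff(s_sorted))[0], len(s_sorted) - 1]
    P = y_true.sum()
    N = len(y_true) - P
    return (np.concatenate([[0], np.cumsum(1 - y_sorted)[last] / N]),
            np.concatenate([[0], np.cumsum(y_sorted)[last] / P]))

# Hypothetical toy data: the middle win and loss share a score of 0.5
y_tie = np.array([1, 0, 1, 0])
s_tie = np.array([0.8, 0.5, 0.5, 0.2])

pos, neg = s_tie[y_tie == 1], s_tie[y_tie == 0]
wins = (pos[:, None] > neg[None, :]).sum()
ties = (pos[:, None] == neg[None, :]).sum()
auc_rank_tie = float((wins + 0.5 * ties) / (len(pos) * len(neg)))

auc_step_tie = auc_trapezoid(*roc_points_stepwise(y_tie, s_tie))
auc_tied_tie = auc_trapezoid(*roc_points_tied(y_tie, s_tie))
print(auc_rank_tie, auc_step_tie, auc_tied_tie)
```

On this toy input the step-by-step sweep happens to meet the tied loss first and undershoots, while the grouped version should agree with the rank formula.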

My curve looks jagged, not smooth

That's correct — an empirical ROC on a few hundred games is a staircase, one step per game. It only looks smooth with huge samples. The area under the staircase is still the exact AUC.

Challenge yourself

Find the threshold that maximizes Youden's J (true-positive rate minus false-positive rate) — the point on the curve farthest from the diagonal — and compare it to the default 0.50. Then add a second feature (say, rest-day difference) to the logistic regression and see whether AUC climbs above 0.755; AUC is the fair way to judge whether a feature actually helps. Finally, average the AUC across the five folds from the cross-validation tutorial for a more stable estimate of how well this model really ranks.
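
As a starting point for the first challenge, here is a minimal sketch of the Youden's J search on hypothetical toy data. It assumes no tied scores and treats each sorted score as a candidate cutoff ("call it a win when score >= cutoff"); adapting it to the held-out games and comparing against 0.50 is left to you.

```python
import numpy as np

# Hypothetical toy data: 1 = home win, 0 = loss (no tied scores)
y_toy = np.array([1, 0, 1, 1, 0])
s_toy = np.array([0.9, 0.8, 0.7, 0.4, 0.2])

order = np.argsort(-s_toy)                 # highest score first
s_sorted = s_toy[order]
y_sorted = y_toy[order]

# Entry i describes the classifier "predict win when score >= s_sorted[i]"
tpr = np.cumsum(y_sorted) / y_toy.sum()
fpr = np.cumsum(1 - y_sorted) / (1 - y_toy).sum()

j = tpr - fpr                              # Youden's J at each candidate cutoff
best = int(np.argmax(j))
best_threshold = float(s_sorted[best])

print(f"best cutoff: {best_threshold:.2f}  (J = {j[best]:.2f})")
```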

Get the code

Here's the complete, working script for this tutorial. It runs exactly as shown.

Download the finished script (74_roc_curve_and_auc.py)

This script imports a small shared helper (and reads any bundled sample data) that live next to it in /downloads/ — grab these into the same folder so it runs as-is: sdt_common.py, sdt_nba.py.
