Is the Difference Real? A Permutation Test

Basketball | Intermediate | Python | ~5 min read

What you'll build

A permutation test on the NBA home-scoring edge: shuffle home/away thousands of times, build the null distribution, and read a p-value off it.

Data: Bundled sample (real 2023-24 NBA game results) + permutation, retrieved June 2026

You measured a difference: home teams scored 2.16 more points per game than visitors. The next question is the one that separates analysis from anecdote: is that real, or could chance alone produce a gap that big? The permutation test answers it without a single formula from a stats textbook. You simulate a world where the effect doesn't exist, thousands of times, and check whether your real result could plausibly have come from it. If it almost never does, chance is a poor explanation and the effect is very likely real.

Go deeper with the free textbook: Nonparametric Methods at DataField.dev.

This pairs with Bootstrap a Confidence Interval — bootstrapping asks “how precise is my estimate?”, a permutation test asks “is the effect there at all?”. Both are resampling, both are pure numpy. The data is the bundled nba_home_results.csv (1,231 real 2023-24 games), so it runs offline.

  1. Measure the real effect

    For each game, take the home margin in points (home minus away), then average it. That average is the observed home-scoring edge.

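    python
    import numpy as np
    import pandas as pd

    df = pd.read_csv("nba_home_results.csv")
    # Per-game margin: home points minus away points
    diff = (df["home_pts"] - df["away_pts"]).to_numpy(float)
    observed = diff.mean()

    print(f"Games: {len(diff)}")
    print(f"Home avg: {df['home_pts'].mean():.2f}, Away avg: {df['away_pts'].mean():.2f}")
    print(f"Observed mean (home - away): {observed:.2f} points")

    The observed edge
    Games: 1231
    Home avg: 115.29, Away avg: 113.13
    Observed mean (home - away): 2.16 points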

    Home teams averaged 2.16 points more. It looks like a home-court effect — but “looks like” isn't evidence. We need to know what chance alone could do.

  2. Build the null world

    Here's the key idea. If home/away truly didn't matter, then within each game the two scores would be interchangeable — it would be pure coincidence which one we labeled “home.” So we simulate that null world by randomly flipping the sign of each game's difference, then recomputing the average. Do it thousands of times and you get the full range of average edges chance can manufacture when there's no real effect.

    python
    rng = np.random.default_rng(2026)
    N = 20000
    
    signs = rng.choice([-1.0, 1.0], size=(N, len(diff)))   # random flip per game
    null_means = (signs * diff).mean(axis=1)               # one fake "edge" per shuffle

    Each of the 20,000 null_means is the home edge you’d have measured in a world where home advantage is a myth and the labels are arbitrary. The whole distribution is centered on zero, as it must be.
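The `signs` array in the step above holds 20,000 x 1,231 float64 values, roughly 200 MB. If memory is tight, the same null distribution can be built in batches. This is a minimal sketch, assuming synthetic margins as a stand-in for `diff`; the function name `sign_flip_null` and the `batch` size are illustrative and not part of the tutorial's script.

```python
import numpy as np

def sign_flip_null(diff, n_perm=20000, batch=2000, seed=2026):
    """Null distribution of the mean under random per-game sign flips, built in batches."""
    rng = np.random.default_rng(seed)
    diff = np.asarray(diff, dtype=float)
    out = np.empty(n_perm)
    for start in range(0, n_perm, batch):
        stop = min(start + batch, n_perm)
        # One row of +1/-1 flips per permutation in this batch
        signs = rng.choice([-1.0, 1.0], size=(stop - start, diff.size))
        out[start:stop] = (signs * diff).mean(axis=1)
    return out

# Synthetic stand-in for the real per-game margins (assumed values, not NBA data)
demo_rng = np.random.default_rng(0)
demo_diff = demo_rng.normal(loc=2.0, scale=15.0, size=1231)

null_means = sign_flip_null(demo_diff)
print(null_means.shape, round(null_means.mean(), 3), round(null_means.std(), 3))
```

Under sign flips the null spread has a closed form, `sqrt(sum(diff**2)) / len(diff)`, which is a handy sanity check on the simulated standard deviation.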

  3. Read the p-value

    The p-value is simply: how often did the null world produce an edge at least as extreme as the real 2.16 (in either direction)?

    python
    # Two-sided: fraction of null edges at least as extreme as the observed one
    p = np.mean(np.abs(null_means) >= abs(observed))
    print(f"p-value (two-sided): {p:.4f}")

    What chance alone can do
    Permutations: 20000
    Null distribution: mean -0.002, std 0.447
    Biggest gap chance alone produced: 2.11 points
    p-value (two-sided): 0.0000
    Observed 2.16 is far outside what chance produces -> the home edge is real.

    The answer: essentially never. Across 20,000 shuffles, the biggest edge chance produced was about 2.11 points, and the real one is 2.16, sitting outside the entire null distribution. With zero hits in 20,000 shuffles, the p-value is below 1/20,000 (p < 0.00005). A gap this size essentially never appears when home and away are interchangeable, so luck is a very poor explanation for the home-scoring edge.
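To see why 20,000 shuffles never reached 2.16, it helps to express the observed edge in units of the null spread. The sketch below uses only the two figures printed in the output above (2.16 and 0.447). The normal approximation to the null is an assumption added here for a rough tail estimate; it is not something the tutorial's script computes.

```python
import math

observed = 2.16      # observed home edge, from the output above
null_std = 0.447     # std of the permutation null, from the output above
n_perm = 20000

z = observed / null_std                   # distance from zero in null standard deviations
p_normal = math.erfc(z / math.sqrt(2))    # two-sided tail under a normal approximation (assumption)
expected_hits = p_normal * n_perm         # shuffles you'd expect to be at least this extreme

print(f"z = {z:.2f}")
print(f"approx two-sided p = {p_normal:.1e}")
print(f"expected extreme shuffles out of {n_perm}: {expected_hits:.3f}")
```

The observed edge sits nearly five null standard deviations from zero, so a run of 20,000 shuffles would be expected to produce far fewer than one result that extreme.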

  4. See it

    The picture is the proof. Histogram the 20,000 null edges and drop a line at the observed value — it lands off in the empty tail, far from anything chance generated.

    python
    import matplotlib.pyplot as plt
    
    fig, ax = plt.subplots(figsize=(9, 5.5))
    ax.hist(null_means, bins=50)
    ax.axvline(observed, color="#1D4E89", lw=2.6)   # the real edge
    ax.set_xlabel("mean (home - away) under the null")
    fig.savefig("permutation_null.png", dpi=144, bbox_inches="tight")

    Figure: A histogram of 20,000 permutation null means centered on zero and spanning roughly minus 2 to plus 2 points, with a bold vertical line at the observed plus 2.16 sitting outside the bulk of the distribution.

    The distance from zero to the line is the effect, and the empty space between the line and the histogram's tail shows how rarely chance gets that far. The wider that space, the more confident you can be that what you measured isn't noise.

Troubleshooting

My p-value is exactly 0

That means no permutation matched or beat your observed value — report it as “p < 1/N” (here, p < 0.00005), not literally zero. A permutation test can only resolve a p-value as small as one over the number of shuffles; run more permutations if you need a finer bound.
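A common convention avoids reporting zero by counting the observed statistic as one of the permutations, giving (hits + 1) / (N + 1). This is a minimal sketch, with `permutation_p` as an illustrative helper name and a made-up null array standing in for `null_means`.

```python
import numpy as np

def permutation_p(null_stats, observed):
    """Two-sided p-value with the add-one correction, so it is never exactly 0."""
    null_stats = np.asarray(null_stats, dtype=float)
    hits = np.sum(np.abs(null_stats) >= abs(observed))
    return (hits + 1) / (null_stats.size + 1)

# Made-up null distribution for illustration (not the NBA results)
rng = np.random.default_rng(1)
fake_null = rng.normal(0.0, 0.45, size=20000)

p_far = permutation_p(fake_null, 5.0)    # far beyond anything in the null: smallest reportable p
p_mid = permutation_p(fake_null, 0.45)   # about one null std out: a large p
print(p_far, p_mid)
```

With no hits the result is 1/(N + 1), which is the floor that running more permutations lowers.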

When do I flip signs vs. shuffle labels?

Flipping signs is the paired version, the right choice when each row pairs two conditions (home vs away in the same game). For two independent groups (say, scores in domes vs outdoors), you instead pool all values and randomly reassign group labels. Same logic, different resampling.
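The independent-groups version might look like the sketch below. It assumes synthetic scores for two hypothetical venue types; `label_shuffle_p` and all the numbers are illustrative and do not come from the bundled data.

```python
import numpy as np

def label_shuffle_p(a, b, n_perm=10000, seed=2026):
    """Two-sided permutation p-value for a difference in means between two independent groups."""
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])
    null = np.empty(n_perm)
    for i in range(n_perm):
        shuffled = rng.permutation(pooled)   # random reassignment of group labels
        null[i] = shuffled[:a.size].mean() - shuffled[a.size:].mean()
    # Add-one correction keeps the p-value above zero
    p = (np.sum(np.abs(null) >= abs(observed)) + 1) / (n_perm + 1)
    return observed, p

# Synthetic scores (illustrative, not real data)
rng = np.random.default_rng(7)
domes = rng.normal(112.0, 12.0, size=200)
outdoors = rng.normal(112.0, 12.0, size=180)   # same true mean, so no real effect
shifted = rng.normal(118.0, 12.0, size=180)    # true mean 6 points higher

obs_same, p_same = label_shuffle_p(domes, outdoors)
obs_shift, p_shift = label_shuffle_p(domes, shifted)
print(f"no true effect: diff={obs_same:.2f}, p={p_same:.3f}")
print(f"6-point shift:  diff={obs_shift:.2f}, p={p_shift:.4f}")
```

The groups need not be the same size: the shuffle only has to preserve each group's count, which the slicing at `a.size` does.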

Permutation test or bootstrap?

Different questions. A permutation test asks “is there an effect?” by simulating no-effect and computing a p-value. A bootstrap asks “how big is the effect, with what uncertainty?” by resampling your data for a confidence interval. Often you report both.
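For contrast, here is a minimal bootstrap sketch for the same kind of paired margins. It assumes synthetic values as a stand-in for `diff` and uses a plain percentile interval, which is one of several bootstrap interval methods.

```python
import numpy as np

# Synthetic per-game margins standing in for `diff` (assumed values, not NBA data)
rng = np.random.default_rng(3)
demo_diff = rng.normal(loc=2.0, scale=15.0, size=1231)

n_boot = 5000
# Resample games with replacement and recompute the mean each time
idx = rng.integers(0, demo_diff.size, size=(n_boot, demo_diff.size))
boot_means = demo_diff[idx].mean(axis=1)

lo, hi = np.percentile(boot_means, [2.5, 97.5])   # percentile 95% interval
print(f"mean = {demo_diff.mean():.2f}, 95% CI = [{lo:.2f}, {hi:.2f}]")
```

The bootstrap distribution is centered on the observed mean, while the permutation null is centered on zero. That difference is why one yields an interval and the other a p-value.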

Challenge yourself

Run the independent-groups version: does scoring differ between two specific teams' games, or between weekend and weekday games? Pool the values, shuffle the group labels thousands of times, and compute the p-value. Then compare your permutation p-value to a classic t-test on the same data — they should land in the same ballpark, but the permutation test made no assumption about the data's shape.

Get the code

Here's the complete, working script for this tutorial. It runs exactly as shown.

Download the finished script (68_permutation_test_significance.py)

This script imports a small shared helper (and reads any bundled sample data) that live next to it in /downloads/ — grab these into the same folder so it runs as-is: sdt_common.py, sdt_nba.py.
