A Correlation Heatmap: Which Team Stats Move Together

BasketballIntermediatePython~5 min read

What you'll build

A correlation matrix of team ratings, drawn as an annotated heatmap.

A correlation matrix of team ratings, drawn as an annotated heatmap.
Data: Bundled sample (NBA team ratings), retrieved June 2026

Hand me a table with six numeric columns and my first question is always the same: which of these move together? A correlation matrix answers it for every pair at once, and a heatmap turns that grid of numbers into something you can read in a glance — warm where stats rise together, cool where one rises as the other falls. Run it on NBA team ratings and the picture makes it obvious which number actually tracks winning.

This builds on Summary Statistics and Distributions. The data is the bundled nba_ratings.csv (per-100-possession team ratings, Basketball-Reference), so it runs offline.

  1. One line builds the matrix

    .corr() computes the Pearson correlation between every pair of numeric columns — a value from −1 (perfect opposite) through 0 (no linear relationship) to +1 (perfect agreement).

    python
    import pandas as pd
    
    df = pd.read_csv("nba_ratings.csv")
    cols = ["W", "ORtg", "DRtg", "NRtg"]
    corr = df[cols].corr().round(2)
    print(corr.to_string())
    The correlation matrix
             W  ORtg  DRtg  NRtg
    W     1.00  0.87 -0.75  0.98
    ORtg  0.87  1.00 -0.39  0.89
    DRtg -0.75 -0.39  1.00 -0.77
    NRtg  0.98  0.89 -0.77  1.00

    Read down the W (wins) column. Net rating correlates with wins at 0.98 — almost perfectly, as you'd hope from a stat built to summarize team quality. Offensive rating sits at 0.87. Defensive rating is −0.75: negative because a lower defensive rating is better, so allowing fewer points per 100 goes with winning more. The diagonal is all 1.00 — every column correlates perfectly with itself.

  2. Turn the grid into a heatmap

    imshow draws the matrix as colored cells. A diverging colormap centered at zero is the right choice, so positive and negative correlations read as opposite colors; we annotate each cell so the exact value is there too.

    python
    import matplotlib.pyplot as plt
    
    fig, ax = plt.subplots(figsize=(6.5, 5.5))
    im = ax.imshow(corr.values, cmap="RdBu_r", vmin=-1, vmax=1)
    ax.set_xticks(range(len(cols))); ax.set_xticklabels(cols)
    ax.set_yticks(range(len(cols))); ax.set_yticklabels(cols)
    for i in range(len(cols)):
        for j in range(len(cols)):
            ax.text(j, i, f"{corr.values[i, j]:.2f}", ha="center", va="center")
    fig.colorbar(im, ax=ax)
    fig.savefig("corr_heatmap.png", dpi=144, bbox_inches="tight")
    Correlation heatmap of NBA team wins, offensive, defensive, and net rating, with strong positive cells in deep red and the defensive-rating row/column in blue
    Data: Bundled sample (NBA team ratings), retrieved June 2026

    Setting vmin=-1, vmax=1 pins the color scale so zero is always the neutral midpoint — without it, matplotlib would stretch the colors to the data's range and exaggerate weak correlations. The deep-red wins/net-rating cell and the blue defensive-rating row tell the whole story at a glance.

  3. Read it honestly

    Two cautions. First, correlation measures only a linear relationship, and it is not causation — net rating correlates with wins because it's essentially a restatement of them, not because it causes them. Second, a near-zero correlation means no linear link, not "no relationship" (a U-shaped pattern can hide at zero). A heatmap is a fast way to find leads worth investigating, not a verdict. For quantifying a single relationship properly, fit a line — see correlation and regression.

Troubleshooting

.corr() throws an error or drops columns

It only works on numeric columns; a text column (like team name) is excluded or, in older pandas, raises. Select the numeric columns explicitly as we did, or call df.corr(numeric_only=True).

The colors exaggerate weak correlations

You didn't fix the scale. Pass vmin=-1, vmax=1 to imshow so the full correlation range maps to the full color range and zero stays neutral. A diverging colormap (like RdBu_r) makes the sign obvious.

The whole heatmap is deep red

Often expected when your columns are near-duplicates (net rating is built from offensive and defensive rating, so they're bound to correlate). Add more distinct stats, or mask the diagonal/upper triangle to focus on the pairs you care about.

Challenge yourself

Add a couple more columns to the matrix (compute pace or a rate stat first), and try masking the upper triangle so each pair shows once: build a boolean mask with numpy.triu and set those cells to nan before plotting. It's the difference between a busy grid and a clean, publishable one.

Get the code

Here's the complete, working script for this tutorial. It runs exactly as shown.

Download the finished script (51_correlation_heatmap.py)

This script imports a small shared helper (and reads any bundled sample data) that live next to it in /downloads/ — grab these into the same folder so it runs as-is: sdt_common.py, sdt_nba.py.

More Basketball tutorials

A current-standings DataFrame from nba_api, with the proper headers baked in.
Basketball Beginner

Pull Your First NBA Data with nba_api

Pull NBA standings with nba_api, with the browser headers and retry logic stats.nba.com demands. Includes exactly what to do when the endpoint refuses to answer.

~9 min
A ranked net-rating table styled like a real dashboard, exported as an image.
Basketball Intermediate

Build a Team Net-Rating Dashboard Table

Combine offensive and defensive ratings into a ranked net-rating table, then style it into a dashboard-quality figure you can drop into a report.

~8 min
A half-court drawn in matplotlib with a player's makes and misses plotted on it.
Basketball Intermediate

Draw an NBA Shot Chart with matplotlib

Draw a regulation half-court from scratch in matplotlib, then plot a player's makes and misses in court coordinates for a real, shareable shot chart.

~10 min