You have one row per team, but you want a number per division. You have wide columns of stats, but your plotting library wants them long. Raw sports data almost never lands in the shape your question needs, and the gap between those two shapes is where most of my analysis time used to disappear. Four moves close it: groupby, pivot_table, melt, and crosstab. I'll walk through each on real standings, then draw the final pivot as a heatmap.

This picks up where pandas for sports data: 12 operations left off, so I'll assume you're comfortable selecting columns and reading a DataFrame. We'll work entirely offline from a bundled CSV of the real 2023 MLB final standings (sourced from the MLB Stats API, retrieved June 2026), so there's nothing to download and every number you see is the genuine article.

Load the standings

The script ships with sample_standings.csv sitting right next to it, so we build an absolute path from the script's own location and read it. That trick means the code runs the same no matter which folder you launch it from.
python
```
import os

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

HERE = os.path.dirname(os.path.abspath(__file__))
df = pd.read_csv(os.path.join(HERE, "sample_standings.csv"))
```
Each row is one of the 30 teams, with columns for wins (W), losses (L), runs scored (RS), runs allowed (RA), run differential (RunDiff), plus the League and Division each team belongs to. That mix of numbers and categories is exactly what makes it perfect for reshaping practice.
Recap: groupby for one number per group

You met groupby last tutorial; here's the one-line refresher because every other move builds on it. We split the teams into their two leagues, then average three columns within each league. groupby always follows the same rhythm: split into groups, apply a calculation, combine the results back into a table.
python
```
print(df.groupby("League")[["W", "RS", "RA"]].mean().round(1).to_string())
```
Average wins, runs scored, and runs allowed per league
```
           W     RS     RA
League                    
AL      79.9  737.7  740.7
NL      82.1  757.7  754.8
```
Two rows in, two leagues out. The National League averaged 82.1 wins to the American League's 79.9 - a small edge, but the kind of fact you can only see once you've collapsed 30 rows into a per-group summary. Selecting the columns inside the square brackets before .mean() keeps the output tidy; without it, pandas would try to average every numeric column.
pivot_table: a two-dimensional summary

A groupby gives you one grouping key. A pivot_table gives you a full grid - rows, columns, and an aggregate in each cell - which is how spreadsheet pivot tables work. Here we ask for the average of four stats broken down by division.
python
```
pivot = pd.pivot_table(df, index="Division", values=["W", "RunDiff", "RS", "RA"],
                       aggfunc="mean").round(1)
print(pivot.to_string())
```
Average team stats by division
```
               RA     RS  RunDiff     W
Division                               
AL Central  759.2  683.6    -75.6  71.6
AL East     697.6  771.6     74.0  89.8
AL West     765.2  758.0     -7.2  78.2
NL Central  762.0  748.2    -13.8  80.8
NL East     745.6  765.2     19.6  84.8
NL West     756.8  759.8      3.0  80.8
```
Now the story jumps off the page. The AL East averaged 89.8 wins and a +74.0 run differential - the strongest division in baseball that year - while the AL Central averaged just 71.6 wins and a brutal -75.6 differential. index sets the rows, values picks which columns to summarize, and aggfunc chooses how (we used "mean", but "sum", "max", or "count" work the same way). This single table is the backbone of the heatmap we'll draw at the end.
melt: reshape wide to long

Our table is wide: each metric lives in its own column. Many tools - especially plotting libraries like seaborn - prefer long data, where every observation is its own row tagged with what it measures. melt is the verb for that transformation. We take just the team name plus runs scored and allowed, and stack them.
python
```
long = df[["Team", "RS", "RA"]].melt(id_vars="Team", var_name="metric", value_name="runs")
print(long.head(6).to_string())
```
The first six rows, now in long form
```
      Team metric  runs
0   Braves     RS   947
1  Orioles     RS   807
2  Dodgers     RS   906
3     Rays     RS   860
4  Brewers     RS   728
5   Astros     RS   827
```
Read the output carefully: the Braves' 947 runs scored became a single row with metric equal to RS and runs equal to 947. id_vars names the column to keep fixed (the identifier), while var_name and value_name label the two new columns that hold the former column names and their values. The same teams that had one row now have two - one per metric. This is the shape that lets a chart draw scored-versus-allowed bars side by side with one line of code.

crosstab: count across two categories

Sometimes you don't want to average anything - you just want to count how things fall into a grid of categories. crosstab is the specialist for that. First we add a label marking each team as winning or losing, then count those labels within each division.
python
```
df["record"] = np.where(df["W"] > df["L"], "winning", "losing")
print(pd.crosstab(df["Division"], df["record"]).to_string())
```
Winning vs. losing teams per division
```
record      losing  winning
Division                   
AL Central       4        1
AL East          1        4
AL West          2        3
NL Central       2        3
NL East          2        3
NL West          2        3
```
Each cell is a simple count of teams. The AL East had 4 winning teams and just 1 below .500 - lopsided dominance - while the AL Central had the mirror image, 1 winning and 4 losing. np.where is the vectorized if/else: it evaluates the condition for all 30 rows at once and returns the matching label. crosstab then tallies the combinations. Notice each division row sums to 5, which is a nice sanity check that we counted every team exactly once.
Draw the pivot as a heatmap

A pivot table is just a grid of numbers, and the natural picture of a grid is a heatmap. There's one wrinkle: wins live around 80 while run differential swings from -76 to +74, so raw colors would be dominated by the big-magnitude column. We fix that by standardizing each column to a z-score for the color, while still printing the real averages as text in each cell.
python
```
z = (pivot - pivot.mean()) / pivot.std()
fig, ax = plt.subplots(figsize=(7.6, 5))
im = ax.imshow(z.values, cmap="RdBu_r", aspect="auto", vmin=-2, vmax=2)
ax.set_xticks(range(len(pivot.columns)), pivot.columns)
ax.set_yticks(range(len(pivot.index)), pivot.index)
for i in range(pivot.shape[0]):
    for j in range(pivot.shape[1]):
        ax.text(j, i, pivot.values[i, j], ha="center", va="center", fontsize=9, color="#20242B")
ax.set_title("Average team stats by division, 2023")
fig.colorbar(im, ax=ax, shrink=0.75, label="standardized vs. other divisions")
```
Data: Bundled sample (2023 MLB standings), retrieved June 2026

The colors tell the story at a glance: the AL East row glows red for the good stats and the AL Central row runs cold. Standardizing with (pivot - pivot.mean()) / pivot.std() puts every column on the same -2 to +2 scale, so a hot cell always means "well above average for this stat" regardless of the stat's natural range. The nested loop with ax.text overlays the true number on each tile, giving you both the instant visual and the exact value. That combination - honest color, honest labels - is what separates a useful heatmap from a misleading one.

Troubleshooting

FileNotFoundError: sample_standings.csv

The CSV must sit in the same folder as the script. We build the path with os.path.dirname(os.path.abspath(__file__)) precisely so the location is found no matter your working directory. If you copied just the snippet into a notebook, that __file__ trick won't work - point pd.read_csv at the file's full path instead.

DataError or "No numeric types to aggregate"

This means you handed an aggregation a text column. In pivot_table and groupby, only list numeric columns in values (here, W, RS, RA, RunDiff). The Team and Division columns are labels and belong in index, not values.

My melted table has way more rows than I expected

That's melt working as designed - it multiplies your row count by the number of value columns. Two metrics over 30 teams becomes 60 rows. If it ballooned further, you probably forgot to list a column in id_vars, so pandas melted it too. Keep every identifier column in id_vars.

Challenge yourself

Swap aggfunc="mean" for aggfunc="sum" in the pivot and see how the picture changes when you total runs instead of averaging them. Then build a second crosstab that adds normalize="index" so each division's row shows the share of winning teams rather than the raw count - which division had the highest winning percentage of teams? Finally, melt all four stat columns at once and try grouping the long table back up with groupby("metric") to prove you can travel between shapes in both directions.

Get the code

Here's the complete, working script for this tutorial. It runs exactly as shown.

This script imports a small shared helper (and reads any bundled sample data) that live next to it in /downloads/ — grab these into the same folder so it runs as-is: sdt_common.py. Or skip the collecting: the Python Foundations for Sports Data bundle has this whole course’s scripts and data in one ZIP.

Group, Pivot, and Reshape: Aggregating Sports Data Like a Pro

What you'll build

Load the standings

Recap: groupby for one number per group

pivot_table: a two-dimensional summary

melt: reshape wide to long

crosstab: count across two categories

Draw the pivot as a heatmap

Troubleshooting

Challenge yourself

Get the code

More Foundations tutorials

Setting Up Python for Sports Analytics: A Complete Beginner's Walkthrough

Pandas for Sports Data: The 12 Operations You'll Use Constantly

Your First Sports Data Visualization with matplotlib