23 tutorials

Foundations

Sport-agnostic skills - Python, pandas, plotting and data cleaning - that every other tutorial builds on.

If you're new, run the Python setup tutorial first, then the twelve-operation pandas tour and the grouping-and-pivoting tutorial; every sport track assumes them. The statistics tutorials — summary stats and distributions, correlation and regression, z-scores — slot in whenever a sport tutorial leans on a concept you haven't met. Everything runs on bundled real data, in your browser if you like — no installs required to follow along.

A working Python setup that imports pandas and matplotlib and prints their versions.

Foundations Beginner

Setting Up Python for Sports Analytics: A Complete Beginner's Walkthrough

Install Python, a code editor, and the core data libraries the right way, then prove your setup works by running a tiny script that imports pandas and matplotlib.

~8 min

A cheat-sheet of the twelve pandas moves you'll reuse in every other tutorial.

Foundations Beginner

Pandas for Sports Data: The 12 Operations You'll Use Constantly

Load, filter, sort, group, and join a real standings table. The twelve pandas operations here cover the vast majority of everyday sports-data work.

~8 min

A clean, labeled bar chart of team run totals - your first real figure.

Foundations Beginner

Your First Sports Data Visualization with matplotlib

Go from a DataFrame to a clean, labeled bar chart with matplotlib. Learn the figure/axes model, titles, and how to save a chart you'd actually publish.

~6 min

A live pull from the public NHL API turned into a tidy standings DataFrame.

Foundations Beginner

How to Read API Documentation: A Sports Data Field Guide

Endpoints, parameters, JSON, status codes - learn to read API docs by pulling live standings from the public NHL API and shaping the response into a DataFrame.

~8 min

A run-differential histogram with the mean and median marked, plus the one-line summary that describes any column.

Foundations Beginner

Summary Statistics and Distributions with pandas

Meet your data before charting it: describe() summarizes a column in one line, and a histogram shows the distribution those numbers only hint at.

~5 min

Two separate tables joined on a shared key into one, then charted.

Foundations Beginner

Merging and Joining Two Datasets with pandas

Combine a record table and a scoring table on a shared key with pandas merge, choose the right join type, and chart a column that neither table held on its own.

~5 min

Per-game rate columns derived from season totals, shown in a dumbbell chart.

Foundations Beginner

Computing New Columns: Rate Stats and Feature Engineering

Season totals hide the rates that matter. Build runs per game, run differential per game, and win percentage with vectorized math, then chart them.

~5 min

Win totals bucketed into tiers with .apply(), counted, and charted.

Foundations Beginner

Apply and Map: Custom Column Logic in pandas

Use .apply() to run any function per row and .map() to translate codes to labels, turning raw win totals into the labeled tiers your analysis needs.

~5 min

Run differential sliced into equal-width bands, counted, and charted as a bar of teams per band.

Foundations Beginner

Binning Continuous Data into Categories with pd.cut()

pd.cut() slices a continuous column into fixed-width bands you define, where its companion qcut() splits on quantiles instead, and turns raw run differential into labeled tiers ready to chart.

~5 min

A tidy, correctly-typed DataFrame rescued from a deliberately messy CSV.

Foundations Intermediate

Cleaning Messy Sports Data: Real-World Fixes for Real-World Files

Mixed types, stray whitespace, duplicate rows, percent signs in numbers - fix the data problems that derail real sports projects, with before-and-after proof.

~8 min

A pivot table and a tidy long-format frame from the same standings data.

Foundations Intermediate

Group, Pivot, and Reshape: Aggregating Sports Data Like a Pro

Go beyond groupby: build pivot tables, reshape between wide and long with melt and stack, and turn raw rows into the summary tables analysts actually present.

~8 min

A team's 10-game rolling run differential over a full season.

Foundations Intermediate

Rolling Averages and Form: Time-Series Basics for Sports

Pull a team's game log, sort by date, and use rolling windows and cumulative sums to measure form and momentum - the foundation of every 'hot streak' chart.

~7 min

Foundations Intermediate

Publication-Ready Charts: matplotlib Styling and Annotations

Turn a default matplotlib chart into a publishable figure: custom colors, direct labels, annotations that call out the story, and a high-res export.

~8 min

A grid of mini-charts, one per division, sharing the same scale.

Foundations Intermediate

Small Multiples: Compare Every Team at Once with Subplots

Learn matplotlib's subplots to build small multiples - a grid of tiny charts that share axes so you can compare many teams or groups at a single glance.

~7 min

A diverging bar chart of every team's run differential, expressed in standard deviations from the mean.

Foundations Intermediate

Standardize Stats with Z-Scores: Compare Columns on One Scale

Convert runs scored, runs allowed, and run differential to z-scores so one number means the same thing in every column, then read a diverging bar chart.

~5 min
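
As a rough preview of the standardizing step, here is a minimal sketch. The four teams and their totals are made up for illustration, and the column names are assumptions rather than the tutorial's actual dataset.

```python
import pandas as pd

# Made-up season totals for four hypothetical teams (not real data).
teams = pd.DataFrame({
    "team": ["A", "B", "C", "D"],
    "runs_scored": [800, 700, 650, 750],
    "runs_allowed": [650, 700, 780, 720],
})
teams["run_diff"] = teams["runs_scored"] - teams["runs_allowed"]

cols = ["runs_scored", "runs_allowed", "run_diff"]

# z = (value - column mean) / column standard deviation.
# ddof=0 uses the population standard deviation; pandas defaults to ddof=1.
z_scores = (teams[cols] - teams[cols].mean()) / teams[cols].std(ddof=0)
z_scores.insert(0, "team", teams["team"])

print(z_scores)
```

After this step every column has mean 0 and standard deviation 1, so a value of +1 means "one standard deviation above average" regardless of the original units.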

An empirical cumulative distribution of run differential with the median read off the curve.

Foundations Intermediate

The ECDF: Read Any Percentile Off a Cumulative Curve

Where a histogram bins and smooths, the ECDF is exact: sort the values, assign each a running share, and read any percentile straight off the staircase.

~5 min
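
The sort-and-share recipe in that description can be sketched in a few lines of numpy. The helper names `ecdf` and `percentile_from_ecdf` and the sample values are hypothetical, chosen only for illustration.

```python
import numpy as np

def ecdf(values):
    """Return sorted values and the running share of data at or below each."""
    x = np.sort(np.asarray(values, dtype=float))
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y

def percentile_from_ecdf(values, p):
    """Smallest observed value whose cumulative share reaches p (0 < p <= 1)."""
    x, y = ecdf(values)
    return x[np.searchsorted(y, p, side="left")]

# Made-up run differentials, not real data.
run_diff = [3, -1, 2, 10, 0]
x, y = ecdf(run_diff)
median = percentile_from_ecdf(run_diff, 0.5)
```

Plotting `x` against `y` with a step style (for example `plt.step(x, y, where="post")`) draws the staircase. With an even number of points this reading returns the lower of the two middle values, which is one of several common conventions.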

A combo chart with runs scored as bars on the left axis and win percentage as a line on the right.

Foundations Intermediate

Dual-Axis Charts: Plot Two Scales with twinx()

Use ax.twinx() to give a chart a second y-axis so runs scored and win percentage fit on one plot — and learn when a dual axis quietly misleads.

~5 min

Thousands of simulated 162-game seasons for two real teams, showing how far luck alone swings a record.

Foundations Intermediate

Monte Carlo: How Much Does Luck Move a Season's Record?

Treat a team's winning percentage as true talent, then simulate thousands of 162-game seasons with weighted coin flips. The spread is the luck in a record.

~5 min
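
A minimal sketch of the simulation idea, assuming a hypothetical team with a .550 true-talent winning percentage. Drawing from a binomial distribution is equivalent to summing 162 independent weighted coin flips; the function name and defaults here are illustrative, not the tutorial's own code.

```python
import numpy as np

def simulate_seasons(win_pct, n_seasons=10_000, games=162, rng=None):
    """Simulate win totals for many seasons of independent weighted coin flips."""
    if rng is None:
        rng = np.random.default_rng()
    # One binomial draw per season = number of heads in `games` weighted flips.
    return rng.binomial(n=games, p=win_pct, size=n_seasons)

# Hypothetical .550 team; the seed is fixed so the run is repeatable.
wins = simulate_seasons(0.550, rng=np.random.default_rng(42))

# The middle 90% of simulated win totals shows how far luck alone can swing a record.
low, high = np.percentile(wins, [5, 95])
print(f"mean wins: {wins.mean():.1f}, 90% range: {low:.0f} to {high:.0f}")
```

This treats every game as independent with the same win probability, which ignores opponents, injuries, and home field. That simplification is the point: whatever spread remains is attributable to chance alone.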

One NBA team's game-by-game point margin smoothed two ways - a simple 10-game rolling mean and an EWMA - showing the EWMA reacting faster to hot and cold streaks without a fixed window's hard edges.

Foundations Intermediate

Exponentially Weighted Averages: Form That Weights Recent Games More

Smooth one NBA team's margin series two ways — a 10-game rolling mean versus pandas .ewm() — and watch the EWMA react faster to streaks without a hard window.

~5 min
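
The two smoothers compared in that tutorial might look like the sketch below. The margin series is randomly generated rather than real NBA data, and `span=10` is an assumed setting picked to sit alongside the 10-game window.

```python
import numpy as np
import pandas as pd

# Synthetic point margins for an 82-game season (random, not real data).
rng = np.random.default_rng(7)
margin = pd.Series(rng.normal(loc=0, scale=12, size=82).round())

# Simple rolling mean: each of the last 10 games counts equally,
# and the first 9 games have no value because the window is not full yet.
rolling = margin.rolling(window=10).mean()

# EWMA: every past game contributes, with weights that decay geometrically.
# With adjust=False the update is ewma[t] = (1 - alpha) * ewma[t-1] + alpha * x[t],
# where alpha = 2 / (span + 1).
ewma = margin.ewm(span=10, adjust=False).mean()
```

A game leaving the rolling window drops from full weight to zero in one step, which is the "hard edge" the description mentions; the EWMA fades old games out gradually instead.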

1,231 real NBA game margins flagged for outliers three ways - IQR (Tukey) fences, the modified z-score using the median and MAD, and the fragile plain z-score - with the fences drawn on the distribution and the blowouts highlighted.

Foundations Intermediate

Outlier Detection: IQR Fences and the Modified Z-Score

Flag blowouts in 1,231 real NBA margins three ways (IQR fences, the median/MAD modified z-score, and the fragile plain z-score) and see why the robust methods win.

~5 min
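
Here is a small sketch of the three flags on ten made-up margins, one of which is a 45-point blowout. The cutoffs used (1.5 times the IQR, 3.5 for the modified z-score, 3 for the plain z-score) are common conventions assumed for illustration; the tutorial may choose differently.

```python
import numpy as np

# Ten made-up game margins; the last one is a deliberate blowout.
margins = np.array([3, -5, 7, -2, 4, -8, 6, 1, -3, 45], dtype=float)

def iqr_flags(x, k=1.5):
    """Tukey fences: flag values more than k * IQR beyond the quartiles."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x < q1 - k * iqr) | (x > q3 + k * iqr)

def modified_z(x):
    """Modified z-score built on the median and MAD (assumes MAD is not zero)."""
    med = np.median(x)
    mad = np.median(np.abs(x - med))
    return 0.6745 * (x - med) / mad

def plain_z(x):
    """Ordinary z-score built on the mean and standard deviation."""
    return (x - x.mean()) / x.std()

iqr_out = iqr_flags(margins)
mod_out = np.abs(modified_z(margins)) > 3.5
plain_out = np.abs(plain_z(margins)) > 3.0
```

On this sample the two robust methods flag the 45-point game and nothing else, while the plain z-score flags nothing: the blowout inflates the mean and standard deviation that are supposed to expose it.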

One chart comparing home-win rate across MLB, NBA, soccer, the NFL, and the NHL.

Foundations Advanced

Same Question, Five Sports: Quantifying Home Advantage Across Leagues

A capstone: ask one question - how big is home advantage? - across five leagues, reusing everything you've learned, and put the answers on a single comparison chart.

~11 min

A scatter of goal differential vs points with a fitted trend line and R-squared.

Foundations Advanced

Correlation and Regression: What Actually Predicts Winning?

Quantify a relationship instead of eyeballing it: compute a correlation, fit a regression line with numpy, and report R-squared on NHL goal differential.

~8 min
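
A minimal sketch of the correlation, fit, and R-squared steps using numpy. The six goal-differential and points pairs are invented for illustration and are not NHL data.

```python
import numpy as np

# Invented (goal differential, points) pairs for six hypothetical teams.
goal_diff = np.array([-40, -15, 0, 10, 25, 50], dtype=float)
points = np.array([70, 82, 90, 95, 101, 115], dtype=float)

# Pearson correlation: the off-diagonal entry of the 2x2 correlation matrix.
r = np.corrcoef(goal_diff, points)[0, 1]

# Least-squares straight line; polyfit returns the highest-degree coefficient first.
slope, intercept = np.polyfit(goal_diff, points, deg=1)
predicted = slope * goal_diff + intercept

# R-squared: the share of variance in points explained by the fitted line.
ss_res = np.sum((points - predicted) ** 2)
ss_tot = np.sum((points - points.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot
```

For a one-variable linear fit like this, `r_squared` equals `r ** 2`, which makes a handy sanity check.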

A bump chart tracing each team's rank through the season.

Foundations Advanced

A Standings Bump Chart: Visualizing a Season's Twists

Reconstruct the standings week by week from a season of game results and draw a bump chart - the elegant ranked-line visual that shows every rise and fall.

~10 min