20 tutorials

Foundations

Sport-agnostic skills - Python, pandas, plotting and data cleaning - that every other tutorial builds on.

A run-differential histogram with the mean and median marked, plus the one-line summary that describes any column.
Foundations Beginner

Summary Statistics and Distributions with pandas

Before you chart anything, meet your data. Use describe() to summarize a column with one line of code (count, mean, standard deviation, min, quartiles and max), then draw a histogram to see the distribution those numbers only hint at.

~5 min
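A minimal sketch of the describe-then-histogram workflow, assuming pandas and matplotlib are installed. The six teams and their run totals are made up for illustration, not real standings; the `teams`, `run_diff` and `summary` names are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up season totals for six teams (illustrative only)
teams = pd.DataFrame({
    "team": ["A", "B", "C", "D", "E", "F"],
    "runs_scored": [780, 712, 655, 690, 601, 742],
    "runs_allowed": [640, 698, 701, 655, 770, 688],
})
teams["run_diff"] = teams["runs_scored"] - teams["runs_allowed"]

# One line: count, mean, std, min, 25%, 50% (median), 75% and max
summary = teams["run_diff"].describe()
print(summary)

# Histogram of the same column, with the mean and median marked
fig, ax = plt.subplots()
ax.hist(teams["run_diff"], bins=5, edgecolor="black")
ax.axvline(summary["mean"], color="red", linestyle="--", label="mean")
ax.axvline(summary["50%"], color="green", linestyle=":", label="median")
ax.set_xlabel("Run differential")
ax.set_ylabel("Teams")
ax.legend()
```

Here the mean sits below the median because one team's large negative differential drags it down, which is exactly the kind of skew the histogram makes visible.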
Two separate tables joined on a shared key into one, then charted.
Foundations Beginner

Merging and Joining Two Datasets with pandas

Real data rarely arrives in one table. Learn pandas merge() and join(): combine a record table and a scoring table on a shared key, choose the right join type, and chart a pairing that neither table held on its own.

~5 min
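A minimal sketch of how the join type changes the result, assuming pandas is installed. Both tables are made up for illustration; team D deliberately has no scoring row and team E no record row, so the inner, left and outer joins disagree.

```python
import pandas as pd

# Made-up tables: D is missing from scoring, E is missing from records
records = pd.DataFrame({
    "team": ["A", "B", "C", "D"],
    "wins": [95, 84, 78, 70],
})
scoring = pd.DataFrame({
    "team": ["A", "B", "C", "E"],
    "runs_scored": [780, 712, 655, 601],
    "runs_allowed": [640, 698, 701, 770],
})

# Inner join: only teams present in both tables (A, B, C)
inner = records.merge(scoring, on="team", how="inner")

# Left join: every team from records; D gets NaN in the scoring columns
left = records.merge(scoring, on="team", how="left")

# Outer join: every team from either table; indicator=True adds a _merge
# column ("both", "left_only" or "right_only") showing where each row came from
outer = records.merge(scoring, on="team", how="outer", indicator=True)

# After merging, wins and run differential sit side by side
inner["run_diff"] = inner["runs_scored"] - inner["runs_allowed"]
print(inner[["team", "wins", "run_diff"]])
```

From there, `inner.plot.scatter(x="run_diff", y="wins")` charts the pairing. Checking the `_merge` column of an outer join is a quick way to spot keys that failed to match before you commit to an inner join.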
Win totals bucketed into tiers with .apply(), counted, and charted.
Foundations Beginner

Apply and Map: Custom Column Logic in pandas

Go beyond arithmetic on columns: use .apply() to run any function on each value (or each row) and .map() to translate codes to labels, turning raw numbers into the categories your analysis needs.

~5 min
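A minimal sketch of `.apply()` and `.map()` side by side, assuming pandas is installed. The win totals are made up, and the `win_tier` function and its 90/75 cut-offs are hypothetical choices for illustration.

```python
import pandas as pd

# Made-up win totals (illustrative only)
teams = pd.DataFrame({
    "team": ["A", "B", "C", "D", "E"],
    "league": ["AL", "NL", "AL", "NL", "AL"],
    "wins": [95, 84, 78, 70, 62],
})

def win_tier(wins):
    """Return a tier label for a win total (hypothetical cut-offs)."""
    if wins >= 90:
        return "Contender"
    if wins >= 75:
        return "Middle"
    return "Rebuilding"

# Series.apply: call win_tier once per value in the column
teams["tier"] = teams["wins"].apply(win_tier)

# DataFrame.apply with axis=1: call the function once per row
teams["label"] = teams.apply(lambda row: f"{row['team']} ({row['league']})", axis=1)

# Series.map with a dict: translate codes to labels (unmatched codes become NaN)
teams["league_name"] = teams["league"].map({"AL": "American League", "NL": "National League"})

# Count teams per tier, ready to chart
tier_counts = teams["tier"].value_counts()
print(tier_counts)
```

`tier_counts.plot.bar()` then draws the chart. For plain lookups a dict with `.map()` is usually simpler and faster than an `.apply()` call; reach for `.apply()` when the rule needs real logic.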
Run differential sliced into equal-width bands, counted, and charted as a bar chart of teams per band.
Foundations Beginner

Binning Continuous Data into Categories with pd.cut()

pd.cut() slices a continuous column into value ranges, turning a raw number into labeled tiers: pass a bin count for equal-width intervals, or your own list of edges. It's the companion to qcut(), which puts roughly equal counts in each bin; with cut() you control the edges, so the number of teams per band can come out uneven.

~5 min
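A minimal sketch contrasting `pd.cut()` with `pd.qcut()`, assuming pandas is installed. The run differentials are made up, and the edges and the "Poor"/"Average"/"Strong" labels are hypothetical choices for illustration.

```python
import pandas as pd

# Made-up run differentials for six teams
run_diff = pd.Series([140, 14, -46, 35, -169, 54], index=list("ABCDEF"), name="run_diff")

# An integer bins value gives equal-width intervals spanning min to max
equal_width = pd.cut(run_diff, bins=3)

# Explicit edges; intervals are right-inclusive by default:
# (-200, -50], (-50, 50], (50, 200]
edges = [-200, -50, 50, 200]
labels = ["Poor", "Average", "Strong"]  # hypothetical tier names
bands = pd.cut(run_diff, bins=edges, labels=labels)

# Teams per band in band order: uneven, because cut() fixes edges, not counts
band_counts = bands.value_counts().reindex(labels)
print(band_counts)

# For comparison, qcut() picks edges so each bin holds about the same number of teams
equal_count = pd.qcut(run_diff, q=3, labels=labels)
print(equal_count.value_counts())
```

`band_counts.plot.bar()` gives the teams-per-band chart. Values outside the outermost edges come back as NaN from `cut()`, so choose edges that cover the whole column.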
A diverging bar chart of every team's run differential, expressed in standard deviations from the mean.
Foundations Intermediate

Standardize Stats with Z-Scores: Compare Columns on One Scale

Runs scored, runs allowed, and run differential live on different scales. Convert each to a z-score - standard deviations from the mean - so a single number means the same thing in every column, then read it off a diverging bar chart.

~5 min
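A minimal sketch of z-scoring every column at once and drawing the diverging bar chart, assuming pandas and matplotlib are installed. The team totals are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up season totals (illustrative only)
teams = pd.DataFrame({
    "runs_scored": [780, 712, 655, 690, 601, 742],
    "runs_allowed": [640, 698, 701, 655, 770, 688],
}, index=list("ABCDEF"))
teams["run_diff"] = teams["runs_scored"] - teams["runs_allowed"]

# z = (x - column mean) / column std, applied to every column at once.
# pandas .std() defaults to the sample standard deviation (ddof=1).
z = (teams - teams.mean()) / teams.std()

# Diverging bar chart: bars left of zero are below the mean, right of zero above it
rd = z["run_diff"].sort_values()
colors = ["tab:red" if v < 0 else "tab:blue" for v in rd]
fig, ax = plt.subplots()
ax.barh(rd.index, rd.to_numpy(), color=colors)
ax.axvline(0, color="black", linewidth=0.8)
ax.set_xlabel("Run differential (standard deviations from the mean)")
```

If you compare against `scipy.stats.zscore`, note that it defaults to `ddof=0`, so its values differ slightly from this pandas version. Also remember that for runs allowed a positive z-score means worse, not better.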
An empirical cumulative distribution of run differential with the median read off the curve.
Foundations Intermediate

The ECDF: Read Any Percentile Off a Cumulative Curve

Where a histogram groups values into bins and hides detail, the ECDF plots every value exactly. Sort the values, assign each its running share, and plot a staircase from which you can read 'what fraction of teams fall at or below X?' directly - the most underrated distribution chart there is.

~5 min
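A minimal sketch of the sort-and-running-share recipe, assuming NumPy, pandas and matplotlib are installed. The run differentials are made up, and `share_at_or_below` is a hypothetical helper name.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up run differentials for six teams
run_diff = pd.Series([140, 14, -46, 35, -169, 54], name="run_diff")

# Sort the values; the i-th smallest (counting from 1) gets running share i / n
x = np.sort(run_diff.to_numpy())
y = np.arange(1, len(x) + 1) / len(x)

def share_at_or_below(value):
    """Fraction of teams whose run differential is <= value."""
    return np.searchsorted(x, value, side="right") / len(x)

# Staircase: the curve steps up by 1/n exactly at each observed value
fig, ax = plt.subplots()
ax.step(x, y, where="post")
ax.axhline(0.5, color="gray", linestyle="--")  # the median is where the curve reaches 0.5
ax.set_xlabel("Run differential")
ax.set_ylabel("Share of teams at or below")
```

With an even number of teams the curve sits at exactly 0.5 across a gap (here from 14 up to 35), and pandas' `median()` reports the midpoint, 24.5.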
A combo chart with runs scored as bars on the left axis and win percentage as a line on the right.
Foundations Intermediate

Dual-Axis Charts: Plot Two Scales with twinx()

Runs scored and win percentage live on wildly different scales. Use ax.twinx() to give a chart a second y-axis so both series fit on one plot - and learn when a dual axis clarifies and when it quietly misleads.

~5 min
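A minimal sketch of a bar-plus-line combo chart with `ax.twinx()`, assuming pandas and matplotlib are installed. The runs and win percentages are made up for illustration, and pinning the right axis to 0 to 1 is one possible guard against a misleading scale, not the only one.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up team totals (illustrative only)
teams = pd.DataFrame({
    "team": ["A", "B", "C", "D", "E"],
    "runs_scored": [780, 742, 712, 655, 601],
    "win_pct": [0.586, 0.556, 0.519, 0.481, 0.432],
})

# Left axis: runs scored as bars
fig, ax_runs = plt.subplots()
ax_runs.bar(teams["team"], teams["runs_scored"], color="tab:blue")
ax_runs.set_ylabel("Runs scored", color="tab:blue")

# twinx() adds a second Axes that shares the x-axis but has its own y-axis on the right
ax_pct = ax_runs.twinx()
ax_pct.plot(teams["team"], teams["win_pct"], color="tab:orange", marker="o")
ax_pct.set_ylabel("Win percentage", color="tab:orange")

# Each axis autoscales independently, which can exaggerate small differences;
# fixing the right axis to its full 0-1 range is one way to keep it honest
ax_pct.set_ylim(0, 1)
```

Coloring each axis label to match its series helps readers tell which scale belongs to which mark.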