15 tutorials

Baseball

MLB Statcast is the richest free data in sports. Pull it with pybaseball and build leaderboards, heatmaps and more.

Start with the pybaseball first-pull tutorial to get real Statcast data flowing, then the Pythagorean expected-standings tutorial to learn the win-expectation toolkit. From there the exit-velocity leaderboard, pitch-location heatmap and spray-chart builds are independent — pick whichever output you want to own first. If an API ever misbehaves, the bundled sample data keeps every tutorial runnable offline.

Your first real Statcast pull, cached, with an exit-velocity histogram.

Baseball Beginner

Pull Your First MLB Data with pybaseball

Install pybaseball, turn on caching, and pull a week of real Statcast data — ending with an exit-velocity histogram that proves the data is really there.

~8 min

A box plot comparing run-differential distributions between the AL and NL.

Baseball Beginner

Box Plots: Comparing Distributions Across Groups

Averages hide spread. Build a box plot comparing the AL and NL run-differential distributions at once — median, quartiles, and outliers in one figure.

~5 min

Wins and losses stacked into a single bar per team to show composition.

Baseball Beginner

Stacked Bar Charts: Win-Loss Composition

Stack wins and losses into one bar per team to show composition — and learn when stacking clarifies a comparison and when it just muddies the picture.

~5 min

A run-differential line shaded green above the league average and red below it.

Baseball Beginner

fill_between: Shade Above and Below a Baseline

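fill_between() shades the area between two curves. A boolean where= mask lets you color above-average and below-average stretches separately, and interpolate=True extends each fill to the exact crossing point so no gap is left where the line crosses the baseline.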

~5 min
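
As a rough sketch of the idea in this card, the snippet below shades a line green above its own average and red below it. The run-differential numbers are made up for illustration and the variable names are assumptions, not the tutorial's own code.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Made-up run differentials for ten games (illustration only, not real MLB data).
games = np.arange(1, 11)
run_diff = np.array([3.0, 1.0, -2.0, -4.0, -1.0, 2.0, 5.0, 4.0, -1.0, -3.0])
baseline = np.full_like(run_diff, run_diff.mean())  # flat line at the average

fig, ax = plt.subplots()
ax.plot(games, run_diff, color="black")
ax.plot(games, baseline, color="gray", linestyle="--")

# One call per color: the where= mask selects the stretches to fill, and
# interpolate=True carries each fill to the point where the lines cross.
ax.fill_between(games, run_diff, baseline, where=run_diff >= baseline,
                interpolate=True, color="green", alpha=0.4)
ax.fill_between(games, run_diff, baseline, where=run_diff < baseline,
                interpolate=True, color="red", alpha=0.4)
```

Without interpolate=True, the fill stops at the last data point on each side of a crossing, which leaves small unshaded triangles around the baseline.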

A pie and a donut of each division's share of league-wide runs, plus when a bar beats both.

Baseball Beginner

Pie and Donut Charts: Showing Part-to-Whole (and When Not To)

Build a pie and a donut of each division's share of runs, then learn the situations — close values or many slices — where a bar chart is the honest choice.

~5 min

A sortable leaderboard ranking hitters by average and max exit velocity.

Baseball Intermediate

Build a Statcast Exit-Velocity Leaderboard from Scratch

Pull a week of batted-ball data with pybaseball, then group, aggregate, and rank hitters by average and max exit velocity into a clean, publishable leaderboard.

~5 min

A pitch-location heatmap for one pitcher with the strike zone drawn on top.

Baseball Intermediate

Make a Pitch-Location Heatmap in Python

Use a single pitcher's Statcast data to build a 2-D location heatmap, draw the strike zone from the catcher's view, and read what the hot spots tell you.

~8 min

Baseball Intermediate

Profile a Pitcher: Pitch Mix and Velocity with Statcast

Use one pitcher's Statcast season to break down his pitch mix, average velocity, and spin by pitch type - the scouting report you can build yourself in a few lines.

~8 min

A hitter's batted balls plotted on a baseball field, by outcome.

Baseball Intermediate

Build a Hitter's Spray Chart

Turn Statcast hit coordinates into a spray chart on a field you draw yourself, colored by outcome, to see whether a hitter pulls or uses the whole field.

~9 min

A scatter of actual vs. Pythagorean-expected wins that flags 2023's luckiest and unluckiest teams.

Baseball Intermediate

Pythagorean Wins: Expected Standings from Run Differential

Turn runs scored and allowed into expected wins with the Pythagorean formula, then plot which 2023 MLB teams over- and under-performed their run math.

~5 min
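
For orientation, here is a minimal sketch of the Pythagorean expectation. It assumes the classic exponent of 2 and a 162-game season; the tutorial may use a different exponent, and the function name and inputs are hypothetical.

```python
def pythagorean_wins(runs_scored, runs_allowed, games=162, exponent=2.0):
    """Expected wins from runs scored and allowed.

    Assumes the classic form RS^k / (RS^k + RA^k) with k = 2 by default.
    """
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return games * rs / (rs + ra)


# Made-up team totals for illustration only.
expected = pythagorean_wins(800, 700)
luck = 95 - expected  # hypothetical actual wins minus expected wins
```

A positive difference between actual and expected wins marks a team that outperformed its run differential; a negative one marks a team that fell short of it.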

A run-differential power ranking, drawn as a lollipop chart.

Baseball Intermediate

Ranking Teams: Build a Power Ranking with rank()

Turn a column into a ranking with rank(), handle ties the right way, and draw a lollipop chart - a cleaner alternative to bars when the order is the story.

~5 min
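
A short sketch of how tie handling changes a ranking, using pandas' rank() with two of its method options. The team labels and run differentials are made up for illustration.

```python
import pandas as pd

# Made-up run differentials; teams B and C are tied.
run_diff = pd.Series([120, 95, 95, 40], index=["A", "B", "C", "D"])

# method="min": tied teams share the best rank and the next rank is skipped.
rank_min = run_diff.rank(ascending=False, method="min")

# method="dense": tied teams share a rank and no rank is skipped.
rank_dense = run_diff.rank(ascending=False, method="dense")
```

With method="min" the ranks run 1, 2, 2, 4, which matches how standings are usually printed; with method="dense" they run 1, 2, 2, 3.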

A percentile-ranked bar chart of every team, colored into four quartile tiers.

Baseball Intermediate

Percentile Ranks and Tiers: Locate a Team in the Distribution

A raw rank says 8th of 30; a percentile rank says 73rd percentile. Use rank(pct=True) and qcut() to score teams and bin them into quartile tiers.

~5 min
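
The two calls named in this card can be sketched as follows. The eight teams, their run differentials, and the tier labels are assumptions made up for illustration.

```python
import pandas as pd

# Made-up run differentials for eight hypothetical teams.
run_diff = pd.Series(
    [150, 90, 60, 20, -10, -40, -80, -130],
    index=["A", "B", "C", "D", "E", "F", "G", "H"],
)

# rank(pct=True) rescales ranks to (0, 1], so the best team scores 1.0.
pct = run_diff.rank(pct=True)

# qcut() splits the scores into four equal-sized bins (quartile tiers).
tiers = pd.qcut(pct, q=4, labels=["Bottom", "Lower-mid", "Upper-mid", "Top"])

board = pd.DataFrame({"run_diff": run_diff, "pct": pct, "tier": tiers})
board = board.sort_values("pct", ascending=False)
```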

A chart of hitters whose results trail what their contact deserved.

Baseball Advanced

Expected vs Actual: Find the Hitters Getting Unlucky

Statcast's expected stats model what should have happened from launch speed and angle. Compare expected to actual wOBA to find who's been unlucky - or living right.

~9 min

A pure-numpy gradient descent that learns the run-differential-to-wins line by stepping downhill on the error surface, with a loss curve and the fitted line - landing on the exact closed-form answer.

Baseball Advanced

Gradient Descent From Scratch: How Models Actually Learn

Fit wins from run differential by stepping downhill on the error surface in pure numpy — the loss falls from 0.97 to 0.11 and lands on polyfit's exact line.

~5 min
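
A minimal sketch of the technique, assuming synthetic data and standardized variables; the learning rate, step count, and generated numbers are illustrative choices and will not reproduce the loss values quoted above.

```python
import numpy as np

# Synthetic stand-in for 30 teams (not real MLB data).
rng = np.random.default_rng(0)
run_diff = rng.normal(0, 100, size=30)
wins = 81 + 0.1 * run_diff + rng.normal(0, 3, size=30)

# Standardize both variables so a single learning rate suits both parameters.
x = (run_diff - run_diff.mean()) / run_diff.std()
y = (wins - wins.mean()) / wins.std()

slope, intercept = 0.0, 0.0
learning_rate = 0.1
losses = []
for _ in range(500):
    error = slope * x + intercept - y
    losses.append(np.mean(error ** 2))            # mean squared error
    slope -= learning_rate * 2 * np.mean(error * x)   # d(MSE)/d(slope)
    intercept -= learning_rate * 2 * np.mean(error)   # d(MSE)/d(intercept)

# Closed-form least-squares line for comparison.
ref_slope, ref_intercept = np.polyfit(x, y, 1)
```

Because mean squared error for a straight line is a convex bowl, the iterates settle on the same slope and intercept that np.polyfit computes directly.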

A ridge estimator built in pure numpy on three collinear MLB predictors, where the plain OLS normal equations are singular and have no unique solution, plus the coefficient shrinkage path as the penalty grows.

Baseball Advanced

Ridge Regression From Scratch: Taming Collinear Features

Three exactly collinear MLB predictors make plain OLS unsolvable. Build the closed-form ridge estimator in numpy, watch it succeed, and read the shrinkage path.

~5 min
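
The closed-form estimator can be sketched as below. The data are made up, the three predictors (runs scored, runs allowed, and their difference, each as a deviation from a league average) are an assumed example of exact collinearity, and the ridge helper is a hypothetical name.

```python
import numpy as np

# Made-up deviations from league average for 30 hypothetical teams.
rng = np.random.default_rng(1)
rs = rng.integers(-100, 101, size=30).astype(float)  # runs scored
ra = rng.integers(-100, 101, size=30).astype(float)  # runs allowed
rd = rs - ra                                         # exactly collinear column
X = np.column_stack([rs, ra, rd])
y = 0.1 * rd + rng.normal(0, 3, size=30)             # wins above .500 (synthetic)

# Three columns but only two independent directions, so X.T @ X is singular.
rank = np.linalg.matrix_rank(X)


def ridge(X, y, lam):
    """Closed-form ridge: solve (X'X + lam * I) beta = X'y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)


# Shrinkage path: coefficient vectors for a growing penalty.
lams = [0.1, 1.0, 10.0, 100.0, 1000.0]
path = np.array([ridge(X, y, lam) for lam in lams])
norms = np.linalg.norm(path, axis=1)
```

Adding lam times the identity makes the matrix positive definite for any lam > 0, which is why the solve succeeds where the unpenalized system has no unique answer.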