
Pull Your First MLB Data with pybaseball
Install pybaseball, turn on caching, and pull a week of real Statcast data. End with a histogram of batted-ball exit velocity so you can see the data is genuinely there.
MLB Statcast is the richest free data in sports. Pull it with pybaseball and build leaderboards, heatmaps and more.

Averages hide spread. Build a box plot to compare whole distributions at once - median, quartiles, and outliers - using American League vs. National League run differential.

Stack two series into one bar to show composition rather than just totals. Build a win-loss stacked bar chart and learn when stacking clarifies a comparison and when it muddies it.

fill_between() paints the area between two lines. Its where= argument masks the fill, so you can color above-average and below-average regions differently in a single chart, and interpolate=True closes the wedge exactly at the crossover.
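As a rough sketch of the where= and interpolate=True idea, the snippet below shades a series green where it sits at or above its own mean and red where it sits below. The runs-per-game numbers are made up for illustration and are not real team data; the variable names are hypothetical too.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical values for illustration only, not real game data.
games = np.arange(1, 11)
runs = np.array([3.0, 5.0, 4.0, 6.0, 2.0, 1.0, 4.0, 7.0, 5.0, 3.0])
average = np.full_like(runs, runs.mean())  # flat reference line at the mean

fig, ax = plt.subplots()
ax.plot(games, runs, color="black", label="runs per game")
ax.plot(games, average, color="gray", linestyle="--", label="average")

# Two fills over the same pair of lines, each masked by a boolean condition.
# interpolate=True extends each fill to the exact point where the lines cross,
# instead of stopping at the nearest data point.
ax.fill_between(games, runs, average, where=runs >= average,
                interpolate=True, color="tab:green", alpha=0.4)
ax.fill_between(games, runs, average, where=runs < average,
                interpolate=True, color="tab:red", alpha=0.4)
ax.legend()
```

Without interpolate=True, the colored regions would leave small gaps wherever the series crosses the average between two data points.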

A pie answers one question well: what share of the whole is each slice? Build a pie and a donut of each division's share of runs scored, then learn the situation where a humble bar chart is the honest choice: when the values are close together or there are many slices.

Pull a week of batted-ball data with pybaseball, then group, aggregate, and rank hitters by average and max exit velocity into a clean, publishable leaderboard.

Use a single pitcher's Statcast data to build a 2-D location heatmap, draw the strike zone from the catcher's view, and read what the hot spots tell you.

Use one pitcher's Statcast season to break down his pitch mix, average velocity, and spin by pitch type - the scouting report you can build yourself in a few lines.

Turn Statcast hit-coordinate data into a spray chart on a field you draw yourself, colored by outcome, to see whether a hitter pulls the ball or uses the whole field.

Run differential predicts wins better than a team's actual record does. Use the Pythagorean formula to turn runs scored and allowed into expected wins, then plot who over- and under-performed.
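A minimal sketch of the calculation, assuming the classic form of the Pythagorean expectation with an exponent of 2 (the tutorial itself may use a different exponent). The function name, the 162-game default, and the run totals are illustrative assumptions, not real team figures.

```python
def pythagorean_wins(runs_scored, runs_allowed, games=162, exponent=2.0):
    """Expected wins from runs scored and allowed.

    Assumes the classic form: win% = RS^k / (RS^k + RA^k), with k = 2.
    """
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return games * rs / (rs + ra)


# Hypothetical team: 800 runs scored, 700 allowed over a 162-game season.
expected = pythagorean_wins(800, 700)
actual_wins = 88  # made-up record, for illustration only
luck = actual_wins - expected  # positive = over-performed, negative = under-performed
print(f"expected {expected:.1f} wins, actual {actual_wins}, difference {luck:+.1f}")
```

The sign of the difference is what a chart of over- and under-performers would plot: teams above zero won more games than their run differential suggests.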

Turn a column into a ranking with rank(), handle ties the right way, and draw a lollipop chart - a cleaner alternative to bars when the order is the story.
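To see how tie handling changes a ranking, the sketch below ranks the same column with three of the method= options that pandas' rank() accepts. The win totals are hypothetical, and which method counts as "right" depends on how the ranking will be presented.

```python
import pandas as pd

# Hypothetical win totals; teams B and C are tied.
wins = pd.Series({"A": 95, "B": 90, "C": 90, "D": 84})

ranks = pd.DataFrame({
    "wins": wins,
    # "min": tied teams share the best rank, and the next rank is skipped (1, 2, 2, 4)
    "min": wins.rank(ascending=False, method="min"),
    # "dense": tied teams share a rank, and no rank is skipped (1, 2, 2, 3)
    "dense": wins.rank(ascending=False, method="dense"),
    # "average": tied teams get the mean of the ranks they span (1, 2.5, 2.5, 4)
    "average": wins.rank(ascending=False, method="average"),
})
print(ranks)
```

ascending=False makes the highest win total rank first; pandas ranks the smallest value first by default.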

A raw rank says '8th of 30'; a percentile rank says '73rd percentile' - a 0-1 position in the distribution that travels across pools of different sizes. Use rank(pct=True) and qcut() to score teams and bin them into tiers.
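A small sketch of both calls, using eight hypothetical teams and made-up run differentials. The tier labels are illustrative names chosen here, not ones taken from the tutorial.

```python
import pandas as pd

# Hypothetical run differentials for eight made-up teams.
run_diff = pd.Series(
    [120, 85, 40, 10, -5, -30, -75, -110],
    index=[f"Team {c}" for c in "ABCDEFGH"],
)

# rank(pct=True) divides each ascending rank by the number of teams,
# so the best run differential scores 1.0 and the worst scores 1/8.
pct = run_diff.rank(pct=True)

# qcut() cuts at quantiles, so each of the four tiers holds about the
# same number of teams (here exactly two).
tiers = pd.qcut(run_diff, q=4, labels=["bottom", "lower-mid", "upper-mid", "top"])

print(pd.DataFrame({"run_diff": run_diff, "pct": pct, "tier": tiers}))
```

Because the percentile is a fraction of the pool, the same code gives comparable scores whether the pool holds 8 teams or 30.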

Statcast's expected stats model what should have happened from launch speed and angle. Compare expected to actual wOBA to find who's been unlucky - or living right.