Build a Statcast Exit-Velocity Leaderboard from Scratch

Baseball · Intermediate · Python · ~5 min read

What you'll build

A sortable leaderboard ranking hitters by average and max exit velocity.

Data: Baseball Savant via pybaseball, retrieved June 2026

A leaderboard is the most honest chart in sports: it just asks, "Who is best at this one thing?" The "one thing" here is exit velocity - how hard hitters strike the ball - measured over a week of real Statcast data. Build it and you'll have practiced the three pandas moves that power almost every leaderboard you'll ever make: groupby, agg, and a join to turn cryptic player IDs into actual names.

This builds directly on your first pybaseball pull. If you ran that one, the data we need is already cached, so this will be fast.

  1. Pull a week of batted balls

    We turn on caching first so we never re-download the same days, then grab a week of the 2023 season. Every row is one pitch; we only care about pitches with a tracked batted ball, which is why we drop the rows with no exit velocity.

    python
    import pybaseball as pyb
    pyb.cache.enable()
    
    data = pyb.statcast("2023-06-01", "2023-06-07")
    batted = data.dropna(subset=["launch_speed"]).copy()

    The column launch_speed is exit velocity in miles per hour. batted now holds only the pitches with a measured exit velocity, whether the result was a hit or an out.
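
    As a quick check on what that filter keeps, the sketch below runs the same dropna on a tiny hand-made frame. The rows and values are invented for illustration. The one assumption about real Statcast data is that its type column marks balls put in play with "X"; tracked foul balls can carry an exit velocity too, so the tutorial's filter may keep some of them.

```python
import pandas as pd

# Hand-made stand-in for a Statcast pull (illustrative values, not real data).
toy = pd.DataFrame({
    "batter": [1, 1, 2, 2, 2],
    "type": ["X", "S", "X", "B", "S"],   # assumed codes: X = in play, S = strike/foul, B = ball
    "launch_speed": [101.3, 78.0, 95.2, None, None],
})

# Same filter as the tutorial: keep rows with a measured exit velocity.
tracked = toy.dropna(subset=["launch_speed"]).copy()

# Stricter variant: keep only contact that was actually put in play.
in_play = tracked[tracked["type"] == "X"].copy()

print(len(toy), len(tracked), len(in_play))
```

    Comparing the two lengths on your own pull shows how many tracked rows were not put in play. Switching to the stricter filter would change the numbers in the leaderboard below.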

  2. Group by hitter and aggregate

    This is the heart of every leaderboard. groupby("batter") splits the data into one bucket per hitter; agg then computes several numbers for each bucket at once. We count the batted balls, take the average exit velocity, and grab each hitter's hardest-hit ball.

    python
    board = (batted.groupby("batter")
             .agg(bbe=("launch_speed", "size"),
                  avg_ev=("launch_speed", "mean"),
                  max_ev=("launch_speed", "max")))

    The tuple syntax - ("launch_speed", "mean") - reads as "make a column from launch_speed using mean." It's the cleanest way to build several summary columns in one pass.
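
    To see the shape agg returns, here is the same call on a five-row toy frame. The batter IDs and speeds are made up so the arithmetic is easy to check by hand.

```python
import pandas as pd

# Toy data: batter 10 has three batted balls, batter 20 has two.
toy = pd.DataFrame({
    "batter": [10, 10, 10, 20, 20],
    "launch_speed": [100.0, 90.0, 95.0, 110.0, 70.0],
})

# Named aggregation: new_column=(source_column, function).
summary = toy.groupby("batter").agg(
    bbe=("launch_speed", "size"),     # number of rows in each bucket
    avg_ev=("launch_speed", "mean"),
    max_ev=("launch_speed", "max"),
)
print(summary)
```

    The result has one row per batter, with batter as the index. Note that "size" counts every row while "count" skips missing values; after the dropna in step 1 the two agree.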

  3. Require a real sample size

    Without a minimum, a hitter with one lucky 115 mph rocket would top the board. We require at least a dozen batted balls, then sort and keep the top 15.

    python
    MIN_BBE = 12
    board = (board.query("bbe >= @MIN_BBE")
             .sort_values("avg_ev", ascending=False)
             .head(15)
             .round(1))

    The @MIN_BBE inside query lets you reference a Python variable from the query string - handy when you want to tweak the threshold in one place.
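
    A small sketch of the same filter on invented numbers shows why the minimum matters: the one-ball hitter has the best average but never reaches the board.

```python
import pandas as pd

# Invented summary rows; batter 101 has a single 115 mph batted ball.
toy = pd.DataFrame(
    {"bbe": [1, 14, 20, 12], "avg_ev": [115.0, 92.04, 90.55, 93.26]},
    index=pd.Index([101, 102, 103, 104], name="batter"),
)

MIN_BBE = 12
top = (toy.query("bbe >= @MIN_BBE")              # @ pulls in the Python variable
          .sort_values("avg_ev", ascending=False)
          .head(2)
          .round(1))
print(top)
```

    Batter 104 sits exactly on the threshold and is kept because the comparison is >=, not >.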

  4. Turn player IDs into names

    Statcast identifies hitters by an MLBAM ID number, not a name. pybaseball ships a reverse lookup that translates a list of IDs in one call; we then merge the names back onto the leaderboard.

    python
    names = pyb.playerid_reverse_lookup(board.index.tolist(), key_type="mlbam")
    names["player"] = names["name_first"].str.title() + " " + names["name_last"].str.title()
    board = board.merge(names.set_index("key_mlbam")["player"],
                        left_index=True, right_index=True)

    The lookup builds its table the first time you run it, so that call can take a few seconds; after that it's cached.
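
    One thing to watch: merge defaults to an inner join, so a hitter whose ID is missing from the lookup would silently drop off the board. The sketch below uses a hand-built names table in place of the real lookup (the IDs and the name are placeholders) and switches to how="left" to surface any unmatched IDs.

```python
import pandas as pd

# Two leaderboard rows with made-up IDs.
board = pd.DataFrame(
    {"bbe": [15, 23], "avg_ev": [93.5, 93.4]},
    index=pd.Index([111, 222], name="batter"),
)

# Hypothetical lookup result using the column names from the tutorial's code.
names = pd.DataFrame({
    "key_mlbam": [111],
    "name_first": ["sam"],
    "name_last": ["example"],
})
names["player"] = names["name_first"].str.title() + " " + names["name_last"].str.title()

# how="left" keeps every leaderboard row even when the lookup has no match.
merged = board.merge(names.set_index("key_mlbam")["player"],
                     left_index=True, right_index=True, how="left")
missing = merged.index[merged["player"].isna()].tolist()
print(missing)
```

    If missing comes back empty on your real data, the inner join in the tutorial loses nothing.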

  5. Read the leaderboard

    Here's the finished table. These are real hitters and real exit velocities from that week - the kind of names you'd expect near the top, which is a good sign the pipeline is correct.

    The leaderboard
                   player  bbe  avg_ev  max_ev
    1      Josh Donaldson   15    93.5   111.0
    2   William Contreras   23    93.4   108.8
    3         Javier Báez   20    93.3   107.0
    4        Austin Riley   24    92.7   111.5
    5      Fernando Tatís   34    91.5   110.1
    6     Joshua Palacios   13    91.5   107.7
    7        Bryce Harper   30    91.1   110.7
    8         Lane Thomas   27    90.1   109.8
    9       Jace Peterson   19    90.0   107.1
    10     Kyle Schwarber   31    89.9   116.2
    11        Joey Wiemer   29    89.6   111.9
    12        Luis García   26    89.5   106.2
    13       Jose Trevino   12    89.5   103.4
    14          Juan Soto   28    89.4   110.0
    15        Trea Turner   40    89.2   111.7

    Notice the two stories in the numbers: avg_ev rewards consistent hard contact, while max_ev shows raw peak power. Kyle Schwarber's 116.2 mph max is the hardest-hit ball on the board even though his average ranks only 10th.
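
    The code so far leaves the board indexed by batter ID, while the table above is numbered 1 to 15. One way to get that numbering, and to put the two stories side by side, is sketched below. It uses three rows copied from the table with placeholder IDs; the finished script may do this differently.

```python
import pandas as pd

# Three rows copied from the leaderboard above; the batter IDs are placeholders.
board = pd.DataFrame(
    {
        "player": ["Trea Turner", "Josh Donaldson", "Kyle Schwarber"],
        "bbe": [40, 15, 31],
        "avg_ev": [89.2, 93.5, 89.9],
        "max_ev": [111.7, 111.0, 116.2],
    },
    index=pd.Index([1, 2, 3], name="batter"),
)

# Sort by average, then replace the ID index with a 1-based rank.
table = board.sort_values("avg_ev", ascending=False).reset_index(drop=True)
table.index = table.index + 1

# Rank the same hitters by their hardest-hit ball (1 = hardest).
table["max_rank"] = table["max_ev"].rank(ascending=False).astype(int)
print(table)
```

    Among these three, Donaldson leads on average but ranks last on max, while Schwarber is the reverse of that on peak power.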

  6. Draw it as a chart

    A horizontal bar chart reads exactly like a leaderboard - longest bar on top. barh draws the first row at the bottom, so we sort ascending to put the best hitter at the top, then label each bar with its value.

    python
    import matplotlib.pyplot as plt
    
    plot_df = board.sort_values("avg_ev")
    fig, ax = plt.subplots(figsize=(8.4, 6.2))
    bars = ax.barh(plot_df["player"], plot_df["avg_ev"], color="#B23A3A")
    ax.bar_label(bars, fmt="%.1f", padding=4)
    ax.set_xlim(plot_df["avg_ev"].min() - 4, plot_df["avg_ev"].max() + 3)
    ax.set_xlabel("average exit velocity (mph)")
    fig.savefig("leaderboard.png", dpi=144, bbox_inches="tight")

    Figure: Horizontal bar chart of MLB hitters ranked by average exit velocity, June 1-7, 2023. Data: Baseball Savant via pybaseball, retrieved June 2026.

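    Calling set_xlim so the axis starts near the lowest value zooms in on the part of the axis where the differences actually live - without it, every bar would look nearly identical. The trade-off is that the bars no longer start at zero, so trust the labels more than the relative bar lengths.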
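
    If the ordering feels backwards, this short check confirms it on three rows taken from the table. It uses the headless Agg backend (an assumption so it runs without a display) and inspects the bar objects instead of saving an image.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

board = pd.DataFrame({
    "player": ["Josh Donaldson", "Kyle Schwarber", "Trea Turner"],
    "avg_ev": [93.5, 89.9, 89.2],
})

plot_df = board.sort_values("avg_ev")   # lowest average first
fig, ax = plt.subplots()
bars = ax.barh(plot_df["player"], plot_df["avg_ev"])
ax.set_xlim(plot_df["avg_ev"].min() - 4, plot_df["avg_ev"].max() + 3)

# The first row gets the lowest y position, so the last row is drawn on top.
top_bar = max(bars.patches, key=lambda p: p.get_y())
print(top_bar.get_width(), ax.get_xlim())
plt.close(fig)
```

    The bar with the highest y position carries the largest average, which is what the ascending sort is for.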

Troubleshooting

The first pull is slow or shows a progress bar

That's normal. statcast() makes one request per day of data, so a week is seven requests the first time. Because we called pyb.cache.enable(), the second run reads from disk and is nearly instant. If it feels stuck, give it 20-30 seconds before worrying.

KeyError: 'batter' or an empty leaderboard

This usually means the date range returned no games (an off-day, or a future date). Pick dates inside a regular season and check len(data) is non-zero before grouping.
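
A small guard can turn that failure into a readable message. The sketch below is one way to write it; check_pull is a hypothetical helper, not part of pybaseball, and the required column names are the two this tutorial relies on.

```python
import pandas as pd

def check_pull(df, required=("batter", "launch_speed")):
    """Raise a readable error if a pull is empty or lacks needed columns."""
    if len(df) == 0:
        raise ValueError("No rows returned; check that the dates fall inside a season.")
    missing = [col for col in required if col not in df.columns]
    if missing:
        raise ValueError(f"Missing expected columns: {missing}")
    return df

# An empty frame stands in for a pull over an off-day or a future date.
try:
    check_pull(pd.DataFrame())
    result = "ok"
except ValueError as err:
    result = str(err)
print(result)
```

In the tutorial you would call check_pull(data) right after the statcast() call, before the dropna.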

SettingWithCopyWarning when you add a column

Pandas is warning that you might be editing a view of another DataFrame. We avoided it by calling .copy() right after dropna(). If you see it, add .copy() where you first slice the data.

Challenge yourself

Rank by max_ev instead of avg_ev and see how much the leaderboard reshuffles - peak power and consistent power are different skills. Then add a fourth column for the share of each hitter's batted balls that were "barrels" and sort by that. As a simplified stand-in, count a barrel as 98+ mph with a launch angle between 26° and 30°; MLB's official definition starts there and widens the angle range as exit velocity rises. Who climbs?
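
If you want a starting point for the barrel column, the trick is that the mean of a True/False column is the share of True values. The sketch below shows it on invented rows. The thresholds are named constants you can change; 98 mph is the minimum exit velocity in MLB's barrel definition, and the fixed 26-30° window is a simplification of it.

```python
import pandas as pd

# Invented batted balls for two hitters (not real data).
toy = pd.DataFrame({
    "batter": [1, 1, 1, 1, 2, 2, 2, 2],
    "launch_speed": [101.0, 99.0, 88.0, 104.0, 97.0, 98.5, 105.0, 92.0],
    "launch_angle": [27.0, 12.0, 28.0, 30.0, 28.0, 26.0, 40.0, 27.0],
})

# Simplified barrel rule: hard enough AND inside a fixed angle window.
MIN_EV, LOW_ANGLE, HIGH_ANGLE = 98.0, 26.0, 30.0
toy["barrel"] = (
    (toy["launch_speed"] >= MIN_EV)
    & toy["launch_angle"].between(LOW_ANGLE, HIGH_ANGLE)   # inclusive on both ends
)

# Mean of a boolean column = fraction of rows that are True.
share = toy.groupby("batter")["barrel"].mean()
print(share)
```

On the real data you could add the same column inside the agg call from step 2 with a fourth named aggregation.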

Get the code

Here's the complete, working script for this tutorial. It runs exactly as shown.

Download the finished script (07_build_a_statcast_exit_velocity_leaderboard.py)

This script imports a small shared helper, sdt_common.py, that lives next to it in /downloads/ (along with any bundled sample data it reads). Save it into the same folder so the script runs as-is.
