Pull Your First Match Data with StatsBomb Open Data

Soccer · Beginner · Python · ~7 min read

What you'll build

Event data for a real match, summarized into a shots-and-events breakdown.

Data: StatsBomb Open Data, retrieved June 2026

Most soccer data online stops where it gets interesting: who scored, who won, final whistle. StatsBomb hands you the opposite - a record of every single thing that happened on the pitch, pass by pass, tackle by tackle, shot by shot, each one stamped with a location and a time. And they give a generous chunk of it away for free. We'll install statsbombpy, poke through the free matches on offer, and pick one of the most famous games ever played - the 2018 World Cup final - to load in full. The thing I want to land for you is the idea the whole soccer side of this site stands on: in event data, every row is one event. Get comfortable with that and pass maps, shot maps, and xG tables all stop being mysterious.

This is the gateway to the soccer side of this site. Once you can load events, you can build pass maps, shot maps, and xG tables - all of which start exactly here. If you're newer to pandas, the twelve core pandas operations tutorial will make every step below feel familiar.

One thing up front, and it's not optional: StatsBomb releases this data under a free license that requires attribution. Any chart or table you publish from it must credit StatsBomb. We'll bake that credit into our chart at the end, and I'll remind you again there - please don't skip it.

  1. Install statsbombpy and import it

    StatsBomb's open data lives in a public GitHub repository as a pile of JSON files. You could download and parse them by hand, but the statsbombpy package does all of that for you and hands back tidy pandas DataFrames. Install it once from your terminal:

    shell
    pip install statsbombpy

    Then import the sb object - that's your entry point to everything - alongside matplotlib for the chart we'll draw later.

    python
    import matplotlib.pyplot as plt
    from statsbombpy import sb

    The first time you call any sb function it reaches out to GitHub over the network, so you'll want to be online. Results are cached after that.

  2. See what's available

    Before you can load a match you need to know what StatsBomb has released. The competitions() call returns one row per competition-season - a particular league or tournament in a particular year. There are dozens of them, from the men's World Cup to the Women's Super League, and the list grows over time.

    python
    comps = sb.competitions()
    cols = ["competition_name", "season_name", "competition_id", "season_id"]
    print(f"{len(comps)} competition-seasons available. A few:")
    print(comps[cols].drop_duplicates("competition_name").head(8))

    Output (a sample of free competitions):
    80 competition-seasons available. A few:
               competition_name season_name  competition_id  season_id
    0             1. Bundesliga   2023/2024               9        281
    2    African Cup of Nations        2023            1267        107
    3          Champions League   2018/2019              16          4
    21             Copa America        2024             223        282
    22             Copa del Rey   1983/1984              87         84
    25  FA Women's Super League   2023/2024              37        281
    29       FIFA U20 World Cup        1979            1470        274
    30           FIFA World Cup        2022              43        106

    The two columns that matter most are competition_id and season_id. Those number pairs are how you ask for a specific tournament. Notice the bottom row: the FIFA World Cup carries competition_id 43. That's the one we want - we just need the right season alongside it.
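
The sample above shows only one season per competition because of drop_duplicates, so the 2018 season is hidden. One way to list every season of a competition and read off its season_id is a small filter on the same table. This is a minimal sketch: seasons_for is a hypothetical helper name, and the three-row frame is a hand-built stand-in for sb.competitions() so the snippet runs offline. Its id pairs are the ones quoted in this tutorial; the season_name "2018" is an assumption about how that season is labeled.

```python
import pandas as pd


def seasons_for(comps: pd.DataFrame, competition_id: int) -> pd.DataFrame:
    """Return every season of one competition, one row per season."""
    cols = ["competition_name", "season_name", "competition_id", "season_id"]
    mask = comps["competition_id"] == competition_id
    return comps.loc[mask, cols].sort_values("season_name").reset_index(drop=True)


# Hand-built stand-in for sb.competitions(). The ids come from this tutorial;
# the "2018" season_name label is an assumption.
demo_comps = pd.DataFrame(
    {
        "competition_name": ["FIFA World Cup", "FIFA World Cup", "Champions League"],
        "season_name": ["2022", "2018", "2018/2019"],
        "competition_id": [43, 43, 16],
        "season_id": [106, 3, 4],
    }
)

world_cups = seasons_for(demo_comps, competition_id=43)
print(world_cups)
```

With the real table you would call seasons_for(comps, 43) and pick the row whose season_name matches the year you want.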

  3. Pick the 2018 World Cup final

    The 2018 men's World Cup is competition 43, season 3. We pass that pair to matches() to get every game in the tournament, then filter down to the single match whose competition_stage is the Final. Using .iloc[0] grabs that one row as a pandas Series so we can read individual fields off it.

    python
    matches = sb.matches(competition_id=43, season_id=3)
    final = matches[matches["competition_stage"] == "Final"].iloc[0]
    match_id = int(final["match_id"])
    print(f"The final: {final['home_team']} {final['home_score']}"
          f"-{final['away_score']} {final['away_team']} (match_id {match_id})")

    Output:
    The final: France 4-2 Croatia (match_id 8658)

    There it is: France 4, Croatia 2. The match_id printed alongside - 8658 - is the key we'll use to pull the event data. Every match in the StatsBomb universe has a unique id like this, and once you have it, the door is open.

  4. Load every event

    This is the moment the data gets interesting. One call to sb.events(match_id) returns a DataFrame where each row is a single event - one pass, one shot, one tackle, one throw-in. A full match has thousands of them. Let's load the final and peek at a few, keeping only the columns that read cleanly and dropping rows with no associated player (those are administrative events like the half starting).

    python
    events = sb.events(match_id)
    print("Events in the match:", len(events))
    print(events[["minute", "team", "type", "player"]].dropna(subset=["player"]).head(8))

    Output (the first events of the match):
    Events in the match: 2978
        minute     team  type            player
    6        0  Croatia  Pass   Mario Mandžukić
    7        0  Croatia  Pass  Marcelo Brozović
    8        0  Croatia  Pass       Luka Modrić
    9        0  Croatia  Pass     Šime Vrsaljko
    10       0  Croatia  Pass   Danijel Subašić
    11       0   France  Pass    Raphaël Varane
    12       0  Croatia  Pass      Ivan Perišić
    13       0  Croatia  Pass      Ivan Rakitić

    Read those rows like a stenographer's transcript of the opening seconds. Mario Mandžukić plays a pass, then Marcelo Brozović, then Luka Modrić - Croatia is knocking the ball around in the first minute, and the data captured every touch. That mental model is the whole game: an event DataFrame is a chronological list of everything that happened, and the type column tells you what each thing was.
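
Because each row is one event, ordinary pandas filters answer match questions directly. As a small illustration, the sketch below rebuilds the eight rows printed above by hand (so it runs without a network call) and counts events per team. The name sample is just a stand-in; with real data the same lines would run on events.

```python
import pandas as pd

# The eight rows printed above, rebuilt by hand so the snippet runs offline.
sample = pd.DataFrame(
    {
        "minute": [0] * 8,
        "team": ["Croatia", "Croatia", "Croatia", "Croatia",
                 "Croatia", "France", "Croatia", "Croatia"],
        "type": ["Pass"] * 8,
        "player": ["Mario Mandžukić", "Marcelo Brozović", "Luka Modrić",
                   "Šime Vrsaljko", "Danijel Subašić", "Raphaël Varane",
                   "Ivan Perišić", "Ivan Rakitić"],
    },
    index=range(6, 14),
)

# One row per event, so a boolean mask selects everything one team did.
croatia_events = sample[sample["team"] == "Croatia"]

# Counting rows per team is the same as counting events per team.
events_per_team = sample.groupby("team").size()
print(events_per_team)
```

Every later tutorial step is a variation on this pattern: filter the rows you care about, then count or plot them.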

  5. Summarize the event types

    Nearly 3,000 events is a lot to stare at. The fastest way to understand the shape of a match is to ask: how many of each type of event were there? The value_counts() method does exactly that - it tallies how often each distinct value appears in the type column. We keep the twelve most common and sort them so the chart reads cleanly.

    python
    counts = events["type"].value_counts().head(12).sort_values()
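
If the chained call reads as dense, it may help to watch each step on a tiny example. The six labels below are a toy list, invented only to show the mechanics, not real match data. The final ascending sort is deliberate: barh draws the first entry at the bottom, so the largest bar ends up on top.

```python
import pandas as pd

# Toy event types, invented purely to show the order of operations.
toy_types = pd.Series(["Pass", "Pass", "Shot", "Pass", "Carry", "Carry"])

tally = toy_types.value_counts()    # one count per distinct value, largest first
top_two = tally.head(2)             # keep only the most common values
for_chart = top_two.sort_values()   # ascending, so the biggest bar is drawn last

print(for_chart)
```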

    It will be no surprise that passes dominate - soccer is mostly passing - followed by ball receipts, pressures, and carries. Shots, the events everyone remembers, are rare by comparison. Let's turn that tally into a horizontal bar chart so the proportions are obvious at a glance.

    python
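    fig, ax = plt.subplots(figsize=(8, 5.4))
    # One horizontal bar per event type; counts is sorted ascending, so the largest is on top
    ax.barh(counts.index, counts.values, color="#3A7D44")
    # Print each bar's count at its tip
    ax.bar_label(ax.containers[0], padding=3, fontsize=9)
    ax.set_title("Event types in France v Croatia, 2018 WC final")
    ax.set_xlabel("number of events")
    # Required attribution, drawn onto the figure so it travels with the image
    fig.text(0.99, 0.0, "Data: StatsBomb Open Data", ha="right", va="top", fontsize=8)
    fig.savefig("events_by_type.png", dpi=144, bbox_inches="tight")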

    Figure: Horizontal bar chart of event types in the 2018 World Cup final, with passes far and away the most common.
    Data: StatsBomb Open Data, retrieved June 2026

    And here is the attribution payoff: because we built a published chart from StatsBomb data, the figure credits StatsBomb Open Data in its caption. Whenever you share a chart made from this source - a blog post, a tweet, a slide - that credit has to ride along. It's the price of admission, and it's a fair one for data this good.

Troubleshooting

ModuleNotFoundError: No module named 'statsbombpy'

The package isn't installed in the environment you're running. Run pip install statsbombpy in the same terminal you launch Python from. If you use virtual environments, make sure the right one is activated first - installing into the wrong environment is the most common cause of this error.
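
A quick way to check which environment you are actually in is to ask Python itself. The sketch below uses only the standard library; is_installed is a hypothetical helper name, not something statsbombpy provides.

```python
import importlib.util
import sys


def is_installed(package: str) -> bool:
    """Return True if this interpreter can find `package` on its import path."""
    return importlib.util.find_spec(package) is not None


# The path shows which Python (and so which environment) is running this code.
print("Interpreter:", sys.executable)
print("statsbombpy importable:", is_installed("statsbombpy"))
```

If the second line prints False, running python -m pip install statsbombpy with that same interpreter installs the package into the environment shown on the first line.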

A ConnectionError or timeout on the first call

The first sb call downloads from GitHub, so you need a live internet connection. A corporate proxy or firewall can also block it. Try again on an unrestricted network, and confirm you can reach github.com in a browser.
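
If the connection is only flaky rather than blocked, a small retry wrapper can save re-running the script by hand. This is a sketch under stated assumptions: with_retries is a hypothetical helper, not part of statsbombpy, and it assumes the failure surfaces as an OSError subclass (Python's built-in ConnectionError and TimeoutError do, as do the network exceptions raised by the requests library). The flaky function below only simulates an outage so the example runs offline.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(fetch: Callable[[], T], attempts: int = 3, wait_seconds: float = 2.0) -> T:
    """Call fetch() until it succeeds or `attempts` tries have failed."""
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except OSError as err:  # assumption: network failures arrive as OSError subclasses
            if attempt == attempts:
                raise  # out of tries: let the caller see the real error
            print(f"Attempt {attempt} failed ({err}); retrying in {wait_seconds}s")
            time.sleep(wait_seconds)
    raise ValueError("attempts must be at least 1")


# Simulated download that fails twice, then succeeds.
calls = {"n": 0}


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("simulated outage")
    return "ok"


result = with_retries(flaky, attempts=3, wait_seconds=0)
print(result, "after", calls["n"], "calls")
```

With the real library the call would look like with_retries(lambda: sb.competitions()). A retry will not help when a proxy or firewall blocks GitHub outright.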

An IndexError on .iloc[0]

This means the filter returned no rows - usually a typo in the stage name. The value must match exactly, including capitalization: it's "Final", not "final". Print matches["competition_stage"].unique() to see the exact strings available for that tournament.
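
One way to make this failure explain itself is to check for an empty result before indexing. The sketch below is a hypothetical helper (pick_stage is not part of statsbombpy) run against a hand-built stand-in for sb.matches(): the first row uses the match_id and teams printed in step 3, and the second row is a clearly labeled placeholder.

```python
import pandas as pd


def pick_stage(matches: pd.DataFrame, stage: str) -> pd.Series:
    """Return the first match of `stage`, or raise an error listing valid stages."""
    hits = matches[matches["competition_stage"] == stage]
    if hits.empty:
        available = sorted(matches["competition_stage"].unique())
        raise ValueError(f"No match with stage {stage!r}. Available: {available}")
    return hits.iloc[0]


# Stand-in for sb.matches(...). Row 1 uses values printed in step 3;
# row 2 is a placeholder with made-up values.
demo_matches = pd.DataFrame(
    {
        "match_id": [8658, -1],
        "competition_stage": ["Final", "Semi-finals"],
        "home_team": ["France", "Placeholder A"],
        "away_team": ["Croatia", "Placeholder B"],
    }
)

final = pick_stage(demo_matches, "Final")
print(int(final["match_id"]))

error_message = ""
try:
    pick_stage(demo_matches, "final")  # wrong capitalization on purpose
except ValueError as err:
    error_message = str(err)
    print(error_message)
```

The error now names the strings that would have worked, which is usually enough to spot the typo.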

Challenge yourself

Swap in a different match and re-run everything. Pick any competition from the list in step 2, find a match id with sb.matches(), and load its events - no other code needs to change, which is the whole point of a clean pipeline. Then go one level deeper: filter the events to just type == "Shot" and print which players took them. That short list is the seed of a shot map, which is exactly where the shot map tutorial picks up.
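
If you want a starting point for the second half of the challenge, here is one possible shape for it. The helper name shots_by_player is hypothetical, and the toy frame uses placeholder players and invented event types so the snippet runs offline; swap in the real events DataFrame to get actual names.

```python
import pandas as pd


def shots_by_player(events: pd.DataFrame) -> pd.Series:
    """Count shots per player, most shots first."""
    shots = events[events["type"] == "Shot"]
    return shots["player"].value_counts()


# Toy events with placeholder names, not real match data.
toy_events = pd.DataFrame(
    {
        "type": ["Pass", "Shot", "Shot", "Carry", "Shot"],
        "player": ["Player A", "Player B", "Player A", "Player C", "Player B"],
    }
)

shot_counts = shots_by_player(toy_events)
print(shot_counts)
```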

Get the code

Here's the complete, working script for this tutorial. It runs exactly as shown.

Download the finished script (12_pull_your_first_match_data_with_statsbomb.py)

This script imports a small shared helper (and reads any bundled sample data) that lives next to it in /downloads/. Save the following into the same folder so the script runs as-is: sdt_common.py.
