Your learning path
You don't need a stats degree or a finance job - just curiosity and an hour here and there. Here's the order I'd learn things in, with honest time estimates.
The foundations, in order
Do these five first, in this order. They're sport-agnostic, and everything else builds on them.
Install your tools
Set up Python, a code editor, and the core libraries. About 30 minutes, once.
Beginner · ~8 min
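If you want a quick sanity check once the install is done, something like the sketch below would do. It assumes "core libraries" means pandas and matplotlib, since those are the two named on this page; the install tutorial may add others. The function names are made up for illustration.

```python
import importlib.util
import sys

# Assumption: pandas and matplotlib are the "core libraries" (the two named on this page).
CORE_LIBRARIES = ["pandas", "matplotlib"]


def python_is_new_enough(version_info=sys.version_info, minimum=(3, 10)):
    """Return True if the interpreter's (major, minor) meets the minimum."""
    return tuple(version_info[:2]) >= minimum


def missing_libraries(names=CORE_LIBRARIES):
    """Return the library names that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python OK:", python_is_new_enough())
    print("Missing libraries:", missing_libraries() or "none")
```

An empty "missing" list and `Python OK: True` mean you're ready for the next step.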
Learn the 12 pandas moves
The handful of operations that show up in every project. An afternoon.
Beginner · ~8 min
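To give a flavor of what a "move" looks like, here is a minimal sketch of three common ones: filtering rows, grouping with an aggregate, and sorting. These are my guess at typical examples, not necessarily the tutorial's own list of twelve, and the tiny table is made up.

```python
import pandas as pd

# Made-up game log for illustration only; not real results.
games = pd.DataFrame({
    "team": ["A", "A", "B", "B", "C"],
    "points": [3, 1, 2, 4, 0],
    "home": [True, False, True, False, True],
})

# Filter: keep only the rows where the game was played at home.
home_games = games[games["home"]]

# Group, aggregate, sort: mean points per team, highest first.
avg_points = (
    games.groupby("team")["points"]
    .mean()
    .sort_values(ascending=False)
)

print(home_games)
print(avg_points)
```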
Make your first chart
Turn a table into a clean, labeled figure with matplotlib. An hour.
Beginner · ~6 min
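As a rough idea of where that hour ends up, the sketch below turns a small table into a labeled bar chart. The data is invented, and the non-interactive "Agg" backend is chosen only so the script runs without a display; the tutorial itself may set things up differently.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import pandas as pd

# Made-up numbers for illustration only.
table = pd.DataFrame({"team": ["A", "B", "C"], "wins": [10, 14, 7]})

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(table["team"], table["wins"])

# A chart without a title and axis labels is only half finished.
ax.set_title("Wins by team (illustrative data)")
ax.set_xlabel("Team")
ax.set_ylabel("Wins")
fig.tight_layout()
```

Calling `fig.savefig("wins_by_team.png")` afterwards would write the figure to disk.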
Read an API's docs
Learn to pull live data by example, using the public NHL API. An hour.
Beginner · ~8 min
Clean a messy file
The unglamorous skill that makes everything else possible. An hour or two.
Intermediate · ~8 min
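Cleaning usually comes down to a few repeatable steps. The sketch below shows one plausible sequence on a made-up CSV: tidy the headers, normalize text, coerce bad values to missing, and drop duplicate rows. The column names and the specific mess are assumptions for illustration, not the tutorial's actual file.

```python
import io

import pandas as pd

# Made-up messy CSV: padded headers, a duplicate row, inconsistent case,
# a non-numeric goal count, and an unparseable date.
raw = io.StringIO(
    " Team ,Goals , Date\n"
    "A,3,2024-01-05\n"
    "A,3,2024-01-05\n"
    "b ,--,2024-01-06\n"
    "C,2,not a date\n"
)

df = pd.read_csv(raw)

# Normalize headers: strip stray spaces and lowercase.
df.columns = df.columns.str.strip().str.lower()

# Normalize text values the same way (uppercase team codes here).
df["team"] = df["team"].str.strip().str.upper()

# Coerce types; anything unparseable becomes NaN / NaT instead of raising.
df["goals"] = pd.to_numeric(df["goals"], errors="coerce")
df["date"] = pd.to_datetime(df["date"], errors="coerce")

# Drop exact duplicate rows and renumber the index.
df = df.drop_duplicates().reset_index(drop=True)

print(df)
```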
Then pick a sport
Once the foundations feel comfortable, choose whichever sport you actually care about - you'll learn faster on data you enjoy. Each sport's hub lists its tutorials from beginner to advanced.
- Baseball – start with Pull Your First MLB Data with pybaseball
- Basketball – start with Pull Your First NBA Data with nba_api
- Soccer – start with Pull Your First Match Data with StatsBomb Open Data
- Football – start with Pull Your First NFL Data with nfl_data_py
- Hockey – start with Pull Your First NHL Data from the Public NHL API
Finish with the capstone
Once you've worked through a few sports, tackle the capstone: "Same Question, Five Sports: Quantifying Home Advantage Across Leagues". It reuses everything (APIs, cleaning, aggregation, plotting) to answer one question across five leagues.
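One simple way to put a number on home advantage is the share of games won by the home team, computed per league. The sketch below assumes that definition and uses invented scores with hypothetical column names; the capstone may measure it differently, and draws (common in soccer) count here as "not a home win".

```python
import pandas as pd

# Made-up results for illustration only; column names are hypothetical.
games = pd.DataFrame({
    "league": ["NHL", "NHL", "NHL", "NBA", "NBA", "NBA", "NBA"],
    "home_score": [3, 1, 4, 110, 98, 105, 120],
    "away_score": [2, 2, 1, 102, 101, 99, 118],
})

# Flag each game the home side won (a draw counts as False).
games["home_win"] = games["home_score"] > games["away_score"]

# The mean of a boolean column is the fraction of True values.
home_win_rate = games.groupby("league")["home_win"].mean()

print(home_win_rate)
```

A rate clearly above 0.5 in a league without draws would suggest the home side wins more often than not.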
What you'll need installed
- Python 3.10 or newer – the language everything is written in.
- A code editor – VS Code is free and excellent.
- Spreadsheet software (optional) – handy for eyeballing CSVs; Excel, Numbers, or free LibreOffice all work.
The very first tutorial walks you through all of it. See you there.