Why this site exists
Writes and runs every tutorial here.
I'm C. B. Zakarian, and I write SportsDataTutorials. There's no team behind the byline - it's me, my editor, and a folder full of Python scripts that I actually run before anything goes on the page.
I started this site because of how badly I floundered when I first tried to learn this stuff. I'd find a tutorial, copy the code, and it would die on line three with a version error nobody warned me about - or worse, it would "work" but quietly print numbers that couldn't possibly be right, and I had no way to tell. So the rule here is simple: nothing is hand-waved. Every line of code on every page was run against a real data source, and every chart and table you see is the genuine output of that run. If a result looks weird, it's because the data is weird, not because I faked a clean answer.
That honesty cuts both ways. When the NBA's stats API blocks my build machine - which it does, because it doesn't like requests coming from data centers - I don't paper over it. I tell you exactly what happens, show the fallback I use, and explain how the same script pulls live data when you run it from your own laptop. I'd rather lose a little polish than pretend something works when it doesn't.
How I think about teaching this
I don't believe you learn analytics from a textbook or a 40-hour course. You learn it by shipping small, real things: pull a week of data, make one honest chart, notice three things you didn't expect, then do it again on a question you actually care about. So every tutorial here is built around one finished artifact - a leaderboard, a shot chart, a model - that you'll have working by the end. I try to write the guide I wish someone had handed me: opinions included, the gotchas called out before they bite you, and no pretending the messy parts don't exist.
Where the data comes from
Everything here runs on free, well-respected public sources:
- Baseball – Statcast data from Baseball Savant via pybaseball, with caching turned on so I'm a good guest.
- Basketball – the NBA's stats API via nba_api, with Basketball-Reference as a documented fallback.
- Soccer – StatsBomb Open Data (attribution required, and given) and Understat's expected-goals data.
- American football – nflverse play-by-play.
- Hockey – the NHL's public API.
In the example code I cache requests, send polite headers, and keep request rates reasonable - partly so you pick up good habits, and partly because these sources are free and I'd like them to stay that way for the next person.
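If you want to build the same three habits into your own scripts, here's a minimal sketch that uses only Python's standard library. It assumes plain HTTP GET requests. The helper name `polite_get`, the User-Agent string, the `.http_cache` folder, and the one-second delay are all placeholders I picked for illustration. They are not the exact settings of the scripts on this site, and each source's own terms should decide the real values.

```python
import hashlib
import time
import urllib.request
from pathlib import Path

# Hypothetical settings for illustration: identify yourself honestly and pick a
# delay the data source is comfortable with.
HEADERS = {"User-Agent": "my-sports-analysis/0.1 (you@example.com)"}
CACHE_DIR = Path(".http_cache")
MIN_INTERVAL_S = 1.0

_last_request = None  # time.monotonic() of the most recent network request


def polite_get(url: str, cache_dir: Path = CACHE_DIR,
               min_interval: float = MIN_INTERVAL_S) -> bytes:
    """Hypothetical helper: fetch url once, then serve it from a disk cache."""
    global _last_request
    cache_dir.mkdir(parents=True, exist_ok=True)
    # One cache file per URL, named by a hash so any URL makes a safe filename.
    cached = cache_dir / hashlib.sha256(url.encode("utf-8")).hexdigest()
    if cached.exists():
        return cached.read_bytes()  # cache hit: no request is sent at all

    # Rate limit: wait until min_interval seconds have passed since the last request.
    if _last_request is not None:
        wait = min_interval - (time.monotonic() - _last_request)
        if wait > 0:
            time.sleep(wait)

    # Send the identifying headers with every real request.
    request = urllib.request.Request(url, headers=HEADERS)
    with urllib.request.urlopen(request, timeout=30) as response:
        body = response.read()
    _last_request = time.monotonic()

    cached.write_bytes(body)  # save the response for next time
    return body
```

Asking for the same URL twice only hits the network once. Delete the cache folder when you want fresh data.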
Found a mistake?
Tell me. I'd genuinely rather hear it than not. Email me at contact@sportsdatatutorials.com and I'll fix it.