Apply and Map: Custom Column Logic in pandas

FoundationsBeginnerPython~5 min read

What you'll build

Win totals bucketed into tiers with .apply(), counted, and charted.

Win totals bucketed into tiers with .apply(), counted, and charted.
Data: Bundled sample (2023 MLB standings), retrieved June 2026

Arithmetic on columns gets you per-game rates and differentials, but a lot of real analysis needs something arithmetic can't do: turn a number into a category, or a code into a label. "Is this an elite team or a rebuild?" "Is AL or American League?" The two tools for this are .apply() — run any function you write on every value — and .map() — translate values through a dictionary. Get the difference between them straight and you'll reach for the right one without thinking. To make it stick, I'll sort all 30 teams into win tiers and relabel their leagues, then chart how many teams land in each tier.

This builds on Computing New Columns. The data is the bundled sample_standings.csv (real 2023 MLB standings), so it runs offline.

  1. .apply() runs your function on every row

    When the logic is more than a formula — an if/elif ladder, say — write a function and hand it to .apply(). pandas calls it once per value and collects the results into a new column.

    python
    import pandas as pd
    
    df = pd.read_csv("sample_standings.csv")
    
    def win_tier(w):
        if w >= 95: return "Elite (95+)"
        if w >= 85: return "Contender (85-94)"
        if w >= 75: return "Above .500 (75-84)"
        if w >= 65: return "Below .500 (65-74)"
        return "Rebuilding (<65)"
    
    df["Tier"] = df["W"].apply(win_tier)

    Anything a Python function can decide, .apply() can turn into a column — thresholds, string parsing, lookups, you name it.

  2. .map() translates values through a dictionary

    When you just need to swap each value for another, a dict and .map() are cleaner than a function.

    python
    df["Lg"] = df["League"].map({"AL": "American", "NL": "National"})
    print(df.sort_values("W", ascending=False)[["Team", "W", "Lg", "Tier"]].head(8).to_string())
    New Tier and Lg columns, best teams first
           Team    W        Lg               Tier
    0    Braves  104  National        Elite (95+)
    1   Orioles  101  American        Elite (95+)
    2   Dodgers  100  National        Elite (95+)
    3      Rays   99  American        Elite (95+)
    4   Brewers   92  National  Contender (85-94)
    5    Astros   90  American  Contender (85-94)
    6  Phillies   90  National  Contender (85-94)
    7   Rangers   90  American  Contender (85-94)
    
    Teams per tier:
    Tier
    Above .500 (75-84)    12
    Contender (85-94)      7
    Elite (95+)            4
    Rebuilding (<65)       4
    Below .500 (65-74)     3

    The 2023 Braves (104 wins), Orioles, Dodgers, and Rays all land in "Elite," with their leagues spelled out. One caution with .map(): any value not in your dictionary becomes NaN, so a typo'd key silently blanks the cell — useful to know, easy to get burned by.

  3. Count the categories and chart them

    Once a column is categorical, value_counts() tallies it, and a bar chart shows the league's shape.

    python
    import matplotlib.pyplot as plt
    
    order = ["Elite (95+)", "Contender (85-94)", "Above .500 (75-84)",
             "Below .500 (65-74)", "Rebuilding (<65)"]
    counts = df["Tier"].value_counts().reindex(order)
    fig, ax = plt.subplots(figsize=(8, 5))
    bars = ax.bar(range(len(counts)), counts.values)
    ax.set_xticks(range(len(counts))); ax.set_xticklabels(counts.index, rotation=20, ha="right")
    ax.bar_label(bars)
    fig.savefig("tier_counts.png", dpi=144, bbox_inches="tight")
    Bar chart of the number of MLB teams in each win tier in 2023, with the middle 'Above .500' tier tallest and the extremes smaller
    Data: Bundled sample (2023 MLB standings), retrieved June 2026

    The league bunches in the middle — a dozen teams in the "Above .500" band — thinning out toward the elite and the rebuilding extremes. Note the reindex(order): value_counts() sorts by frequency, but we want the tiers in their natural order, so we reindex to impose it.

Troubleshooting

.apply() is slow on a big dataset

.apply() runs Python once per row, which is far slower than vectorized math. On 30 teams it's instant; on millions of rows, prefer a vectorized alternative like pd.cut() for binning or np.select() for multi-condition logic. Reach for .apply() when the logic genuinely can't be vectorized.

My mapped column is full of NaN

.map() returns NaN for any value missing from the dictionary. Check your keys match the data exactly (case and whitespace included). To leave unmatched values unchanged instead, use .replace() rather than .map().

Should I use a lambda or a named function?

Either works in .apply(). A one-liner like .apply(lambda w: "win" if w > 81 else "loss") is fine; a multi-line if/elif ladder reads better as a named function, as above.

Challenge yourself

Re-create the Tier column with pd.cut(df["W"], bins=[0,65,75,85,95,200], labels=[...]) and confirm it matches your .apply() version — a good lesson in when the vectorized tool is the better tool. Then .map() each division to its league or region and count teams per group.

Get the code

Here's the complete, working script for this tutorial. It runs exactly as shown.

Download the finished script (50_apply_and_map_custom_columns.py)

This script imports a small shared helper (and reads any bundled sample data) that live next to it in /downloads/ — grab these into the same folder so it runs as-is: sdt_common.py.

More Foundations tutorials