DE101-PD06 — Finding Insights with Pandas (From Data to Decisions)

~120 minutes

🎯 Goal of This Chapter

This chapter teaches how to think, not just how to code.

You will learn:

  • How to ask good questions
  • How to extract insights from data
  • How real companies analyze performance
  • How to communicate findings clearly

⚽ Mock Dataset — FIFA World Cup Player Stats

We will use a simplified dataset inspired by FIFA World Cup matches,
with focus on Lionel Messi.

import pandas as pd

data = {
    "player": ["Messi", "Messi", "Messi", "Mbappe", "Mbappe", "Neymar", "Modric", "Messi"],
    "team": ["Argentina", "Argentina", "Argentina", "France", "France", "Brazil", "Croatia", "Argentina"],
    "match": [1, 2, 3, 1, 2, 1, 1, 4],
    "goals": [1, 2, 1, 2, 1, 1, 0, 1],
    "assists": [0, 1, 0, 0, 1, 0, 1, 0],
    "shots": [4, 6, 5, 7, 5, 4, 2, 3],
    "minutes": [90, 90, 90, 90, 90, 90, 90, 90],
    "position": ["FW", "FW", "FW", "FW", "FW", "FW", "MF", "FW"]
}

df = pd.DataFrame(data)
df

🧠 20 Insight Questions (with Answers)


Q1️⃣ How many matches did each player play?

df.groupby("player")["match"].count()

📌 Insight: Player usage & reliability.


Q2️⃣ Total goals per player?

df.groupby("player")["goals"].sum().sort_values(ascending=False)

📌 Insight: Who is the main scorer?


Q3️⃣ Average goals per match?

df.groupby("player")["goals"].mean()

📌 Insight: Consistency vs volume.


Q4️⃣ Who contributed most (goals + assists)?

df.assign(contribution=df["goals"] + df["assists"]) \
  .groupby("player")["contribution"].sum()

📌 Insight: True attacking impact.


Q5️⃣ Messi’s total World Cup impact?

df[df["player"] == "Messi"][["goals", "assists"]].sum()

📌 Insight: Player-specific deep dive.


Q6️⃣ Shots-to-goals efficiency?

df.assign(efficiency=df["goals"] / df["shots"]) \
  .groupby("player")["efficiency"].mean()

📌 Insight: Finishing quality.


Q7️⃣ Which team scored the most goals?

df.groupby("team")["goals"].sum()

📌 Insight: Team dominance.


Q8️⃣ Messi’s share of Argentina goals?

total_arg_goals = df[df["team"] == "Argentina"]["goals"].sum()
messi_goals = df[df["player"] == "Messi"]["goals"].sum()

messi_goals / total_arg_goals

📌 Insight: Team dependency on star player.


Q9️⃣ Goals by position?

df.groupby("position")["goals"].sum()

📌 Insight: Tactical contribution.


Q1️⃣0️⃣ Average shots per match per player?

df.groupby("player")["shots"].mean()

📌 Insight: Offensive involvement.


Q1️⃣1️⃣ Which match had the most goals?

df.groupby("match")["goals"].sum().sort_values(ascending=False)

📌 Insight: High-impact games.


Q1️⃣2️⃣ Messi trend over matches?

df[df["player"] == "Messi"].sort_values("match")[["match", "goals"]]

📌 Insight: Performance trajectory.


Q1️⃣3️⃣ Player with best goals per 90 minutes?

df.assign(goals_per_90=df["goals"] / df["minutes"] * 90) \
  .groupby("player")["goals_per_90"].mean()

📌 Insight: Fair comparison.


Q1️⃣4️⃣ Assist leaders?

df.groupby("player")["assists"].sum().sort_values(ascending=False)

📌 Insight: Playmaking ability.


Q1️⃣5️⃣ Messi vs Mbappe comparison?

df[df["player"].isin(["Messi", "Mbappe"])] \
  .groupby("player")[["goals", "assists", "shots"]].mean()

📌 Insight: Superstar comparison.


Q1️⃣6️⃣ Which team relies most on one player?

team_totals = df.groupby("team")["goals"].sum()
player_totals = df.groupby(["team", "player"])["goals"].sum()

(player_totals / team_totals).sort_values(ascending=False)

📌 Insight: Risk concentration.


Q1️⃣7️⃣ Match-by-match contribution table?

df.assign(contribution=df["goals"] + df["assists"]) \
  .pivot_table(
      index="match",
      columns="player",
      values="contribution",
      aggfunc="sum"
  )

📌 Insight: Game-level impact.


Q1️⃣8️⃣ Who is most consistent scorer?

df.groupby("player")["goals"].std()

📌 Insight: Low variance = consistency.


Q1️⃣9️⃣ Identify underperformers (high shots, low goals)

df.assign(conversion=df["goals"] / df["shots"]) \
  .sort_values("conversion")

📌 Insight: Optimization opportunity.


Q2️⃣0️⃣ Executive Summary Table

df.groupby("player").agg(
    matches=("match", "count"),
    goals=("goals", "sum"),
    assists=("assists", "sum"),
    shots=("shots", "sum")
)

📌 Insight: One-table decision view.


🧠 How Companies Use This

  • Google → Experiment impact
  • Meta → Creator performance
  • OpenAI → Model evaluation metrics
  • Sports Analytics → Strategy decisions

Same logic, different domain.


🏁 Final Lesson

Insight is not about charts. Insight is about asking the right questions.

Pandas is your microscope.


🚀 Next Steps

  • Add visualization
  • Turn insights into dashboards
  • Feed features into ML models

You now think like a Data Engineer + Analyst + AI Researcher 🧠🔥

Previous