5 Questions Asked of the Kaggle IPL Data Set

Mandula Thrimanne
Aug 21, 2021
Photo by Maninderjeet Singh Sidhu on Unsplash

Why are questions more important than answers? Of the many answers to that question, one really stuck with me: questions update answers whose quality can decay over time. In simpler terms, questions are like those unsolicited updates that keep popping up on your computer when you just want to get on with your day. Questions are uncomfortable because they force us to look beyond what we already know and, in some cases, to completely dismiss what we thought we knew. If the Wright brothers hadn’t asked themselves, “How do birds have such balance and control when they fly?”, they would never have developed the ‘wing warping’ concept, which ultimately led them to build the first-ever powered, sustained, and controlled airplane. I would go as far as to say that most (if not all) inventions throughout history happened simply because a human being was brave enough to scratch their head and ask why some things are the way they are, and whether those things could be made better or more efficient.

The IPL Kaggle data set was one of those data sets that I always found very interesting without ever knowing where to start exploring. That’s when I decided to take a step back and look at the data as the absolute cricket fanatic I am, rather than as a guy who wants to analyze numbers. So I asked myself a simple question: “If you could ask any question of this data set, what would you like to ask?” If you are curious about the questions I came up with, below are some of them, along with the answers I got using the ‘groupby’ function in Python Pandas.

1. What are the most common ways of dismissal for each team?

As expected, the largest share of dismissals came from batsmen being caught. This is an unsurprising answer in the context of T20 games: batsmen are expected to score more runs in a shorter time frame, which often leads to mistimed and unorthodox hitting and, ultimately, more balls in the air. What I found interesting, though, was that more players were run out than given out leg before wicket (LBW). I believe this is a trend we see in the shorter format of the game due to its high intensity and high pressure. In the longer format, by contrast, batsmen tend to play more defensive shots, which makes it more likely that they get trapped in front of the wicket than that they hit the ball in the air.
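A minimal pandas sketch of how a per-team dismissal breakdown could be computed with groupby. It assumes the ball-by-ball file has ‘batting_team’, ‘is_wicket’, and ‘dismissal_kind’ columns (these names are an assumption), and the toy rows below are invented stand-ins, not IPL data. Grouping on the batting team answers “how did each team’s batsmen get out”; grouping on a bowling-team column instead would show how each team dismissed its opponents.

```python
import pandas as pd

# Hypothetical toy rows; the column names are assumed to mirror the
# Kaggle ball-by-ball file and the values are invented for illustration.
balls = pd.DataFrame({
    "batting_team": ["Team A", "Team A", "Team A", "Team B", "Team B", "Team B"],
    "is_wicket": [1, 1, 0, 1, 1, 1],
    "dismissal_kind": ["caught", "run out", None, "caught", "caught", "lbw"],
})

# Keep only the deliveries on which a wicket fell.
wickets = balls[balls["is_wicket"] == 1]

# Share of each dismissal type within each batting team's dismissals:
# one row per team, one column per dismissal kind, missing kinds filled with 0.
dismissal_share = (
    wickets.groupby("batting_team")["dismissal_kind"]
    .value_counts(normalize=True)
    .unstack(fill_value=0)
)
print(dismissal_share)
```

Each row sums to 1, so teams with very different numbers of dismissals can be compared directly.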

2. Which team has scored the most boundaries?

Mumbai Indians, RCB, and Kings XI Punjab lead the pack in total boundaries. But when you average boundaries across matches, all teams hit around 14 fours and 5 to 6 sixes per match. CSK managed to squeeze in alongside the overall leaders to join the club of teams averaging 6 sixes per match. Looking at the data, it’s safe to say that, on average, close to 100 runs per match come from boundaries (14 fours and 5 to 6 sixes work out to roughly 86 to 92 runs).
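A sketch of both the overall totals and the per-match averages, assuming ‘id’ (match id), ‘batting_team’, and ‘batsman_runs’ columns in the ball-by-ball file (assumed names; the toy rows are invented). As a simplification, any 4 or 6 off the bat is counted as a boundary, which would also count the rare four that is actually run.

```python
import pandas as pd

# Hypothetical toy deliveries; column names are assumed, values are invented.
balls = pd.DataFrame({
    "id": [1, 1, 1, 1, 2, 2, 2],
    "batting_team": ["Team A", "Team A", "Team A", "Team B", "Team A", "Team B", "Team B"],
    "batsman_runs": [4, 6, 1, 4, 4, 6, 6],
})

# Flag fours and sixes off the bat (simplification: treats every 4 as a boundary).
balls["fours"] = (balls["batsman_runs"] == 4).astype(int)
balls["sixes"] = (balls["batsman_runs"] == 6).astype(int)

# Boundaries hit by each team in each match it batted in.
per_match = balls.groupby(["batting_team", "id"])[["fours", "sixes"]].sum()

# Overall totals versus per-match averages.
totals = per_match.groupby(level="batting_team").sum()
averages = per_match.groupby(level="batting_team").mean()

# Approximate runs per match that come from boundaries.
averages["boundary_runs"] = 4 * averages["fours"] + 6 * averages["sixes"]
print(totals)
print(averages)
```

Averaging per match, rather than comparing totals, keeps teams that played more seasons from looking better simply because they batted more often.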

3. What’s the impact of the toss on each team?

The purpose of this question was to find out how the match result is affected by the toss for each team. Assuming that all the coins used for tosses were fair, the toss win % line sits right where we expect it to be, around 0.5, even though some teams (CSK and RR) have been luckier than others (SRH and RCB). To identify the teams most affected by the toss in the visualization above, look for the ones with the largest gap between the ‘toss win & match win %’ line and the ‘toss loss % & match loss %’ line. The team that pops out here is MI, with a staggering gap of ~0.2 (20%) between the two lines, which indicates that MI has a higher chance of winning a match when it wins the toss and, on the other hand, of losing a match when it loses the toss.
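The exact definition of the plotted lines isn’t given, so the sketch below shows one plausible reading: for each team, the share of tosses won, the match win rate after winning the toss, and the match loss rate after losing it. It assumes the matches file has ‘team1’, ‘team2’, ‘toss_winner’, and ‘winner’ columns, with a missing ‘winner’ for no-result games (assumed names and conventions; the toy rows are invented).

```python
import pandas as pd

# Hypothetical toy matches; column names are assumed, values are invented.
matches = pd.DataFrame({
    "team1": ["Team A", "Team A", "Team B", "Team C", "Team A", "Team A"],
    "team2": ["Team B", "Team C", "Team C", "Team B", "Team B", "Team C"],
    "toss_winner": ["Team A", "Team C", "Team B", "Team B", "Team B", "Team A"],
    "winner": ["Team A", "Team C", "Team B", "Team C", "Team B", None],
})

# Drop no-result games so they are not counted as losses.
matches = matches.dropna(subset=["winner"])

# Reshape to one row per (match, team) so each team's view can be grouped.
long = pd.concat(
    [matches.assign(team=matches["team1"]), matches.assign(team=matches["team2"])],
    ignore_index=True,
)
long["won_toss"] = long["toss_winner"] == long["team"]
long["won_match"] = long["winner"] == long["team"]
long["lost_match"] = ~long["won_match"]

# Toss win rate, plus conditional match outcomes given the toss result.
summary = pd.DataFrame({
    "toss_win_pct": long.groupby("team")["won_toss"].mean(),
    "win_pct_after_toss_win": long[long["won_toss"]].groupby("team")["won_match"].mean(),
    "loss_pct_after_toss_loss": long[~long["won_toss"]].groupby("team")["lost_match"].mean(),
})
print(summary)
```

A team whose two conditional rates are both well above 0.5 is one whose results track the toss closely, which is the pattern described for MI above.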

4. What is the most common ball to go for a boundary or to get a wicket?

Ever heard of “Nelson” in cricket? If your team is at 111, everyone except the opposition wants you to drop one down and change ends just to take the score to 112, simply to get past that wicked 111. This is because 111 is considered an unlucky number in cricket, and since “I am not superstitious, but I am a little stitious”, I wanted to check whether there’s a particular ball in the IPL that has a higher chance of going for a boundary or of producing a wicket. It’s hard to explain why the most common balls to go for a boundary are what they are, but the most common ball to get out on being the last ball of the innings makes a lot of sense because of the low-risk, high-reward scenario. Facing the last ball of an innings is like going all out on your limited internet connection on the last day of the month (assuming your quota gets renewed every month, of course). Burn through your data or embarrassingly get out without scoring? It doesn’t make much of a difference.
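One way to ask this with groupby is to group deliveries by their position in the innings, assuming the ball-by-ball file identifies that position with ‘over’ and ‘ball’ columns (how they are numbered, for example whether overs start at 0, is an assumption to check against the file). The sketch counts boundaries and wickets per position and also divides by the number of deliveries, since positions late in an innings are bowled less often when innings end early. The toy rows are invented.

```python
import pandas as pd

# Hypothetical toy deliveries; column names are assumed, values are invented.
balls = pd.DataFrame({
    "over": [19, 19, 19, 19, 0, 0, 0, 0],
    "ball": [6, 6, 6, 1, 1, 1, 1, 2],
    "batsman_runs": [6, 0, 0, 4, 4, 4, 0, 1],
    "is_wicket": [0, 1, 1, 0, 0, 0, 1, 0],
})
balls["is_boundary"] = balls["batsman_runs"].isin([4, 6]).astype(int)

# Aggregate by ball position within the innings.
by_ball = balls.groupby(["over", "ball"]).agg(
    boundaries=("is_boundary", "sum"),
    wickets=("is_wicket", "sum"),
    deliveries=("is_wicket", "size"),
)

# Rates adjust for positions that are bowled less often.
by_ball["boundary_rate"] = by_ball["boundaries"] / by_ball["deliveries"]
by_ball["wicket_rate"] = by_ball["wickets"] / by_ball["deliveries"]

# Positions with the most boundaries and the most wickets, as (over, ball) tuples.
most_boundaries = by_ball["boundaries"].idxmax()
most_wickets = by_ball["wickets"].idxmax()
print(most_boundaries, most_wickets)
```

Ranking by the raw counts answers “most common ball”, while ranking by the rate columns is closer to “higher chance”.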

5. Which team has won by the highest margins per game?

From the visualization above, RCB seems to have nailed the “go big or go home” mantra when it comes to beating their opponents, even though they are yet to get their hands on the silverware. Trailing behind RCB are the two most successful franchises in IPL history, CSK and MI, with almost identical average winning margins in runs per match and a tie for average winning margin in wickets per match.
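A sketch of the average winning margin per team, split by whether the win came by runs or by wickets. It assumes the matches file has ‘winner’, ‘result’ (with values such as ‘runs’ and ‘wickets’), and ‘result_margin’ columns; these names and values are assumptions, and the toy rows are invented.

```python
import pandas as pd

# Hypothetical toy results; column names are assumed, values are invented.
matches = pd.DataFrame({
    "winner": ["Team A", "Team A", "Team B", "Team C", "Team B", "Team C", "Team D"],
    "result": ["runs", "wickets", "runs", "runs", "wickets", "wickets", "tie"],
    "result_margin": [100, 8, 20, 30, 6, 6, None],
})

# Only wins by runs or by wickets have a comparable margin.
wins = matches[matches["result"].isin(["runs", "wickets"])]

# Average margin per winning team, one column per result type.
avg_margin = wins.groupby(["winner", "result"])["result_margin"].mean().unstack()
print(avg_margin.sort_values("runs", ascending=False))
```

Keeping runs and wickets in separate columns avoids mixing two margins that are measured on different scales.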

A quick snapshot of the columns in the two data sets

The five questions above were some that I was personally curious to answer. The IPL Kaggle data set consists of two files: a ball-by-ball data set with data points for each ball bowled in the IPL from 2008 to 2020, and a match details data set with data points at the match level. Both files contain many more interesting stories waiting to be revealed. If you also identify as someone with a passion for storytelling with data, and you are a total cricket fanatic, take a crack at it and bring some of the stories hidden in this data set to life.
