7 Examples of Python Pandas groupby Function

A Real-Life Example with Retail Sales Transactions

Sponsored
AiNexaVerse NewsWeekly AI Tools in your email!

The groupby() function in Pandas is an essential tool for data analysis. It allows you to group data based on one or more columns and perform aggregate operations on each group. This post will explain the groupby() function in detail using a real-life example of retail sales transactions. We'll demonstrate seven examples of how the groupby() function can be used on this dataset.

 

Dataset Overview

 

Imagine we have a dataset containing sales transactions from a retail store. The dataset includes the following columns:

 

  • TransactionID: The unique identifier for each transaction.

  • Date: The date of the transaction.

  • Store: The store where the transaction took place.

  • Product: The product sold.

  • Quantity: The quantity of the product sold.

  • Price: The price of the product.

 

Here is the sample data for sales_data.csv:

 

TransactionID,Date,Store,Product,Quantity,Price

1,2024-01-01,Store A,Laptop,1,1000

2,2024-01-01,Store A,Tablet,2,500

3,2024-01-02,Store B,Laptop,3,1000

4,2024-01-02,Store B,Tablet,4,500

5,2024-01-03,Store A,Smartphone,5,700

6,2024-01-03,Store A,Tablet,6,500

7,2024-01-04,Store B,Smartphone,7,700

8,2024-01-04,Store B,Tablet,8,500

9,2024-01-05,Store A,Laptop,9,1000

10,2024-01-05,Store A,Smartphone,10,700

 

Let's load this data into a Pandas DataFrame and explore how to use the groupby() function.

 

import pandas as pd

 

# Load the data into a DataFrame

df = pd.read_csv('sales_data.csv')

print(df)

Example 1: Total Quantity Sold by Store

 

Let's calculate the total quantity of products sold by each store.

 

# Group by 'Store' and calculate the total quantity sold

total_quantity_by_store = df.groupby('Store')['Quantity'].sum()

print(total_quantity_by_store)

 

Explanation:

  • df.groupby('Store')['Quantity'].sum(): Groups the DataFrame by the Store column and calculates the sum of the Quantity column for each group.

 

Output:

 

Example 2: Total Sales by Store

 

Let's calculate the total sales (revenue) by each store. The total sales for each transaction can be calculated by multiplying the Quantity by the Price.

 

# Add a new column for total sales

df['TotalSales'] = df['Quantity'] * df['Price']

 

# Group by 'Store' and calculate the total sales

total_sales_by_store = df.groupby('Store')['TotalSales'].sum()

print(total_sales_by_store)

 

Explanation:

  • df['TotalSales'] = df['Quantity'] * df['Price']: Adds a new column TotalSales to the DataFrame by multiplying Quantity and Price.

  • df.groupby('Store')['TotalSales'].sum(): Groups the DataFrame by the Store column and calculates the sum of the TotalSales column for each group.

 

Output:

 

Example 3: Average Price by Product

 

Let's calculate the average price of each product.

 

# Group by 'Product' and calculate the average price

average_price_by_product = df.groupby('Product')['Price'].mean()

print(average_price_by_product)

 

Explanation:

  • df.groupby('Product')['Price'].mean(): Groups the DataFrame by the Product column and calculates the mean of the Price column for each group.

 

Output:

Example 4: Count of Transactions by Store

 

Let's count the number of transactions that took place in each store.

 

# Group by 'Store' and count the number of transactions

transactions_by_store = df.groupby('Store')['TransactionID'].count()

print(transactions_by_store)

 

Explanation:

  • df.groupby('Store')['TransactionID'].count(): Groups the DataFrame by the Store column and counts the number of TransactionID entries for each group.

 

Output:

 

Example 5: Total Quantity Sold by Store and Product

 

Let's calculate the total quantity of each product sold in each store.

 

# Group by 'Store' and 'Product', and calculate the total quantity sold

total_quantity_by_store_product = df.groupby(['Store', 'Product'])['Quantity'].sum()

print(total_quantity_by_store_product)

 

Explanation:

  • df.groupby(['Store', 'Product'])['Quantity'].sum(): Groups the DataFrame by the Store and Product columns and calculates the sum of the Quantity column for each group.

 

Output:

 Example 6: Total Sales by Date

 

Let's calculate the total sales (revenue) for each date.

 

# Add a new column for total sales

df['TotalSales'] = df['Quantity'] * df['Price']

# Group by 'Date' and calculate the total sales

total_sales_by_date = df.groupby('Date')['TotalSales'].sum()

print(total_sales_by_date)

 

Explanation:

  • df.groupby('Date')['TotalSales'].sum(): Groups the DataFrame by the Date column and calculates the sum of the TotalSales column for each group.

 

Output:

 

Example 7: Average Quantity Sold by Product and Store

 

Let's calculate the average quantity sold per transaction for each product in each store.

 

# Group by 'Product' and 'Store', and calculate the average quantity sold per transaction

average_quantity_by_product_store = df.groupby(['Product', 'Store'])['Quantity'].mean()

print(average_quantity_by_product_store)

 

Explanation:

  • df.groupby(['Product', 'Store'])['Quantity'].mean(): Groups the DataFrame by the Product and Store columns and calculates the mean of the Quantity column for each group.

 

Output:

Summary

The groupby() function in Pandas is a powerful tool for summarising and analyzing data by grouping it based on one or more columns and performing aggregate operations on each group. In this post, we demonstrated seven examples of how to use the groupby() function on a dataset of retail sales transactions. These examples included calculating total quantities sold, total sales, average prices, and counts of transactions, among others.

 

By mastering the groupby() function, you can effectively analyze and gain insights from your data, making it an essential skill for any data analyst.

 

Ready for More Python Fun?

Subscribe to our newsletter now and get a free Python cheat sheet! Dive deeper into Python programming with more exciting projects and tutorials designed just for beginners.

Keep exploring, keep coding, and enjoy your journey into data analytics with Pandas