- CodeCraft by Dr. Christine Lee
- Posts
- 7 Examples of Python Pandas groupby Function
7 Examples of Python Pandas groupby Function
A Real-Life Example with Retail Sales Transactions
|
The groupby()
function in Pandas is an essential tool for data analysis. It allows you to group data based on one or more columns and perform aggregate operations on each group. This post will explain the groupby()
function in detail using a real-life example of retail sales transactions. We'll demonstrate seven examples of how the groupby()
function can be used on this dataset.
Dataset Overview
Imagine we have a dataset containing sales transactions from a retail store. The dataset includes the following columns:
TransactionID
: The unique identifier for each transaction.Date
: The date of the transaction.Store
: The store where the transaction took place.Product
: The product sold.Quantity
: The quantity of the product sold.Price
: The price of the product.
Here is the sample data for sales_data.csv
:
TransactionID,Date,Store,Product,Quantity,Price
1,2024-01-01,Store A,Laptop,1,1000
2,2024-01-01,Store A,Tablet,2,500
3,2024-01-02,Store B,Laptop,3,1000
4,2024-01-02,Store B,Tablet,4,500
5,2024-01-03,Store A,Smartphone,5,700
6,2024-01-03,Store A,Tablet,6,500
7,2024-01-04,Store B,Smartphone,7,700
8,2024-01-04,Store B,Tablet,8,500
9,2024-01-05,Store A,Laptop,9,1000
10,2024-01-05,Store A,Smartphone,10,700
Let's load this data into a Pandas DataFrame and explore how to use the groupby()
function.
import pandas as pd
# Load the data into a DataFrame
df = pd.read_csv('sales_data.csv')
print(df)
Example 1: Total Quantity Sold by Store
Let's calculate the total quantity of products sold by each store.
# Group by 'Store' and calculate the total quantity sold
total_quantity_by_store = df.groupby('Store')['Quantity'].sum()
print(total_quantity_by_store)
Explanation:
df.groupby('Store')['Quantity'].sum(
): Groups the DataFrame by theStore
column and calculates the sum of theQuantity
column for each group.
Output:
Example 2: Total Sales by Store
Let's calculate the total sales (revenue) by each store. The total sales for each transaction can be calculated by multiplying the Quantity
by the Price
.
# Add a new column for total sales
df['TotalSales'] = df['Quantity'] * df['Price']
# Group by 'Store' and calculate the total sales
total_sales_by_store = df.groupby('Store')['TotalSales'].sum()
print(total_sales_by_store)
Explanation:
df['TotalSales'] = df['Quantity'] * df['Price']
: Adds a new columnTotalSales
to the DataFrame by multiplyingQuantity
andPrice
.df.groupby('Store')['TotalSales'].sum()
: Groups the DataFrame by theStore
column and calculates the sum of theTotalSales
column for each group.
Output:
Example 3: Average Price by Product
Let's calculate the average price of each product.
# Group by 'Product' and calculate the average price
average_price_by_product = df.groupby('Product')['Price'].mean()
print(average_price_by_product)
Explanation:
df.groupby('Product')['Price'].mean()
: Groups the DataFrame by theProduct
column and calculates the mean of thePrice
column for each group.
Output:
Example 4: Count of Transactions by Store
Let's count the number of transactions that took place in each store.
# Group by 'Store' and count the number of transactions
transactions_by_store = df.groupby('Store')['TransactionID'].count()
print(transactions_by_store)
Explanation:
df.groupby('Store')['TransactionID'].count(
): Groups the DataFrame by theStore
column and counts the number ofTransactionID
entries for each group.
Output:
Example 5: Total Quantity Sold by Store and Product
Let's calculate the total quantity of each product sold in each store.
# Group by 'Store' and 'Product', and calculate the total quantity sold
total_quantity_by_store_product = df.groupby(['Store', 'Product'])['Quantity'].sum()
print(total_quantity_by_store_product)
Explanation:
df.groupby(['Store', 'Product'])['Quantity'].sum(
): Groups the DataFrame by theStore
andProduct
columns and calculates the sum of theQuantity
column for each group.
Output:
Example 6: Total Sales by Date
Let's calculate the total sales (revenue) for each date.
# Add a new column for total sales
df['TotalSales'] = df['Quantity'] * df['Price']
# Group by 'Date' and calculate the total sales
total_sales_by_date = df.groupby('Date')['TotalSales'].sum()
print(total_sales_by_date)
Explanation:
df.groupby('Date')['TotalSales'].sum()
: Groups the DataFrame by theDate
column and calculates the sum of theTotalSales
column for each group.
Output:
Example 7: Average Quantity Sold by Product and Store
Let's calculate the average quantity sold per transaction for each product in each store.
# Group by 'Product' and 'Store', and calculate the average quantity sold per transaction
average_quantity_by_product_store = df.groupby(['Product', 'Store'])['Quantity'].mean()
print(average_quantity_by_product_store)
Explanation:
df.groupby(['Product', 'Store'])['Quantity'].mean()
: Groups the DataFrame by theProduct
andStore
columns and calculates the mean of theQuantity
column for each group.
Output:
Summary
The groupby()
function in Pandas is a powerful tool for summarising and analyzing data by grouping it based on one or more columns and performing aggregate operations on each group. In this post, we demonstrated seven examples of how to use the groupby()
function on a dataset of retail sales transactions. These examples included calculating total quantities sold, total sales, average prices, and counts of transactions, among others.
By mastering the groupby()
function, you can effectively analyze and gain insights from your data, making it an essential skill for any data analyst.
Ready for More Python Fun?
Subscribe to our newsletter now and get a free Python cheat sheet! Dive deeper into Python programming with more exciting projects and tutorials designed just for beginners.
Keep exploring, keep coding, and enjoy your journey into data analytics with Pandas