- CodeCraft by Dr. Christine Lee
- Posts
- 7 Real-Life Examples of Python's pivot_table
7 Real-Life Examples of Python's pivot_table
A Powerful Tool for Summarising and Aggregating Data
|
The pivot_table()
function in Pandas is a powerful tool for summarising and aggregating data. It allows you to create pivot tables, similar to those in spreadsheet software, to analyze your data efficiently. In this post, we'll explore seven real-life examples of how to use the pivot_table()
function, with detailed explanations for each code snippet.
What is a Pivot Table?
A pivot table is a data summarisation tool commonly used in data analysis and reporting. It allows you to reorganize and summarise selected columns and rows of data to obtain a desired report. Pivot tables are especially useful for exploring and analyzing large datasets, enabling you to see patterns and trends at a glance.
Understanding the pivot_table()
Function
The pivot_table()
function in Pandas has several parameters that you can use to customise your pivot table. Here are some of the key parameters:
data
: The DataFrame to use for creating the pivot table.values
: The column(s) to aggregate.index
: The column(s) to group by in the rows.columns
: The column(s) to group by in the columns.aggfunc
: The aggregate function to apply to the grouped data (e.g.,sum
,mean
,count
).
Dataset Overview
We'll use a sample dataset containing sales transactions from a retail store. The dataset includes the following columns:
TransactionID
: The unique identifier for each transaction.Date
: The date of the transaction.Store
: The store where the transaction took place.Product
: The product sold.Quantity
: The quantity of the product sold.Price
: The price of the product.
Here is the sample data for sales_data.csv
:
TransactionID,Date,Store,Product,Quantity,Price
1,2024-01-01,Store A,Laptop,1,1000
2,2024-01-01,Store A,Tablet,2,500
3,2024-01-02,Store B,Laptop,3,1000
4,2024-01-02,Store B,Tablet,4,500
5,2024-01-03,Store A,Smartphone,5,700
6,2024-01-03,Store A,Tablet,6,500
7,2024-01-04,Store B,Smartphone,7,700
8,2024-01-04,Store B,Tablet,8,500
9,2024-01-05,Store A,Laptop,9,1000
10,2024-01-05,Store A,Smartphone,10,700
Let's load this data into a Pandas DataFrame and explore how to use the pivot_table()
function.
import pandas as pd
# Load the data into a DataFrame
df = pd.read_csv('sales_data.csv')
print(df)
Example 1: Total Quantity Sold by Store and Product
Let's create a pivot table to calculate the total quantity of each product sold in each store.
# Create a pivot table for total quantity sold by store and product
pivot_quantity = pd.pivot_table(df, values='Quantity', index='Store', columns='Product', aggfunc='sum')
print(pivot_quantity)
Explanation:
values='Quantity'
: We're interested in theQuantity
column.index='Store'
: We're grouping by theStore
column in the rows.columns='Product'
: We're grouping by theProduct
column in the columns.aggfunc='sum'
: We're summing the quantities.
Output:
Example 2: Average Price by Store and Product
Let's create a pivot table to calculate the average price of each product in each store.
# Create a pivot table for average price by store and product
pivot_price = pd.pivot_table(df, values='Price', index='Store', columns='Product', aggfunc='mean')
print(pivot_price)
Explanation:
values='Price'
: We're interested in thePrice
column.index='Store'
: We're grouping by theStore
column in the rows.columns='Product'
: We're grouping by theProduct
column in the columns.aggfunc='mean'
: We're calculating the mean price.
Output:
Example 3: Total Sales by Store and Product
Let's create a pivot table to calculate the total sales (revenue) for each product in each store. The total sales for each transaction can be calculated by multiplying the Quantity
by the Price
.
# Add a new column for total sales
df['TotalSales'] = df['Quantity'] * df['Price']
# Create a pivot table for total sales by store and product
pivot_sales = pd.pivot_table(df, values='TotalSales', index='Store', columns='Product', aggfunc='sum')
print(pivot_sales)
Explanation:
df['TotalSales'] = df['Quantity'] * df['Price']
: Adds a new columnTotalSales
to the DataFrame by multiplyingQuantity
andPrice
.values='TotalSales'
: We're interested in theTotalSales
column.index='Store'
: We're grouping by theStore
column in the rows.columns='Product'
: We're grouping by theProduct
column in the columns.aggfunc='sum'
: We're summing the total sales.
Output:
Example 4: Count of Transactions by Store and Product
Let's create a pivot table to count the number of transactions for each product in each store.
# Create a pivot table for the count of transactions by store and product
pivot_count = pd.pivot_table(df, values='TransactionID', index='Store', columns='Product', aggfunc='count')
print(pivot_count)
Explanation:
values='TransactionID'
: We're counting theTransactionID
column.index='Store'
: We're grouping by theStore
column in the rows.columns='Product'
: We're grouping by theProduct
column in the columns.aggfunc='count'
: We're counting the number of transactions.
Output:
Example 5: Maximum Quantity Sold by Store and Product
Let's create a pivot table to find the maximum quantity sold for each product in each store.
# Create a pivot table for the maximum quantity sold by store and product
pivot_max_quantity = pd.pivot_table(df, values='Quantity', index='Store', columns='Product', aggfunc='max')
print(pivot_max_quantity)
Explanation:
values='Quantity'
: We're interested in theQuantity
column.index='Store'
: We're grouping by theStore
column in the rows.columns='Product'
: We're grouping by theProduct
column in the columns.aggfunc='max'
: We're finding the maximum quantity sold.
Output:
Example 6: Minimum Price by Store and Product
Let's create a pivot table to find the minimum price of each product in each store.
# Create a pivot table for the minimum price by store and product
pivot_min_price = pd.pivot_table(df, values='Price', index='Store', columns='Product', aggfunc='min')
print(pivot_min_price)
Explanation:
values='Price'
: We're interested in thePrice
column.index='Store'
: We're grouping by theStore
column in the rows.columns='Product'
: We're grouping by theProduct
column in the columns.aggfunc='min'
: We're finding the minimum price.
Output:
Example 7: Average Total Sales by Store and Product
Let's create a pivot table to calculate the average total sales for each product in each store.
# Add a new column for total sales
df['TotalSales'] = df['Quantity'] * df['Price']
# Create a pivot table for the average total sales by store and product
pivot_avg_sales = pd.pivot_table(df, values='TotalSales', index='Store', columns='Product', aggfunc='mean')
print(pivot_avg_sales)
Explanation:
values='TotalSales'
: We're interested in theTotalSales
column.index='Store'
: We're grouping by theStore
column in the rows.columns='Product'
: We're grouping by theProduct
column in the columns.aggfunc='mean'
: We're calculating the mean of total sales.
Output:
Summary
The pivot_table()
function in Pandas is a versatile tool for summarising and aggregating data. In this post, we demonstrated seven real-life examples of how to use the pivot_table()
function on a dataset of retail sales transactions. These examples included calculating total quantities sold, average prices, total sales, count of transactions, maximum and minimum quantities, and average total sales, among others.
By mastering the pivot_table()
function, you can effectively analyze and gain insights from your data, making it an essential skill for any data analyst.
Ready for More Python Fun?
Subscribe to our newsletter now and get a free Python cheat sheet! Dive deeper into Python programming with more exciting projects and tutorials designed just for beginners.
Keep exploring, keep coding, and enjoy your journey into data analytics with Pandas!