- CodeCraft by Dr. Christine Lee
- Posts
- 7 Real-Life Examples with Python's crosstab
7 Real-Life Examples with Python's crosstab
Powerful Tool for Summarising Data and Computing Cross-tabulations
|
The crosstab()
function in Pandas is a powerful tool for summarising data and computing cross-tabulations. It allows you to calculate the frequency distribution of variables and perform other summary statistics. In this post, we'll explore seven real-life examples of how to use the crosstab()
function, with detailed explanations for each code snippet.
What is a Crosstab?
A crosstab, or cross-tabulation, is a method to quantitatively analyze the relationship between multiple variables. It displays the frequency distribution of variables in a tabular format, making it easier to observe patterns and correlations in your data.
Understanding the crosstab()
Function
The crosstab()
function in Pandas has several parameters that you can use to customise your cross-tabulation. Here are some of the key parameters:
index
: The column(s) to group by in the rows.columns
: The column(s) to group by in the columns.values
: The column to aggregate.aggfunc
: The aggregate function to apply to the grouped data (e.g.,sum
,mean
,count
).margins
: Add row/column margins (subtotals).
Dataset Overview
We'll use a sample dataset containing sales transactions from a retail store. The dataset includes the following columns:
TransactionID
: The unique identifier for each transaction.Date
: The date of the transaction.Store
: The store where the transaction took place.Product
: The product sold.Quantity
: The quantity of the product sold.Price
: The price of the product.
Here is the sample data for sales_data.csv
:
TransactionID,Date,Store,Product,Quantity,Price
1,2024-01-01,Store A,Laptop,1,1000
2,2024-01-01,Store A,Tablet,2,500
3,2024-01-02,Store B,Laptop,3,1000
4,2024-01-02,Store B,Tablet,4,500
5,2024-01-03,Store A,Smartphone,5,700
6,2024-01-03,Store A,Tablet,6,500
7,2024-01-04,Store B,Smartphone,7,700
8,2024-01-04,Store B,Tablet,8,500
9,2024-01-05,Store A,Laptop,9,1000
10,2024-01-05,Store A,Smartphone,10,700
Let's load this data into a Pandas DataFrame and explore how to use the crosstab()
function.
import pandas as pd
# Load the data into a DataFrame
df = pd.read_csv('sales_data.csv')
print(df)
Example 1: Frequency of Products Sold by Store
Let's create a crosstab to calculate the frequency of each product sold in each store.
# Create a crosstab for frequency of products sold by store
crosstab_frequency = pd.crosstab(df['Store'], df['Product'])
print(crosstab_frequency)
Explanation:
pd.crosstab(df['Store'], df['Product'])
: Creates a crosstab withStore
as the rows andProduct
as the columns, showing the frequency of each product sold in each store.
Output:
Example 2: Total Quantity Sold by Store and Product
Let's create a crosstab to calculate the total quantity of each product sold in each store.
# Create a crosstab for total quantity sold by store and product
crosstab_quantity = pd.crosstab(df['Store'], df['Product'], values=df['Quantity'], aggfunc='sum')
print(crosstab_quantity)
Explanation:
values=df['Quantity']
: We're interested in the Quantity column.aggfunc='sum'
: We're summing the quantities.
Output:
Example 3: Total Sales by Store and Product
Let's create a crosstab to calculate the total sales (revenue) for each product in each store.
# Add a new column for total sales
df['TotalSales'] = df['Quantity'] * df['Price']
# Create a crosstab for total sales by store and product
crosstab_sales = pd.crosstab(df['Store'], df['Product'], values=df['TotalSales'], aggfunc='sum')
print(crosstab_sales)
Explanation:
df['TotalSales'] = df['Quantity'] * df['Price']
: Adds a new columnTotalSales
to the DataFrame by multiplyingQuantity
andPrice
.values=df['TotalSales']
: We're interested in theTotalSales
column.aggfunc='sum'
: We're summing the total sales.
Output:
Example 4: Average Quantity Sold by Store and Product
Let's create a crosstab to calculate the average quantity sold per transaction for each product in each store.
# Create a crosstab for average quantity sold by store and product
crosstab_avg_quantity = pd.crosstab(df['Store'], df['Product'], values=df['Quantity'], aggfunc='mean')
print(crosstab_avg_quantity)
Explanation:
values=df['Quantity']
: We're interested in theQuantity
column.aggfunc='mean':
We're calculating the mean quantity sold.
Output:
Example 5: Minimum and Maximum Price by Store and Product
Let's create a crosstab to find the minimum and maximum price of each product in each store.
# Create a crosstab for minimum price by store and product
crosstab_min_price = pd.crosstab(df['Store'], df['Product'], values=df['Price'], aggfunc='min')
print(crosstab_min_price)
# Create a crosstab for maximum price by store and product
crosstab_max_price = pd.crosstab(df['Store'], df['Product'], values=df['Price'], aggfunc='max')
print(crosstab_max_price)
Explanation:
aggfunc='min'
: Finds the minimum price.aggfunc='max'
: Finds the maximum price.
Output:
Example 6: Adding Row and Column Totals
Let's create a crosstab for total quantity sold by store and product, and include row and column totals.
# Create a crosstab with row and column totals
crosstab_with_totals = pd.crosstab(df['Store'], df['Product'], values=df['Quantity'], aggfunc='sum', margins=True)
print(crosstab_with_totals)
Explanation:
margins=True
: Adds row and column totals (subtotals).
Output:
Example 7: Normalising the Crosstab
Let's create a normalised crosstab to show the proportion of each product sold in each store.
# Create a normalised crosstab for the proportion of products sold by store
crosstab_normalised = pd.crosstab(df['Store'], df['Product'], normalize='index')
print(crosstab_normalised)
Explanation:
normalize='index'
: Normalises the crosstab by the row totals, showing the proportion of each product sold in each store.
Output:
Summary
The crosstab()
function in Pandas is a versatile tool for summarising and analysing data. In this post, we demonstrated seven real-life examples of how to use the crosstab()
function on a dataset of retail sales transactions. These examples included calculating frequencies, total quantities, total sales, average quantities, minimum and maximum prices, adding row and column totals, and normalising the crosstab.
By mastering the crosstab()
function, you can effectively analyse and gain insights from your data, making it an essential skill for any data analyst.
Ready for More Python Fun?
Subscribe to our newsletter now and get a free Python cheat sheet! Dive deeper into Python programming with more exciting projects and tutorials designed just for beginners.
Keep exploring, keep coding, and enjoy your journey into data analytics with Pandas.