7 Real-Life Examples with Python's crosstab

Powerful Tool for Summarising Data and Computing Cross-tabulations

Sponsored
AiNexaVerse NewsWeekly AI Tools in your email!

The crosstab() function in Pandas is a powerful tool for summarising data and computing cross-tabulations. It allows you to calculate the frequency distribution of variables and perform other summary statistics. In this post, we'll explore seven real-life examples of how to use the crosstab() function, with detailed explanations for each code snippet.

 

What is a Crosstab?

A crosstab, or cross-tabulation, is a method to quantitatively analyze the relationship between multiple variables. It displays the frequency distribution of variables in a tabular format, making it easier to observe patterns and correlations in your data.

 

Understanding the crosstab() Function

 

The crosstab() function in Pandas has several parameters that you can use to customise your cross-tabulation. Here are some of the key parameters:

 

  • index: The column(s) to group by in the rows.

  • columns: The column(s) to group by in the columns.

  • values: The column to aggregate.

  • aggfunc: The aggregate function to apply to the grouped data (e.g., sum, mean, count).

  • margins: Add row/column margins (subtotals).

 

Dataset Overview

We'll use a sample dataset containing sales transactions from a retail store. The dataset includes the following columns:

 

  • TransactionID: The unique identifier for each transaction.

  • Date: The date of the transaction.

  • Store: The store where the transaction took place.

  • Product: The product sold.

  • Quantity: The quantity of the product sold.

  • Price: The price of the product.

 

Here is the sample data for sales_data.csv:

 

TransactionID,Date,Store,Product,Quantity,Price

1,2024-01-01,Store A,Laptop,1,1000

2,2024-01-01,Store A,Tablet,2,500

3,2024-01-02,Store B,Laptop,3,1000

4,2024-01-02,Store B,Tablet,4,500

5,2024-01-03,Store A,Smartphone,5,700

6,2024-01-03,Store A,Tablet,6,500

7,2024-01-04,Store B,Smartphone,7,700

8,2024-01-04,Store B,Tablet,8,500

9,2024-01-05,Store A,Laptop,9,1000

10,2024-01-05,Store A,Smartphone,10,700

 

Let's load this data into a Pandas DataFrame and explore how to use the crosstab() function.

 

import pandas as pd

 

# Load the data into a DataFrame

df = pd.read_csv('sales_data.csv')

print(df)

 

Example 1: Frequency of Products Sold by Store

 

Let's create a crosstab to calculate the frequency of each product sold in each store.

 

# Create a crosstab for frequency of products sold by store

crosstab_frequency = pd.crosstab(df['Store'], df['Product'])

print(crosstab_frequency)

 

Explanation:

  • pd.crosstab(df['Store'], df['Product']): Creates a crosstab with Store as the rows and Product as the columns, showing the frequency of each product sold in each store.

 

Output:

Example 2: Total Quantity Sold by Store and Product

 

Let's create a crosstab to calculate the total quantity of each product sold in each store.

 

# Create a crosstab for total quantity sold by store and product

crosstab_quantity = pd.crosstab(df['Store'], df['Product'], values=df['Quantity'], aggfunc='sum')

print(crosstab_quantity)

 

Explanation:

  • values=df['Quantity']: We're interested in the Quantity column.

  • aggfunc='sum': We're summing the quantities.

 

Output:

Example 3: Total Sales by Store and Product

 

Let's create a crosstab to calculate the total sales (revenue) for each product in each store.

 

# Add a new column for total sales

df['TotalSales'] = df['Quantity'] * df['Price']

 

# Create a crosstab for total sales by store and product

crosstab_sales = pd.crosstab(df['Store'], df['Product'], values=df['TotalSales'], aggfunc='sum')

print(crosstab_sales)

 

Explanation:

  • df['TotalSales'] = df['Quantity'] * df['Price']: Adds a new column TotalSales to the DataFrame by multiplying Quantity and Price.

  • values=df['TotalSales']: We're interested in the TotalSales column.

  • aggfunc='sum': We're summing the total sales.

 

Output:

 

Example 4: Average Quantity Sold by Store and Product

 

Let's create a crosstab to calculate the average quantity sold per transaction for each product in each store.

 

# Create a crosstab for average quantity sold by store and product

crosstab_avg_quantity = pd.crosstab(df['Store'], df['Product'], values=df['Quantity'], aggfunc='mean')

print(crosstab_avg_quantity)

 

Explanation:

  • values=df['Quantity']: We're interested in the Quantity column.

  • aggfunc='mean': We're calculating the mean quantity sold.

Output:

 

Example 5: Minimum and Maximum Price by Store and Product

 

Let's create a crosstab to find the minimum and maximum price of each product in each store.

 

# Create a crosstab for minimum price by store and product

crosstab_min_price = pd.crosstab(df['Store'], df['Product'], values=df['Price'], aggfunc='min')

print(crosstab_min_price)

 

# Create a crosstab for maximum price by store and product

crosstab_max_price = pd.crosstab(df['Store'], df['Product'], values=df['Price'], aggfunc='max')

print(crosstab_max_price)

 

Explanation:

  • aggfunc='min': Finds the minimum price.

  • aggfunc='max': Finds the maximum price.

Output:

Example 6: Adding Row and Column Totals

 

Let's create a crosstab for total quantity sold by store and product, and include row and column totals.

 

# Create a crosstab with row and column totals

crosstab_with_totals = pd.crosstab(df['Store'], df['Product'], values=df['Quantity'], aggfunc='sum', margins=True)

print(crosstab_with_totals)

 

Explanation:

  • margins=True: Adds row and column totals (subtotals).

 

Output:

 

Example 7: Normalising the Crosstab

 

Let's create a normalised crosstab to show the proportion of each product sold in each store.

 

# Create a normalised crosstab for the proportion of products sold by store

crosstab_normalised = pd.crosstab(df['Store'], df['Product'], normalize='index')

print(crosstab_normalised)

 

Explanation:

  • normalize='index': Normalises the crosstab by the row totals, showing the proportion of each product sold in each store.

Output:

Summary

The crosstab() function in Pandas is a versatile tool for summarising and analysing data. In this post, we demonstrated seven real-life examples of how to use the crosstab() function on a dataset of retail sales transactions. These examples included calculating frequencies, total quantities, total sales, average quantities, minimum and maximum prices, adding row and column totals, and normalising the crosstab.

 

By mastering the crosstab() function, you can effectively analyse and gain insights from your data, making it an essential skill for any data analyst.

 

Ready for More Python Fun?

Subscribe to our newsletter now and get a free Python cheat sheet! Dive deeper into Python programming with more exciting projects and tutorials designed just for beginners.

 

Keep exploring, keep coding, and enjoy your journey into data analytics with Pandas.