12 More Essential Data Analytics Functions for Beginners (Part 2)
Getting Started with Pandas
Welcome back to our Pandas series! In the first part, we covered 12 essential Pandas functions. Now, we're going to dive deeper and explore 12 more functions that will help you manipulate and analyse data more effectively. These functions are crucial for anyone looking to become proficient in data analysis with Pandas. Let's get started!
1. merge(): Combining DataFrames
The merge() function combines two DataFrames based on one or more common columns.
Example:
import pandas as pd
# Create two DataFrames
df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
df2 = pd.DataFrame({'ID': [1, 2, 4], 'Score': [85, 90, 88]})
# Merge DataFrames on 'ID'
merged_df = pd.merge(df1, df2, on='ID', how='inner')
print(merged_df)
Explanation:
pd.merge(df1, df2, on='ID', how='inner'): Merges df1 and df2 on the 'ID' column using an inner join.
Output:
   ID   Name  Score
0   1  Alice     85
1   2    Bob     90
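Only IDs 1 and 2 appear in both DataFrames, so Charlie (ID 3) and the score 88 (ID 4) are dropped by the inner join. If you would rather keep every row, you can experiment with the how parameter; here's a quick sketch reusing df1 and df2 from above (outer_df is just a name I've picked for the result):
# Keep all IDs from either DataFrame; unmatched cells are filled with NaN
outer_df = pd.merge(df1, df2, on='ID', how='outer')
print(outer_df)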
2. concat(): Concatenating DataFrames
The concat() function concatenates multiple DataFrames along a particular axis.
Example:
# Create two DataFrames
df1 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Score': [85, 90]})
df2 = pd.DataFrame({'Name': ['Charlie', 'David'], 'Score': [88, 92]})
# Concatenate DataFrames
concat_df = pd.concat([df1, df2])
print(concat_df)
Explanation:
pd.concat([df1, df2]): Concatenates df1 and df2 along the default axis (rows).
Output:
      Name  Score
0    Alice     85
1      Bob     90
0  Charlie     88
1    David     92
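Notice that the original row labels 0 and 1 repeat. Two variations worth experimenting with, again reusing df1 and df2 from above (stacked and side_by_side are just names for the results): ignore_index=True renumbers the rows, and axis=1 places the DataFrames side by side instead of stacking them.
# Renumber the rows 0, 1, 2, 3 instead of repeating the original labels
stacked = pd.concat([df1, df2], ignore_index=True)
print(stacked)
# Concatenate column-wise (side by side) instead of row-wise
side_by_side = pd.concat([df1, df2], axis=1)
print(side_by_side)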
3. pivot_table(): Creating Pivot Tables
The pivot_table() function creates a pivot table from a DataFrame. It is a powerful tool for summarizing and aggregating data.
Pivot tables are commonly used in business analytics, finance, and sales to create reports and summaries that help in decision-making. By using the pivot_table() function, you can transform raw data into meaningful summaries and uncover valuable insights.
Example:
import pandas as pd
# Create a sample DataFrame
data = {
'TransactionID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
'Date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02', '2023-01-03', '2023-01-03', '2023-01-04', '2023-01-04', '2023-01-05', '2023-01-05'],
'Store': ['Store A', 'Store A', 'Store B', 'Store B', 'Store A', 'Store A', 'Store B', 'Store B', 'Store A', 'Store A'],
'Product': ['Laptop', 'Tablet', 'Laptop', 'Tablet', 'Smartphone', 'Tablet', 'Smartphone', 'Tablet', 'Laptop', 'Smartphone'],
'Category': ['Electronics', 'Electronics', 'Electronics', 'Electronics', 'Electronics', 'Electronics', 'Electronics', 'Electronics', 'Electronics', 'Electronics'],
'Quantity': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
'Price': [1000, 500, 1000, 500, 700, 500, 700, 500, 1000, 700]
}
df = pd.DataFrame(data)
print(df)
# Create a pivot table for total quantity sold by store and product
pivot_quantity = pd.pivot_table(df, values='Quantity', index='Store', columns='Product', aggfunc='sum')
print(pivot_quantity)
Explanation:
pd.pivot_table(df, values='Quantity', index='Store', columns='Product', aggfunc='sum'): Creates a pivot table with Store as the index, Product as the columns, and the sum of Quantity as the values.
Interpreting the Results:
Store A:
Sold 10 units of Laptop.
Sold 15 units of Smartphone.
Sold 8 units of Tablet.
Store B:
Sold 3 units of Laptop.
Sold 7 units of Smartphone.
Sold 12 units of Tablet.
Output:
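pivot_table() can do more than a single sum. As a small sketch using the same df as above (pivot_price is just a name I've chosen), you can switch the aggregation function, fill in missing combinations, and add grand totals:
# Average price by store and product; fill_value replaces empty combinations,
# and margins=True adds an 'All' row and column with the overall aggregates
pivot_price = pd.pivot_table(df, values='Price', index='Store', columns='Product',
                             aggfunc='mean', fill_value=0, margins=True)
print(pivot_price)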
4. apply(): Applying Functions to Data
The apply() function applies a function along an axis of the DataFrame.
Example:
# Create a DataFrame
df = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]})
# Apply a function to each column
df = df.apply(lambda x: x * 2)
print(df)
Explanation:
df.apply(lambda x: x * 2): Applies a lambda function that multiplies each element by 2.
Output:
   A   B
0  2  20
1  4  40
2  6  60
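By default apply() passes each column to the function. With axis=1 it works row by row instead; here's a quick sketch continuing from the doubled df above (row_sum is a made-up column name for illustration):
# axis=1 applies the function to each row, so we can combine values from several columns
df['row_sum'] = df.apply(lambda row: row['A'] + row['B'], axis=1)
print(df)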
5. map(): Applying Functions Element-wise
The map() method of a DataFrame applies a function element-wise across the entire DataFrame. (DataFrame.map() was added in pandas 2.1; on older versions the equivalent method is applymap().)
Example:
# Create a DataFrame
df = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]})
# Apply a function to each element
df = df.map(lambda x: x * 2)
print(df)
Explanation:
df.map(lambda x: x * 2): Applies a lambda function that multiplies each element by 2 across the entire DataFrame.
Output:
   A   B
0  2  20
1  4  40
2  6  60
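If your pandas version is older than 2.1 and df.map() raises an AttributeError, applymap() gives the same result; a minimal sketch:
# applymap() is the element-wise method on pandas releases before 2.1
df = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]})
print(df.applymap(lambda x: x * 2))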
6. map(): Mapping Values in a Series
The map() function maps values in a Series using a function or a dictionary.
Example:
# Create a Series
s = pd.Series(['cat', 'dog', 'bird'])
# Map values to new values
s = s.map({'cat': 'kitten', 'dog': 'puppy', 'bird': 'chick'})
print(s)
Explanation:
s.map({'cat': 'kitten', 'dog': 'puppy', 'bird': 'chick'}): Maps the values in the Series to new values based on a dictionary. The map() function is useful when you want to replace the values in a Series with new values based on a given dictionary or function. The dictionary {'cat': 'kitten', 'dog': 'puppy', 'bird': 'chick'} specifies that 'cat' should be replaced with 'kitten', 'dog' with 'puppy', and 'bird' with 'chick'. The map() function returns a new Series with the transformed values.
Output:
0    kitten
1     puppy
2     chick
dtype: object
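Two more things worth knowing about Series.map(): values that aren't in the dictionary become NaN, and you can pass a plain function instead of a dictionary. A quick sketch (the extra 'fish' value is added here just for illustration):
s = pd.Series(['cat', 'dog', 'bird', 'fish'])
# 'fish' has no entry in the dictionary, so it becomes NaN
print(s.map({'cat': 'kitten', 'dog': 'puppy', 'bird': 'chick'}))
# map() with a function: upper-case every value
print(s.map(str.upper))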
7. groupby(): Grouping Data
The groupby() function groups the DataFrame by one or more columns and performs an aggregate function on each group.
The groupby() function in Pandas is a powerful tool for summarizing and analyzing data by grouping it based on one or more columns and performing aggregate operations on each group. By using the groupby() function, you can easily generate insightful summaries that can aid in decision-making and strategy development.
Example:
import pandas as pd
# Create a sample DataFrame
data = {
'TransactionID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
'Date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02', '2023-01-03', '2023-01-03', '2023-01-04', '2023-01-04', '2023-01-05', '2023-01-05'],
'Store': ['Store A', 'Store A', 'Store B', 'Store B', 'Store A', 'Store A', 'Store B', 'Store B', 'Store A', 'Store A'],
'Product': ['Product 1', 'Product 2', 'Product 1', 'Product 2', 'Product 1', 'Product 2', 'Product 1', 'Product 2', 'Product 1', 'Product 2'],
'Quantity': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
'Price': [10, 20, 10, 20, 10, 20, 10, 20, 10, 20]
}
df = pd.DataFrame(data)
print(df)
# Group by 'Store' and calculate the total quantity sold
total_quantity_by_store = df.groupby('Store')['Quantity'].sum()
print(total_quantity_by_store)
Explanation:
df.groupby('Store')['Quantity'].sum(): Groups the DataFrame by the Store column and calculates the sum of the Quantity column for each group.
Interpreting the Results:
Store A: Sold a total of 33 products.
Store B: Sold a total of 22 products.
Output:
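groupby() isn't limited to a single sum. With agg() you can compute several summaries at once; here's a sketch using the same df as above (the column names on the left of each pair, such as total_quantity, are names I've chosen):
# Named aggregation: new_column_name=('source_column', 'function')
summary = df.groupby('Store').agg(
    total_quantity=('Quantity', 'sum'),
    average_price=('Price', 'mean'),
    transactions=('TransactionID', 'count')
)
print(summary)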
8. reset_index(): Resetting the Index
The reset_index() function resets the index of the DataFrame to the default integer index, turning the old index into a regular column.
Example:
# Create a DataFrame with an index
df = pd.DataFrame({'A': [1, 2, 3]}, index=['a', 'b', 'c'])
# Reset the index
df_reset = df.reset_index()
print(df_reset)
Explanation:
df.reset_index(): Resets the index of the DataFrame. The old labels 'a', 'b', 'c' become a new column called index, and the rows get the default integer index 0, 1, 2.
Output:
  index  A
0     a  1
1     b  2
2     c  3
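If you don't want to keep the old index as a column, drop=True discards it; a quick sketch with the same df (df_dropped is just a name for the result):
# drop=True throws the old index away instead of keeping it as a column
df_dropped = df.reset_index(drop=True)
print(df_dropped)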
9. set_index(): Setting a Column as Index
The set_index() function sets a column as the index of the DataFrame.
Example:
# Create a DataFrame
df = pd.DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'c']})
# Set 'B' as the index
df.set_index('B', inplace=True)
print(df)
Explanation:
df.set_index('B', inplace=True): Sets the 'B' column as the index of the DataFrame. With inplace=True the DataFrame is modified directly instead of a new one being returned.
Output:
   A
B
a  1
b  2
c  3
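set_index() also accepts a list of columns, which creates a hierarchical (MultiIndex) index, and reset_index() undoes it. A small sketch with a made-up sales DataFrame:
# Two columns as a multi-level index
sales = pd.DataFrame({'Store': ['A', 'A', 'B'],
                      'Product': ['Laptop', 'Tablet', 'Laptop'],
                      'Quantity': [1, 2, 3]})
sales = sales.set_index(['Store', 'Product'])
print(sales)
# reset_index() turns the index levels back into columns
print(sales.reset_index())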
10. dropna(): Removing Missing Values
The dropna() function removes missing values from the DataFrame.
Example:
# Create a DataFrame with missing values
df = pd.DataFrame({'A': [1, 2, None], 'B': [4, None, 6]})
# Drop rows with missing values
df_clean = df.dropna()
print(df_clean)
Explanation:
df.dropna(): Drops rows that contain missing values from the DataFrame. Only the first row has no missing values, so it is the only one kept; the values appear as floats because NaN is a floating-point value.
Output:
     A    B
0  1.0  4.0
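dropna() has a few options worth knowing about. A quick sketch with the same df (each call returns a new DataFrame and leaves df untouched):
# how='all' only drops rows where every value is missing
print(df.dropna(how='all'))
# subset=['A'] only considers column A when deciding which rows to drop
print(df.dropna(subset=['A']))
# axis=1 drops columns, rather than rows, that contain missing values
print(df.dropna(axis=1))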
11. crosstab(): Computing Cross-Tabulations
The crosstab() function computes a simple cross-tabulation of two or more factors. It is a powerful tool for summarizing and analyzing categorical data.
Cross-tabulations are commonly used in fields such as marketing, sales analysis, and the social sciences to analyze categorical data and uncover relationships between different factors. By using the crosstab() function, you can easily generate insightful summaries that can aid in decision-making and strategy development.
Example:
import pandas as pd
# Create a sample DataFrame
data = {
'CustomerID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
'Product': ['Laptop', 'Laptop', 'Tablet', 'Smartphone', 'Smartphone', 'Tablet', 'Laptop', 'Smartphone', 'Tablet', 'Laptop'],
'Category': ['Electronics', 'Electronics', 'Electronics', 'Electronics', 'Electronics', 'Electronics', 'Electronics', 'Electronics', 'Electronics', 'Electronics'],
'Quantity': [1, 2, 1, 1, 2, 1, 1, 3, 2, 2]
}
df = pd.DataFrame(data)
print(df)
# Compute cross-tabulation of Product and Quantity
crosstab_result = pd.crosstab(df['Product'], df['Quantity'])
print(crosstab_result)
Explanation:
pd.crosstab(df['Product'], df['Quantity']): This computes a cross-tabulation of the Product and Quantity columns. It counts the occurrences of each quantity for each product, which is how many times each product was purchased in specific quantities.
Interpreting the Results
Laptop:
Quantity 1: Purchased 2 times
Quantity 2: Purchased 2 times
Quantity 3: Purchased 0 times
Smartphone:
Quantity 1: Purchased 1 time
Quantity 2: Purchased 1 time
Quantity 3: Purchased 1 time
Tablet:
Quantity 1: Purchased 2 times
Quantity 2: Purchased 1 time
Quantity 3: Purchased 0 times
Output:
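crosstab() can also show proportions and totals instead of raw counts. A small sketch using the same df as above:
# normalize='index' converts each row of counts into proportions that sum to 1
print(pd.crosstab(df['Product'], df['Quantity'], normalize='index'))
# margins=True adds an 'All' row and column containing the totals
print(pd.crosstab(df['Product'], df['Quantity'], margins=True))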
12. value_counts(): Counting Unique Values
The value_counts() function counts how many times each unique value appears in a Series.
Example:
# Create a Series
s = pd.Series(['Python', 'Java', 'a', 'c', 'b', 'a'])
# Count unique values
value_counts = s.value_counts()
print(value_counts)
Explanation:
s.value_counts(): Counts how many times each unique value occurs in the Series, sorted from most to least frequent.
Output:
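A couple of handy variations on value_counts(), using the same s as above:
# normalize=True returns relative frequencies (proportions) instead of raw counts
print(s.value_counts(normalize=True))
# ascending=True lists the rarest values first
print(s.value_counts(ascending=True))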
Conclusion
Congratulations! You’ve now learned 12 more essential Pandas functions that will help you manipulate and analyse data more effectively. By understanding these functions and practicing with real-life examples, you’ll become proficient in using Pandas for data analysis.
Recommendation and Inspiration
Ready for More Python Fun?
Subscribe to our newsletter now and get a free Python cheat sheet! Dive deeper into Python programming with more exciting projects and tutorials designed just for beginners.
Keep exploring, keep coding, and enjoy your journey into data analytics with Pandas!