12 More Essential Data Analytics Functions for Beginners (Part 2)
Getting Started with Pandas
Welcome back to our Pandas series! In the first part, we covered 12 essential Pandas functions. Now, we're going to dive deeper and explore 12 more functions that will help you manipulate and analyse data more effectively. These functions are crucial for anyone looking to become proficient in data analysis with Pandas. Let's get started!
1. merge(): Combining DataFrames
The merge() function combines two DataFrames based on one or more common columns.
Example:
import pandas as pd
# Create two DataFrames
df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
df2 = pd.DataFrame({'ID': [1, 2, 4], 'Score': [85, 90, 88]})
# Merge DataFrames on 'ID'
merged_df = pd.merge(df1, df2, on='ID', how='inner')
print(merged_df)
Explanation:
pd.merge(df1, df2, on='ID', how='inner'): Merges df1 and df2 on the 'ID' column using an inner join.
Output:
   ID   Name  Score
0   1  Alice     85
1   2    Bob     90
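Only IDs 1 and 2 appear in both DataFrames, so Charlie (ID 3) and the score 88 (ID 4) are dropped by the inner join. If you would rather keep every row, you can experiment with the how parameter; here's a quick sketch reusing df1 and df2 from above (outer_df is just a name I've picked for the result):
# Keep all IDs from either DataFrame; unmatched cells are filled with NaN
outer_df = pd.merge(df1, df2, on='ID', how='outer')
print(outer_df)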
2. concat(): Concatenating DataFrames
The concat() function concatenates multiple DataFrames along a particular axis.
Example:
# Create two DataFrames
df1 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Score': [85, 90]})
df2 = pd.DataFrame({'Name': ['Charlie', 'David'], 'Score': [88, 92]})
# Concatenate DataFrames
concat_df = pd.concat([df1, df2])
print(concat_df)
Explanation:
pd.concat([df1, df2]): Concatenates df1 and df2 along the default axis (rows).
Output:
      Name  Score
0    Alice     85
1      Bob     90
0  Charlie     88
1    David     92
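Notice that the original row labels 0 and 1 repeat. Two variations worth experimenting with, again reusing df1 and df2 from above (stacked and side_by_side are just names for the results): ignore_index=True renumbers the rows, and axis=1 places the DataFrames side by side instead of stacking them.
# Renumber the rows 0, 1, 2, 3 instead of repeating the original labels
stacked = pd.concat([df1, df2], ignore_index=True)
print(stacked)
# Concatenate column-wise (side by side) instead of row-wise
side_by_side = pd.concat([df1, df2], axis=1)
print(side_by_side)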
3. pivot_table(): Creating Pivot Tables
The pivot_table() function creates a pivot table from a DataFrame. It is a powerful tool for summarizing and aggregating data.
Pivot tables are commonly used in business analytics, finance, and sales to create reports and summaries that help in decision-making. By using the pivot_table() function, you can transform raw data into meaningful summaries and uncover valuable insights.
Example:
import pandas as pd
# Create a sample DataFrame
data = {
'TransactionID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
'Date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02', '2023-01-03', '2023-01-03', '2023-01-04', '2023-01-04', '2023-01-05', '2023-01-05'],
'Store': ['Store A', 'Store A', 'Store B', 'Store B', 'Store A', 'Store A', 'Store B', 'Store B', 'Store A', 'Store A'],
'Product': ['Laptop', 'Tablet', 'Laptop', 'Tablet', 'Smartphone', 'Tablet', 'Smartphone', 'Tablet', 'Laptop', 'Smartphone'],
'Category': ['Electronics', 'Electronics', 'Electronics', 'Electronics', 'Electronics', 'Electronics', 'Electronics', 'Electronics', 'Electronics', 'Electronics'],
'Quantity': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
'Price': [1000, 500, 1000, 500, 700, 500, 700, 500, 1000, 700]
}
df = pd.DataFrame(data)
print(df)
# Create a pivot table for total quantity sold by store and product
pivot_quantity = pd.pivot_table(df, values='Quantity', index='Store', columns='Product', aggfunc='sum')
print(pivot_quantity)
Explanation:
pd.pivot_table(df, values='Quantity', index='Store', columns='Product', aggfunc='sum'): Creates a pivot table with Store as the index, Product as the columns, and the sum of Quantity as the values.
Interpreting the Results:
Store A:
Sold 10 units of Laptop.
Sold 15 units of Smartphone.
Sold 8 units of Tablet.
Store B:
Sold 3 units of Laptop.
Sold 7 units of Smartphone.
Sold 12 units of Tablet.
Output:
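pivot_table() can do more than a single sum. As a small sketch using the same df as above (pivot_price is just a name I've chosen), you can switch the aggregation function, fill in missing combinations, and add grand totals:
# Average price by store and product; fill_value replaces empty combinations,
# and margins=True adds an 'All' row and column with the overall aggregates
pivot_price = pd.pivot_table(df, values='Price', index='Store', columns='Product',
                             aggfunc='mean', fill_value=0, margins=True)
print(pivot_price)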
4. apply(): Applying Functions to Data
The apply() function applies a function along an axis of the DataFrame.
Example:
# Create a DataFrame
df = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]})
# Apply a function to each column
df = df.apply(lambda x: x * 2)
print(df)
Explanation:
df.apply(lambda x: x * 2): Applies a lambda function that multiplies each element by 2.
Output:
   A   B
0  2  20
1  4  40
2  6  60
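By default apply() passes each column to the function. With axis=1 it works row by row instead; here's a quick sketch continuing from the doubled df above (row_sum is a made-up column name for illustration):
# axis=1 applies the function to each row, so we can combine values from several columns
df['row_sum'] = df.apply(lambda row: row['A'] + row['B'], axis=1)
print(df)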
5. map(): Applying Functions Element-wise
The map() method of a DataFrame applies a function element-wise across the entire DataFrame. (DataFrame.map() was added in pandas 2.1; on older versions the equivalent method is applymap().)
Example:
# Create a DataFrame
df = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]})
# Apply a function to each element
df = df.map(lambda x: x * 2)
print(df)
Explanation:
df.map(lambda x: x * 2): Applies a lambda function that multiplies each element by 2 across the entire DataFrame.
Output:
   A   B
0  2  20
1  4  40
2  6  60
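If your pandas version is older than 2.1 and df.map() raises an AttributeError, applymap() gives the same result; a minimal sketch:
# applymap() is the element-wise method on pandas releases before 2.1
df = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]})
print(df.applymap(lambda x: x * 2))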
6. map(): Mapping Values in a Series
The map() function maps values in a Series using a function or a dictionary.
Example:
# Create a Series
s = pd.Series(['cat', 'dog', 'bird'])
# Map values to new values
s = s.map({'cat': 'kitten', 'dog': 'puppy', 'bird': 'chick'})
print(s)
Explanation:
s.map({'cat': 'kitten', 'dog': 'puppy', 'bird': 'chick'}): Maps the values in the Series to new values based on a dictionary. The map() function is useful when you want to replace the values in a Series with new values based on a given dictionary or function. The dictionary {'cat': 'kitten', 'dog': 'puppy', 'bird': 'chick'} specifies that 'cat' should be replaced with 'kitten', 'dog' with 'puppy', and 'bird' with 'chick'. The map() function returns a new Series with the transformed values.
Output:
0    kitten
1     puppy
2     chick
dtype: object
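Two more things worth knowing about Series.map(): values that aren't in the dictionary become NaN, and you can pass a plain function instead of a dictionary. A quick sketch (the extra 'fish' value is added here just for illustration):
s = pd.Series(['cat', 'dog', 'bird', 'fish'])
# 'fish' has no entry in the dictionary, so it becomes NaN
print(s.map({'cat': 'kitten', 'dog': 'puppy', 'bird': 'chick'}))
# map() with a function: upper-case every value
print(s.map(str.upper))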
7. groupby(): Grouping Data
The groupby() function groups the DataFrame by one or more columns and performs an aggregate function on each group.
The groupby() function in Pandas is a powerful tool for summarizing and analyzing data by grouping it based on one or more columns and performing aggregate operations on each group. By using the groupby() function, you can easily generate insightful summaries that can aid in decision-making and strategy development.
Example:
import pandas as pd
# Create a sample DataFrame
data = {
'TransactionID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
'Date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02', '2023-01-03', '2023-01-03', '2023-01-04', '2023-01-04', '2023-01-05', '2023-01-05'],
'Store': ['Store A', 'Store A', 'Store B', 'Store B', 'Store A', 'Store A', 'Store B', 'Store B', 'Store A', 'Store A'],
'Product': ['Product 1', 'Product 2', 'Product 1', 'Product 2', 'Product 1', 'Product 2', 'Product 1', 'Product 2', 'Product 1', 'Product 2'],
'Quantity': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
'Price': [10, 20, 10, 20, 10, 20, 10, 20, 10, 20]
}
df = pd.DataFrame(data)
print(df)
# Group by 'Store' and calculate the total quantity sold
total_quantity_by_store = df.groupby('Store')['Quantity'].sum()
print(total_quantity_by_store)
Explanation:
df.groupby('Store')['Quantity'].sum(): Groups the DataFrame by the Store column and calculates the sum of the Quantity column for each group.
Interpreting the Results:
Store A: Sold a total of 33 products.
Store B: Sold a total of 22 products.
Output:
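groupby() isn't limited to a single sum. With agg() you can compute several summaries at once; here's a sketch using the same df as above (the column names on the left of each pair, such as total_quantity, are names I've chosen):
# Named aggregation: new_column_name=('source_column', 'function')
summary = df.groupby('Store').agg(
    total_quantity=('Quantity', 'sum'),
    average_price=('Price', 'mean'),
    transactions=('TransactionID', 'count')
)
print(summary)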
8. reset_index(): Resetting the Index
The reset_index() function resets the index of the DataFrame to the default integer index, turning the old index into a regular column.
Example:
# Create a DataFrame with an index
df = pd.DataFrame({'A': [1, 2, 3]}, index=['a', 'b', 'c'])
# Reset the index
df_reset = df.reset_index()
print(df_reset)
Explanation:
df.reset_index(): Resets the index of the DataFrame. The old labels 'a', 'b', 'c' become a new column called index, and the rows get the default integer index 0, 1, 2.
Output:
  index  A
0     a  1
1     b  2
2     c  3
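If you don't want to keep the old index as a column, drop=True discards it; a quick sketch with the same df (df_dropped is just a name for the result):
# drop=True throws the old index away instead of keeping it as a column
df_dropped = df.reset_index(drop=True)
print(df_dropped)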
9. set_index(): Setting a Column as Index
The set_index() function sets a column as the index of the DataFrame.
Example:
# Create a DataFrame
df = pd.DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'c']})
# Set 'B' as the index
df.set_index('B', inplace=True)
print(df)
Explanation:
df.set_index('B', inplace=True): Sets the 'B' column as the index of the DataFrame. With inplace=True the DataFrame is modified directly instead of a new one being returned.
Output:
   A
B
a  1
b  2
c  3
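set_index() also accepts a list of columns, which creates a hierarchical (MultiIndex) index, and reset_index() undoes it. A small sketch with a made-up sales DataFrame:
# Two columns as a multi-level index
sales = pd.DataFrame({'Store': ['A', 'A', 'B'],
                      'Product': ['Laptop', 'Tablet', 'Laptop'],
                      'Quantity': [1, 2, 3]})
sales = sales.set_index(['Store', 'Product'])
print(sales)
# reset_index() turns the index levels back into columns
print(sales.reset_index())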
10. dropna(): Removing Missing Values
The dropna() function removes missing values from the DataFrame.
Example:
# Create a DataFrame with missing values
df = pd.DataFrame({'A': [1, 2, None], 'B': [4, None, 6]})
# Drop rows with missing values
df_clean = df.dropna()
print(df_clean)
Explanation:
df.dropna(): Drops rows that contain missing values from the DataFrame. Only the first row has no missing values, so it is the only one kept; the values appear as floats because NaN is a floating-point value.
Output:
     A    B
0  1.0  4.0
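dropna() has a few options worth knowing about. A quick sketch with the same df (each call returns a new DataFrame and leaves df untouched):
# how='all' only drops rows where every value is missing
print(df.dropna(how='all'))
# subset=['A'] only considers column A when deciding which rows to drop
print(df.dropna(subset=['A']))
# axis=1 drops columns, rather than rows, that contain missing values
print(df.dropna(axis=1))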
11. crosstab(): Computing Cross-Tabulations
The crosstab() function computes a simple cross-tabulation of two or more factors. It is a powerful tool for summarizing and analyzing categorical data.
Cross-tabulations are commonly used in fields such as marketing, sales analysis, and the social sciences to analyze categorical data and uncover relationships between different factors. By using the crosstab() function, you can easily generate insightful summaries that can aid in decision-making and strategy development.
Example:
import pandas as pd
# Create a sample DataFrame
data = {
'CustomerID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
'Product': ['Laptop', 'Laptop', 'Tablet', 'Smartphone', 'Smartphone', 'Tablet', 'Laptop', 'Smartphone', 'Tablet', 'Laptop'],
'Category': ['Electronics', 'Electronics', 'Electronics', 'Electronics', 'Electronics', 'Electronics', 'Electronics', 'Electronics', 'Electronics', 'Electronics'],
'Quantity': [1, 2, 1, 1, 2, 1, 1, 3, 2, 2]
}
df = pd.DataFrame(data)
print(df)
# Compute cross-tabulation of Product and Quantity
crosstab_result = pd.crosstab(df['Product'], df['Quantity'])
print(crosstab_result)
Explanation:
pd.crosstab(df['Product'], df['Quantity']): This computes a cross-tabulation of the Product and Quantity columns. It counts the occurrences of each quantity for each product, which is how many times each product was purchased in specific quantities.
Interpreting the Results
Laptop:
Quantity 1: Purchased 2 times
Quantity 2: Purchased 2 times
Quantity 3: Purchased 0 times
Smartphone:
Quantity 1: Purchased 1 time
Quantity 2: Purchased 1 time
Quantity 3: Purchased 1 time
Tablet:
Quantity 1: Purchased 2 times
Quantity 2: Purchased 1 time
Quantity 3: Purchased 0 times
Output:
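crosstab() can also show proportions and totals instead of raw counts. A small sketch using the same df as above:
# normalize='index' converts each row of counts into proportions that sum to 1
print(pd.crosstab(df['Product'], df['Quantity'], normalize='index'))
# margins=True adds an 'All' row and column containing the totals
print(pd.crosstab(df['Product'], df['Quantity'], margins=True))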
12. value_counts(): Counting Unique Values
The value_counts() function counts how many times each unique value appears in a Series.
Example:
# Create a Series
s = pd.Series(['Python', 'Java', 'a', 'c', 'b', 'a'])
# Count unique values
value_counts = s.value_counts()
print(value_counts)
Explanation:
s.value_counts(): Counts how many times each unique value occurs in the Series, sorted from most to least frequent.
Output:
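A couple of handy variations on value_counts(), using the same s as above:
# normalize=True returns relative frequencies (proportions) instead of raw counts
print(s.value_counts(normalize=True))
# ascending=True lists the rarest values first
print(s.value_counts(ascending=True))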
Conclusion
Congratulations! You’ve now learned 12 more essential Pandas functions that will help you manipulate and analyse data more effectively. By understanding these functions and practicing with real-life examples, you’ll become proficient in using Pandas for data analysis.
Recommendation and Inspiration
Ready for More Python Fun?
Subscribe to our newsletter now and get a free Python cheat sheet! Dive deeper into Python programming with more exciting projects and tutorials designed just for beginners.
Keep exploring, keep coding, and enjoy your journey into data analytics with Pandas!