How to get statistics for each group (such as count, mean, etc) using pandas GroupBy?
When analyzing data, you often need to calculate summary statistics (e.g., count, mean, standard deviation) for subsets (groups) of your DataFrame. Pandas provides a powerful groupby()
method to split your data into groups and apply various aggregation functions.
Below are several ways to use groupby()
to derive statistics for each group.
1. Basic Aggregations on a Single Column
If you want to compute a simple metric (like mean) of one column, grouped by another column:
import pandas as pd df = pd.DataFrame({ "Category": ["A", "A", "B", "B", "C", "C", "C"], "Value": [10, 15, 5, 8, 12, 20, 7] }) # Compute the mean of 'Value' for each 'Category' grouped_mean = df.groupby("Category")["Value"].mean() print(grouped_mean)
- This returns a Series with the group labels as the index and the computed mean for each group.
2. Multiple Aggregations on a Single Column
If you need several statistics at once (e.g., mean
, count
, std
, etc.) for the same column, pass a list of functions to .agg()
:
grouped_stats = df.groupby("Category")["Value"].agg(["count", "mean", "std"]) print(grouped_stats)
- The result is a DataFrame with separate columns for each aggregation function.
3. Multiple Aggregations on Multiple Columns
When you have multiple columns and each needs different aggregation functions, you can use a dictionary or list of lists approach:
df2 = pd.DataFrame({ "Category": ["A", "A", "B", "B", "C", "C", "C"], "Value1": [10, 15, 5, 8, 12, 20, 7], "Value2": [2, 4, 6, 1, 9, 3, 5] }) grouped_stats_multi = df2.groupby("Category").agg({ "Value1": ["count", "mean"], "Value2": ["mean", "std"] }) print(grouped_stats_multi)
- Key: The dictionary maps each column to a list of aggregate functions.
- The resulting DataFrame has a MultiIndex on the columns, indicating both the original column name and the aggregation function used.
4. Applying Custom Functions
You can also define and apply custom aggregation functions:
import numpy as np def range_of_values(x): return x.max() - x.min() custom_agg = df2.groupby("Category")["Value1"].agg(["mean", range_of_values]) print(custom_agg)
- Here, we define
range_of_values
to compute the difference between the max and min in each group. - You can include any custom function that takes a Series (a group of values) and returns a single number.
5. Descriptive Statistics at Once
If you want a full summary (like .describe()
) per group, you can do:
desc_stats = df2.groupby("Category")["Value1"].describe() print(desc_stats)
- This shows count, mean, std, min, quartiles, and max for
Value1
within each group.
Next Steps: Strengthen Your Python Fundamentals
A thorough grasp of Python basics—data structures, list comprehensions, functions, best practices—can greatly streamline your Pandas workflows. If you want to sharpen these essentials, consider Grokking Python Fundamentals by DesignGurus.io. This course helps you write cleaner, more efficient code in both Python and data analysis contexts.
With these groupby()
patterns in your toolkit, you’ll be able to rapidly compute crucial metrics for data subsets, whether you’re doing exploratory data analysis or building production-grade pipelines. Happy analyzing!