The groupby method in pandas is one of the most powerful tools for aggregating and summarizing data. It splits the data into groups based on one or more keys, performs a computation on each group, and combines the results into a new Series or DataFrame.
Key Steps of groupby
- Split: The data is divided into groups based on a key or multiple keys (column values).
- Apply: A function is applied to each group independently.
- Combine: The results are combined into a single data structure.
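The three steps can be written out by hand and compared with a single groupby call. This is a minimal sketch on made-up data; the column names color and price are assumptions chosen for illustration, not taken from any real dataset.

```python
import pandas as pd

# Hypothetical toy data; column names are illustrative only.
df = pd.DataFrame({
    "color": ["red", "blue", "red", "blue", "red"],
    "price": [10, 20, 30, 40, 80],
})

# Split: one sub-DataFrame per distinct value of "color".
groups = {key: sub for key, sub in df.groupby("color")}

# Apply: compute the mean price of each group independently.
means = {key: sub["price"].mean() for key, sub in groups.items()}

# Combine: assemble the per-group results into one Series.
manual = pd.Series(means, name="price").rename_axis("color")

# The same three steps in a single groupby call.
direct = df.groupby("color")["price"].mean()
```

Both manual and direct hold one mean price per color. In practice the one-line form is preferable, since pandas handles the splitting and combining internally.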
Syntax
df.groupby(by, axis=0, level=None, as_index=True)
- by: The column(s) or index level(s) to group by.
- axis: Defaults to 0 (rows). Setting axis=1 groups by columns; note that this keyword is deprecated since pandas 2.1.
- level: For a MultiIndex, the level (name or position) to group by.
- as_index: If True (the default), the group labels become the index of the result. Set to False to return the group labels as ordinary columns instead.
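The effect of as_index and level is easiest to see on a small frame. The sketch below uses made-up data (the columns color, model, and price are assumptions for illustration) and sums price per group under each setting.

```python
import pandas as pd

# Hypothetical toy data; column names are illustrative only.
df = pd.DataFrame({
    "color": ["red", "blue", "red", "blue"],
    "model": ["A", "A", "B", "A"],
    "price": [10, 20, 30, 40],
})

# as_index=True (default): the group keys form a MultiIndex on the result.
indexed = df.groupby(["color", "model"])["price"].sum()

# as_index=False: the group keys stay as ordinary columns.
flat = df.groupby(["color", "model"], as_index=False)["price"].sum()

# level: group by one level of an existing MultiIndex.
by_level = indexed.groupby(level="color").sum()
```

Here indexed is a Series indexed by (color, model), flat is a DataFrame with color, model, and price columns, and by_level collapses the model level to give one total per color.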
Example
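import pandas as pd
# Load the used-car dataset (the path is relative to the working directory)
df = pd.read_csv('../DataSets/usedcars.csv')
# Summary statistics for each (color, model) group
df.groupby(by=['color', 'model']).describe()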
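For readers without the CSV at hand, the same call can be tried on an in-memory frame. This is only a sketch: the values below are invented, and the real usedcars.csv may have different columns.

```python
import pandas as pd

# Hypothetical stand-in for the CSV; the real file's columns may differ.
cars = pd.DataFrame({
    "color": ["Black", "Black", "Silver", "Silver"],
    "model": ["SE", "SE", "SE", "SES"],
    "price": [10000, 12000, 9000, 15000],
})

# One row per (color, model) group, with count, mean, std, min,
# quartiles, and max for each numeric column.
summary = cars.groupby(by=["color", "model"]).describe()

# The result's columns are a MultiIndex of (original column, statistic).
mean_price = summary[("price", "mean")]

# agg is an alternative when only a few statistics are needed.
compact = cars.groupby(["color", "model"])["price"].agg(["count", "mean", "max"])
```

describe is convenient for a first look at the data, while agg keeps the output narrow when only specific statistics matter.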