How to apply a function to two columns of a Pandas DataFrame?
When you want to create or modify a column based on two existing columns, you can use DataFrame.apply() with axis=1. This allows you to operate on each row as a Series, giving you easy access to multiple column values simultaneously.
Below are a few approaches:
1. Using df.apply() with axis=1
import pandas as pd

df = pd.DataFrame({
    "col1": [1, 2, 3],
    "col2": [10, 20, 30]
})

def add_two_cols(row):
    return row["col1"] + row["col2"]

df["col3"] = df.apply(add_two_cols, axis=1)
print(df)
- axis=1: This tells apply() to process the DataFrame row by row rather than column by column.
- add_two_cols: Accepts a row (a Pandas Series) and returns the desired calculation based on col1 and col2.
In practice, you could also use a lambda function:
df["col3"] = df.apply(lambda row: row["col1"] + row["col2"], axis=1)
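If the function you want to reuse takes two plain values instead of a row, the same pattern still works: unpack the two columns inside the lambda. The sketch below assumes a hypothetical helper, scaled_sum, and a hypothetical weight parameter; neither comes from the example above.

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 2, 3], "col2": [10, 20, 30]})

# Hypothetical helper (illustration only): takes two scalar values, not a row.
def scaled_sum(a, b, weight=1):
    return a + weight * b

# Pull both column values out of each row and pass them as separate arguments.
df["col3"] = df.apply(
    lambda row: scaled_sum(row["col1"], row["col2"], weight=2),
    axis=1,
)
print(df)
```

This keeps scaled_sum independent of Pandas, so it can be tested and reused on ordinary Python values.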
2. Vectorized Operations (When Possible)
If your operation is a straightforward arithmetic or logical function, you can often use vectorized operations directly on the columns:
df["col3"] = df["col1"] + df["col2"]
- This is faster than using apply() because Pandas (and NumPy) can optimize vectorized operations.
- However, for more complex logic, apply(axis=1) may be necessary.
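Some conditional logic that looks like it needs apply(axis=1) can still be vectorized. As a minimal sketch, assuming a made-up rule (add the columns only when col2 exceeds 15) and illustrative column names via_apply and via_where, numpy.where produces the same result as the row-wise version:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [1, 2, 3], "col2": [10, 20, 30]})

# Row-wise version: evaluate the rule once per row.
df["via_apply"] = df.apply(
    lambda row: row["col1"] + row["col2"] if row["col2"] > 15 else row["col1"],
    axis=1,
)

# Vectorized version of the same rule: the condition and both branches
# are computed on whole columns at once.
df["via_where"] = np.where(df["col2"] > 15, df["col1"] + df["col2"], df["col1"])
print(df)
```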
3. Using df.assign() with a Function
Another way to add the new column without modifying the original DataFrame in place is via df.assign():
def combine_cols(row):
    return row["col1"] * row["col2"]

df = df.assign(col3=df.apply(combine_cols, axis=1))
df.assign() returns a new DataFrame, allowing you to chain multiple transformations if desired.
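To illustrate chaining, df.assign() also accepts a callable, which receives the DataFrame as it exists at that point in the chain. The following is a sketch under that assumption; the names result, product, and product_plus_one are chosen here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 2, 3], "col2": [10, 20, 30]})

result = (
    df
    # The callable receives the current DataFrame and returns the new column.
    .assign(product=lambda d: d["col1"] * d["col2"])
    # A later assign can use the column created in the previous step.
    .assign(product_plus_one=lambda d: d["product"] + 1)
)

print(result)
print(df.columns.tolist())  # the original DataFrame keeps only its own columns
```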
4. Tips & Performance Considerations
- Vectorization: Always check if your task can be expressed in a vectorized way for efficiency (e.g., arithmetic, string operations, boolean comparisons).
- Row-by-Row: For more complicated operations requiring logic across columns in each row, apply(axis=1) is perfectly fine. Just note that it’s slower for large datasets compared to vectorized solutions.
- Clarity: If your logic is more than a simple expression, consider defining a named function rather than a single-line lambda for readability.
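One way to see the performance difference on your own machine is to time both approaches on a larger frame. The sketch below assumes a synthetic 10,000-row DataFrame named big, filled with random integers; actual timings depend on hardware and Pandas version, so no figures are quoted here.

```python
import time

import numpy as np
import pandas as pd

# Synthetic data for illustration only.
rng = np.random.default_rng(0)
big = pd.DataFrame({
    "col1": rng.integers(0, 100, size=10_000),
    "col2": rng.integers(0, 100, size=10_000),
})

# Row-by-row: the lambda is called once per row.
start = time.perf_counter()
row_wise = big.apply(lambda row: row["col1"] + row["col2"], axis=1)
apply_seconds = time.perf_counter() - start

# Vectorized: a single operation over the whole columns.
start = time.perf_counter()
vectorized = big["col1"] + big["col2"]
vectorized_seconds = time.perf_counter() - start

print(f"apply(axis=1): {apply_seconds:.4f}s, vectorized: {vectorized_seconds:.4f}s")
```

Both approaches compute the same values; only the time taken differs.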
With the above methods, particularly apply(axis=1), you can easily work with multiple columns within each row, whether for simple arithmetic or more involved transformations in Pandas.