How to add a new column to an existing DataFrame?
Adding a column to a Pandas DataFrame is a common task—whether you’re transforming existing columns, creating a constant column, or appending external data. Below, you’ll find several approaches, along with practical tips and performance considerations.
1. Assigning with Bracket Notation
The most common and straightforward method is to assign the new column directly to the DataFrame using bracket notation:
    import pandas as pd

    df = pd.DataFrame({
        "A": [1, 2, 3],
        "B": [4, 5, 6]
    })

    # 1. Creating a new column from existing columns
    df["C"] = df["A"] + df["B"]
    print(df)
    #    A  B  C
    # 0  1  4  5
    # 1  2  5  7
    # 2  3  6  9

    # 2. Creating a constant column
    df["D"] = 100
    print(df)
    #    A  B  C    D
    # 0  1  4  5  100
    # 1  2  5  7  100
    # 2  3  6  9  100
- Pros: Very readable, simple, and direct.
- Cons: A new column is always appended at the end of the DataFrame, and assigning to a name that already exists silently overwrites that column.
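When the new column comes from external data, it helps to know how bracket assignment matches values to rows. The sketch below is illustrative only: the `external` Series and the column names `from_series` and `from_list` are made up for this example. It shows that a Series is aligned on index labels, while a plain list is matched by position.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# Hypothetical external data: a Series whose index only partly overlaps df's index.
external = pd.Series([30, 10], index=[2, 0])

# Assigning a Series aligns on index labels, not on position.
# Label 1 has no match in `external`, so that row becomes NaN.
df["from_series"] = external

# Assigning a plain list is positional and must have exactly len(df) items.
df["from_list"] = [7, 8, 9]

print(df)
```

If positional matching is what you want for a Series, one option is to assign `external.to_numpy()` instead, provided the lengths match.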
2. Using df.assign()
The assign() method creates a new DataFrame while adding or modifying columns. It can be chained with other DataFrame methods for a functional programming style:
    df2 = df.assign(E=df["B"] * 2, F=lambda x: x["A"] + x["C"])
    print(df2)
    #    A  B  C    D   E   F
    # 0  1  4  5  100   8   6
    # 1  2  5  7  100  10   9
    # 2  3  6  9  100  12  12
- Pros: Perfect for method chaining (e.g., df.pipe(...).assign(...).rename(...)).
- Cons: Returns a new DataFrame instead of modifying df in place (which can be beneficial or not, depending on your use case).
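A small sketch of the chaining style may make the trade-off concrete. The column names `total` and `share_a` are invented for illustration. Each `assign()` call receives the DataFrame produced by the previous step, so a lambda can refer to a column that was added earlier in the chain, and the original `df` is left untouched.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

result = (
    df
    .assign(total=lambda d: d["A"] + d["B"])        # new column from existing ones
    .assign(share_a=lambda d: d["A"] / d["total"])  # uses the column added above
    .rename(columns={"total": "Total"})
)

print(list(df.columns))      # the original DataFrame is unchanged
print(list(result.columns))  # the chained result carries the new columns
```

Using a lambda rather than `df["A"]` directly matters in a chain: the lambda sees the intermediate DataFrame, whereas `df[...]` would always refer to the original.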
3. Using df.insert()
If you want to insert a column at a specific position (instead of appending at the end), use df.insert(loc, column, value):
    df.insert(1, "X", [10, 20, 30])  # Insert at index 1
    print(df)
    #    A   X  B  C    D
    # 0  1  10  4  5  100
    # 1  2  20  5  7  100
    # 2  3  30  6  9  100
- loc (int): The integer position where the new column will be placed.
- Pros: Precisely controls the column position.
- Cons: Modifies df in place (this might be desired or not, depending on your workflow).
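Two details of `insert()` tend to surprise people: it returns `None` rather than the DataFrame, and by default it refuses to add a column whose name already exists. The following sketch, using made-up column names `id` and `last`, illustrates both, along with inserting at the very end by passing `len(df.columns)` as `loc`.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# insert() works in place and returns None, so do not write `df = df.insert(...)`.
returned = df.insert(0, "id", ["x", "y", "z"])

# Inserting the same name again raises ValueError unless allow_duplicates=True.
duplicate_error = None
try:
    df.insert(1, "id", [0, 0, 0])
except ValueError as exc:
    duplicate_error = str(exc)
    print("insert failed:", duplicate_error)

# loc may be anywhere from 0 to len(df.columns); the upper bound appends at the end.
df.insert(len(df.columns), "last", 0)

print(list(df.columns))
```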
4. Adding Columns in Bulk
If you need to add multiple new columns, you can collect them in a dictionary and unpack it into df.assign():
    new_cols = {
        "G": df["A"] * 3,
        "H": df["B"] + df["C"],
    }
    df = df.assign(**new_cols)
    print(df)
    #    A   X  B  C    D  G   H
    # 0  1  10  4  5  100  3   9
    # 1  2  20  5  7  100  6  12
    # 2  3  30  6  9  100  9  15
- By unpacking (**new_cols), each key in new_cols becomes a new column, and each value becomes the column’s data.
5. Performance & Best Practices
- Adding Columns in a Loop: Creating columns one by one can be slow if you’re doing it frequently on large DataFrames. Consider building a dictionary or multiple Series and then assigning them all at once if performance is critical.
- In-Place vs. Copy:
  - Methods like df["col"] = ... or df.insert() modify the existing DataFrame in place.
  - Methods like df.assign() return a new DataFrame, which can be safer in functional pipelines.
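As a rough illustration of the "build first, attach once" advice, the sketch below collects several derived columns in a dictionary and joins them to the original with a single `pd.concat(..., axis=1)`. The column names (`A_x1`, `A_x2`, `A_x3`) and the variable `wide` are invented for this example, and whether this is measurably faster depends on your data. Recent pandas versions may emit a PerformanceWarning about a fragmented DataFrame when many columns are inserted one at a time.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# Pattern that can become slow on wide frames: one insertion per iteration.
# for k in range(1, 4):
#     df[f"A_x{k}"] = df["A"] * k

# Alternative: collect the new columns first, then attach them in one step.
new_cols = {f"A_x{k}": df["A"] * k for k in range(1, 4)}
wide = pd.concat([df, pd.DataFrame(new_cols)], axis=1)

print(list(wide.columns))
```

Like `assign()`, `pd.concat` returns a new DataFrame, so `df` itself keeps only its original columns here.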
Learn More About Pandas & Python
If you’re looking to deepen your data manipulation skills and Python proficiency, here are some recommended courses from DesignGurus.io:
- Grokking Python Fundamentals
Dive into Python 3 essentials, including best practices for writing clean, maintainable code, which is key for large data projects.
- Grokking the Coding Interview: Patterns for Coding Questions
Ideal if you’re preparing for technical interviews, focusing on pattern-based solutions to common coding challenges.
If you want to learn about designing and scaling large software systems:
- Grokking System Design Fundamentals
Discover the core principles behind scalable, distributed systems—vital knowledge for senior or architect-level roles.
Final Thoughts
To add a new column to an existing DataFrame:
- Bracket Notation: df["NewCol"] = data (simple, direct).
- df.assign(): Returns a new DataFrame, which is useful in chained expressions.
- df.insert(): Places the column at a specific position in the DataFrame.
- Multiple Columns: Add many at once by unpacking a dictionary into df.assign().
Select the approach that best aligns with your coding style and data workflow. Happy coding!