
How to add a new column to an existing DataFrame?

Adding a column to a Pandas DataFrame is a common task—whether you’re transforming existing columns, creating a constant column, or appending external data. Below, you’ll find several approaches, along with practical tips and performance considerations.

1. Assigning with Bracket Notation

The most common and straightforward method is to assign the new column directly to the DataFrame using bracket notation:

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3],
    "B": [4, 5, 6]
})

# 1. Creating a new column from existing columns
df["C"] = df["A"] + df["B"]
print(df)
#    A  B  C
# 0  1  4  5
# 1  2  5  7
# 2  3  6  9

# 2. Creating a constant column (the scalar is broadcast to every row)
df["D"] = 100
print(df)
#    A  B  C    D
# 0  1  4  5  100
# 1  2  5  7  100
# 2  3  6  9  100
```
  1. Pros: Very readable, simple, and direct.
  2. Cons: The new column is always appended at the end of the DataFrame, and assigning to a name that already exists silently overwrites that column.
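Two details of bracket assignment are easy to miss: a plain list must have exactly one value per row, while a Series is matched to the DataFrame by index label rather than by position. The minimal sketch below illustrates both; the Series s and its labels are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# A plain list is matched by position and must have len(df) elements.
df["C"] = [7, 8, 9]

# A Series is aligned on index labels: label 1 is missing from s,
# so row 1 gets NaN (and the column becomes float64).
s = pd.Series([100, 300], index=[0, 2])
df["D"] = s

# Reusing an existing name replaces that column without any warning.
df["A"] = df["A"] * 10

# A list of the wrong length is rejected and no column is added.
try:
    df["E"] = [1, 2]
except ValueError as exc:
    print("ValueError:", exc)

print(df)
```

If you want positional behavior from a Series with a different index, converting it first (for example with s.to_numpy()) avoids the label alignment, provided the length matches.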

2. Using df.assign()

The assign() method creates a new DataFrame while adding or modifying columns. It can be chained with other DataFrame methods for a functional programming style:

```python
# Continues with df (columns A, B, C, D) from the previous example
df2 = df.assign(E=df["B"] * 2, F=lambda x: x["A"] + x["C"])
print(df2)
#    A  B  C    D   E   F
# 0  1  4  5  100   8   6
# 1  2  5  7  100  10   9
# 2  3  6  9  100  12  12
```
  1. Pros: Perfect for method chaining (e.g., df.pipe(...).assign(...).rename(...)).
  2. Cons: Returns a new DataFrame instead of modifying df in place (which can be beneficial or not, depending on your use case).
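The sketch below shows why assign() fits chains: a callable receives the intermediate DataFrame, and in recent pandas versions the keyword arguments are evaluated in order, so a later column can refer to one created earlier in the same call. The column names and the query() filter are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

out = (
    df
    .assign(
        C=lambda d: d["A"] + d["B"],  # d is the DataFrame being built
        D=lambda d: d["C"] * 2,       # refers to C, created just above
    )
    .query("D > 10")                  # keep chaining other methods
)

print(out)
print(list(df.columns))  # the original df still has only ['A', 'B']
```

Because the original df is untouched, you can rerun the chain safely while experimenting.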

3. Using df.insert()

If you want to insert a column at a specific position (instead of appending at the end), use df.insert(loc, column, value):

```python
df.insert(1, "X", [10, 20, 30])  # Insert at position 1
print(df)
#    A   X  B  C    D
# 0  1  10  4  5  100
# 1  2  20  5  7  100
# 2  3  30  6  9  100
```
  1. loc (int): The integer position where the new column will be placed.
  2. Pros: Precisely controls the column position.
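  3. Cons: Modifies df in place and returns None (this might be desired or not, depending on your workflow), and raises a ValueError if the column name already exists unless you pass allow_duplicates=True.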
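A short sketch of how df.insert() behaves at the edges: it returns None, it refuses an existing column name by default, and passing len(df.columns) as loc appends at the end. The "ID" and "Z" columns are hypothetical examples.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# insert() works in place and returns None, so don't assign its result to df.
result = df.insert(0, "ID", ["a", "b", "c"])
print(result)            # None
print(list(df.columns))  # ['ID', 'A', 'B']

# Inserting a name that already exists raises ValueError by default...
try:
    df.insert(1, "A", [0, 0, 0])
except ValueError as exc:
    print("ValueError:", exc)

# ...unless duplicate column labels are explicitly allowed.
df.insert(1, "A", [0, 0, 0], allow_duplicates=True)
print(list(df.columns))  # ['ID', 'A', 'A', 'B']

# loc must lie between 0 and len(df.columns); the upper bound appends at the end.
df.insert(len(df.columns), "Z", 0)  # a scalar value is broadcast to every row
```

Duplicate labels make df["A"] return a DataFrame instead of a Series, so allow_duplicates=True is usually best avoided.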

4. Adding Columns in Bulk

If you need to add multiple new columns, you can build a dictionary of columns and unpack it into df.assign():

```python
new_cols = {
    "G": df["A"] * 3,
    "H": df["B"] + df["C"],
}
df = df.assign(**new_cols)
print(df)
#    A   X  B  C    D  G   H
# 0  1  10  4  5  100  3   9
# 1  2  20  5  7  100  6  12
# 2  3  30  6  9  100  9  15
```
  • By unpacking (**new_cols), each key in new_cols becomes a new column, and each value becomes the column’s data.
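The dictionary does not have to be written out by hand. As a minimal sketch, a dict comprehension can generate many related columns, which are then added in a single assign() call; the factors list and the A_x… naming scheme are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# Build every new column first (here: A multiplied by several factors)...
factors = [2, 3, 4]
new_cols = {f"A_x{k}": df["A"] * k for k in factors}

# ...then add them all in one call; dictionary order sets the column order.
df = df.assign(**new_cols)
print(list(df.columns))  # ['A', 'B', 'A_x2', 'A_x3', 'A_x4']
```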

5. Performance & Best Practices

  • Adding Columns in a Loop: Inserting many columns one at a time can be slow on large DataFrames, and recent pandas versions may emit a PerformanceWarning about a highly fragmented DataFrame. If performance is critical, compute the new columns into a dictionary or a list of Series first, then attach them all at once.
  • In-Place vs. Copy:
    • Bracket assignment (df["col"] = ...) and df.insert() modify the existing DataFrame in place.
    • Methods like df.assign() return a new DataFrame, which can be safer in functional pipelines.
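One way to follow the "compute first, attach once" advice is to collect the new columns in a dictionary and join them with pd.concat along the columns axis. This is a sketch under illustrative assumptions: the 200 generated columns and the col_… names are arbitrary.

```python
import pandas as pd

df = pd.DataFrame({"A": range(5), "B": range(5, 10)})

# Slow pattern for many columns: one insertion per loop iteration.
# for i in range(200):
#     df[f"col_{i}"] = df["A"] * i

# Faster pattern: compute all columns first, then attach them in one step.
new_cols = {f"col_{i}": df["A"] * i for i in range(200)}
wide = pd.concat([df, pd.DataFrame(new_cols, index=df.index)], axis=1)

print(wide.shape)  # (5, 202)
```

Passing index=df.index keeps the new frame aligned with df; df.assign(**new_cols) would also add the columns in one call.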

Learn More About Pandas & Python

If you’re looking to deepen your data manipulation skills and Python proficiency, here are some recommended courses from DesignGurus.io:

  1. Grokking Python Fundamentals
    Dive into Python 3 essentials, including best practices for writing clean, maintainable code—key for large data projects.

  2. Grokking the Coding Interview: Patterns for Coding Questions
    Ideal if you’re preparing for technical interviews, focusing on pattern-based solutions to common coding challenges.


Final Thoughts

To add a new column to an existing DataFrame:

  1. Bracket Notation: df["NewCol"] = data (simple, direct).
  2. df.assign(): Returns a new DataFrame—useful in chained expressions.
  3. df.insert(): Place the column at a specific position in the DataFrame.
  4. Multiple Columns: Add many at once by unpacking a dictionary into df.assign(**new_cols).

Select the approach that best aligns with your coding style and data workflow. Happy coding!

CONTRIBUTOR
TechGrind