How do I create an empty Pandas DataFrame, then add rows, one by one?
While it’s generally more efficient to gather data in a Python list or dictionary and then create a Pandas DataFrame all at once, sometimes you need to build a DataFrame incrementally (e.g., reading data in a loop or from an API). Below are a few ways to achieve this with code examples.
1. Create an Empty DataFrame with Defined Columns, Then Use loc
If you already know the names of the columns, you can define them when creating an empty DataFrame, and then use df.loc[] to add rows:

    import pandas as pd

    # Step 1: Create an empty DataFrame with column names
    df = pd.DataFrame(columns=["Name", "Age", "City"])

    # Step 2: Add rows by index using loc
    df.loc[0] = ["Alice", 25, "New York"]
    df.loc[1] = ["Bob", 30, "Chicago"]

    # Display the DataFrame
    print(df)

- The line df.loc[0] = [...] creates a new row at index 0 and populates its values.
- Each subsequent row can be added at the next integer index (or any other label you choose).
- Pros: Straightforward to read and reason about.
- Cons: Can be slow if you’re adding many rows in a loop, because each addition can potentially involve copying data under the hood.
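The point about choosing any other label can be illustrated with a minimal sketch. The string labels emp_a and emp_b are hypothetical, chosen only for illustration. It also shows that assigning to a label that already exists overwrites that row rather than adding a new one.

```python
import pandas as pd

df = pd.DataFrame(columns=["Name", "Age", "City"])

# Hypothetical string labels instead of integer positions.
df.loc["emp_a"] = ["Alice", 25, "New York"]
df.loc["emp_b"] = ["Bob", 30, "Chicago"]

# Assigning to an existing label replaces that row; no new row is created.
df.loc["emp_a"] = ["Alice", 26, "Boston"]

print(df)
```

Because loc assigns by label, it is worth making sure each new label is unique when the intent is to append.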
2. Using concat() Iteratively
In older code you may see df.append(...), but DataFrame.append() was deprecated in Pandas 1.4 and removed in Pandas 2.0. The recommended replacement is pd.concat():

    import pandas as pd

    df = pd.DataFrame(columns=["Name", "Age", "City"])

    # Example row to add, built as a one-row DataFrame with the same columns
    row_to_add = pd.DataFrame(
        [["Charlie", 35, "Los Angeles"]],
        columns=["Name", "Age", "City"],
    )

    # Concatenate the new row; ignore_index=True renumbers the rows from 0
    df = pd.concat([df, row_to_add], ignore_index=True)
    print(df)

- Create a one-row DataFrame (or multiple rows) matching the same columns.
- Concat the new row(s) onto the existing DataFrame.
- Pros: Explicit control; easy to handle multiple rows at once.
- Cons: Iterating row by row with concat() can be relatively slow for large datasets. For small or medium datasets, it’s often fine.
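When data arrives in chunks, one way to limit the cost of repeated concatenation is to collect the pieces in a list and call pd.concat() once. The sketch below assumes a hypothetical batches list standing in for pages returned by an API; the names and values are made up for illustration.

```python
import pandas as pd

# Hypothetical batches of records, e.g. pages returned by an API.
batches = [
    [{"Name": "Charlie", "Age": 35, "City": "Los Angeles"}],
    [
        {"Name": "Dana", "Age": 41, "City": "Boston"},
        {"Name": "Eli", "Age": 29, "City": "Denver"},
    ],
]

# Build one small DataFrame per batch, then concatenate them in a single call.
frames = [pd.DataFrame(batch) for batch in batches]
df = pd.concat(frames, ignore_index=True)

print(df)
```

A single concat over a list of frames copies the data once, instead of once per added row.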
3. Building a List, Then Creating the DataFrame at the End (Best Practice for Large Data)
Although this method isn’t strictly “row by row” in the final DataFrame, it’s often the most efficient approach for larger data sets:

    import pandas as pd

    rows_list = []

    # Suppose you're reading or generating data in a loop
    for i in range(3):
        # Collect data as a dict
        row_dict = {
            "Name": f"Person_{i}",
            "Age": 20 + i,
            "City": "City_" + str(i),
        }
        rows_list.append(row_dict)

    # Create DataFrame at the end
    df = pd.DataFrame(rows_list)
    print(df)

- Accumulate rows (dicts or lists) in a list (rows_list).
- Create the DataFrame once using pd.DataFrame(rows_list).
- Pros: Much faster and more memory-efficient for large datasets or repeated operations.
- Cons: You don’t see an immediate “live” DataFrame update row by row.
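The same pattern works when each row is a plain tuple or list rather than a dict. In that case the column names are not carried by the rows, so they are passed separately. This is a minimal sketch; the columns variable and the generated values are illustrative.

```python
import pandas as pd

columns = ["Name", "Age", "City"]
rows = []

# Collect each row as a tuple, in the same order as `columns`.
for i in range(3):
    rows.append((f"Person_{i}", 20 + i, f"City_{i}"))

# Build the DataFrame once, supplying the column names explicitly.
df = pd.DataFrame(rows, columns=columns)

print(df)
```

Tuples are slightly lighter than dicts, at the cost of relying on positional order to match the column names.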
4. Insert Rows with df.loc[len(df)]
If you don’t want to specify the index each time, you can always use the current length of the DataFrame as the new index:

    import pandas as pd

    df = pd.DataFrame(columns=["Name", "Age", "City"])

    df.loc[len(df)] = ["Diana", 28, "Houston"]
    df.loc[len(df)] = ["Ethan", 33, "Seattle"]

    print(df)

- len(df) returns the number of existing rows, so with the default 0-based integer index it appends a new row at the “end” of the DataFrame.
- Pros: Convenient when you just want to stack new rows in sequence.
- Cons: Can still be slow in large loops, similar to the first method.
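One caveat is that len(df) is only guaranteed to be an unused label while the index is the default 0, 1, 2, ... sequence. The sketch below uses made-up rows to show what can happen after a row is dropped, and one possible remedy (resetting the index first).

```python
import pandas as pd

df = pd.DataFrame(
    [["Diana", 28, "Houston"], ["Ethan", 33, "Seattle"], ["Fay", 45, "Miami"]],
    columns=["Name", "Age", "City"],
)

# Dropping the first row leaves index labels 1 and 2.
trimmed = df.drop(index=0)

# len(trimmed) is 2, which is already a label, so this overwrites the "Fay" row
# instead of adding a third row.
overwritten = trimmed.copy()
overwritten.loc[len(overwritten)] = ["Gus", 52, "Austin"]

# Resetting the index restores labels 0..n-1, so len() is an unused label again.
appended = trimmed.reset_index(drop=True)
appended.loc[len(appended)] = ["Gus", 52, "Austin"]

print(overwritten)
print(appended)
```

If the index is meaningful and should not be reset, building the rows in a list and creating the DataFrame at the end avoids the problem entirely.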
Performance Considerations
- Appending rows one by one is usually fine for small datasets or simple scripts.
- For high-volume or performance-sensitive tasks, you’ll want to collect data in an external list or dictionary, then create your DataFrame once.
- Avoid chained indexing (for example, df[mask]["Age"] = value) if you plan to repeatedly modify the DataFrame in place; it can trigger warnings such as SettingWithCopyWarning and may modify a temporary copy instead of the original.
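To see the difference on your own machine, a rough timing sketch along these lines may help. The helper names build_with_loc, build_from_list, and time_call are hypothetical, and the row count n is arbitrary; actual timings depend on your Pandas version and hardware, so no expected numbers are given.

```python
import time

import pandas as pd


def build_with_loc(n):
    """Append n rows one at a time with df.loc[len(df)]."""
    df = pd.DataFrame(columns=["Name", "Age"])
    for i in range(n):
        df.loc[len(df)] = [f"Person_{i}", 20 + i % 50]
    return df


def build_from_list(n):
    """Collect n rows in a list, then build the DataFrame once."""
    rows = [{"Name": f"Person_{i}", "Age": 20 + i % 50} for i in range(n)]
    return pd.DataFrame(rows, columns=["Name", "Age"])


def time_call(func, n):
    """Return the wall-clock seconds taken by func(n)."""
    start = time.perf_counter()
    func(n)
    return time.perf_counter() - start


n = 500  # arbitrary size; increase it to make the gap more visible
print(f"loc, row by row:      {time_call(build_with_loc, n):.4f} s")
print(f"list, then DataFrame: {time_call(build_from_list, n):.4f} s")
```

Both helpers produce the same rows, so the comparison isolates the cost of growing the DataFrame incrementally.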
Final Thoughts
- For small or simple projects: Adding rows to an empty DataFrame with loc or concat can be perfectly fine.
- For bigger data: Build a list of rows (or dictionaries) and create the DataFrame once at the end for speed and memory efficiency.
Choose the approach that best matches your dataset size, performance needs, and code readability preferences. Happy coding!