How to drop rows of Pandas DataFrame whose value in a certain column is NaN?
Dropping rows with missing or NaN values is a common data-cleaning task in Pandas. Below are some straightforward ways to drop rows specifically where a certain column contains NaN.
1. Using dropna() with the subset Parameter
The most direct way is to use `dropna()` and specify which column (or columns) to look at via the `subset` parameter.
```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    "A": [1, 2, np.nan, 4],
    "B": [5, np.nan, 7, 8],
    "C": [9, 10, 11, np.nan]
})

# Drop rows where column "A" has NaN
df_dropped = df.dropna(subset=["A"])
print(df_dropped)
```
subset=["A"]
tells Pandas to only look at columnA
when deciding whether to drop a row.- Rows that have NaN in column
A
get dropped, but rows with NaNs in other columns are unaffected.
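With the sample DataFrame above, only the row at index 2 (where `A` is NaN) is removed; rows 0, 1, and 3 remain even though row 1 has a NaN in `B` and row 3 has a NaN in `C`.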
2. Using a Boolean Condition
Sometimes, you might want a more explicit approach. You can create a boolean mask that identifies rows where the specified column is not NaN, then filter the DataFrame:
```python
df_dropped = df[df["A"].notna()]
print(df_dropped)
```
df["A"].notna()
produces a boolean Series that isTrue
whereA
is not NaN, andFalse
otherwise.- Slicing
df[...]
with that boolean array returns only rows whereA
is non-null.
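One advantage of the boolean-mask style is that it composes naturally with other filters. Here is a minimal sketch using the sample DataFrame above (the `> 1` threshold is purely a hypothetical extra condition for illustration):

```python
# Keep rows where "A" is present AND greater than 1
# (the "> 1" condition is illustrative, not part of the original example)
df_filtered = df[df["A"].notna() & (df["A"] > 1)]
print(df_filtered)
```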
3. Dropping Rows with NaNs in Multiple Columns
If you want to drop rows that have NaNs in any of several columns, specify multiple columns in `subset`:
```python
# Drop rows where column "A" OR column "B" contains NaN
df_dropped = df.dropna(subset=["A", "B"])
print(df_dropped)
```
Now, a row is dropped if it has NaN in either `A` or `B`.
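With the sample data, this removes index 1 (NaN in `B`) and index 2 (NaN in `A`), leaving only rows 0 and 3.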
4. Additional Tips
- `inplace=True`:
  - You can modify the original DataFrame without creating a new one by using `inplace=True`. For example: `df.dropna(subset=["A"], inplace=True)`
  - However, `inplace=True` is generally discouraged in larger pipelines, where functional approaches (returning a new DataFrame) are more predictable.
- Keep an Eye on Other Columns:
  - Make sure you are only dropping rows based on the columns you truly do not want NaNs in. You may have to combine approaches if your logic is more complex (e.g., dropping rows only if two specific columns are both NaN); see the sketch after this list.
- Performance:
  - Dropping rows in large DataFrames can be expensive in memory and computation. Always validate that the operation is necessary, especially if you are dealing with massive datasets.
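To illustrate the "both columns are NaN" case, here is a minimal sketch using the sample DataFrame from earlier; either a boolean mask or `dropna` with `how="all"` (which applies to the `subset` columns) expresses it:

```python
# Drop rows only when BOTH "A" and "B" are NaN
df_both = df[~(df["A"].isna() & df["B"].isna())]

# Equivalent using dropna: how="all" requires every subset column to be NaN
df_both_alt = df.dropna(subset=["A", "B"], how="all")
```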
Learn More About Pandas and Python
If you want to refine your data manipulation skills and overall Python proficiency, here are some recommended courses from DesignGurus.io:
- Grokking Python Fundamentals
  A deep dive into Python essentials, best practices, and real-world coding techniques that every developer should know.
- Grokking the Coding Interview: Patterns for Coding Questions
  If you're preparing for interviews at tech giants, learn the high-level patterns to tackle coding problems with confidence.

For engineers aiming at system design or data engineering roles:

- Grokking System Design Fundamentals
  Learn how to architect robust, scalable, and maintainable systems, vital skills for senior engineering roles.
Final Thoughts
Dropping rows based on NaN values in a particular column is usually done via:
```python
df_dropped = df.dropna(subset=["column_name"])
```
or by filtering with a boolean mask:
```python
df_dropped = df[df["column_name"].notna()]
```
Choose the method that best fits your workflow, and you’ll maintain clean, consistent data for analysis or further processing. Happy data wrangling!