How to filter Pandas dataframe using 'in' and 'not in' like in SQL?
In SQL, you might write queries like:
SELECT * FROM my_table WHERE col IN ('val1', 'val2', 'val3');
Or use NOT IN
to exclude those values. In Pandas, you can achieve the same effect with the isin()
method and a simple negation operator.
1. Filtering with “in”
Using isin()
To filter rows where a column has values in a specified list:
import pandas as pd df = pd.DataFrame({ "Name": ["Alice", "Bob", "Charlie", "David"], "Age": [25, 30, 35, 40], "City": ["New York", "Chicago", "San Francisco", "Chicago"] }) # We want rows where 'City' is in ['Chicago', 'New York'] cities = ["Chicago", "New York"] df_in = df[df["City"].isin(cities)] print(df_in)
df["City"].isin(cities)
returns a boolean Series indicating which rows haveCity
in the specified list.- Slicing
df[...]
with that condition returns the filtered rows.
2. Filtering with “not in”
Using ~
(tilde) to Negate
To exclude rows where a column is in some list, use the tilde (~
) operator (the bitwise NOT in Python):
# Exclude rows where 'City' is in ['Chicago', 'New York'] df_not_in = df[~df["City"].isin(cities)] print(df_not_in)
~df["City"].isin(cities)
isTrue
only for rows that are not in the list.- Slicing the DataFrame with this negated boolean mask excludes the unwanted rows.
3. Combining With Other Conditions
You can chain multiple conditions using &
(logical AND) or |
(logical OR). For example:
# Rows where Age > 30 AND City is NOT IN ['Chicago', 'New York'] df_filtered = df[(df["Age"] > 30) & (~df["City"].isin(cities))] print(df_filtered)
Edge Cases and Best Practices
- Missing Values:
isin()
handlesNaN
by returningFalse
in most cases unless your list explicitly includesnp.nan
. - Performance: For extremely large DataFrames, consider whether your data structure allows faster membership tests (e.g., using sets for quick lookups).
- Readability: Keep in mind that chaining multiple conditions can reduce readability. Consider breaking up complex filters into separate steps or using descriptive variable names.
Strengthen Your Python and Data Wrangling Skills
If you’re looking to improve your Python fundamentals—covering everything from language syntax and best practices to file I/O and working efficiently with libraries like Pandas—check out Grokking Python Fundamentals by DesignGurus.io. A solid understanding of Python can make data operations more intuitive, efficient, and error-free.
Also, check out the SQL and database-related courses offered by DesignGurus.io:
-
Grokking SQL for Tech Interviews
A comprehensive guide focusing on SQL query patterns frequently encountered in technical interviews. It covers essential topics such as data retrieval, joins, subqueries, and optimization strategies. -
Grokking Database Fundamentals for Tech Interviews
Goes beyond basic query writing to explain relational database concepts, normalization, indexing, transactions, and other fundamentals critical for interview success.
With isin()
and ~
, you can recreate your favorite SQL-style “IN” and “NOT IN” queries directly in Pandas, keeping your data analysis pipeline smooth and Pythonic!