Logo

How to filter Pandas dataframe using 'in' and 'not in' like in SQL?

In SQL, you might write queries like:

SELECT * FROM my_table WHERE col IN ('val1', 'val2', 'val3');

Or use NOT IN to exclude those values. In Pandas, you can achieve the same effect with the isin() method and a simple negation operator.

1. Filtering with “in”

Using isin()

To filter rows where a column has values in a specified list:

import pandas as pd df = pd.DataFrame({ "Name": ["Alice", "Bob", "Charlie", "David"], "Age": [25, 30, 35, 40], "City": ["New York", "Chicago", "San Francisco", "Chicago"] }) # We want rows where 'City' is in ['Chicago', 'New York'] cities = ["Chicago", "New York"] df_in = df[df["City"].isin(cities)] print(df_in)
  1. df["City"].isin(cities) returns a boolean Series indicating which rows have City in the specified list.
  2. Slicing df[...] with that condition returns the filtered rows.

2. Filtering with “not in”

Using ~ (tilde) to Negate

To exclude rows where a column is in some list, use the tilde (~) operator (the bitwise NOT in Python):

# Exclude rows where 'City' is in ['Chicago', 'New York'] df_not_in = df[~df["City"].isin(cities)] print(df_not_in)
  1. ~df["City"].isin(cities) is True only for rows that are not in the list.
  2. Slicing the DataFrame with this negated boolean mask excludes the unwanted rows.

3. Combining With Other Conditions

You can chain multiple conditions using & (logical AND) or | (logical OR). For example:

# Rows where Age > 30 AND City is NOT IN ['Chicago', 'New York'] df_filtered = df[(df["Age"] > 30) & (~df["City"].isin(cities))] print(df_filtered)

Edge Cases and Best Practices

  1. Missing Values: isin() handles NaN by returning False in most cases unless your list explicitly includes np.nan.
  2. Performance: For extremely large DataFrames, consider whether your data structure allows faster membership tests (e.g., using sets for quick lookups).
  3. Readability: Keep in mind that chaining multiple conditions can reduce readability. Consider breaking up complex filters into separate steps or using descriptive variable names.

Strengthen Your Python and Data Wrangling Skills

If you’re looking to improve your Python fundamentals—covering everything from language syntax and best practices to file I/O and working efficiently with libraries like Pandas—check out Grokking Python Fundamentals by DesignGurus.io. A solid understanding of Python can make data operations more intuitive, efficient, and error-free.

Also, check out the SQL and database-related courses offered by DesignGurus.io:

  1. Grokking SQL for Tech Interviews
    A comprehensive guide focusing on SQL query patterns frequently encountered in technical interviews. It covers essential topics such as data retrieval, joins, subqueries, and optimization strategies.

  2. Grokking Database Fundamentals for Tech Interviews
    Goes beyond basic query writing to explain relational database concepts, normalization, indexing, transactions, and other fundamentals critical for interview success.

With isin() and ~, you can recreate your favorite SQL-style “IN” and “NOT IN” queries directly in Pandas, keeping your data analysis pipeline smooth and Pythonic!

CONTRIBUTOR
TechGrind