How to join (merge) data frames (inner, outer, left, right) in R?
Joining (or merging) data frames is a core task in data wrangling and analysis in R. Whether you need an inner, outer, left, or right join, you can do it with base R's merge() function or with the join functions in dplyr.
2. Base R merge() Function
2.1 Inner Join
- Usage:
merge(df1, df2, by = "common_column")
- Result: Returns only matching rows from both data frames, discarding unmatched rows.
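As a minimal sketch of the inner join above, assume two small illustrative data frames that share an id column (df1, df2, and their column names are made up for this example, not taken from a real dataset):

```r
# Hypothetical sample data: both frames share the key column "id".
df1 <- data.frame(id = c(1, 2, 3),
                  name = c("Ann", "Bob", "Cid"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(id = c(2, 3, 4),
                  score = c(85, 90, 70))

# Inner join: only ids present in BOTH frames (2 and 3) are kept.
inner <- merge(df1, df2, by = "id")
print(inner)
```

Here id 1 (only in df1) and id 4 (only in df2) are dropped from the result.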
2.2 Outer Join (Full)
- Usage:
merge(df1, df2, by = "common_column", all = TRUE)
- Result: Returns all rows from both data frames, matching data where possible. Unmatched rows get NA.
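The full outer join can be sketched in the same way. The data frames and column names below are illustrative assumptions, chosen so that each frame has one id the other lacks:

```r
# Hypothetical sample data: id 1 exists only in df1, id 4 only in df2.
df1 <- data.frame(id = c(1, 2, 3),
                  name = c("Ann", "Bob", "Cid"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(id = c(2, 3, 4),
                  score = c(85, 90, 70))

# Full outer join: every id from either frame appears once.
full <- merge(df1, df2, by = "id", all = TRUE)
print(full)

# Columns with no counterpart in the other frame are filled with NA.
is.na(full$score[full$id == 1])  # id 1 has no score in df2
is.na(full$name[full$id == 4])   # id 4 has no name in df1
```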
2.3 Left Join
- Usage:
merge(df1, df2, by = "common_column", all.x = TRUE)
- Result: Returns all rows from df1 and only matching rows from df2. Rows in df1 with no match in df2 get NA in the columns that come from df2.
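A short sketch of the left join, again using made-up data frames (df1, df2, and their columns are assumptions for illustration only):

```r
# Hypothetical sample data sharing the key column "id".
df1 <- data.frame(id = c(1, 2, 3),
                  name = c("Ann", "Bob", "Cid"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(id = c(2, 3, 4),
                  score = c(85, 90, 70))

# Left join: keep every row of df1; pull in score where an id matches.
left <- merge(df1, df2, by = "id", all.x = TRUE)
print(left)

# id 1 has no match in df2, so its score is NA; id 4 (df2 only) is dropped.
```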
2.4 Right Join
- Usage:
merge(df1, df2, by = "common_column", all.y = TRUE)
- Result: Returns all rows from df2 and only matching rows from df1. Rows in df2 with no match in df1 get NA in the columns that come from df1.
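The right join mirrors the left join. The sketch below uses the same kind of illustrative data (all names are assumptions, not from a real dataset):

```r
# Hypothetical sample data sharing the key column "id".
df1 <- data.frame(id = c(1, 2, 3),
                  name = c("Ann", "Bob", "Cid"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(id = c(2, 3, 4),
                  score = c(85, 90, 70))

# Right join: keep every row of df2; pull in name where an id matches.
right <- merge(df1, df2, by = "id", all.y = TRUE)
print(right)

# id 4 has no match in df1, so its name is NA; id 1 (df1 only) is dropped.
```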
3. Using dplyr Joins
After loading dplyr (library(dplyr)), you can do:
- Inner Join:
df1 %>% inner_join(df2, by = "common_column")
- Outer (Full) Join:
df1 %>% full_join(df2, by = "common_column")
- Left Join:
df1 %>% left_join(df2, by = "common_column")
- Right Join:
df1 %>% right_join(df2, by = "common_column")
This syntax often feels more readable, especially when chaining multiple data transformations.
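To illustrate the chaining point, here is a small sketch that joins and then keeps transforming the result in one pipeline. It assumes dplyr is installed; the data frames and column names are hypothetical:

```r
library(dplyr)

# Hypothetical sample data sharing the key column "id".
df1 <- data.frame(id = c(1, 2, 3),
                  name = c("Ann", "Bob", "Cid"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(id = c(2, 3, 4),
                  score = c(85, 90, 70))

# Join, then filter and sort in a single chain.
ranked <- df1 %>%
  left_join(df2, by = "id") %>%   # keep all rows of df1
  filter(!is.na(score)) %>%       # drop rows that found no match in df2
  arrange(desc(score))            # highest score first

print(ranked)
```

The same chain written with merge() would need intermediate variables or nested calls, which is one reason the piped form tends to read more easily.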