
How to sum a variable by group in R?

A straightforward way to compute sums by group is to use tapply():

    # Example data
    df <- data.frame(
      group = c("A", "A", "B", "B", "A"),
      value = c(10, 5, 7, 3, 12)
    )

    # Sum of 'value' grouped by 'group'
    tapply(df$value, df$group, sum)

tapply() splits df$value by df$group and applies sum() to each piece, returning one sum per group with the group labels as names. For the data above, that is A = 27 and B = 10.
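Real data often contains missing values, and sum() returns NA for any group that has one. The sketch below reuses the example data with one value replaced by NA (an assumption added purely for illustration) and shows that extra arguments given to tapply() are passed through to the summary function:

```r
# Example data with one missing value (the NA is an illustrative assumption)
df <- data.frame(
  group = c("A", "A", "B", "B", "A"),
  value = c(10, 5, NA, 3, 12)
)

# Without na.rm, a group containing an NA sums to NA
sums_raw <- tapply(df$value, df$group, sum)

# Arguments after the function are forwarded to it: sum(..., na.rm = TRUE)
sums <- tapply(df$value, df$group, sum, na.rm = TRUE)

# The result is named by group, so a single total can be looked up by name
sums[["A"]]
```

Whether dropping missing values is appropriate depends on the data; leaving na.rm at its default makes incomplete groups visible instead of silently shrinking their totals.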

Using dplyr

The dplyr package offers a clear, pipe-friendly syntax:

    library(dplyr)

    df %>%
      group_by(group) %>%
      summarize(total_value = sum(value))

Here, group_by(group) partitions the data by the group column, and summarize(total_value = sum(value)) calculates the sum in each group.
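Because summarize() accepts several named expressions at once, the same pipeline can return more than one statistic per group. The following is a minimal sketch, assuming dplyr is installed; the column name n_rows is made up for this example:

```r
library(dplyr)

df <- data.frame(
  group = c("A", "A", "B", "B", "A"),
  value = c(10, 5, 7, 3, 12)
)

totals <- df %>%
  group_by(group) %>%
  summarize(
    total_value = sum(value),  # sum of 'value' within each group
    n_rows = n()               # number of rows in each group (illustrative extra column)
  )

# summarize() returns a tibble with one row per group;
# use $ to extract a column as a plain vector
totals$total_value
```

Keeping the row count next to the sum is a cheap way to check that each group contains the number of observations you expect.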

Using data.table

If you prefer the data.table package:

    library(data.table)

    dt <- as.data.table(df)
    dt[, .(total_value = sum(value)), by = group]

data.table is designed for speed and memory efficiency, so this approach is a good fit when the dataset is large.
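The same by = group pattern extends to summing several columns at once. This is a sketch under stated assumptions: data.table is installed, and the weight column is hypothetical, added only so there is a second column to sum:

```r
library(data.table)

dt <- data.table(
  group = c("A", "A", "B", "B", "A"),
  value = c(10, 5, 7, 3, 12),
  weight = c(1, 2, 3, 4, 5)  # hypothetical second column, not in the original data
)

# Within each group, .SD holds the columns listed in .SDcols;
# lapply() applies sum() to each of them
totals <- dt[, lapply(.SD, sum), by = group, .SDcols = c("value", "weight")]
totals
```

Listing the columns in .SDcols keeps the call explicit about what gets summed, which matters once the table also contains non-numeric columns.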


CONTRIBUTOR
TechGrind