How to select the first row of each set of rows grouped with a GROUP BY?
Sometimes you need to retrieve just the first row (or another “top” row by some ordering) from each group in a query. While GROUP BY is great for aggregations, it doesn’t natively let you pick a “first row” based on an ordering. Below are several approaches to achieve this in SQL Server, including using window functions, self-joins, and correlated subqueries.
1. Using ROW_NUMBER()
Over a Partition
The most common and clean approach in modern SQL Server is to use a window function—specifically ROW_NUMBER()
—and then filter for the row you want.
Example Schema
Suppose you have a table Orders with these columns:
order_id
customer_id
order_date
total_amount
And you want the oldest order (by order_date
) for each customer_id
.
Query Using ROW_NUMBER()
WITH RankedOrders AS ( SELECT order_id, customer_id, order_date, total_amount, ROW_NUMBER() OVER ( PARTITION BY customer_id ORDER BY order_date ASC ) AS rn FROM Orders ) SELECT order_id, customer_id, order_date, total_amount FROM RankedOrders WHERE rn = 1;
Explanation
- Window Function:
PARTITION BY customer_id
divides the data into groups bycustomer_id
.ORDER BY order_date ASC
assigns a sequential number within each group, sorted by the earliest order date first.
- Filter: In the outer query,
WHERE rn = 1
selects the earliest order (the “first row”) in each partition.
Advantages
- Flexibility: Easily change the ORDER BY to pick the row by different criteria (e.g., maximum
order_date
). - Readability: Clearly shows which row is chosen and why.
2. Using a Correlated Subquery
Another approach is a correlated subquery that finds the minimum (or maximum) of some column, which identifies the “first” row. Then you join back to the table to get the other columns.
Example
SELECT o.order_id, o.customer_id, o.order_date, o.total_amount FROM Orders o WHERE o.order_date = ( SELECT MIN(o2.order_date) FROM Orders o2 WHERE o2.customer_id = o.customer_id );
Explanation
- For each row in Orders (aliased as
o
), we find the minimumorder_date
for that samecustomer_id
in a subquery. - We only select rows where
o.order_date
matches that minimum date—giving us the first order per customer.
Caveats
- If there can be ties in
order_date
for the samecustomer_id
, you might get multiple rows. - This can be less performant for large datasets because the subquery runs per row unless the optimizer can optimize it with an index.
3. Using a Self-Join
You can also join the table to itself and filter rows that are not strictly earlier than any other row in the same group.
SELECT o1.order_id, o1.customer_id, o1.order_date, o1.total_amount FROM Orders o1 LEFT JOIN Orders o2 ON o1.customer_id = o2.customer_id AND o2.order_date < o1.order_date WHERE o2.order_id IS NULL;
Explanation
- LEFT JOIN: We match each row
o1
with other rowso2
having the samecustomer_id
but an earlierorder_date
. - Filter: If no
o2
exists (meaningo2.order_id IS NULL
),o1
must be the earliest order.
Caveats
- Similar performance concerns for large tables.
- More verbose and less intuitive than using window functions.
4. Choosing the Right Approach
-
Row-Number Approach
- Recommended for most modern SQL Server scenarios.
- Very readable and flexible.
-
Correlated Subquery or Self-Join
- Useful if you’re in an environment not supporting advanced window functions (e.g., older SQL versions).
- Check performance on large tables—ensure proper indexing.
-
Tie-Breakers
- If multiple rows can “tie” for first place (e.g., same
order_date
), think about whether you want a single row or multiple rows. The window function approach can be extended with secondary columns inORDER BY
to break ties.
- If multiple rows can “tie” for first place (e.g., same
5. Step Up Your SQL Skills
If you’d like to enhance your SQL proficiency—covering everything from basic queries to advanced concepts like window functions, indexing, and performance tuning—the following courses from DesignGurus.io can help:
-
Grokking SQL for Tech Interviews
Perfect for mastering common interview-style SQL questions, including window function queries and optimization. -
Grokking Database Fundamentals for Tech Interviews
Learn the core principles of database design, indexing, normalization, and transaction management. -
Relational Database Design and Modeling for Software Engineers
Dive deeper into advanced design best practices and real-world modeling scenarios.
6. Mock Interviews and More
- Mock Interviews: Ace your interview with personalized feedback from ex-FAANG engineers.
- Interview BootCamp: For a structured 12-week plan, check out the Interview BootCamp that covers coding, system design, and behavioral strategies.
- Watch and Learn: Visit the DesignGurus YouTube Channel for free deep dives into SQL, system design, and coding topics.
Conclusion
Selecting the “first row” of each group is a classic requirement in SQL. The window function (ROW_NUMBER()
) approach is typically the cleanest and most efficient for modern SQL Server versions. However, you can also use correlated subqueries or self-joins if needed. By understanding all these methods—and knowing when to apply each—you’ll be well-prepared to handle real-world data analysis tasks and nail SQL interview questions.
For more hands-on practice and structured learning, check out the comprehensive database and SQL courses on DesignGurus.io. Happy querying!