Logo

How to select the first row of each set of rows grouped with a GROUP BY?

Sometimes you need to retrieve just the first row (or another “top” row by some ordering) from each group in a query. While GROUP BY is great for aggregations, it doesn’t natively let you pick a “first row” based on an ordering. Below are several approaches to achieve this in SQL Server, including using window functions, self-joins, and correlated subqueries.

1. Using ROW_NUMBER() Over a Partition

The most common and clean approach in modern SQL Server is to use a window function—specifically ROW_NUMBER()—and then filter for the row you want.

Example Schema

Suppose you have a table Orders with these columns:

  • order_id
  • customer_id
  • order_date
  • total_amount

And you want the oldest order (by order_date) for each customer_id.

Query Using ROW_NUMBER()

WITH RankedOrders AS ( SELECT order_id, customer_id, order_date, total_amount, ROW_NUMBER() OVER ( PARTITION BY customer_id ORDER BY order_date ASC ) AS rn FROM Orders ) SELECT order_id, customer_id, order_date, total_amount FROM RankedOrders WHERE rn = 1;

Explanation

  1. Window Function:
    • PARTITION BY customer_id divides the data into groups by customer_id.
    • ORDER BY order_date ASC assigns a sequential number within each group, sorted by the earliest order date first.
  2. Filter: In the outer query, WHERE rn = 1 selects the earliest order (the “first row”) in each partition.

Advantages

  • Flexibility: Easily change the ORDER BY to pick the row by different criteria (e.g., maximum order_date).
  • Readability: Clearly shows which row is chosen and why.

2. Using a Correlated Subquery

Another approach is a correlated subquery that finds the minimum (or maximum) of some column, which identifies the “first” row. Then you join back to the table to get the other columns.

Example

SELECT o.order_id, o.customer_id, o.order_date, o.total_amount FROM Orders o WHERE o.order_date = ( SELECT MIN(o2.order_date) FROM Orders o2 WHERE o2.customer_id = o.customer_id );

Explanation

  • For each row in Orders (aliased as o), we find the minimum order_date for that same customer_id in a subquery.
  • We only select rows where o.order_date matches that minimum date—giving us the first order per customer.

Caveats

  • If there can be ties in order_date for the same customer_id, you might get multiple rows.
  • This can be less performant for large datasets because the subquery runs per row unless the optimizer can optimize it with an index.

3. Using a Self-Join

You can also join the table to itself and filter rows that are not strictly earlier than any other row in the same group.

SELECT o1.order_id, o1.customer_id, o1.order_date, o1.total_amount FROM Orders o1 LEFT JOIN Orders o2 ON o1.customer_id = o2.customer_id AND o2.order_date < o1.order_date WHERE o2.order_id IS NULL;

Explanation

  • LEFT JOIN: We match each row o1 with other rows o2 having the same customer_id but an earlier order_date.
  • Filter: If no o2 exists (meaning o2.order_id IS NULL), o1 must be the earliest order.

Caveats

  • Similar performance concerns for large tables.
  • More verbose and less intuitive than using window functions.

4. Choosing the Right Approach

  1. Row-Number Approach

    • Recommended for most modern SQL Server scenarios.
    • Very readable and flexible.
  2. Correlated Subquery or Self-Join

    • Useful if you’re in an environment not supporting advanced window functions (e.g., older SQL versions).
    • Check performance on large tables—ensure proper indexing.
  3. Tie-Breakers

    • If multiple rows can “tie” for first place (e.g., same order_date), think about whether you want a single row or multiple rows. The window function approach can be extended with secondary columns in ORDER BY to break ties.

5. Step Up Your SQL Skills

If you’d like to enhance your SQL proficiency—covering everything from basic queries to advanced concepts like window functions, indexing, and performance tuning—the following courses from DesignGurus.io can help:

6. Mock Interviews and More

  • Mock Interviews: Ace your interview with personalized feedback from ex-FAANG engineers.
  • Interview BootCamp: For a structured 12-week plan, check out the Interview BootCamp that covers coding, system design, and behavioral strategies.
  • Watch and Learn: Visit the DesignGurus YouTube Channel for free deep dives into SQL, system design, and coding topics.

Conclusion

Selecting the “first row” of each group is a classic requirement in SQL. The window function (ROW_NUMBER()) approach is typically the cleanest and most efficient for modern SQL Server versions. However, you can also use correlated subqueries or self-joins if needed. By understanding all these methods—and knowing when to apply each—you’ll be well-prepared to handle real-world data analysis tasks and nail SQL interview questions.

For more hands-on practice and structured learning, check out the comprehensive database and SQL courses on DesignGurus.io. Happy querying!

CONTRIBUTOR
TechGrind