How can I remove duplicate rows?
Whether you’re cleaning up stale data, merging datasets, or just enforcing business rules, removing duplicates is a common task in SQL Server. Below are several methods you can use, each with its own advantages depending on your scenario.
1. Using a Temporary Table with DISTINCT
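```sql
-- Roll everything back if any statement fails, so the table is never left empty.
SET XACT_ABORT ON;
BEGIN TRANSACTION;

-- Copy one instance of each distinct row into a temporary table.
SELECT DISTINCT *
INTO #TempTable
FROM OriginalTable;

-- Empty the original table. TRUNCATE keeps its columns, indexes, and constraints;
-- dropping the table instead would make the INSERT below fail.
TRUNCATE TABLE OriginalTable;

-- Reinsert the cleaned data.
INSERT INTO OriginalTable
SELECT * FROM #TempTable;

COMMIT TRANSACTION;

DROP TABLE #TempTable;
```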
Key Points:
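- The DISTINCT clause keeps one copy of each group of rows that match in every column.
- Because every column is compared, a table with an IDENTITY column or any other unique column has no exact duplicates for DISTINCT to remove; methods 2 and 3 handle that case.
- Requires a temporary holding table and an additional insert operation, and TRUNCATE TABLE fails if another table references this one through a foreign key.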
- Best for smaller tables or one-time cleanups.
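To see what DISTINCT treats as a duplicate, here is a minimal run of the same steps against a hypothetical #Contacts temp table. The table, its columns, and the sample rows are illustrative assumptions, not part of the method. Rows collapse only when every column matches, and SQL Server's DISTINCT treats NULLs as equal to each other.

```sql
-- Hypothetical sample table: exact duplicates plus a near-duplicate
-- that differs only in Phone.
CREATE TABLE #Contacts (
    Email NVARCHAR(100) NOT NULL,
    Phone NVARCHAR(20)  NULL
);

INSERT INTO #Contacts (Email, Phone) VALUES
    (N'ana@example.com', N'555-0100'),
    (N'ana@example.com', N'555-0100'),  -- exact duplicate: removed
    (N'ana@example.com', N'555-0199'),  -- differs in Phone: kept
    (N'ben@example.com', NULL),
    (N'ben@example.com', NULL);         -- NULLs compare as equal for DISTINCT: removed

-- Method 1 applied to the sample table.
SELECT DISTINCT * INTO #ContactsDedup FROM #Contacts;
TRUNCATE TABLE #Contacts;
INSERT INTO #Contacts SELECT * FROM #ContactsDedup;
DROP TABLE #ContactsDedup;

-- Three rows remain: ana/555-0100, ana/555-0199, ben/NULL.
SELECT Email, Phone FROM #Contacts ORDER BY Email, Phone;
```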
2. Using a CTE and ROW_NUMBER()
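```sql
-- Number the rows within each group of duplicates (rows sharing Col1, Col2).
-- ORDER BY (SELECT NULL) means any row in a group may receive RowNum = 1.
WITH Deduplicated AS (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY Col1, Col2
               ORDER BY (SELECT NULL)
           ) AS RowNum
    FROM YourTable
)
-- Deleting through the CTE removes the rows from YourTable itself.
DELETE FROM Deduplicated
WHERE RowNum > 1;
```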
Key Points:
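- ROW_NUMBER() numbers the rows within each group that shares the same PARTITION BY values (here Col1, Col2), starting at 1. Only those columns define a duplicate; other columns may differ.
- Deleting every row with RowNum > 1 keeps exactly one row per group. With ORDER BY (SELECT NULL), SQL Server picks that row arbitrarily; to control which one survives, order by a meaningful column instead, such as a last-modified timestamp in descending order to keep the newest row.
- Ideal when you need flexibility in choosing which record to retain, and it works even when the table has no primary key.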
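As a sketch of choosing which record to retain, the query below assumes a hypothetical #Customers table with an UpdatedAt timestamp and keeps the most recent row per Email. The table, column names, and sample data are assumptions for illustration; the tiebreaker on CustomerID is one reasonable way to make the survivor deterministic.

```sql
-- Hypothetical table: several rows per Email, each with a timestamp.
CREATE TABLE #Customers (
    CustomerID INT IDENTITY(1, 1) PRIMARY KEY,
    Email      NVARCHAR(100) NOT NULL,
    UpdatedAt  DATETIME2     NOT NULL
);

INSERT INTO #Customers (Email, UpdatedAt) VALUES
    (N'ana@example.com', '2024-01-01'),  -- older ana row: deleted
    (N'ana@example.com', '2024-03-15'),  -- newest ana row: kept
    (N'ben@example.com', '2024-02-10');  -- only ben row: kept

-- Number rows within each Email, newest first; CustomerID breaks ties
-- so the survivor is deterministic even if two timestamps are equal.
WITH Ranked AS (
    SELECT CustomerID,
           ROW_NUMBER() OVER (
               PARTITION BY Email
               ORDER BY UpdatedAt DESC, CustomerID DESC
           ) AS RowNum
    FROM #Customers
)
DELETE FROM Ranked
WHERE RowNum > 1;

SELECT CustomerID, Email, UpdatedAt FROM #Customers ORDER BY Email;
```

Switching the ordering to UpdatedAt ASC would keep the oldest row per Email instead.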
3. Using a Self-Join and DELETE
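```sql
-- Delete every row that has a duplicate (same Col1, Col2) with a larger PK,
-- which leaves only the highest-PK row in each group.
DELETE t1
FROM YourTable AS t1
JOIN YourTable AS t2
    ON  t1.Col1 = t2.Col1
    AND t1.Col2 = t2.Col2
    AND t1.PK   < t2.PK;
```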
Key Points:
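- Joins the table to itself on the columns that define a duplicate (e.g., Col1, Col2) and deletes any row for which a matching row with a larger PK exists, so the row with the highest PK in each group survives. Flip the comparison to t1.PK > t2.PK to keep the lowest instead.
- Requires a unique, orderable key, such as an IDENTITY primary key, to tell otherwise identical rows apart.
- Rows with NULL in a join column are never matched, because NULL = NULL does not evaluate to true, so those duplicates are left in place. The ROW_NUMBER() method groups NULLs together.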
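A small run of the self-join against a hypothetical #Orders table shows which rows survive, including the NULL case. The schema and sample data are assumptions for illustration, with OrderID standing in for the PK column.

```sql
-- Hypothetical table with an IDENTITY primary key.
CREATE TABLE #Orders (
    OrderID    INT IDENTITY(1, 1) PRIMARY KEY,
    CustomerID INT  NOT NULL,
    OrderDate  DATE NULL
);

INSERT INTO #Orders (CustomerID, OrderDate) VALUES
    (1, '2024-05-01'),  -- OrderID 1: deleted (matches larger OrderIDs 2 and 3)
    (1, '2024-05-01'),  -- OrderID 2: deleted (matches larger OrderID 3)
    (1, '2024-05-01'),  -- OrderID 3: kept, highest key in its group
    (2, NULL),          -- OrderID 4: kept, NULL = NULL is not true
    (2, NULL);          -- OrderID 5: kept for the same reason

DELETE t1
FROM #Orders AS t1
JOIN #Orders AS t2
    ON  t1.CustomerID = t2.CustomerID
    AND t1.OrderDate  = t2.OrderDate
    AND t1.OrderID    < t2.OrderID;

-- Remaining OrderIDs: 3, 4, 5.
SELECT OrderID, CustomerID, OrderDate FROM #Orders ORDER BY OrderID;
```

To treat NULLs as matching, a join condition can be widened to, for example, (t1.OrderDate = t2.OrderDate OR (t1.OrderDate IS NULL AND t2.OrderDate IS NULL)).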
4. Best Practices for Preventing Duplicates
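- Constraints: Use PRIMARY KEY or UNIQUE constraints wherever applicable, so the database itself rejects duplicate keys regardless of which application writes the data.
- Data Normalization: Ensure your schema design follows normalization rules so the same facts are not stored in multiple places.
- Input Validation: Validate input in the application layer, and write with an upsert (update the row if it exists, insert it otherwise) instead of a blind INSERT.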
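The sketch below, on a hypothetical #Subscribers temp table, pairs a UNIQUE constraint with an upsert. It uses T-SQL MERGE for the upsert; an UPDATE followed by an INSERT when no row was updated, inside a transaction, is a common alternative. The table, column names, and values are assumptions for illustration.

```sql
-- Hypothetical table: the UNIQUE constraint on Email is enforced by the
-- database no matter which application writes to the table.
CREATE TABLE #Subscribers (
    SubscriberID INT IDENTITY(1, 1) PRIMARY KEY,
    Email        NVARCHAR(100) NOT NULL UNIQUE,
    DisplayName  NVARCHAR(100) NULL
);

INSERT INTO #Subscribers (Email, DisplayName)
VALUES (N'ana@example.com', N'Ana');

-- A blind INSERT of an existing Email is rejected by the constraint.
BEGIN TRY
    INSERT INTO #Subscribers (Email, DisplayName)
    VALUES (N'ana@example.com', N'Ana (again)');
END TRY
BEGIN CATCH
    PRINT ERROR_MESSAGE();  -- reports the UNIQUE KEY constraint violation
END CATCH;

-- Upsert: update rows whose Email already exists, insert the rest.
-- HOLDLOCK keeps concurrent sessions from inserting the same key
-- between MERGE's existence check and its insert.
MERGE #Subscribers WITH (HOLDLOCK) AS tgt
USING (VALUES (N'ana@example.com', N'Ana B.'),
              (N'cal@example.com', N'Cal')) AS src (Email, DisplayName)
    ON tgt.Email = src.Email
WHEN MATCHED THEN
    UPDATE SET DisplayName = src.DisplayName
WHEN NOT MATCHED BY TARGET THEN
    INSERT (Email, DisplayName) VALUES (src.Email, src.DisplayName);

-- Two rows: ana@example.com / Ana B., cal@example.com / Cal.
SELECT Email, DisplayName FROM #Subscribers ORDER BY Email;
```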
Boost Your SQL Mastery with DesignGurus.io
- Grokking SQL for Tech Interviews – Strengthen your command over SQL queries, learn common interview pitfalls, and practice optimization techniques.
- Grokking Database Fundamentals for Tech Interviews – Gain a thorough understanding of relational schemas, indexing, and normalization strategies.
- Relational Database Design and Modeling for Software Engineers – Dive deeper into large-scale database design principles to build resilient and scalable data solutions.
You can also watch advanced SQL tutorials on the DesignGurus.io YouTube channel or sign up for Mock Interviews conducted by ex-FAANG engineers for personalized feedback on your SQL and database design approach.
Conclusion
Removing duplicates in SQL Server can be accomplished through multiple strategies: using DISTINCT in a temporary table, leveraging ROW_NUMBER() in a CTE, or employing a self-join with DELETE. Whichever method you choose, make sure you implement preventative measures like unique constraints and normalization for data integrity. With the right design choices and consistent enforcement of uniqueness rules, your database will remain clean, efficient, and ready for future growth.