How can I remove duplicate rows?
Whether you’re cleaning up stale data, merging datasets, or just enforcing business rules, removing duplicates is a common task in SQL Server. Below are several methods you can use, each with its own advantages depending on your scenario.
1. Using a Temporary Table with DISTINCT
SELECT DISTINCT *
INTO #TempTable
FROM OriginalTable;
-- Empty the original table (TRUNCATE fails if a foreign key references the table; use DELETE in that case)
TRUNCATE TABLE OriginalTable;
-- Reinsert the cleaned data
INSERT INTO OriginalTable
SELECT *
FROM #TempTable;
DROP TABLE #TempTable;
Key Points:
- The DISTINCT clause filters out duplicates, but SELECT DISTINCT * removes only rows that are identical in every column.
- Requires a temporary holding table and an additional insert operation.
- Best for smaller tables or one-time cleanups.
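Because this method empties the original table before refilling it, a failure between the two steps could leave the table empty. A minimal sketch of the same steps wrapped in a transaction follows. The table dbo.DemoOrders, its columns, and the temp table name are hypothetical and exist only for illustration.

```sql
-- Hypothetical demo table with two duplicated rows.
CREATE TABLE dbo.DemoOrders (
    CustomerId INT         NOT NULL,
    Product    VARCHAR(50) NOT NULL
);

INSERT INTO dbo.DemoOrders (CustomerId, Product)
VALUES (1, 'Widget'), (1, 'Widget'),
       (2, 'Gadget'), (2, 'Gadget'),
       (3, 'Gizmo');

SET XACT_ABORT ON;   -- any run-time error rolls back the whole transaction
BEGIN TRANSACTION;

-- Copy one instance of each distinct row into a holding table.
SELECT DISTINCT CustomerId, Product
INTO #DemoDistinct
FROM dbo.DemoOrders;

-- In SQL Server, TRUNCATE TABLE is rolled back if the transaction fails.
TRUNCATE TABLE dbo.DemoOrders;

INSERT INTO dbo.DemoOrders (CustomerId, Product)
SELECT CustomerId, Product
FROM #DemoDistinct;

COMMIT TRANSACTION;

DROP TABLE #DemoDistinct;

-- Three rows remain, one per distinct (CustomerId, Product) pair.
SELECT CustomerId, Product
FROM dbo.DemoOrders
ORDER BY CustomerId;
```

Listing the columns explicitly instead of using SELECT * keeps the reinsert from depending on column order.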
2. Using a CTE and ROW_NUMBER()
WITH Deduplicated AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY Col1, Col2 ORDER BY (SELECT NULL)) AS RowNum
FROM YourTable
)
DELETE FROM Deduplicated
WHERE RowNum > 1;
Key Points:
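- Uses the ROW_NUMBER() window function to number the rows within each group of duplicates.
- Keeps one row per duplicate set (as defined by the PARTITION BY columns) and deletes the rest. With ORDER BY (SELECT NULL), which row survives is arbitrary.
- Ideal when you need flexibility in choosing which record to retain: order by a timestamp or key column to keep a specific row.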
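To make the retained row predictable, the ORDER BY inside the window can name a real column. The sketch below keeps the most recently updated row per email address. The table dbo.DemoContacts and its columns are assumptions made for this example, not part of the original queries.

```sql
-- Hypothetical demo table: two rows share the same Email.
CREATE TABLE dbo.DemoContacts (
    ContactId INT          NOT NULL PRIMARY KEY,
    Email     VARCHAR(100) NOT NULL,
    UpdatedAt DATETIME2    NOT NULL
);

INSERT INTO dbo.DemoContacts (ContactId, Email, UpdatedAt)
VALUES (1, 'a@example.com', '2024-01-01'),
       (2, 'a@example.com', '2024-03-01'),
       (3, 'b@example.com', '2024-02-01');

-- Number rows per Email, newest first; ContactId breaks ties deterministically.
WITH Ranked AS (
    SELECT ContactId,
           ROW_NUMBER() OVER (
               PARTITION BY Email
               ORDER BY UpdatedAt DESC, ContactId DESC
           ) AS RowNum
    FROM dbo.DemoContacts
)
DELETE FROM Ranked
WHERE RowNum > 1;

-- ContactId 2 (the newer a@example.com row) and ContactId 3 remain.
SELECT ContactId, Email, UpdatedAt
FROM dbo.DemoContacts
ORDER BY ContactId;
```

Switching the ORDER BY to ascending would keep the oldest row instead.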
3. Using a Self-Join and DELETE
DELETE t1
FROM YourTable t1
JOIN YourTable t2
ON t1.Col1 = t2.Col1
AND t1.Col2 = t2.Col2
AND t1.PK < t2.PK;
Key Points:
- Compares rows based on matching columns (e.g., Col1, Col2) and deletes every row that has a matching row with a larger primary key, so the row with the highest PK in each duplicate set is retained.
- Helpful when you have a clear, numeric primary key for ordering.
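One caveat with the equality join: under SQL Server's default ANSI_NULLS setting, NULL = NULL does not evaluate to true, so rows whose compared columns contain NULL are never matched as duplicates. A possible NULL-safe variant, sketched below, compares the columns with INTERSECT, which treats NULLs as equal. The table dbo.DemoItems and its columns are hypothetical.

```sql
-- Hypothetical demo table: ItemId 1 and 2 are duplicates with a NULL in Col2.
CREATE TABLE dbo.DemoItems (
    ItemId INT         NOT NULL PRIMARY KEY,
    Col1   VARCHAR(20) NULL,
    Col2   VARCHAR(20) NULL
);

INSERT INTO dbo.DemoItems (ItemId, Col1, Col2)
VALUES (1, 'x', NULL),
       (2, 'x', NULL),
       (3, 'y', 'z'),
       (4, 'y', 'z');

-- Delete a row when another row with a larger key has the same Col1 and Col2.
-- The INTERSECT returns a row only if both column lists match, NULLs included.
DELETE t1
FROM dbo.DemoItems AS t1
WHERE EXISTS (
    SELECT 1
    FROM dbo.DemoItems AS t2
    WHERE t2.ItemId > t1.ItemId
      AND EXISTS (SELECT t1.Col1, t1.Col2
                  INTERSECT
                  SELECT t2.Col1, t2.Col2)
);

-- ItemId 2 and ItemId 4 remain (the highest key in each duplicate set).
SELECT ItemId, Col1, Col2
FROM dbo.DemoItems
ORDER BY ItemId;
```

If the compared columns are declared NOT NULL, the plain equality join is sufficient.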
4. Best Practices for Preventing Duplicates
- Constraints: Use PRIMARY KEY or UNIQUE constraints wherever applicable.
- Data Normalization: Ensure your schema design follows normalization rules to avoid duplicate data structures.
- Input Validation: Validate or upsert (update/insert) records in the application layer before writing to the database.
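As an illustration of the constraint approach, the sketch below declares a UNIQUE constraint and shows a second insert of the same value being rejected. The table dbo.DemoUsers and the constraint name UQ_DemoUsers_Email are assumptions for this example.

```sql
-- Hypothetical demo table with a uniqueness rule on Email.
CREATE TABLE dbo.DemoUsers (
    UserId INT          NOT NULL PRIMARY KEY,
    Email  VARCHAR(100) NOT NULL,
    CONSTRAINT UQ_DemoUsers_Email UNIQUE (Email)
);

INSERT INTO dbo.DemoUsers (UserId, Email)
VALUES (1, 'a@example.com');

BEGIN TRY
    -- Same Email, different key: the UNIQUE constraint rejects this row.
    INSERT INTO dbo.DemoUsers (UserId, Email)
    VALUES (2, 'a@example.com');
END TRY
BEGIN CATCH
    -- Error 2627 is SQL Server's unique/primary key constraint violation.
    SELECT ERROR_NUMBER() AS ErrorNumber, ERROR_MESSAGE() AS ErrorMessage;
END CATCH;

-- Only the first row was stored.
SELECT UserId, Email
FROM dbo.DemoUsers;
```

On a table that already contains duplicates, adding such a constraint fails until the duplicates have been removed, so cleanup comes first.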
Conclusion
Removing duplicates in SQL Server can be accomplished through multiple strategies: using DISTINCT with a temporary table, leveraging ROW_NUMBER() in a CTE, or employing a self-join with DELETE. Whichever method you choose, pair it with preventative measures such as unique constraints and normalization so that duplicates do not reappear after the cleanup.