Logo

How to find duplicate records in MySQL?

Finding duplicate records in MySQL is often necessary for maintaining data integrity and ensuring accuracy. By leveraging grouping functions and subqueries, you can quickly locate and analyze duplicate entries. Below are common methods, along with best practices to help keep your data clean.

1. Using GROUP BY and HAVING

A classic approach is using an aggregate function like COUNT(*), combined with GROUP BY and HAVING:

SELECT column_name, COUNT(*) AS occurrence FROM your_table GROUP BY column_name HAVING COUNT(*) > 1;
  • GROUP BY column_name: Groups rows that have the same value in column_name.
  • HAVING COUNT(*) > 1: Returns only those groups that appear more than once (duplicates).

2. Handling Multiple Columns

If a “duplicate” involves more than one column (for example, (first_name, last_name) must be unique together), simply list both in the GROUP BY clause:

SELECT first_name, last_name, COUNT(*) AS occurrence FROM employees GROUP BY first_name, last_name HAVING COUNT(*) > 1;

This identifies rows where both first_name and last_name combinations are repeated.

3. Correlated Subquery for Detailed Rows

Sometimes you want the actual row data (for instance, id, timestamps, or other columns) instead of just seeing the duplicate value counts. In that case, you can use a correlated subquery:

SELECT t1.* FROM your_table t1 WHERE t1.column_name = ( SELECT t2.column_name FROM your_table t2 WHERE t2.column_name = t1.column_name GROUP BY t2.column_name HAVING COUNT(*) > 1 );

This returns every row in t1 that belongs to a duplicate group in t2.

4. Best Practices

  1. Backup Before Deleting
    Always create a backup or snapshot before deleting or modifying suspected duplicates.
  2. Use Constraints
    If feasible, set a UNIQUE constraint on the relevant columns to prevent new duplicates.
  3. Regular Maintenance
    If duplicates appear often, evaluate your data insertion logic or ETL processes. Clean as you go to save time down the road.

5. Further Learning from DesignGurus.io

Maintaining data quality and mastering SQL go hand-in-hand. To level up your SQL and database skills, check out these recommended courses by DesignGurus.io:

Grokking SQL for Tech Interviews

Learn practical SQL querying techniques, advanced patterns, and performance tips that can help you handle real-world data challenges and excel in technical interviews.

Grokking Database Fundamentals for Tech Interviews

Dive deeper into database design, indexing, normalization, and transactions—vital concepts for building robust, scalable systems.

6. Practice with Mock Interviews

If you’re preparing for interviews at leading tech companies, consider Mock Interviews with ex-FAANG engineers at DesignGurus.io. You’ll receive personalized, real-time feedback on your SQL skills and overall interview strategy.

Conclusion

Identifying duplicates in MySQL is straightforward with the right queries. By combining GROUP BY and HAVING, you can spot repeated values, and with correlated subqueries, you can dig into specific duplicate rows. Following best practices such as data backups, proper constraints, and routine maintenance will help you maintain a cleaner, more efficient database in the long run.

CONTRIBUTOR
TechGrind