How to Eliminate Duplicate Records in SQL

To eliminate duplicate records in SQL, use SELECT DISTINCT when you only need unique query results, or ROW_NUMBER() with a CTE when you need to delete duplicate rows permanently.

Duplicate records enter databases through import errors, missing unique constraints, and application bugs. Left alone, they inflate report counts, distort analytics, and slow down queries. Whether you need unique output for a dashboard or permanent table cleanup, the right approach depends on how you define “duplicate” and what you want to keep. This article covers the three main strategies — output-only dedup with SELECT DISTINCT, diagnostic grouping with GROUP BY and HAVING, and permanent deletion using ROW_NUMBER() in a CTE — plus the alternatives that handle edge cases.

What Does “Duplicate” Actually Mean In Your SQL Table?

A duplicate is defined by your business rule, not by every column being identical. Two rows with the same email address are duplicates in a user table even if their last_login timestamps differ. Before writing any dedup query, decide which columns form the unique key. That decision determines the partition columns in ROW_NUMBER() and the group columns in GROUP BY. Without a clear definition, every dedup attempt will either miss rows or delete the wrong ones.

Once the key columns are set, the rest of the process follows the same shape regardless of which columns you use.
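To make the discussion concrete, here is a minimal sketch of a users table that contains duplicate emails. The column names come from the queries and prose in this article; the data types and sample rows are assumptions made up for illustration.

```sql
-- Hypothetical schema: data types and sample rows are assumptions.
CREATE TABLE users (
    user_id    INT PRIMARY KEY,
    email      VARCHAR(255),
    first_name VARCHAR(100),
    last_name  VARCHAR(100),
    last_login DATE
);

INSERT INTO users (user_id, email, first_name, last_name, last_login) VALUES
    (1, 'ana@example.com', 'Ana',  'Lopez', '2024-01-05'),
    (2, 'ana@example.com', 'Ana',  'Lopez', '2024-03-18'),
    (3, 'ben@example.com', 'Ben',  'Cho',   '2024-02-11'),
    (4, 'ana@example.com', 'Anna', 'Lopez', '2024-04-02');
```

With email as the duplicate key, rows 1, 2, and 4 form one duplicate group even though their last_login values (and one first_name) differ. A comparison across every column would treat all four rows as unique.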

Finding Duplicates With GROUP BY And HAVING

Before deleting anything, identify which rows are duplicated and how many copies exist. A GROUP BY query with a HAVING clause does that without touching the data.

SELECT email, COUNT(*)
FROM users
GROUP BY email
HAVING COUNT(*) > 1;

This returns each duplicate group and its row count. Run it first to confirm your duplicate definition catches the right rows and to get a sense of how much data you are about to clean. Use GROUP BY with all key columns when the duplicate key is composite — for example, GROUP BY customer_id, order_date.
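For a composite key, the same audit might look like the sketch below. The orders table is hypothetical, its customer_id and order_date columns are taken from the example in the text, and the copies alias is an illustrative name.

```sql
-- Audit duplicates on a composite key (orders is a hypothetical table).
SELECT customer_id, order_date, COUNT(*) AS copies
FROM orders
GROUP BY customer_id, order_date
HAVING COUNT(*) > 1
ORDER BY copies DESC;   -- largest duplicate groups first
```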

Removing Duplicates From Query Results With SELECT DISTINCT

When you only need unique output (a report, an export, or a dashboard feed), SELECT DISTINCT is the simplest tool. It returns one row per unique combination of the selected columns without modifying the underlying table.

SELECT DISTINCT email, first_name, last_name
FROM users;

The database engine compares every row in the result set and suppresses duplicates. DISTINCT applies to all columns in the SELECT list, so two rows that differ on any returned column are kept as separate rows. On large datasets this can be expensive, but for ad hoc analysis and one-time exports it is the simplest path to clean output.
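Because DISTINCT compares the whole select list, adding a column can bring apparent duplicates back. The queries below sketch the difference on the users table; whether last_login actually varies between copies is an assumption about your data.

```sql
-- One row per email: copies of the same email collapse.
SELECT DISTINCT email
FROM users;

-- One row per (email, last_login) pair: two copies of an email
-- with different last_login values both appear in the output.
SELECT DISTINCT email, last_login
FROM users;

-- One row per email plus a summary column, using an aggregate
-- instead of DISTINCT.
SELECT email, MAX(last_login) AS last_login
FROM users
GROUP BY email;
```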

Eliminating Duplicate Records Permanently With ROW_NUMBER() And A CTE

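To delete extra rows from the table itself while keeping one row per duplicate group, the standard modern approach uses ROW_NUMBER() inside a common table expression. This method requires window-function support, which Microsoft introduced in SQL Server 2005 and which is now available in PostgreSQL, MySQL 8+, Oracle, and other major databases. Note that deleting directly from the CTE, as in the example below, is SQL Server syntax. PostgreSQL and MySQL do not accept a CTE as the DELETE target, so on those engines the same ROW_NUMBER() ranking has to feed a DELETE against the base table, typically by matching on the primary key.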

The pattern assigns a sequential number to each row within a duplicate group, ordered by your keeper rule. Rows with a number greater than 1 are the extras and get deleted.

WITH DuplicateCTE AS (
    SELECT *,
        ROW_NUMBER() OVER (
            PARTITION BY email
            ORDER BY user_id
        ) AS RowNum
    FROM users
)
DELETE FROM DuplicateCTE WHERE RowNum > 1;

The PARTITION BY clause defines the duplicate key: one or more columns separated by commas. The ORDER BY inside the window decides which row survives. In the example above, the row with the smallest user_id stays and the rest are removed; if user_id is an auto-incrementing key, that means the oldest row is kept and newer duplicates are deleted. Swap the order to user_id DESC to keep the most recent row instead.
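Deleting through the CTE as shown is SQL Server syntax. On engines that do not accept a CTE as the DELETE target, such as PostgreSQL or MySQL 8+, one common workaround is to rank the rows in a subquery and delete by primary key. The sketch below assumes user_id is the primary key of users; the ranked alias and the rn column are illustrative names.

```sql
-- Preview: rows that would be removed (run this first).
SELECT user_id, email
FROM (
    SELECT user_id, email,
           ROW_NUMBER() OVER (PARTITION BY email ORDER BY user_id) AS rn
    FROM users
) AS ranked
WHERE rn > 1;

-- Delete: same ranking, matched back to the table by primary key.
DELETE FROM users
WHERE user_id IN (
    SELECT user_id
    FROM (
        SELECT user_id,
               ROW_NUMBER() OVER (PARTITION BY email ORDER BY user_id) AS rn
        FROM users
    ) AS ranked
    WHERE rn > 1
);
```

The extra derived-table layer in the DELETE is deliberate: MySQL generally rejects a subquery that reads directly from the table being deleted from, and wrapping the ranking in a derived table is the usual way around that restriction.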

Method	Best For	Key Limitation
SELECT DISTINCT	Unique query output without table changes	Does not modify the table; only affects the result set
ROW_NUMBER() + CTE	Permanent deletion with precise keeper control	Requires window-function support (SQL Server 2005+); deleting directly from the CTE is SQL Server syntax
GROUP BY + HAVING	Auditing duplicates before any deletion	Only identifies duplicates; does not remove them
Self-join DELETE	Cleanup without window functions	Complex syntax, easy to delete the wrong rows
Temp-table rebuild	Large tables needing a clean verified copy	Destructive until the new table is validated
GROUP BY + MIN(id) subquery	Simple keeper rule with a single key column	Clunky with composite keys, harder to read
DELETE with NOT IN subquery	Quick one-off cleanup in smaller tables	Slow on large tables; if the subquery returns any NULL, NOT IN matches nothing and the delete silently removes no rows
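The last two rows of the table have no example elsewhere in this article. A minimal sketch that combines the GROUP BY + MIN(id) keeper rule with NOT IN might look like this, assuming user_id is a non-null primary key and email is the duplicate key.

```sql
-- Keep the lowest user_id per email; delete every other row.
-- The NOT IN / NULL trap is avoided here only because MIN(user_id)
-- is never NULL when user_id is a primary key.
DELETE FROM users
WHERE user_id NOT IN (
    SELECT MIN(user_id)
    FROM users
    GROUP BY email
);
```

Two caveats: MySQL does not allow the subquery to read directly from the table being deleted from, so there the inner SELECT would need to be wrapped in a derived table. Rows whose email is NULL also fall into a single group, so all but one of them would be deleted; adding email IS NOT NULL to the outer WHERE leaves those rows alone.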

Alternative Patterns — Self-Join And Temp-Table Workflows

When window functions are not available — older database versions, limited permissions on a hosted platform — two alternative patterns get the job done.

Self-join delete. Join the table to itself on the duplicate key columns and delete the row with the higher key value. The ORDER BY keeper logic is replaced by a WHERE clause that compares IDs.

DELETE u1
FROM users u1
INNER JOIN users u2
    ON u1.email = u2.email
    AND u1.user_id > u2.user_id;

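This deletes every row where a matching row with a smaller user_id exists, effectively keeping the oldest entry per email. The DELETE alias FROM ... JOIN form is accepted by SQL Server and MySQL; PostgreSQL expresses the same join with DELETE ... USING.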
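The multi-table DELETE syntax above is not portable to every engine. In PostgreSQL, the equivalent self-join is usually written with a USING clause; the sketch below keeps the same assumptions (email as the duplicate key, lowest user_id kept).

```sql
-- PostgreSQL form of the self-join delete.
DELETE FROM users AS u1
USING users AS u2
WHERE u1.email = u2.email
  AND u1.user_id > u2.user_id;   -- a lower-id copy exists, so u1 is an extra
```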

Temp-table rebuild. Copy unique rows into a staging table, verify the row count, truncate the original table, and reinsert the clean data. Microsoft’s official guidance demonstrates this as a two-step approach: SELECT DISTINCT ... INTO a backup table, then DELETE matching rows from the original and reinsert. A full-table version of the workflow, written in SQL Server syntax (SELECT ... INTO is not portable to every engine), looks like this:

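-- 1. Copy one instance of each fully identical row into a staging table
SELECT DISTINCT * INTO users_backup FROM users;
-- 2. Empty the original table (verify users_backup first)
TRUNCATE TABLE users;
-- 3. Reload the deduplicated rows
INSERT INTO users SELECT * FROM users_backup;
-- 4. Drop the staging table once the reloaded data is confirmed
DROP TABLE users_backup;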

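The temp-table method is safest when you verify the row count between each step and keep a full backup until the cleanup is confirmed, so run the final DROP TABLE only after the reloaded data has been checked. Two caveats apply to the script above. SELECT DISTINCT * collapses only rows that match in every column, so copies that share an email but differ in user_id or last_login survive; for key-based duplicates, populate the staging table with a ROW_NUMBER() or GROUP BY query instead. TRUNCATE TABLE is also typically rejected when foreign keys reference the table.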
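The row-count check between steps could be as simple as the queries below, run after the staging copy is created and before the TRUNCATE. They assume the users_backup table from the workflow above and email as the duplicate key; the column aliases are illustrative.

```sql
-- Compare row counts: the difference is the number of rows removed.
SELECT
    (SELECT COUNT(*) FROM users)        AS original_rows,
    (SELECT COUNT(*) FROM users_backup) AS staged_rows;

-- Confirm no key-level duplicates remain in the staging copy.
-- An empty result means the copy is clean for this key.
SELECT email, COUNT(*) AS copies
FROM users_backup
GROUP BY email
HAVING COUNT(*) > 1;
```

If the second query still returns rows, the staging table was deduplicated on full-row equality rather than on the key, and it should be rebuilt before the original table is emptied.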

Microsoft’s detailed walkthrough of both the duplicate-table and ROW_NUMBER() approaches is available in SQL Server’s official dedup documentation.

Choosing The Right Dedup Method For Your Situation

No single method fits every scenario. The table below maps real-world situations to the recommended approach and explains why.

Situation	Recommended Method	Why
Ad hoc report or export	SELECT DISTINCT	Fastest path to unique output, no table changes
Permanent cleanup with a specific keeper row	ROW_NUMBER() + CTE	Precise control via the ORDER BY clause
Audit duplicates before any changes	GROUP BY + HAVING	Zero risk of accidental deletion
No window function support in your database	Self-join or temp-table rebuild	Works on older MySQL, SQL Server 2000, and restricted environments
Very large table with no backup window	ROW_NUMBER() with batch delete	Can be wrapped in a loop to avoid transaction log bloat
Need a verified clean copy before switching	Temp-table rebuild	Lets you validate row counts and indexes before swapping
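A batched version of the ROW_NUMBER() delete might look like the T-SQL sketch below. It is specific to SQL Server; the batch size of 5000 and the variable names are arbitrary assumptions to tune for your workload.

```sql
DECLARE @batch   INT = 5000;  -- rows per batch (assumed value)
DECLARE @deleted INT = 1;

WHILE @deleted > 0
BEGIN
    -- Each pass re-ranks the table and removes up to @batch extras.
    WITH DuplicateCTE AS (
        SELECT user_id, email,
               ROW_NUMBER() OVER (
                   PARTITION BY email
                   ORDER BY user_id
               ) AS RowNum
        FROM users
    )
    DELETE TOP (@batch) FROM DuplicateCTE
    WHERE RowNum > 1;

    SET @deleted = @@ROWCOUNT;  -- 0 once no duplicates remain
END;
```

Outside an explicit transaction, each DELETE commits on its own, which keeps any single transaction small. The trade-off is that every pass ranks the table again, so an index on the key columns (here email, user_id) is likely to matter on large tables.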

Common Mistakes That Derail Dedup Queries

Even experienced developers hit these traps. Run through this checklist before executing any dedup logic on production data.

Defining duplicates by the wrong columns. A row is only a duplicate if its key columns match another row. Adding a timestamp column to the PARTITION BY clause by accident means every row looks unique and nothing gets deleted.
Using SELECT DISTINCT when you meant to delete rows. DISTINCT never touches the table. If the goal is permanent cleanup, DISTINCT only hides the problem temporarily.
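Omitting the ORDER BY in ROW_NUMBER(). The ORDER BY determines which row survives. Without it, SQL Server rejects the query, while engines such as PostgreSQL and MySQL number the rows in an arbitrary order, so a different row may survive on the next run. Ties in the ORDER BY column cause the same problem, so include a unique column as a tie-breaker.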
Skipping the pre-deletion audit. A GROUP BY ... HAVING COUNT(*) > 1 query costs almost nothing and reveals exactly how many duplicates exist. Running a delete without this check can wipe more rows than expected.
Deleting without a backup on large tables. The temp-table rebuild method is inherently destructive until the new table passes verification. Keep a full backup until the cleanup is confirmed.
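To illustrate the ORDER BY point from the checklist, the sketch below previews a keeper rule based on the last_login column mentioned earlier, with user_id as a tie-breaker. It assumes last_login is populated and user_id is unique.

```sql
-- Keep the most recently active row per email.
-- user_id DESC breaks ties so the same row wins on every run.
SELECT user_id, email, last_login,
       ROW_NUMBER() OVER (
           PARTITION BY email
           ORDER BY last_login DESC, user_id DESC
       ) AS rn          -- rn = 1 is the keeper; rn > 1 would be deleted
FROM users;
```

Engines differ on where NULL values sort, so if last_login can be NULL, check which row ends up with rn = 1 before turning this preview into a delete.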

Dedup Logic You Can Apply Today

Start with the GROUP BY ... HAVING audit to confirm your duplicate definition. For one-time output, use SELECT DISTINCT. For permanent cleanup with a predictable keeper row, the ROW_NUMBER() CTE pattern is the most precise option. Test the query on a copy of the table or inside a transaction so you can roll back if the result isn’t what you expected. With the duplicate key defined and the keeper rule set, any of these methods produces clean, reliable data.
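A transaction-wrapped trial run could be sketched as follows. The transaction keywords vary by engine (BEGIN TRANSACTION in SQL Server, BEGIN or START TRANSACTION elsewhere), and the delete uses a primary-key match on the assumption that user_id is the primary key; the ranked alias and rn column are illustrative names.

```sql
BEGIN TRANSACTION;

-- Remove every row ranked after the first within its email group.
DELETE FROM users
WHERE user_id IN (
    SELECT user_id
    FROM (
        SELECT user_id,
               ROW_NUMBER() OVER (PARTITION BY email ORDER BY user_id) AS rn
        FROM users
    ) AS ranked
    WHERE rn > 1
);

-- Re-run the audit: an empty result means no duplicate emails remain.
SELECT email, COUNT(*) AS copies
FROM users
GROUP BY email
HAVING COUNT(*) > 1;

-- Undo the trial run; replace with COMMIT once the result looks right.
ROLLBACK;
```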

References & Sources

Microsoft Learn. “How to remove duplicate rows from a SQL Server table by using a script.” Official guidance with ROW_NUMBER() and duplicate-table examples.