Mysql删除某几个字段重复的行

脏数据是一个比较恶心的事情，每次处理都很麻烦。如果只是找以单字段重复的行，那还比较简单，可以看 https://www.yiibai.com/mysql/delete-duplicate-rows.html 即可。但是实际工作中会有这样的情况：“找到按某几个字段的重复的行并且删除”。

举个例子：

上面那个例子里cluster_name和num这两个字段相同才算是一个重复行，而env和dt_date是可以不同的。那么首先先获取到重复的列的方法：

SELECT p1.id 
from `naxxramas_slowsql_status` p1,`naxxramas_slowsql_status` p2 
where p1.dt_date ='20210419' and p1.id<>p2.id and p1.cluster_name = p2.cluster_name and p1.num = p2.num and p1.dt_date = p2.dt_date 
group by p1.cluster_name;

上面那个语句很简单，就是取出id不同但是cluster_name和num又相同的单列，而不是把所有的列都取出来。

那么如果要删除对应的行的语句如下：

DELETE FROM `naxxramas_slowsql_status` 
where id IN (SELECT id FROM 
(SELECT p1.id from `naxxramas_slowsql_status` p1,`naxxramas_slowsql_status` p2 where p1.dt_date ='20210330' and p1.id<>p2.id and p1.cluster_name = p2.cluster_name and p1.num = p2.num and p1.dt_date = p2.dt_date 
group by p1.cluster_name) 
AS temp);

这里要注意一下，不可以这么写：

DELETE FROM `naxxramas_slowsql_status` 
where id IN (
SELECT p1.id 
from `naxxramas_slowsql_status` p1,`naxxramas_slowsql_status` p2 
where p1.dt_date ='20210330' and p1.id<>p2.id and p1.cluster_name = p2.cluster_name and p1.num = p2.num and p1.dt_date = p2.dt_date 
group by p1.cluster_name);

这么写会报错：You can't specify target table 'naxxramas_slowsql_status' for update in FROM clause。这句话的意思是“不能在同一语句中更新select出的同一张表元组的属性值”，所以就要用上面的方法：将select出的结果通过中间表再select一遍。

或者使用left join, 比如要update，使用：

1
2
3

update x set available_material_id = null 
where id not in 
(select id from x where additional_info = 1);

就会报错：You can't specify target table 'x' for update in FROM clause。使用left join的sql就是：

update x left join
       x xx
       on x.id = xx.id and xx.additional_info = 1
    set available_material_id = null
    where xx.id is null;