
MySQL NOT IN, LEFT JOIN, IS NULL, NOT EXISTS: notes on efficiency

Updated: 2011-12-16 12:03:56

Notes on the efficiency of NOT IN, LEFT JOIN, IS NULL and NOT EXISTS in MySQL, for anyone who may find them useful.

Efficiency comparison: NOT IN, JOIN, IS NULL, NOT EXISTS

Statement 1: select count(*) from A where A.a not in (select a from B)

Statement 2: select count(*) from A left join B on A.a = B.a where B.a is null

Statement 3: select count(*) from A where not exists (select a from B where A.a = B.a)
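The claim that the three statements are interchangeable can be checked on a toy data set. The sketch below is a minimal illustration that uses Python's built-in sqlite3 module as a stand-in for MySQL or SQL Server (an assumption: only the query semantics carry over, not the performance). Tables A and B and their rows are hypothetical; B deliberately contains a duplicate value.

```python
import sqlite3

# Hypothetical single-column tables A and B; sqlite3 stands in for MySQL/SQL Server.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE A (a INTEGER);
    CREATE TABLE B (a INTEGER);
    INSERT INTO A (a) VALUES (1), (2), (3), (4), (5);
    INSERT INTO B (a) VALUES (2), (4), (4);  -- duplicate 4 on purpose
    """
)

# The three statements from the text, unchanged.
STATEMENTS = {
    "not_in": "select count(*) from A where A.a not in (select a from B)",
    "left_join_is_null": "select count(*) from A left join B on A.a = B.a where B.a is null",
    "not_exists": "select count(*) from A where not exists (select a from B where A.a = B.a)",
}

counts = {name: conn.execute(sql).fetchone()[0] for name, sql in STATEMENTS.items()}
print(counts)  # {'not_in': 3, 'left_join_is_null': 3, 'not_exists': 3}
```

Values 1, 3 and 5 have no partner in B, so every form counts 3; the duplicate 4 in B does not change the anti-join result.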

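I had long known that the three statements above give the same result (strictly, only when B.a contains no NULLs: a single NULL returned by the subquery makes NOT IN match no rows at all), but I had never looked closely at how their efficiency compares. My gut feeling had always been that statement 2 was the fastest.
Today at work I had to purge a database holding tens of millions of rows, deleting more than twenty million of them, and the job relied heavily on exactly what these three statements do. I originally used statement 1: it ran for 1 hour 32 minutes and the log file grew to 21 GB. The run time was acceptable, but the disk space taken by the log was a problem, so I replaced every statement 1 with statement 2, expecting it to be faster. To my surprise, after more than 40 minutes it had not even deleted the first batch of 50,000 rows, and it ended up crashing SQL Server. Puzzled, I ran the statement on its own against a table of nearly ten million rows: statement 1 took 4 seconds while statement 2 took 18 seconds, a large gap. Statement 3 performed about the same as statement 1.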
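One caveat about treating statement 1 as equivalent to the other two: NOT IN follows SQL three-valued logic, so a NULL in the subquery turns every "not found" comparison into UNKNOWN. The sketch below, again using sqlite3 as a stand-in with hypothetical rows, shows the three forms diverging once B contains a NULL, and that filtering NULLs out of the subquery restores the expected answer.

```python
import sqlite3

# Hypothetical data: B contains a NULL, the case where NOT IN stops
# behaving like NOT EXISTS or LEFT JOIN ... IS NULL.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE A (a INTEGER);
    CREATE TABLE B (a INTEGER);
    INSERT INTO A (a) VALUES (1), (2), (3);
    INSERT INTO B (a) VALUES (2), (NULL);
    """
)


def count(sql: str) -> int:
    """Run a COUNT(*) query and return the single integer it produces."""
    return conn.execute(sql).fetchone()[0]


# 1 NOT IN (2, NULL) evaluates to UNKNOWN rather than TRUE, so no row qualifies.
not_in = count("select count(*) from A where A.a not in (select a from B)")
left_join = count("select count(*) from A left join B on A.a = B.a where B.a is null")
not_exists = count("select count(*) from A where not exists (select a from B where A.a = B.a)")
# Excluding NULLs from the subquery makes NOT IN agree with the other two forms.
not_in_fixed = count(
    "select count(*) from A where A.a not in (select a from B where a is not null)"
)
print(not_in, left_join, not_exists, not_in_fixed)  # 0 2 2 2
```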

The second form is a big no-no and should be avoided wherever possible. The first and third forms are essentially the same.

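Assuming the buffer pool is large enough, form 2 has the following drawbacks compared with form 1:
(1) A LEFT JOIN is inherently more expensive (it needs more resources to handle the intermediate result set it produces).
(2) The intermediate result set of the LEFT JOIN is never smaller than table A.
(3) Form 2 must additionally filter the LEFT JOIN's intermediate result with IS NULL, whereas form 1 does the filtering while the two sets are being joined, so this cost is pure overhead.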
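Point (2) can be made concrete by counting the LEFT JOIN's rows before the IS NULL filter runs. This is a sketch on sqlite3 with hypothetical data; whether a given engine actually materializes this intermediate set depends on its optimizer, so the count shows only its logical size.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE A (a INTEGER);
    CREATE TABLE B (a INTEGER);
    INSERT INTO A (a) VALUES (1), (2), (3), (4), (5);
    INSERT INTO B (a) VALUES (2), (4), (4), (4);
    """
)


def count(sql: str) -> int:
    """Run a COUNT(*) query and return its single integer result."""
    return conn.execute(sql).fetchone()[0]


a_rows = count("select count(*) from A")
# Intermediate LEFT JOIN result before filtering: every row of A appears at
# least once, and once per matching row of B when there are matches.
joined_rows = count("select count(*) from A left join B on A.a = B.a")
# Rows that survive the IS NULL filter (the anti-join answer).
anti_rows = count("select count(*) from A left join B on A.a = B.a where B.a is null")
print(a_rows, joined_rows, anti_rows)  # 5 7 3
```

Here the value 4 matches three rows of B, so the intermediate set (7 rows) is larger than A (5 rows) even though only 3 rows end up in the answer.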

Taken together, these three points make a noticeable difference when processing large volumes of data (mainly in memory and CPU cost). I suspect the buffer pool was already saturated when the original poster ran the test. In that case the extra overhead of form 2 has to fall back on virtual memory on disk, and because paging in SQL Server involves slow I/O, the gap becomes even more pronounced.

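As for the oversized log file, that is normal, since so many records were deleted. Depending on what the database is used for, you could set the recovery model to SIMPLE, or truncate the log once the deletion has finished and shrink the file.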

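I had previously written a script that deleted from this database unconditionally, that is, it removed all the data from some large tables. At the customer's request I could not use TRUNCATE TABLE (they were afraid it would damage the existing database structure), so I had to use DELETE. I ran into the oversized-log problem then as well, and solved it by deleting in batches: SET ROWCOUNT @chunk on SQL Server 2000 and DELETE TOP (@chunk) on SQL Server 2005. This not only cut the deletion time dramatically but also greatly reduced the log volume, which grew by only about 1 GB.
This time, however, the cleanup has to be conditional, i.e. delete A from A where .... with a condition after it, and deleting in batches no longer helps.
Do you have any idea why?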
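For reference, a conditional batched delete can let the anti-join pick each batch. The sketch below uses sqlite3, which has no DELETE TOP (@chunk), so each batch selects up to `chunk` rowids with LIMIT and deletes those; the function name `delete_unmatched_in_batches` and the tables are hypothetical. It does not try to explain the behaviour described above; it only shows the shape of the loop, in which every batch re-evaluates the NOT EXISTS condition against the rows that remain.

```python
import sqlite3


def delete_unmatched_in_batches(conn: sqlite3.Connection, chunk: int) -> int:
    """Delete rows of A that have no match in B, at most `chunk` rows per
    transaction, and return how many non-empty batches were run.

    Hypothetical helper: LIMIT inside the rowid subquery plays the role of
    SQL Server's DELETE TOP (@chunk).
    """
    batches = 0
    while True:
        cur = conn.execute(
            """
            DELETE FROM A
            WHERE rowid IN (
                SELECT a1.rowid
                FROM A AS a1
                WHERE NOT EXISTS (SELECT 1 FROM B WHERE B.a = a1.a)
                LIMIT ?
            )
            """,
            (chunk,),
        )
        conn.commit()  # keep each batch in its own short transaction
        if cur.rowcount == 0:
            return batches
        batches += 1


conn = sqlite3.connect(":memory:")
conn.executescript("CREATE TABLE A (a INTEGER); CREATE TABLE B (a INTEGER);")
conn.executemany("INSERT INTO A (a) VALUES (?)", [(i,) for i in range(1, 11)])
conn.executemany("INSERT INTO B (a) VALUES (?)", [(i,) for i in range(2, 11, 2)])
conn.commit()

batches = delete_unmatched_in_batches(conn, chunk=2)
remaining = [row[0] for row in conn.execute("SELECT a FROM A ORDER BY a")]
print(batches, remaining)  # 3 [2, 4, 6, 8, 10]
```

The five odd values have no partner in B and are removed in batches of 2, 2 and 1; dropping the NOT EXISTS clause turns this into the unconditional variant.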

MySQL NOT IN vs. LEFT JOIN: efficiency notes

First, what this SQL does: it finds the rows in set a that are not in set b.
The NOT IN version:

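-- Variant 1: NOT IN against a derived table of RUIDs seen in the window
-- (2009-8-14 15:30:00 to 2009-8-17 16:00:00, both ends inclusive).
-- Caution: if the subquery can return a NULL RUID, NOT IN returns no rows.
select add_tb.RUID
from (select distinct RUID
      from UserMsg
      where SubjectID = 12
        and CreateTime >= '2009-8-14 15:30:00'
        and CreateTime <= '2009-8-17 16:00:00'
     ) add_tb
where add_tb.RUID not in (select distinct RUID
                          from UserMsg
                          where SubjectID = 12
                            and CreateTime < '2009-8-14 15:30:00'
                         )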

-- Variant 2: LEFT JOIN between two derived tables; rows of a with no
-- partner in b (b.ruid is null) are the RUIDs that are new in the window.
select a.ruid, b.ruid
from (select distinct RUID
      from UserMsg
      where SubjectID = 12
        and CreateTime >= '2009-8-14 15:30:00'
        and CreateTime <= '2009-8-17 16:00:00'
     ) a
left join (select distinct RUID
           from UserMsg
           where SubjectID = 12
             and CreateTime < '2009-8-14 15:30:00'
          ) b on a.ruid = b.ruid
where b.ruid is null

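-- Variant 3: LEFT JOIN UserMsg to itself without derived tables.
-- The conditions on b must stay in the ON clause: moved into WHERE, they
-- would discard the NULL-extended rows and the query would return nothing.
select distinct a.RUID
from UserMsg a
left join UserMsg b
  on a.ruid = b.ruid
  and b.subjectID = 12 and b.createTime < '2009-8-14 15:30:00'
where a.subjectID = 12
  and a.createTime >= '2009-8-14 15:30:00'
  and a.createtime <= '2009-8-17 16:00:00'
  and b.ruid is null;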

-- Variant 4: correlated NOT EXISTS; the subquery's select list is never
-- read, so select 1 is enough.
select distinct a.ruid
from UserMsg a
where a.subjectID = 12
  and a.createTime >= '2009-8-14 15:30:00'
  and a.createTime <= '2009-8-17 16:00:00'
  and not exists (
      select 1
      from UserMsg
      where subjectID = 12 and createTime < '2009-8-14 15:30:00'
        and ruid = a.ruid
  )
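The four variants above should return the same RUIDs. The sketch below runs them on sqlite3 against a hypothetical UserMsg table and compares each result with a plain Python set difference (RUIDs seen in the window minus RUIDs seen before it). Two assumptions: the sample rows are invented, and the dates are zero-padded ('2009-08-14' rather than '2009-8-14') because SQLite compares these values as plain text.

```python
import sqlite3

WIN_START, WIN_END = "2009-08-14 15:30:00", "2009-08-17 16:00:00"

# Hypothetical sample rows: (RUID, SubjectID, CreateTime).
ROWS = [
    (1, 12, "2009-08-10 09:00:00"),  # posted before the window -> not new
    (1, 12, "2009-08-15 10:00:00"),
    (2, 12, "2009-08-16 12:00:00"),
    (2, 12, "2009-08-17 08:00:00"),  # duplicate RUID inside the window
    (3, 12, "2009-08-14 15:30:00"),  # exactly on the lower boundary
    (4, 7, "2009-08-15 10:00:00"),   # different subject
    (5, 12, "2009-08-18 00:00:00"),  # after the window
    (6, 7, "2009-08-01 00:00:00"),   # earlier post, but in another subject
    (6, 12, "2009-08-15 11:00:00"),
]

QUERIES = {
    "not_in": """
        select add_tb.RUID
        from (select distinct RUID from UserMsg
              where SubjectID = 12 and CreateTime >= :win_start
                and CreateTime <= :win_end) add_tb
        where add_tb.RUID not in (select distinct RUID from UserMsg
                                  where SubjectID = 12 and CreateTime < :win_start)""",
    "left_join_derived": """
        select a.RUID
        from (select distinct RUID from UserMsg
              where SubjectID = 12 and CreateTime >= :win_start
                and CreateTime <= :win_end) a
        left join (select distinct RUID from UserMsg
                   where SubjectID = 12 and CreateTime < :win_start) b
          on a.RUID = b.RUID
        where b.RUID is null""",
    "left_join_self": """
        select distinct a.RUID
        from UserMsg a
        left join UserMsg b
          on a.RUID = b.RUID and b.SubjectID = 12 and b.CreateTime < :win_start
        where a.SubjectID = 12 and a.CreateTime >= :win_start
          and a.CreateTime <= :win_end and b.RUID is null""",
    "not_exists": """
        select distinct a.RUID
        from UserMsg a
        where a.SubjectID = 12 and a.CreateTime >= :win_start
          and a.CreateTime <= :win_end
          and not exists (select 1 from UserMsg
                          where SubjectID = 12 and CreateTime < :win_start
                            and RUID = a.RUID)""",
}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE UserMsg (RUID INTEGER, SubjectID INTEGER, CreateTime TEXT)")
conn.executemany("INSERT INTO UserMsg VALUES (?, ?, ?)", ROWS)

params = {"win_start": WIN_START, "win_end": WIN_END}
results = {name: {r[0] for r in conn.execute(sql, params)} for name, sql in QUERIES.items()}

# Reference answer computed in plain Python as a set difference.
window = {r for r, s, t in ROWS if s == 12 and WIN_START <= t <= WIN_END}
before = {r for r, s, t in ROWS if s == 12 and t < WIN_START}
expected = window - before
print(expected)  # {2, 3, 6}
print(all(found == expected for found in results.values()))  # True
```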

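-- Variant 5: LEFT JOIN a derived table straight to UserMsg.
-- Unlike the variants above, neither side filters on SubjectID = 12, so it
-- returns RUIDs with no earlier message in any subject.
select a.ruid, b.ruid
from (select distinct RUID
      from UserMsg
      where CreateTime >= '2009-8-14 15:30:00'
        and CreateTime <= '2009-8-17 16:00:00'
     ) a
left join UserMsg b
  on a.ruid = b.ruid
  and b.createTime < '2009-8-14 15:30:00'
where b.ruid is null;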
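Because the last variant drops the SubjectID = 12 filter, it is not interchangeable with the earlier ones. The sketch below, on sqlite3 with the same hypothetical rows and zero-padded dates as an assumption, runs it as written and again with the filter restored on both sides: a user active only in another subject appears, and a user whose earlier post was in another subject disappears.

```python
import sqlite3

WIN_START, WIN_END = "2009-08-14 15:30:00", "2009-08-17 16:00:00"

# Hypothetical sample rows: (RUID, SubjectID, CreateTime).
ROWS = [
    (1, 12, "2009-08-10 09:00:00"),
    (1, 12, "2009-08-15 10:00:00"),
    (2, 12, "2009-08-16 12:00:00"),
    (2, 12, "2009-08-17 08:00:00"),
    (3, 12, "2009-08-14 15:30:00"),
    (4, 7, "2009-08-15 10:00:00"),   # only active in subject 7
    (5, 12, "2009-08-18 00:00:00"),
    (6, 7, "2009-08-01 00:00:00"),   # earlier post in subject 7
    (6, 12, "2009-08-15 11:00:00"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE UserMsg (RUID INTEGER, SubjectID INTEGER, CreateTime TEXT)")
conn.executemany("INSERT INTO UserMsg VALUES (?, ?, ?)", ROWS)
params = {"win_start": WIN_START, "win_end": WIN_END}

# Variant 5 as written: no SubjectID filter on either side.
all_subjects = {r[0] for r in conn.execute("""
    select a.RUID
    from (select distinct RUID from UserMsg
          where CreateTime >= :win_start and CreateTime <= :win_end) a
    left join UserMsg b
      on a.RUID = b.RUID and b.CreateTime < :win_start
    where b.RUID is null""", params)}

# Same shape with SubjectID = 12 restored on both sides.
subject_12 = {r[0] for r in conn.execute("""
    select a.RUID
    from (select distinct RUID from UserMsg
          where SubjectID = 12 and CreateTime >= :win_start
            and CreateTime <= :win_end) a
    left join UserMsg b
      on a.RUID = b.RUID and b.SubjectID = 12 and b.CreateTime < :win_start
    where b.RUID is null""", params)}

print(sorted(all_subjects), sorted(subject_12))  # [2, 3, 4] [2, 3, 6]
```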