
MySQL: Practical Approaches to Quickly Deleting Large Volumes of Data (Tens of Millions of Rows)

Updated: 2020-07-27 12:28:41  Author: CoderBaby


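I recently ran into a performance bottleneck at work. A MySQL table receives roughly 7.76 million new rows per day with a retention period of 7 days, and data older than 7 days has to be aged out before new rows are inserted. After 9 days of continuous operation, deleting one day's data took about three and a half hours (environment: 128 GB of memory, 32 cores, 4 TB disk), which was unacceptable. Of course, if the entire table is to be emptied, TRUNCATE TABLE is the obvious choice.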

The initial approach (chosen because I had not expected deletion to be this slow) was the simplest and most naive one:

delete from table_name where cnt_date <= target_date

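After some investigation, I eventually got the deletion of more than 7.7 million rows down to about 1 second, on a single table holding roughly 46 million rows in total. The optimization went through several stages, each building on the previous one, recorded in detail below: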

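Batch deletion (a fixed number of rows per statement), looping until all expired data is gone; in addition, key_buffer_size raised from the default 8M to 512M

Result: deletion time dropped from about 3.5 hours to about 3 hours

(1) Use LIMIT (choose the batch size to suit your workload) to cap the number of rows deleted per statement, then check whether any expired rows remain. The source code (Python) is as follows: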

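import pandas as pd

def delete_expired_data(mysqlconn, day):
    """Delete rows with cnt_date <= day in batches of 50,000."""
    mysqlcur = mysqlconn.cursor()
    delete_sql = "DELETE FROM table_name WHERE cnt_date <= '%s' LIMIT 50000" % day
    query_sql = "SELECT srcip FROM table_name WHERE cnt_date <= '%s' LIMIT 1" % day
    try:
        df = pd.read_sql(query_sql, mysqlconn)
        # Delete one batch at a time until no expired row remains.
        while df is not None and not df.empty:
            mysqlcur.execute(delete_sql)
            mysqlconn.commit()  # commit per batch to keep each transaction small
            df = pd.read_sql(query_sql, mysqlconn)
    except Exception:
        mysqlconn.rollback()
        raise  # re-raise so the caller sees the failure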
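The batch loop above issues an extra SELECT after every DELETE to see whether anything is left. A possible variant, sketched here under the assumption that the driver is a DB-API one using the `%s` parameter style (as PyMySQL and mysqlclient do), reads the cursor's `rowcount` instead and stops when a batch comes back short. The function name `delete_in_batches` and the `batch_size` parameter are illustrative and do not come from the original code.

```python
def delete_in_batches(conn, day, batch_size=50000):
    """Delete expired rows batch by batch; return the total number deleted."""
    sql = "DELETE FROM table_name WHERE cnt_date <= %s LIMIT %s"
    cur = conn.cursor()
    total = 0
    while True:
        # Parameters are passed separately instead of being formatted into the SQL.
        cur.execute(sql, (day, batch_size))
        conn.commit()  # one commit per batch
        deleted = cur.rowcount
        total += deleted
        # A short (or empty) batch means no expired rows are left.
        if deleted < batch_size:
            break
    return total
```

This keeps the batch size in one place and saves one round trip per batch; whether that matters in practice depends on the workload.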

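(2) Increase key_buffer_size

mysqlcur.execute("SET GLOBAL key_buffer_size = 536870912")

key_buffer_size is a global variable that sizes the buffer used for MyISAM index blocks. For details, see the official MySQL documentation: https://dev.mysql.com/doc/refman/5.7/en/server-configuration.html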
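The value 536870912 is simply 512 MB expressed in bytes. A minimal sketch of that conversion follows; `key_buffer_sql` is a hypothetical helper name used only for illustration, not part of any library.

```python
def key_buffer_sql(megabytes):
    """Build the SET GLOBAL statement for a key buffer of the given size in MB."""
    size_in_bytes = megabytes * 1024 * 1024
    return "SET GLOBAL key_buffer_size = %d" % size_in_bytes


print(key_buffer_sql(512))  # SET GLOBAL key_buffer_size = 536870912
```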

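DELETE QUICK + OPTIMIZE TABLE

Applicable scenario: MyISAM tables

Why: rows deleted from a MyISAM table are kept in a linked list, and their space and row positions are reused by later INSERTs. After a plain DELETE, MySQL merges index blocks, which involves copying and moving a large amount of memory. DELETE QUICK skips that merging of index leaves, and OPTIMIZE TABLE then rebuilds the indexes in one go, that is, it clears out the data blocks and builds a fresh copy (compare the garbage-collection algorithms of the JVM).

Result: deletion time dropped from about 3.5 hours to 1 hour 40 minutes

The code is as follows: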

import pandas as pd

def delete_expired_data(mysqlconn, day):
    """Delete expired rows with DELETE QUICK in batches, then OPTIMIZE the table (MyISAM)."""
    mysqlcur = mysqlconn.cursor()
    delete_sql = "DELETE QUICK FROM table_name WHERE cnt_date <= '%s' LIMIT 50000" % day
    query_sql = "SELECT srcip FROM table_name WHERE cnt_date <= '%s' LIMIT 1" % day
    optimize_sql = "OPTIMIZE TABLE table_name"
    try:
        df = pd.read_sql(query_sql, mysqlconn)
        # QUICK skips merging index leaves, so each batch is cheaper.
        while df is not None and not df.empty:
            mysqlcur.execute(delete_sql)
            mysqlconn.commit()
            df = pd.read_sql(query_sql, mysqlconn)
        # Rebuild the indexes once, after all batches are done.
        mysqlcur.execute(optimize_sql)
        mysqlconn.commit()
    except Exception:
        mysqlconn.rollback()
        raise  # re-raise so the caller sees the failure

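Table partitioning: directly clear the partition that holds the expired date (the final solution, near-instant)

MySQL supports several partitioning types, including RANGE, KEY, LIST, and HASH; see the official documentation for details. Because the dates in this scenario keep moving forward, RANGE partitioning with fixed partition names is a poor fit, and HASH partitioning suits the use case better.

(1) Define the partitioned table. The SQL statement is: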

ALTER TABLE table_name PARTITION BY HASH(TO_DAYS(cnt_date)) PARTITIONS 7;

TO_DAYS converts a date into a day count (the total number of days since year 0), which is then hashed across 7 partitions; in effect, the partition is days MOD 7. The column must be of a date type, otherwise MySQL reports: Constant, random or timezone-dependent expressions in (sub)partitioning function are not allowed.
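To make the days MOD 7 mapping concrete, here is a small Python sketch that mirrors it on the client side. It assumes the partitions carry MySQL's default names for HASH partitioning (p0, p1, ...); `to_days` and `hash_partition` are illustrative helper names, not MySQL or library functions.

```python
from datetime import date, timedelta


def to_days(d):
    """Mirror MySQL's TO_DAYS for a Python date."""
    # TO_DAYS counts from year 0 while toordinal() counts from 0001-01-01,
    # so the two differ by a constant 365 days.
    return d.toordinal() + 365


def hash_partition(d, partitions=7):
    """Name of the partition that HASH(TO_DAYS(d)) selects, assuming default names."""
    return "p%d" % (to_days(d) % partitions)


print(to_days(date(2007, 10, 7)))         # 733321
print(hash_partition(date(2007, 10, 7)))  # p1
```

Any 7 consecutive dates land in 7 different partitions, and a date shares its partition with the date exactly 7 days earlier. That is why clearing one partition before inserting a new day's rows removes the day that has just expired.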

(2) Find the partition that holds the date to be aged out. The SQL statement is:

"explain partitions select * from table_name where cnt_date = '%s'" % expired_day

The result looks like this (the partitions column shows the partition in question):

+----+-------------+------------+------------+------+----------------+------+---------+------+---------+----------+-------------+
| id | select_type | table      | partitions | type | possible_keys  | key  | key_len | ref  | rows    | filtered | Extra       |
+----+-------------+------------+------------+------+----------------+------+---------+------+---------+----------+-------------+
|  1 | SIMPLE      | table_name | p1         | ALL  | cnt_date_index | NULL | NULL    | NULL | 1325238 |   100.00 | Using where |
+----+-------------+------------+------------+------+----------------+------+---------+------+---------+----------+-------------+
1 row in set, 2 warnings (0.00 sec)
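The partitions column can be NULL (for a table that is not partitioned) or hold several comma-separated names when a query touches more than one partition. A hedged sketch of pulling the names out of the EXPLAIN rows follows; it assumes the rows arrive as dictionaries keyed by column name (for example from a dictionary-style cursor), and `partitions_from_explain` is a hypothetical helper name.

```python
def partitions_from_explain(rows):
    """Collect distinct partition names from EXPLAIN rows, in first-seen order."""
    names = []
    for row in rows:
        value = row.get("partitions")
        if not value:  # NULL or empty: no partition reported for this row
            continue
        for name in value.split(","):
            name = name.strip()
            if name and name not in names:
                names.append(name)
    return names


print(partitions_from_explain([{"partitions": "p1"}]))  # ['p1']
```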

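(3) OPTIMIZE or REBUILD the partition. The SQL statement is as follows (the partition name is an identifier, so it must not be quoted):

"ALTER TABLE table_name OPTIMIZE PARTITION %s" % partition

The complete code (Python) follows; it loops backward over the dates up to the given one and clears their data: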

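import pandas as pd
from datetime import timedelta

def clear_partition_data(mysqlconn, day):
    """Truncate the partition of each expired date, walking back from `day` (a date object)."""
    mysqlcur = mysqlconn.cursor()
    expired_day = day
    try:
        while True:
            day_str = expired_day.strftime("%Y-%m-%d")
            # The EXPLAIN result is not a reliable emptiness check, so look for rows
            # explicitly and stop at the first date that has no data left.
            mysqlcur.execute("SELECT 1 FROM table_name WHERE cnt_date = '%s' LIMIT 1" % day_str)
            if mysqlcur.fetchone() is None:
                break
            # Rebuild the EXPLAIN statement for the current date to find its partition.
            query_partition_sql = "EXPLAIN PARTITIONS SELECT * FROM table_name WHERE cnt_date = '%s'" % day_str
            df = pd.read_sql(query_partition_sql, mysqlconn)
            partition = df.loc[0, 'partitions']
            if partition is not None:
                # Note: this clears every date that hashes to the same partition.
                mysqlcur.execute("ALTER TABLE table_name TRUNCATE PARTITION %s" % partition)
                mysqlconn.commit()
                # OPTIMIZE (or REBUILD) the partition after truncating it
                mysqlcur.execute("ALTER TABLE table_name OPTIMIZE PARTITION %s" % partition)
                mysqlconn.commit()
            # Keep expired_day a date object so the subtraction works on every pass.
            expired_day = expired_day - timedelta(days=1)
    except Exception:
        mysqlconn.rollback()
        raise  # re-raise so the caller sees the failure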

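Other notes

If the rows to be deleted make up more than 50% of the table, it is better to copy the rows you want to keep into a new table, drop the original table, and then rename the new table to the original name. The SQL is as follows: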

 CREATE TABLE New LIKE Main; -- empty copy with the same structure
 INSERT INTO New
  SELECT * FROM Main
   WHERE ...; -- just the rows you want to keep
 RENAME TABLE Main TO Old, New TO Main;
 DROP TABLE Old; -- Space freed up here
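If the swap is driven from Python like the other examples, the statements can be generated in order and executed one by one. The sketch below only builds the SQL strings. The helper name `swap_statements` and the default table names `table_new` and `table_old` are assumptions for illustration, and the table names and the keep condition are inserted verbatim, so they must come from trusted code rather than user input.

```python
def swap_statements(table, keep_condition, new="table_new", old="table_old"):
    """Return the SQL statements for the copy-and-swap approach, in execution order."""
    return [
        # Empty table with the same columns and indexes as the original.
        "CREATE TABLE %s LIKE %s" % (new, table),
        # Copy only the rows that should survive.
        "INSERT INTO %s SELECT * FROM %s WHERE %s" % (new, table, keep_condition),
        # Both renames happen in a single RENAME TABLE statement.
        "RENAME TABLE %s TO %s, %s TO %s" % (table, old, new, table),
        # Dropping the old table is what frees the space.
        "DROP TABLE %s" % old,
    ]


for statement in swap_statements("Main", "cnt_date > '2020-07-20'"):
    print(statement)
```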

Partitioning can be removed with ALTER TABLE table_name REMOVE PARTITIONING, which leaves the data itself in place.

References:

1) https://dev.mysql.com/doc/refman/5.7/en/alter-table-partition-operations.html (partition operations in detail)

2) http://mysql.rjweb.org/doc.php/deletebig#solutions (solutions for deleting large amounts of data)

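Copyright of this article is shared by the author and cnblogs (博客園). Reposting is welcome, but unless the author agrees otherwise, this notice must be retained and a clearly visible link to the original article must be placed on the page; otherwise the author reserves the right to pursue legal liability.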
