The story of physically deleting 300 million records in MySQL

Introduction



Hey. I am ningenMe, a web developer.



As the title says, my story is about physically deleting 300 million records in MySQL.



The problem turned out to be interesting, so I decided to write it up as a memo (a how-to of sorts).



It started with an alert



The batch server I run and maintain has a scheduled job that, once a day, collects the last month's worth of data from MySQL.



Usually this job completes in about an hour, but this time it still hadn't finished after 7 or 8 hours, and the alerts just kept crawling in...



Searching for a reason



I restarted the process and looked through the logs, but saw nothing terrible.

The query was using its index correctly. While wondering what could be going wrong, I realized that the table had grown to a remarkable size.



| hoge_table | 350'000'000 |


350 million records. The index seemed to be working correctly; the query was just very slow.



A month's worth of data came to about 12,000,000 records. The SELECT was apparently taking so long that its transaction stayed open for a very long time.
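
For context, the daily collection query presumably had roughly this shape (my guess; the article shows neither the schema nor the query itself):

SELECT *
  FROM hoge_table
 WHERE create_time > NOW() - INTERVAL 1 MONTH;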



DB



Basically, it is a table that grows by about 400,000 records every day. The job only ever needed the last month of data, so the sizing assumed the table would hold roughly that amount; unfortunately, no rotation (regular purging of old rows) was ever put in place.



This database was not developed by me. I took it over from another developer, so it came to me as technical debt.



Eventually the daily volume of inserted data grew and the table finally reached its limit. With that much data it would have been necessary to separate it somehow, but unfortunately that was never done either.
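
One standard way to "separate" time-series data like this in MySQL is range partitioning, which turns dropping a whole month into a cheap metadata operation. A sketch with invented columns (the real schema isn't shown in the article):

CREATE TABLE hoge_table_partitioned (
    id          BIGINT NOT NULL AUTO_INCREMENT,
    payload     VARCHAR(255),
    create_time DATETIME NOT NULL,
    PRIMARY KEY (id, create_time)  -- the partition key must be part of every unique key
)
PARTITION BY RANGE (TO_DAYS(create_time)) (
    PARTITION p202401 VALUES LESS THAN (TO_DAYS('2024-02-01')),
    PARTITION p202402 VALUES LESS THAN (TO_DAYS('2024-03-01')),
    PARTITION pmax    VALUES LESS THAN MAXVALUE
);

-- retiring an old month is then nearly instant:
ALTER TABLE hoge_table_partitioned DROP PARTITION p202401;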



And then I stepped in.



Correction



It was more rational to shrink the table itself, and with it the processing time, than to change the application logic.



Erasing 300 million records should change the situation significantly, so that is what I decided to do... Eh, I was sure it would just work.



Step 1



Having prepared a reliable backup, I finally started submitting queries.



「Submitting the query」



DELETE FROM hoge_table WHERE create_time <= 'YYYY-MM-DD HH:MM:SS';


「...」



「...」



"Hmm... no answer. Maybe the query is just taking a long time?" I thought, but just in case I looked at Grafana and saw that the disk load was growing very quickly.

"Dangerous," I thought again, and immediately killed the query.



Step 2



After analyzing everything, I realized that the amount of data was simply too large to delete in one go.



I decided to write a script that deleted about 1,000,000 records at a time, and ran it.



「I implement the script」
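
The script itself isn't shown in the article; in plain MySQL, the core of it would be a bounded DELETE like this (the cutoff date is a placeholder):

-- DELETE ... LIMIT is MySQL-specific: it removes at most 1,000,000 rows per run.
-- Re-run the statement until it reports 0 rows affected.
DELETE FROM hoge_table
 WHERE create_time <= 'YYYY-MM-DD HH:MM:SS'
 LIMIT 1000000;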



"Now it will definitely work," I thought



Step 3



The second method worked, but it proved very time-consuming.

Doing everything neatly, without extra nerves, would have taken about two weeks. That did not meet the service requirements, so I had to move away from this scenario.



Therefore, here's what I decided to do:



Copy the table and rename it



From the previous step I knew that deleting such a large amount of data creates an equally large load. So I decided to create a new table from scratch, move the data I wanted to keep into it with INSERT, and then get rid of the old table together with the rows that were to be deleted.



| hoge_table     | 350'000'000|
| tmp_hoge_table |  50'000'000|


With the new table at the size shown above, processing it should take only about 1/7 of the previous time.



After creating the new table and renaming it, I would start using it as the master table. Then even dropping a table with 300 million records should go fine.

I had found out that TRUNCATE or DROP carries far less overhead than DELETE, so I decided to use that method.
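
The article doesn't show the DDL itself; creating the empty copy in MySQL is presumably a one-liner like this (tmp_hoge_table is the name from the table above):

-- Copies the column definitions and indexes, but none of the rows.
CREATE TABLE tmp_hoge_table LIKE hoge_table;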



Execution



「Submitting the query」



INSERT INTO tmp_hoge_table SELECT * FROM hoge_table WHERE create_time > 'YYYY-MM-DD HH:MM:SS';


「...」

「...」

「uh ...?」



Step 4



I thought the previous idea would work, but after I submitted the INSERT, a pile of errors came back. MySQL is unforgiving.



I was already so tired that I began to think I didn't want to deal with this anymore.



I sat and thought, and realized that the INSERT was probably trying to move too many rows at once...

I tried inserting only the amount of data the database processes in one day. It worked!



Well, after that we keep sending queries for the same amount of data. Since we need to move a month's worth of data, we repeat this operation about 35 times, roughly as sketched below.
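
The actual loop isn't shown; one day-sized chunk, with placeholder dates, would look roughly like this:

-- Copy about one day's worth of rows (~400,000) per statement,
-- then shift the window forward by one day and repeat ~35 times.
INSERT INTO tmp_hoge_table
SELECT *
  FROM hoge_table
 WHERE create_time >  'YYYY-MM-DD HH:MM:SS'
   AND create_time <= 'YYYY-MM-DD HH:MM:SS' + INTERVAL 1 DAY;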



Renaming the table



Here, luck was on my side: everything went smoothly.
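
The statement itself isn't given in the article; the usual atomic swap in MySQL looks like this (old_hoge_table is a name I invented for the retired table):

-- Both renames happen in one atomic operation, so queries never see a missing table.
RENAME TABLE hoge_table     TO old_hoge_table,
             tmp_hoge_table TO hoge_table;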



Alert gone



Batch processing speed has increased.



Previously this process took about an hour; now it takes about 2 minutes.



Once I was convinced that all the problems were resolved, I dropped the 300 million records. Deleting that table, I felt reborn.
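
Under the same invented naming as above, the final liberating statement would be:

-- Dropping the table deallocates its storage without touching rows one by one.
DROP TABLE old_hoge_table;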



Summarizing



I realized that the main problem was that rotation had been overlooked in the batch design from the start. An architectural mistake like this is a waste of everyone's time.
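
For illustration, a hedged sketch of the kind of rotation that was missing; the event name, retention window, and batch size are my own assumptions (and the MySQL event scheduler must be enabled):

-- Runs daily; trims rows older than one month in bounded batches.
CREATE EVENT IF NOT EXISTS rotate_hoge_table
ON SCHEDULE EVERY 1 DAY
DO
  DELETE FROM hoge_table
   WHERE create_time < NOW() - INTERVAL 1 MONTH
   LIMIT 1000000;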



Do you think about the replication load when you delete records from a database? Let's not overload MySQL.



Those who are well versed in databases will surely never run into such a problem. For everyone else, I hope this article was helpful.



Thanks for reading!



We will be very happy if you tell us whether you liked this article, whether the translation was clear, and whether it was useful to you.


