mysql query to update db QUICKLY

guest2013-1

guest
Joined
Aug 22, 2003
Messages
19,800
Reaction score
13
Hi, yay for another stupid mysql question.

I have a table that needs to be normalized. Unfortunately this table contains 7.7 million products.

Now each of them has a manufacturer NAME on the same record. Many of which have the same manufacturers. So in an effort to change the existing database (owner approved finally) and to speed up the search based on manufacturer. I've added a ManufacturerID in the products table, and created a separate consolidated manufacturer table/list.

(Obviously you can see this will improve speed on a search massively since the previous developers never tested on this amount of data before)

My question now is, what is the MOST efficient AND QUICK way to update that table?

All the tables are MyISAM and will stay MyISAM unless you can prove to me InnoDB is superior.

I have a stored routine that has a cursor loop through each product and passes it's ID and manufacturer name through. Then I have an update statement that looks like this:

Code:
UPDATE xml_record_product SET ManufacturerID = (select ManufacturerID from ms_tblmanufacturer where ltrim(rtrim(Manufacturer)) = p_Manufacturer)
        WHERE product_id = p_ProductID;

Anything to help improve would be greatly appreciated. As it looks like this will run for almost 4 days non-stop (and providing no locks occur, which it will) just to update that.

And I have several tables to normalize still.
 
The code you've shown is very inefficient because that select statement has to execute for each row in xml_record_product.

I'm not sure whether MySQL can do this, but in MS Sql you can updated based on join criteria.

Something like:

Code:
UPDATE
    A
SET
    A.ManufacturerId = B.ManufacturerId
FROM
    xml_record_product A
    
    INNER JOIN ms_tblmanufacturer B ON
        A.ManufacturerId = B.ManufacturerId

If that works, remember to add the WHERE clause for productID
 
Last edited:
Thanks hyperian, and coming from a MS SQL background I was very disappointed to see that mysql doesn't support that... and I'm not sure how efficient joining on 2 text fields are (I'm actually updating the product table with the manufacturer ID to be able to do that in future perhaps)
 
...OR you can DTS to your local MS SQL Server, run hyperian query, then DTS back to Mysql. Will that be too much?
 
already played around with indexes, the one table is loaded in memory etc, still only updating 12 records per second which is too slow considering 7 million needs to run
 
Without knowing how the table data is user it become difficult to make big changes. Depending on the selects and how often old data is accessed you could always make it a merge table. This will massively increase the read and write of the table but would make large selects slower if they are going over multiple merged tables. If old data is rarely accessed you could do this. Other than that, indexes are your first and best bet.

EDIT: Looking at the type of data you are storing it is not date specific so I would probably stay away from merge tables, my bad
 
Last edited:
Without knowing how the table data is user it become difficult to make big changes. Depending on the selects and how often old data is accessed you could always make it a merge table. This will massively increase the read and write of the table but would make large selects slower if they are going over multiple merged tables. If old data is rarely accessed you could do this. Other than that, indexes are your first and best bet.

EDIT: Looking at the type of data you are storing it is not date specific so I would probably stay away from merge tables, my bad

Yea the exercise now is to normalize the data.

For example, if I wanted to do a search by manufacturer (it's a text box search from the site) I had to go (shorthand) select * from producttable where manufacturer like (%search%) (that or match against, depending if I were using fulltext or not)

What I'm doing now is storing the X amount of manufacturers in their own table (there's about 100k manufacturers for all 7.7 million products) and instead of doing an entire table scan text "like" search on 7.7 rows, I'll do it on 100k (which is considerably faster you'd agree)

One product is manufactured by 1 guy, so I need to store the manufacturer ID in the table row by matching the one from the consolidated (distinct) list I just created, with the one in the row of the product, and update that row. Hey presto, normalized and a speedier query.

There's one more thing I had to actually update on the row itself, the rest of the normalization shouldn't really affect the table at all. The table is actively used to search for products, added to the basket and then paid for. So if I were to do this on the current live set of data, I'd run into constant locking issues. So I'm probably going to make a restore of the current DB before I start making these changes.

We're also implementing RAID10 to help speed up the reading and writing of the drives itself because I think MySQL sometimes have issues with the writing side of things. Should also provide a significant improvement.
 
From what I've read it is possible to updated using a join in MySQL: http://www.java2s.com/Code/SQL/Join/JOINtwotablesinupdatecommand.htm

That is actually the first thing I've tried. I have a development database of just over 1.7 million records (not even the full 7.7 million on the live) and it took 4 hours to run.... and 0 records updated.... so yea... the method I described with the cursor so far actually does about 12-20 records a second, but i need it faster ;)

I think I'll try something else, load the manufacturers table into memory as well so the only thing that happens is the update, rest should be fast, unless the damn update is ****ed up and doing the slowness
 
At work the guys had to also cleanup a table and found it was quicker to do it in batches. Think the magic number 10'000 increments using offset and limit (sorry cannot remember MySQL commands for that).
 
No worries, thanks for the tip, I'll use offset and limit to try do it in batches
 
Hey Acid... here's my 2cents (only saw this post now). My MySql knowledge is fairly limited (ok, a lot limited... but perhaps the same principles apply)

If the update is failing you (wierd, that should be reasonable) and you want to try something completely different... the MS version of what i'd suggest is SqlBulkCopy. Create a new product table with the right structure then in batches of 100000 or so, pull the data from the old table, joined on the manufacturer table and insert using a bulk copy (i think the closest equivalent in MySql that I can find now online is LOAD). I doubt you can do it straight in the DB... in .Net, i'd run a datareader for the select statement reading based on the non-clustered key in chunks (say 100 000 at a time), and bulk copy with increments of 20 000 (dunno if mysql's load will have increments).

I'm working on a project where we insert massive amounts of data (over a billion rows in some of the tables) on a somewhat-regular basis. The imports would take a few weeks if we used normal inserts. There's nothing faster than bulk copying the data into the DB where the DB does not have to worry about validating much.

Something to note... Indexes severely degrade insert performance, even with bulk copy, so a slow down is sometimes related to that (not that there's anything you can do about it if you need the indexes).

Good Luck
 
Thanks for the info, I'll look at what I can do. I'm more and more inclined to just tell the client we're going "MS SQL" because I never (in my life) had to tweak and configure so much just for a database to work correctly.

I might try removing all the indexes and go from there. Actually thought about it this morning to just re-create the table.... might work!
 
Top
Sign up to the MyBroadband newsletter
X