mysql query to update db QUICKLY

guest2013-1 · Jul 29, 2010

Hi, yay for another stupid mysql question.

I have a table that needs to be normalized. Unfortunately this table contains 7.7 million products.

Now each of them has a manufacturer NAME on the same record. Many of which have the same manufacturers. So in an effort to change the existing database (owner approved finally) and to speed up the search based on manufacturer. I've added a ManufacturerID in the products table, and created a separate consolidated manufacturer table/list.

(Obviously you can see this will improve speed on a search massively since the previous developers never tested on this amount of data before)

My question now is, what is the MOST efficient AND QUICK way to update that table?

All the tables are MyISAM and will stay MyISAM unless you can prove to me InnoDB is superior.

I have a stored routine that has a cursor loop through each product and passes it's ID and manufacturer name through. Then I have an update statement that looks like this:

Code:

UPDATE xml_record_product SET ManufacturerID = (select ManufacturerID from ms_tblmanufacturer where ltrim(rtrim(Manufacturer)) = p_Manufacturer)
        WHERE product_id = p_ProductID;

Anything to help improve would be greatly appreciated. As it looks like this will run for almost 4 days non-stop (and providing no locks occur, which it will) just to update that.

And I have several tables to normalize still.

hyperian · Jul 29, 2010

The code you've shown is very inefficient because that select statement has to execute for each row in xml_record_product.

I'm not sure whether MySQL can do this, but in MS Sql you can updated based on join criteria.

Something like:

Code:

UPDATE
    A
SET
    A.ManufacturerId = B.ManufacturerId
FROM
    xml_record_product A
    
    INNER JOIN ms_tblmanufacturer B ON
        A.ManufacturerId = B.ManufacturerId

If that works, remember to add the WHERE clause for productID

guest2013-1 · Jul 29, 2010

Thanks hyperian, and coming from a MS SQL background I was very disappointed to see that mysql doesn't support that... and I'm not sure how efficient joining on 2 text fields are (I'm actually updating the product table with the manufacturer ID to be able to do that in future perhaps)

hyperian · Jul 29, 2010

Hmmm ... that's a pity. Perhaps play around with indexes on the tables. I'm pretty sure that that will help improve the performance of the update.

hyperian · Jul 29, 2010

From what I've read it is possible to updated using a join in MySQL: http://www.java2s.com/Code/SQL/Join/JOINtwotablesinupdatecommand.htm

Fuma · Jul 29, 2010

...OR you can DTS to your local MS SQL Server, run hyperian query, then DTS back to Mysql. Will that be too much?

guest2013-1 · Jul 29, 2010

already played around with indexes, the one table is loaded in memory etc, still only updating 12 records per second which is too slow considering 7 million needs to run

James · Jul 29, 2010

Without knowing how the table data is user it become difficult to make big changes. Depending on the selects and how often old data is accessed you could always make it a merge table. This will massively increase the read and write of the table but would make large selects slower if they are going over multiple merged tables. If old data is rarely accessed you could do this. Other than that, indexes are your first and best bet.

EDIT: Looking at the type of data you are storing it is not date specific so I would probably stay away from merge tables, my bad

guest2013-1 · Jul 29, 2010

Heita_Hola said:
...OR you can DTS to your local MS SQL Server, run hyperian query, then DTS back to Mysql. Will that be too much?

a bit since I don't have MS SQL access at all

guest2013-1 · Jul 29, 2010

James said:
Without knowing how the table data is user it become difficult to make big changes. Depending on the selects and how often old data is accessed you could always make it a merge table. This will massively increase the read and write of the table but would make large selects slower if they are going over multiple merged tables. If old data is rarely accessed you could do this. Other than that, indexes are your first and best bet.

EDIT: Looking at the type of data you are storing it is not date specific so I would probably stay away from merge tables, my bad

Yea the exercise now is to normalize the data.

For example, if I wanted to do a search by manufacturer (it's a text box search from the site) I had to go (shorthand) select * from producttable where manufacturer like (%search%) (that or match against, depending if I were using fulltext or not)

What I'm doing now is storing the X amount of manufacturers in their own table (there's about 100k manufacturers for all 7.7 million products) and instead of doing an entire table scan text "like" search on 7.7 rows, I'll do it on 100k (which is considerably faster you'd agree)

One product is manufactured by 1 guy, so I need to store the manufacturer ID in the table row by matching the one from the consolidated (distinct) list I just created, with the one in the row of the product, and update that row. Hey presto, normalized and a speedier query.

There's one more thing I had to actually update on the row itself, the rest of the normalization shouldn't really affect the table at all. The table is actively used to search for products, added to the basket and then paid for. So if I were to do this on the current live set of data, I'd run into constant locking issues. So I'm probably going to make a restore of the current DB before I start making these changes.

We're also implementing RAID10 to help speed up the reading and writing of the drives itself because I think MySQL sometimes have issues with the writing side of things. Should also provide a significant improvement.

guest2013-1 · Jul 29, 2010

hyperian said:
From what I've read it is possible to updated using a join in MySQL: http://www.java2s.com/Code/SQL/Join/JOINtwotablesinupdatecommand.htm

That is actually the first thing I've tried. I have a development database of just over 1.7 million records (not even the full 7.7 million on the live) and it took 4 hours to run.... and 0 records updated.... so yea... the method I described with the cursor so far actually does about 12-20 records a second, but i need it faster

I think I'll try something else, load the manufacturers table into memory as well so the only thing that happens is the update, rest should be fast, unless the damn update is ****ed up and doing the slowness

Other Pineapple Smurf · Jul 29, 2010

At work the guys had to also cleanup a table and found it was quicker to do it in batches. Think the magic number 10'000 increments using offset and limit (sorry cannot remember MySQL commands for that).

guest2013-1 · Jul 29, 2010

No worries, thanks for the tip, I'll use offset and limit to try do it in batches

guest2013-1 · Jul 29, 2010

Nope, by the looks of things I'm stuck running this for AT LEAST 4 days...

shogun · Jul 30, 2010

Hey Acid... here's my 2cents (only saw this post now). My MySql knowledge is fairly limited (ok, a lot limited... but perhaps the same principles apply)

If the update is failing you (wierd, that should be reasonable) and you want to try something completely different... the MS version of what i'd suggest is SqlBulkCopy. Create a new product table with the right structure then in batches of 100000 or so, pull the data from the old table, joined on the manufacturer table and insert using a bulk copy (i think the closest equivalent in MySql that I can find now online is LOAD). I doubt you can do it straight in the DB... in .Net, i'd run a datareader for the select statement reading based on the non-clustered key in chunks (say 100 000 at a time), and bulk copy with increments of 20 000 (dunno if mysql's load will have increments).

I'm working on a project where we insert massive amounts of data (over a billion rows in some of the tables) on a somewhat-regular basis. The imports would take a few weeks if we used normal inserts. There's nothing faster than bulk copying the data into the DB where the DB does not have to worry about validating much.

Something to note... Indexes severely degrade insert performance, even with bulk copy, so a slow down is sometimes related to that (not that there's anything you can do about it if you need the indexes).

Good Luck

guest2013-1 · Jul 30, 2010

Thanks for the info, I'll look at what I can do. I'm more and more inclined to just tell the client we're going "MS SQL" because I never (in my life) had to tweak and configure so much just for a database to work correctly.

I might try removing all the indexes and go from there. Actually thought about it this morning to just re-create the table.... might work!

Join the MyBroadband community

Get started

mysql query to update db QUICKLY

guest2013-1

guest

hyperian

Expert Member

guest2013-1

guest

hyperian

Expert Member

hyperian

Expert Member

Fuma

Executive Member

guest2013-1

guest

James

Expert Member

guest2013-1

guest

guest2013-1

guest

guest2013-1

guest

Other Pineapple Smurf

Honorary Master

guest2013-1

guest

guest2013-1

guest

shogun

Expert Member

guest2013-1

guest