Blob vs File system storage

koeksGHT

Dealer
Joined
Aug 5, 2011
Messages
11,857
Don't store large things/blobs in database, makes it k@k slow. Move to folder, give unique name and just store location of file.
 

Pakka

Expert Member
Joined
Jul 22, 2010
Messages
2,271
We've used blobs in a DB for replication's sake - it works a charm.

Key is to have a good DB design and store the blob in a separate table which only ever needs to be queried when you actually want the file, all other meta data in a separate table.

If it wasn't for DB replication, we could've gone the file system way. Of course you could use CDN as suggested but in our scenario we had 3x private windows servers already so forking out more for a CDN wouldn't have been neccessary.

This also raises another question. What if you have a site where thumbnails and full size images are shown. Do you store one file in the DB or file system, and generate a thumbnail from it when needed (resource intensive) or do you save two files - one for thumbnail and one full size. But what then if you later on decide your thumbnails should be 300x300 instead of say the 100x100 that you initially saved it as...
 
Last edited:

Thor

Honorary Master
Joined
Jun 5, 2014
Messages
44,236
CDN will resolve any scalability issue you may have.

Blob is free CDN is not - at least not that I am aware of? ( I am not too clued up on a CDN, but it feels there will have to be a significant amount of work to be done in order to implement a CDN correctly and I would be one of those people who would rather do cdn.thisdomain.co.za since places like trello has 40 cdns all over the place making it a nightmare to whitelist.

I went with the file system method, although I am going to setup a mirror of this project one with file system and one with blob and then will put it on githib and then like everyone to test the bezeezus out of it so we can see where the threshold is like when does the inode overhead become an issue.

Google recommends images stored as blobs - https://cloud.google.com/storage-options/
 

Thor

Honorary Master
Joined
Jun 5, 2014
Messages
44,236
I'm interested in this but can't find a refernce on the linked page ..
Unless you're meaning Google Cloud Storage (which is their offering of EC2) or Firebase which is backed by GCS .. ?

Ahh sorry I did not think of it as EC2, I was selective reading

2016-11-11_12-34-11.jpg

What I ended up doing is serving the following from the DB

- Favicons
- Sprites
- Invoice header
- Email header

The rest - User generated images -> all file system

I chose this based on what I read here

http://www.sqlite.org/intern-v-extern-blob.html

I understand sqlite is not mySQL, but I also read this -> https://www.microsoft.com/en-us/research/wp-content/uploads/2006/04/tr-2006-45.pdf which is MS SQL again not mySQL, but the resemblance is clear objects smaller than 256K are best stored in a database while objects larger than 1M are best stored in the filesystem. Between 256K and 1M, the read:write ratio and rate of object overwrite or replacement are important factors.

The biggest issue with this is that I need to test this in a modern world as all the research on it is ancient that article is from 2006 so SSDs would not have been something they would be able to imagine.

So now I think it becomes a DB vs SSD I/O question.


Does anyone have an account to this service? -> https://www.mockaroo.com/

I would like to download 500,000 rows and play around.
 

[)roi(]

Executive Member
Joined
Apr 15, 2005
Messages
6,282
Blob is free CDN is not - at least not that I am aware of? ( I am not too clued up on a CDN, but it feels there will have to be a significant amount of work to be done in order to implement a CDN correctly and I would be one of those people who would rather do cdn.thisdomain.co.za since places like trello has 40 cdns all over the place making it a nightmare to whitelist.

I went with the file system method, although I am going to setup a mirror of this project one with file system and one with blob and then will put it on githib and then like everyone to test the bezeezus out of it so we can see where the threshold is like when does the inode overhead become an issue.

Google recommends images stored as blobs - https://cloud.google.com/storage-options/
CDNs are in practice no more difficult to employ than a local file system store; usage is naturally based on reducing latency by transparently routing requests to the nearest CDN cache (nearest i.t.o. latency and/or geography)
The replication between CDN nodes is usually automatically done by the provider: you essentially treat it similar to a single file store. Many providers start off free for small usage patterns and scale up costs in proportionate with volume and subscribed services.
 

Hamster

Resident Rodent
Joined
Aug 22, 2006
Messages
42,920
I get the appeal but trust me the majority of people have no clue about hosted file repos or imeger and the likes and want if they are down etc etc.

I want it easy, drag image here.

Done.
Gravatar?

Anyway, Microsoft did a study a good couple of years back regarding this and found that if images are smaller than a specific size it's better to go database.
 

Thor

Honorary Master
Joined
Jun 5, 2014
Messages
44,236
Ahh sorry I did not think of it as EC2, I was selective reading

View attachment 400595

What I ended up doing is serving the following from the DB

- Favicons
- Sprites
- Invoice header
- Email header

The rest - User generated images -> all file system

I chose this based on what I read here

http://www.sqlite.org/intern-v-extern-blob.html

I understand sqlite is not mySQL, but I also read this -> https://www.microsoft.com/en-us/research/wp-content/uploads/2006/04/tr-2006-45.pdf which is MS SQL again not mySQL, but the resemblance is clear objects smaller than 256K are best stored in a database while objects larger than 1M are best stored in the filesystem. Between 256K and 1M, the read:write ratio and rate of object overwrite or replacement are important factors.

The biggest issue with this is that I need to test this in a modern world as all the research on it is ancient that article is from 2006 so SSDs would not have been something they would be able to imagine.

So now I think it becomes a DB vs SSD I/O question.


Does anyone have an account to this service? -> https://www.mockaroo.com/

I would like to download 500,000 rows and play around.

Gravatar?

Anyway, Microsoft did a study a good couple of years back regarding this and found that if images are smaller than a specific size it's better to go database.
Yea link to the study is above in my reply.

Issue is its 2006, SSDs might give it a run for its money
 

Rocket-Boy

Honorary Master
Joined
Jul 31, 2007
Messages
10,199
Yea link to the study is above in my reply.

Issue is its 2006, SSDs might give it a run for its money

If its a huge amount of images then you wont be storing it on SSD's due to higher size/cost ratios. If there arent enough images to be concerned with storage then just go blob for simplicity.
 

[)roi(]

Executive Member
Joined
Apr 15, 2005
Messages
6,282
1. From a MySQL (MyISAM) table with 2 columns: ID (int, auto_increment) and DATA (mediumblob)
2. Using read file.
3. The third test method, "FS", simply loads the image over HTTP directly, without any intermediary scripts.

images.png
The results are the average of running Apache Benchmark 10 times: 10 concurrent requests, 1000 requests per run.

Reference: http://blog.lick-me.org/2013/01/repeat-after-me-mysql-is-not-a-filesystem/

Do your tests if you must, but speed is certainly not going to be a reason; then add all the other caveats and you'll realise why I said its never a good idea.
 
Last edited:
Top