Storing large files

CheekyC

I need to store lots (eventually hundreds of thousands) of large (2-200 MB) files as part of an app. The files need to be encrypted, but that's an aside. Would you store them on the server's file system and link to them from the DB, or store them in some kind of non-SQL database?
 
Well, a "filesystem" is sort of a non-SQL database :-) So yes, store the files on the filesystem and keep a reference to each file in a SQL database. Seeing as the files need to be encrypted, I assume it's sensitive info, so only the app should have access to the path where the files are stored. I.e. if it's a web server, only the server side of the web server should be allowed to get to the files, so that someone can't just go to http://yourwebserver.com/files/encryptedfile.dat to download one.
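A minimal sketch of that layout, in Python for brevity (the same idea carries over to C#). It assumes a hypothetical `files` table and a storage directory that sits outside the web root; SQLite stands in for whatever SQL database is actually used, and the table and function names are made up for illustration.

```python
import shutil
import sqlite3
import uuid
from pathlib import Path


def open_db(path=":memory:"):
    """Open the metadata DB and create the (hypothetical) files table if needed."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        "id TEXT PRIMARY KEY, original_name TEXT NOT NULL, size_bytes INTEGER NOT NULL)"
    )
    return db


def store_file(db, storage_root, original_name, stream):
    """Stream an upload to a private directory and record it in the DB."""
    storage_root = Path(storage_root)
    storage_root.mkdir(parents=True, exist_ok=True)
    file_id = uuid.uuid4().hex          # on-disk name never comes from user input
    target = storage_root / file_id
    with open(target, "wb") as out:
        shutil.copyfileobj(stream, out)  # copies in chunks, so 200 MB files are fine
    db.execute(
        "INSERT INTO files (id, original_name, size_bytes) VALUES (?, ?, ?)",
        (file_id, original_name, target.stat().st_size),
    )
    db.commit()
    return file_id


def lookup_file(db, storage_root, file_id):
    """Return (original_name, path on disk) for a stored file."""
    row = db.execute(
        "SELECT original_name FROM files WHERE id = ?", (file_id,)
    ).fetchone()
    if row is None:
        raise KeyError(file_id)
    return row[0], Path(storage_root) / file_id
```

The app would do its access checks first and then stream the file from the path returned by `lookup_file`, so the storage directory never needs to be reachable by URL.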
 
My advice would be to look at Amazon's storage services (S3) or Azure Storage, due to the file count and the file sizes you are looking at. Then allow your application to connect to it.
My idea, simply put, would be:
Azure Storage
Azure SQL Database
Azure Website
It will all scale very easily.
 

OK, will look at this. Is international hosting of large files the right way to go, considering international bandwidth, hard capping, etc.? The app will be used predominantly in SA.
Yes, from a cost point of view it will be very cheap. For straight file system storage I was worried about the max number of files in a directory.
 
It is just as secure as hosting it locally, if not more so. Azure (not sure about Amazon) is entirely POPI compliant.
Hard capping and slow international bandwidth are relatively old issues to worry about. Is your app going to be commercial or consumer based?
 

Point taken. Not sure I understand your question, but the site will be an e-commerce type site used by the general public and also by other stakeholders.
 
If you're using SQL Server, you can use FILESTREAM access to store the files outside SQL and protect the folder with Encrypting File System (EFS).
Also, if you are using Windows Server 2012 or newer, you should be able to use Data Deduplication to reduce the storage size of the files.
Please provide more details about the app and what technologies you prefer to use.
 

In its basic form, users can upload files to the web app for later use by other parties. I would prefer to use C# for the backend, as this is where my skills lie. I have also invested lots of time in a framework that will help develop the app much quicker. From a DB point of view, not likely MS as it's very expensive, but I will keep an open mind. Probably Firebird or PostgreSQL.
 
See this discussion as well which is similar: http://mybroadband.co.za/vb/showthr...-docs-online-quickstart-guide-pointers-needed

From your description, Amazon S3 or Azure Object Storage sounds like a perfect match. Much better than some kind of server file system where you constantly have to worry about running out of space and crashing your app. On Amazon you can then use DynamoDB or RDS (PostgreSQL etc.) to track the objects. All of these have a free tier you can use for your testing phase: http://aws.amazon.com/free/ Check the Azure website for their options, which I know less about.

As you say, if you store the files in a normal file system, you have to spread them over directories to avoid hitting file limits, making for extra complications. With S3, this is also not an issue.
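If you do go the plain file system route, one common way to spread files over directories is to derive the subdirectory from a hash of the file's ID. A rough sketch in Python (the function name, the two-level layout, and the two-character width are all assumptions for illustration):

```python
import hashlib
from pathlib import Path


def sharded_path(root, file_id, levels=2, width=2):
    """Map a file ID to root/xx/yy/file_id, where xx and yy come from its SHA-256."""
    digest = hashlib.sha256(file_id.encode("utf-8")).hexdigest()
    # Take `levels` consecutive slices of `width` hex characters as directory names.
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return Path(root).joinpath(*parts, file_id)
```

With two levels of two hex characters there are 65,536 possible directories, so even 500,000 files average fewer than ten per directory. The path is computed from the ID alone, so nothing extra needs to be stored in the DB.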

Also, with S3, you can serve users signed URLs to time-limited decrypted copies that are also stored in S3, for them to download, rather than putting serious strain on the app's web server by serving the decrypted copies directly from memory. (If you are serving decrypted files.)
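To illustrate the idea behind a time-limited signed URL, here is a generic HMAC-based sketch in Python. This is NOT the scheme S3 itself uses; with S3 you would let the AWS SDK generate the presigned URL for you. The secret, the URL layout, and the function names are assumptions, and the base URL is assumed to have no path of its own.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"replace-with-a-real-secret"  # hypothetical server-side secret


def _signature(key, expires):
    """HMAC over the object key and its expiry time."""
    message = f"{key}:{expires}".encode("utf-8")
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def sign_url(base_url, key, ttl_seconds, now=None):
    """Build a download URL that is only valid for ttl_seconds."""
    issued = time.time() if now is None else now
    expires = int(issued) + ttl_seconds
    query = urlencode({"expires": expires, "sig": _signature(key, expires)})
    return f"{base_url}/{key}?{query}"


def verify_url(url, now=None):
    """Check that the signature matches and the link has not expired."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False
    key = parsed.path.lstrip("/")
    current = time.time() if now is None else now
    return hmac.compare_digest(sig, _signature(key, expires)) and current <= expires
```

Because the expiry is part of the signed message, a user cannot extend the link by editing the `expires` parameter, and cannot point it at a different file by changing the key.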

Other links for S3 and C#: http://aws.amazon.com/sdk-for-net/ & http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/create_deploy_NET.html
 
I'll throw in my two cents. Dealt with this issue before :)

Consider the following:
- For how long do you need to store these files?
- How much traffic will the file-storing feature of your web application get? (Seeing as it's e-commerce, it would likely be a lot of people.)
- File size limitations
- Additional code to write, possibly for 1) scanning the files for malicious content (such as malware), 2) encrypting the files, 3) compressing the files and 4) streaming the files to a cloud-based storage instance or file system

- File systems have a disadvantage considering the following:
> Costs involved in storing the files on a file system (which basically acts as a non-SQL solution)
> Amount of reads/writes to these files
> Proper handling of file accessibility
> Proper handling of file in-use state (making sure the system you are using doesn't have a document open while you attempt to write to it again)
> Proper handling of file uniqueness (sometimes just having the file name won't be good enough - how will you make sure two files with the same name don't overwrite each other? What if two files have the exact same content but different file names?)
> Impact on disk of storing files of different sizes, and how you would access the files (for instance, searching for exact file names or looping through batches of files - here we go pro. Understand branch prediction and how it could influence the performance of your web application)
> Finally, usage of ILLEGAL names for files / directories, should you choose not to have your files uniquely identified by a GUID or similar mechanism (which will require some serious coding to ensure you ALWAYS retrieve the right files). Illegal names include, for example, CON, PRN or NUL without file extensions.
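For the naming points in the list above, a small Python sketch: generate a GUID for the on-disk name, and reject uploads whose original name collides with a Windows reserved device name. The set below is a commonly cited subset rather than an authoritative list, and both function names are made up for illustration.

```python
import uuid

# Commonly cited Windows reserved device names (assumed subset, not exhaustive).
WINDOWS_RESERVED = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)


def is_reserved_name(filename):
    """True if the part before the first dot is a reserved device name."""
    stem = filename.split(".")[0].strip().upper()
    return stem in WINDOWS_RESERVED


def storage_name():
    """Random 32-character hex name; the user's file name lives only in the DB."""
    return uuid.uuid4().hex
```

If files are always written under `storage_name()`, the reserved-name check only matters at the point where a file is handed back to a user under its original name.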

For your situation, I would attempt to find hosting for MySQL databases (since you ruled out using MSSQL), or have a cluster of MSSQL Express databases hosted somewhere (a cluster, since there is a size limitation for this type of database). After scanning, encrypting and compressing the files, you might want to consider temporarily caching them and writing them to your DB in batches (considering the size of your files) to reduce the amount of I/O read/write time and possible network load. You may also want a scanning mechanism to ensure file content uniqueness (answered beautifully by this post - filtering out exact same files), with a second table containing pointers to these files.
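The content-uniqueness idea with a second table of pointers could look roughly like this in Python. SQLite is used only as a stand-in, the table and function names are hypothetical, and only the bookkeeping is shown; where the bytes actually live is left open.

```python
import hashlib
import sqlite3


def open_db():
    """Two tables: one row per unique content, one row per user-visible file."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE blobs (sha256 TEXT PRIMARY KEY, size_bytes INTEGER NOT NULL);
        CREATE TABLE file_refs (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            sha256 TEXT NOT NULL REFERENCES blobs(sha256)
        );
        """
    )
    return db


def content_hash(stream, chunk_size=1024 * 1024):
    """Hash a stream in chunks so large files never sit fully in memory."""
    h = hashlib.sha256()
    size = 0
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        h.update(chunk)
        size += len(chunk)
    return h.hexdigest(), size


def register(db, name, stream):
    """Record a file; return True only if its content has not been seen before."""
    digest, size = content_hash(stream)
    known = db.execute("SELECT 1 FROM blobs WHERE sha256 = ?", (digest,)).fetchone()
    if known is None:
        db.execute("INSERT INTO blobs (sha256, size_bytes) VALUES (?, ?)", (digest, size))
    db.execute("INSERT INTO file_refs (name, sha256) VALUES (?, ?)", (name, digest))
    db.commit()
    return known is None
```

One caveat: if each file is encrypted with its own key or IV before hashing, identical uploads will no longer hash the same, so the hash would need to be taken before encryption.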

Heed some of the advice on the posts before mine - Amazon and Azure are great platforms to work with. If costs are an issue, you would likely find an alternative to hosting Here

Hope this advice helps :)

**Edit**
Never worked with Amazon or Azure before, so I don't know the costs involved there. Had the luck of working with hosted MSSQL instances most of the time.
 
Given the file sizes, I do not think any kind of database is a good idea. Databases are not optimized for that kind of thing. I think CheekyC has to look at object storage (Amazon S3 or Azure Object Storage) or file system based solutions. To avoid the long-term admin work, I would recommend object storage over file system storage.

Then obviously a database is ideal for tracking the file metadata and for that part of the system pretty much any database will work.
 

We store our large files outside of the DB, with metadata about the files in the DB. Users can request files from a screen in our corporate app, which copies the requested file to a "Dropbox" location. It's based on a scheduler which only copies the file if there is space at the destination. Users get an email that the transfer is pending if there isn't space. As soon as the file has been transferred, they get an email that it's ready and will be removed after 24 hrs. Files in the drop location are deleted after 24 hrs.

Users need to be set up and approved for it. We store metadata about all requests, and all users are made aware that it is reviewed by the dreaded infosec dept.

This system works well. Stable, efficient. Been in place since the 80s, I believe. It's a simple Unix-based system that uses a combo of C++ and shell scripts to do its thing. Very cheap :) no frills :)
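The two scheduler rules described here (copy only if there is space, delete after 24 hrs) can be sketched in a few lines of Python. This is an illustration of the idea, not the actual system; the function names are made up and the email notifications are left out.

```python
import shutil
import time
from pathlib import Path

MAX_AGE_SECONDS = 24 * 60 * 60  # files in the drop location live for 24 hours


def copy_if_space(src, drop_dir):
    """Copy src into drop_dir only if the destination has room; report success."""
    src, drop_dir = Path(src), Path(drop_dir)
    if src.stat().st_size > shutil.disk_usage(drop_dir).free:
        return False  # caller would queue the request and send the "pending" email
    # copyfile copies data only, so the copy's mtime is the time of transfer
    shutil.copyfile(src, drop_dir / src.name)
    return True


def purge_old(drop_dir, now=None):
    """Delete files older than 24 hours and return their names."""
    now = time.time() if now is None else now
    removed = []
    for path in Path(drop_dir).iterdir():
        if path.is_file() and now - path.stat().st_mtime > MAX_AGE_SECONDS:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)
```

A cron job (or any scheduler) would call `purge_old` periodically and retry pending copies afterwards, since purging is what frees up space.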
 
Great info, thanks. Tend to agree with you.

Awesome info and perspective, thanks!
 