Need to host 1TB of docs online, quickstart guide / pointers needed

Vis1/0N

Expert Member, joined Mar 10, 2009, Durban
I need to host upwards of 1TB of PDF files online, securely. Apart from the initial upload (or shipping a hard drive to the service provider), monthly uploads will be in the region of 15GB.

There will be 5 end users to start, possibly scaling to 50+, and their traffic should be low, no more than 500MB/month. I hope to have a simple ASP.NET site to handle the queries and will scale it up in future.

I don't have much infrastructure on my side, so I need guidance on the most economical option. I have zero experience with cloud services, and don't know whether it would be more feasible to host a machine on my side or to use a cloud services provider. What ballpark cost can I expect for this amount of cloud storage (1TB)?
 
One option is to use OneDrive (1TB free) or OneDrive for Business (unlimited space) if you/users have Office 365.

The challenge is initial upload.
 
You can use this to calculate the cost to host the data on Amazon AWS S3: http://calculator.s3.amazonaws.com/index.html
Select the S3 tab on the left. S3 has built-in redundancy and will allow you to grow your data to petabytes without doing anything special.
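To get a feel for how the bill grows before plugging numbers into the calculator, here is a minimal Python sketch of the arithmetic, assuming a flat per-GB storage price and a flat per-GB outbound-transfer price. The two prices are placeholders, not real S3 or Azure rates, so substitute the figures from the pricing pages. It also assumes the ~500MB/month of user traffic is a total rather than per user, and it ignores request fees and tiered pricing.

```python
# Placeholder prices (assumed): replace with the real per-GB figures
# from the provider's pricing page or calculator.
PRICE_PER_GB_MONTH = 0.03   # storage, USD per GB-month (hypothetical)
EGRESS_PER_GB = 0.09        # outbound transfer, USD per GB (hypothetical)

INITIAL_GB = 1000.0         # ~1TB initial upload
GROWTH_GB_PER_MONTH = 15.0  # monthly additions
EGRESS_GB_PER_MONTH = 0.5   # ~500MB/month of downloads (assumed total, not per user)


def stored_gb(month: int) -> float:
    """Data held in storage during a given month (month 0 = right after the initial upload)."""
    return INITIAL_GB + GROWTH_GB_PER_MONTH * month


def monthly_cost(month: int,
                 price_per_gb: float = PRICE_PER_GB_MONTH,
                 egress_per_gb: float = EGRESS_PER_GB) -> float:
    """Storage plus outbound-transfer cost for one month, ignoring request fees."""
    return stored_gb(month) * price_per_gb + EGRESS_GB_PER_MONTH * egress_per_gb


def first_year_cost(**prices: float) -> float:
    """Sum of the monthly estimates for months 0..11."""
    return sum(monthly_cost(m, **prices) for m in range(12))


if __name__ == "__main__":
    for m in (0, 6, 11):
        print(f"month {m:2d}: {stored_gb(m):7.0f} GB -> ${monthly_cost(m):.2f}")
    print(f"first year: ${first_year_cost():.2f}")
```

With numbers of this shape, the 1TB at rest dominates the bill and the low download traffic barely registers, so the per-GB storage price is the figure worth comparing between providers.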

Same kind of thing for Azure. Prices here: http://azure.microsoft.com/en-us/pricing/details/storage/

I have not used Azure for this, so I can't speak to the details, but S3 is very easy to use and there is a 5GB free tier, so you can play around: http://docs.aws.amazon.com/AmazonS3/latest/gsg/GetStartedWithS3.html
 
If you're planning to handle the files programmatically and give users access via a web page, using object storage in the background really is the way to go. S3, as mentioned above, is one option; Rackspace Cloud Files is another. It offers an API, your files are private unless you choose to make them public (e.g. via CDN), and the pricing is slightly less horrible than Amazon's.
 
??
http://www.rackspace.co.uk/cloud/files/pricing
vs
http://aws.amazon.com/s3/pricing/ & http://azure.microsoft.com/en-us/pricing/details/storage/
 
Azure pricing looks the most appealing; Amazon was decent enough in certain configurations, but those were over my head. I guess I'll have to consult properly with someone experienced. If we had business ADSL (we are on iBurst) I would just put the files on a Server 2003 machine with a quick ASP.NET 3.5 front end and be up and running in Jan 2015 as the boss requests, hardening and growing it over time.

The front end will be quite complex (feature-rich) in future, but making the docs available online is an important check box for some current clients whose IT policy won't allow or provision an R4K NAS onsite (and server space is expensive*).

* 200GB costs over R28K on their servers, whereas the R4K NAS would have been perfect; the client departments really don't need RAID, backups or antivirus protection on the drive.

I'll download some Azure videos and look into it. With Kimsufi and AWS, Windows licensing pushes the cost up, and it will take me a while to get up to speed with PHP/Linux.
 
As far as I know, AWS and Azure pricing for object storage is much the same for the most part. It may differ in small details.

S3 does not involve any Windows licensing cost. S3 is just object storage. You can think of it as an almost infinite database in the cloud where you put and get objects securely. You can put and get them yourself, or the client's browser can get them. With signed URLs, your web page just has a link to the file, signed with a temporary access key, and when the person clicks the link, the file downloads directly from S3 to their PC, phone or whatever.

Here is the link to the SDK for generating the signed URLs: http://aws.amazon.com/sdk-for-net/

Example code: http://docs.aws.amazon.com/AmazonS3/latest/dev/ShareObjectPreSignedURLDotNetSDK.html
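The idea behind a signed URL can be shown with nothing but Python's standard library: the server computes an HMAC over the object path and an expiry timestamp, puts both in the query string, and whatever serves the file recomputes the HMAC and checks the clock before handing the file over. This is a generic sketch of the technique, not S3's own signing scheme (AWS Signature Version 4), which the SDK handles for you; the secret, the host name and the `expires`/`sig` parameter names are all hypothetical.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical server-side secret. S3 instead signs with your AWS secret key
# using its own scheme (Signature Version 4), which the SDK implements.
SECRET = b"replace-with-a-long-random-secret"


def _signature(path: str, expires: int) -> str:
    """HMAC-SHA256 over the object path and the expiry time."""
    message = f"{path}\n{expires}".encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def sign_url(base: str, path: str, ttl_seconds: int, now: Optional[float] = None) -> str:
    """Return a link to `path` that stops working `ttl_seconds` after `now`."""
    issued = int(time.time() if now is None else now)
    expires = issued + ttl_seconds
    query = urlencode({"expires": expires, "sig": _signature(path, expires)})
    return f"{base}{path}?{query}"


def verify_url(url: str, now: Optional[float] = None) -> bool:
    """What the file server would check before streaming the object."""
    parts = urlsplit(url)
    params = parse_qs(parts.query)
    try:
        expires = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False  # missing or malformed parameters
    current = time.time() if now is None else now
    if current > expires:
        return False  # link has expired
    # constant-time comparison so the signature can't be guessed byte by byte
    return hmac.compare_digest(sig, _signature(parts.path, expires))
```

In the page, something like `sign_url("https://files.example.com", "/docs/report.pdf", 3600)` would produce the href for a one-hour link; changing the path or the expiry in the URL invalidates the signature.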

Initially you can upload the files using the AWS console website and then hack together your ASP.NET page with generated signed URLs to fetch the files. Later you can add some other way of uploading files to S3 besides the console.

I would be surprised if it takes you more than an hour or so to get it all working. It is really easy. And you should be able to do all your initial work on the free tier, so it will only start costing you money when you start using it heavily.
 
That is not what the Rackspace pricing page says.

Sorry, I misspoke (or miswrote?). RS charges for storage and bandwidth, but bandwidth is only charged if you use the CDN, and the OP indicated they want their files to be private, so no CDN.
 
I have a virtual file system where the files (mostly TIFs and some PDFs) are stored in password-protected zip files of 100MB-300MB, so after a document search, when a link is selected, the file is extracted, converted to PDF if necessary and then streamed to the browser. This architecture was a reasonable compromise between storing 2 million files on a file system and storing them in a database.

I could convert the TIF files to PDF up front, but with 2+ million files, of which only about 0.5% are ever likely to be accessed, I would prefer to convert on the fly and keep them in 9000-20000 manageable archives.
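The lookup-and-extract step of this design might look like the following minimal Python sketch, assuming a hypothetical index that maps each document ID to the archive it lives in and its member name inside that archive, and a plain dict standing in for wherever the archive bytes come from (disk or S3). Only the requested member is decompressed rather than the whole archive. Note that Python's `zipfile` can decrypt legacy ZipCrypto entries through its `pwd` argument but not AES-encrypted archives, and the TIF-to-PDF conversion itself is left out.

```python
import io
import zipfile
from typing import Dict, Optional, Tuple

# Hypothetical index built when documents are archived:
# doc id -> (archive key, member name inside that archive).
Index = Dict[str, Tuple[str, str]]


def extract_document(doc_id: str, index: Index, archives: Dict[str, bytes],
                     password: Optional[bytes] = None) -> Tuple[str, bytes]:
    """Return (member name, file bytes) for one document from its archive."""
    archive_key, member = index[doc_id]
    with zipfile.ZipFile(io.BytesIO(archives[archive_key])) as zf:
        # pwd handles legacy ZipCrypto only; AES zips need a third-party library
        return member, zf.read(member, pwd=password)


def needs_conversion(member: str) -> bool:
    """TIFs get converted to PDF on the fly; PDFs can be streamed as-is."""
    return member.lower().endswith((".tif", ".tiff"))
```

Keeping the index in the same database as the search metadata means a search hit already knows which single archive to fetch.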
 
OK, that adds some extra complications. In that case you probably want to host the web application that does the extraction and conversion close to the data, so this could be an option: http://aws.amazon.com/net/ using Elastic Beanstalk. You can still store the zip files in S3 and then when one is needed, get it from S3 in your asp.net application, extract, convert and serve.

One option, if you want more reliability, is to put the extracted/converted PDF file back in S3 in a different bucket with an expiry time and serve that link to the user. The user can then download the file at whatever speed their internet connection can handle, even if it takes a couple of hours, without tying up resources in your ASP.NET application, and the expiry time means the file will be cleaned up automatically. Again, you can sign the URL as mentioned above so that only that user can download the file you generated for them.
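The "convert once, park the result with an expiry, hand out a link" pattern can be sketched in Python with an in-memory stand-in for the second bucket. The `ExpiringStore` class and the `converted/{doc_id}.pdf` key layout are hypothetical; a real bucket would rely on the provider's expiry rules instead of the `sweep` method, and the returned key would be turned into a signed URL rather than served directly.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class ExpiringStore:
    """In-memory stand-in for a 'converted files' bucket whose objects expire (hypothetical)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.time) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self._objects: Dict[str, Tuple[float, bytes]] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = (self.clock() + self.ttl, data)

    def get(self, key: str) -> Optional[bytes]:
        entry = self._objects.get(key)
        if entry is None or self.clock() > entry[0]:
            return None  # missing, or past its expiry time
        return entry[1]

    def sweep(self) -> int:
        """Delete expired objects, as a bucket expiry rule would; returns how many were removed."""
        now = self.clock()
        expired = [key for key, (expires_at, _) in self._objects.items() if now > expires_at]
        for key in expired:
            del self._objects[key]
        return len(expired)


def ensure_converted(doc_id: str, store: ExpiringStore,
                     convert: Callable[[str], bytes]) -> str:
    """Convert a document once and return the key to sign; reuse the copy until it expires."""
    key = f"converted/{doc_id}.pdf"
    if store.get(key) is None:
        store.put(key, convert(doc_id))
    return key
```

Because the web application only returns a key (and then a signed link), the slow part of the download happens between the user and the storage service, not inside the ASP.NET process.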
 
I would get Google Drive with the 1TB of space for $9.99.

For me it's the best file cloud service out there.
 