Local distributed squid cluster

andres101 · Apr 22, 2009

Here is the idea: everyone willing to participate opens up their squid for cache queries (on local bandwidth), without allowing requests to be forwarded. Not sure about the terminology, but (I think) you configure the other caches as peers (siblings, in squid terms) and not as parents/children.

When you browse, your cache queries the peer/neighbour caches to see if the content is cached and downloads it from a peer, if available.

Things to consider:

Scalability
This setup should be scalable since a new "server" is added for every "client".

Bandwidth
How much bandwidth does the ICP protocol consume for the average user? I think that Cache Digests are the way to go.
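
To illustrate why a digest can be cheaper than per-request queries: a cache digest is essentially a Bloom filter, a compact bit array that summarises which URLs a cache holds, so a peer can check it locally without sending a query. The sketch below is a toy version only; the class name, bit-array size and hash scheme are assumptions made for illustration and are not squid's actual digest format.

```python
import hashlib


class CacheDigest:
    """Toy Bloom-filter summary of cached URLs (illustrative, not squid's format)."""

    def __init__(self, size_bits=8192, num_hashes=4):
        # size_bits is assumed to be a multiple of 8 so it maps onto whole bytes.
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, url):
        # Derive num_hashes bit positions from salted SHA-256 hashes of the URL.
        for i in range(self.num_hashes):
            h = hashlib.sha256(f"{i}:{url}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.size_bits

    def add(self, url):
        for pos in self._positions(url):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, url):
        # False means "definitely not cached"; True means "probably cached".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(url)
        )


digest = CacheDigest()
digest.add("http://example.org/big-file.iso")
print(digest.might_contain("http://example.org/big-file.iso"))
print(len(digest.bits), "bytes to ship to each peer")
```

The trade-off is that a digest can give false positives (a peer is asked for something it no longer has), but it never misses an entry that was added, and its size is fixed no matter how many lookups are made against it.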

Registration
Everyone wishing to participate should register on a central server. The server will keep a list of (dynamic) domain names that the caches can refresh on a daily basis. If someone provides falsified information, or their cache is not configured correctly, the server will notify them and remove them from the list.
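
A minimal sketch of what such a registry could look like, assuming peers re-announce their hostname periodically and entries that are not refreshed within a day are dropped. The class, method names and hostnames are hypothetical; nothing like this exists in squid itself.

```python
import time


class PeerRegistry:
    """Hypothetical central list of peer hostnames with expiry."""

    def __init__(self, max_age_seconds=24 * 60 * 60):
        self.max_age_seconds = max_age_seconds
        self._last_seen = {}  # hostname -> timestamp of last refresh

    def refresh(self, hostname, now=None):
        # Called when a cache registers or re-announces itself.
        self._last_seen[hostname] = time.time() if now is None else now

    def remove(self, hostname):
        # Called when a peer is found to be misconfigured or dishonest.
        self._last_seen.pop(hostname, None)

    def active_peers(self, now=None):
        # Only peers refreshed within max_age_seconds are handed out.
        now = time.time() if now is None else now
        return sorted(
            host
            for host, seen in self._last_seen.items()
            if now - seen <= self.max_age_seconds
        )


registry = PeerRegistry()
registry.refresh("cache-one.example.org")
registry.refresh("cache-two.example.org")
print(registry.active_peers())
```

Each cache could then fetch the active list once a day and regenerate its own peer configuration from it.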

Security
Configure either iptables or squid to only accept queries from the list of peers. How safe is it to open up squid to semi-trusted peers? SSL should possibly be used.
We also need to consider the possibility and consequences of someone injecting malicious content into their cache (Upside-Down-Ternet).

Abuse
As always, people will try to abuse the service. The originating IP address will be recorded in the logs. Is this enough or should more be required at registration?

Did I miss anything?

Anyone willing to run a "beta" test or help out with the squid config? We should start off with just a couple of caches and expand slowly, taking note of the bandwidth/performance characteristics.

davemc · Apr 22, 2009

I must pay for bandwidth for others to use my cached content?
Erm, no.

andres101 · Apr 22, 2009

davemc said:
I must pay for bandwidth for others to use my cached content?
Erm, no.

And they pay for your use...

A 10GB IS prepaid local-only ADSL account costs R70, which is roughly 0.7 cents per MB. International bandwidth costs 10 times that. If everyone has a 1:1 upload/download ratio, you should get a monetary saving of close to 90% (on cached content).
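
The arithmetic can be checked with a short sketch. It assumes 1 GB = 1000 MB, takes the "10 times" international price at face value, and ignores the local bandwidth you spend serving your own cache to others; the function names are made up for illustration.

```python
def cost_per_mb(price_rand, cap_gb, mb_per_gb=1000):
    """Price per MB in rand for a prepaid account with the given cap."""
    return price_rand / (cap_gb * mb_per_gb)


def saving_fraction(local_cost, international_cost):
    """Fraction saved when a MB is fetched locally instead of internationally."""
    return 1 - local_cost / international_cost


local = cost_per_mb(70, 10)   # R70 for a 10GB local-only account
international = 10 * local    # assumption from the post: 10x the local price
saving = saving_fraction(local, international)

print(f"local:         {local * 100:.2f} cents per MB")
print(f"international: {international * 100:.2f} cents per MB")
print(f"saving on cached content: {saving:.0%}")
```

The saving only applies to content that is actually found in a peer's cache; everything else is still fetched at the international price.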

davemc · Apr 22, 2009

Aaaah. Excuse my ignorance.

Let me get this right:
- my cache server will be distributing a list of URLs that it has cached to all my subscriber cache servers, which will have an impact on my available "upload" bandwidth. I must just ensure that I do not have too many subscribers, otherwise my distribution of what is available will clog up my service.
- my subscribers will also be sending me URLs that they have cached, and I will have to somehow optimise which one I must grab content from when I need it.
- I will log all content issued to my subscribers, and charge them for the content that they receive from me, obviously offsetting it against the content that they send me.

Berry berrry interesting. I will have to take a much closer look at squid's capabilities.

How do we test this?

andres101 · Apr 22, 2009

davemc said:
Aaaah. Excuse my ignorance.

Excused

davemc said:
Let me get this right:
- my cache server will be distributing a list of URLs that it has cached to all my subscriber cache servers, which will have an impact on my available "upload" bandwidth. I must just ensure that I do not have too many subscribers, otherwise my distribution of what is available will clog up my service.

The more peers you have, the better the chance of finding the content locally. What impact the number of peers has on your bandwidth can only be determined by running a trial, but I think that limiting the bandwidth to 20% of your upload capacity should be acceptable for both ends.

davemc said:
- my subscribers will also be sending me URLs that they have cached, and I will have to somehow optimise which one I must grab content from when I need it.

Squid does all the work for you. It automatically determines which peer is closest and what content is available at which peer.

davemc said:
- I will log all content issued to my subscribers, and charge them for the content that they receive from me, obviously offsetting it against the content that they send me.

I don't like the idea of money having to change hands, it just complicates things. But I do understand that there is a need to ensure that your peers do not download much more than they upload.
Perhaps the central server could keep track of each person's upload/download ratio and somehow penalise the heavy downloaders or reward the heavy uploaders. It could possibly give a higher priority to peers with a low ratio, meaning that if content is available in more than one cache, the content will be fetched from the peer with the lowest ratio.
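
One way the ratio idea could work, sketched under assumptions: the ratio is taken as uploaded/downloaded, and when several peers hold the same object the fetch goes to the one with the lowest ratio, so that peers who have uploaded least get to catch up. The `Peer` class, its field names and the figures are hypothetical; squid does not do this out of the box.

```python
from dataclasses import dataclass


@dataclass
class Peer:
    name: str
    uploaded_mb: float
    downloaded_mb: float

    @property
    def ratio(self):
        # Upload/download ratio. A peer that has downloaded nothing owes
        # nothing, so it is treated as the last choice to fetch from.
        if self.downloaded_mb == 0:
            return float("inf")
        return self.uploaded_mb / self.downloaded_mb


def choose_peer(holders):
    """Pick the peer with the lowest upload/download ratio, or None if empty."""
    return min(holders, key=lambda p: p.ratio, default=None)


holders = [
    Peer("alpha", uploaded_mb=100, downloaded_mb=50),   # ratio 2.0
    Peer("bravo", uploaded_mb=10, downloaded_mb=100),   # ratio 0.1
    Peer("charlie", uploaded_mb=0, downloaded_mb=0),    # has not downloaded yet
]
print(choose_peer(holders).name)
```

Fetching from the lowest-ratio peer pushes its ratio up over time, which nudges everyone towards 1:1 without any money changing hands.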

davemc said:
Berry berrry interesting. I will have to take a much closer look at squid's capabilities.

this is a good place to start.

davemc said:
How do we test this?

For the caches to reach each other, you need either a static IP or a DynDNS (or similar) account. Otherwise there is no way for my cache to reach yours.
Install squid and configure it with the correct peers.

My personal preference would be to run two instances of squid. One for myself (local+international) and one for my peers (international only). There is no point in me serving cached content over local bandwidth if the content they are looking for is already available locally.

Nickste · Apr 23, 2009

This is an interesting idea.

What about implementing it with a Hamachi (www.hamachi.cc) type VPN network? That way everyone has a "static" Hamachi IP address (allowing for easier user tracking) and you can control access to the cluster more easily. The squid servers will obviously then act as gateways out of the Hamachi network. All that said, I haven't used Hamachi since it was bought by LogMeIn, so I'm not sure what it's like any more.

I think that the squid configuration file will be very important in this kind of project's success. Getting a setup that will cache things like flash files (youtube), etc. is tricky, but would obviously be of great benefit.

Let me know if you need help testing - I've got a squid box running at the moment (unfortunately only on a 384 line).

Cheers,
Nick

davemc · Apr 23, 2009

Yeah, I only have a 512 line, piggybacking off my landlord's wireless router. I'm still waiting for Telkom to install my line, and I've only been waiting for 5 weeks now, so it's still early days.

I have an Ubuntu machine that is my "server" and is running squid, which is configured for YouTube caching and can be dedicated to this, but I need the line to be installed properly before I can proceed.

I don't think that Hamachi is a good idea; I don't like the extra overhead all that tunnelling will involve. We'll need to control access with the squid configuration file only.

I also think that, initially, we should not differentiate between local and international caching because that just complicates matters.

andres101 · Apr 23, 2009

I'll have a look at Hamachi, but I doubt it is necessary. Domain names can be used in the squid conf.

I've been messing around with the squid config and got it working, but then I changed something and now I cannot get it working again.

I'll try again later this week or over the weekend.

Nickste said:
Let me know if you need help testing - I've got a squid box running at the moment (unfortunately only on a 384 line).

As soon as I have a working config, I'll let you know.

My cache access ratio is 60%, but my cache transfer ratio is only 17% (amount of bandwidth saved). I'm hoping that this will increase as the number of caches increases.
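
The gap between the two figures is what you would expect when most hits are small objects and most misses are large ones. A rough sketch of how the two ratios differ, using made-up sample requests (the function name and the sizes are assumptions, not taken from any real log):

```python
def hit_ratios(requests):
    """Return (request hit ratio, byte hit ratio) for (was_hit, size_bytes) pairs."""
    total_requests = hit_requests = 0
    total_bytes = hit_bytes = 0
    for was_hit, size_bytes in requests:
        total_requests += 1
        total_bytes += size_bytes
        if was_hit:
            hit_requests += 1
            hit_bytes += size_bytes
    if total_requests == 0:
        return 0.0, 0.0
    request_ratio = hit_requests / total_requests
    byte_ratio = hit_bytes / total_bytes if total_bytes else 0.0
    return request_ratio, byte_ratio


# Made-up sample: six small cache hits and four larger misses.
sample = [(True, 10_000)] * 6 + [(False, 75_000)] * 4
request_ratio, byte_ratio = hit_ratios(sample)
print(f"access ratio:   {request_ratio:.0%}")
print(f"transfer ratio: {byte_ratio:.0%}")
```

Only the transfer (byte) ratio translates into bandwidth saved, which is why it is the number to watch as more caches join.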

davemc · Apr 29, 2009

Yay!

I've heard from Telkom: I'll get my telephone line installed on the 10th of July 2009.
About 4 days thereafter my ADSL will be up and ready.

Please be patient ...
