My answers to a web hosting questionnaire for a certain position are below. If anyone is interested, I'd welcome comments on potential improvements. My experience with this kind of setup and load is slight, so I'd appreciate feedback from the more experienced gurus on this forum.
=============
> Question: As part of your job you have been asked to relocate one of our
> high traffic websites (at peak times 600 req/s) to a new server farm. This
> new server farm will span 10 servers which can be of any configuration of
> your choice.
>
> The website itself is written in PHP and uses MySQL as a backend. The
> content itself is mostly text with a large amount of concurrent downloads.
>
> Please describe how you would arrange this server farm and how you would
> handle the following issues :
>
To start, I want to point out that some data points I found myself
wanting are missing from the available facts. For instance, questions
that came to mind while going through this were: How long does each
request take? What is the average request size? How large is the
database? How dynamic is the code, and how cacheable is it? Where are
sessions stored, and how does that affect load-balancing?
In any event, I've made decisions based on the information provided
and whatever reasonable assumptions I could make for the rest. Where
necessary, I've tried to explain these in my comments.
I'll assume that the 10 servers include firewalls, load balancers, and
database servers, so I'd begin by laying them out as 2 firewalls, 2
load-balancing nodes, 4 web hosts, and 2 database servers. All servers
in the farm would be Linux-based (if it isn't obvious), because Linux
provides all the tools for the task at hand and is well suited to
running MySQL.
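As a sanity check on 4 web hosts, here's the back-of-envelope arithmetic as a small Python sketch. It assumes requests spread evenly across hosts. That is only an assumption, since per-request cost wasn't given:

```python
# Back-of-envelope load per web host for the proposed 2/2/4/2 layout.
# Assumes requests spread evenly; the real spread depends on request cost.
PEAK_RPS = 600  # peak figure from the question
WEB_HOSTS = 4   # web hosts in the proposed layout


def per_host_rps(peak_rps: float, hosts: int, failed: int = 0) -> float:
    """Requests/sec each remaining web host must serve at peak."""
    healthy = hosts - failed
    if healthy <= 0:
        raise ValueError("no healthy web hosts left")
    return peak_rps / healthy


if __name__ == "__main__":
    print(per_host_rps(PEAK_RPS, WEB_HOSTS))            # 150.0 with all hosts up
    print(per_host_rps(PEAK_RPS, WEB_HOSTS, failed=1))  # 200.0 with one host down
```

So with one web host down, each remaining host would need to serve about 200 req/s at peak. That is the figure I'd size them against.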
The web hosts would run nginx for its performance benefits, with
PHP-FPM behind it (nginx doesn't execute PHP itself; it passes PHP
requests to PHP-FPM over FastCGI). The hardware spec for these nodes
needn't be anything special, but I would aim for at least 2-4GB of
memory and 4 cores.
The MySQL servers would each get roughly 1.5x the database size in
memory, so the entire dataset (however large it is) fits in RAM with
headroom. They would also get at least two CPU sockets with as many
cores as can be afforded to handle the concurrent load. If cost is not
an issue, I'd fit PCIe SSD cards for ultra-fast I/O; failing that,
regular SSDs would suffice. I would also make sure we buy hardware that
can take as much memory as possible for future upgrades.
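To make the 1.5x rule concrete, here's a hypothetical sizing helper. The 16 GB rounding step is arbitrary and only for illustration. Assuming InnoDB tables, the extra RAM only helps if innodb_buffer_pool_size is set to use most of it:

```python
import math


def mysql_ram_gb(db_size_gb: float, factor: float = 1.5, step_gb: int = 16) -> int:
    """Suggested RAM per MySQL server so the whole dataset fits in memory.

    factor: the 1.5x headroom rule of thumb from the text.
    step_gb: hypothetical rounding step, just to land on a buyable size.
    """
    needed = db_size_gb * factor
    return math.ceil(needed / step_gb) * step_gb


if __name__ == "__main__":
    print(mysql_ram_gb(40))  # 40 GB database -> 60 GB needed -> 64
```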
The firewalls and load-balancing nodes would need decent processing
power and network bandwidth, since I wouldn't want them to bottleneck
anything or fall over during a peak period or DoS attack. Memory and
disk are less important on the firewalls, but for the load-balancers
I'd try to get at least 4-8GB of memory and SSD disks (for reasons
described later).
>
> - Load balancing
>
For load-balancing I'd use nginx on the 2 nodes dedicated to the task,
with a method like least_conn, and perhaps add session persistence if
the site's functions require it. Nginx was chosen for its performance
(600 req/s is well within its capabilities) and for features like
caching and SSL offloading.
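To illustrate what least_conn does, here's a simplified Python model of the selection step. The Backend class and host names are made up, and this isn't nginx's actual code:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    active: int = 0  # requests currently in flight on this web host


def pick_least_conn(backends: list[Backend]) -> Backend:
    """Return the backend with the fewest active connections.

    Ties go to the first backend in list order. nginx's least_conn also
    takes server weights into account and uses weighted round-robin among
    tied servers; both are left out of this sketch.
    """
    if not backends:
        raise ValueError("empty upstream pool")
    return min(backends, key=lambda b: b.active)


if __name__ == "__main__":
    pool = [Backend("web1", 3), Backend("web2", 1), Backend("web3", 1), Backend("web4", 2)]
    chosen = pick_least_conn(pool)
    chosen.active += 1  # the new request now counts against web2
    print(chosen.name)  # web2
```

Compared with plain round-robin, this keeps a host that is stuck on a few slow requests (large downloads, say) from receiving more than its share.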
> - Server failover
>
I would have a virtual IP shared between the load-balancing nodes
using keepalived, set up so that one is primary and the other is a
standby. DNS for the site would resolve to that single floating IP.
I've read that nginx as a load-balancer can handle many thousands of
requests/sec, so a single node should comfortably cater for current
peaks and growth.
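Here is a toy model of the keepalived failover described above, in which the highest-priority live node holds the VIP. The class and node names are hypothetical, and real VRRP has timing, preemption and split-brain details this ignores:

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class LBNode:
    name: str
    priority: int       # analogous to keepalived's per-instance "priority"
    alive: bool = True  # still sending VRRP advertisements


def vip_holder(nodes: list[LBNode]) -> Optional[LBNode]:
    """Node that should own the floating IP: the highest-priority live node.

    Simplified: ignores advertisement timing, preemption settings and
    split-brain cases that real VRRP/keepalived has to deal with.
    """
    live = [n for n in nodes if n.alive]
    return max(live, key=lambda n: n.priority) if live else None


if __name__ == "__main__":
    lb1, lb2 = LBNode("lb1", 150), LBNode("lb2", 100)
    print(vip_holder([lb1, lb2]).name)  # lb1 (primary)
    lb1.alive = False
    print(vip_holder([lb1, lb2]).name)  # lb2 takes over the VIP
```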
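I'd ensure the load-balancers use health checks to detect failed web
hosts and remove them from the pool. (Open-source nginx does this
passively via max_fails/fail_timeout on each upstream server; active
health checks need NGINX Plus or a third-party module.)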
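Here's a sketch of passive health checking along the lines of nginx's max_fails/fail_timeout parameters. It is simplified, and the class is hypothetical rather than nginx's implementation:

```python
import time
from collections import deque
from typing import Optional


class PassiveHealth:
    """Tracks one upstream server, loosely following nginx's
    max_fails/fail_timeout semantics: max_fails failures within
    fail_timeout seconds take the server out of rotation for
    fail_timeout seconds."""

    def __init__(self, max_fails: int = 1, fail_timeout: float = 10.0):
        self.max_fails = max_fails
        self.fail_timeout = fail_timeout
        self.failures: deque = deque()
        self.down_until = float("-inf")

    def record_failure(self, now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        self.failures.append(now)
        # Forget failures that fall outside the window.
        while now - self.failures[0] > self.fail_timeout:
            self.failures.popleft()
        if len(self.failures) >= self.max_fails:
            self.down_until = now + self.fail_timeout
            self.failures.clear()

    def available(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        return now >= self.down_until
```

The load-balancer would skip any web host whose `available()` is false and retry the request on the next one.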
MySQL failures would be handled with something like keepalived or
Heartbeat between the MySQL hosts, again with a virtual IP, promoting
the slave to master on failover.
Firewalls would be iptables-based and set up with keepalived and
conntrackd for HA and failover.
> - Security
>
First step: enable SSL, and enforce it (redirect HTTP to HTTPS) if feasible.
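Adding fail2ban to the firewalls would allow blocking attackers by IP
based on custom filters, which I like. This could hold off brute-force
and hacking attempts and smaller (D)DoS problems, though a large
distributed attack would likely need upstream mitigation as well.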
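To show the idea behind a fail2ban jail (maxretry failures within findtime seconds leads to a ban), here's a simplified Python version. The log format and function are hypothetical; real fail2ban filters define a failregex per service:

```python
import re
from collections import defaultdict
from typing import Iterable, Set

# Hypothetical log format for illustration only.
FAIL_RE = re.compile(r"^(?P<ts>\d+) FAILED login from (?P<ip>[\d.]+)$")


def ips_to_ban(lines: Iterable[str], maxretry: int = 3, findtime: int = 600) -> Set[str]:
    """IPs with at least `maxretry` failures inside some `findtime`-second window."""
    hits = defaultdict(list)
    for line in lines:
        m = FAIL_RE.match(line)
        if m:
            hits[m["ip"]].append(int(m["ts"]))
    banned = set()
    for ip, times in hits.items():
        times.sort()
        # Check every run of `maxretry` consecutive failures.
        for i in range(len(times) - maxretry + 1):
            if times[i + maxretry - 1] - times[i] <= findtime:
                banned.add(ip)
                break
    return banned


if __name__ == "__main__":
    log = [
        "100 FAILED login from 203.0.113.5",
        "200 FAILED login from 203.0.113.5",
        "300 FAILED login from 203.0.113.5",
        "100 FAILED login from 198.51.100.7",
        "5000 FAILED login from 198.51.100.7",
    ]
    print(ips_to_ban(log))  # {'203.0.113.5'}
```

In the real setup, the banned IPs would become iptables DROP rules on the firewalls for bantime seconds.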
Nginx on the load-balancers also has a request-limiting module
(limit_req) to offer some internal protection against flooding.
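The nginx documentation says limit_req uses the "leaky bucket" method. Below is a minimal model of its accept/reject decision for a single client key. The class is hypothetical, a simplified model rather than nginx's code:

```python
from typing import Optional


class LeakyBucket:
    """Accept/reject logic in the spirit of nginx's limit_req.

    `excess` counts requests above the configured rate; it drains at `rate`
    per second, and a request that would push it above `burst` is rejected.
    Only the accept/reject decision is modelled: without `nodelay`, nginx
    additionally delays accepted requests that fall within the burst.
    """

    def __init__(self, rate: float, burst: int = 0):
        self.rate = rate    # requests per second
        self.burst = burst  # excess requests tolerated
        self.excess = 0.0
        self.last: Optional[float] = None

    def allow(self, now: float) -> bool:
        if self.last is None:  # first request from this client
            self.last = now
            return True
        elapsed = now - self.last
        excess = max(0.0, self.excess - elapsed * self.rate + 1)
        if excess > self.burst:
            return False  # rejected (nginx returns 503 by default)
        self.excess, self.last = excess, now
        return True
```

With rate=1 and burst=2, a client firing four requests at once gets the first three through and the fourth rejected.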
Apply updates to everything regularly, especially keeping on top of
security advisories for the relevant software.
Lock down SSH to known IPs at the firewall level, disable password
authentication and require keys, and disable root logins.
Firewall off traffic to the backend servers except for known services.
> - Performance
>
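Enabling HTTP (gzip) compression in nginx would help reduce data
transfer. Terminating SSL on the nginx load-balancers would remove that
load from the backend hosts, and caching cacheable responses there
(hence the memory and SSDs mentioned earlier) would save the web hosts
and database from repeat work.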
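Python's gzip module gives a quick way to see how much gzip saves on text-heavy responses. The sample page is made up:

```python
import gzip


def compressed_fraction(body: bytes, level: int = 1) -> float:
    """Gzip size as a fraction of the original.

    level 1 matches nginx's default gzip_comp_level.
    """
    if not body:
        return 1.0
    return len(gzip.compress(body, compresslevel=level)) / len(body)


if __name__ == "__main__":
    # Made-up, text-heavy page body; repetitive markup compresses very well.
    page = b"<tr><td class='item'>Lorem ipsum dolor sit amet</td></tr>\n" * 500
    print(f"{compressed_fraction(page):.1%} of original size")
```

nginx only compresses text/html by default. gzip_types controls which other MIME types get compressed. Already-compressed downloads (archives, images) gain little, so I'd leave those out.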
> - Backups
>
Database backups could be pulled from the MySQL slave as often as
practical (starting daily). To get a consistent, restorable set of
data, replication on the slave should be stopped and the tables locked
while the dump runs.
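As a sketch of the nightly job, here's a helper that builds the dump command to run against the slave. The host name and backup path are hypothetical. For InnoDB tables, mysqldump's --single-transaction takes a consistent snapshot without locking, which may make the table locks unnecessary:

```python
import datetime
import shlex


def backup_command(host: str, day: datetime.date) -> str:
    """Shell pipeline for a nightly dump taken from the slave.

    --single-transaction gives a consistent snapshot of InnoDB tables
    without locking them; MyISAM tables would still need locking.
    Host name and output path are hypothetical.
    """
    outfile = f"/backups/mysql-{day.isoformat()}.sql.gz"
    dump = ["mysqldump", f"--host={host}", "--single-transaction", "--all-databases"]
    return f"{shlex.join(dump)} | gzip > {shlex.quote(outfile)}"


if __name__ == "__main__":
    print(backup_command("db-slave", datetime.date.today()))
```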
Backups of the web hosts could be done by taking them out of the
load-balancing pool one at a time and performing filesystem backups
(using rsync or whatever tool is preferred).
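The one-at-a-time rotation could be scripted along these lines. drain, backup and enable are hypothetical hooks, for example marking the server down in the nginx upstream and reloading, then running rsync:

```python
from typing import Callable, Iterable


def rolling_backup(hosts: Iterable[str],
                   drain: Callable[[str], None],
                   backup: Callable[[str], None],
                   enable: Callable[[str], None]) -> None:
    """Back up web hosts one at a time so only one is ever out of the pool.

    drain/backup/enable are hypothetical hooks supplied by the caller
    (e.g. mark the server down in the upstream and reload nginx, run rsync,
    then mark it up again). Waiting for in-flight requests to finish is
    left to the drain hook.
    """
    for host in hosts:
        drain(host)
        try:
            backup(host)
        finally:
            enable(host)  # return the host to the pool even if the backup fails
```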
Firewall and load-balancer backups should only need configuration
snapshots, easily taken with rsync or similar while they're running.
>
> Try to go into as much detail as possible describing which applications you
> would install on the servers and why. Also detail the hardware
> configurations you would choose. Feel free to look up information on the
> internet to help you solve this scenario.
>
I hope the above makes a certain amount of sense. I must disclose
that, while I believe I have the understanding and tools necessary to
deploy and manage platforms like this, I haven't worked with a
load-balanced setup before and found that to be the most difficult
part to plan.
>
> Question: The new server farm is online and the increase in performance has
> allowed for additional growth. However soon the growth in traffic starts to
> overwhelm the database. In the evening the database becomes very
> unresponsive and often it rejects new connections with 'Too many
> connections'. How would you analyze and solve this problem? Try to come up
> with a solution that allows for further growth. Additional machines are not
> an option.
>
The first thing I'd do is check which processes are holding
connections at these times (SHOW FULL PROCESSLIST), then reduce
wait_timeout and/or increase max_connections, depending on whether
there are many idle connections or they are all busy.
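The idle-vs-busy check could be scripted roughly like this. The rows are assumed to come from SHOW PROCESSLIST via whatever MySQL client is handy. The majority rule is my own rough heuristic:

```python
from collections import Counter


def summarize(rows, idle_after=30):
    """Classify connections from SHOW PROCESSLIST-style rows.

    Each row is a dict with at least 'Host' ("client:port"), 'Command' and
    'Time' (seconds in the current state). The idle-vs-busy majority rule
    below is a rough heuristic, not a MySQL recommendation.
    """
    idle = sum(1 for r in rows if r["Command"] == "Sleep" and r["Time"] >= idle_after)
    busy = sum(1 for r in rows if r["Command"] != "Sleep")
    by_client = Counter(r["Host"].rsplit(":", 1)[0] for r in rows)
    if idle > busy:
        hint = "mostly idle: lower wait_timeout and check the app closes connections"
    else:
        hint = "mostly busy: raise max_connections (memory permitting)"
    return {"idle": idle, "busy": busy, "by_client": by_client, "hint": hint}
```

The by_client counts also show whether one web host is hogging connections, which would point at that host's configuration.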
Memcached on the web hosts could also reduce database load by caching
frequently requested objects, so repeat reads never reach MySQL. This
could be implemented if the site design allows.
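Here is the cache-aside pattern I have in mind, sketched in Python with an in-process TTL cache standing in for memcached. get_article and the key format are hypothetical:

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """In-process stand-in for memcached: get/set with an expiry time."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[Any, float]] = {}

    def get(self, key: str, now: Optional[float] = None) -> Any:
        now = time.monotonic() if now is None else now
        item = self._data.get(key)
        if item is None or item[1] <= now:
            return None  # miss or expired
        return item[0]

    def set(self, key: str, value: Any, ttl: float, now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        self._data[key] = (value, now + ttl)


def get_article(article_id: int, cache: TTLCache,
                db_lookup: Callable[[int], Any], ttl: float = 300) -> Any:
    """Cache-aside read: try the cache first, fall back to MySQL on a miss."""
    key = f"article:{article_id}"
    value = cache.get(key)
    if value is None:
        value = db_lookup(article_id)  # only misses reach the database
        cache.set(key, value, ttl)
    return value
```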
Each connection needs its own per-session buffers, so adding memory
lets max_connections be raised safely (in theory into the thousands).
I'd add memory as needed to handle more connections.
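To put rough numbers on the memory/connections trade-off, here is a hypothetical worst-case estimate. All figures are made up; real per-session buffer sizes come from the server's own settings:

```python
MB = 1024 ** 2
GB = 1024 ** 3


def max_safe_connections(total_ram: int, buffer_pool: int,
                         other_reserved: int, per_connection: int) -> int:
    """Connections that fit in the RAM left after global allocations.

    per_connection is a pessimistic estimate of per-session buffers
    (sort_buffer_size, join_buffer_size, read_buffer_size,
    read_rnd_buffer_size, thread_stack, ...). These are allocated on
    demand, so this is a worst case rather than typical usage.
    """
    spare = total_ram - buffer_pool - other_reserved
    return max(0, spare // per_connection)


if __name__ == "__main__":
    # All figures are made up for illustration.
    print(max_safe_connections(64 * GB, 40 * GB, 4 * GB, 4 * MB))   # 5120
    print(max_safe_connections(128 * GB, 80 * GB, 4 * GB, 4 * MB))  # 11264
```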
Those steps should go a long way toward ensuring we have capacity. If
the site load really becomes too much for the single master after the
above steps, I'd look at re-architecting to a master-master setup,
turning the slave into a second master so load is balanced across the
two available servers.
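One practical detail with master-master is that both servers accept inserts, so AUTO_INCREMENT keys must not collide. MySQL's auto_increment_increment and auto_increment_offset settings handle this. Here's a quick illustration of the IDs each server would generate on an empty table (generated_ids is a hypothetical helper):

```python
from typing import List


def generated_ids(offset: int, increment: int, count: int) -> List[int]:
    """First `count` AUTO_INCREMENT values a server would hand out on an
    empty table, given auto_increment_offset and auto_increment_increment."""
    return [offset + i * increment for i in range(count)]


if __name__ == "__main__":
    db1 = generated_ids(offset=1, increment=2, count=5)  # [1, 3, 5, 7, 9]
    db2 = generated_ids(offset=2, increment=2, count=5)  # [2, 4, 6, 8, 10]
    print(set(db1) & set(db2))  # set() -> no collisions between the masters
```

As far as I understand, each write is still replicated to and applied on both servers. Master-master therefore mainly spreads reads and adds redundancy rather than doubling write capacity.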
==========
Thanks