Hosting crapping itself with Google Bot hits

guest2013-1

I'm at my wits' end.

PHP website application (think WordPress, but my own code)

Pages load in about 1 second on average (best load times are 0.340 seconds). I launched 7 sites, Googlebot comes along and kills the server. Shared host tells me to **** off, so I get a VPS with 512 MB. Bot thinks "hey, now I can hit you more" and ****s the server like it's its little bitch.

I do some tweaks to the code, change foreach/for loops to while loops where I can (working with arrays mostly). I check the database (inserts/selects happen in like 0.00009 seconds), pull apart some functionality and rewrite it entirely.

Add in FastCGI/PHP-FPM on the server (there's a huge timeout issue when doing so that has gone undocumented except for one bloke, which then requires some code rewriting).

Nope, Google just *loves* the new performance increase, and increases their crawl rate, ****ing the server up the ass.

This is all WITH Cloudflare in front of the damn thing "saving" like 50% of the requests!

So, I upgrade to 1 GB, same story.

This totally goes against Google's "user experience" when 7 websites get ****ed so hard by their bots that it's impossible for a normal user to even load the site... Webmaster Tools doesn't allow for changes in the crawl rate. And Googlebot is ignoring the "Crawl-delay" set in the damn robots.txt.

I remember the days when 7 websites could live on 1 server, sharing its resources with 700 others. Nowadays, nope. DDoS attack courtesy of Google.
 
You're not going to like this, but I'm going to tell you anyway. The problem is with your code, not Google. I see this all the time with clients, and there are search engines that crawl much harder than Google.

If Google crawling you takes your site down, i.e. a single client loading each link on your site one after another, you're not going to be able to handle more than a handful of concurrent users anyway. Surviving a Google crawl is the basic minimum a modern website should be capable of, without Cloudflare or Varnish or any other crutches.

So, first things first, sign up for Google Webmaster Tools. It allows you to set the crawl rate, all the way down to 0.002 requests per second, or 500 seconds between requests.

Second, switching to PHP-FPM doesn't change anything as far as the server is concerned. If it introduced timeouts, your configuration is wrong. Post it here, with the errors, and let's see if we can help you there.
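
On the crawl rate point: if the Webmaster Tools setting isn't available, roughly the same limit can be enforced on the server side. As far as I know, Google's crawler documentation says Googlebot slows down when it receives 429 or 503 responses, so a token bucket that answers 503 with a `Retry-After` header once the budget is used up is one way to do it. This is only a sketch in Python, not the PHP app from this thread; `CrawlerThrottle`, `is_crawler`, the rate and the burst size are all made-up names and numbers for illustration.

```python
import math
import time


class CrawlerThrottle:
    """Token bucket: allows `rate` requests per second, with bursts up to `burst`."""

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = rate            # tokens added per second
        self.burst = burst          # maximum number of stored tokens
        self.clock = clock          # injectable so the logic can be tested
        self.tokens = float(burst)
        self.last = clock()

    def check(self):
        """Return (status, headers) for the next crawler request."""
        now = self.clock()
        # Refill tokens for the time that passed since the previous request.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return 200, {}
        # Budget used up: tell the crawler how many seconds to wait.
        wait = (1 - self.tokens) / self.rate
        return 503, {"Retry-After": str(math.ceil(wait))}


def is_crawler(user_agent):
    # Assumption: matching on the User-Agent string is good enough
    # to decide which requests go through the throttle.
    return "googlebot" in user_agent.lower()
```

With `rate=0.002` this works out to the "one request every 500 seconds" figure mentioned above. Normal visitors would skip the throttle entirely, so only requests for which `is_crawler` returns true ever see a 503.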
 
This can be resolved with some code tweaking and configuration. I'm pretty sure WordPress has plugins that limit crawler activity. Try Wordfence.
 
WordPress-like. Everything is custom code. I tried going the Google Webmaster Tools route but it didn't allow me to set the crawl rate at all. It can handle a few hundred users concurrently, but the 50,000 or so hits I'm getting from Googlebot alone are insane. This isn't 1 site, this is 7.
 
I've had success using Webmaster Tools to set the crawl rate. Are the site(s) linked correctly? There can sometimes be a delay in linking new sites, and/or updating the crawl rate. And what do you mean it isn't allowing you to set the crawl rate? Is the option somehow blanked out?

Finally, are you sure they are actually Google bots? Have you looked up the ownership of the IP addresses to confirm? It's possible to get rogue bots that disguise themselves as popular search engines.
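
One way to check this is a reverse DNS lookup on the IP followed by a forward lookup on the hostname it returns, which as far as I know is the method Google itself describes for verifying Googlebot. A rough Python sketch follows; `is_real_googlebot` is a made-up helper name, the two domain suffixes are my assumption of what genuine crawler hostnames end in, and `gethostbyname_ex` only covers IPv4.

```python
import socket


def is_real_googlebot(ip, reverse=socket.gethostbyaddr,
                      forward=socket.gethostbyname_ex):
    """Reverse-resolve the IP, check the domain, then forward-resolve to confirm."""
    try:
        host = reverse(ip)[0]          # PTR lookup: IP -> hostname
    except OSError:
        return False                   # no reverse record at all
    # Assumption: real Googlebot hosts end in one of these two domains.
    if not host.endswith((".googlebot.com", ".google.com")):
        return False
    try:
        addresses = forward(host)[2]   # A lookup: hostname -> list of IPs
    except OSError:
        return False
    # Anyone can fake a PTR record, so the forward lookup must point
    # back at the same IP before the host is trusted.
    return ip in addresses
```

The resolver functions are passed in as parameters only so the logic can be tried without touching the network. In normal use you would just call `is_real_googlebot(ip)` for the busiest IPs in the access log.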
 
This is what I get in my webmaster tools:

[Screenshot: google_crawl.png]

Do you not see this at all? I had to verify that I own the site by putting a file in the document root, but I think it forces you to do that when you add the site, so it's not easy to miss.
 
I verified all my sites using the DNS option. It still just says I'm on a special crawl rate and I can't change it at all.

I used XHProf to determine where the bottleneck is, and I'm busy rewriting that part of my codebase.
 
lol probably, 4 of them have "quality issues"

I love how Google will de-index you, tell you that you have "quality issues", and then not tell you what they are so you can fix them.

Reminds me of women... "I'm dumping you because of quality issues". "I can change baby, just tell me what I did wrong, I won't ever make that mistake again". "No, it's too late, you have quality issues and I'm not telling you so that you can make the same mistake over and over again and be alone for the rest of your life"
 
lol! been on the receiving end of this haha
 
You'd think setting the robots.txt file would help, hey? lol.... It was the first thing I did after I found I couldn't change the "special crawl rate" myself.
 
Did you try using it to block the entire site from crawl access?

User-agent: *
Disallow: /
 
Google and most of the other legitimate crawlers respect the robots.txt file. As asked previously - what IPs are these bots coming from? Are you sure they're genuinely Google and not some hacker busy DoSing you?
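
To answer the IP question from the logs, something along these lines would tally the requests that claim to be Googlebot per client IP. It's a sketch that assumes the web server writes the common "combined" access log format (client IP first, user agent as the last quoted field); `googlebot_hits_by_ip` is just an illustrative name and the log path in the comment is a guess.

```python
from collections import Counter


def googlebot_hits_by_ip(lines):
    """Count requests per client IP where the User-Agent claims to be Googlebot."""
    hits = Counter()
    for line in lines:
        parts = line.rstrip("\n").split('"')
        # Combined format has three quoted fields: request, referer, user agent.
        if len(parts) < 6:
            continue  # skip lines that are not in the expected format
        user_agent = parts[-2]
        if "googlebot" in user_agent.lower():
            ip = line.split(" ", 1)[0]  # client IP is the first field
            hits[ip] += 1
    return hits


# Example use (the path is an assumption, adjust for your server):
# with open("/var/log/nginx/access.log") as log:
#     for ip, count in googlebot_hits_by_ip(log).most_common(10):
#         print(ip, count)
```

The top few IPs from that list are the ones worth looking up to see who actually owns them.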
 