Microsoft bingbot request error

rpm (Admin, Staff member; joined Jul 22, 2003; 66,805 messages; reaction score 5,057; location: Johannesburg)
Hi there

We are seeing thousands of these request errors. It obviously happens with a special character, in this case an apostrophe (').

Any idea how to avoid this? I would prefer not to block Microsoft's Bingbot.


[Fri Nov 08 06:25:50 2013] [error] [client 157.55.36.37] (36)File name too long: access to /214218-Banning-cell-phones-doesn\xc3\x83\xc6\x92\xc3\x86\xe2\x80\x99\xc3\x83\xe2\x80\x9a\xc3\x82\xc2\xa2\xc3\x83\xc6\x92\xc3\x86\xe2\x80\x99\xc3\x83\xe2\x80\xa0\xc3\xa2\xe2\x82\xac\xe2\x84\xa2\xc3\x83\xc6\x92\xc3\xa2\xe2\x82\xac\xc2\xa0\xc3\x83\xc2\xa2\xc3\xa2\xe2\x80\x9a\xc2\xac\xc3\xa2\xe2\x80\x9e\xc2\xa2\xc3\x83\xc6\x92\xc3\x86\xe2\x80\x99\xc3\x83\xc2\xa2\xc3\xa2\xe2\x80\x9a\xc2\xac\xc3\x85\xc2\xa1\xc3\x83\xc6\x92\xc3\xa2\xe2\x82\xac\xc5\xa1\xc3\x83\xe2\x80\x9a\xc3\x82\xc2\xaf\xc3\x83\xc6\x92\xc3\x86\xe2\x80\x99\xc3\x83\xe2\x80\xa0\xc3\xa2\xe2\x82\xac\xe2\x84\xa2\xc3\x83\xc6\x92\xc3\x82\xc2\xa2\xc3\x83\xc2\xa2\xc3\xa2\xe2\x82\xac\xc5\xa1\xc3\x82\xc2\xac\xc3\x83\xe2\x80\xa6\xc3\x82\xc2\xa1\xc3\x83\xc6\x92\xc3\x86\xe2\x80\x99\xc3\x83\xc2\xa2\xc3\xa2\xe2\x80\x9a\xc2\xac\xc3\x85\xc2\xa1\xc3\x83\xc6\x92\xc3\xa2\xe2\x82\xac\xc5\xa1\xc3\x83\xe2\x80\x9a\xc3\x82\xc2\xbf\xc3\x83\xc6\x92\xc3\x86\xe2\x80\x99\xc3\x83\xe2\x80\xa0
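The escaped bytes in that log line look like a mojibake chain. The UTF-8 bytes of a curly apostrophe (’, U+2019) appear to have been misread as Windows-1252 and re-encoded as UTF-8, over and over. Each pass makes the path longer, until Apache's lookup fails with "(36)File name too long". Below is a minimal Python sketch of that assumed mechanism. The helper mojibake is hypothetical, and the log does not show where in the chain (Bingbot, a linking site or the forum) the extra passes happen.

```python
def mojibake(text: str, rounds: int) -> str:
    """Simulate `rounds` passes of UTF-8 bytes being misread as Windows-1252."""
    for _ in range(rounds):
        # Encode correctly as UTF-8, then decode those bytes with the wrong codec.
        text = text.encode("utf-8").decode("cp1252")
    return text


if __name__ == "__main__":
    # Seeded with U+2019, every byte produced along the way is defined in
    # cp1252, so strict decoding never raises for this particular input.
    for n in range(5):
        garbled = mojibake("\u2019", n)
        print(n, len(garbled.encode("utf-8")), garbled.encode("utf-8")[:18])
```

Four passes reproduce the opening bytes of the escaped sequence in the log (\xc3\x83\xc6\x92\xc3\x86\xe2\x80\x99...). Because every byte after the first pass is non-ASCII, each further pass at least doubles the byte count, so a few more rounds are enough to exceed the 255-byte file-name limit typical of Linux filesystems.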
 
Is that from the web server log? It's a bit light on detail ...
 
I thought special characters were already URL-encoded?

EDIT: Hmmm, I see apostrophes don't get encoded, but other special characters are removed. Why not just remove apostrophes too?
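The thread doesn't show whether the forum software or the browser is what leaves the apostrophe raw. For comparison, a strict percent-encoder such as Python's urllib.parse.quote escapes both the ASCII apostrophe and the curly one, because neither is in its always-safe set. The helper name and the words below are made-up examples.

```python
from urllib.parse import quote


def encode_path_segment(segment: str) -> str:
    # quote() leaves ASCII letters, digits, "_.-~" and (by default) "/" alone;
    # everything else, including ' and the UTF-8 bytes of U+2019, becomes %XX.
    return quote(segment)


if __name__ == "__main__":
    for word in ("it's", "it\u2019s"):
        print(repr(word), "->", encode_path_segment(word))
```

The curly form becomes three escapes (%E2%80%99). A client has to decode those as UTF-8, not Windows-1252, to get the original character back.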
 
I'll convert that hex to ASCII when I get home; I'm a bit limited on my phone.
 
\xc3\x83\xc6\x92\xc3\x86\xe2\x80\x99\xc3\x83\xe2\x80\x9a\xc3\x82\xc2\xa2\xc3\x83\xc6\x92\xc3\x86\xe2\x80\x99\xc3\x83\xe2\x80\xa0\xc3\xa2\xe2\x82\xac\xe2\x84\xa2\xc3\x83\xc6\x92\xc3\xa2\xe2\x82\xac\xc2\xa0\xc3\x83\xc2\xa2\xc3\xa2\xe2\x80\x9a\xc2\xac\xc3\xa2\xe2\x80\x9e\xc2\xa2\xc3\x83\xc6\x92\xc3\x86\xe2\x80\x99\xc3\x83\xc2\xa2\xc3\xa2\xe2\x80\x9a\xc2\xac\xc3\x85\xc2\xa1\xc3\x83\xc6\x92\xc3\xa2\xe2\x82\xac\xc5\xa1\xc3\x83\xe2\x80\x9a\xc3\x82\xc2\xaf\xc3\x83\xc6\x92\xc3\x86\xe2\x80\x99\xc3\x83\xe2\x80\xa0\xc3\xa2\xe2\x82\xac\xe2\x84\xa2\xc3\x83\xc6\x92\xc3\x82\xc2\xa2\xc3\x83\xc2\xa2\xc3\xa2\xe2\x82\xac\xc5\xa1\xc3\x82\xc2\xac\xc3\x83\xe2\x80\xa6\xc3\x82\xc2\xa1\xc3\x83\xc6\x92\xc3\x86\xe2\x80\x99\xc3\x83\xc2\xa2\xc3\xa2\xe2\x80\x9a\xc2\xac\xc3\x85\xc2\xa1\xc3\x83\xc6\x92\xc3\xa2\xe2\x82\xac\xc5\xa1\xc3\x83\xe2\x80\x9a\xc3\x82\xc2\xbf\xc3\x83\xc6\x92\xc3\x86\xe2\x80\x99\xc3\x83\xe2\x80\xa0

=

âï¿ÃÆ

Hmmm ... ?!
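The quick conversion above comes out as more garbage because those escapes are UTF-8 bytes, and the text they spell is itself mojibake. Here is a rough Python sketch of a two-step decode, assuming the only backslashes in the input are \xNN escapes. It first turns the escapes back into bytes and reads them as UTF-8, then peels off Windows-1252 misreads one layer at a time. Both function names are hypothetical.

```python
def decode_log_escapes(escaped: str) -> str:
    """Turn an ASCII string containing \\xNN byte escapes back into UTF-8 text."""
    # unicode_escape maps each \xNN to code point U+00NN; latin-1 then turns
    # those code points back into the original raw bytes one-for-one.
    raw = escaped.encode("latin-1").decode("unicode_escape").encode("latin-1")
    return raw.decode("utf-8", errors="replace")


def repair_mojibake(text: str, max_rounds: int = 10) -> str:
    """Undo repeated UTF-8 -> Windows-1252 misreads, one layer per round."""
    for _ in range(max_rounds):
        try:
            fixed = text.encode("cp1252").decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            break  # this layer does not round-trip, so stop peeling
        if fixed == text:
            break  # plain ASCII: nothing left to undo
        text = fixed
    return text


if __name__ == "__main__":
    sample = r"it\xc3\xa2\xe2\x82\xac\xe2\x84\xa2s"
    print(repair_mojibake(decode_log_escapes(sample)))  # prints it’s
```

The repair loop stops at the first layer that no longer round-trips. Text it cannot fully untangle comes back partly repaired rather than raising an exception.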
 
Why don't we just start by asking why these single quotes or apostrophes aren't getting removed from the URL like other special characters?

[Attached screenshot, 2013-11-08 18.59.02]
 
The msnbot obviously cannot handle Unicode characters in the URL. The ‘ and ’ characters may be nice to look at, but they're causing that spider to bug out. The best solution is to strip all non-ASCII characters from future URLs.
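Here is a minimal Python sketch of that approach. It assumes slugs are built as thread id plus title words, like the /214218-Banning-cell-phones-... path in the log. ascii_slug is a hypothetical helper, not the forum software's own slug code. NFKD normalization first splits accented letters so that é keeps its base e. Anything still outside ASCII, including ‘ and ’, is then dropped.

```python
import re
import unicodedata


def ascii_slug(thread_id: int, title: str) -> str:
    # Decompose accented letters (é -> e + combining accent) so the base letter survives.
    normalized = unicodedata.normalize("NFKD", title)
    # Drop every remaining non-ASCII code point, including ‘ ’ and combining marks.
    ascii_only = normalized.encode("ascii", "ignore").decode("ascii")
    # Remove ASCII apostrophes outright so "it's" becomes "its", not "it-s".
    ascii_only = ascii_only.replace("'", "")
    # Collapse each run of other non-alphanumerics into one hyphen.
    words = re.sub(r"[^A-Za-z0-9]+", "-", ascii_only).strip("-")
    # Fall back to the bare id if nothing ASCII survived.
    return f"{thread_id}-{words}" if words else str(thread_id)


if __name__ == "__main__":
    print(ascii_slug(123, "Caf\u00e9 owners don\u2019t agree"))
```

Dropping the apostrophe instead of replacing it with a hyphen keeps contractions readable in the slug. The id prefix means two titles that reduce to the same words still get different URLs.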
 
Why don't we just start by asking why these single quotes or apostrophes aren't getting removed from the URL like other special characters?

They are being ignored, though; I tested this site for SQL vulnerabilities a long time ago, hehe.
 
Sign up to the MyBroadband newsletter