I suggest you print out this anti-telemarketing counterscript and keep it handy for the next time a telemarketer calls, or, as happens in my office, the next time a not-so-telemarketer steps through the door.
Thanks to my very kind posting of the overnet gui clc source code, many a search bot has driven by to index my web site. Unfortunately there are parts I don't want indexed (like my wiki, which is boring/empty and slow to use, the cgi-bin directories, etc.), so I added a robots.txt to exclude them.
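For anyone wanting to do the same, a minimal robots.txt along these lines does the job; the paths here are examples and need to match the actual site layout:

```
# Ask well-behaved crawlers to skip the wiki and CGI directories.
User-agent: *
Disallow: /wiki/
Disallow: /cgi-bin/
```

This file lives at the web root (e.g. /robots.txt) and is purely advisory, which is exactly the problem with the bots discussed next.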
Now I'm all happy for Google, Ask Jeeves, etc. to index everything else, but where I draw the line is spambots and other dark harvesters (particularly Web Content International) that blatantly ignore robots.txt. Ideas for blocking them include mod_rewrite and deny from env, much like Mark experienced with block spambots, ban spybots and tell unwanted robots to go to hell.
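As a sketch of the "deny from env" approach, something like the following Apache config fragment works, assuming mod_setenvif is loaded; the User-Agent substrings and directory path are illustrative, not a vetted blocklist:

```
# Tag requests from known harvesters by User-Agent substring...
SetEnvIfNoCase User-Agent "EmailSiphon" bad_bot
SetEnvIfNoCase User-Agent "WebZIP"      bad_bot

# ...then refuse them access to the document root.
<Directory /var/www>
    Order Allow,Deny
    Allow from all
    Deny from env=bad_bot
</Directory>
```

The obvious weakness is that the nastier bots forge ordinary browser User-Agent strings, which is where blocking by IP comes in.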
My immediate solution was to add an iptables reject rule for 65.102.*.*. It worked beautifully: 10k blocked packets so far. Next I'd like to set up a honeypot (a page that's linked to but excluded in robots.txt) and automatically add servers that request it to the iptables reject list. I'm just a bit worried about the security implications of needing root to add the iptables rules *grin*
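The honeypot automation could be sketched roughly like this: scan the access log for anything that fetched the trap URL, and print (rather than execute) an iptables reject command per offender, so the log-scanning part needs no root. The trap path /trap/ and the Apache combined log format are assumptions here:

```shell
trap_hits() {
    # $1: path to an Apache combined-format access log.
    # Field $7 of each log line is the requested path; collect the
    # unique client IPs ($1 of the line) that hit the trap page.
    awk '$7 ~ "^/trap/" { print $1 }' "$1" | sort -u |
    while read -r ip; do
        # Echo the command instead of running it, so this part
        # needs no root; a root cron job can pipe the output to sh.
        echo "iptables -A INPUT -s $ip -j REJECT"
    done
}
```

A root cron job could then run `trap_hits /var/log/apache2/access.log | sh`, keeping the privileged step down to one pipe. (The static rule for the range above would be the equivalent one-liner `iptables -A INPUT -s 65.102.0.0/16 -j REJECT`.)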