|
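How to stop robots from harassing your web pages (1)
This morning I came across an article on www.aspalliance.com (Stopping Automated Web Robots Visiting ASP/ASP.NET Websites, http://aspalliance.com/1018_Stopping_Automated_Web_Robots_Visiting_ASPASPNET_Websites). It mainly covers measures you can take to keep robots from crawling your site excessively. After reading it, I think some of the points are worth discussing, so I have summarized them below: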
1. Some reference criteria for recognizing a robot
· Large numbers of requests from a single IP address or from a range of IP addresses within the same subnet (i.e. the first three numbers of the IP address are identical).
· Large numbers of requests for database-driven content compared to the rest of the website.
· Many requests made from browsers that do not support ASP Sessions.
· A large and growing number of website visitors, but no corresponding increase in transactions (e.g. sales!).
· Large numbers of spam or automated requests being generated from online forms.

2. At http://www.robotstxt.org/wc/norobots.html you can find a proposed standard for keeping robots out, put forward by an organization (unfortunately it is not an authoritative standard and has no binding force, so robots only follow it voluntarily). It offers some examples and methods we can use day to day. The main idea is to create a robots.txt file and place it in the root directory of the website, for example:

Block all robots:
User-agent: *
Disallow: /
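Allow all robots:
User-agent: *
Disallow:

Keep robots out of the files under the /cyberworld/map/ directory:
User-agent: *
Disallow: /cyberworld/map/

Allow the robot named cybermapper (an empty Disallow means nothing is off limits to it):
User-agent: cybermapper
Disallow:

Block several paths, including the single file /foo.html:
User-agent: *
Disallow: /cyberworld/map/
Disallow: /tmp/
Disallow: /foo.html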
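To see how a well-behaved robot would read rules like the ones above, here is a minimal sketch using Python's standard urllib.robotparser module. The helper name is_allowed and the robot name SomeBot are made up for illustration; the rule sets are the examples from this section, with the cybermapper record placed in the same file as the general one. This only shows what a compliant robot would do; a robot that ignores robots.txt is not stopped by it.

```python
from urllib.robotparser import RobotFileParser

# Rule set from the last example above.
RULES = """\
User-agent: *
Disallow: /cyberworld/map/
Disallow: /tmp/
Disallow: /foo.html
"""

# The /cyberworld/map/ rule combined with the cybermapper record in one file.
# A blank line separates the two records.
COMBINED = """\
User-agent: *
Disallow: /cyberworld/map/

User-agent: cybermapper
Disallow:
"""


def is_allowed(rules: str, user_agent: str, path: str) -> bool:
    """Return True if a compliant robot named user_agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())  # parse rules from text, no network access
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # "SomeBot" is a hypothetical robot name used only for this demo.
    for path in ("/index.html", "/tmp/cache.dat", "/foo.html"):
        print(path, is_allowed(RULES, "SomeBot", path))
    print("cybermapper:", is_allowed(COMBINED, "cybermapper", "/cyberworld/map/a.html"))
    print("SomeBot:", is_allowed(COMBINED, "SomeBot", "/cyberworld/map/a.html"))
```

Running the same check against your own robots.txt before deploying it is a cheap way to catch a rule that blocks more (or less) than you intended.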
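3. If setting up robots.txt is inconvenient, you can also handle it with a meta tag in the page's <head>, for example: <meta name="robots" content="noindex, nofollow"> (noindex asks robots not to index the page, nofollow asks them not to follow its links).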
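Going back to the first criterion in point 1 (many requests from a single IP address or from addresses sharing the same first three numbers), a rough check can be run over the client addresses taken from an access log. The sketch below is only an illustration: the function names, the sample addresses, and the threshold of 3 are assumptions of mine, not something from the article, and a real threshold would depend on your normal traffic.

```python
from collections import Counter
from ipaddress import ip_network


def requests_per_subnet(ips, prefix=24):
    """Count requests per IPv4 subnet.

    prefix=24 groups addresses whose first three numbers are identical.
    """
    counts = Counter()
    for ip in ips:
        # strict=False masks the host bits, e.g. 203.0.113.7/24 -> 203.0.113.0/24
        subnet = ip_network(f"{ip}/{prefix}", strict=False)
        counts[str(subnet)] += 1
    return counts


def suspicious_subnets(ips, threshold, prefix=24):
    """Return the subnets with at least `threshold` requests (threshold is an assumed cutoff)."""
    counts = requests_per_subnet(ips, prefix)
    return {subnet: n for subnet, n in counts.items() if n >= threshold}


if __name__ == "__main__":
    # Hypothetical client addresses, one per logged request.
    sample = ["203.0.113.7", "203.0.113.8", "203.0.113.200", "198.51.100.4"]
    print(suspicious_subnets(sample, threshold=3))
```

A high count alone does not prove that a robot is behind the traffic (an office or a proxy can share one subnet), so it is best treated as one signal alongside the other criteria in point 1.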
|