> If I cannot block all robots, I want to at least be able to identify them to prevent my script from counting robot-generated page views as being from real people
Nginx and Apache2 log the user agent, IP, and page accessed. Web crawlers usually identify themselves in the User-Agent header with strings like "bingbot". I still respect your other justifications for blocking the crawlers.
edit: OP is using Lighttpd
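For example, with the default combined log format the user agent is the last double-quoted field on each line, so a rough filter is only a few lines of Python. This is just a sketch: the file name and the marker list are illustrative, not from OP's setup, and Lighttpd's accesslog format is configurable, so the field order may differ.

```python
import re

# In the combined log format, the user agent is the last
# double-quoted field on the line.
UA_RE = re.compile(r'"([^"]*)"\s*$')

# Markers commonly found in crawler user agents (e.g. "bingbot",
# "Googlebot"); this list is illustrative, not exhaustive.
BOT_MARKERS = ("bot", "crawl", "spider", "slurp")

def is_probably_bot(ua: str) -> bool:
    ua_lower = ua.lower()
    return any(marker in ua_lower for marker in BOT_MARKERS)

# "access.log" is a placeholder path; point it at your real log.
with open("access.log") as log:
    for line in log:
        m = UA_RE.search(line)
        if m and is_probably_bot(m.group(1)):
            print(line, end="")  # robot-generated hit
```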
> Unfortunately, I do not record the user agents of robots that do not contain one of the following special strings: "bot", "Bot", "rawl", "slurp", "pider", "oogle", "rchive", "acebook", "earch", or "http". This is because I don't log user agents that I think are coming from real people. This speeds up my script and reduces the wear on my storage media
Wow you’ve potentially saved literal microseconds and increased the lifetime of your drives by the same amount.
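For what it's worth, OP's heuristic amounts to a plain substring check along these lines (a minimal sketch; the function name is mine):

```python
# The substring list OP quotes; "rawl", "pider", "oogle", etc. are
# written without their first letter so one check covers both
# capitalizations (e.g. "Crawl" and "crawl").
SPECIAL = ("bot", "Bot", "rawl", "slurp", "pider", "oogle",
           "rchive", "acebook", "earch", "http")

def looks_like_robot(user_agent: str) -> bool:
    return any(s in user_agent for s in SPECIAL)

# A typical crawler announces itself in its UA string:
print(looks_like_robot(
    "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
))  # True
```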