How to minimize bot traffic

I created a Cloudflare firewall rule that applies a Managed Challenge (not a block) to traffic from hosting company / datacenter ASNs. There are exceptions, of course: I punched holes so that some specific user-agents can get through. I’m aware this also affects human visitors who use a VPN. I don’t mind. To find the ASN of an IP address, you can use https://ipinfo.io, and to find bad IPs you can check abuseipdb.com or look at your server / visitor logs.
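If you want to start from your own logs, here is a minimal sketch that counts requests per client IP so the busiest addresses can be looked up on the sites above. It assumes the common/combined access log format with the client IP as the first field; the function name and the sample lines are made up for illustration. Keep in mind that behind Cloudflare the origin log shows Cloudflare's IPs unless your server is configured to restore the visitor IP.

```python
import re
from collections import Counter

# Assumption: common/combined log format, client IP is the first field.
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} ')


def top_client_ips(lines, limit=10):
    """Return the `limit` most frequent client IPs as (ip, hits) pairs."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match:  # silently skip lines that do not look like access log entries
            hits[match.group(1)] += 1
    return hits.most_common(limit)


# Made-up sample lines using documentation-only IP ranges.
sample = [
    '203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] "GET /wp-login.php HTTP/1.1" 200 512 "-" "curl/8.0"',
    '198.51.100.4 - - [10/Oct/2023:13:55:37 +0000] "GET / HTTP/2.0" 200 9000 "-" "Mozilla/5.0"',
    '203.0.113.7 - - [10/Oct/2023:13:55:38 +0000] "POST /xmlrpc.php HTTP/1.1" 403 0 "-" "curl/8.0"',
]

for ip, count in top_client_ips(sample):
    print(ip, count)
```

On a real server you would pass an open log file instead of `sample`, for example `top_client_ips(open("access.log"))`, where the file name is whatever your web server uses.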

Please don’t tell me, “Why don’t you just use Wordfence?” I did, but the results were unsatisfactory and it consumed too many server resources. I now use Wordfence only for rate limiting, just in case. I even disabled its Brute Force Protection and Scan features. It has been three weeks since I started challenging the hosting/datacenter ASNs, and my block count has been zero.

Traffic fell slightly, but not significantly. I’ve read that John Mu said Google doesn’t filter 100% of bot traffic out of the reports in GSC or Analytics, so maybe that is the cause. AdSense income is also stable. Plus, what I’m doing may prove useful in the future for minimizing click-fraud penalties caused by bots.

Besides the hosting/datacenter ASNs, I also challenge traffic from several ISP ASNs, but only requests that use HTTP/1.x (the rule lists HTTP/1.0, 1.1, and 1.2), with this rule:

(ip.geoip.asnum in {174 4766 3786 3257 45899} and not http.request.uri.path in {"/wp-content/uploads/favicon.png" "/wp-content/uploads/og.png" "/favicon.ico" "/ads.txt"} and http.request.version in {"HTTP/1.0" "HTTP/1.1" "HTTP/1.2"} and not http.user_agent contains "Mastodon" and not http.user_agent contains "coccocbot")
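A rule like this gets hard to edit by hand once the lists grow. As a rough sketch, assuming you keep the ASNs, exempt paths, and exempt user-agents in plain Python lists, a small helper can rebuild the expression string for pasting into the rule editor. The function names are my own, not anything Cloudflare provides. Note that the output uses straight double quotes, which is what the expression editor expects.

```python
def quoted_set(values):
    """Render strings as a set of double-quoted literals, e.g. {"a" "b"}."""
    return "{" + " ".join(f'"{v}"' for v in values) + "}"


def build_asn_challenge_expression(asns, exempt_paths, http_versions, exempt_agents):
    """Assemble the expression text from its parts (hypothetical helper)."""
    asn_set = "{" + " ".join(str(a) for a in asns) + "}"
    clauses = [
        f"ip.geoip.asnum in {asn_set}",
        f"not http.request.uri.path in {quoted_set(exempt_paths)}",
        f"http.request.version in {quoted_set(http_versions)}",
    ]
    # One "not ... contains" clause per user-agent that should pass untouched.
    clauses += [f'not http.user_agent contains "{ua}"' for ua in exempt_agents]
    return "(" + " and ".join(clauses) + ")"


expression = build_asn_challenge_expression(
    asns=[174, 4766, 3786, 3257, 45899],
    exempt_paths=[
        "/wp-content/uploads/favicon.png",
        "/wp-content/uploads/og.png",
        "/favicon.ico",
        "/ads.txt",
    ],
    http_versions=["HTTP/1.0", "HTTP/1.1", "HTTP/1.2"],
    exempt_agents=["Mastodon", "coccocbot"],
)
print(expression)
```

This only generates text; you still paste the result into Cloudflare yourself and pick the Managed Challenge action there.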

Another benefit I’ve seen: Googlebot used to visit my website about 500 times a day, and since I did this it visits about 700 times a day. Maybe that’s because there are more server resources available to handle its visits, so the crawl budget went up. The data in Cloudflare’s Top Crawlers / Bots (Analytics & Logs > Security) is not much different from Crawl Stats in GSC.

People who maintain server farms for the purpose of spamming, scraping, or trolling for vulnerabilities probably won’t like this idea. They will say something like, “That is a bad idea for your SEO,” or “It will have a negative impact on user experience (VPN users).” As long as you make exceptions for search engine user-agents and any other service user-agents you want, I don’t see any issues related to the SERPs. At least that’s what I’ve seen for the past three weeks.

When you use, in this order:

(http.user_agent contains "Google" and not ip.geoip.asnum in {15169 396982 19527}) or (ip.geoip.asnum in {15169 396982 19527} and not http.request.uri.path in {"/wp-content/uploads/favicon.png" "/wp-content/uploads/og.png" "/favicon.ico" "/ads.txt"} and not http.user_agent contains "Google" and not http.user_agent contains "FeedBurner" and not http.user_agent contains "Chrome Privacy" and not http.user_agent contains "Lighthouse" and not http.user_agent contains "IAB")

… you will NOT interrupt the real Google crawler. But you can stop bots coming from Google Cloud Platform, if you feel the need to, and block fake Googlebots.
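To see why the real crawler gets through, it can help to replay the logic outside Cloudflare. The sketch below mirrors the two branches of the rule as a plain Python predicate. It is only a model for reasoning about the rule, not how Cloudflare evaluates it, and the sample user-agents and the non-Google ASN 64500 are made-up test values.

```python
GOOGLE_ASNS = {15169, 396982, 19527}
EXEMPT_PATHS = {
    "/wp-content/uploads/favicon.png",
    "/wp-content/uploads/og.png",
    "/favicon.ico",
    "/ads.txt",
}
# User-agent substrings the second branch lets through (case-sensitive).
ALLOWED_AGENT_MARKERS = ("Google", "FeedBurner", "Chrome Privacy", "Lighthouse", "IAB")


def rule_matches(user_agent, asn, path):
    """Return True when the rule would fire for this request (model only)."""
    claims_google = "Google" in user_agent
    from_google_asn = asn in GOOGLE_ASNS

    # Branch 1: says it is Google but does not come from a Google ASN.
    fake_googlebot = claims_google and not from_google_asn

    # Branch 2: comes from a Google ASN, is not an exempt path, and carries
    # none of the allowed user-agent markers (likely something hosted on GCP).
    cloud_bot = (
        from_google_asn
        and path not in EXEMPT_PATHS
        and not any(marker in user_agent for marker in ALLOWED_AGENT_MARKERS)
    )
    return fake_googlebot or cloud_bot


GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

print(rule_matches(GOOGLEBOT_UA, 15169, "/"))              # real crawler
print(rule_matches(GOOGLEBOT_UA, 64500, "/"))              # fake Googlebot
print(rule_matches("python-requests/2.31", 396982, "/"))   # script on a Google ASN
```

The first call returns False and the other two return True, which matches the claim above: a request is only caught when the user-agent and the ASN disagree, or when a Google ASN sends something that does not identify as a Google service.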

I’ve seen the same never-ending question here too many times, from five years ago to one month ago, and I’m sure it will still be asked in the future: how to prevent brute-force or login attempts in WordPress. Most of the time the answer is “Wordfence.” Swear to God, I have no problem with Wordfence; like I said, I still use it. But you just need this Cloudflare Firewall Rule to block all access to the login page:

(http.request.uri.path contains "/wp-admin/" and not http.request.uri.path contains "/wp-admin/admin-ajax.php" and not http.request.uri.path contains "/wp-admin/theme-editor.php") or (http.request.uri.path contains "wp-login")

… with the “Block” action.

Changing your login URL is useless. Trust me, bots will still be able to sniff it.

Don’t forget to add your own IP with the “allow” action in IP Access Rules. ISPs usually assign dynamic IPs, so your home IP address will change periodically; when it does, you have to allow the new IP and delete the old one. Using Wordfence login/brute-force protection will drain your server resources. If there is another way that is just as effective but frees up resources for other things, why not? Right?

Yes, a JS or Managed Challenge can be passed by robots, although it’s not an easy thing to do. I believe the genius engineers at Cloudflare are no less clever than the hackers who try to bypass it, but when you set a “block,” you can be sure it will not be passed.

You can also use this Firewall Rule to challenge / block traffic that doesn’t go through Port 80 (http) or Port 443 (https):

(http.host eq "domain.com" and not cf.edge.server_port in {80 443})

It is very useful for dealing with bots that troll around scanning your server ports. Plus, on your origin server you can also set the firewall to allow traffic on Ports 80 and 443 only from Cloudflare IPs (cloudflare.com/ips/) and block anything that does not arrive on those ports via a Cloudflare IP. You can google how to do this; it is very easy. It might be difficult if you use managed hosting, because your hosting provider might not provide a feature for it.
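For the origin firewall part, here is one possible sketch, assuming your server uses ufw. It only prints the allow commands for a list of CIDR ranges and does not run anything. The three ranges are a small sample for illustration; take the current full list from cloudflare.com/ips/. The function name is my own.

```python
import ipaddress


def ufw_allow_rules(cidrs, ports=(80, 443)):
    """Build one `ufw allow` command per CIDR range (hypothetical helper)."""
    port_list = ",".join(str(p) for p in ports)
    rules = []
    for cidr in cidrs:
        # ip_network() raises ValueError on a malformed range, so a typo
        # is caught here instead of ending up in the firewall.
        network = ipaddress.ip_network(cidr)
        rules.append(f"ufw allow proto tcp from {network} to any port {port_list}")
    return rules


# Sample only: replace with the current ranges from cloudflare.com/ips/.
sample_ranges = ["173.245.48.0/20", "103.21.244.0/22", "2400:cb00::/32"]

for rule in ufw_allow_rules(sample_ranges):
    print(rule)
```

These allow rules only have an effect if everything else hitting Ports 80 and 443 is denied, for example through a default deny policy for incoming traffic. Make sure your SSH access is allowed before you enable that, or you can lock yourself out.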

Some user-agent exceptions

Native server firewall that only allows traffic coming through Ports 80 & 443 via Cloudflare IPs
