🕊️ This article is dedicated to a more open internet.
This started when I saw @geekbb's tweet introducing WARP. WARP has been out for a long time, but for protecting IP privacy it is less useful to me than iCloud Private Relay, and I have no need for "magic" internet access (getting around network blocks). Then I realized that I do still have a need to hide my IP.
Over the years of developing RSSHub, I have found that very few sites provide public APIs, and many enforce strict anti-crawling controls to restrict access to their content. Some sites block an IP that sends too many requests, while others block the IP ranges of common cloud server providers outright. As a result, even fetching the latest few items from a site has become very difficult.
This situation calls for proxies, but dedicated crawler proxies are usually expensive and poor value for money. It would be great if RSSHub could make use of the unlimited traffic and abundant IP resources of Cloudflare WARP. RSSHub already supports common proxy protocols, so as long as WARP can be wrapped as a standard proxy, RSSHub can use it.
The official client is not convenient to use directly in a command-line environment, but an idea this obvious must already have been implemented by someone else. Sure enough, I found a packaged Docker image on GitHub.
Then, to enable the proxy, just add a service like this to RSSHub's docker-compose.yml:
    warp-socks:
        image: monius/docker-warp-socks:latest
        privileged: true
        volumes:
            - /lib/modules:/lib/modules
        cap_add:
            - NET_ADMIN
            - SYS_ADMIN
        sysctls:
            net.ipv6.conf.all.disable_ipv6: 0
            net.ipv4.conf.all.src_valid_mark: 1
        healthcheck:
            test: ["CMD", "curl", "-f", "https://www.cloudflare.com/cdn-cgi/trace"]
            interval: 30s
            timeout: 10s
            retries: 5
Finally, add a PROXY_URI environment variable to RSSHub:
    PROXY_URI: 'socks5h://warp-socks:9091'
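Before pointing RSSHub at the proxy, it can be worth confirming that traffic really leaves through WARP. The following is a minimal Python sketch, not part of RSSHub or the Docker image: it calls the curl CLI through the SOCKS proxy and parses the key=value lines of the same /cdn-cgi/trace page used by the healthcheck. The function names are hypothetical, and the default address socks5h://127.0.0.1:9091 assumes you have published port 9091 to the host (the service definition above does not do this); from another container on the same Compose network you would pass socks5h://warp-socks:9091 instead.

```python
import subprocess

TRACE_URL = "https://www.cloudflare.com/cdn-cgi/trace"


def parse_trace(text: str) -> dict:
    """Parse the key=value lines of a /cdn-cgi/trace response into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip blank or malformed lines
            fields[key.strip()] = value.strip()
    return fields


def is_warp_active(fields: dict) -> bool:
    """The trace page reports warp=on (or warp=plus for WARP+); warp=off means no WARP."""
    return fields.get("warp") in {"on", "plus"}


def fetch_trace(proxy: str = "socks5h://127.0.0.1:9091") -> dict:
    """Fetch the trace page through the proxy with curl (assumed to be installed)."""
    result = subprocess.run(
        ["curl", "-fsS", "--max-time", "10", "--proxy", proxy, TRACE_URL],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if curl fails
    )
    return parse_trace(result.stdout)


if __name__ == "__main__":
    trace = fetch_trace()
    print("exit IP:", trace.get("ip"))
    print("WARP active:", is_warp_active(trace))
```

If the script reports that WARP is not active, the request is probably not going through the container at all, so check the proxy address first.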
For testing I picked a hotukdeals route (the UK version of Dealabs) that I use often. The site blocks all DigitalOcean IPs, so the route had been returning 403.
With WARP, I can access it smoothly.
In addition, I found that WARP gets a new IP every time it is restarted. I haven't had time to verify it, but I suspect the IP also rotates automatically and fairly often, which would be good news for getting past anti-crawling measures.
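One way to check whether the IP really rotates on its own is to sample the exit IP at intervals and record each change. The sketch below is only an illustration under the same assumptions as before: curl is installed, the proxy is reachable at the given address (the default assumes port 9091 is published to the host), and all function names and the sampling interval are hypothetical choices of mine.

```python
import subprocess
import time
from typing import Optional

TRACE_URL = "https://www.cloudflare.com/cdn-cgi/trace"


def exit_ip(trace_text: str) -> Optional[str]:
    """Return the value of the ip= line in a trace response, or None if absent."""
    for line in trace_text.splitlines():
        if line.startswith("ip="):
            return line[len("ip="):].strip()
    return None


def ip_changes(ips) -> list:
    """Collapse observed IPs into the sequence of distinct consecutive values.

    Failed samples (None) are ignored rather than counted as a change.
    """
    changes = []
    for ip in ips:
        if ip is not None and (not changes or changes[-1] != ip):
            changes.append(ip)
    return changes


def poll(proxy: str = "socks5h://127.0.0.1:9091",
         samples: int = 12, interval: float = 300.0) -> list:
    """Sample the exit IP `samples` times, `interval` seconds apart."""
    seen = []
    for i in range(samples):
        result = subprocess.run(
            ["curl", "-fsS", "--max-time", "10", "--proxy", proxy, TRACE_URL],
            capture_output=True,
            text=True,
        )
        seen.append(exit_ip(result.stdout) if result.returncode == 0 else None)
        if i < samples - 1:
            time.sleep(interval)
    return ip_changes(seen)


if __name__ == "__main__":
    print(poll())
```

A returned list with a single entry means the IP stayed the same for the whole sampling window; more entries mean it changed without a restart.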
You can also customize the WireGuard configuration further, for example by using the paid WARP+ tier or custom endpoints, which may give better results.
To generate the WireGuard configuration file, you can use
To farm WARP+ traffic quota and filter endpoints, you can use
Reportedly, WARP+ is not significantly faster than WARP (see "WARP, WARP+ Speed Comparison, and WARP Speed Limit"), but whether it makes a difference for anti-crawling still needs to be verified.
If everything goes well, many routes on the official RSSHub instance that are currently blocked by strict anti-crawling measures should become usable again. I will verify this and post an update here in a few days.