IWF now blocking Internet Archive's Wayback Machine?
Chris Edwards
ukcrypto at chiark.greenend.org.uk
Wed, 14 Jan 2009 10:56:56 +0000 (GMT)
On Wed, 14 Jan 2009, Paul Vigay wrote:
| Richard Jones <rich@annexia.org> wrote:
| > According to reports in the Register anyway ...
|
| > http://www.theregister.co.uk/2009/01/14/demon_muzzles_wayback_machine/
|
| I'm assuming that if they have, they've only blocked portions of it. I've
| just done some tests here and I can retrieve pages from the internet
| archive via Demon broadband (testing on a couple of my own old sites).

Virgin seem to be filtering something.

Access from Virgin home cablemodem to web.archive.org appears to be routed
thru the IWF filtering proxy, in just the same way as for Wikipedia
recently.

I can view *some* archived pages OK (including www.bbc.co.uk from 2002),
albeit slowly, whilst other pages give timeouts or "too busy" type errors.
However, this experience is very typical of archive.org, which I suspect
is under-resourced. So, in contrast to The Register story, it looks like
the whole site is NOT on the IWF list. Which makes more sense.

Note - I ran "tcptraceroute 207.241.232.5 80" to determine that (part of)
this site is on the IWF list. This is the single IP address I got when
first looking up web.archive.org. However, looking it up again, I get a
*different* IP, which interestingly doesn't appear to route thru Virgin's
IWF proxy. Looks like archive.org use some sort of DNS-based
load-balancing, where different results get returned according to some
criteria. (Similar to what's recently been discussed wrt Google.)
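
For anyone wanting to repeat the lookup part of this, here is a rough
Python sketch that resolves a hostname several times and collects every
distinct IPv4 address it sees. The function names, the number of rounds
and the pause are my own choices, not anything archive.org or Virgin
document. A caching resolver may well hand back the same answer until
the record's TTL expires, so an empty difference proves nothing.

```python
import socket
import time
from typing import Callable, Set


def lookup_ipv4(hostname: str) -> Set[str]:
    """Return the IPv4 addresses the local resolver gives for hostname right now."""
    infos = socket.getaddrinfo(
        hostname, 80, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for AF_INET the sockaddr is an (address, port) pair.
    return {info[4][0] for info in infos}


def collect_addresses(
    hostname: str,
    rounds: int = 5,
    pause: float = 0.0,
    resolve: Callable[[str], Set[str]] = lookup_ipv4,
) -> Set[str]:
    """Resolve hostname `rounds` times and return the union of all answers.

    `resolve` is injectable so the logic can be exercised without a network.
    """
    seen: Set[str] = set()
    for i in range(rounds):
        seen |= resolve(hostname)
        if pause and i < rounds - 1:
            time.sleep(pause)  # give a load-balancing DNS time to rotate
    return seen
```

Calling collect_addresses("web.archive.org", rounds=10, pause=30) and then
running tcptraceroute against each address returned would show which of
them, if any, go via the filtering proxy.
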

This would appear to throw a small spanner in the works for the "2-stage"
cleanfeed blocking systems, because the step of taking the URLs in the IWF
list and resolving their hostnames to numeric IP addresses can't easily
obtain *all* the IP addresses that the content provider's load-balancing
DNS may give out.
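
To make the gap concrete, here is a toy Python model of a 2-stage filter
as I understand it: stage 1 diverts traffic by destination IP, stage 2 is
a proxy that checks the full URL. Everything in it is illustrative. The
function names, the example URL and the 192.0.2.10 address (a reserved
documentation address) are made up, and real cleanfeed deployments will
differ in detail.

```python
from typing import Callable, Set
from urllib.parse import urlsplit


def build_stage1_ips(
    blocked_urls: Set[str], resolve: Callable[[str], Set[str]]
) -> Set[str]:
    """Stage 1 setup: resolve each listed URL's hostname once, keep the IPs."""
    ips: Set[str] = set()
    for url in blocked_urls:
        host = urlsplit(url).hostname
        if host:
            ips |= resolve(host)
    return ips


def route(dest_ip: str, url: str, stage1_ips: Set[str], blocked_urls: Set[str]) -> str:
    """Decide what happens to one HTTP request.

    Only traffic to a stage-1 IP ever reaches the proxy, so the URL check
    in stage 2 is never applied to any other destination address.
    """
    if dest_ip not in stage1_ips:
        return "direct"  # bypasses the proxy entirely
    return "blocked" if url in blocked_urls else "proxied"


# Hypothetical list entry; the resolver saw only one address at build time.
blocked = {"http://web.archive.org/example-blocked"}
stage1 = build_stage1_ips(blocked, lambda host: {"207.241.232.5"})

# Same URL, two addresses handed out by a load-balancing DNS.
via_listed_ip = route("207.241.232.5", "http://web.archive.org/example-blocked", stage1, blocked)
via_other_ip = route("192.0.2.10", "http://web.archive.org/example-blocked", stage1, blocked)
```

In this model the request to the listed address is blocked, while the
identical request to the address the list builder never saw goes direct.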