Schmozilla and wizard.yellowbrick.oz
Looking over my web server logs it was hard not to notice the user agent Schmozilla/v9.14 Platinum and its similarly spoofed referrer http://wizard.yellowbrick.oz. I'm used to seeing referrer spammers and harvesters in my logs, but I had never seen this one before. After fetching the blog and the front page, this spider went for a directory listed as Disallowed in my robots.txt file; that's not very nice.
It looks very much like Perl code copied verbatim from the Perl Cookbook, specifically Fetching a URL from a Perl Script (recipe 20.1):
$ua->agent("Schmozilla/v9.14 Platinum"); # give it time, it'll get there
my $req = HTTP::Request->new(GET => $url);
$req->referer("http://wizard.yellowbrick.oz"); # perplex the log analyzers
The person doing this isn't too bright, or he would have changed the user agent and referrer to something more plausible, but at least he has access to good literature.
4 December, 2004
Feedback
by Johan
The type and pattern of requests should give you some indication, but it's hard to tell. It could be as harmless as a Perl novice writing a simple Perl script for educational purposes and testing it on a live site to make sure it works. A student might be collecting a sample of real web pages for some kind of research. A webmaster may have based a script (e.g. for link checking) on the Perl Cookbook recipe and forgotten to change the user agent and referrer strings. Then again, a spammer could have done the same to harvest email addresses.
Perhaps it's all of the above, done by different people at different times. I think the Perl novice scenario is the most plausible and have only seen a small number of requests so I won't bother blocking it. If someone wants to copy an entire site or harvest email addresses there are much easier and less noticeable ways to do so.
by Andy
I've just seen this same string launched from a box within a corporate environment. That makes me think it's a zombie.
However, it's an odd part of my site they're accessing. This particular page isn't linked from anywhere else on the site; you'd have to know about the page to get there. I guess I'll have to read up on spiders and bots.
...I suppose I'll block the address.
by Danny
Hi, I've just seen this wizard.yellowbrick.oz on my referrers list too, but I have no idea how to find out what it was searching for or looking at. Could anyone tell me how?
Thanks
_dan
by Johan
Danny, to see what pages were retrieved you just look at the requests with that particular referrer in your access log. This might not be possible if you have a separate referrer log, but that's unusual.
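As a rough illustration of the suggestion above, here is a minimal Python sketch that pulls out the requests carrying a given referrer. It assumes the log is in Apache's "combined" format; the function name, the regular expression, and the sample lines (which use made-up documentation addresses) are all hypothetical and would need adjusting to a real setup.

```python
import re

# Assumed layout: Apache "combined" log format. Adjust if your LogFormat differs.
LINE_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def requests_with_referrer(lines, needle="wizard.yellowbrick.oz"):
    """Return (host, time, request, status) for lines whose referrer contains needle."""
    hits = []
    for line in lines:
        match = LINE_RE.match(line)
        if match and needle in match.group("referrer"):
            hits.append((
                match.group("host"),
                match.group("time"),
                match.group("request"),
                match.group("status"),
            ))
    return hits

# Made-up sample lines for illustration only; not taken from any real log.
SAMPLE_LOG = [
    '192.0.2.10 - - [04/Dec/2004:10:15:32 +0100] "GET /blog/ HTTP/1.0" 200 5120 '
    '"http://wizard.yellowbrick.oz" "Schmozilla/v9.14 Platinum"',
    '192.0.2.77 - - [04/Dec/2004:10:16:01 +0100] "GET /index.html HTTP/1.1" 200 2048 '
    '"-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"',
    '192.0.2.10 - - [04/Dec/2004:10:15:34 +0100] "GET /private/ HTTP/1.0" 403 312 '
    '"http://wizard.yellowbrick.oz" "Schmozilla/v9.14 Platinum"',
]

if __name__ == "__main__":
    for host, when, request, status in requests_with_referrer(SAMPLE_LOG):
        print(host, when, request, status)
```

To run it against a real log, pass an open file object instead of SAMPLE_LOG; the function only needs something that yields one log line at a time.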
by IO ERROR
The only times I've seen this user-agent or referrer I have been able to connect conclusively to spambot activity. It is either harvesting email addresses to send email spam, or harvesting comment forms to post blog spam. Sometimes both.
by fdask
Just found "Schmozilla/v9.14 Platinum" in my logs, and did a google. This was the first match. I thought it was simply someone spoofing, for shits and giggles...
by Daniel Webb
If you're running PHP/Apache, I have a small PHP package that will auto-ban all bots that ignore robots.txt, with the option to unban if a person types in the password.
by eat
I just emailed the ISP of whoever was running the thing on my site. Probably won't do much, but maybe I'm not the first to raise the question...
by maartenm
I noticed him on my company site. All with IP 66.17.15.136, which resolves to 66-17-15-136.biz.bkfd.arrival.net. What's that PHP package to ban these fellas?
by Johan
Mr Webb's bot trap package is available on his site. I like the idea of a trap, but I'm trying to avoid building an IP blocklist for this. I already have a huge email spammer blacklist to maintain, and a bad bot blacklist would also add a fair amount of extra processing to the web server.
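For anyone who wants to spot robots.txt violators without maintaining a blocklist, the check can be done offline against the logs. The following is only a sketch of that idea in Python, not how Mr Webb's package works: the robots.txt content, the function names, and the sample requests are all invented for illustration, and it only reports offenders rather than banning them.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; substitute the rules from your own file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

def build_parser(robots_text):
    """Build a parser from robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser

def violators(requests, parser):
    """Map each client IP to the disallowed paths it requested.

    requests is an iterable of (ip, user_agent, path) tuples,
    for example extracted from an access log.
    """
    found = {}
    for ip, agent, path in requests:
        if not parser.can_fetch(agent, path):
            found.setdefault(ip, []).append(path)
    return found

# Made-up requests for illustration only.
SAMPLE_REQUESTS = [
    ("192.0.2.10", "Schmozilla/v9.14 Platinum", "/blog/"),
    ("192.0.2.10", "Schmozilla/v9.14 Platinum", "/private/notes.html"),
    ("192.0.2.77", "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)", "/index.html"),
]

if __name__ == "__main__":
    print(violators(SAMPLE_REQUESTS, build_parser(ROBOTS_TXT)))
```

Note that ordinary browsers never read robots.txt, so a person following a link into a disallowed directory would show up here too; the report is a starting point for a closer look, not proof of a bad bot.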
by russ
I saw them in my logs and they were only at each page for a second or two. I think it's some kind of harvester but that's only my opinion. Try this to maybe stop it:
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} CherryPickerSE [OR]
RewriteCond %{HTTP_USER_AGENT} CherryPickerElite [OR]
RewriteCond %{HTTP_USER_AGENT} EmailCollector [OR]
RewriteCond %{HTTP_USER_AGENT} EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} "Schmozilla/v9\.14 Platinum" [OR]
RewriteCond %{HTTP_USER_AGENT} Wget [OR]
RewriteCond %{HTTP_USER_AGENT} Windows-Media-Player
RewriteRule ^.*$ http://www.yourpage.com/error/403.shtml [L]
by Johan
That looks like an old list. I still get the occasional visit from EmailSiphon, but most of the harvesters I've seen lately have user agents like Java/1.4.2_04 or Mozilla/4.0 (compatible; MSIE 6.0; Windows 98).
by jkronegg
In my logs, 18 hits from IP 66.17.15.138 by that useragent (different from the one reported by Maartenm).
by Daniel Stad
Hey everyone, while at a conference to discuss some major changes to the www.aspirefreepress.com website, a number of people had the same questions as all of you. It just so happens that there was a spokesperson from "The Internet Archive's Wayback Machine"; he announced that wizard.yellowbrick.oz is a search engine designed to go and take snapshots of your websites.
When asked why they would use the dead URL, the reply was that the project was kept a secret for a long time so that no one would copy the idea.
That's it folks.
Hope that puts your minds at ease...
by Pooneh
I couldn't figure out the wizard.yellowbrick.oz thing but thanks to the comments above me, everything makes sense now.
by John
Some may be part of a web archive effort; I notice that a lot. But I think some users of this program are going after confidential and protected areas. I am treating this bot as a major threat to security.
by Thomas Vinson-Peng
PAY NO ATTENTION TO THE MAN BEHIND THE CURTAIN!!!!
Feedback is closed for this entry.
by Dave
Why would someone want to do this, I am interested to know?
I also want to know if they could use this to take an entire copy of your site, or if it is just a harmless spoof?
I have just started seeing this in my logs, I am not sure what to think of it.
Dave