Yahoo! Slurp Spider?
Bella
03-30-2005, 04:18 PM
I just discovered the "Who's Online" feature on vBB. Very cool, although I feel a bit voyeuristic knowing what people are reading. Anyway, I understand the "Guest" bit - but what the fuck is a Yahoo! Slurp Spider?
I like checking the "who's online" feature when I'm bored and see all the different search engines that are scanning this forum. Kinda trippy.
Crumb
03-30-2005, 04:33 PM
A spider is a computer program that search engines use to index web pages so that people searching (on Yahoo, in this case) can find them. It goes around the web reading and indexing the contents of any page it can get its virtual hands on. At least it has the decency to identify itself.
livius drusus
03-30-2005, 04:34 PM
It's a bot sent out by Yahoo's search engine to crawl the web and mine data. Say you noticed the Slurp spider checking out the "Franchises That Don't Suck" thread. A few days from now, someone using Yahoo to search for "On the Border" might see a link to our thread as one of the results.
Google, AskJeeves, MSN and a whole slew of other search engines you've probably never heard of send out armies of spiders to link hop all over the web and keep their databanks updated.
Nowadays, the overwhelming majority of guests on any forum are actually search engines. We've just turned on the vB option which identifies the spiders instead of calling them the generic "guest".
livius drusus
03-30-2005, 04:35 PM
Crumb: At least it has the decency to identify itself.
He he... Only 'cause we let it. ;) Here's our identification list, btw. We haven't updated it in a while, and there are always new and obscure search engines springing up. Some of the guests may be the new and obscure guys. As you can see, though, we've covered quite a bit of ground.
ABCdatos BotLink
Ahoy!
Alexa
Alkaline
Almaden Crawler
ananzi
Anthill
Aport
Arachnophilia
Araneo
ArchitextSpider
arks
AskJeeves
ASpider
ATN Worldwide
Atomz.com
AURESYS
BackRub
Baiduspider
BBot
Big Brother
Bigmir
Bjaaland
BlackWidow
BoardReader
BoardViewer
Borg-Bot
BSpider
CACTVS Chemistry
Calif
Checkbot
ChristCrawler.com
cIeNcIaFiCcIoN.nEt
CMC/0.01
Combine System
ComputingSite Robi/1.0
Conceptbot
CoolBot
Cusco
CyberSpyder
Desert Realm
DeWeb(c)
Die Blinde Kuh
DienstSpider
Digger
Digimarc MarcSpider
Digimarc Marcspider/CGI
Digital Integrity Robot
Direct Hit Grabber
DNAbot
DragonBot
DWCP (Dridus' Web Cataloging Project)
e-collector
EbiNess
EIT Link Verifier Robot
ELFINBOT
Emacs-w3 Search Engine
Esther
Evliya Celebi
FAST / AlltheWeb
FastCrawler
Felix IDE
FetchRover
fido
Fish search
Fluid Dynamics
Fouineur
Freecrawl
FunnelWeb
GAMEKIT
gammaSpider
gazz
GCreep
GetterroboPlus Puu
GetURL
Gigabot
Girafabot
Golem
Google
Google AdSense
Griffon
Gromit
Grub Client
Gulper Bot
havIndex
HeinrichderMiragoRobot
HenryTheMiragoRobot
HKU WWW Octopus
Hometown
ht://Dig
HTML Index
HTMLgobble
Hämähäkki
I, Robot
iajaBot
IBM_Planetwide
IlTrovatore-Setaccio
image.kapsi.net
Imagelock
IncyWincy
Informant
InfoSeek Robot 1.0
Infoseek Sidewinder
InfoSpiders
Ingrid
Inktomi
Inspector Web
IntelliAgent
Internet Cruiser
Internet Shinchakubin
InternetLinkAgent
Iron33
Israeli-search
JavaBee
JBot
JCrawler
JoBo
Jobot
JoeBot
JumpStation
Katipo
KDD-Explorer
KIT-Fireball
KO_Yappo_Robot
LabelGrabber
larbin
legs
Link Validator
LinkScan
LinkWalker
Lockon
logo.gif
Lycos
Magpie
marvin/infoseek
Mattie
MediaFox
MerzScope
META
MetaGer
MindCrawler
mnoGoSearch
moget
MOMspider
Monster
Motor
MSNBot
Muscat Ferret
Mwd.Search
NameProtect
NEC-MeshExplorer
Nederland.zoek
NetCarta WebMap
NetMechanic
NetScoop
newscan-online
NHSE Web Forager
Nomad
Northern Light
nzexplorer
Occam
Openbot
Openfind data gatherer
Orb Search
Pack Rat
PageBoy
ParaSite
Patric
pegasus
PerlCrawler 1.0
PGP Key Agent
Phantom
PhpDig
PiltdownMan
Pimptrain.com's
Pioneer
PlumtreeWebAccessor
Pompos
Poppi
Popular Iconoclast
Portal Juice
PortalB Spider
psbot
Rambler
Raven Search
Resume Robot
Road Runner: The ImageScape Robot
RoadHouse Crawling System
Robbie the Robot
RoboCrawl
RoboFox
Robot Francoroute
Robozilla
Roverbot
RuLeS
SafetyNet
Scooter
SearchNZ
SearchProcess
Seekbot
Senrigan
SG-Scout
ShagSeeker
Shai'Hulud
Simmany Robot Ver1.0
Site Searcher
Site Valet
SiteTech-Rover
Skymob.com
SLCrawler
Sleek
Smart Spider
Snooper
Solbot
Speedy Spider
SpiderBot
Spiderline Crawler
SpiderMan
SpiderView(tm)
spider_monkey
Suke
suntek search engine
TACH Black Widow
Tarantula
tarspider
Tcl W3 Robot
TechBOT
Templeton
The Jubii
The NorthStar Robot
The NWI Robot
The Peregrinator
TITAN
TitIn
TLSpider
Turnitin.com
Turtle
UCSD Crawl
URL Check
URL Spider Pro
Valkyrie
Verticrawl
Victoria
vision-search
Voyager
VWbot
W3M2
w3mir
w@pSpider
Walhello appie
WallPaper
Web Core / Roots
Web Moose
WebBandit
WebCatcher
WebCopy
webfetcher
weblayers
WebLinker
Weblog Monitor
WebQuest
WebReaper
webs
WebStolperer
WebVac
webwalk
WebWalker
WebWatch
Wget
whatUseek Winona
Wild Ferret Web Hopper
Wired Digital
WiseNut
WWWC
WWWWanderer
X-Crawler
XGET
XYLEME Robot
Yahoo! Slurp
Yahoo-VerticalCrawler
YahooFeedSeeker
Yandex
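For the curious, here is a minimal sketch of how a "Who's Online" page could label visitors like this. It assumes the forum matches a substring of each visitor's User-Agent header against a list of known spiders; the `KNOWN_SPIDERS` table, the match strings in it, and the `identify_visitor` function are all hypothetical illustrations, not vBulletin's actual code.

```python
# Hypothetical subset of a spider identification list.
# Keys are substrings to look for in the User-Agent header (assumed values);
# values are the labels shown on the "Who's Online" page.
KNOWN_SPIDERS = {
    "Yahoo! Slurp": "Yahoo! Slurp",
    "Googlebot": "Google",
    "msnbot": "MSNBot",
    "Wget": "Wget",
}


def identify_visitor(user_agent: str) -> str:
    """Return a spider label if the User-Agent matches, else the generic 'Guest'."""
    ua = user_agent.lower()
    for needle, label in KNOWN_SPIDERS.items():
        # Case-insensitive substring match against the header.
        if needle.lower() in ua:
            return label
    return "Guest"


if __name__ == "__main__":
    print(identify_visitor("Mozilla/5.0 (compatible; Yahoo! Slurp)"))
    print(identify_visitor("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"))
```

This also shows why a spider only shows up by name "'cause we let it": anything whose User-Agent is not on the list, or that sends a browser-like header, falls through to "Guest".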
xouper
03-30-2005, 09:55 PM
livius drusus: ... Some of the guests may be the new and obscure guys. ...
Wget
I've used "wget" to mirror a few websites now and then.
http://www.gnu.org/software/wget/wget.html
If wget has been logged as a guest on this site, could it be a private individual mirroring your forum?
livius drusus
03-30-2005, 10:01 PM
I've never actually seen wget here, but I haven't checked the server logs or anything. I shall defer to vm on this matter, as I'm more of a poke-the-server-with-a-stick kind of girl than anything.
viscousmemories
03-31-2005, 12:44 AM
I've never noticed wget here either, but thanks for that tip, xouper. We got that list of known spiders from vBulletin's website and entered it in; we haven't actually seen all of them here.
Corona688
03-31-2005, 03:08 AM
wget's nothing to be alarmed about. It's a very polite program, fully abiding by robots.txt and such -- which is a major reason why people don't generally use it as a web robot. :P It's more of an obscure web powertool. I use it all the time for little one-shot retrievals, just 'wget http://something.org/filename.zip' and it downloads.
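To illustrate what "abiding by robots.txt" means in practice, here is a rough sketch using Python's standard `urllib.robotparser`. The robots.txt contents, the `/private/` path, and the `may_fetch` helper are made up for the example; this is not how wget itself is implemented, only the same kind of check a polite robot makes before fetching a page.

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt: every robot is asked to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def may_fetch(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules allow `agent` to fetch `url`."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    print(may_fetch(ROBOTS_TXT, "Wget", "http://something.org/filename.zip"))
    print(may_fetch(ROBOTS_TXT, "Wget", "http://something.org/private/filename.zip"))
```

A polite robot simply skips any URL for which this kind of check comes back false.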
viscousmemories
03-31-2005, 03:41 AM
Cool. I've been starting to play with Linux a lot more lately, so I'm sure I'll find it handy.
Corona688
03-31-2005, 01:59 PM
viscousmemories: Cool. I've been starting to play with Linux a lot more lately, so I'm sure I'll find it handy.
One thing to watch out for, then. Put URLs with funny characters like %, &, ?, ! etc. inside single quotes; otherwise whatever shell you're using might take them to mean something and split your URL into freaking bits.
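If you ever build the command from a script instead of typing it, the quoting can be done for you. A small sketch using Python's standard `shlex.quote`; the `wget_command` helper and the example URL with a query string are made up for illustration.

```python
import shlex


def wget_command(url: str) -> str:
    """Build a wget command line with the URL quoted for a POSIX shell."""
    # shlex.quote wraps the string in single quotes when it contains
    # characters the shell would otherwise interpret, such as ? and &.
    return "wget " + shlex.quote(url)


if __name__ == "__main__":
    # Without quoting, the shell would treat & as "run in the background"
    # and cut the URL off right there.
    print(wget_command("http://something.org/get.php?id=1&name=a"))
    # A plain URL needs no quoting and is passed through unchanged.
    print(wget_command("http://something.org/filename.zip"))
```

Typing the single quotes by hand at the prompt does the same job; the helper just saves you from forgetting them.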
viscousmemories
03-31-2005, 02:43 PM
Good tip. :1thumbup: