Possibility and Probability

Python, AI, and other fun stuff

1/11/2005

Search engines and networks of links

Search engines are a neat thing to play with. I know, how dorky does that sound? But think about it for a second... I used to think that they were all pretty much the same, in that they would all reference the same pages. Obviously engines like Google (and A9) have some big advantages over other engines (like Yahoo and Excite), but they don't seem to have a monopoly.

This blog seems to get run over by the search engine spiders fairly often (at least judging by the topics that people were searching for when they hit this page). So every now and then I kinda ego-surf Google and see what comes up. The other day I was very surprised when I saw (ranked pretty high up) a response I gave on the Pygame mailing list. I was literally seeing this on Google 2 days after I sent the email to the mailing list. (And there wasn't anything earth-shattering in the email; I was just asking a question.)

Then a few days ago I wrote a blog entry about getting a webcam to work with my Mandrake 10.1 system. Within a few days I already had a hit from someone searching for that info.

So today during a lull I decided to run my name through a few different search engines and see what came up. I was very surprised. Tek-Tips is a nice site that has a lot of forums about various IT topics. Years ago I answered a few questions there, and now all of the search engines show links to those answers. Odd, seeing how the content is old (perhaps the pages are dynamic and that "hides" the age of the content), but interesting.

Then, going to Yahoo and doing the same search, I saw my Slashdot info. I thought it was odd that it would show up in Yahoo but not in Google. Just to make things even more interesting, I went to Excite and found that it had spidered Rent-A-Coder and found my profile there! Again, no other engine had done that.

So the main conclusions I'm pulling from all of this are: A) not all engines index the same pages, and B) there is some interesting network stuff that can be done with these engines.

Now when I say network, I'm referring to the mathematical kind, not the computer kind. I always thought that the engines were already doing some type of network building (for example, persons A and B both talk about oranges; A and B don't link to each other, but since they have oranges in common, the engine includes them both in a query set for oranges). I had been thinking that my wife's (hi Katie!) beading website would be picked up by some engine just because it was mentioned in my blog roll and because of its relationship to the topic of beads. But that hasn't happened so far. Perhaps it's a time thing (the spiders just haven't gone over it yet), but at any rate, now that I have mentioned it here, I'm sure it will get hit soon.
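Here is a minimal sketch of that oranges idea, just to make it concrete. It assumes each page has already been reduced to a set of topics; the page names and topics below are made up, and this is only my guess at the general shape of such grouping, not how any real engine works.

```python
from collections import defaultdict

# Hypothetical pages and the topics they mention.
# Note that no links between pages are recorded at all.
page_topics = {
    "page_a": {"oranges", "recipes"},
    "page_b": {"oranges", "farming"},
    "page_c": {"beads"},
}

def build_topic_index(page_topics):
    """Map each topic to the set of pages that mention it."""
    index = defaultdict(set)
    for page, topics in page_topics.items():
        for topic in topics:
            index[topic].add(page)
    return dict(index)

topic_index = build_topic_index(page_topics)

# A and B never link to each other, but both land in the "oranges" query set.
print(sorted(topic_index["oranges"]))
```

The shared topic is the only thing tying page_a and page_b together, which is the "network" relationship I had assumed the engines were using.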

And that will be an interesting thing to see. It will be cool to see the topology that forms between Katie's site, my links, and other sites about beads and beaded jewelry. I'm sure some PhD candidate out there has already done this, but I think it would be neat to see a program that could go to the engines, search on a topic, collate all of the results, and then present the strongest results as a network. That network could then be used for a lot of things: as a starting point for new searches on a related topic, for example.
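A rough sketch of what such a program might look like, under some loud assumptions: the engine names, URLs, and result lists are invented placeholders (no real search API is called), "strongest" is taken to mean the highest sum of reciprocal ranks across engines, and two results are linked when the same engine returns both. Those are illustrative choices on my part, not an established method.

```python
from collections import Counter
from itertools import combinations

def collate(results_by_engine, top_n=3):
    """Score each URL by summing 1/rank over every engine that returned it."""
    scores = Counter()
    for urls in results_by_engine.values():
        for rank, url in enumerate(urls, start=1):
            scores[url] += 1.0 / rank
    # Highest score first; ties are broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [url for url, _ in ranked[:top_n]]

def build_network(results_by_engine, nodes):
    """Link two kept URLs once for every engine that returned both of them."""
    keep = set(nodes)
    edges = Counter()
    for urls in results_by_engine.values():
        shared = sorted(keep.intersection(urls))
        for a, b in combinations(shared, 2):
            edges[(a, b)] += 1
    return dict(edges)

# Hypothetical ranked results for one topic from three made-up engines.
results_by_engine = {
    "engine_one": ["beads.example/shop", "blog.example/beads", "forum.example/jewelry"],
    "engine_two": ["blog.example/beads", "beads.example/shop"],
    "engine_three": ["forum.example/jewelry", "blog.example/beads", "other.example/misc"],
}

strongest = collate(results_by_engine)
network = build_network(results_by_engine, strongest)

for (a, b), weight in sorted(network.items()):
    print(f"{a} <-> {b} (shared by {weight} engine(s))")
```

The heavier edges would be the natural starting points for a follow-up search on a related topic. Actually fetching results from the engines is left out on purpose, since each one would need its own handling.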

Hmmm.... But I should stay focused on completing my main project (my game engine). :)
