Shark jumping Google
January 04, 2003 | co.mments
And assuming that Sergey was not leading Dave on at the conference last week, they are gung-ho about allowing people to update metadata directly into Google. Am I the only person who is grasping the full potential of this?
If you publish to Google's cloud, you get automatic indexing, metadata like who is linking to you, and more. And Google can add little semantic web-like features such as webquotes every few months to keep you hooked. Then, the advantages of a central index really kick in when metadata starts to explode. Obviously Google isn't pushing the "we made a better Internet" angle yet, but they could -- and the fact that they are so carefully surrounding key strategic bits of territory is not a coincidence. I think AOL and MSFT both blew it already, and the Google guys are not as "aww, shucks, we just like to write web crawler software" as they talk. Game over; the tired old Internet can't compete.
I talk to colleagues at work about search engines on and off, especially Sean and Conor. And on a mailing list with Paul, I said Google is now treated as web infrastructure. Paul reckoned you can't compare Google to something like DNS; maybe so, since most of us don't think much about DNS. But that seems to greatly underestimate the social importance of search engines.
What perhaps bothers me is the casual attitude we have to Google. This is a private company that has, effectively, no competition (the one other company that comes to mind that gets such an easy ride these days is IntelliJ, but at least it has Eclipse snapping at its heels). We see incessant rants about Microsoft, yet Google has web search by the short and curlies and somehow we're all comfortable with that. But Google is just another company and, lest we forget, one that is not afraid to bare its teeth (the publication of this email is the point when Google jumped the shark for me). Google has also shown it will game its rankings under external pressure; what would it do in time under internal pressure? I say all this while being continually awestruck at their ability to innovate without serious competition, and as a card-carrying believer that their statistical approach to managing information on the web will continue to wipe the floor with approaches driven by logic and knowledge representation, such as RDF and OWL. Google are a class act.
At my previous job, with the since-defunct InterX, I banged on about the collective insanity of centralized search engines, even coming up with a model for distributing search index feeds. I was told, more than once, that no-one in their right mind would take on the search engines and the CDN providers, even though we were sitting on code to make it so. And after what the late Gene Kan was doing with InfraSearch became common knowledge, it seemed like it was game over for centralized search.
It hasn't worked out that way, and whether that's a lesson in believing too little in one's convictions or too much in one's ability to see around corners, I'm not sure. Yet at the end of the day, having the web downloaded into a database for indexing and querying is such an insane state of affairs, it's hard to comprehend. The very fact that search engines continue to exist at all in their present form is a failure of imagination. There's so much more work to be done in web search. We already know from RSS and blogland that we can distribute content in a decentralized fashion; the question is whether we will distribute content indices.
One technology that might evolve into a distributed search mechanism is TrackBack; the sooner more people in blogland start using it and experimenting with it, especially with referer ranking schemes, the better.
Here is something to think about: if you could "push" your web pages to Google to be indexed, and Google already caches those pages for access, why would you even have a web server?
So we get to what Josh's blog leaves out. Not that we push pages into Google, but that we break Google up and scatter its bots across the web for individuals to use. We then upload the locally generated indices or start moving them around the network. The key point is not pushing content but decentralizing the building of indices. The uncomfortable dependency we have on a few key engines as information hubs and brokers will inevitably become obvious, and developers will move to balance the power of search engines with open code. When indices built from content models, rather than from the presentation scrapes that search engines perform today, are sent around the web the way we send RSS content around today, that will be an evolutionary step forward for the web.
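To make "decentralizing the building of indices" a bit more concrete, here is a minimal Python sketch under loudly stated assumptions. Each site builds an inverted index (term to list of entry URLs) from its own content rather than from scraped HTML, publishes it as a JSON document, and an aggregator merges whatever index feeds it polls and answers queries. JSON is only a stand-in for whatever feed format such a scheme might actually use; this is not an RSS or TrackBack format, and every function name and `example` URL below is hypothetical.

```python
import json
import re
from collections import defaultdict


def tokenize(text):
    # Lowercase alphanumeric tokens; a real indexer would add stemming, stop words, etc.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_local_index(site, entries):
    """Hypothetical: build one site's inverted index (term -> sorted entry URLs).

    `entries` maps an entry URL to its text, taken from the site's own
    content model rather than scraped from rendered pages.
    """
    index = defaultdict(set)
    for url, text in entries.items():
        for term in tokenize(text):
            index[term].add(url)
    return {"site": site, "terms": {t: sorted(urls) for t, urls in index.items()}}


def to_feed(local_index):
    # Serialize the index so it can be published and polled, much like a feed.
    return json.dumps(local_index, sort_keys=True)


def merge_feeds(feeds):
    """Hypothetical aggregator: union the postings from many sites' index feeds."""
    merged = defaultdict(set)
    for feed in feeds:
        data = json.loads(feed)
        for term, urls in data["terms"].items():
            merged[term].update(urls)
    return merged


def search(merged, query):
    # AND query: return the URLs whose entries contain every query term.
    terms = tokenize(query)
    if not terms:
        return []
    results = set(merged.get(terms[0], set()))
    for term in terms[1:]:
        results &= merged.get(term, set())
    return sorted(results)


# Usage: two sites index themselves; an aggregator merges their feeds.
feed_a = to_feed(build_local_index(
    "a.example", {"http://a.example/1": "Distributed search indices"}))
feed_b = to_feed(build_local_index(
    "b.example", {"http://b.example/9": "RSS feeds and search"}))
merged = merge_feeds([feed_a, feed_b])
print(search(merged, "search"))      # ['http://a.example/1', 'http://b.example/9']
print(search(merged, "rss search"))  # ['http://b.example/9']
```

The design point is that no crawler ever copies the pages: the expensive work of tokenizing and indexing happens at the edges, and only the compact index travels, the way RSS moves content today.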
Finally, a conspiracy theory coming from Joshua, an MS employee, is somewhat... weird, given MS's past behaviour. Though I happen to agree with Joshua this time: Google is one to keep an eye on. Buy shares when you can.
January 4, 2003 02:11 AM