
WikiSpam, TruckNumber, AnonymousCoward, LinkLove, LinkSpam

C2 is the finest repository of programming lore on the planet. But it seems Ward Cunningham is having severe spam problems over there:

"Due to continued abuse by vandals and computer programs we now require the entry of a random code word before we will save changes to pages. The current codeword will be made available on the edit page at times when I can directly supervise the site." - WardCunningham

It's not good that Ward has to be onsite to do this. As Steve Loughran pointed out, "How can you be a wiki, when you don't allow edits?" But Ward is already sorting out the Wiki's TruckNumber: "I am assembling a group of stewards who have both a financial and emotional stake in the continued health of this site." He has a great instinct for the social aspects of creating and using software; it will be interesting to see what he comes up with here.

In the meantime the Perl script that runs Wiki could do with an upgrade. An image captcha would help for now, as would a JavaScript-munged math question, insofar as they make automated attacks difficult. Neither will work as well against manual spammers.
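To make the math-question idea concrete, here is a minimal sketch in Python (not the Wiki's Perl, and not anything Ward has said he will use). It assumes a stateless server that hands out a question together with a signed token, then checks the submitted answer against that token before saving an edit. The names `SECRET_KEY`, `make_challenge` and `check_answer` are hypothetical, made up for this illustration.

```python
import hashlib
import hmac
import random

# Hypothetical server-side secret; a real site would load this from config.
SECRET_KEY = b"change-me"


def _sign(answer: int, nonce: str) -> str:
    """Return an HMAC-SHA256 tag binding the expected answer to a nonce."""
    msg = f"{nonce}:{answer}".encode()
    return hmac.new(SECRET_KEY, msg, hashlib.sha256).hexdigest()


def make_challenge(rng=random):
    """Build a question plus the nonce and token to embed in the edit form."""
    a, b = rng.randint(1, 9), rng.randint(1, 9)
    nonce = f"{rng.getrandbits(64):016x}"
    question = f"What is {a} plus {b}?"
    return question, nonce, _sign(a + b, nonce)


def check_answer(submitted: str, nonce: str, token: str) -> bool:
    """Accept the edit only if the submitted answer matches the signed one."""
    try:
        answer = int(submitted.strip())
    except ValueError:
        return False
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(_sign(answer, nonce), token)
```

As written this only raises the cost for naive scripts: a bot that parses the question, or replays a nonce and token it has already solved, gets straight through, so a real deployment would at least need to expire nonces. And as noted above, it does nothing against a human spammer.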

I think, for the Atom crowd, this also indicates a limitation of using IP addresses either as identifiers or as a means of outing people (be they spamming or astroturfing). If the incentives are sufficient, embarrassment by IP is not a deterrent.

The core problems here are threefold:

  • Anonymity: people are making money in an anti-social fashion from link spam without having to reveal themselves.
  • An absence of deterrents for spammers: if I can make money spamming in an anti-social fashion, without the people whose sites I'm wrecking knowing who I am, and without breaking any laws, why would I stop? So then we have to ask, on a public wiki or weblog, is registration really a problem?
  • Search engines: the point of link spam is typically to game search engine rankings. Downloading the web into a database for indexing has always been a questionable idea, even more so now that the statistically based engines seem to have enough difficulty dealing with spam and blog topologies that their inventors want us to pepper our blogs with metadata.

Technology alone can't deal with a problem that's mainly socio-economic in nature. Only the last item in that list is really to do with technology or systems design, and arguably solving it for the current search architectures is AI-hard. Anyone who still believes Bayesian filters will win this war is blind to how dynamic systems with limited resources and competing actors work. On top of that, the end-goal for online spam is somewhat different to that of email spam, in ways that make Bayesian techniques less effective.

Search engines as of 2005 are incapable of telling the difference between link-spam and link-love - it's all whuffie to them.

March 18, 2005 09:51 AM

