Anne van Kesteren

rel="nofollow"

There is a new REL value in town. It was not released by the W3C, but by a major browser user agent, Google. I guess WHATWG will standardize the value, or the Global Multimedia Protocols Group will define a profile for it, just as was done for XFN.

The new REL attribute value is nofollow; it was introduced to “prevent comment spam.” However, it does not actually prevent comment spam; it makes sure that links in comment spam earn no ranking credit from Google, Yahoo and MSN Search. It will be implemented by various weblog systems as well. (Read preventing comment spam on the Google Blog for some more information.)
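
As a rough illustration of what a weblog system might do with the new value, here is a minimal sketch that forces rel="nofollow" onto every link in a comment before it is displayed. The names `NofollowRewriter` and `add_nofollow` are made up for this example, and discarding any rel value the commenter supplied is an assumption, not something the announcement prescribes.

```python
from html import escape
from html.parser import HTMLParser


class NofollowRewriter(HTMLParser):
    """Re-serialises comment markup, forcing rel="nofollow" on every <a>."""

    def __init__(self):
        # Keep entities as written instead of decoding them in text nodes.
        super().__init__(convert_charrefs=False)
        self.out = []

    def _start(self, tag, attrs, closing):
        if tag == "a":
            # Assumption: drop whatever rel the commenter sent, then add ours.
            attrs = [(k, v) for k, v in attrs if k != "rel"]
            attrs.append(("rel", "nofollow"))
        rendered = "".join(
            f" {k}" if v is None else f' {k}="{escape(v, quote=True)}"'
            for k, v in attrs
        )
        self.out.append(f"<{tag}{rendered}{closing}>")

    def handle_starttag(self, tag, attrs):
        self._start(tag, attrs, "")

    def handle_startendtag(self, tag, attrs):
        self._start(tag, attrs, " /")

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def add_nofollow(comment_html):
    """Return the comment markup with rel="nofollow" on all links."""
    parser = NofollowRewriter()
    parser.feed(comment_html)
    parser.close()
    return "".join(parser.out)


print(add_nofollow('<p>See <a href="http://example.org/">this</a></p>'))
```

The link still works for anyone reading the comment; only search engines that honour the value treat it differently.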

Personally, I am more of the opinion that spam should not appear on weblogs at all and should be prevented in one way or another. I have seen various proposals that could stand a chance of stopping spam. That does not mean that I am against the new value, by the way. It is useful, but I doubt it will stop spam. Furthermore, the way it is currently implemented in some weblog systems it will harm useful comments.

Comments

  1. Since when is Google a browser?

    Posted by Moose at

  2. Sorry, I meant user agent. See HTML user agent if you have any doubts about that.

    Posted by Anne at

  3. I think the best solution is: remove the spam from the blogs. Use filters and set up a moderation queue like the comment approval in WordPress.

    Posted by Jort at

  4. Valid links will still work if someone is reading the comments. Yes, you will still have to remove comment spam, but at the same time, if it limits its visibility to search engines, the spammers will soon get bored and move on to some other sort of annoyance.

    Posted by jr at

  5. rel="nofollow" is the most pathetic thing I have seen in a while... and that means it must be really pathetic (all IMHO of course). What happened to sensible semantics? nofollow doesn't even present what it means; it should have rather been something like don't-increase-ranking or something shorter, because that's what it actually means.

    There are much better ways of preventing spam. We on Blogspot don't suffer, and neither does Anne, with his (very) cool validation and preview features.

    Posted by Charl van Niekerk at

  6. The setup you're using right here, Anne, seems to work pretty well.

    The fact that the commenter has to write well-formed markup, and preview the comment before submitting it, will reduce comment spam to a certain degree.

    Posted by Henrik Lied at

  7. This is some sort of first aid: if you want to show comments without moderating them first, and some spam comes through and Google picks it up before you've had a chance to moderate it, it won't help the spammer at all. The other solution would be to use a redirection service, which makes the link seriously less useful (uhm, where is it linking to?). So, I like the idea, but it's in no way optimal.

    Posted by Mark Wubben at

  8. How does the implementation of nofollow in some weblog systems harm useful comments? It does not limit me in writing comments at all! If you feel harmed you are either a spambot or somebody who only comments to have your link placed on a high PR site. I don't see the harm of reducing the effectiveness in those cases.

    Posted by Jeroen at

  9. I think it's a nice intermediate solution until we find something really powerful against spam in blogs.

    Posted by phnk at

  10. Jeroen, you never linked to a useful site in a comment? What would you say about comment #2? What if that wasn't increasing the page rank of that specific page? Because that would actually be one of the most useful pages on that topic.

    Posted by Frenzie at

  11. I’m pretty sure this will be (ab)used far more by people who think they need to protect their “Google juice” (that’s so stupid it hurts to write it) than to stop comment spam (which it doesn’t really do at all).

    This is a problem that needs to be solved by blog software developers. It shouldn’t have been a surprise that when you allow anyone to post on your website without any sort of verification/moderation/filtering system, and thousands of other people do the same, that the spammers will exploit it. I know blogging has attracted a lot of people who are relatively new to the Internet, but I didn’t think that applied to the people writing the software as well…

    Posted by Joel at

  12. I don't like nofollow either, and I won't be using it on my blog (assuming Blogger gives me a choice in the matter). I've just revisited the idea of link relationships, including my reasons for why both vote-links and nofollow relationships are harmful.

    Posted by Lachlan Hunt at

  13. Software developers know the golden rule: "Never trust user input", and that is exactly what nofollow does for web sites.

    The link in comment #2 is indeed useful, and I had no problem finding it. I read the comments, and decided to click on the link, no problem! I might even consider writing about that site, which will increase its PR. But in that case it is the owner and not a visitor of a site giving the pagerank.

    The point is: if you write good content, you will get your ranking anyway. It might only take a little bit longer. At this moment the importance of weblogs is highly inflated, and nofollow will give more balanced search results. In my opinion, that is a good thing.

    Posted by Jeroen at

  14. Good point. Still a stupid thing to call it nofollow imho though. :)

    Posted by Frenzie at

  15. I agree that this annotation shouldn't have been shoehorned into the 'rel' attribute. As a quick fix I would have preferred a specific classname, e.g. class="_unauthorised".

    Posted by Jonny Axelsson at

  16. So adding reserved class-names? Sounds like a bad idea to me. But well, we'll have to live with nofollow I guess...

    Posted by Frenzie at

  17. rel="nofollow" may help out, eventually, but in the meantime this is just a way for Google to have its problems solved by others.

    Of course, this isn't to say that I disagree with the methods (I actually think it's rather elegant -- think Tom Sawyer and the whitewashed fence), but I do think there's just a bit too much hype over what will only very gradually stem the flow.

    By the way, I think your XHTML comment validator is rather spiffy. (Pity it doesn't like the <ins/> element, as I found out when I tried to add this extra paragraph after previewing my comment for the first time.)

    Posted by Jeff Walden at

  18. As someone who has written a commenting system, writing "better filtering" is a lot more challenging than you think. Especially because anything the browser can see, so too can bots. Moderation, in its current incarnation, works if you can spend all day watching comments roll in... and who wants to delete 20 pieces of junk for every legitimate comment?

    I think the rel="nofollow" option is a fine band-aid solution in concept. However, for it to be effective, every site that uses commenting of any type needs to implement it or a similar solution (I mask the URL and strip_tags() the display of all unmoderated comments). Spammers don't seem to care if only 90% of what they throw out provides useful results, only that the remaining 10% does. As a result, they'll continue with their hit-and-miss tactics and attack anything and everything that resembles a comment form, leaving little, if anything, changed. On the bright side, though, sites using the rel="nofollow" solution won't be increasing the pagerank of sites providing incest porn, viagra knock-offs, or online casinos.

    And just so we're all clear: forced preview does nothing to prevent automated form submissions. It stops human spammers, not bots. All they have to do is make sure $_POST['submit'] = 'Post' when they dump to the proper form processing script and they're in. The valid XHTML requirement is the only real barrier to entry I can see.

    Posted by c. s. at

  19. Software developers know the golden rule: "Never trust user input", and that is exactly what nofollow does for web sites.

    Trust can mean many different things. The above means that you can't trust user input to be correct so you must do validation in order to prevent your system from crashing. Every software package that allows the user to input data trusts it in some way or the other.

    1. Google themselves trust user input, because they don't treat all links per default like links that are marked rel="nofollow".
    2. Weblogs with comments are based on user input. And if you're displaying the comment on your weblog, you're trusting it, aren't you?

    Posted by Charl van Niekerk at

  20. As a strong spam prevention system I'd suggest encrypting the form input using a server-side password, calculating the MD5 of the result, and passing it in a hidden form field. If the sum passed from the UA matches the one calculated for the other input fields, post the comment; otherwise show the preview box again. This does not hurt impaired users (no obfuscated image deciphering required) and will kill most of the current spam bots.

    Posted by Patrys at
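
The scheme in comment 20 can be sketched roughly as follows. This is only an illustration under stated assumptions: it swaps in HMAC-SHA256 from Python's standard library for the "encrypt, then MD5" step described above, and `SERVER_SECRET`, `comment_token` and `accept_comment` are hypothetical names, not part of Patrys's implementation.

```python
import hashlib
import hmac

# Hypothetical server-side secret; it is never sent to the user agent.
SERVER_SECRET = b"replace-with-a-long-random-value"


def comment_token(fields):
    """Compute the value for the hidden field on the preview page.

    The fields are serialised in a fixed order with length prefixes so
    that different inputs cannot collapse into the same message.
    """
    message = "".join(
        f"{len(key)}:{key}{len(value)}:{value}"
        for key, value in sorted(fields.items())
    )
    return hmac.new(
        SERVER_SECRET, message.encode("utf-8"), hashlib.sha256
    ).hexdigest()


def accept_comment(fields, submitted_token):
    """Return True only if the token matches the submitted fields."""
    expected = comment_token(fields)
    # Compare as bytes so a non-ASCII token is rejected, not an error.
    return hmac.compare_digest(
        expected.encode("utf-8"), submitted_token.encode("utf-8")
    )


previewed = {"name": "Patrys", "body": "Nice post."}
token = comment_token(previewed)
print(accept_comment(previewed, token))
print(accept_comment({"name": "Patrys", "body": "Buy pills"}, token))
```

A bot that posts straight to the form handler without first fetching the preview has no valid token, so its submission falls back to the preview page. A bot that does fetch the preview and echoes the hidden field back would still get through.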

  21. I'm very skeptical about Google taking the liberty to define their own attributes. The ripple is too large and the result is useless. Next thing you know Microsoft will start adding their tweaks again.

    This is more of a patch. Spammers will keep posting anyway, or they will devise a fancy workaround to "nofollow". Will Google patch things up and send another ripple?

    I'm convinced that prosecuting spammers and building filters, captchas, plug-ins, etc, is THE way to handle this problem.

    Posted by Milan Negovan at

  22. Patrys, that is a good idea; even generating a simple random number would work too.

    Posted by The Wolf at

  23. An idea! How about a system that checks all comments posted in the last 20 minutes for three or more duplicate entries (doesn't have to be by the same author) or entries that are a 95% match and flags them as spam?

    Would that be a good system? Perhaps after testing something like that you could have it delete 95%+ matches and just flag 85%+ matches?

    Posted by The Wolf at
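
The similarity check proposed in comment 23 might look something like this. It is a sketch only: it uses `difflib.SequenceMatcher` from Python's standard library as one possible similarity measure, covers just the 95% and 85% thresholds (not the "three or more duplicates" count), and the function name `classify_comment` and the "delete"/"flag"/"accept" labels are invented for the example.

```python
from difflib import SequenceMatcher

WINDOW_SECONDS = 20 * 60  # only compare against the last 20 minutes


def classify_comment(new_text, new_time, recent, delete_at=0.95, flag_at=0.85):
    """Compare a new comment with recent ones and decide what to do.

    `recent` is a list of (timestamp_in_seconds, text) pairs for comments
    that were already posted, by any author.
    """
    best = 0.0
    for posted_at, text in recent:
        if new_time - posted_at > WINDOW_SECONDS:
            continue  # too old to count as part of the same spam run
        similarity = SequenceMatcher(None, new_text, text).ratio()
        best = max(best, similarity)

    if best >= delete_at:
        return "delete"
    if best >= flag_at:
        return "flag"
    return "accept"


recent = [(1000, "Cheap pills at example.invalid, click now!")]
print(classify_comment("Cheap pills at example.invalid, click now!", 1060, recent))
print(classify_comment("I disagree with the naming of this value.", 1060, recent))
```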

  24. The Wolf, I doubt it'll work: spammers will then resort to randomising 90% of their post. They could use words, valid markup, or whatever is needed to pass some kind of validation.

    You could, though, just check for matching URLs. If they randomise their URLs as they do in email spam messages, their pagerank won't increase.

    Posted by Jaap Schreurs at

  25. I am going to investigate the possibility of filtering comments through SpamAssassin. It does a very good job with most spam mail, so why not have a setup like the following:

    1. User submits form
    2. Data is validated against SQL-injection, XSS-attacks, etc.
    3. If it passes, the data is sent to SpamAssassin and the administrator can choose which score levels are required for a message to:
      • be shown immediately
      • be withheld pending moderation
      • be dropped

    I do not know if this has been tried or considered, or even if it is possible without resorting to really ugly hacks, but I think the idea could very well work.

    Posted by itpastorn at
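
The last step of the setup in comment 25 could be sketched like this, assuming a SpamAssassin-style score where higher means more likely spam. How the score is obtained is left out, and the function name `route_comment` and both threshold values are hypothetical choices an administrator would tune.

```python
def route_comment(spam_score, hold_at=3.0, drop_at=8.0):
    """Map a spam score to one of the three outcomes from the list above.

    Both thresholds are illustrative defaults, not recommended values.
    """
    if spam_score >= drop_at:
        return "drop"
    if spam_score >= hold_at:
        return "hold"  # withheld pending moderation
    return "show"  # shown immediately


for score in (0.4, 4.2, 11.0):
    print(score, route_comment(score))
```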

  26. Errruuugh. I'm not convinced. I think it's probably a small plus, but doesn't really do much at all.

    Personally, I'd have favoured using a meta tag that declared: "this class name is untrusted and shouldn't be followed". Thus, the user can decide the name for their "untrusted links" identifier. Second would be that by using a class, you could apply class="untrusted" to any element, such as a containing div or p and apply the trust/untrusted state of links to entire comments, rather than having to process every single link.
    Of course, you'd still have to declare particular commentators as 'trusted'; that's a job for the blog engine, but being able to apply something to the parent would be much cleaner.

    But hey, it can't do any harm. It won't kill the market for anti-spam tools by any means, but it might at least discourage people from setting up new spam businesses. Obviously we need people to upgrade to software that will use this new trick for them to even consider changing their ways...

    Posted by Ben Ward at

  27. I've decided to provide you with a working example of comment spam prevention. You can see it working here.

    Posted by Patrys at

  28. Furthermore, the way it is currently implemented in some weblog systems it will harm useful comments. [...]

    Absolutely.

    The discussion on www-html (follow-up) fantastically shows the entire controversy.

    Posted by Jens Meiert at