Anne van Kesteren

getAttribute(attr) != null considered harmful

Time to consider things harmful again. The specification clearly says that the getAttribute() method can only return a DOMString: either the attribute value as a string, or the empty string if the attribute is absent or empty. The problem, of course, is that Internet Explorer implemented it differently and returns null when the attribute is absent, which kind of makes sense, since it lets you distinguish between absent and empty attributes, but it breaks the portability of the DOM API. So use the hasAttribute() method first, before you get the attribute.
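
A minimal sketch of the difference, assuming a DOM element el and the attribute name title purely for illustration:

    // el is assumed to be an element without a title attribute.
    el.getAttribute("title");   // "" according to the specification,
                                // but null in Internet Explorer

    // Portable version: test for the attribute before reading it.
    var title = "";
    if (el.hasAttribute("title")) {
        title = el.getAttribute("title");
    }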

(Internet Explorer, browsers based on Gecko, browsers based on KHTML and not-yet-public releases of Opera all return null because developers are incompetent fools or morons. Whatever title suits you.)

Once again, the real world does not play ball. Let's hope the DOM Scripting Task Force is on it.

Comments

  1. Anne, are you mad about not going to d.Construct or something? Because ever since that was announced, you haven't really posted anything constructive, just rants and, dare I say it, bickering and whining.

    It'd be nice to see more constructive posts again.

    And while I'm here anyway, I find it very amusing that you advocate against XHTML so fiercely, yet you demand that your visitors use well-formed XHTML for comments. It's bad enough that we have to manually enter paragraph tags for comments, but in your case it's worse: it has to be well-formed XHTML. Kind of destroys your credibility, if you ask me, but maybe that's just me...

    Posted by Faruk Ateş at

  2. Anne wishes to store well-formed XHTML on the back-end (in itself, a laudable goal), and convert it (using string substitutions instead of XSLT, or some other real HTML serializer) into HTML for output.

    Why he wants to work this way is a bit of a mystery. In the early days, I used to amuse myself by seeing how many ways I could sneak valid XHTML code past his preview "validator" (which isn't really a validator either), which would turn into invalid HTML 4 on output.

    Posted by Jacques Distler at

  3. That's the thing exactly. One can just convert it to well-formed XHTML on the back-end, no need to demand that from the user. My CMS takes plain and simple text as input for comments, and formats it to well-formed XHTML with paragraphs and everything. No need to require so much as a whit of clue about HTML (or XHTML) from the user.

    Of course, if you want to be an elitist, then it makes sense to require clue from your users...

    Posted by Faruk Ateş at

  4. All DOM implementations I know of, including server-side implementations, return null from getAttribute() when the attribute does not exist. This is the way most modern programming languages work, so the DOM is wrong here. DOM Level 3 was released in 2004, when this practice was well established, so they could easily have fixed it (as the CSS 2.1 spec did). So blame the spec writers.

    Posted by Sjoerd Visscher at

  5. Anne is paid to hunt down bugs, what else did you expect him to write about?

    Posted by Mark Wubben at

  6. That's the thing exactly. One can just convert it to well-formed XHTML on the back-end, no need to demand that from the user. My CMS takes plain and simple text as input for comments, and formats it to well-formed XHTML with paragraphs and everything. No need to require so much as a whit of clue about HTML (or XHTML) from the user.

    Well, I assume from the fact that you no longer serve your weblog as application/xhtml+xml that, even for you, ensuring well-formedness from user-input is not a completely trivial task.

    My point was otherwise: converting XHTML to HTML serves no particular useful purpose, at least in this context. Converting XHTML to HTML using string substitutions, however, should be (to use Anne's favourite phrase) considered harmful.

    Posted by Jacques Distler at

  7. Yeah, I think the last thing the DOM Scripting Task Force should be focusing on is changing a behaviour that is consistent across all browsers just to match a recommendation. Let's work on the inconsistent ones first.

    And Anne has always had a bit of curmudgeon in him. It's part of his charm. ;)

    Posted by Jonathan Snook at

  8. Jacques, actually, the reason I do that is that I feel no need to serve it as a/x+x to others - I still serve it as true XHTML to myself. The reason the site is on an HTML MIME type now is that it's in live, public development, which isn't the most reliable environment for doing true XHTML in (as I'm sure you'll agree).

    I don't feel that there is a serious challenge in ensuring well-formedness, as long as you use the right tools.

    Posted by Faruk Ateş at

  9. [T]he reason the site is on an HTML MIME type now is that it's in live, public development, which isn't the most reliable environment for doing true XHTML in (as I'm sure you'll agree).

    I don't feel that there is a serious challenge in ensuring well-formedness, as long as you use the right tools.

    Don't those two sentences contradict each other?

    To some extent, I'm sympathetic. I serve my site as text/html to XHTML user agents (Opera, Safari and Camino 0.x) which would derive no benefit from receiving the correct MIME-type (because they're not MathML-aware).

    But that's a legacy of the fact that my site used to contain named entities (e.g.  ) which would cause such XHTML UAs to choke. Now that all named entities (except for the safe 5) are converted before being sent over the wire, it would be safe to send these UAs real XHTML. But I decided that I'm only going to do so if (like Camino 1.x) they promise to support MathML sometime in the future.

    It seems to me (I know you disagree) that the whole point of going to the trouble of ensuring well-formed XHTML is to be able to serve it as application/xhtml+xml. Maybe your CMS uses real XML tools to manipulate your content (does it?), in which case, there's some back-end benefit to well-formed XML, even if no one else ever sees it. (Anne's CMS just pushes strings around, so there's really no point for him to be using XHTML on the back-end.)

    Posted by Jacques Distler at

  10. And while we are on topic, this is exactly the reason to disable comments on a number of posts. The occasional flamer comes by and ruins it all.

    Jonathan, because all those browsers (most notably Internet Explorer and Mozilla) did not conform, Opera (and Safari) had to change their behavior to something that is not compliant. But thanks for letting me know that you (and perhaps the WaSP) care more about interoperability than standards. So do I.

    Posted by Anne at

  11. The point in doing some form of XHTML in my backend is just for fun and the exercise. It also allows me to produce Atom feeds without escaped markup. (Which incidentally are not really supported by the feed reader I use.)

    Another reason is to stop some people from commenting. Unfortunately some are persistent enough to do it anyway and talk about something completely different than what the subject of the post is. Oh well. Guess you can expect that.

    Posted by Anne at

  12. And once again it is shown how many (at least at first sight) redundant things one has to do when scripting the DOM. Nevertheless, the time to do things differently would have been while the specification was still in draft status.

    Bitching is the thing Anne loves to do, live with it or don't visit his log. ;)

    Posted by Frenzie at

  13. @Jacques Distler, character entity references don't need to be changed to numeric ones if you use the corresponding DOCTYPE (the XHTML doctype in this case).

    Posted by minghong at

  14. [C]haracter entity references don't need to be changed to numeric ones if you use the corresponding DOCTYPE (the XHTML doctype in this case).

    DTDs are irrelevant. Browsers use non-validating XML parsers, and never read the DTD. Instead, they use a table of pseudo-DTDs for "known" DOCTYPEs. The upshot is that, if the browser "supports" the particular DOCTYPE in question (in my case, XHTML+MathML), then it will resolve the named entities defined in that DTD. If the DOCTYPE is unsupported (as in Opera or Safari or Camino), or if you omit a DOCTYPE declaration, then you are restricted to the 5 "safe" named entities. Since you cannot be assured that a (self-proclaimed) XHTML User Agent will actually resolve the named entities in your document, you need to convert them before sending it over the wire.

    But Anne knows this well, and he is not interested in further discussion of this or other off-topic matters.

    Posted by Jacques Distler at

  15. I don't feel that there is a serious challenge in ensuring well-formedness, as long as you use the right tools.

    The unfortunate thing is that PHP has holes in its infrastructure, so the right tool may not be readily available, or not available at all. (I guess one could argue that PHP itself is not the right tool, then. ;-) E.g. Java (and I do not mean JSP) has the infrastructure (even if not always as part of the platform libraries), but the initial investment required to get things running is higher than with PHP.

    On-topic: The issue described here is not at all the only design weirdness/problem with the DOM. However, for most things, it is too late to fix the DOM. :-(

    Posted by Henri Sivonen at

  16. Via that page I found a slide called Reasons for DOM Ugliness. The funny thing is that browsers actually break with the DOM. Makes you wonder.

    Posted by Anne at

  17. It's interesting how quickly the topic of this post changed from the DOM to the XHTML processing of Anne's comment system. Oh well, I'm not exactly sure what the topic is anymore, but here goes anyway.

    I've always found it annoying that browsers return null for a non-existent attribute; I just never realised they were all non-conformant, because they were so consistent with each other. It makes doing this unsafe without checking hasAttribute() first: element.getAttribute(attr).toLowerCase() (or some other String function). When it returns null, that causes a scripting error, and as a result I've always checked whether the attribute exists first.
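
    For illustration only, assuming an element variable element and the lang attribute:

        // Unsafe when lang is absent: getAttribute() returns null, so the
        // chained String method call throws a scripting error.
        element.getAttribute("lang").toLowerCase();

        // Checking for the attribute first avoids the error.
        if (element.hasAttribute("lang")) {
            var lang = element.getAttribute("lang").toLowerCase();
        }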

    Although, ideally, implementations should follow the spec as closely as possible and this should never have happened, I do have to agree that interoperability is more important than what the spec says, and in cases such as this the spec should be amended. That's essentially what has happened with CSS 2.1, which was rewritten in some parts to match the flawed implementations of CSS 2.0, and which in turn actually increases interoperability with all current and future UAs, as they're (hopefully) written to match the new spec. It is also exactly what is happening with HTML5.

    Posted by Lachlan Hunt at

  18. Anne, if you would open comments on all posts, we could have on-topic debates just fine, but when you close comments on every post that I would have something to say about, I'm going to (ab)use this opportunity to bring it to your and others' attention. And please, "the occasional flamer"? Exactly how would you describe yourself, then? Assaulting others repeatedly in public without allowing them the chance to respond on your own site? That's the cowardly flamer, Anne, and that's what you have been doing here.

    Jacques, my CMS uses XML parsing on all user-submitted content (be it in the admin panel or a comment on the site). I don't use the XML MIME type because I find it of no real importance at this time. I made the CMS capable of doing real XHTML as a challenge and as proof that it can be done, even for a business-level CMS. Now that I've done that, I am going back to what I feel is the important thing here: teaching others how to code properly. XHTML is just my example of choice, because its strictness makes it a lot easier for people, and the fact that it's different from HTML makes it easier for people to make the change from old-school table markup to new-school CSS-based sites. It's a psychological thing, well researched, and that's why I choose XHTML.

    Posted by Faruk Ateş at

  19. I've always found it annoying that browsers return null for a non-existent attribute; I just never realised they were all non-conformant, because they were so consistent with each other. It makes doing this unsafe without checking hasAttribute() first: element.getAttribute(attr).toLowerCase() (or some other String function). When it returns null, that causes a scripting error, and as a result I've always checked whether the attribute exists first.

    Don't you need to check whether the attribute exists, anyway? Just because the script didn't return an error doesn't mean it did the right thing. Not distinguishing between attribute="" and the nonexistence of attribute will, surely, in most cases lead to undesired results.

    Since you need to check for existence either way, the only question is whether this can be done with the same method call (if (e.getAttribute(attr)) {...}) or requires a different one (if (e.hasAttribute(attr)) {...}). The DOM API provides for both. Apparently, the parsimony of using one method call instead of two appeals to many web authors. And JavaScript implementors have obliged.
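
    A concrete illustration (the element e and the title attribute are just placeholders), given the null-returning behaviour browsers actually ship:

        // For <p>, <p title=""> and <p title="hi"> respectively:
        e.getAttribute("title");   // null,  "",   "hi"
        e.hasAttribute("title");   // false, true, true

        // So if (e.getAttribute("title")) treats an empty attribute the same
        // as a missing one, while hasAttribute() tells them apart.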

    Posted by Jacques Distler at

  20. Faruk, this post is about getAttribute not being implemented properly in browsers, because developers follow the market leader rather than the standards. It also makes the implicit point that standards are perhaps nice, but interoperability is preferred. Could you please comment on that? If you want to comment on posts whose comments are closed, you can write a weblog entry; I will read them. Thanks. If you want to talk about my strange ways of accepting comments from people, you can contact me.

    Posted by Anne at

  21. I just love the 'Why specs matter' article by "Dive into" Mark. Reminds me of XHTML enthusiasts :)

    Since all browsers return 'null', wouldn't it be a quite fantastic idea to simply change the spec and have the corresponding rules written with this situation in mind?

    We all know it's FAR easier to change a spec document than to get the whole world to upgrade to a browser which doesn't return 'null'.

    Jacques' point about having to check anyway is rather insightful too.

    Posted by James / AkaXakA at

  22. The problem is, although some suggestions have been made to make returning null ECMAScript-specific, that the DOM API is not just for web browsers; it also covers the bindings used from Java or C++. And there is at least one language that cannot return null when a DOMString is expected, as noted in the e-mail the post points to.

    Posted by Anne at

  23. And there is at least one language that cannot return null when a DOMString is expected, as noted in the e-mail the post points to.

    One of the 300 reasons no one has succeeded in writing a DOM implementation in COBOL.

    That a particular language doesn't support a particular construct is not, by itself, an argument one way or the other. "We'd like it to be possible to implement the DOM in language X, but X does not support construct Y" is an argument for not requiring construct Y in the DOM API, but only if we agree that having an implementation in language X is an important goal.

    Posted by Jacques Distler at

  24. First of all, let me say that I appreciate you bringing this issue up, but a little correction: no currently released version of Konqueror returns null from getAttribute() for non-existent attributes. So we're basically in the same position as Opera.

    And, like you, our next release, 3.5.1, will likely change this, for the very same reason: just too many websites rely on it, including some very high-profile ones.

    Posted by Maks Orlovich at