Anne van Kesteren

WYSIWYG and presentational markup

Opera, just like Internet Explorer and Safari, now supports the contenteditable attribute and document.designMode, as well as document.execCommand, which accepts an editing command as its first parameter and makes it easier for authors to build their own WYSIWYG editors. There has been some discussion about the markup it spits out, though. For example, the italic command used to create the i element, and I guess it still does in release builds. This has been changed to the em element. Fortunately not because of the reasoning of certain people (they exist) that em is “more semantic,” but simply for compatibility with Internet Explorer. (Now, em is more semantic, don’t get me wrong; it’s just that people who press Ctrl+i do not always mean to emphasize the text.)
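Because the markup produced by the italic command differs between browsers and builds (i in older builds, em for Internet Explorer compatibility), an author who wants predictable output can normalize the editor's HTML after the fact. The TypeScript below is only a sketch: normalizeItalics is a hypothetical helper, not part of any browser API, and it assumes every i or em in the editor output came from the italic command and that the markup is simple editor output, since it rewrites tags with a regular expression rather than a real HTML parser.

```typescript
type ItalicTag = "i" | "em";

// Hypothetical helper: rewrites every <i>/<em> start or end tag to the preferred
// tag name, keeping any attributes. Assumes simple editor output: i/em do not
// appear inside comments, attribute values, or <script>/<style> content.
function normalizeItalics(html: string, preferred: ItalicTag): string {
  // (\/?) captures an optional closing slash; (\s[^>]*)? captures attributes.
  // Requiring whitespace or ">" right after the tag name leaves <img>,
  // <iframe> and <embed> untouched.
  return html.replace(
    /<(\/?)(?:i|em)(\s[^>]*)?>/gi,
    (_match: string, slash: string, attrs: string | undefined) =>
      `<${slash}${preferred}${attrs ?? ""}>`,
  );
}

// Markup mixing what an IE-compatible build (em) and an older build (i) produce.
console.log(normalizeItalics("Say <em>hello</em> to <i>Opera</i> <img src=x>", "i"));
// prints: Say <i>hello</i> to <i>Opera</i> <img src=x>
```

Normalizing to i rather than em keeps the output honest about what the user actually asked for (slanted text), which is the distinction drawn above.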

Another thing that has been raised is that changing the font size generates a font element. Apparently some people prefer a span element with a style attribute. “More semantic.” If you want “semantics,” don’t offer these features in your editor. It’s really that simple. We do support styleWithCSS from the Midas Specification at the moment (I actually think we support most of the commands there), but it’s really crappy: it basically just gives you the inline style attribute mentioned before. I guess it’s really hard, if not impossible, to make an editor that does semantics, is still generic, and is not “abused” by people.
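As a rough sketch of how styleWithCSS is used, assuming an editable document (designMode set to "on", or a focused contenteditable region): the styleWithCSS command is issued first, and later commands such as fontSize then produce a span with a style attribute instead of a font element, in browsers that honour the flag. The CommandTarget interface and setFontSize helper are hypothetical names introduced for illustration; the execCommand signature is the one real Document objects expose.

```typescript
// Hypothetical minimal interface covering the one Document method used here;
// a real Document whose designMode is "on" (or that has a focused
// contenteditable region) satisfies it.
interface CommandTarget {
  execCommand(commandId: string, showUI?: boolean, value?: string): boolean;
}

// <font size> uses a legacy 1-7 scale, and so does the fontSize command.
type LegacyFontSize = 1 | 2 | 3 | 4 | 5 | 6 | 7;

// Hypothetical helper: sets the CSS styling flag, then applies a font size
// to the current selection. With cssMarkup = true, browsers that honour
// styleWithCSS emit <span style="font-size: ..."> instead of <font size="...">.
function setFontSize(target: CommandTarget, size: LegacyFontSize, cssMarkup: boolean): boolean {
  target.execCommand("styleWithCSS", false, String(cssMarkup));
  return target.execCommand("fontSize", false, String(size));
}

// In a browser page with designMode "on":
// setFontSize(document, 5, true);
```

Either way the result is presentational markup; the flag only changes which presentational element is produced.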

Does anyone have suggestions on how to improve this?

Comments

  1. Yes, but maybe it would be too long for a comment. Care for a weblog post tomorrow? It is too late in Tokyo right now!

    Posted by karl

  2. I think what web authors need today is a generic API for editing markup in a web page, rather than a way to add <i> or <em> tags.

    It's good to implement features supported by other browsers if users need them, but I don't think changing their behaviour is a good idea, because adding an <em> where people expect an <i>, on semantic grounds alone, could also break applications.

    Serious work on a generic API would be more valuable to web application authors. I know it's very hard because of the number of cases to handle when a user selects some text or presses the Enter key, but please don't leave this work to web authors: implement it in web browsers.

    Posted by Grégoire Cachet

  3. I'm very into semantics, and I just won't use designMode and execCommand. I prefer to stick with contentEditable and my own JavaScript.

    This is maybe a little off topic, but will event dispatching on contentEditable be fixed any time soon? I mean keyboard events and focus/blur events (at least for me, they do not behave as expected).

    Posted by medyk

  4. I think mimicking and implementing as much of IE's functionality as possible (however bad it is) is a very good way to go. But even when IE's functionality is 100% covered (which I suppose it never will be, but anyhoo), the job is not done. The work should then move on to improving the interfaces and, as Grégoire writes, building a generic, standard API for these features that can hopefully be ratified by the W3C.

    I do not think the arguments about <em> being more semantic than <i> (and especially <span> being more semantic than <font>) are very fruitful. Nor are they necessarily correct. But that's not the point. When people want a portion of their text to be italic, they want it to be italic. Really. They do not have any semantics or deeper meaning in mind (some of them might, but we don't know who); they just want the text skewed a bit, and the reason is completely unknown to us. If we want it to be more semantic, we have to offer and name that as a separate operation in our applications; we can't redefine the operation "Italicize" as "I want this text to be emphasized", because they are two different operations.

    Another point is compatibility. These are interfaces defined by Microsoft, implemented in Internet Explorer, and now being reverse engineered to be supported in other browsers like Opera. Developers expect these interfaces to work the way they work in IE, and they shouldn't be "improved" when implemented in other browsers. The interfaces mostly suck (from a programmatic, architectural or semantic point of view), but that doesn't matter. They are what they are and should be left that way, also when implemented in other browsers.

    Improvements can be made by defining a new set of better-thought-out interfaces, but we cannot improve Internet Explorer's existing interfaces in any way. That is also a pretty weird thing to want to do, considering how bad those interfaces are to begin with. Creating something new and better should be preferable to most developers, no?

    Posted by Asbjørn Ulsberg

  5. When a WYSIWIG editor on a webpage allows a visitor to add content to the page, the semantics stop where the user-entered text starts. The only thing you can say about the text is "this is a comment", which could be done using <textarea contenteditable=true readonly>.

    It's a shame that WYSIWIG editors generate HTML. If they generated (for example) RTF which could only be displayed back using an appropriate element, then this question would never have come up.

    Posted by Milo

  6. WYSIWIG = What you see is what I get. Obviously.

    Posted by Milo

  7. Sure, semantic markup is better than purely presentational markup -- but only if the semantics are correct. Using <em> for all italics is no better than the old practice of using <h5> for small bold text. The problem is that WYSIWYG tends to emphasize the presentation, so any attempt to map that to semantics is likely to be wrong. (This very problem is one of the reasons I quickly ditched the WYSIWYG editor in WordPress.)

    I can think of two solutions, one more reliable, and one more convenient. The reliable one is to offer the semantic elements in addition to the presentational ones, so that people who want to make a distinction can do so. That clutters up the interface, though. The alternative would be some sort of heuristic that would try to determine from context whether the italicized text is intended as emphasis, a title, a citation, etc.

    Posted by Kelson

    I've followed the web standards movement and the rise of semanticism since the beginning, and really there's no end to the amount of pedantic hand-wringing that web geeks can engage in over the use of <em> vs <i>. But out in the real world, the majority of web designers are so far from understanding the purpose of markup, or even the meaning of "semantic", that such discussions in the context of commercial web design tools are moot.

    The first thing to realize is that HTML is semantically bereft. I'm all for using HTML to its semantic fullest, but let's be honest: if you need serious semantics, you need to look to XML, XHTML 2, RDF or something else. The fact is that 99% of web pages are created by teams of people with no knowledge of semantic markup, will never be indexed by anything other than a search engine spider, and will be replaced within 3-5 years. In this context, the value of getting the semantics of your italic text right is surely much less than the time it takes to ponder it.

    The reason CSS and semantic HTML took off is that they save time. Yes, even for the mundane task of formatting a website in IE, standards save a lot of time if done right. I work with clueless Dreamweaver monkeys who spew out things like 9-cell tables with bgcolors and spacer GIFs to create a border. Yes, 2 pages of markup to recreate border: 1px solid red;. These people are out there and billing at $100/hour. So people obviously need to understand the immediate benefits of standards for their existing goals before we can ever hope to engage them in a conversation about real semantics.

    Certainly the tools could be better. Dreamweaver could be retrofitted with a bunch of icons for all the different HTML tags, and we could train all the monkeys religiously. They could remove all in-line styling and emphasize generic tag styling. But pretty soon you hit an insurmountable barrier for the majority of designers out there, which is how to properly make basic styling decisions:

    Tools can make it easier to do the right thing (and they should), but ultimately it takes people to make good semantic decisions. Computers are fantastic at manipulating symbols, but they can't understand semantics, and they can't abstract away the need for designers to understand them.

    Posted by Gabe da Silveira

  9. Others have posted many interesting comments; there's not much I can add.

    To me it seems that trying to make a semantic WYSIWYG editor is somewhat pointless. Those who really understand the benefit of semantic markup will benefit from such an editor (but these people tend to write HTML by hand anyway); for everyone else, each new, more semantic editor is actually a bigger challenge. They will spend their time solving an inverse rendering problem: instead of first thinking "I want to emphasize this text" and then "it's nice to have emphasized text rendered in italics", they will first think "I want this text rendered in italics" and then "if I mark this text as emphasized, I'll get what I want". No matter how far the semantic level of the editor is from the actual fonts and pixels, they will just solve a harder inverse rendering problem.

    My point is that the web will suffer more from those inversely derived semantics than it will benefit from the real semantics produced by people knowingly using the semantic editors.

    Posted by Alexey Feldgendler

  10. Alexey has a point. Better to use <i> for emphasis than abuse <em> for non-emphatic italics.

    Posted by Aristotle Pagaltzis

  11. You know what? I have been obsessed with validating all of my sites as XHTML Strict against the W3C standards, and guess what: i and font just don't validate.

    Let's throw out a hypothetical situation too: if search engines can crawl my site more easily and quickly, which gets me results more easily and quickly, why should I even consider using i and font?

    I'm currently trying to figure out how to hack the WYSIWYG editor in WordPress so I can replace all font elements with span elements. On my own site, hard-coding it is no big deal because I know what I'm doing, but when I'm helping everyone else, the minute I leave them on their own the editor starts invalidating everything because of its bad default markup.

    I have learned that if you want something coded right, you gotta code it yourself. If you don't know what W3C XHTML Strict validation is, you'd better find out, because it will help you create the purest, cleanest code you can for the search engines.

    Posted by Brandon Buttars

  12. <i> does not validate? Which universe do you live in? (And I dearly hope you’re not using <em> simply to italicise text. That would be wrong.)

    Replacing <font> with <span> is fine I suppose; both are equally meaningless and presentational, so if you’re fond of pointless work I guess that’s OK.

    What any of this has to do with search engines I don’t know, since search engines will take a dog’s breakfast and make sense of it – by sheer necessity, in a world where 99.5% of the web is invalid HTML. Not that the argument would make any sense if they required valid markup; to say nothing of the fact that <i> is valid. Talking about search engines is completely bogus in this context.

    Lastly, I would bet good money that Anne himself as well as nearly every commenter here knows a good deal more about XHTML and the validation thereof than you do.

    Posted by Aristotle Pagaltzis
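The last two comments discuss rewriting an editor's font elements as span elements. A minimal sketch of such a pass in TypeScript, assuming it runs on the HTML string the editor hands back before saving, might look like the following. The function name fontToSpan is hypothetical (it is not a WordPress API), the attribute matching uses regular expressions and so assumes the simple, well-formed tags an editor typically produces, and the size mapping follows the HTML specification's table for legacy font sizes 1 to 7. As comment 12 notes, the result is just as presentational as the input.

```typescript
// CSS absolute-size keywords for legacy <font size> values 1-7, following the
// HTML specification's table (xxx-large comes from CSS Fonts Level 4).
const FONT_SIZE_KEYWORDS = ["x-small", "small", "medium", "large", "x-large", "xx-large", "xxx-large"];

// Hypothetical post-processing pass (not a WordPress API): turns each <font>
// start tag into a <span> carrying equivalent inline CSS and each </font>
// into </span>. Assumes simple, well-formed editor output; relative sizes
// such as size="+1" are not converted.
function fontToSpan(html: string): string {
  const opened = html.replace(/<font\b([^>]*)>/gi, (_match: string, attrs: string) => {
    const styles: string[] = [];
    const size = /(?:^|\s)size\s*=\s*["']?([1-7])(?!\d)/i.exec(attrs);
    if (size) styles.push(`font-size: ${FONT_SIZE_KEYWORDS[Number(size[1]) - 1]}`);
    const color = /(?:^|\s)color\s*=\s*["']?(#?\w+)/i.exec(attrs);
    if (color) styles.push(`color: ${color[1]}`);
    const face = /(?:^|\s)face\s*=\s*["']([^"']+)["']/i.exec(attrs);
    if (face) styles.push(`font-family: ${face[1]}`);
    return styles.length > 0 ? `<span style="${styles.join("; ")}">` : "<span>";
  });
  return opened.replace(/<\/font\s*>/gi, "</span>");
}

console.log(fontToSpan('<font size="5" color="red">Big red text</font>'));
// prints: <span style="font-size: x-large; color: red">Big red text</span>
```

Start and end tags are rewritten independently, which keeps nesting intact because each font element maps to exactly one span.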