Anne van Kesteren

`contentEditable`

6 July 2005

Yesterday was my first day. Monday I arrived here at 20:20 and I was at the office a quarter past ten or so. After that I went to my apartment together with Christian and Markus. Next morning I got my security pass, a computer and a place to work on projects. At lunch time I met quite a few people. Apparently Opera hires all their fan boys. I met Arve, Moose, Mark, Ian, Tim and Claudio. Actually, I believe there were more people I already saw at Opera forums or knew from their web sites, IRC, et cetera, but I can't recall them at the moment. Sorry about that.

Håkon is on vacation at the moment, but he did manage to send me an e-mail telling me I could work on testing CSS printing. Not yet the CSS3 Paged Media Module or CSS Print Profile. More or less the paged media module from CSS 2.1.

However, before that e-mail arrived Ian tracked me down and put me on debugging contentEditable in Internet Explorer. Quite cool. Except for the fact that IE is horribly broken. Totally, utterly, completely broken. Or rather, it works, but it is messy as hell. Ever seen nested paragraphs? If I recall correctly HTML 4.01 says something like “the P element must not contain block level elements including P itself.” Who cares? Microsoft doesn't, for one. They will also happily embed the P element in all kinds of inline elements, like A or SPAN.

Let me give you a small example. I found out about this today. Until now I thought IE would at least always generate a somewhat well-formed document. I was wrong. Sample document:

<!DOCTYPE html>
<html>
 <head>
  <title>SPAN element</title>
  <body>
   <p><span contentEditable="true">Test</span></p>
  </body>
</html>

I paste the following between 'Te' and 'st' (the text is 14 points with a typeface of Arial copied straight out of the OpenOffice document formatter):

Test

Test

The resulting DOM (using javascript:alert(document.documentElement.outerHTML)) is this:

<HTML><HEAD><TITLE>SPAN element</TITLE></HEAD>
<BODY>
<P><SPAN contentEditable=true>Te </P>
<P style="MARGIN-BOTTOM: 0cm"><FONT face="Arial, sans-serif"><FONT size=4>Test</FONT></FONT></P>
<P style="MARGIN-BOTTOM: 0cm"><BR></P>
<P style="MARGIN-BOTTOM: 0cm"><FONT face="Arial, sans-serif"><FONT size=4>Test</FONT></FONT></P>
<P>st</SPAN></P></BODY></HTML>

I also used javascript:var a=document.documentElement.outerHTML;document.body.innerHTML="<plaintext>"+a; as a bookmarklet, which works pretty well. Especially when you need to copy the resulting DOM as shown above.
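To make the ill-formedness in the dump above easier to spot, here is a minimal sketch of a stack-based nesting check that can be run over the serialized string. The function name findNestingErrors and the list of void elements are assumptions made for this illustration. It is a plain tag scanner, not an HTML parser, so it will also flag markup that legitimately omits optional end tags.

```javascript
// Elements that never take an end tag (only a few common ones are listed).
const VOID_ELEMENTS = new Set(["br", "hr", "img", "input", "meta", "link"]);

// Hypothetical helper: scan a serialized HTML string and report tags that
// close in the wrong order or never close. Not a real HTML parser.
function findNestingErrors(html) {
  const errors = [];
  const open = []; // stack of currently open element names
  const tagPattern = /<(\/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>/g;
  let match;
  while ((match = tagPattern.exec(html)) !== null) {
    const isEndTag = match[1] === "/";
    const name = match[2].toLowerCase();
    if (VOID_ELEMENTS.has(name)) continue;
    if (!isEndTag) {
      open.push(name);
    } else if (open.length > 0 && open[open.length - 1] === name) {
      open.pop();
    } else {
      const top = open.length > 0 ? "<" + open[open.length - 1] + ">" : "nothing";
      errors.push("</" + name + "> closes while " + top + " is still open");
      // Recover by discarding everything from the matching start tag onwards.
      const index = open.lastIndexOf(name);
      if (index !== -1) open.length = index;
    }
  }
  for (const name of open) errors.push("<" + name + "> is never closed");
  return errors;
}

// The essence of the IE output: SPAN opens in one P and closes in another.
const ieFragment = "<P><SPAN contentEditable=true>Te </P><P>st</SPAN></P>";
console.log(findNestingErrors(ieFragment));
```

For the fragment above this reports two problems: the first </P> arrives while the SPAN is still open, and the </SPAN> arrives while the last P is still open.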

Now I could bore you to death with a lot more examples of broken rendering and strange things happening, but I guess you can figure that out yourselves. And if not, be happy that other people are doing it for you. By the way, if you have any top references other than the first ten results in Google or so that are worth looking at, please point them out in the comments or through private e-mail.

And if you really know the difference between designMode and contentEditable I would really appreciate it if you told me. I can make an educated guess, but I'm looking for the subtleties.

Comments

I'm not sure what subtleties you are referring to but designMode can only be applied to the document whereas contentEditable can be set on most elements. Otherwise, as far as I know, they expose MSHTML in the same way.
Posted by Jonathan Snook at 6:12PM
Holy crapoly that is horrible!
Posted by Faruk Ateş at 6:25PM
Yeah, nice one! Set the margin to 0, then add another element to pretend to be margin!
Nasty piece of work you've got there.
Posted by The Wolf at 7:46PM
Note that this way you can get DOM trees which aren't trees anymore, which puts a treewalker in an infinite loop. And it's not just pasting, just pressing return in this case will get you paragraphs in your span.
Posted by Sjoerd Visscher at 10:26PM
Well, try any CMS system based on MSHTML and you will see enough bad code that you seriously think about changing your line of work...
Posted by Robert Nyman at 3:03AM
I might be wrong, but I'm pretty certain Satan himself was involved in the design, implementation and development of MSHTML.
Posted by Asbjørn Ulsberg at 6:12AM
Sjoerd, only with pasting you get it as ill-formed as shown above. Take a closer look if you missed it. The start tag of the SPAN element is inside a P element and so is the closing tag. Horror!
Posted by Anne at 1:59PM
Well, granted that MSHTML generates crap, I'd be interested in seeing a (server side?) solution to cleaning up the mess. I admit I haven't even done a simple Google search before typing this, but I doubt there is anything out there, considering the complexities of the crap that is generated.
Does anyone have any links to any such code, preferably in PHP? Ideally, it should clean up MSHTML mess to strip unnecessary tags, then take the useful tags and convert them to something that makes better semantic sense, and finally present it as the user intended by adding the necessary CSS rules.
Posted by Rakesh Pai at 3:05PM
Anne, yes, I noticed. But I don't like regular paragraphs in my spans either :).
<advertisement>Q42 has a client side solution that cleans up most of the mess. Advies Overheid.nl uses it.</advertisement>
Posted by Sjoerd Visscher at 3:58PM
Rakesh, you might want to look at http://www.w3.org/People/Raggett/tidy/
Posted by Calm_Pear at 4:03PM
A solution would be: attach an onpaste event to all contentEditable elements. Get the content from the clipboard. Check what elements are allowed within the contentEditable area. Strip anything that is illegal, paste it into the contentEditable area and cancel the onpaste event.
Say this is my HTML:
<p contentEditable="true">content</p>
What if I have the following code on my clipboard, "content" selected and want to paste it into the paragraph:
<section><h>header</h><p>paragraph<em>some em</em></p></section>
To my mind, the custom paste behaviour should strip all illegal elements without removing the content and inserting a <br> after block level elements.
So the result has to be:
<p contentEditable="true">header<br>paragraph<em>some em</em><br></p>
Posted by Jorgen Horstink at 7:04PM
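The cleaning step described in the comment above could be sketched roughly as follows. The function name cleanPastedHtml and both element lists are assumptions for illustration, and the onpaste wiring and clipboard access are left out. Tags that are not allowed are dropped while their text is kept, and a <br> is added where a block-level element ends (but never two in a row). Attributes on allowed tags are discarded. This is a tag scanner, not a security sanitizer.

```javascript
// Inline elements that may stay inside the editable paragraph (assumed list).
const ALLOWED_INLINE = new Set(["em", "strong", "b", "i"]);
// Elements whose end is marked with a <br> (assumed list, includes the
// <h> and <section> elements used in the example above).
const BLOCK_LEVEL = new Set([
  "p", "div", "section", "h", "h1", "h2", "h3", "h4", "h5", "h6",
  "li", "blockquote",
]);

// Hypothetical helper: reduce pasted markup to text, allowed inline tags
// and <br> separators.
function cleanPastedHtml(html) {
  const tagPattern = /<(\/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>/g;
  let result = "";
  let lastIndex = 0;
  let match;
  while ((match = tagPattern.exec(html)) !== null) {
    result += html.slice(lastIndex, match.index); // keep text between tags
    lastIndex = tagPattern.lastIndex;
    const isEndTag = match[1] === "/";
    const name = match[2].toLowerCase();
    if (ALLOWED_INLINE.has(name)) {
      // Re-emit the tag without any of its attributes.
      result += (isEndTag ? "</" : "<") + name + ">";
    } else if (name === "br" && !isEndTag) {
      result += "<br>";
    } else if (isEndTag && BLOCK_LEVEL.has(name) && !result.endsWith("<br>")) {
      result += "<br>";
    }
    // Any other tag is dropped; its content still flows through as text.
  }
  return result + html.slice(lastIndex);
}

const pasted =
  "<section><h>header</h><p>paragraph<em>some em</em></p></section>";
console.log(cleanPastedHtml(pasted));
// → header<br>paragraph<em>some em</em><br>
```

Skipping a <br> when one was just emitted is a design choice made here so that the closing </section> does not add a second line break after the one for </p>.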
Firstly, good luck and best wishes in your new job.
I've done a lot of work with MSHTML and contentEditable over the last year, and it can be truly dreadful.
One thing I've found is that you should set contentEditable on a block-level element surrounding the paragraphs, rather than your example of setting it on a <span> within the paragraph. Wrap the editable stuff in a <div> with contentEditable set, and make sure there's at least one paragraph element (empty if necessary) in there. Otherwise, people start to type, hit Enter, and MSHTML suddenly introduces the <p> elements. (Sometimes it does other stuff instead...)
Pasting from Word has appalling effects: you get the same tag soup as from Word's "Save as HTML" option inserted into the page. The solution I've used is to re-parse the DOM, building up the structure in an MSXML DOM Document and throwing away the cruft so as to get something reasonable out. And that's when you may come across a truly amazing gotcha: MSHTML gets confused as to how deep it's nested stuff, and sometimes you will finish up with a DOM node of type ELEMENT_NODE and a tagName of /p - yup, it creates elements with a slash at the start of their name. Needless to say, if you try to do that from script, it throws an exception, but deep down inside they didn't call the entry point that validates the name.
I wish you luck in your explorations; you'll need it :-) I hope you'll be able to share some of the details of what you learn, as a lot of us would love to have the time to work out whether there's any rationality behind MSHTML editing, or if it really is a non-deterministic process.
Posted by Nick Fitzsimons at 7:17PM
I've fixed a CMS which used contentEditable in the past, such that it would generate well-formed (if not necessarily valid) XHTML fragments. In the end, I had to walk the DOM by hand and build up my own version of the HTML, rather than relying on anything the editor would produce.
It didn't seem to have any problems, though, once I filtered the attribute list (and it was surprisingly fast, too).
It is horribly broken, though.
Posted by Mo at 8:56PM
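A rough sketch of what walking the DOM by hand with a filtered attribute list might look like is given below. It is not the code from the comment above. The name serializeClean, the attribute whitelist and the list of unwrapped elements are all assumptions for illustration. It only reads nodeType, nodeName, nodeValue, attributes and childNodes, so it works on plain objects as well as real DOM nodes. Elements with an invalid name, such as the "/p" element mentioned in an earlier comment, are replaced by their children.

```javascript
// Attributes worth keeping, per element (assumed whitelist).
const ALLOWED_ATTRIBUTES = new Map([
  ["a", ["href"]],
  ["img", ["src", "alt"]],
]);
// Presentational wrappers that are replaced by their children (assumed list).
const UNWRAPPED = new Set(["font", "span"]);
const EMPTY_ELEMENTS = new Set(["br", "hr", "img"]);

function escapeText(text) {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

function escapeAttribute(value) {
  return escapeText(value).replace(/"/g, "&quot;");
}

// Hypothetical helper: build a well-formed XHTML fragment from a node tree.
function serializeClean(node) {
  if (node.nodeType === 3) return escapeText(node.nodeValue); // text node
  if (node.nodeType !== 1) return ""; // comments and the like are dropped
  const name = String(node.nodeName).toLowerCase();
  const inner = Array.from(node.childNodes || [])
    .map((child) => serializeClean(child))
    .join("");
  // Bogus names (for example "/p") and unwanted wrappers keep only content.
  if (!/^[a-z][a-z0-9]*$/.test(name) || UNWRAPPED.has(name)) return inner;
  const allowed = ALLOWED_ATTRIBUTES.get(name) || [];
  const attrs = Array.from(node.attributes || [])
    .filter((attr) => allowed.includes(attr.name.toLowerCase()))
    .map((attr) => " " + attr.name.toLowerCase() + '="' + escapeAttribute(attr.value) + '"')
    .join("");
  if (EMPTY_ELEMENTS.has(name)) return "<" + name + attrs + " />";
  return "<" + name + attrs + ">" + inner + "</" + name + ">";
}

// Plain-object stand-ins for DOM nodes, so the sketch runs outside a browser.
const el = (nodeName, attributes, childNodes) =>
  ({ nodeType: 1, nodeName, attributes, childNodes });
const text = (nodeValue) => ({ nodeType: 3, nodeValue });

const paragraph = el("P", [{ name: "style", value: "MARGIN-BOTTOM: 0cm" }], [
  el("FONT", [{ name: "face", value: "Arial" }], [text("Test & more")]),
  el("BR", [], []),
]);
console.log(serializeClean(paragraph));
// → <p>Test &amp; more<br /></p>
```

Because the output is rebuilt from the tree rather than copied from the editor's markup, every start tag gets a matching end tag, although the tree itself can still contain structures that are not valid, such as a paragraph inside a span.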
As far as I know Absolut Engine got some kind of WYSIWYG to XHTML cleaning function: http://www.absolutengine.com/
Posted by dusoft at 10:44PM
Regarding the difference between designmode and contenteditable, perhaps this article could help you.
Posted by Bza at 4:31AM
Hey. I foolishly wrote a CMS using contenteditable.
My rather draconian fix was to hook into onpaste and strip all block level tags (and anything else that smelled funny) from the pasted code, leaving just harmless b's, em's and the odd href.
Seems moderately bearable although the user now wants the ability to copy and paste code from elsewhere in the CMS. I might wrap internal chunks of HTML in a marker saying 'clean html' and only get draconian on fragments from unknown sources...
Any advice on what to use if I was to start over?
Posted by Andy Scott Baker at 7:08AM