The (Still) Sorry State of JavaScript XHTML Editors

Over three years ago I surveyed the state of JavaScript-based HTML editors embedded in web pages and found that overall they did a crappy job of basic functionality and standards compliance. With the growing popularity of HTML5, I grew dissatisfied with my choice of TinyMCE, so I revisited the options. Unfortunately, even though there have been a few changes, it still seems nobody cares to get the basics right. I did change my choice of editor for my Guise framework, though.

TinyMCE

In 2008 I chose TinyMCE because it was the only editor that came close to outputting clean, standards-compliant XHTML code. But its facility for adding new elements is atrocious, forcing me to create an entire set of plugins just to get new inline elements such as <code>. For HTML5 I needed a new block element, <figure>. Unfortunately the supposedly official approach to easily configuring new block elements to appear in the drop-down selector doesn't work and hasn't worked for years, even though many people keep asking. I filed a bug, but there's no evidence anybody will be working on this or even looking at it any time soon.

To make matters worse, even if I accept invisible entries in the drop-down, TinyMCE 3.4.7 mangles my new HTML5 structure. I enter this:

<p>I'm gonna have a figure:</p>
<figure>
<a href="product.jpg" rel="object">The figure.</a>
<figcaption>The figure</figcaption>
</figure>
<p>Testing.</p>

TinyMCE happily changes it to the following, which isn't even valid HTML5, because a <p> may contain only phrasing content and therefore cannot contain a <figure>:

<p>I'm gonna have a figure:</p>
<p> </p>
<p><figure> <a href="product.jpg" rel="object">The figure.</a> <figcaption>The figure</figcaption> </figure></p>
<p> </p>
<p>Testing.</p>
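To make the problem concrete, here is a rough sketch of a check for exactly this mistake. It assumes simple, well-formed input, uses a regular-expression tokenizer rather than a real HTML parser, and the element list is an illustrative subset of HTML5 flow content, not the full set. (A real HTML5 parser would not even keep this nesting: the <figure> start tag implicitly closes the open <p>, leaving it empty, and the stray </p> then produces yet another empty paragraph.)

```javascript
// Illustrative subset of elements that are flow content but not phrasing
// content, and therefore may not appear anywhere inside a <p>.
// (An assumption for this sketch; the HTML5 spec lists many more.)
const FLOW_ONLY = new Set([
  "address", "article", "aside", "blockquote", "div", "dl", "figure",
  "footer", "h1", "h2", "h3", "h4", "h5", "h6", "header", "hr", "nav",
  "ol", "p", "pre", "section", "table", "ul",
]);

// Returns the names of flow-only elements found nested inside a <p>.
// Uses a naive regex tokenizer, so comments, CDATA sections, and ">"
// inside attribute values are not handled.
function findBlocksInsideParagraphs(markup) {
  const tagPattern = /<(\/?)([A-Za-z][A-Za-z0-9]*)\b[^>]*?(\/?)>/g;
  const openElements = [];
  const problems = [];
  let match;
  while ((match = tagPattern.exec(markup)) !== null) {
    const [, closing, rawName, selfClosing] = match;
    const name = rawName.toLowerCase();
    if (closing) {
      // Pop back to the most recent matching start tag, if there is one.
      const index = openElements.lastIndexOf(name);
      if (index !== -1) openElements.length = index;
      continue;
    }
    if (FLOW_ONLY.has(name) && openElements.includes("p")) {
      problems.push(name);
    }
    // Self-closing tags such as <br/> never become open elements.
    if (!selfClosing) openElements.push(name);
  }
  return problems;
}
```

Run against the two fragments above, this reports nothing for the markup I entered and flags "figure" in TinyMCE's output.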

Time to move on.

CKEditor

Since I last checked out FCKEditor, it has been completely rewritten and renamed to "CKEditor". The web site for version 3.6.2 proclaims "Quality XHTML Output". So I spent a day integrating CKEditor into Guise™, started up the server, and… my site died! You see, Guise™ uses true XHTML (complete with namespaces) and serves the pages as true XHTML with a content type of application/xhtml+xml (except, of course, on brain-dead browsers such as IE). It turns out that CKEditor only works in a document served with the content type text/html. Yes, that's right: this wonderful, supposedly XHTML-compliant editor will not work at all in a page that is actually served as XHTML, and nobody cares to fix it.
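Serving true XHTML to capable browsers while falling back to text/html for the others usually comes down to inspecting the request's Accept header. The sketch below is a hypothetical illustration of that idea, not Guise's actual logic: it serves application/xhtml+xml only when the browser explicitly lists that type with a nonzero quality value. Versions of Internet Explorer before IE9 could not render application/xhtml+xml and did not list it in their Accept header, so they fall through to text/html.

```javascript
// Hypothetical helper (not Guise's real code): returns true if the Accept
// header explicitly lists application/xhtml+xml with a nonzero q-value.
// A bare "*/*" wildcard is deliberately NOT treated as XHTML support.
function acceptsXhtml(acceptHeader) {
  for (const range of (acceptHeader || "").split(",")) {
    const [type, ...params] = range.split(";").map((part) => part.trim());
    if (type.toLowerCase() !== "application/xhtml+xml") continue;
    // An explicit q=0 means the client refuses this type.
    const q = params.find((param) => param.toLowerCase().startsWith("q="));
    return q === undefined || parseFloat(q.slice(2)) > 0;
  }
  return false;
}

// Picks the media type to send for a page that is well-formed XHTML.
function chooseContentType(acceptHeader) {
  return acceptsXhtml(acceptHeader) ? "application/xhtml+xml" : "text/html";
}
```

Requiring an explicit listing rather than honoring */* is the conservative choice: a browser that claims to accept anything may still be unable to render XHTML.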

So I spent several days converting Guise™ to produce text/html pages (which eventually involved ripping out all Guise namespace usage in favor of HTML5 data attributes, because of even more incompetence on the part of browser developers), and at last CKEditor appears. It looks nice. But now I have to get the data out of it. Unbelievably, CKEditor provides no event to signal that its data has changed, and they aren't going to add one! Luckily, someone wrote an onChange plugin that is simple to install and works pretty well; its author has my gratitude.
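Without a built-in change event, the usual workaround is to compare snapshots of the editor's serialized content and fire a callback only when they differ. The following is a generic sketch of that technique, not the onChange plugin's actual code. `getData` is an assumed callback returning the current markup (with CKEditor it could wrap the editor instance's getData() method), and the host page decides when to call `check`.

```javascript
// Generic snapshot-based change detector (a sketch, not the onChange
// plugin). `getData` is an assumed callback returning the current markup;
// `onChange` receives the new and previous markup whenever they differ.
function createChangeDetector(getData, onChange) {
  let lastSnapshot = getData();
  // Call check() whenever the content might have changed; it returns true
  // only if a change was detected and reported.
  return function check() {
    const current = getData();
    if (current === lastSnapshot) return false;
    const previous = lastSnapshot;
    lastSnapshot = current;
    onChange(current, previous);
    return true;
  };
}
```

A page might wire `check` to keystroke and blur handlers and also call it from a timer such as `setInterval(check, 1000)`; because it compares strings, events that don't actually alter the markup produce no notification.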

So we're getting closer to a working CKEditor. It has a pretty snazzy facility for adding new "styles", which can even include new block elements, so adding support for new HTML5 elements is pretty straightforward. I typed up a bit of text for testing and tried to save it. My Marmox framework informed me that my output wasn't valid XHTML. What!? What happened to that wonderful XHTML support CKEditor talked about?

XHTML is basically HTML that follows the well-formedness rules of XML and is served as application/xhtml+xml. For a while the W3C was developing some rigid DTDs, but you aren't required to use these DTDs to have XHTML. The rub is that XHTML, because it is simply a variation of XML, does not natively recognize HTML entities such as &nbsp;. XML itself predefines only five entities: &amp;, &lt;, &gt;, &quot;, and &apos;. Sure, if you use certain XHTML DTDs these HTML entities will be defined, but they aren't guaranteed to be processed, because a non-validating XML parser is not required to read an external DTD. Without such a DTD (nobody uses the DTDs anymore, especially for HTML5), it's guaranteed that these entities will not be available (unless you go out of your way to define them in an internal DTD subset). The HTML5 specification recommends never using HTML entities in XHTML documents, whether you use DTDs or not. You should simply encode such characters using the standard XML facility of numeric character references; a non-breaking space, for example, becomes &#160;.
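XML predefines only five entities (amp, lt, gt, quot, and apos), so a quick way to spot references that will break a DTD-less XHTML document is to look for any other named entity reference. This is a minimal sketch under simplifying assumptions: it only recognizes alphanumeric names (which covers the HTML entity names), it does not skip comments or CDATA sections, and it ignores numeric character references such as &#160;, which XML always understands.

```javascript
// The only entities an XML parser knows without a DTD.
const XML_PREDEFINED = new Set(["amp", "lt", "gt", "quot", "apos"]);

// Returns the distinct entity names in `markup` that XML does not
// predefine, in order of first appearance. Numeric references (&#160;,
// &#xA0;) never match, because the pattern requires a letter after "&".
function findUndefinedEntities(markup) {
  const names = new Set();
  for (const [, name] of markup.matchAll(/&([A-Za-z][A-Za-z0-9]*);/g)) {
    if (!XML_PREDEFINED.has(name)) names.add(name);
  }
  return [...names];
}
```

For output containing "&nbsp;" this returns ["nbsp"]; a nonempty result generally means the document will not parse as XML unless a DTD declares those names.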

I didn't want non-breaking spaces to begin with, but here CKEditor is adding them in a way that breaks XHTML compliance completely. I mean, the document won't even parse! So much for "Quality XHTML Output". According to the documentation, it would appear that I could configure CKEditor not to produce &nbsp;, or at least to use an encoding other than the named entity, but the CKEditor facility for configuring entities is broken. So at the end of the day I'm forced to turn off as many entities as I can and do a search/replace for &nbsp; on the output, hoping that CKEditor doesn't sneak in some other entities somewhere else.
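The search/replace workaround can be made slightly safer by converting named references to numeric character references, which XML understands without any DTD, and by reporting names it doesn't know instead of silently passing them through. This is a minimal sketch, not a CKEditor feature: the lookup table is a hypothetical, deliberately tiny subset of the HTML entity list, and like any text-level replacement it does not skip comments or CDATA sections.

```javascript
// Hypothetical, deliberately tiny table of HTML named entities; a real
// post-processor would need the full HTML entity list.
const NAMED_TO_CODE_POINT = new Map([
  ["nbsp", 0xa0],
  ["copy", 0xa9],
  ["reg", 0xae],
  ["hellip", 0x2026],
]);

// Entities XML predefines, which can be left untouched.
const XML_PREDEFINED = new Set(["amp", "lt", "gt", "quot", "apos"]);

// Rewrites known named entities as decimal numeric character references
// (&nbsp; -> &#160;). Unknown names are left in place and reported so the
// caller can decide whether the output is safe to store as XHTML.
function toNumericReferences(markup) {
  const unknown = [];
  const output = markup.replace(/&([A-Za-z][A-Za-z0-9]*);/g, (reference, name) => {
    if (XML_PREDEFINED.has(name)) return reference;
    const codePoint = NAMED_TO_CODE_POINT.get(name);
    if (codePoint === undefined) {
      unknown.push(name);
      return reference;
    }
    return `&#${codePoint};`;
  });
  return { output, unknown };
}
```

For example, "a&nbsp;b" becomes "a&#160;b". If `unknown` comes back nonempty, the output still contains references that will not parse without a DTD. A Map is used for the lookup so that names such as "constructor" are not accidentally resolved through an object's prototype.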

So I've switched to CKEditor, which required days and days of structural changes to Guise and lots of research and fixups for CKEditor. After integration, CKEditor looks nicer than TinyMCE, and if for no other reason than its easy facility for configuring new tags (and its straightforward configuration approach overall), I probably won't go back. But frankly, the whole state of affairs is just as sorry as before. Once again, it's a shame no one cares enough just to fix basic compliance issues. Sometimes I'm surprised that anything works on the web.