Subversion Property Namespaces

Today someone emailed me to ask whether I had tracked down a problem I was having with Subversion changing WebDAV namespaces for custom properties, so I thought I'd write an entry about my experiences and explain a little more about Marmox in the process.

As I mentioned in my first blog entry, the Marmox™ system, which is running this web site (including this blog), maintains all the resources for a user inside a Marmot repository. A repository is a general resource-storage interface: many types of data stores can be used in the actual implementation, and they all work together to provide a seamless front-end for accessing everything from pictures to music. Different repositories can be mapped to different implementations, again seamlessly. I could map some subtrees to store data on Amazon's S3 service, and other subtrees to store data on GMail.

On this site, for example, the entire /photography/ subtree is currently backed by an external WebDAV server, while the entire /blog/ subtree is hosted on an external Subversion server. As you can see, accessing photographs in the photography section uses the same Marmox system as the blog, but all my blog entries are automatically versioned. Moreover, even metadata changes are versioned on the /blog/ subtree. When I change the title of an entry, or when I add a subject keyword to an entry, the history of these property changes is automatically saved.

Marmox allows each resource in a repository to have any number of properties (which can themselves have properties, and so on) using a comprehensive resource/property framework I created called the Uniform Resource Framework (URF); think of it as RDF, yet simpler, more consistent, and with more features. Each URF property is identified by a URI, so using WebDAV as a back-end seemed natural. Unfortunately I found a decade-old mod_dav bug that corrupted property namespaces, but the Apache community was quick to track down the problem and fix it; the fix has now been incorporated into the main Apache release, and Apache mod_dav works behind Marmox with no problem.

Because Subversion runs on top of WebDAV, turning on autoversioning should have allowed Subversion to work behind Marmox with virtually no code changes. Unfortunately, nothing is ever simple when you try to use advertised features to produce innovative new functionality. It turns out that whatever URI I send Subversion to identify an URF property for a Marmot repository, Subversion changes the namespace to its own namespace! For example, if I want to set the Dublin Core title of a resource, I'd use the property URI <http://purl.org/dc/elements/1.1/title>. Subversion happily turns this into the special Subversion property URI <http://subversion.tigris.org/xmlns/custom/title>. The point of this is beyond me: obviously if the calling program later tries to retrieve the property value under the original URI, it won't get anything because the property has been stored under a different URI.

I asked the Subversion community about this back in March 2007, but got no reply. So I brought it up again last December, but still no one replied. So I had to work around the problem; here is the approach I took.

Because Subversion unconditionally switches to the <http://subversion.tigris.org/xmlns/custom/> namespace, it seems futile to resist. So I decided to simply store my entire property URI as an XML local name within that namespace when communicating with Subversion+WebDAV. Several URI characters, such as '/' and ':', are not valid XML name characters and have to be encoded. The most obvious option might be percent-encoding, which is the default encoding of URIs anyway, but the '%' character is not a valid XML name character. Most of the valid XML name characters are also valid URI characters, which means that when decoding the original Marmot resource property I wouldn't know whether a character was an escape character or a literal one, unless I turned around and double-escaped whatever escape character I used. That would work as a last resort, but it wastes network bandwidth, is ugly, and is hard to read in an HTTP communication trace.

As luck would have it, however, there is one character that is an XML name character but not a URI character: the middle dot, '·' (U+00B7). Encoding non-XML name characters with '·' allows the encoded URI to serve as an XML local name in Subversion+WebDAV communication, and it also removes any ambiguity: because the '·' character is not a valid URI character, I know that it wasn't meant to be a literal character in the original URI!
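
Here is a minimal sketch of this kind of encoding in Python. It is not the actual Marmox code. It assumes that every character other than an ASCII letter, a digit, '.', '-' or '_' is replaced by '·' followed by the two uppercase hex digits of its character code, which is consistent with the encoded names shown in this entry. The function names are hypothetical.

```python
import re

MIDDLE_DOT = "\u00b7"  # '·': an XML name character that cannot appear in a URI

# Assumption: only these ASCII characters pass through unchanged.
_SAFE = re.compile(r"[A-Za-z0-9._-]")
# An escape sequence is the middle dot followed by exactly two hex digits.
_ESCAPE = re.compile(MIDDLE_DOT + r"([0-9A-Fa-f]{2})")


def encode_property_uri(uri):
    """Encode a property URI so it can serve as an XML local name."""
    out = []
    for ch in uri:
        if _SAFE.fullmatch(ch):
            out.append(ch)
        elif ord(ch) > 0x7F:
            # URIs are ASCII-only, so anything else is a caller error.
            raise ValueError("not a URI character: %r" % ch)
        else:
            out.append("%s%02X" % (MIDDLE_DOT, ord(ch)))
    return "".join(out)


def decode_property_name(name):
    """Reverse encode_property_uri(), recovering the original URI."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), name)


encoded = encode_property_uri("http://purl.org/dc/elements/1.1/title")
print(encoded)
print(decode_property_name(encoded))
```

Because '·' can never occur in the original URI, decoding needs no double-escaping: every '·' in the local name starts an escape sequence. A URI always begins with a scheme letter, so the encoded result also begins with a valid XML name start character.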

This works very well in practice. When the Marmot Subversion WebDAV repository implementation in Marmox does a WebDAV PROPFIND on the Subversion back-end, here's part of the XML document that will be returned:

<?xml version="1.0" encoding="utf-8"?>
<D:multistatus xmlns:D="DAV:" xmlns:ns0="DAV:">
<D:response
xmlns:S="http://subversion.tigris.org/xmlns/svn/"
xmlns:C="http://subversion.tigris.org/xmlns/custom/"
xmlns:V="http://subversion.tigris.org/xmlns/dav/"
xmlns:lp1="DAV:"
xmlns:lp3="http://subversion.tigris.org/xmlns/dav/"
xmlns:lp2="http://apache.org/dav/props/">
<D:href>/garretwilson.com/www/blog/</D:href>
<D:propstat>
<D:prop>
<C:http·3A·2F·2Fpurl.org·2Fdc·2Felements·2F1.1·2Fcreator>
...
</C:http·3A·2F·2Fpurl.org·2Fdc·2Felements·2F1.1·2Fcreator>

I've removed the actual value of the Dublin Core creator property; I'll explain that on another day. For now note that the custom Subversion namespace has been mapped to the "C" prefix in the XML communication, and the "real" property URI <http://purl.org/dc/elements/1.1/creator> functions nicely, in readable form, in the XML element as <C:http·3A·2F·2Fpurl.org·2Fdc·2Felements·2F1.1·2Fcreator>.
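
To show how a client might read such a response, here is a rough Python sketch that pulls the custom-namespace properties out of a PROPFIND multistatus document and maps their local names back to the original property URIs. It is not the Marmox implementation. The sample document is a cut-down, hypothetical version of the response above, with a made-up placeholder value, and the function names are my own for illustration.

```python
import re
import xml.etree.ElementTree as ET

SVN_CUSTOM_NS = "http://subversion.tigris.org/xmlns/custom/"

# Hypothetical, shortened response; "example value" is a placeholder.
SAMPLE = """\
<D:multistatus xmlns:D="DAV:">
<D:response xmlns:C="http://subversion.tigris.org/xmlns/custom/">
<D:href>/garretwilson.com/www/blog/</D:href>
<D:propstat>
<D:prop>
<C:http·3A·2F·2Fpurl.org·2Fdc·2Felements·2F1.1·2Fcreator>example value</C:http·3A·2F·2Fpurl.org·2Fdc·2Felements·2F1.1·2Fcreator>
</D:prop>
</D:propstat>
</D:response>
</D:multistatus>
"""


def decode_local_name(name):
    """Turn each '·XX' escape back into the character with hex code XX."""
    return re.sub("\u00b7([0-9A-Fa-f]{2})",
                  lambda m: chr(int(m.group(1), 16)), name)


def custom_properties(multistatus_xml):
    """Map decoded property URIs to their text values."""
    prefix = "{" + SVN_CUSTOM_NS + "}"
    props = {}
    root = ET.fromstring(multistatus_xml)
    for prop in root.iter("{DAV:}prop"):
        for child in prop:
            # ElementTree reports tags as "{namespace}localname".
            if child.tag.startswith(prefix):
                uri = decode_local_name(child.tag[len(prefix):])
                props[uri] = child.text
    return props


print(custom_properties(SAMPLE))
```

Properties in other namespaces, such as the DAV: live properties, are skipped; only elements in the Subversion custom namespace are treated as encoded property URIs.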

Subversion has many wonderful features and functions and works well as one possible back-end for Marmox. It's a shame that no one has cared to make sure that its property handling is WebDAV compliant, but thankfully in this case it was a shortcoming that could be worked around.