Sunday, November 11, 2007

JSON, XML and the Relational Model part 3

Okay, let's talk XML. What is XML in terms of the set theory concepts from the last part?

Let's start with an element. An element in XML looks like this:

<element attribute1="..." attribute2="..." attribute3="...">content</element>

Okay, so an element consists of a few parts. We have the element name, "element". We have a set of attribute names, {attribute1, attribute2, attribute3}, which has all the properties of a set that I laid out in part 2. Each attribute name forms one half of an ordered pair; the other half is its value. Finally we have content, which is ultimately a tuple of nodes, with all the properties of a tuple discussed previously.

A node can be an ordinary text string, a CDATA section or another element. Thus, we come full circle, and demonstrate how XML nests.
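Here is a minimal sketch of that structure in Python, just to make the set/tuple split concrete. The `Element` class and its field names are my own illustration, not part of any XML library. Attributes are a dict (unique names, order irrelevant) and content is a tuple of nodes (ordered, repeats allowed).

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple, Union

# A node is either plain text or another element (CDATA is treated as text here).
Node = Union[str, "Element"]


@dataclass
class Element:
    name: str
    # Set-like part: each attribute name is unique and maps to a string value.
    attributes: Dict[str, str] = field(default_factory=dict)
    # Tuple part: an ordered sequence of nodes; this is the only place nesting happens.
    content: Tuple[Node, ...] = ()


child = Element("child", content=("nested text",))
doc = Element(
    "element",
    {"attribute1": "x", "attribute2": "y", "attribute3": "z"},  # hypothetical values
    ("some text", child, "more text"),
)

# Attribute order does not matter (set semantics)...
same_attrs = Element("e", {"a": "1", "b": "2"}) == Element("e", {"b": "2", "a": "1"})
# ...but content order does (tuple semantics).
same_content = Element("e", content=("x", "y")) == Element("e", content=("y", "x"))
```

With these definitions `same_attrs` comes out true and `same_content` comes out false, which is the whole distinction in two lines.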

The key point here is that in XML you cannot nest in a set. To put it another way, an attribute value cannot contain an element, or a tuple of elements. The only opportunity for nesting in XML is nesting a tuple of nodes in the content of an element.

What this ultimately means for XML and JSON together is that JSON ends up slightly more expressive than XML, since it places no such restriction on data structures. Any of the JSON value types (object, array, string, number, etc.) can be placed in either an object value or an array value (that is, in a set or in a tuple).
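A small sketch of the difference, using only Python's standard `json` and `xml.etree.ElementTree` modules. The order/item data is made up for illustration. The JSON side nests freely through both objects and arrays; on the XML side an attribute value is only ever a string, and markup inside it is rejected by the parser.

```python
import json
import xml.etree.ElementTree as ET

# JSON: an object value can hold an array, which can hold objects, and so on.
doc = json.loads('{"order": {"items": [{"sku": "A1"}, {"sku": "B2"}]}}')
second_sku = doc["order"]["items"][1]["sku"]

# XML: nesting happens in element content; the attribute value is a flat string.
elem = ET.fromstring('<order id="42"><item sku="A1"/></order>')
attr_value = elem.attrib["id"]


def attribute_accepts_markup():
    """Try to put an element inside an attribute value; the parser refuses."""
    try:
        ET.fromstring('<order items="<item/>"/>')
    except ET.ParseError:
        return False
    return True
```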

There are many approaches to converting XML to a JSON notation, or to a JavaScript object. The following article on XML.com describes one approach:

Converting Between XML and JSON

One thing you may notice in that article is that the approach taken is generally insensitive to whether an aspect of JSON or XML is a set or a tuple. The goal appears largely to be a conversion that produces the most aesthetically pleasing example code. No attempt seems to be made at understanding either format, or their logical differences and similarities.

In that mapping scheme, the attributes of an element become JSON object properties, with an @ prefixed to the property names. This runs into no difficulty since, as I've covered, attribute names in XML are a set, just like property names in JSON. There is no mismatch here. However, the names of that element's child elements also become JSON object properties on that very same object. This will run into problems. In fact, the author of that article encounters the problem on page 2:

An attempt to map a structured XML element...

<e>
  <a>some</a>
  <b>textual</b>
  <a>content</a>
</e>

...to the following JSON object:

"e": {
  "a": "some",
  "b": "textual",
  "a": "content"
}

yields an invalid result, since the name "a" is not unique in the associative array.
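To see what that failure looks like in practice, here is a small sketch using Python's standard `json` module. The quoted fragment is wrapped in outer braces so it parses as a complete document. Python's parser happens to keep only the last value for a repeated name, so "some" silently disappears; other parsers may behave differently.

```python
import json

text = '{"e": {"a": "some", "b": "textual", "a": "content"}}'

# Default parsing: the repeated name "a" collapses to its last value.
parsed = json.loads(text)

# Asking for the raw name/value pairs shows what was actually written,
# in order, before the object (a set of names) threw information away.
pairs = json.loads(text, object_pairs_hook=list)
```

Here `parsed` ends up as `{"e": {"a": "content", "b": "textual"}}`, while `pairs` still holds all three name/value pairs in their original order.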

Indeed, it is not a valid conversion. The author is mapping an XML tuple to a JSON set, but a set is not a tuple. Does the author then conclude that it is an invalid approach, scrap it, and try to think of something better?

So we need to collect all elements of identical names in an array. Using the patterns 5 and 6 above yields the following result:

"e": {
  "a": [ "some", "content" ],
  "b": "textual"
}

Now we have a structure that doesn't preserve element order. This may or may not be acceptable, depending on whether the above XML element order matters.

Oof. Nope. Instead, this approach scrambles the data even further, in an attempt to preserve those pretty code samples on the first page of the article. The author then goes on to conclude:

This example demonstrates a conversion that does not preserve the original element order. Even if this may not change semantics here, we can do the following:

1. state that a conversion isn't sufficiently possible.
2. tolerate the result if order doesn't matter.
3. try to make our XML document more JSON-friendly.

The first conclusion assumes that the arbitrary scheme the author has devised is the only *possible* way to convert from XML to JSON. The second assumes that there are situations in software design where you can get away with bad logical mistakes. And the third is the strangest of all: he concludes that the problems with the conversion are due not to logical mistakes in the scheme, but to JSON being inherently *less* expressive than XML. I hope that I have already demolished that conclusion at the start of this blog post.
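To make the order loss concrete, here is a rough sketch of the "collect same-named children into an array" step, written with Python's standard `xml.etree.ElementTree`. The `group_children` function is my own approximation of that one step, not the article's code, and it only handles text-only children. The first XML string is reconstructed from the JSON quoted above; the second is the same element with its children reordered.

```python
import xml.etree.ElementTree as ET


def group_children(xml_text):
    """Map child elements to object properties, grouping repeated names into lists."""
    root = ET.fromstring(xml_text)
    grouped = {}
    for child in root:
        grouped.setdefault(child.tag, []).append(child.text)
    # A name that occurs once keeps a plain value; repeats become an array.
    properties = {
        tag: values[0] if len(values) == 1 else values
        for tag, values in grouped.items()
    }
    return {root.tag: properties}


original = "<e><a>some</a><b>textual</b><a>content</a></e>"
reordered = "<e><a>some</a><a>content</a><b>textual</b></e>"

result_original = group_children(original)
result_reordered = group_children(reordered)
```

Two different documents produce the same object, so nothing on the JSON side can tell you which one you started with. That is the tuple-into-set mistake again, just better hidden.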

However, if you are still not convinced, stay tuned for my alternate conversion scheme in part 4.


Anonymous said...

I'm looking forward to Part 4 if you haven't given up on it (judging by the date).

Breton Slivka said...

Oh heyoh hi. Didn't know anyone was reading. I'll work on a followup soon as I can then.