Monday, May 26, 2008

Converting XML to JSON

Why would I want to convert XML to JSON? Mainly because JSON (JavaScript Object Notation) is a subset of JavaScript and XML isn't. It is much easier to manipulate JavaScript Objects than it is to manipulate XML. This is because Objects are native to JavaScript, whereas XML requires an API, the DOM, which is harder to use. DOM implementations in browsers are not consistent, while Objects and their methods are more or less the same across browsers.

Since most of the content/data available on the web is in XML format and not JSON, converting XML to JSON is necessary.

The main problem is that there is no standard way of converting XML to JSON. So when converting, we have to develop our own rules, or base them on the most widely used conversion rules. Let's see how the big boys do it.

Rules Google GData Uses to convert XML to JSON

A GData service creates a JSON-format feed by converting the XML feed, using the following rules:

Basic

  • The feed is represented as a JSON object; each nested element or attribute is represented as a name/value property of the object.
  • Attributes are converted to String properties.
  • Child elements are converted to Object properties.
  • Elements that may appear more than once are converted to Array properties.
  • Text values of tags are converted to $t properties.

Namespace

  • If an element has a namespace alias, the alias and element are concatenated using "$". For example, ns:element becomes ns$element.

XML

  • XML version and encoding attributes are converted to attribute version and encoding of the root element, respectively.

Google GData XML to JSON example

This is a hypothetical example; Google GData only deals with RSS and ATOM feeds.

<?xml version="1.0" encoding="UTF-8"?>
<example:user domain="example.com">
 <name>Joe</name>
 <status online="true">Away</status>
 <idle />
</example:user>
{
 "version": "1.0",
 "encoding": "UTF-8",
 "example$user" : {
  "domain" : "example.com",
   "name" : { "$t" : "Joe" },
   "status" : {
    "online" : "true",
    "$t" : "Away"
   },
   "idle" : null
  }
}

How Google converts XML to JSON is well documented. The main points are that XML node attributes become string properties, the node data or text becomes $t properties, and namespaces are concatenated with $.
http://code.google.com/apis/gdata/json.html#Background
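The GData rules can be sketched in a few lines of JavaScript. This is only an illustration, not Google's implementation: the plain-object tree shape ({ name, attrs, text, children }) and the names gdataName and toGData are assumptions made for the sketch. It also builds Arrays only when an element actually repeats (GData decides that from the feed schema), and it leaves out the version and encoding properties.

```javascript
// Hypothetical plain-object tree standing in for a parsed XML document.
// The { name, attrs, text, children } shape is an assumption of this sketch.
var tree = {
  name: 'example:user',
  attrs: { domain: 'example.com' },
  children: [
    { name: 'name', attrs: {}, text: 'Joe', children: [] },
    { name: 'status', attrs: { online: 'true' }, text: 'Away', children: [] },
    { name: 'idle', attrs: {}, text: '', children: [] }
  ]
};

// ns:element -> ns$element, so the key still works with dot notation.
function gdataName(name) {
  return name.split(':').join('$');
}

function toGData(node) {
  var out = {};
  var empty = true;
  for (var a in node.attrs) {
    if (node.attrs.hasOwnProperty(a)) {
      out[gdataName(a)] = node.attrs[a]; // attributes become string properties
      empty = false;
    }
  }
  if (node.text) {
    out.$t = node.text; // the text value goes under $t
    empty = false;
  }
  for (var i = 0; i < node.children.length; i++) {
    var key = gdataName(node.children[i].name);
    var value = toGData(node.children[i]);
    if (!out.hasOwnProperty(key)) {
      out[key] = value;
    } else {
      // repeated element: collect the values in an Array
      if (!(out[key] instanceof Array)) out[key] = [out[key]];
      out[key].push(value);
    }
    empty = false;
  }
  return empty ? null : out; // mirrors "idle": null in the example above
}

var gdata = {};
gdata[gdataName(tree.name)] = toGData(tree);
```

Because "$" is legal in a JavaScript identifier, the result can be read with plain dot notation, for example gdata.example$user.name.$t.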

Rules Yahoo Uses to convert XML to JSON

I could not find any documentation on the rules Yahoo uses to convert its XML to JSON in Yahoo Pipes. However, by looking at the output of a pipe in RSS format and the corresponding JSON format, you can get an idea of the rules used.

Basic

  • The feed is represented as a JSON object; each nested element or attribute is represented as a name/value property of the object.
  • Attributes are converted to String properties.
  • Child elements are converted to Object properties.
  • Elements that may appear more than once are converted to Array properties.
  • Text values of tags are converted to string properties of the parent node, if the node has no attributes.
  • Text values of tags are converted to content properties, if the node has attributes.

Namespace

  • Unknown.

XML

  • XML version and encoding attributes are removed/ignored - at least in the RSS sample I looked at.

The only problem I see with the rules Yahoo Pipes uses is that if an XML node has an attribute named "content", it will conflict with the text value of the node/element, giving the programmer an unexpected result.
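The conflict is easy to reproduce with a small sketch of the text rule as inferred above. Since Yahoo does not document its rules, the function name yahooStyle, the hypothetical <meta> element and the choice that the text overwrites the attribute are all assumptions; the real converter may resolve the clash differently.

```javascript
// Sketch of the inferred Yahoo Pipes text rule (not documented by Yahoo).
// attrs: plain object of attribute names to values; text: the element's text.
function yahooStyle(attrs, text) {
  var out = {};
  var hasAttrs = false;
  for (var a in attrs) {
    if (attrs.hasOwnProperty(a)) {
      out[a] = attrs[a];
      hasAttrs = true;
    }
  }
  if (!hasAttrs) return text; // no attributes: the element is a plain string
  out.content = text;         // attributes present: text goes under "content"
  return out;
}

// <status online="true">Away</status>
var ok = yahooStyle({ online: 'true' }, 'Away');

// <meta content="text/html">Away</meta> (hypothetical element)
var clash = yahooStyle({ content: 'text/html' }, 'Away');
// clash has a single "content" property: one of the two values is lost.
```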

Yahoo Pipes XML to JSON example

<?xml version="1.0" encoding="UTF-8"?>
<example:user domain="example.com">
 <name>Joe</name>
 <status online="true">Away</status>
 <idle />
</example:user>
{
 "example??user" : {
  "domain" : "example.com",
   "name" : "Joe",
   "status" : {
    "online" : "true",
    "content" : "Away"
   },
   "idle" : ??
  }
}

XML.com on rules to convert XML to JSON

The article on XML.com by Stefan Goessner gives a list of possible XML element structures and the corresponding JSON Objects.
http://www.xml.com/pub/a/2006/05/31/converting-between-xml-and-json.html

Pattern  XML                               JSON                                        Access
1        <e/>                              "e": null                                   o.e
2        <e>text</e>                       "e": "text"                                 o.e
3        <e name="value" />                "e": { "@name": "value" }                   o.e["@name"]
4        <e name="value">text</e>          "e": { "@name": "value", "#text": "text" }  o.e["@name"] o.e["#text"]
5        <e> <a>text</a> <b>text</b> </e>  "e": { "a": "text", "b": "text" }           o.e.a o.e.b
6        <e> <a>text</a> <a>text</a> </e>  "e": { "a": ["text", "text"] }              o.e.a[0] o.e.a[1]
7        <e> text <a>text</a> </e>         "e": { "#text": "text", "a": "text" }       o.e["#text"] o.e.a
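The Access column uses bracket notation for a reason: "@" and "#" cannot appear in a JavaScript identifier, so those properties are not reachable with a dot. The short sketch below shows this, along with a small helper for the fact that patterns 1, 2 and 4 give "e" three different types. The helper name textOf is hypothetical and not part of the article.

```javascript
// Pattern 4: <e name="value">text</e>
var o = { e: { '@name': 'value', '#text': 'text' } };

var attr = o.e['@name']; // bracket notation is required here
var text = o.e['#text'];
// Writing o.e.@name instead would be a syntax error.

// Pattern 6: repeated children become an Array
var p = { e: { a: ['text', 'text'] } };
var second = p.e.a[1];

// "e" may be null (pattern 1), a string (pattern 2) or an Object (pattern 4),
// so reading the text safely takes a type check. Hypothetical helper.
function textOf(value) {
  if (value === null) return '';
  if (typeof value === 'string') return value;
  return value['#text'] || '';
}
```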

If we translate this to the rules format given by Google it would look something like:

Basic

  • The feed is represented as a JSON object; each nested element or attribute is represented as a name/value property of the object.
  • Attributes are converted to @attribute properties. (attribute name preceded by @)
  • Child elements are converted to Object properties, if the node has attributes or child nodes.
  • Elements that may appear more than once are converted to Array properties.
  • Text values of tags are converted to string properties of the parent node, if the node has no attributes or child nodes.
  • Text values of tags are converted to #text properties, if the node has attributes or child nodes.

Namespace

  • If an element has a namespace alias, the alias and element name are kept joined by ":". For example, ns:element stays ns:element. (i.e. namespaced elements are treated like any other element)

XML

  • XML version and encoding attributes are not converted.

XML.com XML to JSON example

<?xml version="1.0" encoding="UTF-8"?>
<example:user domain="example.com">
 <name>Joe</name>
 <status online="true">Away</status>
 <idle />
</example:user>
{
 "example:user" : {
  "@domain" : "example.com",
   "name" : "Joe",
   "status" : {
    "@online" : "true",
    "#text" : "Away"
   },
   "idle" : null
 }
}

Other rules being used to convert XML to JSON

Here is a blog post on the topic of an XML to JSON standard. http://khanderaotech.blogspot.com/2007/03/mapping-between-xml-json-need-standard.html
A good discussion on the differences between XML and JSON. http://blog.jclark.com/2007/04/xml-and-json.html

We need a standard way of converting XML to JSON

I'm tired of hearing the "XML vs JSON" debate. Why not just make them compatible? Now that we see just how many different rules are being used, we can definitely see another reason why a standard would come in handy. But till then, I think I'll add to the confusion and come up with my own ruleset.

My rules of converting XML to JSON

My rules are simple and are based on the XML DOM. The DOM represents XML as DOM Objects and Methods. We will use only the DOM objects, since JSON does not use methods. So each Element becomes an Object, each text node a #text property, and the attributes an @attributes object with string properties named after the attributes. The only difference from the DOM Objects representation in JavaScript is the @ sign in front of the attributes Object name, which avoids conflicts with elements named "attributes". The DOM gets around this by having public methods to select child nodes, and not public properties (the actual properties are private, and thus not available in an object notation).

Basic

  • The feed is represented as a JSON object; each nested element or attribute is represented as a name/value property of the object.
  • Attributes are converted to String properties of the @attributes property.
  • Child elements are converted to Object properties.
  • Elements that may appear more than once are converted to Array properties.
  • Text values of tags are converted to #text properties.

Namespace

  • Treat as any other element.

XML

  • XML version and encoding attributes are not converted.

In order to convert XML to JSON with JavaScript, you first have to convert the XML to a DOM Document (to make things simpler). Any major browser will do this automatically, either for the XML/XHTML Document you are viewing or for an XML document retrieved via XMLHttpRequest. But if all you have is an XML string, something like this will do:

function TextToXML(strXML) {
 var xmlDoc;
 // Standards-based browsers: use DOMParser when it exists
 if (typeof DOMParser != 'undefined') {
  try {
   return new DOMParser().parseFromString(strXML, "text/xml");
  } catch(e) { throw new Error("Error parsing XML string"); }
 }
 // Older Internet Explorer: fall back to MSXML via ActiveX
 try {
  xmlDoc = new ActiveXObject("Microsoft.XMLDOM");
  xmlDoc.async = false;
 } catch(e) { throw new Error("XML Parser could not be instantiated"); }
 // loadXML() returns false when the string is not well-formed
 return xmlDoc.loadXML(strXML) ? xmlDoc : false;
}

This will give you the XML represented as a DOM Document, which you can traverse using the DOM methods.

Now all you have to do to convert the DOM Document to JSON is traverse it: for every Element create an Object, for its attributes create an @attributes Object, for its text nodes add a #text property, and repeat the process for any child elements.

/**
 * Convert XML to JSON Object
 * @param {Object} xml XML DOM Document or node
 */
var xml2Json = function(xml) {
 var obj = {};
 
 if (xml.nodeType == 1) { // element
  // do attributes
  if (xml.attributes.length > 0) {
   obj['@attributes'] = {};
   for (var j = 0; j < xml.attributes.length; j++) {
    obj['@attributes'][xml.attributes[j].nodeName] = xml.attributes[j].nodeValue;
   }
  }
  
 } else if (xml.nodeType == 3) { // text
  obj = xml.nodeValue;
 }
 
 // do children
 if (xml.hasChildNodes()) {
  for (var i = 0; i < xml.childNodes.length; i++) {
   var child = xml.childNodes[i];
   var name = child.nodeName;
   if (typeof(obj[name]) == 'undefined') {
    obj[name] = xml2Json(child);
   } else {
    // repeated name: wrap the existing value in an Array first.
    // Test with instanceof, because a string (text node) also has a length.
    if (!(obj[name] instanceof Array)) {
     obj[name] = [obj[name]];
    }
    obj[name].push(xml2Json(child));
   }
  }
 }

 return obj;
};
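One practical wrinkle: pretty-printed XML contains whitespace-only text nodes between elements, and a traversal like this records them as "#text" entries. The sketch below prunes them from the converted Object afterwards. The names isBlank and pruneWhitespace are hypothetical, and the sample input only assumes the general shape such output takes; it was not produced by running the converter.

```javascript
function isBlank(value) {
  return typeof value === 'string' && /^\s*$/.test(value);
}

// Returns a copy of the converted Object without whitespace-only text.
function pruneWhitespace(value) {
  if (value instanceof Array) {
    var kept = [];
    for (var i = 0; i < value.length; i++) {
      if (!isBlank(value[i])) kept.push(pruneWhitespace(value[i]));
    }
    return kept;
  }
  if (value === null || typeof value !== 'object') return value;
  var out = {};
  for (var key in value) {
    if (!value.hasOwnProperty(key)) continue;
    if (key === '#text') {
      // "#text" may be one string or an Array of strings: normalise first
      var texts = pruneWhitespace([].concat(value[key]));
      if (texts.length === 1) out[key] = texts[0];
      else if (texts.length > 1) out[key] = texts;
      // nothing left: the property is dropped
    } else {
      out[key] = pruneWhitespace(value[key]);
    }
  }
  return out;
}

// Assumed shape of a conversion result for indented XML
var raw = {
  'example:user': {
    '@attributes': { domain: 'example.com' },
    '#text': ['\n ', '\n ', '\n'],
    name: { '#text': 'Joe' },
    idle: {}
  }
};
var clean = pruneWhitespace(raw);
```

Pruning after the fact keeps the traversal itself unchanged. Skipping blank text nodes inside the loop would work as well, at the cost of losing whitespace that is significant in mixed content.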

Converting XML to Lean JSON?

We could make the JSON encoding of the XML lean by using just "@" for attributes and "#" for text in place of "@attributes" and "#text":

{
 "example:user" : {
  "@" : { "domain" : "example.com" },
   "name" : { "#" : "Joe" },
   "status" : {
    "@" : {"online" : "true"},
    "#" : "Away"
   },
   "idle" : null
 }
}

You may notice that "@" and "#" are valid as JavaScript property names, but not as XML element or attribute names. This allows us to encompass the DOM representation in object notation: we are swapping DOM functions for Object properties whose names are not allowed in XML, and thus will not get any collisions. We could go further and use "!" for comments, for example, and "%" for CDATA. I'm leaving these two out for simplicity.
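Going from the verbose keys to the lean ones is a simple recursive rename. A minimal sketch, in which the names LEAN_KEYS and toLean are hypothetical:

```javascript
var LEAN_KEYS = { '@attributes': '@', '#text': '#' };

// Returns a copy of a converted Object with the verbose keys shortened.
function toLean(value) {
  if (value instanceof Array) {
    var items = [];
    for (var i = 0; i < value.length; i++) items.push(toLean(value[i]));
    return items;
  }
  if (value === null || typeof value !== 'object') return value;
  var out = {};
  for (var key in value) {
    if (value.hasOwnProperty(key)) {
      var leanKey = LEAN_KEYS.hasOwnProperty(key) ? LEAN_KEYS[key] : key;
      out[leanKey] = toLean(value[key]);
    }
  }
  return out;
}

var verbose = {
  status: { '@attributes': { online: 'true' }, '#text': 'Away' }
};
var lean = toLean(verbose);
```

The rename is safe for the same reason as above: no element or attribute can be named "@attributes" or "#text", so only the two structural keys ever match.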

What about converting JSON to XML?

If we follow the rules used to convert XML to JSON, it should be easy to convert JSON back to XML. We'd just need to recurse through our JSON Object, and create the necessary XML objects using the DOM methods.

/**
 * JSON to XML
 * @param {Object} JSON
 */
json2Xml = function(json, node) {
 
 var root = false;
 if (!node) {
  node = document.createElement('root');
  root = true;
 }
 
 for (var x in json) {
  // ignore inherited properties
  if (json.hasOwnProperty(x)) {
  
   if (x == '#text') { // text
    node.appendChild(document.createTextNode(json[x]));
   } else  if (x == '@attributes') { // attributes
    for (var y in json[x]) {
     if (json[x].hasOwnProperty(y)) {
      node.setAttribute(y, json[x][y]);
     }
    }
   } else if (x == '#comment') { // comment
   // ignore
   
   } else { // elements
    if (json[x] instanceof Array) { // handle arrays
     for (var i = 0; i < json[x].length; i++) {
      node.appendChild(json2Xml(json[x][i], document.createElement(x)));
     }
    } else {
     node.appendChild(json2Xml(json[x], document.createElement(x)));
    }
   }
  }
 }
 
 if (root == true) {
  return TextToXML(node.innerHTML);
 } else {
  return node;
 }
 
};

This really isn't a good example, as I couldn't find out how to create Elements using the XML DOM with browser JavaScript. Instead I had to create Elements using document.createElement() and text nodes with document.createTextNode(), and use the non-standard innerHTML property in the end. The main point demonstrated is how straightforward the conversion is.
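One way to sidestep the DOM entirely is to serialise the JSON Object straight to an XML string. The sketch below is an alternative to the function above, not a fix for it; the names escapeXml, elementToString and json2XmlString are hypothetical. It follows the same @attributes/#text rules, and it also tolerates a bare string value by treating it as text.

```javascript
function escapeXml(s) {
  return String(s)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Serialise one element. name is the tag name, value follows the rules above.
function elementToString(name, value) {
  if (typeof value === 'string') value = { '#text': value };
  var attrs = '';
  var body = '';
  for (var key in value) {
    if (!value.hasOwnProperty(key)) continue;
    var item = value[key];
    if (key === '@attributes') {
      for (var a in item) {
        if (item.hasOwnProperty(a)) {
          attrs += ' ' + a + '="' + escapeXml(item[a]) + '"';
        }
      }
    } else if (key === '#text') {
      body += escapeXml([].concat(item).join(''));
    } else {
      // a single child, or an Array of repeated children
      var children = [].concat(item);
      for (var i = 0; i < children.length; i++) {
        body += elementToString(key, children[i]);
      }
    }
  }
  return body === ''
    ? '<' + name + attrs + '/>'
    : '<' + name + attrs + '>' + body + '</' + name + '>';
}

// Each top-level property becomes a root-level element.
function json2XmlString(json) {
  var out = '';
  for (var key in json) {
    if (!json.hasOwnProperty(key)) continue;
    var roots = [].concat(json[key]);
    for (var i = 0; i < roots.length; i++) {
      out += elementToString(key, roots[i]);
    }
  }
  return out;
}

var xmlString = json2XmlString({
  'example:user': {
    '@attributes': { domain: 'example.com' },
    name: { '#text': 'Joe' },
    status: { '@attributes': { online: 'true' }, '#text': 'Away' },
    idle: null
  }
});
```

Like the Object representation itself, this loses the original order of mixed text and child elements: all text under one "#text" key is emitted together, at that key's position.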

What is the use of converting JSON to XML?

If you are familiar with creating xHTML via the DOM methods, you'll know how verbose it can be. By using a simple data structure to represent XML, we can remove the repetitive code needed to create the xHTML. Here is a function that creates HTML Elements out of a JSON Object.

/**
 * JSON to HTML Elements
 * @param {String} Root Element TagName
 * @param {Object} JSON
 */
json2HTML = function(tag, json, node) {
 
 if (!node) {
  node = document.createElement(tag);
 }
 
 for (var x in json) {
  // ignore inherited properties
  if (json.hasOwnProperty(x)) {
  
   if (x == '#text') { // text
    node.appendChild(document.createTextNode(json[x]));
   } else  if (x == '@attributes') { // attributes
    for (var y in json[x]) {
     if (json[x].hasOwnProperty(y)) {
      node.setAttribute(y, json[x][y]);
     }
    }
   } else if (x == '#comment') { // comment
   // ignore
   
   } else { // elements
    if (json[x] instanceof Array) { // handle arrays
     for (var i = 0; i < json[x].length; i++) {
      // recurse with (tag, json) so a new element named x is created
      node.appendChild(json2HTML(x, json[x][i]));
     }
    } else {
     node.appendChild(json2HTML(x, json[x]));
    }
   }
  }
 }
 
 return node;
 
};

Let's say you wanted a link <a title="Example" href="http://example.com/">example.com</a>. With the regular browser DOM methods you'd do:

var a = document.createElement('a');
a.setAttribute('href', 'http://example.com/');
a.setAttribute('title', 'Example');
a.appendChild(document.createTextNode('example.com'));

This is procedural, and thus unstructured and not very pleasing to the eye, as well as verbose. With JSON to XHTML you would just be dealing with the data in native JavaScript Object notation.

var a = json2HTML('a', {
 '@attributes': { href: 'http://example.com/', title: 'Example' },
 '#text': 'example.com'
});

That does look a lot better. This is because JSON separates the data into a single Object, which can be manipulated as we see fit, in this case with json2HTML().

If you want nested elements:

var div = json2HTML('div', {
 a : {
  '@attributes': { href: 'http://example.com/', title: 'Example' },
  '#text': 'example.com'
 }
});

Which gives you

<div><a title="Example" href="http://example.com/">example.com</a></div>

The uses of converting JSON to XML are many. As another example, let's say you want to syndicate an RSS feed. Just create the JSON Object following the rules given for conversion between XML and JSON, run it through your json2Xml() function, and you should have a quick and easy RSS feed. Normally you'd be using a server-side language other than JavaScript to generate your RSS (though server-side JavaScript is a good choice also), but since the rules are language independent, it doesn't matter which language is used, as long as it supports the DOM and JSON.

6 comments:

Albatros said...

The example is not valid JSON

{
"example:user" : {
"@" : { "domain" : "example.com" },
!!What is the key for the following value?!!
{
"name" : { "#" : "Joe" },
"status" : {
!!missing a key on this one too!!
{ "@" : {"online" : "true"},
"#" : "Away"
},
"idle" : null
}
}
}

every item in a {} must have a key

Unknown said...

Thanks for noticing that. It has been updated.

edgecrush3r said...

Great example, however I had some issues with Firefox, not being able to push in the following code (doooh).

// do children
if (xml.hasChildNodes()) {
for(var i = 0; i < xml.childNodes.length; i++) {
if (typeof(obj[xml.childNodes[i].nodeName]) == 'undefined') {
obj[xml.childNodes[i].nodeName] = xml2Json(xml.childNodes[i]);
} else {
if (typeof(obj[xml.childNodes[i].nodeName].length) == 'undefined') {
var old = obj[xml.childNodes[i].nodeName];
obj[xml.childNodes[i].nodeName] = [];
obj[xml.childNodes[i].nodeName].push(old);
}
obj[xml.childNodes[i].nodeName].push(xml2Json(xml.childNodes[i]));

as it seemed not all were valid objects.

fixed it using:
if (typeof(obj[xml.childNodes[i].nodeName])=='object') {
obj[xml.childNodes[i].nodeName].push(xml2Json(xml.childNodes[i]));
}

instead

Unknown said...

edgecrush3r's fix worked for me too.

Dinesh Gupta said...

Hi,

Where should the code that 'edgecrush3r' suggests be added? Please advise. I am unable to find that place.

Murray Todd Williams said...

The json2HTML has a bug. It works with the example because it is only a depth of one. The 2 "node.appendChild" lines try to call json2HTML recursively, but they have the parameter signature wrong.

node.appendChild(json2HTML(json[x][i],document.createElement(x)));

should either be

node.appendChild(json2HTML(null,json[x][i],document.createElement(x)));

or more simply

node.appendChild(json2HTML(x, json[x][i]));

and the same goes for the 2nd node.appendChild 2 lines further down.