Friday, July 14, 2006

Beautifying whitespace in XSLT output

It is sometimes necessary to control the amount of whitespace in XSLT output, especially when transforming XML into human-readable text or an SQL dump. With XSLT this often ends in a compromise between ugly text and ugly templates. Most of the ugliness comes from linefeeds: they must be output at certain places, but any "beautifying" indentation that follows such a linefeed in the template is sent to the output as well. Without the indentation the stylesheet becomes hard to read. To avoid this, it is convenient to define an entity for a linefeed, such as:

<!ENTITY lf '<xsl:text>
</xsl:text>'>

An XSLT processor strips whitespace-only text nodes from the stylesheet, such as the indentation around the <xsl:text> tag in this case, so no extra indentation should reach the output. But this will not work as written, because of a namespace error: "Namespace prefix xsl on text is not defined". Let's define the namespace:

<!ENTITY lf '<xsl:text xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
</xsl:text>'>

This should work as expected, but not with MSXML. To satisfy it, we need to add xml:space="preserve":

<!ENTITY lf '<xsl:text xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xml:space="preserve">
</xsl:text>'>

That's OK. The most important thing, though, is to embed this entity declaration directly into the stylesheet, like this:

<!DOCTYPE xsl:stylesheet [
<!ENTITY lf '<xsl:text xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xml:space="preserve">
</xsl:text>'>
]>
<xsl:stylesheet>
...
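
To make the idea concrete, here is a minimal sketch of a complete stylesheet built around the entity. The input format (a <tables> element with <table name="..."/> children) and the generated SQL are made up for illustration; only the entity declaration comes from the text above.

```xml
<?xml version="1.0"?>
<!DOCTYPE xsl:stylesheet [
<!ENTITY lf '<xsl:text xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xml:space="preserve">
</xsl:text>'>
]>
<!-- Hypothetical input: <tables><table name="users"/><table name="orders"/></tables> -->
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/tables">
    <xsl:for-each select="table">
      <!-- The indentation here is whitespace-only and is stripped by the processor -->
      <xsl:text>DROP TABLE </xsl:text>
      <xsl:value-of select="@name"/>
      <!-- &lf; expands to an xsl:text element holding a single linefeed -->
      <xsl:text>;</xsl:text>&lf;
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

With the hypothetical input above, this should produce one DROP TABLE statement per line with no leading indentation, while the template itself stays indented.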

Now you may use the &lf; entity inside the stylesheet without these ugly <xsl:text>
</xsl:text> constructions.<xsl:text>

</xsl:text>Well, not really. If you use MSXML for the XSLT transformation, chances are that after inserting <!DOCTYPE ...> you will encounter an error message like this one:

C:\farplugins\trunk\plugbase\mbxsl.wsf(50, 10) msxml4.dll: The stylesheet does not contain a document element. The stylesheet may be empty, or it may not be a well-formed XML document.

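The magic is to turn off validation in your parser by setting the validateOnParse property to false. MSXML validates on load by default, and the DOCTYPE here declares only an entity, so the stylesheet fails validation and the document is left empty. In this case the change was: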

===================================================================
--- mbxsl.js (revision 298)
+++ mbxsl.js (working copy)
@@ -12,6 +12,7 @@
xslDoc=new ActiveXObject("MSXML2.FreeThreadedDOMDocument.4.0")
xmlDoc.async=false;
xslDoc.async=false;
+ xslDoc.validateOnParse=false;
xmlDoc.load(args("xml"));
xslDoc.load(args("xsl"));
if (xmlDoc.parseError.errorCode != 0)