March 2, 2005

Using SAXON with Cocoon

As many people know, Cocoon can be configured to use an alternative XSLT processor, SAXON, instead of the default one, Xalan (off topic: when will these guys give their website a facelift?).

But why would you want to go to the trouble of switching processors? There are several things you might gain (YMMV, as usual):

  • Less CPU utilization. Most of the time SAXON is faster than Xalan.
  • Less memory utilization. SAXON uses memory more efficiently and creates less garbage. As a result, in a web-serving environment, you can serve more requests with less physical memory.
  • No (observed) memory leaks, as compared with the Xalan bundled with JDK 1.4.2. This is especially important if you are not in a position to upgrade the JVM or the J2EE server libraries.
  • Generally, SAXON has better error reporting.

The other side of the coin:

  • No Xalan-specific extensions... Wait a second, that's not a bad thing: it's wiser to have processor-independent stylesheets and to use a processor-independent way of invoking XSLT Java extensions.
  • SAXON is stricter about what it accepts... Again, not too bad: this results in better, more specification-compliant stylesheets.
  • disable-output-escaping does not work properly. Even though it's evil, sometimes it is necessary. Read below for the fix.
  • I also observed that instead of outputting some exotic characters directly (when using UTF-8 encoding), SAXON produced character entities. The same fix applies.
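
If you want to see how a given processor writes non-ASCII characters before touching Cocoon, a small standalone probe can help. The sketch below is not part of Cocoon: the class name EncodingProbe and its serialize helper are made up for illustration, and it assumes a modern JDK (the two-argument TransformerFactory.newInstance arrived in Java 6, so it will not compile on JDK 1.4). It runs an identity transform with UTF-8 output and prints the result, so you can compare processors by passing a factory class name on the command line.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper, not a Cocoon class: shows how one processor serializes a document.
final class EncodingProbe {

    // Runs an identity transform and returns the serialized result as a string.
    static String serialize(TransformerFactory factory, String xml) {
        try {
            Transformer identity = factory.newTransformer();
            identity.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // With an argument, load that factory class explicitly;
        // otherwise use whatever JAXP picks by default.
        TransformerFactory factory = args.length > 0
                ? TransformerFactory.newInstance(args[0], null)
                : TransformerFactory.newInstance();
        System.out.println("Factory: " + factory.getClass().getName());
        // \u00e9 is a non-ASCII character; check whether it comes out
        // as-is or as a character entity.
        System.out.println(serialize(factory, "<p>caf\u00e9</p>"));
    }
}
```

Run it once with no arguments and once with net.sf.saxon.TransformerFactoryImpl (with the SAXON jar on the classpath) and compare the two outputs. What each processor prints depends on its version, so treat this as a diagnostic rather than proof of any particular behaviour.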

If you have decided to take the plunge and make the switch, here is how you can do it:

  • Download SAXON (Duh!) and extract saxon.jar.
  • Drop saxon.jar into the cocoon/lib/local folder if you are working with a Cocoon SVN checkout or a source release download, or into the ./WEB-INF/lib folder if you are working with a Cocoon-based web application.
  • Open up cocoon.xconf and uncomment the SAXON XSLT processor section:
    <!--+
        | Saxon XSLT Processor
        | For old (6.5.2) Saxon use:
        |  <parameter name="transformer-factory" value="com.icl.saxon.TransformerFactoryImpl"/>
        | For new (7+) Saxon use:
        |  <parameter name="transformer-factory" value="net.sf.saxon.TransformerFactoryImpl"/>
        +-->
    <component logger="core.xslt-processor"
               role="org.apache.excalibur.xml.xslt.XSLTProcessor/saxon"
               class="org.apache.excalibur.xml.xslt.XSLTProcessorImpl">
      <parameter name="use-store" value="true"/>
      <parameter name="transformer-factory" value="net.sf.saxon.TransformerFactoryImpl"/>
    </component>
    
    Check that the value of the transformer-factory parameter matches your SAXON version (see the comment above the component).
  • (Optionally) Edit the XPath processor entry:
    <!-- Xpath Processor: -->
    <xpath-processor class="org.apache.excalibur.xml.xpath.Saxon7ProcessorImpl"
                     logger="core.xpath-processor"/>
    
  • Open up sitemap.xmap and edit the XSLT transformer section so that xslt-processor-role is set to saxon:
    <!-- NOTE: This is the default XSLT processor. -->
    <map:transformer name="xslt" 
                     logger="sitemap.transformer.xslt"
                     pool-max="32"
                     src="org.apache.cocoon.transformation.TraxTransformer">
      <use-request-parameters>false</use-request-parameters>
      <use-session-parameters>false</use-session-parameters>
      <use-cookie-parameters>false</use-cookie-parameters>
      <xslt-processor-role>saxon</xslt-processor-role>
      <check-includes>true</check-includes>
    </map:transformer>
    
  • (Optionally) To fix the disable-output-escaping and character entity issues noted above, you can switch the serializer(s) back to Xalan using the (lesser-known) transformer-factory configuration parameter. In sitemap.xmap, edit your serializer entries like this:
    <map:serializer name="html"
                    mime-type="text/html"
                    pool-max="32"
                    logger="sitemap.serializer.html"
                    src="org.apache.cocoon.serialization.HTMLSerializer">
      <doctype-public>-//W3C//DTD HTML 4.01 Transitional//EN</doctype-public>
      <doctype-system>http://www.w3.org/TR/html4/loose.dtd</doctype-system>
      <encoding>UTF-8</encoding>
      <transformer-factory>org.apache.xalan.processor.TransformerFactoryImpl</transformer-factory>
    </map:serializer>
    
    I find it cleaner to change the configuration than to modify saxon.jar.
  • That's it!
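
For the "check the value of the transformer-factory parameter" step, it can help to confirm which factory classes are actually loadable before restarting Cocoon. The following is a minimal sketch, not anything shipped with Cocoon or SAXON: the class name FactoryCheck and the method isOnClasspath are hypothetical, and the three class names are simply the ones that appear in the configuration snippets above. It assumes Java 7 or later for the multi-catch syntax.

```java
// Hypothetical helper, not a Cocoon class: reports which factories can be loaded.
final class FactoryCheck {

    // Factory class names taken from the cocoon.xconf and sitemap.xmap snippets.
    static final String[] CANDIDATES = {
        "com.icl.saxon.TransformerFactoryImpl",              // old (6.5.2) Saxon
        "net.sf.saxon.TransformerFactoryImpl",               // new (7+) Saxon
        "org.apache.xalan.processor.TransformerFactoryImpl"  // Xalan
    };

    // Returns true if the named class can be located, without initializing it.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, FactoryCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        for (String name : CANDIDATES) {
            System.out.println((isOnClasspath(name) ? "found    " : "missing  ") + name);
        }
    }
}
```

Run it with the same jars on the classpath that your web application has in ./WEB-INF/lib. A servlet container may add or hide classes of its own, so the result is only a hint about what Cocoon will see.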

PS: Environment: JDK 1.4.X, Cocoon 2.1.X, SAXON 7.9.X.

Posted by Vadim at March 2, 2005 8:45 PM
Comments

You didn't mention one of the biggest advantages of Saxon: it lets you get a start on XSLT 2.0. (Not that I've tried it.)

Posted by: pbinkley at March 3, 2005 11:30 AM

True. I haven't used it either (yet); I've had no need. XQuery is another interesting one.

Posted by: Vadim Gritsenko at March 3, 2005 7:57 PM

I'm working on a Saxon-heavy pipeline engine. It actually uses Saxon to generate the pipelines. Nice separation of concerns.

Posted by: Alan Gutierrez at April 12, 2005 12:30 AM