Type-preserving copy in XSLT 2.0

Posted by Florent Georges, on 2006-08-30, in xslt.

Disclaimer

This post refers to FXSL, because its currying functionality was the starting point and the context of the following thoughts. But there is no official link between these thoughts and FXSL, so neither Dimitre nor Colin should be held responsible for what is written here. I want to thank them both for all their valuable input; all remaining errors are mine alone.

The problem

A few months ago, I finally had a look at FXSL, a project that provides first-class functions for XSLT: function objects that can be passed around like any other value. That opens up some very interesting possibilities, including a more functional programming style.

An interesting feature is the ability to curry parameters to a function, creating another function of lower arity. The principle is to attach values for some of the parameters to the function. The resulting function can then be used like any other function, with the specified parameters bound to the specified values.

To achieve this goal, we need a complex structure, because we have to be able to retrieve both the original function and each curried parameter. The first thing that comes to mind is a sequence of the needed items. But this is not possible: we want to use the resulting function like any other function object, for example to put it into a sequence of functions. As sequences cannot be nested, we would no longer be able to retrieve the new function after adding it to a sequence; we would only get back the individual items, no longer related to each other.
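To make the flattening concrete, here is a small sketch (to be placed inside any template; the variable names are illustrative). It models a curried function as a plain sequence of the function and its bound argument, with strings standing in for both:

<!-- Hypothetical: each "curried function" is a plain sequence
     (function, bound argument); strings stand in for both -->
<xsl:variable name="fun-1" select="('f', 1)"/>
<xsl:variable name="fun-2" select="('g', 2)"/>
<!-- The "list of two functions" is flattened to ('f', 1, 'g', 2) -->
<xsl:variable name="funs" select="($fun-1, $fun-2)"/>
<!-- Outputs 4, not 2: nothing tells where one function ends -->
<xsl:value-of select="count($funs)"/>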

Instead, FXSL uses a dynamically built element as the container. An element is at the same time a single item and a complex structure, from which we can easily retrieve specific pieces of information.
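As an illustration only, a curried function could be packaged as below. The curried and arg element names, the fun attribute and the f:add function are hypothetical; this is not FXSL's actual representation. The point is that the container is one item, so a sequence of such functions keeps its structure:

<!-- Hypothetical container: "f:add with its first argument bound to 1" -->
<xsl:variable name="add-1" as="element()">
  <curried fun="f:add">
    <arg>1</arg>
  </curried>
</xsl:variable>
<!-- One item per function, so the sequence keeps its structure -->
<xsl:variable name="funs" select="($add-1, $add-1)"/>
<!-- Outputs 2 -->
<xsl:value-of select="count($funs)"/>
<!-- Each piece is still easy to get back: outputs 1 -->
<xsl:value-of select="$funs[1]/arg"/>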

But unlike a sequence, an element cannot hold a reference to an item. When we attach an item to a tree in XSLT, it is copied. Many properties are copied as is, but some change. The most obvious is that atomic values are no longer atomic values: they become text nodes (and adjacent atomic values are even merged into a single text node, separated by spaces). So it is not possible to know later whether we attached an atomic value or a text node, for example.
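The following sketch (inside any template, with illustrative variable names) shows that, once attached to a tree, an atomic value cannot be told apart from a text node with the same string value, and that adjacent atomic values end up in a single text node:

<!-- Attach an atomic value and a text node to two containers -->
<xsl:variable name="from-atomic" as="element()">
  <c><xsl:sequence select="42"/></c>
</xsl:variable>
<xsl:variable name="from-text" as="element()">
  <c><xsl:text>42</xsl:text></c>
</xsl:variable>
<!-- Both children are text nodes: outputs "true true" -->
<xsl:value-of select="$from-atomic/node() instance of text(),
                      $from-text/node() instance of text()"/>
<!-- And the two containers are deep-equal: outputs "true" -->
<xsl:value-of select="deep-equal($from-atomic, $from-text)"/>
<!-- Adjacent atomic values become one text node "1 2": outputs 1 -->
<xsl:variable name="from-pair" as="element()">
  <c><xsl:sequence select="(1, 2)"/></c>
</xsl:variable>
<xsl:value-of select="count($from-pair/node())"/>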

Unless we do something special, the type changes too: whatever the original type was, a copied element is annotated xs:untyped, and its typed value is an xs:untypedAtomic value. But we want to preserve the type, because it can change the result of evaluating the new function (the one with curried parameters).
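Here is a sketch of how the lost type can change a result (inside any template, with the xs prefix bound to the XML Schema namespace; the variable names are illustrative). Once copied into a tree, two integers become xs:untypedAtomic values, and a value comparison between two xs:untypedAtomic values compares them as strings:

<xsl:variable name="a" as="element()"><v><xsl:sequence select="10"/></v></xsl:variable>
<xsl:variable name="b" as="element()"><v><xsl:sequence select="9"/></v></xsl:variable>
<!-- The typed value is no longer an xs:integer: outputs "true" -->
<xsl:value-of select="data($a) instance of xs:untypedAtomic"/>
<!-- As integers, 10 lt 9 is false... -->
<xsl:value-of select="10 lt 9"/>
<!-- ...but as untyped values they are compared as the strings
     "10" and "9" (default codepoint collation): outputs "true" -->
<xsl:value-of select="data($a) lt data($b)"/>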

Solution

The idea is to have two functions: f:copy-with-type, which takes a sequence of zero or more items as its argument and returns a node, and f:get-typed, which takes a node produced by the former as its argument and returns the corresponding sequence of zero or more items, with their types restored:

<xsl:function name="f:copy-with-type" as="node()">
  <xsl:param name="arg" as="item()*"/>
  <copy>
    <!-- Still to implement... -->
  </copy>
</xsl:function>

<xsl:function name="f:get-typed" as="item()*">
  <xsl:param name="arg" as="element(copy)"/>
  <!-- Still to implement... -->
</xsl:function>

The solution differs between Basic mode and Schema-Aware (SA) mode. It also differs between nodes and atomic values.

For nodes in Basic mode, it is simple: an element node can never have a type annotation other than xs:untyped (nor an attribute node one other than xs:untypedAtomic), so xsl:copy-of alone is enough. In SA mode, XSLT 2.0 also has the solution: use the [xsl:]validation attribute with the value "preserve", which keeps the type annotations of the copied nodes:

<!-- In Basic mode -->
<xsl:when test="$arg instance of node()">
  <node>
    <xsl:copy-of select="$arg"/>
  </node>
</xsl:when>

<!-- In Schema Aware mode -->
<xsl:when test="$arg instance of node()">
  <node xsl:validation="preserve">
    <xsl:copy-of select="$arg" validation="preserve"/>
  </node>
</xsl:when>

For atomic values, it is more complex. In fact, there is no way to say "I want to get the type of this atomic value and copy both the value and the type to the tree". The only way to simulate this is an xsl:choose on the type of the item (using instance of). In SA mode, we can use the [xsl:]type attribute to give the container element the same type as the item. But in Basic mode, it is impossible to give a node any type other than xs:untyped. Instead, we name the container element after the simple type. Later, this element acts as a constructor function (actually, these constructors are already defined in FXSL).

<!-- In Basic mode -->
<xsl:when test="$arg instance of xs:double">
  <f:double>
    <xsl:copy-of select="$arg"/>
  </f:double>
</xsl:when>

<!-- In Schema Aware mode -->
<xsl:when test="$arg instance of xs:double">
  <atomic xsl:type="xs:double">
    <xsl:copy-of select="$arg" validation="preserve"/>
  </atomic>
</xsl:when>

Below is what the whole solution looks like:

<!-- In Basic mode -->

<xsl:function name="f:get-typed" as="item()*">
  <xsl:param name="arg" as="element(copy)"/>
  <xsl:apply-templates select="$arg/*" mode="f:get-typed"/>
</xsl:function>

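<!-- node()* rather than node(): a copied document node is replaced by
     its children, so the wrapper may hold several nodes (the document
     node itself is not restored) -->
<xsl:template match="node" mode="f:get-typed" as="node()*">
  <xsl:sequence select="@*|node()"/>
</xsl:template>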

<xsl:template match="f:*" mode="f:get-typed" as="item()">
  <xsl:sequence select="f:apply(., data(.))"/>
</xsl:template>

<xsl:function name="f:copy-with-type" as="node()">
  <xsl:param name="arg" as="item()*"/>
  <copy>
    <xsl:sequence select="for $a in $arg return
                            f:copy-with-type-1($a)"/>
  </copy>
</xsl:function>

<xsl:function name="f:copy-with-type-1" as="node()">
  <xsl:param name="arg" as="item()"/>
  <xsl:choose>
    <xsl:when test="$arg instance of node()">
      <node>
        <xsl:copy-of select="$arg"/>
      </node>
    </xsl:when>
    <!-- One xsl:when per simple type, directly inside xsl:choose
         (xsl:when is not allowed inside xsl:otherwise). Derived types
         must come before their base types (e.g. xs:integer before
         xs:decimal), because "instance of" also matches subtypes. -->
    <xsl:when test="$arg instance of xs:a-basic-type">
      <f:a-basic-type>
        <xsl:copy-of select="$arg"/>
      </f:a-basic-type>
    </xsl:when>
    ...
  </xsl:choose>
</xsl:function>

<!-- In SA mode -->

<xsl:function name="f:get-typed" as="item()*">
  <xsl:param name="arg" as="element(copy)"/>
  <xsl:apply-templates select="$arg/*" mode="f:get-typed"/>
</xsl:function>

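<!-- node()* rather than node(): a copied document node is replaced by
     its children, so the wrapper may hold several nodes (the document
     node itself is not restored) -->
<xsl:template match="node" mode="f:get-typed" as="node()*">
  <xsl:sequence select="@*|node()"/>
</xsl:template>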

<xsl:template match="atomic" mode="f:get-typed" as="item()">
  <xsl:sequence select="data(.)"/>
</xsl:template>

<xsl:function name="f:copy-with-type" as="node()">
  <xsl:param name="arg" as="item()*"/>
  <copy xsl:validation="preserve">
    <xsl:sequence select="for $a in $arg return
                            f:copy-with-type-1($a)"/>
  </copy>
</xsl:function>

<xsl:function name="f:copy-with-type-1" as="node()">
  <xsl:param name="arg" as="item()"/>
  <xsl:choose>
    <xsl:when test="$arg instance of node()">
      <node xsl:validation="preserve">
        <xsl:copy-of select="$arg" validation="preserve"/>
      </node>
    </xsl:when>
    <!-- One xsl:when per simple type, directly inside xsl:choose
         (xsl:when is not allowed inside xsl:otherwise). Derived types
         must come before their base types (e.g. xs:integer before
         xs:decimal), because "instance of" also matches subtypes. -->
    <xsl:when test="$arg instance of xs:a-type">
      <atomic xsl:type="xs:a-type">
        <xsl:copy-of select="$arg" validation="preserve"/>
      </atomic>
    </xsl:when>
    ...
  </xsl:choose>
</xsl:function>
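A round trip through both functions can be checked with a sketch like the following (inside any template). It assumes that the xsl:when branches cover xs:integer, xs:string and xs:date and, in Basic mode, that the corresponding FXSL constructors are available; the variable names are illustrative:

<xsl:variable name="items" as="item()*"
              select="(42, 'forty-two', xs:date('2006-08-30'))"/>
<xsl:variable name="back" as="item()*"
              select="f:get-typed(f:copy-with-type($items))"/>
<!-- Each test should be true if the types survived the round trip -->
<xsl:value-of select="count($back) eq 3,
                      $back[1] instance of xs:integer,
                      $back[2] instance of xs:string,
                      $back[3] instance of xs:date"/>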

For the actual complete files, you can go to:

Problem & future

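Of course, there is a problem with atomic values in SA mode. Because we use an xsl:choose, all the possible types must be known statically, when the stylesheet is written. For the standard types this is not a problem, but the code cannot be used as is with user-defined types. Worse, a value of a user-defined type derived from a standard type (say, from xs:decimal) matches the branch of its base type, and is silently read back with that base type.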
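The following sketch shows this silent fallback. It assumes a schema-aware processor, a hypothetical schema (shop.xsd, target namespace urn:example:shop bound to the my prefix) defining my:price as a restriction of xs:decimal, and the SA functions above with an xs:decimal branch but no my:price branch:

<xsl:import-schema namespace="urn:example:shop"
                   schema-location="shop.xsd"/>

<!-- Inside a template: -->
<xsl:variable name="price" select="my:price('9.99')"/>
<!-- Outputs "true": my:price is derived from xs:decimal -->
<xsl:value-of select="$price instance of xs:decimal"/>
<!-- The xs:decimal branch catches the value, so it comes back as a
     plain xs:decimal and this test is expected to be false -->
<xsl:value-of select="f:get-typed(f:copy-with-type($price))
                        instance of my:price"/>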

Two techniques, which can be combined, could help us live with this restriction. The first one combines the import mechanism of XSLT with the ability to define first-class function objects. With a facility for defining resolver functions per namespace (that is, per XML Schema), this could result in a flexible system.
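One possible shape for the first technique is sketched below, under assumptions: the hook function f:copy-user-type, the file names and the my:price type are all hypothetical. The generic stylesheet asks a hook before trying the standard types, and a stylesheet that imports it overrides the hook. In XSLT 2.0, a function with a higher import precedence replaces one with the same name and arity:

<!-- copy-with-type.xsl (generic part, SA mode) -->
<xsl:function name="f:copy-with-type-1" as="node()">
  <xsl:param name="arg" as="item()"/>
  <!-- Ask the hook first, so that a user-defined type derived from,
       say, xs:decimal is not caught by the xs:decimal branch -->
  <xsl:variable name="user" as="node()?" select="f:copy-user-type($arg)"/>
  <xsl:choose>
    <xsl:when test="exists($user)">
      <xsl:sequence select="$user"/>
    </xsl:when>
    <!-- Then the xsl:when branches for nodes and standard types -->
  </xsl:choose>
</xsl:function>

<!-- Default hook: no user-defined type is known -->
<xsl:function name="f:copy-user-type" as="node()?">
  <xsl:param name="arg" as="item()"/>
  <xsl:sequence select="()"/>
</xsl:function>

<!-- shop.xsl: imports the generic part, then overrides the hook -->
<xsl:import href="copy-with-type.xsl"/>
<xsl:import-schema namespace="urn:example:shop" schema-location="shop.xsd"/>

<xsl:function name="f:copy-user-type" as="node()?">
  <xsl:param name="arg" as="item()"/>
  <!-- Returns an annotated container, or nothing if not a my:price -->
  <xsl:if test="$arg instance of my:price">
    <atomic xsl:type="my:price">
      <xsl:copy-of select="$arg"/>
    </atomic>
  </xsl:if>
</xsl:function>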

The second technique is to generate pieces of XSLT code. In fact, I already use a simple generator to produce both xsl:choose elements (with one xsl:when per atomic type). Its input is an ad-hoc document listing the standard simple types an XSLT processor has to know. But we could also write a generator that takes XML Schema documents as input.
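A minimal sketch of such a generator, producing the SA xsl:choose, is shown below. It is only an illustration, not the actual generator: the input format (a types element with one type child per simple type, derived types listed before their base types) and the alias namespace URI are assumptions. Elements and attributes in the axsl namespace are written out in the XSLT namespace thanks to xsl:namespace-alias, so axsl:type becomes xsl:type in the output instead of being interpreted by the generator itself:

<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:axsl="urn:example:xsl-alias">

  <!-- axsl:* in this stylesheet becomes xsl:* in the output -->
  <xsl:namespace-alias stylesheet-prefix="axsl" result-prefix="xsl"/>

  <!-- Hypothetical input: <types><type name="integer"/>...</types> -->
  <xsl:template match="/types">
    <axsl:choose>
      <axsl:when test="$arg instance of node()">
        <node axsl:validation="preserve">
          <axsl:copy-of select="$arg" validation="preserve"/>
        </node>
      </axsl:when>
      <!-- One branch per listed type, in input order -->
      <xsl:for-each select="type">
        <axsl:when test="$arg instance of xs:{@name}">
          <atomic axsl:type="xs:{@name}">
            <axsl:copy-of select="$arg" validation="preserve"/>
          </atomic>
        </axsl:when>
      </xsl:for-each>
    </axsl:choose>
  </xsl:template>
</xsl:stylesheet>

Run on an input listing integer then decimal, it would output an xsl:choose with the node branch followed by an xs:integer and an xs:decimal branch.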

I hope this will be the subject of another post.

Posted by Florent Georges, on 2006-08-30T12:17:00, tag: xslt.