Nightly thoughts

A blog on XML, NoSQL, Semantics, MarkLogic.

Well, indeed, a blog about data...

Here are the most recent blog entries. You can also browse them by topic.

Forest layout on a MarkLogic cluster

Forest repartition and replicas done right

In MarkLogic, forests lie at the interface between the logical and physical handling of data. When you look at your dataset, you think of the documents as being part of a database; this is the logical view. The way this works, especially on a cluster with several hosts, depends entirely on the forests, which store documents and run queries on their own subset of the whole database; this is the physical view.

To achieve high availability, forests can be replicated (the whole story is much richer, but this is a useful simplification). The concept is dead simple: have another forest, into which you cannot insert documents yourself, but which is linked to the main forest and replicates every change that happens there. This replication happens synchronously and transactionally: when a transaction is committed, you can be sure its changes are reflected in the main forests and all their replicas, if any.

Read the rest of this blog entry...

Virtual box for MarkLogic, reloaded

Install MarkLogic 8 on CentOS 7 using Virtual Box

18 months ago, I wrote an article about creating a virtual machine for MarkLogic, using Virtual Box and CentOS 6. The article is still of some interest, and the background information it contains is still relevant, but I thought it was time to write an updated version based on CentOS 7, especially now that CentOS is a platform fully supported by MarkLogic.

Read the rest of this blog entry...

Include code snippets in Pulse articles

In September 2014, I decided to finally try Pulse, the publication system of LinkedIn. I took the very first blog post I ever wrote, and tried to publish it on Pulse. It was very disappointing, as Pulse did not provide any way to include code snippets in articles. Not even a monospaced font. Since my posts usually include code, I decided that Pulse was not an option for me.

But I have now realized that I can use PhantomJS to generate an image of each code snippet, and include it in place of the actual code text. As the original posts include the code as text, the reader can always go to the original to copy and paste code. The solution is not ideal, but surely good enough.

Read the rest of this blog entry...

Virtual box for MarkLogic

One constraint with MarkLogic is that you cannot install more than one instance on a given machine. Either MarkLogic is installed on the machine or it is not, but you cannot install a second one. The biggest drawback shows when the time comes to evaluate a new version, especially with the Early Access program, which gives you access to beta versions before they are officially released. Of course you do not want to upgrade your instance to a beta version without any testing, even if it is on your laptop or dev box.

The obvious solution that comes to mind is to create a virtual machine, for the sole purpose of installing MarkLogic on it. The problem is, MarkLogic is not officially supported on any open source OS. So if you need to pay for Windows or RedHat, as a developer, just to install it on a virtual box, it might not be worth it. Fortunately, CentOS is very similar to RedHat, which makes it suitable for installing MarkLogic. Thanks to CentOS and to help I received from an internal mailing list at MarkLogic, I was able to create such a virtual box and use it to install MarkLogic 8 EA 2.

Read the rest of this blog entry...

oXygen, Scala and plugin repository

I am looking at Scala for future development, and wondered a bit about its integration abilities for edge cases. Writing a plugin for oXygen looked like a good candidate to investigate this. oXygen has no specific support for Scala and is written in Java. It comes with a JAR library containing the classes to use for developing the plugin. The plugin has to implement some specific classes and use others, then be bundled as a plugin JAR file, loaded dynamically by oXygen and called from the description in the plugin.xml descriptor. If Scala did not support that scenario, it would probably not be worth having a look at for me.

I did not know anything about Scala. I downloaded it, tried to sketch a Hello World kind of plugin for oXygen, and the result was impressively simple! So here are the few steps I followed, if you want to try it at home.

Read the rest of this blog entry...

Packaging extension steps for Calabash

In my previous blog post, I introduced how to develop an extension XProc step in Java, for the Calabash processor. Even though writing such an extension is quite easy when you know what to do, the configuration part for the final user is quite tricky. That complexity could be a serious argument for a potential user to give up even before he/she is able to run an example using your extension step. See the previous blog entry for details, but basically the user has to configure the classpath for Calabash with your JAR and all its dependencies, point to your config file when launching Calabash, and import your library into the main pipeline (after having decided where to install your extension step).

At the end of the previous post, I introduced the idea of having such extension steps, written in Java for Calabash, supported out of the box by the repository implementing the Packaging System. I played a little bit with the idea and came up with the following design (and implementation). Of course you still have to provide the same information (the step interface, its implementation, and the link between its type and the class implementing it), but the goal is to enable the author to do it once and for all, so the user can simply use the following commands to install the package and run a pipeline using it:

Read the rest of this blog entry...

Writing an extension step for Calabash, to use BaseX

Writing an extension for Calabash in Java involves three different things: 1/ the Java class itself, which has to implement the interface XProcStep, 2/ binding a step name to the implementation class, and 3/ declaring the step in XProc.

Let's take, as an example, a step evaluating a query using the standalone BaseX processor. The goal is not to have a fully functional step, nor to have a best-quality-ever step with error reporting and such, but rather to emphasize how to glue all the things together. The step has one input port, named source, and one output port, named result. The step gets the string value of the input port (typically a c:query element) and evaluates it as an XQuery, using BaseX. The result is parsed as an XML document and sent to the output port (it is a parse error if the result of the query is not an XML document or element). Let's start with the Java class implementing the extension step:

Read the rest of this blog entry...

EXPath Packaging System: the on-disk repository layout

While working on the implementation for Calabash of the EXPath Packaging System, I was rewriting, again, a repository manager, dedicated to Calabash. Exactly as I did for Saxon one month earlier. Why? The repositories provide the same features. It should then be possible to make Calabash and Saxon share the same repository, if Saxon just ignores components other than XSLT and XQuery (for instance XProc pipelines) in that repository. So one just has to maintain one single repository for his/her whole computer (or one repository dedicated to a single project, like a Java EE application).

Going further, I think the layout of such an on-disk repository should be part of the packaging specification itself. An implementation does not have to use such a standard repository, but if it does, it doesn't have to worry about package installation, repository management software, or even about the resolving mechanism between a component URI and the actual file with that component. One repository layout, one set of software for all those tasks.

Read the rest of this blog entry...

EXPath Packaging System prototype implementation for Calabash

An interesting piece of code I worked on during the past few weeks is the implementation of the EXPath Packaging System for Norman Walsh's XProc processor: Calabash. It was interesting in itself, as a coding experience, but also for the still-in-development packaging system, as XProc brings all core XML technologies together within a single language. Thus implementing the packaging system for Calabash meant implementing it for: RNC, RNG, Schematron, XProc (for XProc pipelines themselves), XQuery, XSD and XSLT. This was enlightening about the relationships between those technologies, and a proof of concept for the applicability of the packaging concept to each of them.

Unfortunately, Calabash does not provide any way for the user to finely configure the underlying processors (for instance Saxon for XSLT, Jing for RNG, etc.). So I first needed to add this feature to Calabash itself. Instead of plugging the EXPath stuff directly into the Calabash code base, I decided to add only a simple API for an external user to plug configuration code into Calabash. I hope Norm will agree to integrate such changes into Calabash, so the packaging support can be written entirely outside of the Calabash code base at first (and maybe included in Calabash later). In the meantime, you can just use an alternative JAR file for Calabash, including my changes. It is based on the latest Subversion revision, so this is really beta stuff, and some classes have also been disabled due to dependency issues. You can also have a look at the following email on XProc Dev with explanations on how to patch the Calabash code base.

Read the rest of this blog entry...

EXPath Packaging System prototype implementation for Saxon

After having released a first implementation of the EXPath Packaging System for eXist, here is a version for Saxon. You can read this previous blog entry to get more information on the packaging system; in particular, it says: "The concept is quite simple: defining a package format to enable users to install libraries in their processor with just a few clicks, and to enable library authors to provide a single package to be installed on every processor, without the need to document (and maintain) the installation process for each of them."

The package manager for Saxon is a graphical application (a textual front-end will be provided soon), and is provided as a single JAR file. Go to the implementations page, or use the following direct link to get the JAR. Run it as usual, for instance by double-clicking on it or by executing the command java -jar expath-pkg-saxon-0.1.jar. That will launch the package manager window.

Read the rest of this blog entry...

EXPath Packaging System prototype implementation for eXist

During the past few weeks, I have been working on the Packaging System for EXPath (see also this blog entry for more info). The concept is quite simple: defining a package format to enable users to install libraries in their processor with just a few clicks, and to enable library authors to provide a single package to be installed on every processor, without the need to document (and maintain) the installation process for each of them.

Of course, this system should be supported right in the processors themselves, as it is intimately related to the way each processor manages its queries and/or stylesheets. But to convince vendors, we first have to show something that does work, and to show that users are actually interested. So I have written a prototype implementation for eXist (as well as one for Saxon, but it still needs some cosmetic work before it can be released).

Read the rest of this blog entry...

eXist extension functions in Java

While investigating opportunities for an implementation of EXPath Packaging for eXist, I am looking at the way to implement extension functions in Java. I detail here the key points I found for writing Java extensions for eXist using NetBeans. But this information should be generic enough to be used with any Java IDE.

If you want to provide an extension function written in Java within eXist, you actually have to provide a full module in Java. You cannot provide the module part in XQuery and part in Java (though you can provide an XQuery module that imports a private module written in Java.)

Read the rest of this blog entry...

Divide and Conquer, or XPath, XSLT, XQuery and XProc packaging

Packaging is nothing in itself. It is always related to something else (a language, a technology, a framework...). Packaging is just a means to ease sharing and delivering something in the scope of that "something else." The several files in an ODF document are packaged in a single ZIP file, with a pre-defined structure, to make it possible for an application to use its content. The important point is not the structure in itself, but rather the information it gathers.

I have followed some very interesting discussions about X* packaging during the last few weeks, with very interesting people. I quickly saw that everyone was talking about slightly (or not so slightly) different things. The most important point where people have different views, IMHO, is the scope of packaging.

Read the rest of this blog entry...

SOA Design Patterns and Web Service Contract Design & Versioning for SOA

A few weeks ago, I received the final, paper version of the book "SOA Design Patterns" that I contributed to. I was used to the drafts, and I am glad to say the final layout is really nice. Same for the previous book, "Web Service Contract Design & Versioning for SOA." More info at http://soapatterns.com/ and http://soabooks.com/.

Read the rest of this blog entry...

XSLStyle and oXygen

On almost every XSLT project I have worked on, I used Ken Holman's XSLStyle. It enables one to document each stylesheet component (template, function, variable, module, etc.) using an XML vocabulary (DocBook and DITA are supported out of the box). For instance, the following excerpt shows how to document a simple named template with a parameter, assuming the vocabulary has been set to DocBook:

   <para>Create a paragraph with a greetings message.</para>
   <doc:param name="who">
      <para>The name of the person to address the greetings to.</para>
   </doc:param>
<xsl:template name="greetings">
   <xsl:param name="who" as="xs:string"/>
      <xsl:value-of select="concat('Hello, ', $who, '!')"/>
</xsl:template>

Read the rest of this blog entry...

FXSL currying and nestable sequences

After an interesting discussion on the FXSL Help forum, the problem of currying and nested sequences showed up again. The FXSL project provides, among other things, first-class citizen functions. Basically, it represents a function as an element. When executing such a function, the dispatching to the code is done by applying templates on that element.

An interesting feature of FXSL is the ability to curry parameters to a function, to create another function of a lesser order. The principle is to attach parameters to the function. This new function can then be used as any other function, with the specified parameters bound to the specified values.
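To make the idea concrete outside of XSLT, here is a minimal sketch of currying in Java. It is not FXSL code, and the names CurryDemo and curry are made up for this illustration; it only shows how binding one parameter of a two-parameter function gives a new one-parameter function.

```java
import java.util.function.BiFunction;
import java.util.function.Function;

// Hypothetical illustration of currying; not part of FXSL.
class CurryDemo {

    // Turn a two-argument function into a chain of one-argument functions.
    static <A, B, R> Function<A, Function<B, R>> curry(BiFunction<A, B, R> f) {
        return a -> b -> f.apply(a, b);
    }

    public static void main(String[] args) {
        BiFunction<Integer, Integer, Integer> add = (x, y) -> x + y;

        // Binding the first parameter to 3 yields a function of a lesser order.
        Function<Integer, Integer> addThree = curry(add).apply(3);

        System.out.println(addThree.apply(4)); // prints 7
    }
}
```

The curried function can be passed around like any other function, which is the property the paragraph above describes for FXSL.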

Read the rest of this blog entry...

XProc with XSLT completion in oXygen

After having played a little bit with XProc, and having written a few simple XProc definitions with oXygen, I was tired of always checking the spelling of step names and of using copy & paste intensively. So I decided to add support for the XProc document type in oXygen.

Thanks to the XProc WG, which has published a schema as part of the current WD (and has done so in various schema languages), the first step was quite straightforward. Download the two RNC modules from the current WD, in appendix "D Pipeline Language Summary" (direct links: xproc.rnc and steps.rnc). While editing an XProc definition, click the Associate Schema... button (see the screenshot below). In the dialog box, choose RelaxNG Schema, choose the option Compact syntax and select the xproc.rnc file you have just downloaded. The only configuration to change is in Preferences / XML / XML Parser / RELAX NG: unselect the option Check ID/IDREF (thanks, George).

Read the rest of this blog entry...

Poor man's Calabash integration into oXygen

XML Calabash, the XProc processor from Norman Walsh, is becoming more mature from day to day. Here is a very simple (but also very limited) way to integrate it into the great oXygen XML IDE. Well, the word integrate is maybe too much for this simple trick, which just adds a button to the toolbar to execute the currently edited XProc definition file. But at least it will save you from switching between your IDE and a console.

You have to register Calabash as an external tool within oXygen. Go to Tools > External Tools > Preferences > New, and fill in the various fields. The point is to correctly set the working directory to ${cfd} and the command line to something like:

Read the rest of this blog entry...

Simple SVG chart generation with XSLT

This week, for my job, I have to create a report generator for a financial company. The reports must be in PDF, so I naturally decided to use XSL-FO. Among other things, the reports contain graphical charts with, you know, financial stuff. The client wants its developers to be able to generate JPEG files themselves, so for the charts I just have to include external graphic files.

But I was curious to see whether SVG was suited to this scenario. So this evening, after my working hours, I created a sample input file and started to learn a little bit about SVG. I was amazed at how quickly I got the result I wanted for a static SVG document.

Read the rest of this blog entry...

HTTP extension for Saxon

I have just finished a little extension function for Saxon, to be able to send HTTP requests from XSLT 2.0 (and get the result back). The idea is based on the SOAP extension from Andrew Welch, but is less restricted, as it can perform other HTTP requests (besides SOAP requests over HTTP).

The function takes two parameters: a URI and an element that describes the request (the payload, the headers, the HTTP method, etc.). The latter looks like:

Read the rest of this blog entry...

Emacs: favourite directories implementation

Today, I have finally taken a look at one of the simple features I always missed in Emacs: the ability to define a set of "favourite directories." That is, a set of named directories that one can use in the minibuffer when prompted for instance to open a file. Given a set of such dirs:

  emacs-src -> /enter/your/path/to/emacs/sources
  projects  -> /path/to/some/company/projects
  now       -> @projects/the/project/I/am/working/on
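The original feature is implemented in Emacs Lisp; the following is only a rough Java model of the lookup, to show how such names could be expanded. It assumes, based on the now entry above, that a leading @name refers to another favourite directory. The class and method names (FavouriteDirs, define, resolve) are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of "favourite directories"; not the Emacs implementation.
class FavouriteDirs {

    private final Map<String, String> dirs = new LinkedHashMap<>();

    // Register a named directory; the path may itself start with "@other-name".
    void define(String name, String path) {
        dirs.put(name, path);
    }

    // Expand a leading "@name" recursively. Plain paths are returned unchanged.
    // Note: this sketch does not detect cyclic definitions.
    String resolve(String path) {
        if (!path.startsWith("@")) {
            return path;
        }
        int slash = path.indexOf('/');
        String name = slash < 0 ? path.substring(1) : path.substring(1, slash);
        String rest = slash < 0 ? "" : path.substring(slash);
        String target = dirs.get(name);
        if (target == null) {
            throw new IllegalArgumentException("Unknown favourite directory: " + name);
        }
        return resolve(target) + rest;
    }

    public static void main(String[] args) {
        FavouriteDirs favs = new FavouriteDirs();
        favs.define("projects", "/path/to/some/company/projects");
        favs.define("now", "@projects/the/project/I/am/working/on");
        // prints /path/to/some/company/projects/the/project/I/am/working/on/src
        System.out.println(favs.resolve("@now/src"));
    }
}
```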

Read the rest of this blog entry...

XSLT stacktrace with Saxon 9

I have played a little bit with Saxon B's XSLT stack representation. When an error appears while evaluating a stylesheet, you can indeed catch the Java exception, and so get the Java stacktrace, but what is really interesting is the XSLT stacktrace. That is, where in the XSLT processing the error occurred. For instance "in the function X, called from the template Y, applied from the external application".

I didn't find comprehensive documentation on that subject, so I experimented a bit with what I got: an XPathException and from there an XPathContext. Please note that I used the new version 9.0.0.1 for that (I know there are a few differences between versions 8 and 9 in the area of interest here, but I didn't try to catalog them).

Read the rest of this blog entry...

Error handling extension for XSLT 2.0

I finally wrote a few words about the try/catch extension I wrote for Saxon a couple of months ago (see this entry and this one). You can find the project page there: http://fgeorges.org/xslt/error-safe/.

I also wrote a first draft of a specification for such an extension. There are some differences between this and the actual implementation for Saxon, but the spirit remains the same. It can be found there: http://fgeorges.org/xslt/error-safe/error-safe.html.

Read the rest of this blog entry...

Relax NG Compact schema for Ant build files

A few days ago, in this thread on the nXML mailing list, someone asked about a schema for Ant build files. Steinar Bang came up with a schema he generated from a sample Ant build file.

A specific Ant task generated a DTD from the content of this build file, then he transformed it to a Relax NG Compact schema. This is not a rigorous schema, but it is enough as a basis to edit Ant build files with nXML and get completion.

Read the rest of this blog entry...

XML Catalogs support in Saxon 8 for Java

Saxon doesn't support OASIS XML Catalogs natively. But adding this support yourself is quite easy. You need the XML Commons Resolver from Apache. Make sure the resolver.jar file you will find in the archive is in your classpath. Then you need to call Saxon with the following options:

Read the rest of this blog entry...

Creating namespace nodes in XSLT 1.0

This entry discusses the best way to add namespace nodes to an element in XSLT 1.0. The goal is to try to have the right namespace bindings in scope for content that uses them, such as attribute values storing XPath expressions. There is no way to guarantee it, but the code developed below is reasonable enough to work in most cases.

In addition to showing how to generate a namespace node in the result tree, this entry discusses specific problems and tricks when employed in a meta-stylesheet, and how to use the XSLT 2.0 instruction xsl:namespace when available.

Read the rest of this blog entry...

Gexslt: using the CVS version

This entry is a short comment describing how to download, set up and compile Gexslt from the CVS repository. Gexslt is an XSLT 2.0 processor written in Eiffel by Colin Adams, and is part of the Gobo project.

This entry is not comprehensive documentation. On the contrary, it is a short explanation for non-Eiffel programmers who want to use the latest version of Gexslt. The fewer steps described here, the better. I also describe my own setup; it is on Windows, with EiffelStudio.

Read the rest of this blog entry...

Try/catch in XSLT 2.0: first test cases, first problems

In addition to a comment and some advice on Try/catch in XSLT 2.0, Michael Kay gave me four interesting test cases today, in a post on the Saxon mailing list. The comment and advice say:

The design of this from the user perspective looks reasonable: it's better than my own attempt to do it solely using extension functions. I did it that way because I was targetting XQuery, but it's not ideal there either, if only because saxon:try() is not a true function.

The tricky part in doing try/catch properly, however, is the semantics. You need to make sure, as far as possible, that (a) you don't catch errors in expressions that are written outside the try but lazily evaluated within it, and (b) that you do catch errors in expressions that are written inside the try but lazily evaluated outside it. This involves both compile-time work, to suppress rewrites that move expressions into or out of the try block, and run-time work, to suppress lazy evaluation (or to make sure that lazily-evaluated expressions carry their catch block with them)

Read the rest of this blog entry...

Try/catch in XSLT 2.0

I have experimented a little bit with Saxon B 8.8.0.4j to implement a try/catch instruction in XSLT. The overall idea is simple: an element error-safe (the name is maybe not very nice, but I have not found anything better yet) contains an element try followed by a sequence of one or more catch elements. Both try and catch contain any sequence constructor. catch has an optional @errors attribute that is a space-separated list of QNames, representing error names:

  <catch errors="err:ERRNAME">

Read the rest of this blog entry...

Shell script to launch Saxon

Here is the little script I use to start Saxon from the command line. With the help of two or three environment variables, it can be used as your day-to-day XSLT transformer. But since it has options to deal with the class path, it can also be used in Makefiles (or whatever) in projects that use extensions or even modified versions of Saxon.

I think the doc at the beginning of the script is clear enough. Unfortunately, I can't translate it to a Windows batch script as I don't know that language. The script is below and can be downloaded here.

Read the rest of this blog entry...

Type-preserving copy in XSLT 2.0

A few months ago, I finally had a look at FXSL. This is a project that provides functions as first-class objects. That opens up some very interesting possibilities, including a more functional programming style.

An interesting feature is the ability to curry parameters to a function, to create another function of a lesser order. The principle is to attach parameters to the function. This new function can then be used as any other function, with the specified parameters bound to the specified values.

Read the rest of this blog entry...

Add a namespace node to an element in XQuery

David Carlisle just sent me the way to add a namespace node to an element in XQuery. Here is his example:

Read the rest of this blog entry...

Translate SAX events to a DOM tree

I had to pass the XML document provided by one piece of software to another piece of software. The first one provides the document as SAX events. But the second one expects a DOM Document. So here is a SAX-events-to-DOM-Document translator:

Read the rest of this blog entry...
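The translator itself is in the full entry. As a rough idea of what such a translator can look like (this is not the original code), here is a minimal sketch using only the JDK's SAX and DOM APIs. The class name SaxToDom and the helper parse are made up for this example. It only handles elements, attributes and text; comments, processing instructions and namespace declaration attributes are ignored.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: a SAX handler that builds a DOM Document.
class SaxToDom extends DefaultHandler {

    private final Document doc;
    // Stack of open nodes; the top is the parent of whatever comes next.
    private final Deque<Node> stack = new ArrayDeque<>();

    SaxToDom() throws ParserConfigurationException {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        doc = dbf.newDocumentBuilder().newDocument();
        stack.push(doc);
    }

    Document getDocument() {
        return doc;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // SAX reports "no namespace" as an empty string, DOM expects null.
        Element e = doc.createElementNS(uri.isEmpty() ? null : uri, qName);
        for (int i = 0; i < atts.getLength(); i++) {
            String attUri = atts.getURI(i);
            e.setAttributeNS(attUri.isEmpty() ? null : attUri, atts.getQName(i), atts.getValue(i));
        }
        stack.peek().appendChild(e);
        stack.push(e);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        stack.pop();
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        stack.peek().appendChild(doc.createTextNode(new String(ch, start, length)));
    }

    // Convenience for the demo: feed the handler from a namespace-aware SAX parser.
    static Document parse(String xml) {
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true);
            SaxToDom handler = new SaxToDom();
            spf.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
            return handler.getDocument();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Cannot translate SAX events to DOM", e);
        }
    }

    public static void main(String[] args) {
        Document d = parse("<greeting lang='en'>Hello, <b>world</b>!</greeting>");
        Element root = d.getDocumentElement();
        // prints: greeting / en / Hello, world!
        System.out.println(root.getTagName() + " / " + root.getAttribute("lang")
                + " / " + root.getTextContent());
    }
}
```

In a real setup the handler would be passed to whatever component emits the SAX events, instead of to a parser as in the demo.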