2005-10-12

 

unifying Haskell2Xml and Xml2Haskell

So, there I was, hacking on Blobs (see below), and I realised that the file storage format for these diagrams is XML, so naturally Arjan has used the HaXml library for writing/reading these files. Great. But wait, what do I see? Even though all the internal data structures to be stored are defined in Haskell, he hasn't used the Haskell2Xml class, which can be handily derived for you by DrIFT. No, he uses the low-level generic XML parser/printer, and then hand-converts from the generic tree into and from his own data structures. Why ever did he do that? And then the first thing hit me. The Haskell2Xml methods return a "Maybe a" type. For production software, that simply isn't good enough. If the input fails to parse, then the developer is jolly well going to want an explanation of why not. The low-level generic parser gives back decent error messages if it finds a problem. But the high-level typeful parser does not. And then the second thing hit me. The Haskell2Xml method is indeed a parser. It parses the generic tree into a user-defined typeful tree. Yet it isn't written as one. Even though the low-level parser is constructed with nice combinators, Haskell2Xml doesn't use them, but rolls its own ad hoc solution.

The realisation of what I had to do was swift and immediately convincing. I had to replace HaXml's Haskell2Xml class with a true parser combinator framework. Not only that, but there is a corresponding class Xml2Haskell, derived not by DrIFT, but by the Dtd2Haskell tool, which uses an XML DTD as the source of the data structures. And yes, that too is really a parser, but was not written as one. So yes, that too needed to be rewritten.

In fact, come to think of it, why are there two classes anyway? Surely if you want to parse some XML to a typeful representation, then it is the same job, no matter where the types were originally defined. OK, so the actual instances of the class will be different depending on whether you use DrIFT or Dtd2Haskell to write them, but so what? In fact, won't there be circumstances in which you want to mix code of different origins - to dump a Haskell-derived data structure into the middle of a document with an open DTD, or vice versa? Having two separate classes would be unpleasant and unnecessary.

And thus the tale started. HaXml-2 will shortly (I hope) see the light of day. The original three kinds of parsing techniques will be reduced to one - a new set of combinators - and the parsers written with them will layer on top of one another. The generic tree parser is on the bottom, and the typeful tree parser on top, but both now using the same set of monadic combinators. It is so much more aesthetically pleasing. Not to mention giving far better error messages than before, so more practically useful too.

P.S. The Haskell Xml Toolbox from Uwe Schmidt got a new release today. Apparently it has Arrows now. Must look into that.

Comments: Post a Comment

<< Home

This page is powered by Blogger. Isn't yours?