2006-01-25
Improvements to HsColour
After my last post about HsColour, Neil stepped up to the plate and implemented a CSS backend, to complement the existing HTML-3.2 and ANSI terminal-code outputs. More recently, Haddock, the Haskell library auto-documentation tool, has gained extensions that allow the generated documentation to link to a wiki for user comments, to a bug tracker for bug reports, and to the original source code for reference.
This is an obvious place for HsColour to fit in. Why display just raw source code on the web, when you can colourise it as an aid to readability? It also creates a further requirement. Ideally, Haddock should link to each individual function definition, not just to the top of the module that contains it, so the HTML generated by HsColour needs to embed an anchor tag at each definition. Then references of the form page.html#anchor will land directly on the right definition.
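As a rough illustration, here is one way such an anchor could be emitted. The names `escapeAttr` and `withAnchor` are hypothetical, not part of HsColour; the sketch simply prepends an HTML-3.2-style `<a name=...>` tag to the already-colourised text of a definition, escaping characters that cannot appear raw inside an attribute value (which matters for operator names such as `<+>`).

```haskell
-- Hypothetical helpers for illustration, not HsColour's actual API.

-- Escape the characters that cannot appear literally in an HTML attribute value.
escapeAttr :: String -> String
escapeAttr = concatMap esc
  where
    esc '<' = "&lt;"
    esc '>' = "&gt;"
    esc '&' = "&amp;"
    esc '"' = "&quot;"
    esc c   = [c]

-- Prepend a named anchor to the colourised HTML for one definition,
-- so that a link such as Module.html#foo can jump straight to it.
withAnchor :: String -> String -> String
withAnchor name html = "<a name=\"" ++ escapeAttr name ++ "\"></a>" ++ html
```

For example, `withAnchor "foo" "foo x = x"` yields `<a name="foo"></a>foo x = x`. A link pointing at an operator's anchor would probably also need its fragment percent-encoded in the URL; that detail is left out of this sketch.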
But there is a difficulty with extending HsColour to do this. Colourisation is a simple lexical problem, whereas finding the defining occurrence of a function identifier (or datatype, or class) really requires a parser. Simple top-level definitions have the function identifier starting in the leftmost column, but what about (for instance) an infix-style definition, with an arbitrarily deep pattern to the left of the identifier you are trying to find?
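To make the difficulty concrete, here are three ordinary Haskell definitions (the names are made up for illustration). Only the first has its defined name in the leftmost column; in the other two, the name being defined (`orElse`, `|+|`) comes after a pattern, and that pattern could in principle be nested arbitrarily deeply.

```haskell
-- Prefix form: the defined name starts in the leftmost column.
firstJust :: [Maybe a] -> Maybe a
firstJust (Just x : _) = Just x
firstJust (_ : rest)   = firstJust rest
firstJust []           = Nothing

-- Backquoted infix form: the defined name 'orElse' follows a pattern.
orElse :: Maybe a -> Maybe a -> Maybe a
Just x  `orElse` _ = Just x
Nothing `orElse` y = y

-- Operator form: the defined name '|+|' follows a nested pattern.
infixl 6 |+|
(|+|) :: Maybe (Int, Int) -> Maybe (Int, Int) -> Maybe (Int, Int)
Just (a, b) |+| Just (c, d) = Just (a + c, b + d)
_           |+| _           = Nothing
```

A purely lexical rule such as "the first identifier on the line" would pick out `Just` and `Nothing` here instead of the names actually being defined.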
It turns out that one can in fact write a finite state automaton to find the defining occurrences. It is rather like a complex lexer, but I reckon it is still more lightweight than either writing a full parser or stealing one from elsewhere. I have an initial design, and I'm aiming for fewer than 150 lines of extra code. Work in progress is available in the darcs repository.
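The automaton itself is not spelled out here, so what follows is only a much-simplified sketch of the general idea, not the design in the darcs repository. It assumes a hypothetical `Token` type produced by an earlier lexing pass, with one top-level declaration per token list and whitespace and comments already removed. A small state machine walks the left-hand side: an operator or backquoted name at bracket depth zero marks an infix definition; otherwise the leading identifier is the defined name. Strictly speaking, the bracket-depth counter makes this more than a pure finite automaton. The sketch also ignores contexts on class or data declarations, pattern bindings, and extensions such as bang patterns.

```haskell
import Data.Char (isUpper)
import Data.List (foldl')

-- Hypothetical token type: assumed output of an earlier lexing pass
-- over a single top-level declaration (no whitespace or comments).
data Token
  = Varid String       -- lower-case identifier: foo
  | Conid String       -- upper-case identifier: Just
  | Varsym String      -- non-reserved operator symbol: ++, |+|, :
  | Backquoted String  -- identifier in backquotes: `orElse`
  | Keyword String     -- reserved word: data, class, instance, ...
  | Open               -- ( [ or {
  | Close              -- ) ] or }
  | Equals             -- the '=' of a binding
  | Guard              -- the '|' that introduces a guard
  | DoubleColon        -- the '::' of a type signature
  | Other String       -- anything else: literals, commas, _, @, ~
  deriving (Eq, Show)

-- States of the recogniser.
data State
  = Start                  -- nothing consumed yet
  | TypeDecl               -- after data/newtype/type/class: want the type name
  | AfterOpen              -- declaration began with '('
  | PrefixOp String        -- seen "(op": expecting ')'
  | Lhs Int (Maybe String) -- scanning the LHS: bracket depth, leading name if any
  | Found String           -- accepting: the defined name
  | Reject                 -- not a declaration this sketch can name
  deriving (Eq, Show)

-- Operators starting with ':' are constructors, never defined by a binding.
isVarOp :: String -> Bool
isVarOp (':' : _) = False
isVarOp _         = True

step :: State -> Token -> State
step Start (Keyword k)
  | k `elem` ["data", "newtype", "type", "class"] = TypeDecl
  | otherwise                                     = Reject
step Start (Varid v)    = Lhs 0 (Just v)          -- prefix form: f x y = ...
step Start Open         = AfterOpen
step Start t            = step (Lhs 0 Nothing) t  -- e.g. starts with a constructor
step TypeDecl (Conid c) = Found c
step TypeDecl _         = TypeDecl
step AfterOpen (Varsym s)
  | isVarOp s           = PrefixOp s              -- (<+>) a b = ...
step AfterOpen t        = step (Lhs 1 Nothing) t  -- (x : xs) ++ ys = ...
step (PrefixOp s) Close = Lhs 0 (Just s)
step (PrefixOp _) _     = Reject
step (Lhs d c) tok = case tok of
  Open                                          -> Lhs (d + 1) c
  Close                                         -> Lhs (d - 1) c
  Varsym s | d == 0, isVarOp s                  -> Found s  -- infix operator
  Backquoted v@(h : _) | d == 0, not (isUpper h) -> Found v -- infix `name`
  Equals | d == 0                               -> maybe Reject Found c
  Guard | d == 0                                -> maybe Reject Found c
  DoubleColon | d == 0                          -> Reject   -- type signature
  _                                             -> Lhs d c
step done _ = done                                -- Found and Reject are final

-- Run the machine over one declaration's tokens.
definedName :: [Token] -> Maybe String
definedName toks = case foldl' step Start toks of
  Found name -> Just name
  _          -> Nothing
```

Fed the tokens of `(x : xs) +++ ys = ...`, the machine skips the bracketed pattern and stops at `+++`, while a type signature such as `foo :: Int` is rejected because `::` appears before any `=`.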
Meanwhile, the Programatica project has solved the same problem very nicely. Their Hs2Html tool is integrated with the entire compiler front-end, so they have not only a parser but also a full module-import resolution phase. This means they can generate HTML with cross-links from every /use/ of an identifier to its definition, even across source files. Extremely useful, and very navigable. The downside is that it is a rather heavyweight mechanism: the tool needs to have every module available (including the Prelude) or it can't finish.