An Experiment in HTML Typesetting, Part III
The last time I wrote about this I'd completed a parser that could take a (mostly) XML document containing martial arts notation and turn it into an abstract syntax tree (AST). I've now completed the second half of the process, turning the AST into HTML markup, and am generally happy with the result. There's lots of room for improvement, but rather than let the perfect be the enemy of the good I figure it's time to release the code to the wild.
The Interesting Bits
An atom of Analytic Martial Arts Notation (AMAN) can consist of as many as 3 symbols, which may need to be written horizontally, vertically, or diagonally. So I chose to use 3x3 tables as the basis for laying out everything. A basic 3x3 grid can be filled with 1, 2, or 3 symbols:
The use of tables in combination with text-centering ensures that all symbols are aligned vertically, horizontally, or at 45° from each other regardless of font size. I couldn't figure out any way to do that using pure CSS; if anyone has any insight into how that might be done in a reasonably straightforward manner I'd love to hear it.
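To make the grid idea concrete, here is a minimal sketch in Python. It is an illustration only and not the Perl code from the project: the function names, the inline cell style, and the placeholder symbols "A", "B", and "C" are all assumptions of mine.

```python
from html import escape

# Assumed styling: centering each cell is what keeps symbols aligned
# with each other, whatever the font size.
CELL_STYLE = "text-align: center; vertical-align: middle;"


def make_grid(symbols):
    """Return a 3x3 grid (list of rows) with symbols at (row, col) positions."""
    grid = [["" for _ in range(3)] for _ in range(3)]
    for (row, col), symbol in symbols.items():
        grid[row][col] = symbol
    return grid


def grid_to_html(grid):
    """Render a grid as an HTML table with every cell centered."""
    rows = []
    for row in grid:
        cells = []
        for cell in row:
            # Empty cells get a non-breaking space so they still take up room.
            content = escape(cell) if cell else "&nbsp;"
            cells.append('<td style="' + CELL_STYLE + '">' + content + "</td>")
        rows.append("<tr>" + "".join(cells) + "</tr>")
    return "<table>" + "".join(rows) + "</table>"


# Three placeholder symbols on the diagonal, i.e. at 45 degrees to each other.
diagonal = make_grid({(0, 0): "A", (1, 1): "B", (2, 2): "C"})
html = grid_to_html(diagonal)
```

A horizontal or vertical atom is the same call with positions along one row or one column, for example `{(1, 0): "A", (1, 1): "B", (1, 2): "C"}`.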
After that it's just a matter of tightening up the line height and margins so that the individual symbols visually parse as a cluster:
Yielding the following once you take away the borders:
I think that looks pretty good considering the medium. These individual blocks are then joined side-by-side into 3xN tables, after which they are nested in one (or sometimes two) larger tables that establish the gross left/center/right structure. The rest is just standard HTML/CSS.
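The side-by-side joining step can be sketched as follows, again in Python purely for illustration. This is not the actual TableFormatter::anneal_tables routine. `join_side_by_side` is a hypothetical name, and grids are modeled as plain lists of rows rather than HTML::Table objects.

```python
def join_side_by_side(blocks):
    """Concatenate 3-row grids row by row, giving a single 3 x N grid."""
    if any(len(block) != 3 for block in blocks):
        raise ValueError("every block must have exactly 3 rows")
    joined = [[], [], []]
    for block in blocks:
        for r in range(3):
            joined[r].extend(block[r])
    return joined


# Two placeholder atoms: one written horizontally, one vertically.
horizontal = [["", "", ""], ["A", "B", "C"], ["", "", ""]]
vertical = [["", "D", ""], ["", "E", ""], ["", "", ""]]

joined = join_side_by_side([horizontal, vertical])
# joined is now a 3 x 6 grid; its middle row is ["A", "B", "C", "", "E", ""]
```

Because every block has the same three rows, joining is just row-wise concatenation, and the symbols keep their relative alignment inside the larger table.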
Room For Improvement
Things which I know need work, in case anyone gets inspired:
- Subclass HTML::Table: Joining two tables together (currently performed by TableFormatter::anneal_tables) should be implemented as a public method of a subclass of HTML::Table.
- Get rid of empty cells where possible: The typesetting process as currently implemented often results in table rows/columns which are completely empty. These, in turn, result in slightly irregular spacing when the document is rendered. Things would look a lot nicer if these rows/columns were eliminated. Removing empty rows/columns should be implemented in the same place as the table-joining subroutine.
- General CSS tinkering: Padding, margins, and line height need some fine-tuning in various places. The general use of CSS classes could probably be made more systematic and/or brought into line with best practices.
- Smarter two-column layout: Two-column layout works great for individual, long techniques. What I'd really like to see is some sort of automatic, two-column layout for short techniques where, rather than breaking up the notation block, the techniques themselves are laid out side-by-side. Right now I do that by hand by adding a few DIV elements in the appropriate places.
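As a starting point for the empty-cell item above, here is one way the pruning could work, sketched in Python rather than Perl. `strip_empty` is a hypothetical helper that does not exist in the current code. It treats a grid as a list of rows of strings and drops any row or column that contains only empty or whitespace-only cells.

```python
def strip_empty(grid):
    """Drop rows and columns whose cells are all empty or whitespace."""
    rows = [row for row in grid if any(cell.strip() for cell in row)]
    if not rows:
        return []
    keep = [
        col
        for col in range(len(rows[0]))
        if any(row[col].strip() for row in rows)
    ]
    return [[row[col] for col in keep] for row in rows]


# A 3 x 6 grid with one empty row and two empty columns (placeholder symbols).
grid = [
    ["", "", "", "", "D", ""],
    ["A", "B", "C", "", "E", ""],
    ["", "", "", "", "", ""],
]

compact = strip_empty(grid)
# compact == [["", "", "", "D"], ["A", "B", "C", "E"]]
```

Whether every empty row or column should really go is a judgment call: an empty column between two atoms also acts as spacing, so a real implementation might keep some of them.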
Here's the code:
- typeset.pl: Simple driver for invoking the parser and HTMLFormatter.
- AMAN modules, consisting of:
  - DocumentObjects.pm: Data encapsulation objects.
  - DocumentParser.pm: Parser for the XML document format.
  - HTMLFormatter.pm: Formats an AMAN document using HTML tables.
  - NotationParser.pm: Converts AMAN notation blocks into ASTs.
  - Palettes.pm: Pre-defined symbol palettes.
  - TableFormatter.pm: Typesets AMAN using the technique described above.