An Experiment In HTML Typesetting, Part II
A while ago I wrote about my dissatisfaction with HTML typesetting and an experiment I was undertaking to see if the process could be made less painful. Fast-forward several months and I'm halfway there, having developed a machine-friendly grammar for MA notation and an encapsulating file format to go with it.
Recall that the overriding concern was ease of production; input files in this hypothetical system are always going to be written by hand. As such, the goal was to add just enough structure that a computer could do the basic job of translating the file into an abstract syntax tree. I believe that I've accomplished this goal, but the resulting file format is something of a Frankenstein's monster. I chose XML to provide the overall structure of the document since it's easy for both computers and humans to read and is relatively lightweight. This framework encapsulates a bunch of plain-text strings plus two types of specialized blocks, one of which uses a simple key-value scheme and the other of which uses a full-blown formal grammar. Processing this file results in an AST which can then be used as input for arbitrary typesetting scripts.
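To give a feel for how little machinery the key-value blocks need, here's a rough Perl sketch that parses one into a hash. The syntax it accepts (one `key: value` pair per line, with blank lines and `#` comments skipped) and the helper name `parse_kv_block` are assumptions made for illustration, not the actual AMAN block syntax.

```perl
use strict;
use warnings;

# Hypothetical helper: parse the text of a key-value block into a hash ref.
# Assumed (illustrative) syntax: one "key: value" pair per line; blank lines
# and lines starting with '#' are skipped. Any other line is an error.
sub parse_kv_block {
    my ($text) = @_;
    my %pairs;
    my $line_no = 0;
    for my $line (split /\n/, $text) {
        $line_no++;
        next if $line =~ /^\s*(?:#.*)?$/;    # blank or comment line
        if ($line =~ /^\s*([\w-]+)\s*:\s*(.*?)\s*$/) {
            $pairs{$1} = $2;                 # trim whitespace around the value
        }
        else {
            die "Malformed key-value line $line_no: '$line'\n";
        }
    }
    return \%pairs;
}

# Usage with a made-up block
my $block = <<'END';
# metadata for a hypothetical document
title:  An Example
author: Someone
END

my $meta = parse_kv_block($block);
# Prints "author = Someone" then "title = An Example"
print "$_ = $meta->{$_}\n" for sort keys %$meta;
```

Dying on a malformed line, rather than silently skipping it, keeps typos in hand-written input from slipping through unnoticed.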
The processing bits are implemented as a few Perl modules, plus I've thrown in a sample input file and a wrapper script to dump the resulting AST. Here are the goods:
- create_ast.pl: Wrapper script.
- example.xml: Sample input file.
- AMAN::DocumentParser: Module for parsing the overall XML document. Uses XML::Parser with the 'Subs' style.
- AMAN::NotationParser: Module for parsing the notation blocks. Turns out that Parse::RecDescent is super easy to use once you get the hang of it; it made writing the parser a piece of cake.
- AMAN::Objects: A bunch of data-encapsulation classes built using Class::Struct.
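For anyone who hasn't used Class::Struct, here's a small sketch of the kind of data-encapsulation classes it generates. The class names `Hypothetical::Text` and `Hypothetical::Block`, their fields, and the `dump_node` walker are invented for illustration; the real AMAN::Objects classes may be organized quite differently.

```perl
use strict;
use warnings;
use Class::Struct;

# Hypothetical AST node classes (illustrative only, not AMAN::Objects).
# A leaf node holding a run of plain text.
struct('Hypothetical::Text' => { content => '$' });
# An interior node: a kind label, key-value attributes, and child nodes.
struct('Hypothetical::Block' => {
    kind     => '$',
    attrs    => '%',
    children => '@',
});

# Build a tiny tree by hand.
my $doc = Hypothetical::Block->new(
    kind     => 'document',
    attrs    => { title => 'An Example' },
    children => [
        Hypothetical::Text->new(content => 'Some plain text.'),
        Hypothetical::Block->new(kind => 'notation', children => []),
    ],
);

# Walk the tree depth-first and print an indented outline.
sub dump_node {
    my ($node, $depth) = @_;
    my $pad = '  ' x $depth;
    if ($node->isa('Hypothetical::Text')) {
        print "${pad}text: ", $node->content, "\n";
        return;
    }
    print "${pad}block: ", $node->kind, "\n";
    dump_node($_, $depth + 1) for @{ $node->children };
}

dump_node($doc, 0);
```

Each `struct` call generates a `new` constructor plus one accessor per field. Array fields like `children` return an array reference when called with no arguments, which is what the tree walk relies on.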
The script can be invoked as create_ast.pl example.xml, which will dump the AST to STDOUT.
Stay tuned for Part III where I'll turn the AST into actual HTML.