Shiny Ideas: An Experiment In HTML Typesetting, Part II

A while ago I wrote about my dissatisfaction with HTML typesetting and an experiment I was undertaking to see if the process could be made less painful. Fast-forward several months and I'm halfway there, having developed a machine-friendly grammar for MA notation and an encapsulating file format to go with it.

Recall that the overriding concern was ease of production; input files in this hypothetical system are always going to be written by hand. As such, the goal was to add just enough structure so that a computer could do the basic job of translating the file into an abstract syntax tree (AST). I believe that I've accomplished this goal, but the resulting file format is something of a Frankenstein's monster. I chose XML to provide the overall structure of the document since it's easy for both computers and humans to read and is relatively lightweight. This framework encapsulates a bunch of plain-text strings plus two types of specialized blocks, one of which uses a simple key-value scheme and the other of which uses a full-blown LR grammar. Processing this file results in an AST, which can then be used as input for arbitrary typesetting scripts.
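To make the shape of that pipeline a bit more concrete, here's a minimal sketch of the idea. It's written in Python rather than Perl, and it is not the code from the actual modules. Everything specific in it is an assumption made up for illustration: the tag names (document, meta, text, notation), the "key: value" line format, and the toy notation syntax, none of which are taken from example.xml or the real grammar. It only shows the general approach of an XML envelope holding plain text, a key-value block, and a grammar-driven block, all folded into one AST.

```python
import pprint
import re
import xml.etree.ElementTree as ET

# Hypothetical input: the tag names and the toy notation syntax are
# invented for this sketch and do not come from the real example.xml.
SAMPLE = """
<document>
  <meta>
    title: Sample Form
    author: Anonymous
  </meta>
  <text>Start in a neutral stance.</text>
  <notation>step(left) ; punch(right, high)</notation>
</document>
"""

PUNCT = ("(", ")", ";", ",")
TOKEN = re.compile(r"\s*([A-Za-z_]\w*|[();,])")


def parse_key_values(block):
    """Parse 'key: value' lines into a dict (the simple key-value block)."""
    pairs = {}
    for line in block.strip().splitlines():
        key, _, value = line.partition(":")
        pairs[key.strip()] = value.strip()
    return pairs


def tokenize(source):
    """Split a notation string into names and punctuation tokens."""
    tokens, pos = [], 0
    source = source.strip()
    while pos < len(source):
        match = TOKEN.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_notation(source):
    """Toy grammar: sequence := call (';' call)*, call := NAME '(' NAME (',' NAME)* ')'."""
    tokens = tokenize(source)
    pos = 0

    def expect(value):
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != value:
            raise ValueError(f"expected {value!r} at token {pos}")
        pos += 1

    def name():
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] in PUNCT:
            raise ValueError(f"expected a name at token {pos}")
        pos += 1
        return tokens[pos - 1]

    def call():
        move = name()
        expect("(")
        args = [name()]
        while pos < len(tokens) and tokens[pos] == ",":
            expect(",")
            args.append(name())
        expect(")")
        return {"type": "call", "name": move, "args": args}

    calls = [call()]
    while pos < len(tokens) and tokens[pos] == ";":
        expect(";")
        calls.append(call())
    if pos != len(tokens):
        raise ValueError(f"trailing input at token {pos}")
    return calls


def build_ast(xml_text):
    """Walk the XML envelope and hand each block to the matching sub-parser."""
    root = ET.fromstring(xml_text.strip())
    ast = []
    for child in root:
        body = child.text or ""
        if child.tag == "meta":
            ast.append({"type": "meta", "fields": parse_key_values(body)})
        elif child.tag == "notation":
            ast.append({"type": "notation", "calls": parse_notation(body)})
        else:
            ast.append({"type": "text", "value": body.strip()})
    return ast


if __name__ == "__main__":
    pprint.pprint(build_ast(SAMPLE))
```

The useful property here is that the XML layer never needs to know anything about the notation itself: it just routes each block's raw text to whichever sub-parser owns it, so the grammar can change without touching the document format.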

The processing bits are implemented as a few Perl modules, plus I've thrown in a sample input file and a wrapper script to dump the resulting AST. Here are the goods:

create_ast.pl: Wrapper script.
example.xml: Sample input file.
AMAN::DocumentParser: Module for parsing the overall XML document. Uses XML::Parser with the 'Subs' style.
AMAN::NotationParser: Module for parsing the notation blocks. Turns out that Parse::RecDescent is super easy to use once you get the hang of it; made writing the parser a piece of cake.
AMAN::Objects: A bunch of data-encapsulation classes built using Class::Struct.

The script can be invoked as create_ast.pl example.xml, which will dump the AST to STDOUT.

Stay tuned for Part III where I'll turn the AST into actual HTML.

Wednesday, February 22, 2012
