Wednesday, February 22, 2012

An Experiment In HTML Typesetting, Part II

A while ago I wrote about my dissatisfaction with HTML typesetting and an experiment I was undertaking to see if the process could be made less painful. Fast-forward several months and I'm halfway there, having developed a machine-friendly grammar for MA notation and an encapsulating file format to go with it.

Recall that the overriding concern was ease of production; input files in this hypothetical system are always going to be written by hand. As such, the goal was to add just enough structure that a computer could do the basic job of translating the file into an abstract syntax tree (AST). I believe that I've accomplished this goal, but the resulting file format is something of a Frankenstein's monster. I chose XML to provide the overall structure of the document, since it's easy for both computers and humans to read and is relatively lightweight. This framework encapsulates a bunch of plain-text strings plus two types of specialized blocks, one of which uses a simple key-value scheme and the other of which uses a full-blown LR grammar. Processing this file results in an AST, which can then be used as input for arbitrary typesetting scripts.
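To make that layering a bit more concrete, here is a minimal sketch of the idea in Python. It is not the Perl implementation from this post: the element names (`doc`, `meta`, `para`), the `key = value` line syntax, and the tuple-based AST shape are all assumptions invented for illustration, and the LR-grammar block is left out entirely.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: the element names <doc>, <meta>, and <para> are
# invented for this sketch and are not the post's actual file format.
SAMPLE = """\
<doc>
  <meta>
    title = An Example
    author = someone
  </meta>
  <para>Plain text passes through untouched.</para>
</doc>
"""


def parse_key_values(text):
    """Parse 'key = value' lines into a dict, skipping blank lines."""
    pairs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected 'key = value', got: {line!r}")
        pairs[key.strip()] = value.strip()
    return pairs


def build_ast(xml_source):
    """Turn the XML wrapper into a list of (node_type, payload) tuples."""
    root = ET.fromstring(xml_source)
    ast = []
    for child in root:
        body = child.text or ""
        if child.tag == "meta":
            # Specialized block: hand the body to the key-value parser.
            ast.append(("meta", parse_key_values(body)))
        else:
            # Everything else is treated as a plain-text string.
            ast.append((child.tag, body.strip()))
    return ast


if __name__ == "__main__":
    for node in build_ast(SAMPLE):
        print(node)
```

The point of the sketch is only the division of labor: the XML layer finds the blocks, and each specialized block gets its own small parser whose output is hung on the tree.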

The processing bits are implemented as a few Perl modules, plus I've thrown in a sample input file and a wrapper script to dump the resulting AST. Here are the goods:

The script can be invoked as create_ast.pl example.xml, which will dump the AST to STDOUT.

Stay tuned for Part III where I'll turn the AST into actual HTML.

