An Experiment In HTML Typesetting, Part I
Observation: Typesetting in HTML is painful.
HTML is not TeX; it's cumbersome to do anything more than generate blocks of (relatively) uniform text. Those of you who read my martial arts blog are aware that I've been developing, in conjunction with another blogger by the name of Scav, notation for recording martial arts forms/techniques. One of our secondary goals has been to restrict the notation so that it's capable of being rendered using HTML. That has, so far, been mostly successful; I've been able to typeset some fairly complicated material using HTML. That process, however, has been a labor of love... there's a lot of hand-tweaking necessary to make things look good. I, personally, would benefit from some sort of lightweight, meta-HTML framework which allowed me to focus more on transcription and less on abusing CSS until it cries "uncle". Additionally, there's a lot to be said for separating meaning from representation; it would be great if the same underlying source could be used to produce both bottom-to-top and left-to-right variants. Right now meaning and representation are tightly coupled; I take a .csv file and run it through a small script that does some simple substitution and outputs the result as an HTML table. Not fancy and, as I said, I usually have to do a lot of tweaking afterwards.
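For readers who want a concrete picture, the CSV-to-table step described above might look roughly like the following Python sketch. It is not the actual script: the function name `csv_to_html_table` and the abbreviations in `SUBSTITUTIONS` are made up for illustration, and the real notation's symbols will differ.

```python
import csv
import html
import io

# Hypothetical abbreviation table; the real notation's symbols are not shown here.
SUBSTITUTIONS = {"LP": "low punch", "HB": "high block"}


def csv_to_html_table(csv_text, substitutions=SUBSTITUTIONS):
    """Turn CSV text into an HTML table, expanding known abbreviations."""
    lines = ["<table>"]
    for row in csv.reader(io.StringIO(csv_text)):
        cells = []
        for cell in row:
            text = cell.strip()
            # Simple substitution: replace the cell if it is a known abbreviation.
            text = substitutions.get(text, text)
            cells.append("<td>{}</td>".format(html.escape(text)))
        lines.append("  <tr>{}</tr>".format("".join(cells)))
    lines.append("</table>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(csv_to_html_table("LP,HB\nHB,LP\n"))
```

Note how the layout (one CSV row becomes one table row) is baked into the script; that is exactly the coupling between meaning and representation described above.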
So... having made that observation, where to now? The first thing I want to do is see if I can come up with a generic input format that allows me to produce transcriptions quickly without worrying so much about the nuts and bolts of how it's going to be displayed in HTML.
Reviewing the material I've produced to date I find that it typically has the following structure:
- Some introductory commentary
- A list of blocks/kicks/strikes
- A list of targets
- One or more blocks consisting of the following:
  - A heading
  - Some commentary
  - A notation block
  - Some numbered notes
What format to use to encapsulate the above? XML is a natural candidate, but recall that I want to be able to produce transcriptions quickly. XML is a pain in the butt to type by hand and is also "chatty"; the ratio of markup to actual information can be pretty high relative to alternatives (such as my .csv files). Right now I'm leaning towards a bastard amalgam of XML and CSV; make use of XML for describing the gross structure, but keep the notation blocks in CSV.
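To make the hybrid idea concrete, here is a minimal Python sketch of what reading such a file could look like. Everything about the format is an assumption made for illustration: the element names (`transcription`, `intro`, `section`, `heading`, `commentary`, `notation`, `note`), the sample content, and the function `parse_transcription` are all hypothetical, since no format has been settled on yet.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical input: XML for the gross structure, CSV inside <notation>.
SAMPLE = """\
<transcription>
  <intro>Some introductory commentary.</intro>
  <section>
    <heading>First sequence</heading>
    <commentary>Some commentary.</commentary>
    <notation>
step,left,right
1,block,strike
2,,kick
    </notation>
    <note>A numbered note.</note>
  </section>
</transcription>
"""


def parse_transcription(xml_text):
    """Parse the XML skeleton and hand each notation block to the csv module."""
    root = ET.fromstring(xml_text)
    sections = []
    for section in root.findall("section"):
        raw = (section.findtext("notation") or "").strip()
        sections.append({
            "heading": section.findtext("heading", default=""),
            "commentary": section.findtext("commentary", default=""),
            # Each notation block becomes a list of rows (lists of strings).
            "notation": list(csv.reader(io.StringIO(raw))),
            "notes": [note.text or "" for note in section.findall("note")],
        })
    return {"intro": root.findtext("intro", default=""), "sections": sections}


if __name__ == "__main__":
    doc = parse_transcription(SAMPLE)
    for row in doc["sections"][0]["notation"]:
        print(row)
```

One caveat with this approach: any `<` or `&` inside a notation block would need to be escaped or wrapped in a CDATA section, since the CSV text still lives inside an XML document.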
There's also the question of simplifying the typesetting of the notation itself. I've a computer science background, so when I think about separating meaning from representation the first thing that comes to mind is an abstract syntax tree. What I'm really looking for is a system that will take a concise, easily-typed input file and turn it into an AST which can then be fed to a rendering engine which will produce the desired HTML. This, in turn, implies the existence of a parser and a well-defined, though perhaps simple, language for describing the desired notation.
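As a rough sketch of that pipeline (parser, then AST, then interchangeable renderers), consider the Python below. The input language is invented purely for illustration: one step per line, moves within a step separated by `;`, and `#` for comments. The `Form` and `Step` classes and both render functions are likewise hypothetical, and the real notation will certainly need a richer tree.

```python
from dataclasses import dataclass, field
from html import escape
from typing import List


@dataclass
class Step:
    number: int
    moves: List[str]


@dataclass
class Form:
    steps: List[Step] = field(default_factory=list)


def parse(source: str) -> Form:
    """Parse the made-up input language into a small tree (Form -> Step -> moves)."""
    form = Form()
    for line in source.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        moves = [m.strip() for m in line.split(";") if m.strip()]
        form.steps.append(Step(number=len(form.steps) + 1, moves=moves))
    return form


def _cell(step: Step) -> str:
    return "<td>{}</td>".format("<br>".join(escape(m) for m in step.moves))


def render_left_to_right(form: Form) -> str:
    """One table row; each step is a column, read left to right."""
    return "<table><tr>{}</tr></table>".format("".join(_cell(s) for s in form.steps))


def render_bottom_to_top(form: Form) -> str:
    """One table row per step; the first step is the bottom row."""
    rows = "".join("<tr>{}</tr>".format(_cell(s)) for s in reversed(form.steps))
    return "<table>{}</table>".format(rows)


if __name__ == "__main__":
    form = parse("high block; low kick\n# a comment\npunch\n")
    print(render_left_to_right(form))
    print(render_bottom_to_top(form))
```

The point of the design is that both renderers consume the same `Form`; the input file says nothing about direction, so adding a third layout means writing another render function rather than touching the source.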
So I'm going to go off into a corner now and see what I can come up with; updates as events warrant.