Guide to Using m4 For Web Design

This documentation describes a deprecated process. New pages (and newly migrated pages) should be built with the Template Toolkit process. These pages are retained only for maintaining the existing m4-templated pages.

Web Design Using the m4 Preprocessor

Motivation

In looking over the PCS HTML document library, it occurred to me that much of the PCS-generated HTML code, on a line-by-line basis, was boilerplate that varied little if at all between files. This boilerplate also contained most of the complicated or elaborate HTML in these documents, e.g., the formatting of the logo and page title. The rest of a typical document is mainly long, plain text, with perhaps some tables.

In the old way of doing things, we simply copied a template file and inserted the desired text (or copied a similarly built page, deleted the half or so of it that was not boilerplate, and inserted our own text). Even if we used the most current version of the template each time we added a new file (a questionable assumption), any later tweak to the template affected only files created after the tweak, so the look-and-feel of the pages slowly diverged as they aged.

Looking at this situation, it occurred to me that a preprocessor would let us write input files that it turns into HTML files. These inputs would be almost entirely content: 90% or more of their lines would carry real content rather than boilerplate. For example, at the time of this writing, the preprocessor input for this directory's index.html consists of 13 lines: 2 of boilerplate and 4 blank, leaving 7 lines of real content (about 54%). Because index files tend to be small, they typically have the worst ratio of real content to boilerplate, in either the preprocessor or the HTML version. The HTML file generated from that input was about 96 lines, 16 of them blank, and still held only the same 7 lines of real content (less than 8%).

Another advantage is that these input files can be trivially reprocessed at a later date if the boilerplate changes.

The m4 preprocessor

The m4 preprocessor is a fast and powerful preprocessor engine, if not always the friendliest to write macros for. You should run

info m4

on a Unix system to obtain full documentation. Basically, m4 reads its input and copies it to the output, expanding macros as it goes. A macro name is a sequence of letters, digits, and underscores that does not start with a digit. Macros can take arguments, which are enclosed in parentheses. GNU m4 recognizes certain built-in macros only when they are given arguments: although define is a built-in macro, the bare word define is treated as literal text unless it is immediately followed by an opening parenthesis.
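To make the naming rule concrete, here is a small Python model (not m4 itself) of which words m4 treats as macro names, and of the rule that some built-ins count as calls only when an opening parenthesis follows. The set NEEDS_ARGS is an illustrative subset chosen for this sketch; the GNU m4 manual says which built-ins are recognized only with parameters.

```python
import re

# m4 name rule: letters, digits and underscores, not starting with a digit.
NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

# Illustrative subset of GNU m4 built-ins that are recognized only with
# parameters (an assumption of this sketch, not a complete list).
NEEDS_ARGS = {"define", "ifdef", "include"}


def is_m4_name(word: str) -> bool:
    """True if word is a valid m4 macro name."""
    return NAME_RE.fullmatch(word) is not None


def builtin_calls(text: str) -> list[str]:
    """Return the NEEDS_ARGS names in text that m4 would treat as calls."""
    calls = []
    for m in NAME_RE.finditer(text):
        name = m.group(0)
        # A whole name token counts as a call only if "(" follows directly.
        if name in NEEDS_ARGS and text[m.end():m.end() + 1] == "(":
            calls.append(name)
    return calls
```

With this model, builtin_calls("we define terms here") finds nothing, while builtin_calls("define(`x', `y')") reports one call to define.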

m4 macros are words and are not expanded when embedded inside a longer word. Thus if I defined the macro bed to expand to cot, the phrase Embedded occurrences are not expanded,bed - ton would expand to Embedded occurrences are not expanded,cot - ton. The bed in Embedded is not expanded because it is part of a longer word, but the second occurrence, which stands alone as a word, is expanded.
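The following toy Python model reproduces that example. It scans the text for complete name tokens (letters, digits, and underscores, not starting with a digit) and replaces only those that appear in a plain dict of definitions. It deliberately ignores quoting, arguments, and m4's rescanning of expanded text, so treat it as an illustration of the word-boundary rule only.

```python
import re

# A complete m4 name token; a name inside a longer word is never matched
# on its own because the regex consumes the whole longer word first.
NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def expand_words(text: str, macros: dict[str, str]) -> str:
    """Replace whole name tokens that are defined in macros; leave the rest."""
    return NAME_RE.sub(lambda m: macros.get(m.group(0), m.group(0)), text)


print(expand_words("Embedded occurrences are not expanded,bed - ton", {"bed": "cot"}))
# prints: Embedded occurrences are not expanded,cot - ton
```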

Whitespace is generally not treated specially by the preprocessor, which means the newline after a macro definition is copied to the output file, which often is not desirable. A common workaround is the dnl built-in, which stands for "delete to newline": it discards everything after it, up to and including the next newline, so none of that text reaches the output. This also makes dnl useful for starting comments in m4 files.
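As a rough illustration, this Python function mimics what dnl does to m4 input: a whole-word dnl and everything after it, through the next newline, disappear. It is only a text-level sketch. Real m4 applies dnl while tokenizing, so quoted text such as `dnl' is left alone, which this model ignores.

```python
import re

# Whole-word "dnl" (not part of a longer name), the rest of its line,
# and the newline itself (if there is one).
DNL_RE = re.compile(r"(?<![A-Za-z0-9_])dnl(?![A-Za-z0-9_])[^\n]*\n?")


def apply_dnl(text: str) -> str:
    """Delete each dnl and everything after it up to and including the newline."""
    return DNL_RE.sub("", text)
```

For example, apply_dnl("dnl a comment\nkept\n") returns "kept\n", which is why a definition line ending in dnl leaves no blank line in the generated HTML.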

Quoting in m4 is a bit weird: you use the backquote character (`) to start the quoted text and the single quote character (') to end it. This makes some things hard to express, mainly Unix commands that use either of those characters, but there are helper macros for that, or you can just use the &#39; or &#96; notation for single and back quotes, respectively. Quotes can be nested, and it is very difficult (if not impossible) to quote a quote character itself; for HTML purposes we have macros that generate &#NN; codes, which display as we want and mean nothing to m4.
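If you produce page text with another tool before pasting it into an .m4 file, a small Python helper like the one below (hypothetical; it is not one of the m4 macros) can replace m4's two quote characters with the &#NN; character references described above, which browsers display as quotes and m4 ignores.

```python
def html_char_ref(ch: str) -> str:
    """Decimal HTML character reference for one character, e.g. "'" -> "&#39;"."""
    return f"&#{ord(ch)};"


def protect_quotes(text: str) -> str:
    """Replace m4's quote characters (` and ') with character references."""
    # The references contain neither quote character, so the order is safe.
    return text.replace("`", html_char_ref("`")).replace("'", html_char_ref("'"))
```

For instance, protect_quotes("it's `ls'") gives "it&#39;s &#96;ls&#39;".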

Using m4 in physics: the directory structure

The setup in the PCS/PNCE web pages is to have an m4 directory and a Makefile in each directory of the document tree. The Makefile generally does not vary, so it is typically just a symlink to the one in the parent directory. It takes care of invoking the preprocessor correctly: after you create or edit a file in the m4 directory, go up to the main directory and run make or make FILENAME.html, and the m4 input file you modified will be used to create the correspondingly named HTML file. (Plain make recreates every HTML file that is older than its corresponding .m4 file.)
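The rebuild rule that plain make applies can be sketched in Python as follows. This is not the actual Makefile; it assumes the layout described here (NAME.html in a page directory, its source in m4/NAME.m4) and, like make, regenerates an HTML file only when it is missing or strictly older than its source. The timestamp logic is kept in a pure function so it can be checked without touching the disk.

```python
from pathlib import Path


def stale_pages(m4_mtimes: dict[str, float], html_mtimes: dict[str, float]) -> list[str]:
    """Names of NAME.html files to regenerate: missing, or older than m4/NAME.m4."""
    stale = []
    for stem, src_time in sorted(m4_mtimes.items()):
        html_time = html_mtimes.get(stem)
        if html_time is None or html_time < src_time:
            stale.append(stem + ".html")
    return stale


def collect_mtimes(page_dir: Path) -> tuple[dict[str, float], dict[str, float]]:
    """Modification times of m4/*.m4 and *.html in one page directory, keyed by stem."""
    m4 = {p.stem: p.stat().st_mtime for p in (page_dir / "m4").glob("*.m4")}
    html = {p.stem: p.stat().st_mtime for p in page_dir.glob("*.html")}
    return m4, html
```

Run from a page directory, stale_pages(*collect_mtimes(Path("."))) lists the pages a plain make would regenerate.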

The Makefile automatically loads some m4 libraries that define basic macros for the boilerplate, as well as macros for simple tags. These are located in basic_macros.m4 in the m4_stuff subdirectory at the top of the PNCE document tree.

The m4 directory is normally set up with permissions that keep it hidden from the web server. It should basically mirror the files in the parent directory, but with .m4 rather than .html extensions. You can even have an RCS directory to do revision control of the m4 files. There should not be any other directories beneath the m4 directory.
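A hypothetical Python checker for this layout might compare file names on both sides and flag stray subdirectories. The function names and report wording are inventions for this sketch; the rules checked are the ones in the paragraph above (every NAME.html mirrored by m4/NAME.m4, and no directory under m4 other than RCS). Hand-written pages that were never converted to m4 will also show up in the report.

```python
from pathlib import Path


def mirror_report(m4_stems: set[str], html_stems: set[str], m4_subdirs: set[str]) -> list[str]:
    """List deviations from the expected m4 directory layout."""
    report = [f"{s}.html has no m4/{s}.m4" for s in sorted(html_stems - m4_stems)]
    report += [f"m4/{s}.m4 has no {s}.html yet" for s in sorted(m4_stems - html_stems)]
    report += [f"unexpected directory m4/{d}" for d in sorted(m4_subdirs - {"RCS"})]
    return report


def scan(page_dir: Path) -> list[str]:
    """Apply mirror_report to the files actually present in page_dir."""
    m4_dir = page_dir / "m4"
    subdirs = {p.name for p in m4_dir.iterdir() if p.is_dir()} if m4_dir.is_dir() else set()
    return mirror_report(
        {p.stem for p in m4_dir.glob("*.m4")},
        {p.stem for p in page_dir.glob("*.html")},
        subdirs,
    )
```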

Using m4 in physics: the basics

The whole point of this exercise is to have macros to encapsulate the boilerplate for your web page. Basically, you just include the macro

standard_pnce_header(TITLE)

at the top of your document. TITLE is the title of the document: the browser uses it as the window title, and it is also the text displayed next to the PNCE logo. If you want a different window title and logo text, give two arguments, separated by a comma, to the standard_pnce_header macro: first the window title, then the logo text. You may need to quote one or both of these, especially if they contain a comma. (Remember: use a backtick for the opening quote and a single quote for the closing quote.)
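Why a comma needs quoting follows from how m4 collects arguments. The toy Python model below (an approximation, not m4's actual parser) splits the text between the parentheses the way m4 does in simple cases: unquoted commas at the top level separate arguments, one level of `...' quoting is removed, and leading unquoted whitespace is dropped. Macro expansion inside arguments is ignored. With it, an unquoted title containing a comma turns into two arguments, which standard_pnce_header would read as a window title plus a separate logo text.

```python
def split_m4_args(arglist: str) -> list[str]:
    """Split the text between a macro's parentheses into m4-style arguments."""
    args: list[str] = []
    current: list[str] = []
    quote_depth = 0
    paren_depth = 0
    at_start = True  # still skipping leading whitespace of this argument
    for ch in arglist:
        if at_start and ch in " \t\n":
            continue
        at_start = False
        if ch == "`":
            quote_depth += 1
            if quote_depth == 1:
                continue  # drop the outermost opening quote
        elif ch == "'" and quote_depth > 0:
            quote_depth -= 1
            if quote_depth == 0:
                continue  # drop the outermost closing quote
        elif quote_depth == 0:
            if ch == "(":
                paren_depth += 1
            elif ch == ")":
                paren_depth -= 1
            elif ch == "," and paren_depth == 0:
                args.append("".join(current))  # top-level comma ends an argument
                current = []
                at_start = True
                continue
        current.append(ch)
    args.append("".join(current))
    return args
```

Using a made-up title, split_m4_args("Physics Computing, Services") yields two arguments, ["Physics Computing", "Services"], while split_m4_args("`Physics Computing, Services'") yields the single argument "Physics Computing, Services".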

The above generates everything from the opening <html> tag through the meta tags, the background image, the PNCE logo, the text next to the logo, and the horizontal rule under the logo and text.

You now enter all your document-specific text. You can put basically whatever you want here, using raw HTML or the m4 macros written to handle some tags, as you prefer. The biggest problems arise if you want to use single quotes or backquotes (see the special macros for these below) or if you accidentally use a word that is an m4 macro name. The latter should be extremely rare: the built-in macros generally require arguments and are not expanded without them, and the macros I created all contain underscores and are otherwise uncommon in written language.

Then add one of

standard_pnce_footer

w3c_valid_pnce_footer

w3c_test_pnce_footer

at the bottom of your text. Each of these generates everything from the closing horizontal rule through all the contact information and the date the m4 preprocessor ran on the file, to the closing </html> tag. The w3c_valid version should be used on pages accessible without a password; it adds a short statement declaring that the page contains valid HTML and CSS code, plus some small images linking to the validators so the page can be retested. Of course, this version should only be used on pages that actually are valid, so make the corrections needed for validity first. Until the page is valid, you can use the w3c_test version, which links to the validators for testing without claiming validity. The validator links will not work properly with password-protected pages, so for these it is best to use the plain standard_pnce_footer macro, which omits both the claims and the test images.

That's it!

Using m4 in physics: the macros

The basic_macros.m4 file is fairly well documented and is the definitive reference for the macros. This is just a brief list of some other macros that might prove handy.

print_squote: This prints a single quote, using &#39;
print_backtick: This prints a backtick, using &#96;
squote_it: This prints its argument enclosed in single quotes
backtick_it: This prints its argument enclosed in backticks
code_it: This prints its argument enclosed in a code tag
var_it: This prints its argument enclosed in a var tag
em_it: This prints its argument enclosed in an em tag
strong_it: This prints its argument enclosed in a strong tag
h1_it: This prints its argument enclosed in an h1 tag. Versions exist for the other header levels, 1 through 6
link_it: This prints its second argument wrapped in an anchor (a) tag referencing the URL given in the first argument. The URL is wrapped in double quotes and used as the anchor tag's href attribute
name_it: This prints its second argument wrapped in an anchor (a) tag whose name attribute is set to the first argument
variable_code_it: Similar to the code_it macro, but uppercase letters are enclosed in var tags
codeblock_it: This sets up a non-wrapping block containing its argument wrapped in a code tag
variable_codeblock_it: Similar to the codeblock_it macro, but uppercase letters are enclosed in var tags
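For readers more comfortable with a general-purpose language, the Python functions below approximate what a few of these macros are described as producing. They are not the definitions in basic_macros.m4 (consult that file for those). In particular, treating each run of consecutive uppercase letters as one var element in variable_code_it, and leaving the text un-escaped, are assumptions of this sketch.

```python
import re


def wrap(tag: str, text: str) -> str:
    """Enclose text in an opening and closing tag."""
    return f"<{tag}>{text}</{tag}>"


def code_it(text: str) -> str:
    return wrap("code", text)


def h_it(level: int, text: str) -> str:
    """Header tag for levels 1-6 (a stand-in for h1_it and its siblings)."""
    if not 1 <= level <= 6:
        raise ValueError("header level must be 1-6")
    return wrap(f"h{level}", text)


def link_it(url: str, text: str) -> str:
    """Anchor whose href is the double-quoted URL."""
    return f'<a href="{url}">{text}</a>'


def name_it(name: str, text: str) -> str:
    """Anchor whose name attribute is set to name."""
    return f'<a name="{name}">{text}</a>'


def variable_code_it(text: str) -> str:
    """Like code_it, but each run of uppercase letters goes in a var tag."""
    return code_it(re.sub(r"[A-Z]+", lambda m: wrap("var", m.group(0)), text))
```

For example, variable_code_it("make FILENAME.html") gives <code>make <var>FILENAME</var>.html</code>, the kind of markup used above for placeholders such as FILENAME and TITLE.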