In looking over the PCS HTML document library, it occurred to me that much of the PCS-generated HTML code, on a line-by-line basis, was boilerplate that varied little if at all between files. This boilerplate also contained most of the complicated or elaborate HTML in these documents, for example the code that formats the logo and page title. The rest of each document is mainly long, boring text, with perhaps a few tables.
In the old way of doing things, we simply copied a template file and inserted the desired text (or copied another page built in a similar fashion, deleted the half or so of it that was not boilerplate, and inserted our text). Even if we used the most current version of the template each time we added a new file (a questionable assumption), any little tweak to the template affected only files created after the tweak, so the look-and-feel slowly diverged as pages aged.
Looking at this situation, it occurred to me that a preprocessor would let us write input files that the preprocessor turns into HTML files. These inputs would be almost entirely content: 90% or more of the lines would be real content, not boilerplate. For example, at the time of this writing, the preprocessor input for the index.html file in this directory consists of 13 lines: 2 of boilerplate, 4 blank, and about 7 of real content (about 54%). Because index files tend to be small, they typically have the worst ratio of real content to boilerplate (in either the preprocessor or the HTML version). The HTML file generated from that input was about 96 lines, 16 of them blank, and still held only the same 7 or so lines of real content (less than 8%).
Another advantage is that these input files can be reprocessed trivially at a later date if the boilerplate changes, so every page picks up the change.
The m4 preprocessor is a fast and powerful preprocessor engine, if not always the friendliest to write macros for. To read its full manual, run:
info m4
m4 macros are words, and are not expanded when embedded inside another word. Thus if I defined the macro bed to expand to cot, the phrase "Embedded occurrences are not expanded,bed - ton" would expand to "Embedded occurrences are not expanded,cot - ton". The bed in Embedded is not expanded because it does not start at a word boundary, but the second occurrence is.
Whitespace is generally not treated specially by the preprocessor, which means the newline after a macro definition is copied to the output file, which often is not desirable. A common workaround is the dnl command, which stands for "delete to newline": it discards everything after it, up to and including the next newline, so none of that text reaches the output. It is also useful for starting comments in m4.
Quoting in m4 is a bit weird: you use the backquote character (`) to start the quoted text, and the single quote character (') to end it. This makes some things hard to express, mainly Unix commands that use either of those characters, but there are some helper macros for that, or you can just use the &#39; or &#96; notation for single and back quote, respectively. Quotes can be nested, and it is very difficult (if not impossible) to quote a lone quote character. (For HTML purposes, we have macros to generate &#NN; codes, which display as we want and mean nothing to m4.)
The setup in the PCS/PNCE web pages is to have, in each directory of the document tree, an m4 directory and a Makefile. The Makefile generally does not vary, so it is typically just a symlink to the one in the parent directory. It takes care of invoking the preprocessor correctly: after you create or edit a file in the m4 directory, go up to the directory containing it and run make or make FILENAME.html, and the m4 input script you modified will be used to create a correspondingly named HTML file. (The former command actually recreates every HTML file that is older than its corresponding .m4 file.)
The Makefile and make command automatically pull in some m4 libraries defining basic commands for the boilerplate, as well as some commands for simple tags. These are defined in the basic_macros.m4 file in the m4_stuff subdirectory at the top of the PNCE document tree.
The m4 directory is normally set up with permissions that keep it invisible to the web server. It should basically mirror the files in the parent directory, but with .m4 rather than .html extensions. You can even have an RCS directory inside it for revision control of the m4 files. There should not be any other directories beneath the m4 directory.
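Setting up a new directory along these lines might look like the sketch below. It is hypothetical: new_doc_dir is an invented helper, and it assumes the web server reads files as a user covered only by the "other" permission bits, so clearing those bits hides m4/ from it. How your server is actually configured decides whether that assumption holds.

```shell
#!/bin/sh
# Hypothetical helper: create a document directory with an m4/ source
# directory and a Makefile symlinked to the parent's. Assumes the web
# server only gets "other" permissions on these files.
new_doc_dir() {
    mkdir -p "$1/m4" || return 1
    ln -s ../Makefile "$1/Makefile"   # share the parent directory's Makefile
    chmod o-rwx "$1/m4"               # hide the .m4 sources from the web server
}

# Demo in a scratch tree standing in for the top of the document tree.
cd "$(mktemp -d)" || exit 1
touch Makefile
new_doc_dir manuals
ls -ld manuals/m4 manuals/Makefile
```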
The whole point of this exercise is to have macros that encapsulate the boilerplate for your web page. Basically, you just start the file with the macro standard_pnce_header(TITLE), where TITLE is the title of the document (used by the browser as the title of the browser window, and as the text next to the PNCE logo). If you want a different window title and text next to the logo, give standard_pnce_header two arguments separated by a comma: first the window title, then the text. You may need to quote one or both of these, especially if they contain a comma. (Remember: use a backquote for the opening quote and a single quote for the closing quote.)
The above generates everything from the initial <html> tag through the metatags, the background image, the PNCE logo, the text next to the logo, and the horizontal rule under the logo and text.
You now enter all your document-specific text. You can put whatever you want in here, using raw HTML or the m4 macros written to handle some tags, as you prefer. The biggest problems arise if you want to use single or back quotes (see the special macros for these below) or if you accidentally use a word that is an m4 macro name. This should be extremely rare: most built-in macros are not expanded unless they are given the arguments they require (dnl is a notable exception), and the ones I created all contain underscores and are otherwise uncommon in written language.
Then end the file with one of:
standard_pnce_footer
w3c_valid_pnce_footer
w3c_test_pnce_footer
The w3c_valid version should be used on pages accessible without a password; it adds a small statement declaring that the page contains valid HTML and CSS, plus some small images with links to retest the page. Of course, it should only be used on pages that actually are VALID, and making the corrections needed to make a page valid is part of the previous suggestion. You can use the w3c_test version until the page actually is valid: it links to the validators for testing without claiming validity. The validator links will not work properly with password-protected pages, so for these it is best to use the plain old standard_pnce_footer macro, which omits the claims and test images.
That's it!
The basic_macros.m4 file is fairly well documented, and it is the definitive commentary on the macros. This is just a brief listing of some other macros that might prove handy.