MP4H - A Macro Processor for HTML

Introduction

The mp4h software is a macro-processor specially designed to deal with HTML documents. It allows powerful programming constructs, with a syntax familiar to HTML authors.

This software is based on Meta-HTML, written by Brian J. Fox. Even though the two syntaxes look similar, the source code is completely different. Indeed, a subset of Meta-HTML was used as part of a more complex program, WML (Website Meta Language), written by Ralf S. Engelschall, which I have maintained since January 1999. For licensing reasons it was hard to hack Meta-HTML, so I decided to write my own macro-processor.

Instead of writing it from scratch, I preferred to build on another macro-processor engine. I chose GNU m4, written by René Seindal, because of its numerous advantages: it is stable, robust and very well documented. This version of mp4h is derived from GNU m4 version 1.4n, which is a development version.

The mp4h software is not an HTML editor; its sole purpose is to provide an easy way to define your own macros inside HTML documents. There is no plan to add functionality to automagically produce valid HTML documents. If you want to clean up your code or validate it, simply use a post-processor like tidy.

Command line options

Optional arguments are enclosed within square brackets. All option synonyms have a similar syntax, so when a long option accepts an argument, the short option does too.

The calling syntax is

mp4h [options] [filename [filename] ...]
Options are described below. If no filename is specified, or if its name is -, then characters are read from standard input.

Operation modes

--help                       display a help message and exit
--version                    output mp4h version information and exit
-E --fatal-warnings          stop execution after first warning
-Q --quiet --silent          suppress some warnings for builtins

Preprocessor features

-I --include=DIRECTORY       search this directory second for includes
-D --define=NAME[=VALUE]     enter NAME as having VALUE, or empty
-U --undefine=COMMAND        delete builtin COMMAND

Limits control

-H --hashsize=PRIME          set symbol lookup hash table size (default 509)
-L --nesting-limit=NUMBER    change artificial nesting limit (default 250)

Debugging

-d --debug=FLAGS             set debug level (no FLAGS implies `aeq')
-t --trace=NAME              trace NAME when it is defined
-l --arglength=NUMBER        restrict macro tracing size
-o --error-output=FILE       redirect debug and trace output

Flags are any of:
t    trace for all macro calls, not only debugging-on'ed
a    show actual arguments
e    show expansion
c    show before collect, after collect and after call
x    add a unique macro call id, useful with c flag
f    say current input file name
l    say current input line number
p    show results of path searches
i    show changes in input files
V    shorthand for all of the above flags

Description

The mp4h software is a macro-processor, which means that keywords are replaced by other text. This chapter describes all primitives. As mp4h has been specially designed for HTML documents, its syntax is very similar to HTML, with tags and attributes. One important feature has no equivalent in HTML: comments that run to the end of the line. All text following three semicolons is discarded up to the end of the line, like

;;;  This is a comment

Function Macros

The definition of new tags is the most common task provided by mp4h. As with HTML, macro names are case insensitive. In this documentation, only lowercase letters are used. There are two kinds of tags: simple and complex. A simple tag has the following form:

<name [attributes]>
whereas a complex tag looks like:
<name [attributes]>
body
</name>

  In the macro descriptions below, a slash indicates a complex tag, and the letter V indicates that attributes are read verbatim, without expansion (see the chapter on macro expansion for further details).

 
/ define-tag

This function lets you define your own tags. First argument is the command name. Replacement text is the function body.
Source
<define-tag foo>bar</define-tag>
<foo>
Output
bar
Although whitespace usually has little effect on HTML, it is important to note that

<define-tag foo>bar</define-tag>
and
<define-tag foo>
bar
</define-tag>
are not equivalent: the latter form contains two newlines that are not present in the former.

 
/ provide-tag

This command is similar to the previous one, except that no operation is performed if this command is already defined.

 
  let

Copy a function. This command is useful to save a macro definition before redefining it.
Source
<define-tag foo>one</define-tag>
<let bar foo>
<define-tag foo>two</define-tag>
<foo><bar>
Output
twoone

 
  undef

Delete a command definition.
Source
<define-tag foo>one</define-tag>
<undef foo>
<foo>
Output
<foo>

Variables

Variables are a special case of simple tags, because they do not accept attributes. In fact their use is different, because variables contain text whereas macros act like operators. A nice feature concerning variables is their manipulation as arrays. Indeed variables can be considered like newline separated lists, which will allow powerful manipulation functions as we will see below.

 
  set-var

This command sets variables.

 
 Vset-var-verbatim

As above but attributes are read verbatim.

 
  get-var

Show variable contents. If a numeric value within square brackets is appended to a variable name, it represents the index of an array. The first index of arrays is 0 by convention.
Source
<set-var version="0.10.1">
This is version <get-var version>
Output
This is version 0.10.1
Source
<set-var foo="0
1
2
3">
<get-var foo[2] foo[0] foo>
Output
200
1
2
3
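
The indexing rule can be modelled in a few lines of Python. This is only a sketch of the behaviour described above, not mp4h's implementation; the `variables` dictionary and the `get_var` helper are hypothetical names, and returning an empty string for an out-of-range index is an assumption.

```python
import re

# Hypothetical store: variable name -> text value.
variables = {"foo": "0\n1\n2\n3"}

def get_var(*names):
    """Concatenate the values of the named variables, honouring name[index]."""
    out = []
    for name in names:
        m = re.fullmatch(r"(.+)\[(\d+)\]", name)
        if m:
            # An index selects one line of the newline-separated value (0-based).
            lines = variables.get(m.group(1), "").split("\n")
            i = int(m.group(2))
            # Assumption: an out-of-range index yields an empty string.
            out.append(lines[i] if i < len(lines) else "")
        else:
            out.append(variables.get(name, ""))
    return "".join(out)

print(get_var("foo[2]", "foo[0]", "foo"))
```

The printed text is the same as the second Output block above: the three values are simply concatenated, which is why the first line reads 200.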

 
 Vget-var-once

As above but attributes are not expanded.
Source
<define-tag foo>0.10.1</define-tag>
<set-var version="<foo>">Here is version <get-var version>
<set-var-verbatim version="<foo>">Here is version <get-var version>
<set-var-verbatim version="<foo>">Here is version <get-var-once version>
Output
Here is version 0.10.1
Here is version 0.10.1
Here is version <foo>

 
  preserve

All variables are global; there is no variable or macro scope. For this reason a stack is used to preserve variables. When this command is invoked, the first argument is the name of a variable. The value of this variable is pushed on top of the stack and the variable is reset to an empty string.

 
  restore

This is the opposite: the first argument is a variable name, this variable is set to the value found at the top of the stack, and this value is popped.
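
The stack behaviour of preserve and restore can be pictured with the following Python sketch. It only models the description above; `variables`, `stack`, `preserve` and `restore` are hypothetical names, and the single shared stack is an assumption drawn from the wording of the text.

```python
variables = {"title": "Home"}   # hypothetical global variable table
stack = []                      # hypothetical save stack (assumed to be shared)

def preserve(name):
    # Push the current value, then reset the variable to an empty string.
    stack.append(variables.get(name, ""))
    variables[name] = ""

def restore(name):
    # Pop the most recently saved value back into the variable.
    variables[name] = stack.pop()

preserve("title")
variables["title"] = "Temporary"   # local use of the variable
restore("title")
print(variables["title"])          # Home
```

In this model, values come back in the reverse order of the preserve calls, so nested uses should restore their variables in the opposite order to the one in which they were preserved.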

 
  unset-var

Undefine variables.

 
  var-exists

Returns true when this variable exists.

 
  increment

Increment the variable whose name is the first argument. Default increment is one.

Source
<set-var i=10>
<get-var i>
<increment i><get-var i>
<increment i by="-3"><get-var i>
Output
10
11
8

 
  decrement

Decrement the variable whose name is the first argument. Default decrement is one.

Source
<set-var i=10>
<get-var i>
<decrement i><get-var i>
<decrement i by="3"><get-var i>
Output
10
9
6

 
  copy-var

Copy a variable into another.
Source
<set-var i=10>
<copy-var i j>
<get-var j>
Output
10

 
  defvar

If this variable is not defined or is defined to an empty string, then it is set to the second argument.
Source
<unset-var title>
<defvar title "Title"><get-var title>
<defvar title "New title"><get-var title>
Output
Title
Title

 
  symbol-info

Show information about a symbol. If it is a variable name, the word STRING is printed, followed by the number of lines contained in this variable. If it is a macro name, one of the following messages is printed: PRIM COMPLEX, PRIM TAG, USER COMPLEX or USER TAG.
Source
<set-var x="0\n1\n2\n3\n4">
<define-tag foo>bar</define-tag>
<define-tag bar endtag=required>;;;
quux</define-tag>
<symbol-info x>
<symbol-info symbol-info>
<symbol-info define-tag>
<symbol-info foo>
<symbol-info bar>
Output
STRING
5
STRING
1
STRING
1
STRING
4
USER COMPLEX

String Functions

 
  string-length

Prints the length of the string.
Source
<set-var foo="0
1
2
3">
<string-length <get-var foo>>
<set-var foo="0 1 2 3">
<set-var l=<string-length <get-var foo>>>
<get-var l>
Output
7


7

 
  downcase

Convert to lowercase letters.
Source
<downcase "Does it work?">
Output
does it work?

 
  upcase

Convert to uppercase letters.
Source
<upcase "Does it work?">
Output
DOES IT WORK?

 
  capitalize

Convert to a title, with a capital letter at the beginning of every word.
Source
<capitalize "Does it work?">
Output
Does It Work?

 
  substring

Extracts a substring from a string. The first argument is the original string; the second and third are the start and end indexes respectively. By convention the first character has index 0 and, as the example shows, the character at the end index is not included.
Source
<set-var foo="abcdefghijk">
<substring <get-var foo> 4>
<substring <get-var foo> 4 6>
Output
efghijk
ef
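
For readers used to Python, the index convention seen in this example is the same as that of a slice. The helper below is a hypothetical model written for illustration, not mp4h code.

```python
def substring(text, start, end=None):
    # 0-based start, exclusive end: the same convention as the example above.
    return text[start:end]

foo = "abcdefghijk"
print(substring(foo, 4))     # efghijk
print(substring(foo, 4, 6))  # ef
```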

 
  subst-in-string

Replace a regular expression in a string by a replacement text.
Source
<set-var foo="abcdefghijk">
<subst-in-string <get-var foo> "[c-e]">
<subst-in-string <get-var foo> "([c-e])" "\\1 ">
Output
abfghijk
abc d e fghijk

Source
<set-var foo="abcdefghijk\nabcdefghijk\nabcdefghijk">
<subst-in-string <get-var foo> ".$" "">
<subst-in-string <get-var foo> ".$" "" singleline=true>
Output
abcdefghij
abcdefghij
abcdefghij
abcdefghijk
abcdefghijk
abcdefghij
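
Judging from the output above, the singleline attribute decides whether $ matches at the end of every line or only at the end of the whole string. The Python sketch below reproduces both results with the standard `re` module; mapping the default behaviour to `re.MULTILINE` is an assumption made for illustration, not a statement about how mp4h is implemented.

```python
import re

foo = "abcdefghijk\nabcdefghijk\nabcdefghijk"

# Default in the example: "$" matches at the end of every line.
per_line = re.sub(r".$", "", foo, flags=re.MULTILINE)

# singleline=true in the example: "$" matches only at the end of the string.
whole = re.sub(r".$", "", foo)

print(per_line)
print(whole)
```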

 
  subst-in-var

Performs substitutions inside variable content.

 
  string-eq

Returns true if first two arguments are equal.
Source
1:<string-eq "aAbBcC" "aabbcc">
2:<string-eq "aAbBcC" "aAbBcC">
Output
1:
2:true

Source
1:<string-eq "aAbBcC" "aabbcc" caseless=true>
2:<string-eq "aAbBcC" "aAbBcC" caseless=true>
Output
1:true
2:true

 
  string-neq

Returns true if the first two arguments are not equal.
Source
1:<string-neq "aAbBcC" "aabbcc">
2:<string-neq "aAbBcC" "aAbBcC">
Output
1:true
2:

Source
1:<string-neq "aAbBcC" "aabbcc" caseless=true>
2:<string-neq "aAbBcC" "aAbBcC" caseless=true>
Output
1:
2:

 
  string-compare

Compares two strings and returns one of the values less, greater or equal depending on this comparison.
Source
1:<string-compare "aAbBcC" "aabbcc">
2:<string-compare "aAbBcC" "aAbBcC">
Output
1:less
2:equal

Source
1:<string-compare "aAbBcC" "aabbcc" caseless=true>
Output
1:equal

 
  match

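Matches a string against a regular expression. Without the action attribute it returns true when the string matches; the action attribute selects what is returned instead, as the example shows.
Source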
1:<match "abcdefghijk" "[c-e]+">
2:<match "abcdefghijk" "[c-e]+" action=extract>
3:<match "abcdefghijk" "[c-e]+" action=delete>
4:<match "abcdefghijk" "[c-e]+" action=startpos>
5:<match "abcdefghijk" "[c-e]+" action=endpos>
6:<match "abcdefghijk" "[c-e]+" action=length>
Output
1:true
2:cde
3:abfghijk
4:2
5:5
6:3
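
The six results above can be reproduced with a short Python model built on `re.search`. It is a sketch under assumptions: the name `report` for the default action, the empty string returned when nothing matches, and the fact that delete removes only the first match are all guesses made for this illustration.

```python
import re

def match(text, pattern, action="report"):
    m = re.search(pattern, text)
    if m is None:
        return ""                          # assumption: no match gives an empty string
    if action == "report":                 # hypothetical name for the default action
        return "true"
    if action == "extract":
        return m.group(0)                  # the matched text
    if action == "delete":
        return text[:m.start()] + text[m.end():]   # assumption: first match only
    if action == "startpos":
        return str(m.start())              # index of the first matched character
    if action == "endpos":
        return str(m.end())                # index just past the match
    if action == "length":
        return str(m.end() - m.start())
    raise ValueError("unknown action: " + action)

for action in ("report", "extract", "delete", "startpos", "endpos", "length"):
    print(match("abcdefghijk", "[c-e]+", action))
```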

 
  char-offsets

Prints an array containing the indexes where the character appears in the string.

Source
1:<char-offsets "abcdAbCdaBcD" a>
2:<char-offsets "abcdAbCdaBcD" a caseless=true>
Output
1:0
8
2:0
4
8

 
  set-regexp-syntax

This command controls which regular expressions are used in the macros described above. There are only two possible values: basic and extended. The former are basic regular expressions and the latter are extended regular expressions. By default extended regular expressions are used.
Source
<set-var foo="abcdefghijk">
<set-regexp-syntax type=basic>
<subst-in-string <get-var foo> "([c-e]+)" ":\\1:">
<subst-in-string <get-var foo> "\\([c-e]\\{1,\\}\\)" ":\\1:">
<set-regexp-syntax type=extended>
<subst-in-string <get-var foo> "([c-e]+)" ":\\1:">
<subst-in-string <get-var foo> "\\([c-e]\\{1,\\}\\)" ":\\1:">
Output
abcdefghijk
ab:cde:fghijk

ab:cde:fghijk
abcdefghijk

 
  get-regexp-syntax

Prints the current regexp type.
Source
<get-regexp-syntax>
Output
extended

Arrays

With mp4h one can easily deal with string arrays. Variables can be treated as a single value or as a newline separated list of strings. Thus after defining

<set-var digits="0
1
2
3">
one can view its content or one of these values:
Source
<get-var digits>
<get-var digits[2]>
Output
0
1
2
3
2

 
  array-size

Returns the size of an array, which is the number of lines present in the variable.
Source
<array-size digits>
Output
4

 
  array-append

Add a value (or more if this value contains newlines) at the end of an array.
Source
<array-append "10\n11\n12" digits>
<get-var digits>
Output
0
1
2
3
10
11
12

 
  array-add-unique

Add a value at the end of an array if this value is not already present in this variable.
Source
<array-add-unique 2 digits>
<get-var digits>
Output
0
1
2
3
10
11
12

 
  array-concat

Concatenates all arrays into the first one.
Source
<set-var foo="foo">
<set-var bar="bar">
<array-concat foo bar><get-var foo>
Output
foo
bar

 
  array-member

If the value is contained in the array, returns its index; otherwise returns -1.
Source
<array-member 11 digits>
Output
5

 
  array-shift

Shifts an array. If offset is negative, indexes below 0 are lost. If offset is positive, first indexes are filled with empty strings.
Source
<array-shift 2 digits>
Now: <get-var digits>
<array-shift -4 digits>
And: <get-var digits>
Output
Now: 

0
1
2
3
10
11
12

And: 2
3
10
11
12
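
The two shifts in this example can be modelled on a Python list, one element per line of the variable. This is a hypothetical helper illustrating the rule stated above, not mp4h's implementation.

```python
def array_shift(offset, lines):
    if offset >= 0:
        # Positive offset: the first indexes are filled with empty strings.
        return [""] * offset + lines
    # Negative offset: entries pushed below index 0 are lost.
    return lines[-offset:]

digits = ["0", "1", "2", "3", "10", "11", "12"]
digits = array_shift(2, digits)
print(digits)   # ['', '', '0', '1', '2', '3', '10', '11', '12']
digits = array_shift(-4, digits)
print(digits)   # ['2', '3', '10', '11', '12']
```

The second call drops the two empty strings added by the first call as well as the values 0 and 1, which is why the array starts at 2 afterwards.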

 
  sort

Sort lines of an array in place. Default is to sort lines alphabetically.
Source
<sort digits><get-var digits>
Output
12
2
3

Numerical operators

These operators perform basic arithmetic operations. When all operands are integers result is an integer too, otherwise it is a float. These operators are self-explanatory.

 
  add

 
  substract

 
  multiply

 
  divide

 
  min

 
  max
Source
<add 1 2 3 4 5 6>
<add 1 2 3 4 5 6.>
Output
21
21.000000
Source
<define-tag factorial whitespace=delete>
<ifeq %0 1 1 <multiply %0 <factorial <substract %0 1>>>>
</define-tag>
<factorial 6>
Output
720
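
The integer versus float rule can be sketched as follows in Python. The helper is hypothetical; the six decimal places are taken from the example output above, and the test used to recognise an integer operand is an assumption.

```python
def add(*operands):
    # Operands arrive as text; "6." counts as a float, "6" as an integer.
    if all(op.lstrip("-").isdigit() for op in operands):
        return str(sum(int(op) for op in operands))
    # At least one float operand: print with six decimals, as in the example.
    return "%f" % sum(float(op) for op in operands)

print(add("1", "2", "3", "4", "5", "6"))    # 21
print(add("1", "2", "3", "4", "5", "6."))   # 21.000000
```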

 
  modulo

Unlike functions listed above the modulo function cannot handle more than 2 arguments, and these arguments must be integers.
Source
<modulo 345 7>
Output
2

The following functions compare two numbers and return true when the comparison holds. If an argument is not a number, the comparison is false.

 
  gt

Returns true if first argument is greater than second.

 
  lt

Returns true if first argument is less than second.

 
  eq

Returns true if arguments are equal.

 
  neq

Returns true if arguments are not equal.

Relational operators

 
  not

Returns true if string is empty, otherwise returns an empty string.

 
  and

Returns the last argument if all arguments are non empty.

 
  or

Returns the first non empty argument.
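
Since truth is represented by a non-empty string and falsehood by an empty one, these three operators can be modelled in Python as shown below. The function names are hypothetical, and returning an empty string when and or or fails is an assumption consistent with the descriptions above.

```python
def not_(s):
    # "true" for an empty string, empty string otherwise.
    return "true" if s == "" else ""

def and_(*args):
    # Last argument if every argument is non-empty (assumption: "" otherwise).
    return args[-1] if args and all(a != "" for a in args) else ""

def or_(*args):
    # First non-empty argument (assumption: "" if there is none).
    return next((a for a in args if a != ""), "")

print(not_(""))            # true
print(and_("a", "b"))      # b
print(or_("", "b", "c"))   # b
```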

Flow functions

 
 Vgroup

This function groups multiple statements into a single one. Some examples will be seen below with conditional operations.

A less intuitive but very helpful use of this macro is to preserve newlines when whitespace=delete is specified.
Source
<define-tag text1>
Text on
3 lines without
whitespace=delete
</define-tag>
<define-tag text2 whitespace=delete>
Text on
3 lines with
whitespace=delete
</define-tag>
<define-tag text3 whitespace=delete>
<group "Text on
3 lines with
whitespace=delete">
</define-tag>
<text1>
<text2>
<text3>
Output
Text on
3 lines without
whitespace=delete

Text on3 lines withwhitespace=delete
Text on
3 lines with
whitespace=delete

Note that newlines are suppressed in text2, and the result is probably not what was intended.

Source
<subst-in-string "=LT=define-tag foo>bar</define-tag>" "=LT=" "<">
<foo>
<subst-in-string "=LT=define-tag foo>quux</define-tag>" "=LT=" "<quote \"<\">">
<foo>
Output
bar
<quote "<">define-tag foo>quux</define-tag>
bar

 
 Vif

If the string is non-empty, the second argument is evaluated; otherwise the third argument is evaluated.
Source
<define-tag test whitespace=delete>
<if %0 "yes" "no">
</define-tag>
<test "string">
<test "">
Output
yes
no

 
 Vifeq

If the first two arguments are identical strings, the third argument is evaluated; otherwise the fourth argument is evaluated.

 
 Vifneq

If the first two arguments are not identical strings, the third argument is evaluated; otherwise the fourth argument is evaluated.

 
/ when

When the argument is not empty, the body is evaluated.

 
/Vwhile

While the condition is true, the body is evaluated.
Source
<set-var i=10>
<while <gt <get-var i> 0>>;;;
  <get-var i> <decrement i>;;;
</while>
Output
10 9 8 7 6 5 4 3 2 1 

 
/ foreach

This macro is similar to Perl's foreach: a variable loops over the array values and the body is evaluated for each value. The first argument is a generic variable name, and the second is the name of an array.
Source
<set-var x="1\n2\n3\n4\n5\n6">
<foreach i x><get-var i> </foreach>
Output
1 2 3 4 5 6 

 
 Vvar-case

This command performs multiple conditions with a single instruction.
Source
<set-var i=0>
<define-tag test whitespace=delete>
<var-case x=1 <group <increment i> x<get-var i>> x=2 <group <decrement i> x<get-var i>> y=1 <group <increment i> y<get-var i>> y=2 <group <decrement i> y<get-var i>>>
</define-tag>
<set-var x=1 y=2><test>
<set-var x=0 y=2><test>
Output
x1y0
y-1

 
  break

Breaks the innermost while loop.
Source
<set-var i=10>
<while <gt <get-var i> 0>>;;;
  <get-var i> <decrement i>;;;
  <ifeq <get-var i> 5 <break>>;;;
</while>
Output
10 9 8 7 6 

 
  warning

Prints a warning on standard error.

 
  exit

Immediately exits program.

 
  at-end-of-file

This is a special command: all attributes are stored and will be expanded after end of input.

File functions

 
  directory-contents

Returns a newline separated list of files contained in a given directory.
Source
<directory-contents . matching=".*\\.mp4h$">
Output
mp4h.mp4h

 
  file-exists

Returns true if file exists.

 
  get-file-properties

Returns an array of information about this file: size, type, ctime, mtime, atime, owner and group.
Source
<get-file-properties mp4h.mp4h>
Output
40737
FILE
951738039
951738039
951746915
barbier
imacs

 
  include

Read input from another file.

 
/ comment

This tag does nothing, its body is simply discarded.

 
  set-eol-comment

Change comment characters.

Debugging

When constructs become complex, they can be hard to debug. The functions listed below are very useful when you cannot figure out what is wrong. They are not perfect yet and should be improved in future releases.

 
  function-def

Prints the replacement text of a user defined macro. For instance, the macro used to generate all examples of this documentation is
Source
<function-def example>
Output
<set-var-verbatim verb-body=%ubody><subst-in-var verb-body "&" "&amp;"><subst-in-var verb-body "<" "&lt;">
<subst-in-var verb-body ">" "&gt;"><set-var body=%body><subst-in-var body "&" "&amp;"><subst-in-var body "<" "&lt;">
<subst-in-var body ">" "&gt;"><subst-in-var body "<three-colon>\n[ \t]*" "" singleline=true><subst-in-var body "<three-colon>[^;][^\n]*\n[ \t]*" "" singleline=true><subst-in-var body "<three-colon>$" ""><subst-in-var body "^\n+" "" singleline=true><table border="2" cellpadding="0" cellspacing="0" width="80%" summary=""><tr><th bgcolor="#ccccff"><lang:example-source></th></tr><tr><td bgcolor="#ccff99" width="80%"><pre><get-var-once verb-body></pre></td></tr><tr><th bgcolor="#ccccff"><lang:example-output></th></tr><tr><td bgcolor="#ff99cc" width="80%"><pre><get-var-once body></pre></td></tr></table>

 
  debugmode

This command acts like the -d flag but can be changed dynamically.

 
  debugfile

Selects a file where debugging messages are diverted. If this filename is empty, debugging messages are sent back to standard error, and if it is set to - these messages are discarded.

Note: There is no way to print these debugging messages into the document being processed.

 
  debugging-on

Declare these macros traced, i.e. information about these macros will be printed if the -d flag or the debugmode macro is used.

 
  debugging-off

These macros are no longer traced.

Miscellaneous

 
  __file__

Without argument this macro prints current input filename. With an argument, this macro sets the string returned by future invocation of this macro.

 
  __line__

Without argument this macro prints the current line number in the input file. With an argument, this macro sets the number returned by future invocations of this macro.
Source
This is <__file__>, line <__line__>.
Output
This is mp4h.mp4h, line 1515.

If you look closely at the source code you will see that this number is wrong: the line number reported is that of the end of the entire block containing this instruction.

 
  __version__

Prints the version of mp4h.

 
  date

Prints the local time corresponding to the epoch passed as argument. If there is no argument, the current local time is printed.
Source
<date>
<set-var info=<get-file-properties <__file__>>>
<date <get-var info[2]>>
Output
Mon Feb 28 15:08:36 2000

Mon Feb 28 12:40:39 2000

 
  timer

Prints the time elapsed since the last call to this macro. The printed value is a number of clock ticks, and so depends on your CPU.
Source
The number of clock ticks since the beginning of generation of
this documentation by <mp4h> is:
<timer>
Output
The number of clock ticks since the beginning of generation of
this documentation by <b>mp4h</b> is:
user 352
sys 41

Macro expansion

This part describes the internal mechanism of macro expansion. It should be as precise and exhaustive as possible, so contact me if you have any suggestions.

Generalities

Let us begin with some examples:
Source
<define-tag foo>;;;
This is a simple tag;;;
</define-tag>;;;
<define-tag bar endtag=required>;;;
This is a complex tag;;;
</define-tag>;;;
<foo>
<bar>Body function</bar>
Output
This is a simple tag
This is a complex tag

User defined macros may have attributes like HTML tags. To handle these attributes in replacement text, following conventions have been adopted (mostly derived from Meta-HTML):

Note: Input expansion is completely different in Meta-HTML and in mp4h. With Meta-HTML it is sometimes necessary to use other constructs like %xbody and %qbody. In order to improve compatibility with Meta-HTML, these constructs are recognized and are interpreted like %body. Another feature provided for compatibility reasons is that, for simple tags, %body and %attributes are equivalent. These features are in the current mp4h version but may disappear in future releases.

Attributes

Attributes are separated by spaces, tabs or newlines, and each attribute must be a valid mp4h entity. For instance, with the definitions above, <bar> cannot be an attribute since it must be closed by </bar>. But this is valid:

<foo <foo>>
or even
<foo <foo name=src url=ici>>
In these examples, the foo tag has only one argument.

Under certain circumstances it is necessary to group multiple statements into a single one. This can be done with double quotes or with the group primitive, e.g.

<foo "This is the 1st attribute" <group and the second>>

Note: Unlike in HTML, single quotes cannot replace double quotes for this purpose.

If double quotes appear in an argument, they must be escaped by a backslash \.
Source
  <set-var text="Text with double quotes \" inside">;;;
  <get-var text>
Output
  Text with double quotes " inside

Macro evaluation

Macros are characterized by

Characters are read from input until a < is found. Then the macro name is read. After that, attributes are read, verbatim or not depending on how this macro has been defined. If this macro is complex, its body is read verbatim. When this is finished, some special sequences in the replacement text are replaced (like %body, %attributes, %0, %1, etc.) and the resulting text is pushed back onto the input stack in order to be rescanned.

Note: By default attributes are evaluated before any replacement.

Consider the following example, to change text in typewriter font:

<define-tag text-tt endtag=required whitespace=delete>
<tt>%body</tt>
</define-tag>

This definition has a major drawback:
Source
<text-tt>This is an <text-tt>example</text-tt></text-tt>
Output
<tt>This is an <tt>example</tt></tt>
We would like the inner tags to be removed.

A first idea is to use an auxiliary variable to know whether we are already inside such an environment:

<set-var _text:tt=0>
<define-tag text-tt endtag=required whitespace=delete>
<increment _text:tt>
<ifeq <get-var _text:tt> 1 "<tt>">
%body
<ifeq <get-var _text:tt> 1 "</tt>">
<decrement _text:tt>
</define-tag>
Source
<text-tt>This is an <text-tt>example</text-tt></text-tt>
Output
<tt>This is an example</tt>
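
The counter technique used by this definition can be mirrored in Python as a rough model. It is an illustration only: `state`, `text_tt` and the nested-list representation of the body are hypothetical, with `state["depth"]` standing in for the `_text:tt` variable.

```python
state = {"depth": 0}   # plays the role of the _text:tt counter

def text_tt(body):
    """body is a list of plain strings and nested lists (inner <text-tt> calls)."""
    state["depth"] += 1
    opening = "<tt>" if state["depth"] == 1 else ""
    # The body is expanded while the counter is raised, so inner calls see depth > 1.
    inner = "".join(p if isinstance(p, str) else text_tt(p) for p in body)
    closing = "</tt>" if state["depth"] == 1 else ""
    state["depth"] -= 1
    return opening + inner + closing

print(text_tt(["This is an ", ["example"]]))   # <tt>This is an example</tt>
```

The model works only because the inner call happens between the increment and the decrement of the outer call, which is exactly the ordering that matters in the mp4h definition.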

But if we use simple tags, as in the example below, our definition does not seem to work. This is because attributes are expanded before they are put into the replacement text.
Source
<define-tag opt><text-tt>%attributes</text-tt></define-tag>
<opt "This is an <opt example>">
Output
<tt>This is an <tt>example</tt></tt>

To avoid this problem we have to prevent attribute expansion with attributes=verbatim:
Source
<define-tag opt attributes=verbatim><text-tt>%attributes</text-tt></define-tag>
<opt "This is an <opt example>">
Output
<tt>This is an example</tt>

Author

Denis Barbier <barbier@imacs.polytechnique.fr>

Thanks

Sincere thanks to Brian J. Fox for writing Meta-HTML and René Seindal for maintaining this wonderful macro parser called GNU m4.