PML User Manual

PML Version

3.1.0 2022-10-03

License

CC BY-ND 4.0

Author and Copyright

Christian Neumanns

Website

https://www.pml-lang.dev

PML Markup Code

GitHub

Introduction

This manual is about creating web articles and books with the Practical Markup Language (PML).

If you have questions or suggestions then please post a message on GitHub discussions, or send an email to contact {at} pml-lang {dot} dev.

Quick Start

After installing PML, you can create a web document in three easy steps:

  • Create a PML file using any editor

  • Convert the PML file to an HTML file

  • Open the HTML file in your browser

Here is an example of how to proceed:

1. Create a PML file

Use your preferred text editor to create a text file named example.pml in any directory of your choice, and with the following content:

[doc [title First test]
    This is a [i simple] example.
]

2. Convert the PML file to an HTML file

Open a terminal in the directory of file example.pml.

Note

For instructions on how to do this in Windows, search for "open a terminal in Windows", or refer to this article.

Convert the PML file into an HTML file named example.html by entering the following command:

pmlc PML_to_HTML example.pml

Note

Alternatively, you can type:

pmlc p2h example.pml

By default, the resulting HTML file is written to directory output of your current working directory (i.e. output/example.html).

Here is an example of a terminal session in Windows:

C:\tests>pmlc p2h example.pml
INFO: Creating HTML file 'output\example.html'.
C:\tests>

3. Open the HTML file

Open file output/example.html in your browser. The result looks like this:

Tips

To see the list of options you can use with command PML_to_HTML, refer to chapter Convert PML to HTML in the PMLC Commands Reference Manual. That manual also lists all other commands you can use with pmlc.

For general help type:

pmlc -h

Anatomy of a PML Document

Document Tree

Technically speaking, a PML document is a tree composed of PML nodes.

Here is a visual representation of the tree structure of a simple PML document:

  • document

    • chapter

      • title

      • paragraph

      • paragraph

      • paragraph

    • chapter

      • title

      • paragraph

      • image

The above document has two chapters. The first chapter is composed of a title and three paragraphs. The second chapter contains a title, a paragraph, and an image.
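For readers who think in code, the tree above can be modelled with a small data structure. The following Python sketch is only an illustration: the Node class and the outline function are hypothetical names chosen for this example, and are not part of PML or PMLC.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of a document tree: a name plus an ordered list of child nodes."""
    name: str
    children: list["Node"] = field(default_factory=list)


def outline(node: Node, depth: int = 0) -> list[str]:
    """Return one indented line per node, visiting the tree depth-first."""
    lines = ["  " * depth + node.name]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


# The document shown above: two chapters below a single document node.
document = Node("document", [
    Node("chapter", [Node("title"), Node("paragraph"), Node("paragraph"), Node("paragraph")]),
    Node("chapter", [Node("title"), Node("paragraph"), Node("image")]),
])

print("\n".join(outline(document)))
```

Printing the outline reproduces the indented list shown above, one node per line.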

A real document having the above structure would look like this:

File example.pml
[doc [title PML Document Example]

    [ch [title Chapter 1]

        Text of paragraph 1.

        Text of paragraph 2.

        Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
    ]

    [ch [title Chapter 2]

        Paragraph

        [image source = images/strawberries.jpg]
    ]
]

File example.pml can be converted to file output/example.html with the following command:

pmlc p2h example.pml

Note

The above PML document uses image file images/strawberries.jpg. By default, all resources used in a document are located in directory resources. Therefore file resources/images/strawberries.jpg must exist before executing the above command.

Now file output/example.html can be opened in the browser, and the result looks like this:

Nodes

There are different types of nodes in a document tree.

A node's type is determined by a name. For example:

  • A chapter node has the name ch, and represents a chapter of an article or book.

  • An image node has the name image, and represents an image to be inserted in the document.

  • An italic node has the name i, and is used to write text in italics.

Every node in a document starts with [ and ends with ]. The node's name (always required) is written immediately after the opening [, without a space (e.g. [doc ... ] or [ch ... ]).

A node can be empty (it has a name, but no content), contain only text, or contain a mixture of text and child nodes. A node's name and its content are separated by a whitespace character (a space, TAB, or new line). Here are three examples:

[nl]
[i node containing only text displayed in italics]
[p A [i simple] paragraph with [b six] words.]

The full list of nodes is documented in the PML Reference Manual.

Attributes

Some nodes have attributes.

Some attributes are required, and some are optional.

For example, the image node has a required attribute named source, which defines the image's file path. It also has optional attributes, such as width and height, which explicitly define the image's dimensions in pixels. Here is an example of an image node with values assigned to attributes source and width:

[image ( source="images/juicy apple.png" width="400" ) ]

As we can see:

  • Attributes are enclosed in parentheses:

    ( ... )
  • The syntax name = "value" is used to assign a value to an attribute. For example:

    width="400"
  • Attribute assignments are separated by a space. No comma is needed:

    (a1="v1" a2="v2")
            ^

The list of available attributes for each node is documented in the PML Reference Manual.

HTML Attributes

HTML attributes can optionally be specified for some nodes.

This is used to explicitly set HTML attributes in the resulting HTML code. The most frequent use of HTML attributes is to explicitly set the style for a specific element.

An HTML attribute name starts with html_, followed by the real HTML attribute name. Thus, to specify a style attribute in the resulting HTML code, you would write for example: html_style="color:red;". Any valid CSS can be assigned to an html_style attribute.

Suppose we want to write a paragraph in red letters, surrounded by a blue dashed border. We can do it like this:

  • PML code:
    [p (html_style = "color:red; border:1px dashed blue")
        It is important to note that ...
    ]
  • Result:

    It is important to note that ...

  • Generated HTML code:
    <p class="pml-paragraph" style="color:red; border:1px dashed blue">It is important to note that ... </p>

To see if HTML attributes are allowed for a given node, please refer to the PML Reference Manual.

To see the list of attributes supported for a given HTML element, please refer to the official HTML specification. PMLC does not check whether an HTML attribute is valid.
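The html_ naming rule described in this chapter can be expressed in a few lines of Python. This is a sketch of the naming convention only, not PMLC's actual code; the function name html_attributes and the dictionary representation of a node's attributes are assumptions made for the example.

```python
HTML_PREFIX = "html_"


def html_attributes(node_attributes: dict[str, str]) -> dict[str, str]:
    """Keep only attributes whose name starts with html_, and drop the prefix."""
    return {
        name[len(HTML_PREFIX):]: value
        for name, value in node_attributes.items()
        if name.startswith(HTML_PREFIX)
    }


# The paragraph example from this chapter: html_style becomes style.
attributes = {"html_style": "color:red; border:1px dashed blue"}
print(html_attributes(attributes))
```

As stated above, no validity check is made: whatever follows html_ is passed on as the HTML attribute name.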

Text Processing

A PML document is written using the PDML syntax. Therefore, all features available in PDML are also supported in PML. You can refer to PDML's documentation to get more information about PDML. In this chapter, we'll have a look at some PDML features that are often used in PML documents.

Comments

A comment starts with [- and ends with -]. Comments can appear anywhere, and they can be nested to any level. Text within comments is ignored.

Example:

  • PML code:
    This is [- good -] awesome.
    [- TODO: explain why -]
    
    Text
    [-
        This [i bad] text is not shown.
        [- a
            nested
            comment -]
    -]
    
    More text
  • Result:
    This is awesome.
    
    Text
    
    More text
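The nesting rule can be illustrated with a depth counter. The following Python function is a simplified sketch and not the PDML parser: strip_comments is a hypothetical name, and the sketch assumes that every comment is properly closed and ignores character escaping.

```python
def strip_comments(text: str) -> str:
    """Remove [- ... -] comments, including nested ones, from text."""
    result = []
    depth = 0  # current comment nesting level
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair == "[-":
            depth += 1            # a comment (possibly nested) starts
            i += 2
        elif pair == "-]" and depth > 0:
            depth -= 1            # the innermost open comment ends
            i += 2
        else:
            if depth == 0:        # keep characters outside of comments only
                result.append(text[i])
            i += 1
    return "".join(result)


print(strip_comments("This is [- good -] awesome."))
```

The two spaces left around a removed comment are later reduced to one by the whitespace rules described in chapter Whitespace Handling.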

Lenient Parsing

The following rules are applied to make the syntax more concise:

  • If a node can only have attributes (no child nodes), the parentheses around the attributes can be omitted.

    For example, the image node has no child nodes. Therefore, instead of writing:

    [image ( source="images/juicy apple.png" width="400" ) ]
           ^                                             ^

    ... we can also write:

    [image source="images/juicy apple.png" width="400" ]
  • Quotes around attribute values can be omitted if the value does not contain:

    • whitespace (a space, tab, carriage return or line feed)

    • any of the following characters: [ ] ( ) " '

    Hence, instead of writing:

    width="400"
          ^   ^

    ... we can write:

    width=400
  • If a node in a PML document has no attributes, it is not necessary to explicitly state the absence of attributes by writing (). Hence, the following code:

    [div () text]
         ^^

    ... can be shortened to

    [div text]

    However, if the node's text starts with (, then () is required.

    Say we want to render the text: (organic = healthy). In that case we can't write:

    [i (organic = healthy)]

    ... because the parser would interpret this as an attribute assignment (i.e. the value healthy assigned to attribute organic).

    To eliminate the confusion we have to write:

    [i() (organic = healthy)]
      ^^

If we apply all of the above rules, then this code:

[image ( source = "images/juicy apple.png" width = "400" ) ]

... can be shortened to:

[image source="images/juicy apple.png" width=400]
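The rule for omitting quotes can be checked mechanically. Here is a minimal Python sketch of that rule; can_omit_quotes is a hypothetical helper, and treating an empty value as one that needs quotes is an assumption, since the rule above does not cover that case.

```python
# Characters that force quoting, as listed above: whitespace and [ ] ( ) " '
FORBIDDEN_UNQUOTED = set(" \t\r\n[]()\"'")


def can_omit_quotes(value: str) -> bool:
    """Return True if value contains none of the characters that require quotes."""
    if value == "":
        return False  # assumption: an empty value is always written as ""
    return not any(ch in FORBIDDEN_UNQUOTED for ch in value)


for value in ["400", "images/juicy apple.png"]:
    print(repr(value), can_omit_quotes(value))
```

This matches the shortened example above: 400 can be written without quotes, whereas images/juicy apple.png contains a space and must stay quoted.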

Whitespace Handling

PML uses whitespace handling rules that aim to be intuitive and practical. It is important to be aware of these rules, because ignoring them can lead to surprising or unwanted results, especially in edge cases.

As there is no standard and unique definition for whitespace, we first need to define some terms used in the context of PML:

  • Whitespace character

    There are four whitespace characters:

    Name             C-style syntax  Unicode
    Space            ' '             U+0020
    Tab              '\t'            U+0009
    Carriage return  '\r'            U+000D
    Line feed        '\n'            U+000A


  • Whitespace

    The term whitespace is used to denote any sequence of one or more whitespace characters. For example: four spaces, followed by two tabs and a new line.

  • New line

    New lines are defined differently in Unix and Windows. Unix uses a single line feed (LF). Windows uses a carriage return, followed by a line feed (CRLF).

    New line rules in PML depend on whether PML is reading or writing text.

    When PML reads text, it supports both new line variations (LF and/or CRLF) correctly on Unix and Windows systems, even if a single document uses a mixture of Unix/Windows new lines.

    When PML writes text (e.g. an HTML file) it uses the operating system's canonical new line (CRLF on Windows, LF on Unix).
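The new line rules for reading and writing can be sketched as two small Python functions. This only illustrates the behavior described above, under the assumption that text is normalized to LF internally; the function names are hypothetical and say nothing about how PMLC is implemented.

```python
import os


def normalize_new_lines(text: str) -> str:
    """Reading: convert Windows new lines (CRLF) to LF, leaving Unix new lines untouched."""
    return text.replace("\r\n", "\n")


def to_platform_new_lines(text: str) -> str:
    """Writing: use the operating system's canonical new line."""
    return normalize_new_lines(text).replace("\n", os.linesep)


# A document mixing both conventions is read as three uniform lines.
print(normalize_new_lines("line 1\r\nline 2\nline 3").split("\n"))
```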

Whitespace is handled differently in nodes and attributes, as explained in the following chapters.

Nodes

Whitespace in nodes is handled as follows:

  • Whitespace reduction

    Whitespace in text is replaced by a single space character.

    Writing

    this     is
    text

    ... is the same as:

    this is text

    To preserve a sequence of several whitespace characters, the node sp can be used to explicitly insert non-breaking spaces, and the node nl can be used to explicitly insert new lines, e.g.:

    this[sp][sp][sp][sp][sp]is[nl]text

    ... is rendered as:

    this     is
    text

    Moreover, the monospace node can be used to insert a block of text in which whitespace is preserved (similar to the pre tag in HTML):

    [monospace
    this     is
    text
    ]
  • Paragraph breaks

    A sequence of two consecutive new lines generates a paragraph break.

    Writing:

    Paragraph 1.
    
    Paragraph 2.

    ... is the same as:

    [p Paragraph 1.]
    [p Paragraph 2.]

    ... and is rendered as:

    Paragraph 1.
    
    Paragraph 2.

    However, writing:

    Paragraph 1.
    Paragraph 2.

    ... would be the same as:

    [p Paragraph 1. Paragraph 2.]

    ... and is rendered as:

    Paragraph 1. Paragraph 2.
  • Whitespace removal
    • Leading and trailing whitespace in an auto-generated paragraph is removed.

    • An auto-generated paragraph containing only whitespace is removed.

  • Whitespace does not define structure

    Adding or removing whitespace characters in a whitespace segment does not alter a document's structure. Hence, whitespace can be used freely to make documents more readable or visually more appealing.

    For example, instead of writing:

    [doc [title Doc Title]
    [ch [title Chapter 1] text]
    [ch [title Chapter 2] text]
    ]

    ... we can make the structure easier to grasp like this:

    [doc [title Doc Title]
    
        [ch [title Chapter 1]
            text
        ]
    
        [ch [title Chapter 2]
            text
        ]
    ]
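The whitespace reduction, paragraph break, and whitespace removal rules listed above can be combined into one short Python sketch. It handles plain text only (no nodes), assumes that the blank line separating two paragraphs contains no other characters, and uses the hypothetical name split_paragraphs; it is not PMLC's implementation.

```python
import re


def split_paragraphs(text: str) -> list[str]:
    """Split plain text into auto-generated paragraphs with whitespace reduced."""
    text = text.replace("\r\n", "\n")        # accept Windows and Unix new lines
    paragraphs = []
    for chunk in re.split(r"\n{2,}", text):  # two or more new lines: paragraph break
        # Whitespace reduction, then removal of leading/trailing whitespace.
        reduced = re.sub(r"[ \t\r\n]+", " ", chunk).strip(" ")
        if reduced:                          # drop whitespace-only paragraphs
            paragraphs.append(reduced)
    return paragraphs


print(split_paragraphs("Paragraph 1.\n\nParagraph 2."))
print(split_paragraphs("Paragraph 1.\nParagraph 2."))
```

The first call yields two paragraphs, the second call a single paragraph, as in the examples above.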

Attributes

Whitespace in attributes is handled as follows:

  • Whitespace elimination

    Whitespace around attribute symbols ((, ), and =) is ignored. The following three image nodes are semantically equivalent:

    [image ( source = "images/juicy apple.png" width = "400" height = "200" ) ]
    
    [image(source="images/juicy apple.png" width=400 height=200)]
    
    [image (
        source = "images/juicy apple.png"
         width = 400
        height = 200
    ) ]
  • Whitespace in attribute values

    As stated already, unquoted attribute values cannot contain whitespace. Instead of writing color = light orange, we must write color = "light orange".

    On the other hand, quoted attribute values can contain whitespace (any sequences of spaces, tabs, and new lines). Whitespace within a quoted value is preserved.

    Suppose we want to assign the following value to attribute quote:

    He said:
       "She said: 'Wow!'"

    This can be achieved with:

    quote = "He said:
       \"She said: 'Wow!'\""

    Unix and Windows new lines are supported in attribute values.

    Unix or Windows new lines can be enforced by using escape sequences. For example, to force Windows new lines in the above example, we can write:

    quote = "He said:\r\n   \"She said: 'Wow!'\""

Escape Characters

Character escape rules in node text and attribute values are slightly different, as explained in the following chapters.

Nodes

As seen already, characters [ and ] define the start and end of a node.

Therefore, if these characters are used in text, they must be escaped, to avoid confusion. This is done by prefixing the character with a backslash (\). For instance, instead of writing [, we have to write \[.

As a backslash is used as escape character, it must itself also be escaped when it is used in text. Hence, instead of writing \, we have to write \\.

Here is an example to demonstrate how escaping works:

  • PML code:
    File path = C:\\tests\\example.txt
    
    Instead of writing \\, we have to write \\\\
    
    Instead of writing \[, we have to write \\\[
  • Result:

    File path = C:\tests\example.txt

    Instead of writing \, we have to write \\

    Instead of writing [, we have to write \[

The final rule is simple: Characters [, ], and \ must be preceded by \ when they are used in normal text.

Besides characters that must be escaped, there are also characters that can be escaped if desired, as shown in the following table:

Character or name            Escape sequence                                          Mandatory
\                            \\                                                       yes
[                            \[                                                       yes
]                            \]                                                       yes
Tab                          \t                                                       no
Carriage return              \r                                                       no
Line feed                    \n                                                       no
Unicode escape 4 hex digits  \uhhhh (e.g. \u2764 for "heart shape": ♥)                no
Unicode escape 8 hex digits  \Uhhhhhhhh (e.g. \U0001F600 for "grinning face": 😀)     no
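If node text is generated by a program, the three mandatory escapes can be applied automatically. The following Python helper is a minimal sketch of that rule; escape_node_text is a hypothetical name and not a function provided by PML.

```python
def escape_node_text(text: str) -> str:
    """Prefix each backslash, [ and ] with a backslash so it is read as literal text."""
    return "".join("\\" + ch if ch in "\\[]" else ch for ch in text)


print(escape_node_text(r"File path = C:\tests\example.txt"))
# prints: File path = C:\\tests\\example.txt
```

The optional escape sequences (\t, \n, Unicode escapes, and so on) are not produced by this helper, because they are never required.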

Attributes

Escape sequences are not supported in unquoted attribute values.

If an attribute value is quoted, the following escape sequences are supported:

Character or name            Escape sequence                                          Mandatory
"                            \"                                                       yes
\                            \\                                                       yes
[                            \[                                                       no
]                            \]                                                       no
Tab                          \t                                                       no
Carriage return              \r                                                       no
Line feed                    \n                                                       no
Unicode escape 4 hex digits  \uhhhh (e.g. \u2764 for "heart shape": ♥)                no
Unicode escape 8 hex digits  \Uhhhhhhhh (e.g. \U0001F600 for "grinning face": 😀)     no

Example: Suppose we want to assign the value C:\temp\test.txt to attribute path. In this case the value can be quoted or unquoted. If the value is quoted then \ must be escaped:

Quoted:   path = "C:\\temp\\test.txt"
Unquoted: path = C:\temp\test.txt
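When attribute values are produced by a script, quoting every value and applying the two mandatory escapes is the simplest approach. Here is a small Python sketch of that idea; quote_attribute_value is a hypothetical helper, not part of PML.

```python
def quote_attribute_value(value: str) -> str:
    """Wrap value in double quotes, escaping the two mandatory characters."""
    # Backslashes are escaped first, so that the backslashes added
    # for the quotes are not escaped a second time.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


print("path = " + quote_attribute_value(r"C:\temp\test.txt"))
# prints: path = "C:\\temp\\test.txt"
```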

Parameters

Sometimes the same text or markup code appears several times in a document. In such cases you can use a parameter to avoid retyping or copy/pasting the same text again and again.

A parameter is composed of an identifier (unique name) and a value. The syntax for assigning a value to a parameter is:

[u:set name = value]

Note the mandatory u: before the node name set. The prefix u: defines a namespace with identifier u, which stands for utility node. This is necessary to make a distinction between normal PML nodes that contain text, and other nodes that are used to handle text (text processing).

The value assigned to a parameter can be re-used in the document with the following syntax:

[u:get name]

Here is an example of a URL that is re-used two times.

  • PML code:
    [u:set docs_root_URL = http://www.example.com/project/docs/public]
    
    For an overview please read the article [link url=[u:get docs_root_URL]/concepts.html text="Basic Concepts"].
    
    For detailed information please refer to the [link url=[u:get docs_root_URL]/user_manual.html text="User Manual"].
  • Result:

    For an overview please read the article Basic Concepts.

    For detailed information please refer to the User Manual.

You can define any number of parameters, anywhere in the document. Parameters are often defined at the beginning of the document, just after the doc node. After declaring a parameter, its value can be re-used any number of times in the document.

After assigning a value to a parameter, its value cannot be changed later in the document. Parameters are like constants in programming languages.

The syntax rules for assigning values to parameters are the same as those for attributes (lenient parsing, whitespace handling, and character escapes).

A parameter identifier must start with a letter or an underscore, and can be followed by any number of letters, digits, underscores, hyphens, and dots. Note for programmers: The regex of an identifier is: [a-zA-Z_][a-zA-Z0-9_\.-]*. Identifiers are case-sensitive. The following identifiers are all different: name, Name, and NAME.
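The identifier rule can be tried out with the regex quoted above. In this Python sketch, is_valid_identifier is a hypothetical helper; fullmatch is used so that the whole name, not just a prefix, must match the pattern.

```python
import re

# Pattern quoted from the paragraph above.
IDENTIFIER = re.compile(r"[a-zA-Z_][a-zA-Z0-9_\.-]*")


def is_valid_identifier(name: str) -> bool:
    """Return True if the whole name matches the identifier pattern."""
    return IDENTIFIER.fullmatch(name) is not None


for name in ["docs_root_URL", "_a.b-c", "1abc", "-x"]:
    print(name, is_valid_identifier(name))
```

The first two names are accepted; the last two are rejected because they start with a digit and a hyphen, respectively.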

You can assign several parameters in a single set node. For example:

[u:set
    color = "deep blue"
    default_width = 300
]

Besides assigning simple text snippets to parameters, you can also assign markup snippets for re-use. Imagine, for example, that a company logo needs to be inserted several times in the document. Suppose that the markup code to be inserted is:

[image source=images/company_logo.png width=200 height=200 border=yes]

Re-inserting this code several times would be cumbersome. Worse, it would be hard to maintain. For example, if the logo's dimensions are changed later, the change must be done everywhere the node is used. These inconveniences can easily be eliminated by using a parameter. Here is the code to define the code once and re-use it two times:

[u:set company_logo = "[image source=images/company_logo.png width=200 height=200 border=yes]"]
...
[u:get company_logo]
...
[u:get company_logo]

If the dimensions are changed later, you only need to make the change in one place.

Parameters can use other parameters that have already been defined in the document. For example, you might want to define a common root directory once, and re-use it in the definition of subsequent parameters:

[u:set root_directory = /foo/bar/]
[u:set images_directory = "[u:get root_directory]images"]
[u:set examples_directory = "[u:get root_directory]examples"]

[p Images: [u:get images_directory]]
[p Examples: [u:get examples_directory]]

This is rendered as:

Images: /foo/bar/images

Examples: /foo/bar/examples
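The behavior of parameters (assign once, re-use many times, build on earlier parameters) can be mimicked in a few lines of Python. This is a conceptual sketch only: the class Parameters is hypothetical, the way an attempted reassignment is reported is an assumption, and the sketch resolves plain [u:get name] references without any of the quoting or escaping rules.

```python
import re

GET_NODE = re.compile(r"\[u:get ([a-zA-Z_][a-zA-Z0-9_.-]*)\]")


class Parameters:
    """Write-once table of parameter values."""

    def __init__(self) -> None:
        self._values: dict[str, str] = {}

    def set(self, name: str, value: str) -> None:
        if name in self._values:
            # Assumption: changing an existing parameter is reported as an error.
            raise ValueError(f"parameter '{name}' is already defined")
        # References to previously defined parameters are resolved here.
        self._values[name] = self.expand(value)

    def expand(self, text: str) -> str:
        """Replace every [u:get name] in text with the stored value."""
        return GET_NODE.sub(lambda match: self._values[match.group(1)], text)


params = Parameters()
params.set("root_directory", "/foo/bar/")
params.set("images_directory", "[u:get root_directory]images")
print(params.expand("Images: [u:get images_directory]"))
# prints: Images: /foo/bar/images
```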

File Splitting

If you create a big document, it is useful to split it up into several files, instead of having one big file that contains all the text. For example, each chapter could be defined in a separate pml file.

The syntax to insert a pml file at the current location is:

[u:ins_file path = file_path]

file_path can be an absolute or relative path. If it's a relative path, it's relative to the directory of the pml file in which ins_file is used.
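The path rule can be illustrated with Python's pathlib. The function name and the example paths below are hypothetical, and POSIX-style paths are used so that the sketch behaves the same on every operating system.

```python
from pathlib import PurePosixPath


def resolve_inserted_file(including_file: str, file_path: str) -> PurePosixPath:
    """Resolve file_path against the directory of the file that uses ins_file."""
    # Joining with an absolute path discards the left-hand side,
    # so an absolute file_path is returned unchanged.
    return PurePosixPath(including_file).parent / file_path


print(resolve_inserted_file("/docs/book.pml", "chapters/chapter_1.pml"))
print(resolve_inserted_file("/docs/book.pml", "/shared/intro.pml"))
```

The first call resolves the relative path inside /docs, the directory of the including file; the second call keeps the absolute path as it is.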

Example

Suppose we create a PML document composed of two chapters. We want each chapter to be defined in its own file, in sub-directory chapters (relative to the main document).

  • PML code:

    We create file chapters/chapter_1.pml with this content:

    [ch [title Chapter 1]
        blah blah blah
    ]

    We also create file chapters/chapter_2.pml:

    [ch [title Chapter 2]
        blah blah blah
    ]

    The main file book.pml is defined like this:

    [doc [title Book]
        [u:ins_file path = chapters/chapter_1.pml]
        [u:ins_file path = chapters/chapter_2.pml]
    ]
  • Result:

Table of Contents

By default, a table of contents (TOC) is created on the left side of the HTML page, as shown before.

The TOC is automatically created based on the chapters defined in the document. As chapters can be nested, the TOC results in a tree, which the user can expand or collapse. When the document is first displayed, all TOC chapters beyond level 1 are collapsed. Only chapters of level 1 are displayed. The user can then expand chapters and sub-chapters, and click on a chapter's title to see its content.

PML's default behavior for the TOC can be customized with the following options:

  • TOC_title: change or remove the title displayed at the top of the TOC

  • TOC_position: define the TOC's position. Allowed values are:

    • left: Display the TOC at the left side. This is the default value.

    • top: Display the TOC at the top of the document, after the document's title.

    • none: Don't display a TOC.

  • TOC_max_level: the maximum chapter level that is included in the table of contents. Chapters with a higher level are excluded from the TOC.

Here is an example of using an options node to set the TOC's title to "Inhaltsverzeichnis", display the TOC at the top, and include only chapters up to level 4:

[doc [title TOC test]

    [options
        [TOC_title Inhaltsverzeichnis]
        [TOC_position top]
        [TOC_max_level 4]
    ]

    lorem ipsum tralala ...
]
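The effect of TOC_max_level can be pictured as a simple filter over the document's chapters. The Python sketch below assumes that top-level chapters have level 1 and represents each chapter as a (level, title) pair; both the representation and the function name are hypothetical.

```python
def toc_entries(chapters: list[tuple[int, str]], max_level: int) -> list[tuple[int, str]]:
    """Keep the chapters whose nesting level does not exceed max_level."""
    return [(level, title) for level, title in chapters if level <= max_level]


chapters = [(1, "Introduction"), (2, "Goals"), (3, "Details"), (1, "Summary")]
for level, title in toc_entries(chapters, max_level=2):
    print("  " * (level - 1) + title)
```

With max_level=2, the level 3 chapter "Details" is left out of the table of contents, while the chapter itself would still be part of the document.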

Customization

A major goal of PMLC is to give you full control over how PML documents are converted to HTML or other formats. Therefore PMLC provides several features allowing you to partially or fully customize the conversion process, so that the target document's look and feel honors your preferences and specific requirements.

Customization is currently supported in the following ways:

  • A set of options that can be specified at the command line, in the PML document, or in a shared options file.

  • HTML Attributes.

  • Customized CSS files to style the HTML document.

  • User-defined nodes that allow you to add new nodes to PML, or to override PML's standard nodes.

The following features are planned to be supported in future versions:

  • HTML templates to easily customize HTML code generation for specific nodes.

  • Custom HTML node writers to programmatically generate specific HTML code for individual (or all) nodes.

  • Custom writers to programmatically convert PML documents into other formats, such as plain text, JSON, XML, PDF, etc.

  • AST interception to programmatically change the AST produced by the PML parser (i.e. change, add, and remove PML nodes and attributes).

Options

To see the list of options available when a PML document is converted to HTML, please refer to section "Input Parameters" in chapter Convert PML to HTML of the PMLC Commands Reference Manual.

When PMLC executes command PML_to_HTML, it looks for options in the following order:

  • If the value for an option is explicitly specified as a command line argument, then this value is applied. Options defined on the command line always have highest priority.

    For example, the title and position for the table of contents can be explicitly defined on the command line like this:

    --TOC_title "Book Content" --TOC_position top
  • If no value is specified on the command line, then PMLC looks for a value specified within an options node in the PML document. If a value is found then this value is applied.

    The options node must be defined as a direct child node of the doc node.

    For example, the title and position for the table of contents can be defined in the PML document as follows:

    [doc [title Options Example]
    
        [options
            [TOC_title Book Content]
            [TOC_position top]
        ]
    
        text ...
    ]
  • If a value is neither specified on the command line nor in the PML document, then PMLC looks for a value specified in a shared options file.

    The relative path of this file is config/PML_to_HTML/options.pdml. The file's root directory depends on the operating system. For example, on Windows it is normally a subdirectory of %APPDATA%. To find the exact location on your machine, run command pmlc info in a terminal and look for the field labeled Shared data dir., which shows the root directory. A typical value on Windows would be C:\Users\Albert\AppData\Roaming\PMLC_3.0\. In this case the full path of the options file would be:
    C:\Users\Albert\AppData\Roaming\PMLC_3.0\config\PML_to_HTML\options.pdml

    Note that file config/PML_to_HTML/options.pdml is itself optional. If it doesn't exist, you need to create it manually to store shared options.

    If the file exists, and a value is found for an option, then this value is applied.

    The content of the file is the same as the options node in a PML document.

    For example, the default title and position for the table of contents can be defined as follows:

    File config/PML_to_HTML/options.pdml
    [options
        [TOC_title Book Content]
        [TOC_position top]
    ]

    Tip

    You should consider using the options.pdml file for options that are needed in most or all of your PML documents. Defining options in the shared options.pdml file eliminates the need to repeat the same options again and again in different PML documents or as command arguments.

  • If a value is neither specified on the command line, nor in the PML document, nor in a shared options file, then PMLC applies a hard-coded default value.
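The lookup order described above is a chain of fallbacks, which can be modelled with Python's ChainMap. This is a conceptual sketch rather than PMLC's implementation: resolve_options is a hypothetical name, and the option values are made up for the illustration, except for left, which is the documented default of TOC_position.

```python
from collections import ChainMap


def resolve_options(command_line: dict, document: dict,
                    shared_file: dict, defaults: dict) -> ChainMap:
    """Combine the four option sources; earlier sources take priority."""
    return ChainMap(command_line, document, shared_file, defaults)


options = resolve_options(
    command_line={"TOC_position": "top"},
    document={"TOC_title": "Book Content", "TOC_position": "left"},
    shared_file={"TOC_title": "Contents", "TOC_max_level": "4"},
    defaults={"TOC_position": "left"},
)
print(options["TOC_position"], options["TOC_title"], options["TOC_max_level"])
# prints: top Book Content 4
```

Each option is resolved independently: TOC_position comes from the command line, TOC_title from the document, and TOC_max_level from the shared options file.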

Note

The following options are only available as command line arguments. They cannot be used in a PML document or in a shared options file:
input, output, verbosity, open_file_cmd

HTML Attributes

If you want to change the style of a single node, you can define an HTML attribute for that node, as explained previously. You can assign any valid CSS to the node's html_style attribute, or you can use an html_class attribute.

Besides customizing CSS, HTML attributes defined for a PML node can also be used to set other HTML attributes in the target HTML document.

Please refer to chapter HTML Attributes for more information.

Customized CSS

Option CSS_files can be used to specify one or more customized CSS files to be applied in the final HTML document. For more information, please refer to parameter CSS_files in section "Input Parameters" of chapter Convert PML to HTML.

Each tag in the final HTML document has a class attribute. Hence, the class value can be used in any CSS file to style the HTML node. All CSS class names used in PML are prefixed with pml-, so that PML styling doesn't interfere with other styling rules that might exist in the final HTML page.

For example, the document title's class is pml-doc-title. Hence, to change the appearance (font, size, color, etc.) of the document's title, you can apply any CSS rule to class pml-doc-title in a CSS file.

If you want to change the style just for an individual node, you can use the html_style attribute, as seen already. Alternatively, you can define an identifier for the node, and then use the identifier in the CSS file to change the style.

For example, to display a single paragraph with a yellow background, you would write the following PML code:

[p (id = my-id)
    This text is displayed on a yellow background.
]

Then you can add the following rule in a CSS file:

#my-id {
    background-color: yellow;
}

Result:

This text is displayed on a yellow background.

User-Defined Nodes

Please refer to chapter User-Defined Nodes for more information.