The LDP uses a number of scripts to distribute your document. These scripts retrieve your document from the LDP's document version management system (currently git using GitHub), and then they transform your document to other formats that users then read. Your document will also be mirrored on a number of sites worldwide (yet another set of scripts).
In order for these scripts to work correctly, your document must be both “well formed” and use “valid markup”. Well formed means your document follows the rules that XML is expecting: it complies with XML grammar rules. Valid markup means you only use elements or tags which are “valid” for your document: XML vocabulary rules are applied.
If your document is not well formed or uses invalid markup, the scripts will not be able to process it. As a result, your revised document will not be distributed.
There is more information about how to validate your document in the DocBook section; see Section 3, “Validation”.
Your life is already hard enough without having to install a full set of tools just to see whether your document validates. You can upload your raw XML files to a web site, then go to http://validate.sf.net, enter the URL of your document, and validate it.
When this information was added to the Author Guide, external entities were not supported. Follow the instructions provided on the Validate site if you have trouble.
XML and SGML files contain most of the information you need; however, there are sometimes entities which are specific to SGML in general. To match these entities to their actual values you need to use a catalog. The role of a catalog is to tell your system where to find the files it is looking for. You may want to think of a catalog as a guide book (or a map) for your tools.
Most distributions (Red Hat/Fedora and Debian at least) have a common location
for the main SGML catalog file, called
In times past, it could also be found in
The structure of XML catalog files is not the same as SGML catalog files. The section on tailoring a catalog (see Section 3.4, “Creating and modifying catalogs”) will give more details about what these files actually contain.
If your system cannot find the catalog file, or you are using
custom catalog files, you may need to set the
SGML_CATALOG_FILES and XML_CATALOG_FILES environment variables. Use
echo $SGML_CATALOG_FILES to
check whether it is currently set. If a blank line is returned,
the variable has not been set. Use the same command to see whether
XML_CATALOG_FILES is set as well. If the variables
are not set, use the following example to set them now.
Example B.1. Setting the SGML_CATALOG_FILES and XML_CATALOG_FILES Environment Variables
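A minimal sketch of checking and setting both variables in a Bourne-style shell. The two catalog paths are assumptions for illustration (a system-wide SGML catalog and a custom XML catalog in a “docbook” directory under your home directory); substitute the locations used on your system.

```shell
# Print the current values; an empty line means the variable is not set.
echo "${SGML_CATALOG_FILES:-}"
echo "${XML_CATALOG_FILES:-}"

# Assumed locations, for illustration only; adjust them to your system.
SGML_CATALOG_FILES="/etc/sgml/catalog"
XML_CATALOG_FILES="$HOME/docbook/catalog.xml"

# Export the variables so that tools started from this shell can see them.
export SGML_CATALOG_FILES XML_CATALOG_FILES
```

Running the two echo commands again afterwards should print the new values instead of blank lines.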
To make this change permanent, you can add the following lines to
If you installed XML tools via a Red Hat or Debian package, you probably don't need to do this step. If you are using a custom XML catalog, you will definitely need to do this. There is more on custom catalogs in the next section. To ensure my backup scripts grab this custom file, I have added mine in a sub-directory of my home directory named “docbook”.
You can also change your
.bashrc if you want to
save these changes.
If you are adding the changes to your
.bashrc you will not see the changes
until you open a new terminal window. To make the changes immediate in the current terminal,
“source” the configuration file.
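The append-then-source sequence can be sketched as follows. To keep the example safe to try, it writes to a scratch file that stands in for your real .bashrc; the catalog path is an assumption.

```shell
# Scratch file standing in for ~/.bashrc, so your real file is not touched.
rc_file=$(mktemp)

# Append the export line; the catalog path is assumed for illustration.
echo 'export XML_CATALOG_FILES="$HOME/docbook/catalog.xml"' >> "$rc_file"

# Load the file into the current shell. "." is the portable spelling of
# the "source" command found in bash.
. "$rc_file"

# The variable is now visible without opening a new terminal.
echo "$XML_CATALOG_FILES"
```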
In the previous section I mentioned a catalog is like a guide book for your tools. Specifically, a catalog maps the rules from the public identifier to your system's files.
At the top of every DocBook (or indeed every XML) file there is a
DOCTYPE which tells the processing tool what kind of document is
about to be processed. At a minimum this declaration will include a public
identifier, such as
-//OASIS//DTD DocBook V4.2//EN. This public identifier has a number of sections, all
separated by //. It contains the following
information: ISO standard if any (here “-”, meaning
there is no ISO standard),
author (OASIS), type of document (DTD DocBook V4.2), language
(English). Your DOCTYPE may also include a URL.
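To make the structure of a public identifier concrete, the sketch below splits one at each “//” using only POSIX parameter expansion. The variable names are hypothetical and chosen for this illustration.

```shell
public_id='-//OASIS//DTD DocBook V4.2//EN'

# Peel off one field at a time: take everything before the first "//",
# then drop that field and its separator from the remainder.
rest=$public_id
registration=${rest%%//*}; rest=${rest#*//}
owner=${rest%%//*};        rest=${rest#*//}
description=${rest%%//*}
language=${rest#*//}

printf 'registration: %s\n' "$registration"
printf 'owner:        %s\n' "$owner"
printf 'description:  %s\n' "$description"
printf 'language:     %s\n' "$language"
```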
A public identifier is useless to a processing tool, as it needs to be able to access the actual DTD. A URL is useless if the processing tool is off-line. To help your processor deal with these problems you can download all of the necessary files and then “map” them for your processing tools by using a catalog.
If you are using SGML processing tools (for instance Jade), you will need an SGML catalog. If you are using XML processing tools (for instance an XSLT processor), you will need an XML catalog. Information on both is included.
Example B.2. Example of an SGML catalog
    -- Catalog for the Conectiva Styles --

    OVERRIDE YES

    PUBLIC "-//Conectiva SA//DTD DocBook Conectiva variant V1.0//EN"
        "/home/ldp/styles/books.dtd"

    DELEGATE "-//OASIS"
        "/home/ldp/SGML/dtds/catalog.dtd"

    DOCTYPE BOOK
        /home/ldp/SGML/dtds/docbook/db31/docbook.dtd

    -- EOF --
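Comment. Comments start with “--” and continue until the next “--”.
The PUBLIC entry associates the public identifier “-//Conectiva SA//DTD DocBook Conectiva variant V1.0//EN” with the file /home/ldp/styles/books.dtd.
Comment signifying the end of the file.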
As in the example above, to associate an identifier with a file, follow this sequence:
Write the keyword PUBLIC
Type the identifying text
Indicate the path to the associated file
The most common mappings to be used in catalogs are:
PUBLIC keyword maps
public identifiers to system identifiers.
SYSTEM keyword maps
system identifiers to files on the system.
SYSTEM "http://nexus.conectiva/utilidades/publicacoes/livros.dtd" "publicacoes/livros.dtd"
SGMLDECL designates the
system identifier of the SGML declaration that should be used.
Similar to
SGMLDECL,
DTDDECL identifies the SGML declaration
that should be used.
DTDDECL associates
the declaration with the public identifier of a
DTD. Unfortunately, this association isn't
supported by the open source tools available. The benefits of this
keyword can be achieved to some extent with multiple catalog files.
DTDDECL "-//Conectiva SA//DTD livros V1.0//EN" "publicacoes/livros.dcl"
CATALOG allows a catalog
to be included inside another. This is a way to make use of several
different catalogs without the need to alter them.
OVERRIDE states whether a public
identifier has priority over a system identifier.
The default on most systems is that the system identifier
has priority over the public one.
DELEGATE associates
a catalog with a specific type of public identifier.
DELEGATE is very similar to
CATALOG, except that the delegated catalog is not
consulted until a public identifier matches the specified pattern.
If a document starts with a document type declaration that
has neither a public identifier nor a system identifier, the
DOCTYPE clause associates this document
with a specific DTD.
The following sample catalog was provided by Martin A. Brown.
Example B.3. Sample XML Catalog file
<?xml version="1.0"?>
<!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN"
  "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd">
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <public publicId="-//OASIS//DTD DocBook XML V4.2//EN" uri="/home/mabrown/docbook/dtds/4.2/docbookx.dtd"/>
  <uri name="http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd" uri="/home/mabrown/docbook/dtds/4.2/docbookx.dtd"/>
  <uri name="http://docbook.sourceforge.net/release/xsl/current/xhtml/docbook.xsl" uri="/home/mabrown/docbook/xsl/xhtml/docbook.xsl"/>
  <uri name="http://docbook.sourceforge.net/release/xsl/current/xhtml/chunk.xsl" uri="/home/mabrown/docbook/xsl/xhtml/chunk.xsl"/>
  <uri name="http://docbook.sourceforge.net/release/xsl/current/xhtml/profile-chunk.xsl" uri="/home/mabrown/docbook/xsl/xhtml/profile-chunk.xsl"/>
</catalog>
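One way to check that a catalog resolves an identifier as you expect is the xmlcatalog tool that ships with libxml2. The sketch below writes a one-entry catalog into a scratch directory and looks up its public identifier. The DTD path is copied from the sample above, and the scratch location is an assumption made so the example can be tried without touching your real catalog.

```shell
workdir=$(mktemp -d)
catalog="$workdir/catalog.xml"

# A cut-down catalog with a single public identifier mapping.
cat > "$catalog" <<'EOF'
<?xml version="1.0"?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <public publicId="-//OASIS//DTD DocBook XML V4.2//EN"
          uri="/home/mabrown/docbook/dtds/4.2/docbookx.dtd"/>
</catalog>
EOF

if command -v xmlcatalog >/dev/null 2>&1; then
    # Ask the catalog which file the public identifier maps to.
    xmlcatalog "$catalog" "-//OASIS//DTD DocBook XML V4.2//EN"
else
    echo "xmlcatalog not found; install the libxml2 utilities first" >&2
fi
```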
You can use nsgmls, which is part of the jade suite (on Debian, use apt-get to install the docbook-utils package; see Section 4.2, “The docbook-utils Package”), to validate SGML or XML documents.
If there are no issues, you'll just get your command
prompt back. The -s option tells
nsgmls to show only the errors.
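A minimal sketch of such a run, using nsgmls with its -s option to suppress the normal output. The file name HOWTO.xml is hypothetical; the guard only avoids a confusing error when the tool or the file is missing.

```shell
doc="HOWTO.xml"   # hypothetical file name; substitute your own document

if command -v nsgmls >/dev/null 2>&1 && [ -f "$doc" ]; then
    # -s suppresses the normal output, so only error messages appear.
    nsgmls -s "$doc"
else
    echo "nsgmls or $doc not found; nothing was validated" >&2
fi
```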
If you get errors about a function not being found, or
something about an ISO character not having an
authoritative source, you may
need to point nsgmls to your
xml.dcl file. For Red Hat 9, it
will look like this:
For more information on processing files with Jade/OpenJade please read DocBook XML/SGML Processing Using OpenJade.
onsgmls is an alternative to nsgmls. It ships
with the OpenJade package. This program gives more options than nsgmls
and allows you to quietly ignore a number of problems that arise while
trying to validate an XML file (as opposed to an SGML file). This also
means you don't have to type out the location of your
xml.dcl file each time.
I was able to simply use the following to validate a file with only error messages that were related to my markup errors.
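A sketch of such a run, assuming the tool is installed under the name onsgmls and again using the hypothetical file name HOWTO.xml:

```shell
doc="HOWTO.xml"   # hypothetical file name; substitute your own document

if command -v onsgmls >/dev/null 2>&1 && [ -f "$doc" ]; then
    # As with nsgmls, -s suppresses the normal output and leaves only errors.
    onsgmls -s "$doc"
else
    echo "onsgmls or $doc not found; nothing was validated" >&2
fi
```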
According to Bob Stayton you can also turn off specific error messages. The following example turns off XML-specific error messages.
You can also use the xmllint command-line tool from the libxml2 package to validate your documents. This tool does a simple check on the completeness of tags and on whether all tags that are opened are also closed again. By default xmllint prints the parsed document. So if your whole document is printed, through to the last line, you know there are no serious errors such as tag mismatches or unclosed tags.
To prevent printing the entire document to your screen, add the --noout option.
If nothing is returned, your document contains no syntax errors. Otherwise, start with the first error that was reported. Fix that one error, and run the tool again on your document. If it still returns output, again fix the first error that you see; don't bother with the rest, since further errors are usually caused by the first one.
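The following sketch shows a syntax-only check with xmllint --noout. It creates a small, deliberately broken sample (a hypothetical file, not a real LDP document) in a scratch directory so that the error path can be seen.

```shell
workdir=$(mktemp -d)
broken="$workdir/broken.xml"

# The opening tag is misspelled, so it does not match its closing tag.
cat > "$broken" <<'EOF'
<article>
  <articlinfo>
    <title>Sample</title>
  </articleinfo>
</article>
EOF

if command -v xmllint >/dev/null 2>&1; then
    if xmllint --noout "$broken"; then
        echo "no syntax errors"
    else
        echo "fix the first reported error, then run xmllint again"
    fi
else
    echo "xmllint not found; install the libxml2 utilities first" >&2
fi
```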
If you would like to check your document for any errors which are
specific to your Document Type Definition, add the --valid option.
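The difference between a syntax check and a DTD check can be seen with xmllint's --valid option. The sketch below uses a tiny, made-up document type with an internal DTD (not DocBook) so that it needs no catalog or network access. The document is well formed, but it uses an element the DTD does not declare.

```shell
workdir=$(mktemp -d)
sample="$workdir/note.xml"

cat > "$sample" <<'EOF'
<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (to, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<note>
  <to>LDP</to>
  <subject>not declared in the DTD</subject>
  <body>Hello</body>
</note>
EOF

if command -v xmllint >/dev/null 2>&1; then
    # Syntax only: passes, because every tag is properly opened and closed.
    xmllint --noout "$sample" && echo "well formed"
    # Syntax plus DTD rules: fails, because <subject> is not declared.
    xmllint --valid --noout "$sample" || echo "not valid against its DTD"
else
    echo "xmllint not found; install the libxml2 utilities first" >&2
fi
```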
The xmllint tool may also be used for checking errors in the XML catalogs; see the man page for more information on how to set this behavior.
If you are a Mac OS X or Windows user, you may also want to check out tkxmllint, a GUI version of xmllint. More information is available from: http://tclxml.sourceforge.net/tkxmllint.html.
Example B.4. Debugging example using xmllint
The example below shows how you can use xmllint to check your documents. I've created some errors that I made a lot, as a beginning XML writer. At first, the document doesn't come through, and errors are shown:
    ldp-history.xml:22: error: Opening and ending tag mismatch: articlinfo line 6 and articleinfo
    </articleinfo>
    ^
    ldp-history.xml:37: error: Opening and ending tag mismatch: listitem line 36 and orderedlist
    </orderedlist>
    ^
    ldp-history.xml:39: error: Opening and ending tag mismatch: orderedlist line 34 and sect2
    </sect2>
    ^
    ldp-history.xml:46: error: Opening and ending tag mismatch: sect1 line 41 and para
    for many authors to contribute their part in their area of specialization.</para
    ^
    ldp-history.xml:57: error: Opening and ending tag mismatch: para line 55 and sect1
    </sect1>
    ^
    ldp-history.xml:59: error: Opening and ending tag mismatch: sect2 line 31 and article
    </article>
    ^
    ldp-history.xml:61: error: Premature end of data in tag sect1 line 24
    ^
    ldp-history.xml:61: error: Premature end of data in tag article line 5
    ^
Now, as we already mentioned, don't worry about anything except the first error. The first error says there is an inconsistency between the tags on line 6 and line 22 in the file. Indeed, on line 6 we left out the “e” in “articleinfo”. Fix the error, and run xmllint again. The first complaint now is about the offending line 37, where the closing tag for a list item has been forgotten. Fix the error and run the validation tool again, until all errors are gone. The most common errors include forgetting to open or close a paragraph tag, misspelled tags, and incorrectly nested sections.
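Since only the first error matters on each pass, it can help to trim the output. The helper below is a hypothetical convenience, not part of xmllint. It assumes each error report spans three lines (message, offending source line, caret marker), which matches the output shown above but may differ between versions.

```shell
# Print only the first error report for the given document.
first_error() {
    # xmllint writes its errors to stderr, so merge that into the pipe.
    xmllint --noout "$1" 2>&1 | head -n 3
}

workdir=$(mktemp -d)
doc="$workdir/sample.xml"

# Hypothetical sample with two mistakes: a misspelled tag and an unclosed para.
cat > "$doc" <<'EOF'
<article>
  <articlinfo><title>Sample</title></articleinfo>
  <sect1><para>Unclosed paragraph</sect1>
</article>
EOF

if command -v xmllint >/dev/null 2>&1; then
    first_error "$doc"
else
    echo "xmllint not found; install the libxml2 utilities first" >&2
fi
```

After fixing the reported problem, run first_error on the document again and repeat until it prints nothing.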