- SWI-Prolog SGML/XML parser
library(sgml_write) provides the inverse of
the parser, converting the parser's output back into a file. This
process is fairly simple for XML, but due to the power of the SGML DTD
it is much harder to achieve a reasonable generic result for SGML.
These predicates can write the output in two encoding schemes, depending on the encoding of the Stream. In UTF-8 mode, all characters are encoded using UTF-8 sequences. In ISO Latin-1 mode, characters outside the ISO Latin-1 range are represented using a named character entity if one is provided by the DTD, or a numeric character entity otherwise.
- xml_write(+Stream, +Term, +Options)
- Write the XML header with encoding information and the content of the
document as represented by Term to Stream. This
predicate deals with XML with or without namespaces. If namespace
identifiers are not provided, they are generated. This predicate accepts
the following Options:
- Specify the DTD. In SGML documents the DTD is required to distinguish between elements that are declared empty in the DTD and elements that just happen to have no content. Further optimisation (shortref, omitted tags, etc.) could be considered in the future. The DTD is also used to find the declared named character entities.
- Document type to include in the header. When omitted it is taken from the outer element.
- If Bool is
false, the XML header is suppressed. Useful for embedding in other XML streams.
- Do/do not emit layout characters to make the output readable. The default is
to emit layout. With layout enabled, elements only containing other
elements are written using increasing indentation. Depending on the mode and the defined whitespace handling, this introduces CDATA sequences
consisting only of layout between elements when the output is read back in. If
false, no layout characters are added. As this mode does not need to analyse the document, it is faster and guarantees correct output when read back. Unfortunately, the output is hardly human readable and causes problems with many editors.
- Set the initial element indentation. If it is more than zero, the indent is written before the document.
- Set the initial namespace map. Map is a list of
Name = URI. This option, together with
indent, is intended for using xml_write/3 to generate XML that is embedded in a larger XML document.
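- Use/do not use Null End Tags. For XML, this applies only to
empty elements, so you get <foo/> (default, net(true)) or <foo></foo>
(net(false)). For SGML, this applies to empty elements, so you get
<foo> (if foo is declared to be EMPTY in the DTD), <foo></foo> (default,
net(false)) or <foo// (net(true)). In SGML code, short character content not containing /
can be emitted as <b/xxx/ (net(true)) instead of <b>xxx</b> (default, net(false)).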
- sgml_write(+Stream, +Term, +Options)
- Write the SGML
DOCTYPE header and the content of the document as represented by Term to Stream. The Options are described with xml_write/3.
- html_write(+Stream, +Term, +Options)
- Same as sgml_write/3, but passes the HTML DTD as obtained from dtd/2. The Options are described with xml_write/3.
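The following is a sketch of the tree-based route. It builds a small element/3 DOM and serialises it into a Prolog string with xml_write/3. The sketch assumes that the header and layout options described above are spelled header(Bool) and layout(Bool). The helper names example_dom/1 and dom_to_string/1, and the element names and text, are made up for illustration.

```prolog
:- use_module(library(sgml_write)).

% Hypothetical example DOM in the element(Name, Attributes, Content)
% format produced by the parser.
example_dom(element(note, [lang=en],
                    [ element(to, [], ['Tove']),
                      element(body, [], ['Fish & chips'])
                    ])).

% dom_to_string(-String): hypothetical helper that serialises the
% example DOM without the <?xml ...?> header and without added layout,
% so that the result can be embedded in a larger XML document.
dom_to_string(String) :-
    example_dom(DOM),
    with_output_to(string(String),
                   xml_write(current_output, DOM,
                             [ header(false),   % assumed option name
                               layout(false)    % assumed option name
                             ])).
```

Because the text is stored unescaped in the tree, xml_write/3 is responsible for turning the & in the body into &amp;.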
In most cases, the preferred way to create an XML document is to
create a Prolog tree of
element(Name, Attributes, Content)
terms and call xml_write/3
to write this to a stream. There are some exceptions where one might not
want to pay the price of the intermediate representation. For these
cases, this library contains building blocks for emitting markup data.
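When no intermediate tree is wanted, markup can be written directly, provided every piece of text is escaped. Below is a minimal sketch that assumes ASCII output. It uses a hypothetical helper, emit_leaf/4, which writes one element with a single attribute and text content, escaping the text with the quote predicates described below.

```prolog
:- use_module(library(sgml)).

% emit_leaf(+Name, +AttrName, +AttrValue, +Text): hypothetical helper
% that writes <Name AttrName="AttrValue">Text</Name> to current_output.
% The attribute value and the text are escaped; Name and AttrName are
% written as-is (xml_name/2 can be used to validate them).
emit_leaf(Name, AttrName, AttrValue, Text) :-
    xml_quote_attribute(AttrValue, QValue, ascii),
    xml_quote_cdata(Text, QText, ascii),
    format("<~w ~w=\"~w\">~w</~w>", [Name, AttrName, QValue, QText, Name]).
```

For example, emit_leaf(a, href, 'x?a=1&b="2"', 'Tom & Jerry') writes <a href="x?a=1&amp;b=&quot;2&quot;">Tom &amp; Jerry</a>.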
The quote predicates convert the input text into a version that
contains entities for characters that need to be escaped. These are the
XML meta characters and the characters that cannot be expressed by the
document encoding. Therefore these predicates accept an encoding
argument. Accepted values are
Versions with two arguments are provided for backward compatibility,
making the safe
ascii encoding assumption.
- xml_quote_attribute(+In, -Quoted, +Encoding)
- Map the characters that may not appear in XML attributes to entities.
Currently these are <, >, & and ". (Older versions also mapped
'.) Characters that cannot be represented in Encoding are mapped to XML character entities.
- xml_quote_attribute(+In, -Quoted)
- Backward compatibility version for xml_quote_attribute/3.
- xml_quote_cdata(+In, -Quoted, +Encoding)
- Very similar to xml_quote_attribute/3, but does not quote the single- and double-quotes.
- xml_quote_cdata(+In, -Quoted)
- Backward compatibility version for xml_quote_cdata/3.
- xml_name(+In, +Encoding)
- Succeed if In is an atom or string that satisfies the rules for a valid XML element or attribute name. As with the other predicates in this group, if Encoding cannot represent one of the characters, this predicate fails. Character classification is based on http://www.w3.org/TR/2006/REC-xml-20060816.
- Backward compatibility version for xml_name/2.
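The sketch below contrasts the two quoting predicates and wraps xml_name/2 in a guard. checked_name/2 and quote_both/3 are hypothetical helpers. The sketch uses ascii as the encoding because it is the safe assumption that the two-argument versions also make.

```prolog
:- use_module(library(sgml)).
:- use_module(library(error)).

% checked_name(+Name, +Encoding): hypothetical guard that raises a
% domain_error unless Name is a valid XML element or attribute name
% whose characters can all be represented in Encoding.
checked_name(Name, Encoding) :-
    (   xml_name(Name, Encoding)
    ->  true
    ;   domain_error(xml_name, Name)
    ).

% quote_both(+Text, -Attr, -CData): quote the same text for use as an
% attribute value and as element content. Only the attribute version
% escapes the double quote.
quote_both(Text, Attr, CData) :-
    xml_quote_attribute(Text, Attr, ascii),
    xml_quote_cdata(Text, CData, ascii).
```

For instance, 1item and two words are rejected as names. The non-ASCII name café is also rejected under ascii because Encoding cannot represent the é.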
The predicates in this section translate between values and their lexical forms for XML-Schema data types. They are implemented in C to achieve the best possible performance.
- [det]xsd_number_string(?Number, ?String)
- This predicate is similar to number_string/2,
but accepts floating point numbers according to the XML syntax rather
than the Prolog syntax. In particular, XML does not require a 0 (zero)
before and after the decimal dot and accepts the constants
INF. When a Prolog float is converted into a string, the result is the XML canonical form. This form always has one digit before the decimal dot, at least one digit after it and an exponential component using the capital
E. This predicate behaves as number_string/2 for integers.
Raises syntax_error(xsd_number) if String is given and is not a well-formed XSD number.
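xsd_number_string/2 accepts XML rather than Prolog number syntax, so it can parse lexical forms such as ".5" that number_string/2 rejects. The small sketch below uses a hypothetical helper, xsd_sum/2, which sums a list of XSD numeric strings.

```prolog
:- use_module(library(sgml)).
:- use_module(library(apply)).
:- use_module(library(lists)).

% parse_xsd(+String, -Number): argument-order adapter for maplist/3.
parse_xsd(String, Number) :-
    xsd_number_string(Number, String).

% xsd_sum(+Strings, -Sum): hypothetical helper that parses XSD numeric
% lexical forms (for example ".5" or "1.5E3") and adds them up.
% A malformed string raises syntax_error(xsd_number).
xsd_sum(Strings, Sum) :-
    maplist(parse_xsd, Strings, Numbers),
    sum_list(Numbers, Sum).
```

For example, xsd_sum([".5", "1.5E3", "2"], S) yields a sum equal to 1502.5.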
- [det]xsd_time_string(?DateTime, ?Type, ?String)
- Serialize and deserialize the XSD date and time formats. The conversion
is represented by the table below.

| Prolog term | Type | XSD string |
|---|---|---|
| date(Y,M,D) | xsd:date | YYYY-MM-DD |
| date_time(Y,M,D,H,Mi,S) | xsd:dateTime | YYYY-MM-DDTHH:MM:SS |
| date_time(Y,M,D,H,Mi,S,0) | xsd:dateTime | YYYY-MM-DDTHH:MM:SSZ |
| date_time(Y,M,D,H,Mi,S,TZ) | xsd:dateTime | YYYY-MM-DDTHH:MM:SS[+-]HH:MM |
| time(H,M,S) | xsd:time | HH:MM:SS |
| year_month(Y,M) | xsd:gYearMonth | YYYY-MM |
| month_day(M,D) | xsd:gMonthDay | MM-DD |
| D | xsd:gDay | DD |
| M | xsd:gMonth | MM |
| Y | xsd:gYear | YYYY |
For the Prolog term, all variables denote integers except for S, which represents seconds as either an integer or a float. The TZ argument is the offset from UTC in seconds. The Type is written as xsd:name, but is in fact the full URI of the XSD data type, e.g.,
http://www.w3.org/2001/XMLSchema#date. In the XSD string notation, the letters YMDHS denote digits. The notation SS is either a two-digit integer or a decimal number with two digits before the decimal point, e.g.,
05.3 to denote 5.3 seconds.
For most conversions, Type may be left unbound, in which case it is unified with the resulting type. For ambiguous conversions, Type must be specified or an instantiation_error is raised. When converting from Prolog to XSD serialization, D, M and Y are ambiguous. When converting from XSD serialization to Prolog, only DD and MM are ambiguous. If Type and String are both given and String is a valid XSD date/time representation that does not match Type, a syntax error of the form
syntax_error(Type) is raised. If DateTime and Type are both given and DateTime does not satisfy Type, a domain_error of the form
domain_error(xsd_time(Type), DateTime) is raised.
The domain of numerical values is verified, and a corresponding domain_error exception is raised if the domain is violated. There is no test for the existence of a date, and thus
"2016-02-31" is accepted as valid even though that date does not exist.
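The sketch below exercises a few rows of the table. xsd_type/2 and year_string/2 are hypothetical helpers. The first builds the full type URI, since Type is an atom holding the complete URI rather than a prefixed name. The second passes the type explicitly, because a bare integer such as a year is ambiguous when converting from Prolog.

```prolog
:- use_module(library(sgml)).

% xsd_type(?Local, ?URI): hypothetical helper relating a local XSD type
% name such as date or gYear to the full type URI used as Type.
xsd_type(Local, URI) :-
    atom_concat('http://www.w3.org/2001/XMLSchema#', Local, URI).

% year_string(+Year, -String): a bare integer could be a gDay, gMonth
% or gYear, so the Type must be given; otherwise xsd_time_string/3
% raises an instantiation_error.
year_string(Year, String) :-
    xsd_type(gYear, Type),
    xsd_time_string(Year, Type, String).
```

Parsing "2016-02-31" with this predicate succeeds and yields date(2016,2,31), illustrating that no calendar check is made.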
C14n2 specifies a canonical XML document. This library writes such a document from an XML DOM as returned by the XML (or SGML) parser. The process takes two steps:
- Normalise the DOM
- Call xml_write/3 with appropriate flags
- [det]xml_write_canonical(+Stream, +DOM, +Options)
- Write an XML DOM using the canonical conventions as defined
by C14n2. Namespace declarations in the canonical document depend on the
original namespace declarations. For this reason the input document must
be parsed (see load_structure/3)
using the dialect
xmlns and the option