2 Bluffer's Guide
This package allows you to parse SGML, XML and HTML data into a
Prolog data structure. The high-level interface defined in library(sgml)
provides access at the file level, while the low-level interface defined
in the foreign module works with Prolog streams. Please use the source
of sgml.pl as a starting point for dealing with data from sources other
than files, such as SWI-Prolog resources, network sockets, character
strings, etc. The first example below loads an HTML file.
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<html>
<head>
<title>Demo</title>
</head>
<body>

<h1 align=center>This is a demo</h1>

<p>Paragraphs in HTML need not be closed.

<p>This is called `omitted-tag' handling.
</body>
</html>
?- load_html('test.html', Term, []), pretty_print(Term).

[ element(html, [],
          [ element(head, [],
                    [ element(title, [], [ 'Demo' ])
                    ]),
            element(body, [],
                    [ '\n',
                      element(h1, [ align = center ], [ 'This is a demo' ]),
                      '\n\n',
                      element(p, [], [ 'Paragraphs in HTML need not be closed.\n' ]),
                      element(p, [], [ 'This is called `omitted-tag\' handling.' ])
                    ])
          ])
].
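The introduction points to sgml.pl for sources other than files. As a minimal sketch of that idea, assuming an SWI-Prolog version that provides open_string/2 (7.x and later), a character string can be parsed by opening it as an in-memory stream and handing load_structure/3 a stream(Stream) source. The name parse_xml_string/2 is hypothetical and not part of library(sgml).

```prolog
:- use_module(library(sgml)).

%% parse_xml_string(+Text, -DOM)
%  Hypothetical helper: parses XML held in a string by opening it as an
%  in-memory stream and passing stream(In) to load_structure/3.
%  setup_call_cleanup/3 guarantees the stream is closed even if
%  parsing raises an exception.
parse_xml_string(Text, DOM) :-
    setup_call_cleanup(
        open_string(Text, In),
        load_structure(stream(In), DOM, [dialect(xml)]),
        close(In)).
```

For example, parse_xml_string("<a x='1'>hi</a>", DOM) should yield a DOM of the shape [element(a, Attributes, [hi])]. The same pattern applies to any other Prolog stream, such as a socket stream.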
The document is represented as a list, each element being either an atom
representing CDATA or a term element(Name, Attributes, Content), where
Attributes is a list of Name = Value pairs and Content is again a list in
this format. Entities (e.g. &lt;) are expanded and included in the atom
representing the element content or attribute value.[1]

[1] Up to SWI-Prolog 5.4.x, Prolog could not represent wide characters,
and entities that did not fit in the Prolog character set were emitted
as a term number(+Code). With the introduction of wide characters in the
5.5 branch this is no longer needed.
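Because content is just a list of atoms and element/3 terms, ordinary list recursion is enough to walk it. As a sketch, the hypothetical dom_text/2 below collects every CDATA atom in a content list, at any depth, into a single atom and drops the markup. Treating every other term as contributing no text is an assumption made for brevity.

```prolog
%% dom_text(+Content, -Text)
%  Hypothetical helper: concatenates all CDATA atoms found anywhere in
%  a content list (atoms and element(Name, Attributes, Content) terms).
dom_text(Content, Text) :-
    phrase(cdata(Content), Atoms),
    atomic_list_concat(Atoms, Text).

% cdata(+Content)// emits the CDATA atoms of a content list in order.
cdata([]) --> [].
cdata([Node|Nodes]) --> node(Node), cdata(Nodes).

% An element contributes the text of its children; attributes are ignored.
node(element(_Name, _Attributes, Children)) --> !, cdata(Children).
% An atom is CDATA and is emitted as-is.
node(CDATA) --> { atom(CDATA) }, !, [CDATA].
% Any other term contributes no text (assumption).
node(_Other) --> [].
```

Applied to the body content of the example above, dom_text/2 returns the visible text of the page with the tags removed.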
2.1 'Goodies' Predicates
These predicates are for basic use of the library, converting entire and self-contained files in SGML, HTML, or XML into a structured term. They are based on load_structure/3.
- load_sgml(+Source, -ListOfContent, :Options)
  Calls load_structure/3 with the given Options, using the default
  option dialect(sgml).
- load_xml(+Source, -ListOfContent, :Options)
  Calls load_structure/3 with the given Options, using the default
  option dialect(xml).
- load_html(+Source, -ListOfContent, :Options)
  Calls load_structure/3 with the given Options, using the default
  options dialect(HTMLDialect), where HTMLDialect is html4 or html5
  (default), depending on the Prolog flag html_dialect. Both imply the
  option shorttag(false). The option dtd(DTD) is passed, where DTD is
  the HTML DTD as obtained using dtd(html, DTD). See dtd/2.
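To make these defaults concrete, here is a minimal sketch that spells out the options described for load_html/3 as a direct load_structure/3 call, assuming the html4 dialect. The name my_load_html4/2 is hypothetical, and the real load_html/3 may set further defaults not listed here.

```prolog
:- use_module(library(sgml)).

%% my_load_html4(+Source, -DOM)
%  Hypothetical helper: roughly what load_html/3 does when the Prolog
%  flag html_dialect is html4, following the option list above.
my_load_html4(Source, DOM) :-
    dtd(html, DTD),                 % HTML DTD, as described for dtd/2
    load_structure(Source, DOM,
                   [ dialect(html4),
                     shorttag(false),
                     dtd(DTD)
                   ]).
```

Source can be a file name or a stream(Stream) term, exactly as for load_structure/3.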