3.4 DTD-Handling
The DTD (Document Type Definition) is a separate entity in sgml2pl that can be created, freed, defined and inspected. Like the parser itself, it is filled by opening it as a Prolog output stream and sending data to it. This section summarises the predicates for handling the DTD.
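The life cycle described above (create, fill through an output stream, inspect, free) can be sketched as follows. This is a minimal sketch, assuming the predicates are loaded from library(sgml). The helper name dtd_from_text/3, the demo/0 predicate and the one-element note DTD are hypothetical and serve only as illustration.

```prolog
:- use_module(library(sgml)).

%!  dtd_from_text(+DocType, +Text, -DTD)
%
%   Hypothetical helper: create an empty DTD for DocType and fill
%   it by writing the declarations in Text to the output stream
%   obtained from open_dtd/3. The stream is closed even if writing
%   raises an exception.

dtd_from_text(DocType, Text, DTD) :-
    new_dtd(DocType, DTD),
    setup_call_cleanup(
        open_dtd(DTD, [], Out),
        write(Out, Text),
        close(Out)).

%   Illustrative use: define a DTD, list its elements and free it.
%   The DTD handle must not be used after free_dtd/1.

demo :-
    dtd_from_text(note, '<!ELEMENT note - - (#PCDATA)>', DTD),
    dtd_property(DTD, elements(Elements)),
    format("Elements: ~q~n", [Elements]),
    free_dtd(DTD).
```

Writing through a stream rather than passing a file name means the declarations can come from any source, such as an atom, a network stream or a generated string.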
- new_dtd(+DocType, -DTD)
- Creates an empty DTD for the named DocType. The returned DTD-reference is an opaque term that can be used in the other predicates of this package.
- free_dtd(+DTD)
- Deallocate all resources associated with the DTD. Further use of DTD is invalid.
- load_dtd(+DTD, +File)
- Define the DTD by loading the SGML-DTD file File. Same as load_dtd/3 with an empty option list.
- load_dtd(+DTD, +File, +Options)
- Define the DTD by loading File. Defined options are the dialect option from open_dtd/3 and the encoding option from open/4. Notably, the dialect option must match the dialect used for subsequent parsing using this DTD.
- open_dtd(+DTD, +Options, -OutStream)
- Open a DTD as an output stream. See load_dtd/2 for an example. Defined options are:
- dialect(Dialect)
- Define the DTD dialect. Default is sgml. Using xml or xmlns processes the DTD case-sensitively.
- dtd(+DocType, -DTD)
- Find the DTD representing the indicated doctype. This predicate uses a cache of DTD objects. If a doctype has no associated DTD, it searches for a file using the file search path dtd using the call:
    ...,
    absolute_file_name(dtd(Type),
                       [ extensions([dtd]),
                         access(read)
                       ],
                       DtdFile),
    ...
Note that DTD objects may be modified while processing erroneous documents. For example, loading an SGML document starting with <?xml ...?> switches the DTD to XML mode, and encountering unknown elements adds these elements to the DTD object. Re-using a DTD object to parse multiple documents should be restricted to situations where the documents processed are known to be error-free.
The DTD html is handled separately. The Prolog flag html_dialect specifies the default html dialect, which is either html4 or html5 (default). Note that HTML5 has no DTD. The loaded DTD is an informal DTD that includes most of the HTML5 extensions (http://www.cs.tut.fi/~jkorpela/html5-dtd.html). In addition, the parser sets the dialect flag of the DTD object. This is used by the parser to accept HTML extensions. Next, the corresponding DTD is loaded.
- dtd_property(+DTD, ?Property)
- This predicate is used to examine the content of a DTD. Property is one of:
- doctype(DocType)
- An atom representing the document-type defined by this DTD.
- elements(ListOfElements)
- A list of atoms representing the names of the elements in this DTD.
- element(Name, Omit, Content)
- The DTD contains an element with the given name. Omit is a term of the format omit(OmitOpen, OmitClose), where both arguments are booleans (true or false) representing whether the open- or close-tag may be omitted. Content is the content-model of the element represented as a Prolog term. This term takes the following form:
- empty
- The element has no content.
- cdata
- The element contains non-parsed character data. All data up to the matching end-tag is included in the data (declared content).
- rcdata
- As cdata, but entity-references are expanded.
- any
- The element may contain any number of any element from the DTD in any order.
- #pcdata
- The element contains parsed character data.
- element(A)
- An element with this name.
- *(SubModel)
- 0 or more appearances.
- ?(SubModel)
- 0 or one appearance.
- +(SubModel)
- 1 or more appearances.
- ,(SubModel1, SubModel2)
- SubModel1 followed by SubModel2.
- &(SubModel1, SubModel2)
- SubModel1 and SubModel2 in any order.
- |(SubModel1, SubModel2)
- SubModel1 or SubModel2.
- attributes(Element, ListOfAttributes)
- ListOfAttributes is a list of atoms representing the attributes of the element Element.
- attribute(Element, Attribute, Type, Default)
- Query an attribute of an element. Type is one of cdata, entity, id, idref, name, nmtoken, notation, number or nutoken. For DTD types that allow for a list, the notation list(Type) is used. Finally, the DTD construct (a|b|...) is mapped to the term nameof(ListOfValues). Default describes the SGML default. It is one of required, current, conref or implied. If a real default is present, it is one of default(Value) or fixed(Value).
- entities(ListOfEntities)
- ListOfEntities is a list of atoms representing the names of the defined entities.
- entity(Name, Value)
- Name is the name of an entity with given value. Value is one of
- Atom
- If the value is atomic, it represents the literal value of the entity.
- system(Url)
- Url is the URL of the system external entity.
- public(Id, Url)
- For external public entities, Id is the identifier. If a URL is provided, it is returned in Url. Otherwise this argument is unbound.
- notations(ListOfNotations)
- Returns a list holding the names of all NOTATION declarations.
- notation(Name, Decl)
- Unify Decl with a list of system(+File) and/or public(+PublicId).
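The properties listed above can be combined to print a summary of a DTD. The following is a minimal sketch, assuming library(sgml) provides dtd_property/2 as described. The predicates describe_dtd/1, describe_element/2 and describe_attribute/3 are hypothetical helper names, and the output layout is arbitrary.

```prolog
:- use_module(library(sgml)).
:- use_module(library(lists)).

%!  describe_dtd(+DTD)
%
%   Hypothetical helper: print the doctype of DTD followed by a
%   summary of each element and its attributes.

describe_dtd(DTD) :-
    dtd_property(DTD, doctype(DocType)),
    format("doctype: ~w~n", [DocType]),
    dtd_property(DTD, elements(Elements)),
    forall(member(E, Elements),
           describe_element(DTD, E)).

%   Print tag omission and content model, then the attributes.
%   The fallback branches cover names for which a property query
%   fails, which this sketch assumes may happen.

describe_element(DTD, E) :-
    (   dtd_property(DTD, element(E, omit(OmitOpen, OmitClose), Content))
    ->  format("~w: omit(open=~w, close=~w) content=~q~n",
               [E, OmitOpen, OmitClose, Content])
    ;   format("~w: no element declaration found~n", [E])
    ),
    (   dtd_property(DTD, attributes(E, Attributes))
    ->  forall(member(A, Attributes),
               describe_attribute(DTD, E, A))
    ;   true
    ).

%   Type is a term such as cdata, list(Type) or nameof(Values);
%   Default is a term such as implied or default(Value).

describe_attribute(DTD, E, A) :-
    (   dtd_property(DTD, attribute(E, A, Type, Default))
    ->  format("    ~w: type=~q default=~q~n", [A, Type, Default])
    ;   true
    ).
```

A possible call, assuming the html DTD can be found through the dtd file search path, is `?- dtd(html, DTD), describe_dtd(DTD).`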
3.4.1 The DOCTYPE declaration
As this parser allows for processing partial documents and processes the DTD separately, the DOCTYPE declaration plays a special role.
If a document has no DOCTYPE declaration, the parser returns a list holding all elements and CDATA found. If the document has a DOCTYPE declaration, the parser will open the element defined in the DOCTYPE as soon as the first real data is encountered.
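The difference between the two cases can be explored with a small experiment. This is a minimal sketch, assuming load_structure/3 from library(sgml) accepts a stream(Stream) source and that open_string/2 is available, as in recent SWI-Prolog versions. The helper parse_text/2, the demo/0 predicate and both sample documents are hypothetical.

```prolog
:- use_module(library(sgml)).

%!  parse_text(+Text, -Content)
%
%   Hypothetical helper: parse Text as SGML from an in-memory
%   stream and return the resulting content list.

parse_text(Text, Content) :-
    setup_call_cleanup(
        open_string(Text, In),
        load_structure(stream(In), Content, [dialect(sgml)]),
        close(In)).

%   Parse a fragment without a DOCTYPE and a document whose
%   DOCTYPE names the element p, then print both results so
%   they can be compared.

demo :-
    parse_text("<b>bold</b> and plain text", Partial),
    format("Without DOCTYPE: ~q~n", [Partial]),
    string_concat(
        "<!DOCTYPE p [ <!ELEMENT p O O (#PCDATA|b)*> <!ELEMENT b - - (#PCDATA)> ]>",
        "some <b>bold</b> text",
        Document),
    parse_text(Document, Full),
    format("With DOCTYPE: ~q~n", [Full]).
```

In the second document no explicit p open-tag is present, so the printed result shows whether the parser opened the element named in the DOCTYPE when it reached the first character data.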