SWI-Prolog -- Manual

4.6 library(semweb/rdf_persistency): Providing persistent storage

The library(semweb/rdf_persistency) provides reliable persistent storage for the RDF data. The store uses a directory with files for each source (see rdf_source/1) present in the database. Each source is represented by two files: one in binary format (see rdf_save_db/2) holding the base state, and one written as Prolog terms recording the changes made since the base state. The latter is called the journal.

rdf_attach_db(+Directory, +Options)

Attach Directory as the persistent database. If Directory does not exist it is created. Otherwise all sources defined in the directory are loaded into the RDF database. Loading a source means loading the base state (if any) and replaying the journal (if any). The current implementation does not synchronise triples that are already in the store when a database is attached: they are neither removed from the in-memory database nor added to the persistent store. Different merging options may be supported through the Options argument later. Currently defined options are:

concurrency(+PosInt): Number of threads used to reload databases and journals from the files in Directory. Default is the number of physical CPUs determined by the Prolog flag cpu_count or 1 (one) on systems where this number is unknown. See also concurrent/3.
max_open_journals(+PosInt): The library maintains a pool of open journal files. This option specifies the size of this pool. The default is 10. Raising the option can make sense if many writes occur on many different named graphs. The value can be lowered for scenarios where write operations are very infrequent.
silent(Boolean): If true, suppress loading messages from rdf_attach_db/2.
log_nested_transactions(Boolean): If true, nested log transactions are added to the journal information. By default (false), no log-term is added for nested transactions.

The database is locked against concurrent access using a file lock in Directory. An attempt to attach to a locked database raises a permission_error exception. The error context contains a term rdf_locked(Args), where Args is a list containing time(Stamp) and pid(PID). The error can be caught by the application. Otherwise it prints:

ERROR: No permission to lock rdf_db `/home/jan/src/pl/packages/semweb/DB'
ERROR: locked at Wed Jun 27 15:37:35 2007 by process id 1748
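As an illustration of attaching a store and handling the lock error, here is a minimal sketch. The helper name attach_store/1 and the option values are assumptions chosen for the example; the handler matches any permission_error and prints its context rather than relying on the exact shape of the context term.

```prolog
:- use_module(library(semweb/rdf_db)).
:- use_module(library(semweb/rdf_persistency)).

%!  attach_store(+Dir) is semidet.
%
%   Hypothetical helper: attach Dir as the persistent store.
%   Prints a warning and fails if the attach raises a
%   permission_error, e.g. because another process holds the lock.
attach_store(Dir) :-
    catch(rdf_attach_db(Dir,
                        [ concurrency(2),         % example value
                          max_open_journals(20),  % example value
                          silent(true)
                        ]),
          error(permission_error(Action, Type, Culprit), Context),
          ( print_message(warning,
                          format("Cannot ~w ~w ~p: ~p",
                                 [Action, Type, Culprit, Context])),
            fail
          )).
```

Failing instead of re-throwing lets the caller decide whether to retry, pick another directory, or continue with a purely in-memory database.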

rdf_detach_db

Detaches the persistent store. No triples are removed from the RDF triple store.

rdf_current_db(-Directory)

Unify Directory with the current database directory. Fails if no persistent database is attached.
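Because rdf_current_db/1 fails when nothing is attached, it can guard an attach call. The following is a sketch only; ensure_attached/1 is a hypothetical helper name.

```prolog
:- use_module(library(semweb/rdf_db)).
:- use_module(library(semweb/rdf_persistency)).

%!  ensure_attached(+Dir) is det.
%
%   Hypothetical helper: attach Dir unless a persistent store is
%   already attached, in which case only report the current one.
ensure_attached(Dir) :-
    (   rdf_current_db(Current)
    ->  print_message(informational,
                      format("Already attached to ~w", [Current]))
    ;   rdf_attach_db(Dir, [silent(true)])
    ).
```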

rdf_persistency(+DB, +Bool)

Change persistency of a named database (4th argument of rdf/4). By default all databases are persistent. If Bool is false, the journal and snapshot for the database are deleted and further changes to triples associated with DB are not recorded. If Bool is true, a snapshot is created for the current state and further modifications are monitored. Switching persistency does not affect the triples in the in-memory RDF database.
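A typical use is a scratch graph whose triples should not survive a restart. The sketch below assumes a persistent store is already attached; the graph name scratch and the helper assert_scratch/3 are made up for the example.

```prolog
:- use_module(library(semweb/rdf_db)).
:- use_module(library(semweb/rdf_persistency)).

%!  assert_scratch(+S, +P, +O) is det.
%
%   Hypothetical helper: add a triple to the graph `scratch`
%   (an assumed name) after switching persistency off for it,
%   so the change is not written to a journal.
assert_scratch(S, P, O) :-
    rdf_persistency(scratch, false),
    rdf_assert(S, P, O, scratch).
```

The triple remains visible to rdf/3 and rdf/4 as usual; only the on-disk recording is disabled.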

rdf_flush_journals(+Options)

Flush dirty journals. With the option min_size(KB) only journals larger than KB Kbytes are merged with the base state. Flushing a journal takes the following steps, ensuring a stable state can be recovered at any moment.

Save the current database in a new file using the extension .new.
On success, delete the journal.
On success, atomically move the .new file over the base state.

Note that journals are not merged automatically for two reasons. First of all, some applications may decide never to merge as the journal contains a complete changelog of the database. Second, merging large databases can be slow and the application may wish to schedule such actions at quiet times or scheduled maintenance periods.
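Since merging is left to the application, a maintenance job might call rdf_flush_journals/1 explicitly. This is a sketch; the predicate name merge_large_journals/0 and the 1024 KB threshold are assumptions, not recommendations from the library.

```prolog
:- use_module(library(semweb/rdf_db)).
:- use_module(library(semweb/rdf_persistency)).

%!  merge_large_journals is det.
%
%   Hypothetical maintenance entry point: merge only journals
%   larger than 1024 KB (an arbitrary example threshold) into
%   their base state, leaving small journals untouched.
merge_large_journals :-
    rdf_flush_journals([min_size(1024)]).
```

An application would call this from whatever scheduling mechanism it already uses for quiet-time maintenance.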

4.6.1 Enriching the journals

The above predicates suffice for most applications. The predicates in this section provide access to the journal files and the base state files and are intended to provide additional services, such as reasoning about the journals, loaded files, etc. (A library, library(rdf_history), is under development that exploits these features to support wiki-style editing of RDF.)

Using rdf_transaction(Goal, log(Message)), we can add additional records to enrich the journal of affected databases with Message and some additional bookkeeping information. Such a transaction adds a term begin(Id, Nest, Time, Message) before the change operations on each affected database and end(Id, Nest, Affected) after the change operations. Here is an example call and the resulting content of the journal file mydb.jrn. A full explanation of the terms that appear in the journal is in the description of rdf_journal_file/2.

?- rdf_transaction(rdf_assert(s,p,o,mydb), log(by(jan))).

start([time(1183540570)]).
begin(1, 0, 1183540570.36, by(jan)).
assert(s, p, o).
end(1, 0, []).
end([time(1183540578)]).

Using rdf_transaction(Goal, log(Message, DB)), where DB is an atom denoting a (possibly empty) named graph, the system guarantees that a non-empty transaction will leave a (possibly empty) transaction record in DB. This feature assumes named graphs are named after the user making the changes. If a user action does not affect the user's own graph, such as deleting a triple from another graph, a record of the action still appears in the journal of that user's graph, so that journal lists all actions performed by the user.
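A sketch of the log(Message, DB) form, assuming graphs are named after users as described. The helper delete_as/4 and the message term delete(S, P, O) are hypothetical; any term can serve as Message.

```prolog
:- use_module(library(semweb/rdf_db)).
:- use_module(library(semweb/rdf_persistency)).

%!  delete_as(+User, +S, +P, +O) is det.
%
%   Hypothetical helper: retract all matching triples from any
%   graph, asking for a transaction record in the graph named
%   User even when none of the triples lived in that graph.
delete_as(User, S, P, O) :-
    rdf_transaction(rdf_retractall(S, P, O),
                    log(delete(S, P, O), User)).
```

For example, delete_as(jan, s, p, o) removes the triple wherever it is stored and attributes the action to the graph jan.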

rdf_journal_file(?DB, ?JournalFile)

True if JournalFile is the absolute file name of the existing journal file for named graph DB. A journal file contains a sequence of Prolog terms of the following format. (Future versions of this library may use an XML-based, language-neutral format.)

start(Attributes): Journal has been opened. Currently Attributes contains a term time(Stamp).
end(Attributes): Journal was closed. Currently Attributes contains a term time(Stamp).
assert(Subject, Predicate, Object): A triple {Subject, Predicate, Object} was added to the database.
assert(Subject, Predicate, Object, Line): A triple {Subject, Predicate, Object} was added to the database with given Line context.
retract(Subject, Predicate, Object): A triple {Subject, Predicate, Object} was deleted from the database. Note that an rdf_retractall/3 call can retract multiple triples. Each of them has a record in the journal. This allows for 'undo'.
retract(Subject, Predicate, Object, Line): Same as above, for a triple with associated line info.
update(Subject, Predicate, Object, Action): See rdf_update/4.
begin(Id, Nest, Time, Message): Added before the changes in each database affected by a transaction with transaction identifier log(Message). Id is an integer counting the logged transactions to this database. Numbers are increasing and designed for binary search within the journal file. Nest is the nesting level, where 0 is a toplevel transaction. Time is a time-stamp, currently using float notation with two fractional digits. Message is the term provided by the user as argument of the log(Message) transaction.
end(Id, Nest, Others): Added after the changes in each database affected by a transaction with transaction identifier log(Message). Id and Nest match the begin-term. Others gives a list of other databases affected by this transaction and the Id of these records. The terms in this list have the format DB:Id.
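Because the journal is a plain sequence of Prolog terms, it can be inspected with the standard reading predicates. The sketch below collects the toplevel begin/4 records of a graph. The predicate names and the entry/3 wrapper are hypothetical, and reading the file as UTF-8 is an assumption about how the journal is written.

```prolog
:- use_module(library(semweb/rdf_persistency)).
:- use_module(library(readutil)).
:- use_module(library(lists)).

%!  toplevel_log_entries(+Terms, -Entries) is det.
%
%   Hypothetical helper: select begin/4 terms with nesting
%   level 0 from a list of journal terms.
toplevel_log_entries(Terms, Entries) :-
    findall(entry(Id, Time, Message),
            member(begin(Id, 0, Time, Message), Terms),
            Entries).

%!  graph_log_entries(+DB, -Entries) is semidet.
%
%   Read the journal of graph DB and return its toplevel log
%   entries. Fails if DB has no journal file.
graph_log_entries(DB, Entries) :-
    once(rdf_journal_file(DB, File)),
    read_file_to_terms(File, Terms,
                       [encoding(utf8)]),  % assumed encoding
    toplevel_log_entries(Terms, Entries).
```

Applied to the terms of the mydb.jrn example shown earlier, toplevel_log_entries/2 yields [entry(1, 1183540570.36, by(jan))].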

rdf_db_to_file(?DB, ?FileBase)

Convert between DB (see rdf_source/1) and the base file name used for storing information on this database. The full file is located in the directory described by rdf_current_db/1 and has the extension .trp for the base state and .jrn for the journal.
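Following the description above, the two file names for a graph can be computed as sketched here. graph_files/3 is a hypothetical helper and assumes the flat directory layout described in this section; it does not check that the files exist.

```prolog
:- use_module(library(semweb/rdf_db)).
:- use_module(library(semweb/rdf_persistency)).
:- use_module(library(filesex)).

%!  graph_files(+DB, -BaseState, -Journal) is semidet.
%
%   Hypothetical helper: compute the expected base state (.trp)
%   and journal (.jrn) file names for graph DB. Fails if no
%   persistent store is attached.
graph_files(DB, BaseState, Journal) :-
    rdf_current_db(Dir),
    rdf_db_to_file(DB, Base),
    directory_file_path(Dir, Base, Path),
    atom_concat(Path, '.trp', BaseState),
    atom_concat(Path, '.jrn', Journal).
```

To find a journal that actually exists on disk, rdf_journal_file/2 is the more direct route.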