- SWI-Prolog Semantic Web Library 3.0
4.6 library(semweb/rdf_persistency): Providing persistent storage
The library(semweb/rdf_persistency)
provides reliable persistent storage for the RDF data. The store uses a
directory with files for each source (see rdf_source/1)
present in the database. Each source is represented by two files, one in
binary format (see rdf_save_db/2)
representing the base state and one represented as Prolog terms
representing the changes made since the base state. The latter is called
the journal.
- rdf_attach_db(+Directory, +Options)
- Attach Directory as the persistent database. If Directory
does not exist it is created. Otherwise all sources defined in the
directory are loaded into the RDF database. Loading a source means
loading the base state (if any) and replaying the journal (if any). The
current implementation does not synchronise triples that are in the
store before attaching a database. They are not removed from the
database, nor added to the persistent store. Different merging options
may be supported through the Options argument later.
Currently defined options are:
- concurrency(+PosInt)
- Number of threads used to reload databases and journals from the files in Directory. Default is the number of physical CPUs determined by the Prolog flag cpu_count or 1 (one) on systems where this number is unknown. See also concurrent/3.
- max_open_journals(+PosInt)
- The library maintains a pool of open journal files. This option specifies the size of this pool. The default is 10. Raising the option can make sense if many writes occur on many different named graphs. The value can be lowered for scenarios where write operations are very infrequent.
- silent(Boolean)
- If true, suppress loading messages from rdf_attach_db/2.
- log_nested_transactions(Boolean)
- If true, nested log transactions are added to the journal information. By default (false), no log-term is added for nested transactions.
The database is locked against concurrent access using a file lock in Directory. An attempt to attach to a locked database raises a permission_error exception. The error context contains a term rdf_locked(Args), where Args is a list containing time(Stamp) and pid(PID). The error can be caught by the application. Otherwise it prints:
ERROR: No permission to lock rdf_db `/home/jan/src/pl/packages/semweb/DB'
ERROR: locked at Wed Jun 27 15:37:35 2007 by process id 1748
- rdf_detach_db
- Detaches the persistent store. No triples are removed from the RDF triple store.
- rdf_current_db(-Directory)
- Unify Directory with the current database directory. Fails if no persistent database is attached.
- rdf_persistency(+DB, +Bool)
- Change persistency of a named database (4th argument of rdf/4). By default all databases are persistent. Using false, the journal and snapshot for the database are deleted and further changes to triples associated with DB are not recorded. If Bool is true, a snapshot is created for the current state and further modifications are monitored. Switching persistency does not affect the triples in the in-memory RDF database.
- rdf_flush_journals(+Options)
- Flush dirty journals. With the option min_size(KB) only journals larger than KB Kbytes are merged with the base state. Flushing a journal takes the following steps, ensuring a stable state can be recovered at any moment.
- Save the current database in a new file using the extension .new.
- On success, delete the journal.
- On success, atomically move the .new file over the base state.
Note that journals are not merged automatically for two reasons. First of all, some applications may decide never to merge as the journal contains a complete changelog of the database. Second, merging large databases can be slow and the application may wish to schedule such actions at quiet times or scheduled maintenance periods.
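The predicates above can be combined into a small store lifecycle. The following is a minimal sketch, not part of the library: the helper names open_store/1, mark_scratch/1 and nightly_maintenance/0 and the option values are illustrative assumptions. It also assumes that attaching a locked store surfaces as an ISO error(permission_error(_, _, _), Context) term, as the description of rdf_attach_db/2 suggests.

```prolog
:- use_module(library(semweb/rdf_db)).
:- use_module(library(semweb/rdf_persistency)).

% open_store(+Dir): hypothetical helper. Attach Dir quietly. If the
% store is locked, print a warning and fail instead of propagating
% the permission error.
open_store(Dir) :-
    catch(rdf_attach_db(Dir, [silent(true), max_open_journals(20)]),
          error(permission_error(Action, Type, Culprit), Context),
          ( print_message(warning,
                          format("Cannot ~w ~w ~w: ~p",
                                 [Action, Type, Culprit, Context])),
            fail
          )).

% mark_scratch(+Graph): hypothetical helper. Stop recording changes to
% a graph that only holds temporary triples.
mark_scratch(Graph) :-
    rdf_persistency(Graph, false).

% nightly_maintenance: hypothetical helper. Merge journals larger than
% 1024 Kbytes into their base state; smaller journals are left alone.
nightly_maintenance :-
    rdf_flush_journals([min_size(1024)]).
```

An application could call open_store/1 at startup, nightly_maintenance/0 from a scheduled job during a quiet period, and rdf_detach_db/0 on shutdown.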
4.6.1 Enriching the journals
The above predicates suffice for most applications. The predicates in
this section provide access to the journal files and the base state
files and are intended to provide additional services, such as reasoning
about the journals, loaded files, etc. (A library library(rdf_history)
is under development exploiting these features, supporting wiki-style
editing of RDF.)
Using rdf_transaction(Goal, log(Message)), we can add additional
records to enrich the journal of affected databases with Message and
some additional bookkeeping information. Such a transaction adds a term
begin(Id, Nest, Time, Message) before the change operations on each
affected database and end(Id, Nest, Affected) after the change
operations. Here is an example call and content of the journal file
mydb.jrn. A full explanation of the terms that appear in the journal is
in the description of rdf_journal_file/2.
?- rdf_transaction(rdf_assert(s,p,o,mydb), log(by(jan))).
start([time(1183540570)]).
begin(1, 0, 1183540570.36, by(jan)).
assert(s, p, o).
end(1, 0, []).
end([time(1183540578)]).
Using rdf_transaction(Goal, log(Message, DB)), where DB is an atom
denoting a (possibly empty) named graph, the system guarantees that a
non-empty transaction will leave a possibly empty transaction record in
DB. This feature assumes named graphs are named after the user making
the changes. If a user action does not affect the user's graph, such as
deleting a triple from another graph, we still find a record of all
actions performed by some user in the journal of that user.
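The two logging forms can be wrapped in small helpers. This is a sketch under the assumption stated in the text that each graph is named after the user who edits it; the helper names user_assert/4 and user_retract/5 and the message terms assert_by/1 and retract_by/1 are hypothetical.

```prolog
:- use_module(library(semweb/rdf_db)).
:- use_module(library(semweb/rdf_persistency)).

% user_assert(+User, +S, +P, +O): hypothetical helper. Add a triple to
% the graph named after User and log who made the change.
user_assert(User, S, P, O) :-
    rdf_transaction(rdf_assert(S, P, O, User),
                    log(assert_by(User))).

% user_retract(+User, +S, +P, +O, +Graph): hypothetical helper. Remove
% matching triples from Graph, which may belong to another user, and
% ask for a transaction record in the journal of User's own graph.
user_retract(User, S, P, O, Graph) :-
    rdf_transaction(rdf_retractall(S, P, O, Graph),
                    log(retract_by(User), User)).
```

Journal records are only written while a persistent store is attached; without one, both helpers still update the in-memory database.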
- rdf_journal_file(?DB, ?JournalFile)
- True if JournalFile is the absolute file name of the journal of an existing named graph DB. A journal file contains a sequence of Prolog terms of the following format. (Future versions of this library may use an XML-based, language-neutral format.)
- start(Attributes)
- Journal has been opened. Currently Attributes contains a term time(Stamp).
- end(Attributes)
- Journal was closed. Currently Attributes contains a term time(Stamp).
- assert(Subject, Predicate, Object)
- A triple {Subject, Predicate, Object} was added to the database.
- assert(Subject, Predicate, Object, Line)
- A triple {Subject, Predicate, Object} was added to the database with given Line context.
- retract(Subject, Predicate, Object)
- A triple {Subject, Predicate, Object} was deleted from the database. Note that an rdf_retractall/3 call can retract multiple triples. Each of them has a record in the journal. This allows for "undo".
- retract(Subject, Predicate, Object, Line)
- Same as above, for a triple with associated line info.
- update(Subject, Predicate, Object, Action)
- See rdf_update/4.
- begin(Id, Nest, Time, Message)
- Added before the changes in each database affected by a transaction with transaction identifier log(Message). Id is an integer counting the logged transactions to this database. Numbers are increasing and designed for binary search within the journal file. Nest is the nesting level, where 0 is a toplevel transaction. Time is a time-stamp, currently using float notation with two fractional digits. Message is the term provided by the user as argument of the log(Message) transaction.
- end(Id, Nest, Others)
- Added after the changes in each database affected by a transaction with transaction identifier log(Message). Id and Nest match the begin-term. Others gives a list of other databases affected by this transaction and the Id of these records. The terms in this list have the format DB:Id.
- rdf_db_to_file(?DB, ?FileBase)
- Convert between DB (see rdf_source/1) and the base name of the file used for storing information on this database. The full file is located in the directory described by rdf_current_db/1 and has the extension .trp for the base state and .jrn for the journal.
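Because a journal is a sequence of plain Prolog terms, it can be inspected with the standard term reader. The sketch below lists the logged toplevel transactions in a journal file. The predicate names journal_terms/2 and logged_transactions/2 are hypothetical, and the code assumes the journal is readable with the default stream encoding and is not being written while it is read.

```prolog
% journal_terms(+File, -Terms): hypothetical helper. Read all terms
% from a journal file into a list.
journal_terms(File, Terms) :-
    setup_call_cleanup(
        open(File, read, In),
        read_all_terms(In, Terms),
        close(In)).

read_all_terms(In, Terms) :-
    read_term(In, Term, []),
    (   Term == end_of_file
    ->  Terms = []
    ;   Terms = [Term|Rest],
        read_all_terms(In, Rest)
    ).

% logged_transactions(+File, -Log): hypothetical helper. Log is a list
% of Id-Message pairs, one for each begin/4 term with nesting level 0.
logged_transactions(File, Log) :-
    journal_terms(File, Terms),
    findall(Id-Message,
            member(begin(Id, 0, _Time, Message), Terms),
            Log).
```

For the mydb.jrn example shown earlier, rdf_journal_file(mydb, File), logged_transactions(File, Log) would bind Log to [1-by(jan)].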