Samizdat RDF Storage
====================

Mapping RDF Model to SQL Schema
-------------------------------

Resources

The core representation of Samizdat content is RDF. The canonical uriref
for any Samizdat resource is http://<site-url>/<resource-id>, where
<site-url> is the base URL of the site and <resource-id> is the numeric
id of the resource, unique within the site.

   Comment: RDF requires that a uriref maps one-to-one to its resource,
   but it does not mandate that the resource be accessible by its
   uriref; in Samizdat, a resource uriref points to the location where
   the resource was first published, and not necessarily to where it
   can be located at any time in the future.
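
As a minimal sketch of the uriref convention described above, the
mapping between resource ids and canonical urirefs could be written in
Python as follows. The function names are illustrative assumptions and
are not part of Samizdat.

```python
def resource_uriref(site_url: str, resource_id: int) -> str:
    """Build the canonical uriref <site-url>/<resource-id>."""
    return f"{site_url.rstrip('/')}/{resource_id}"


def resource_id_from_uriref(site_url: str, uriref: str):
    """Return the numeric id if the uriref belongs to this site, else None."""
    prefix = site_url.rstrip("/") + "/"
    tail = uriref[len(prefix):] if uriref.startswith(prefix) else ""
    # Only a purely numeric tail denotes an internal resource id.
    return int(tail) if tail.isascii() and tail.isdigit() else None
```

Any uriref that does not resolve to a numeric id under the site base
URL would be treated as an external resource in this sketch.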

The basis of the SQL representation of RDF resources is the "Resource"
table, with an "id" primary key field holding <resource-id> and a
"label" field holding the resource label. The semantics of the label
value depend on the resource type and its relation to the site.

A literal value (including typed literals) is stored directly in the
"label" field and marked with the "literal" boolean field. The label of
an external resource is its uriref, marked with the "uriref" boolean
field. An internal resource is mapped to the table whose name
corresponds to the resource class name stored in the "label" field; in
that table, the "id" field references back to the "Resource" table, and
the other fields hold values of internal properties for this resource
class, represented as literals or as references to other resources
stored in the "Resource" table.
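
The three kinds of "Resource" records can be illustrated with a small
sketch using Python's sqlite3 module. The column types, the "Message"
class table and its fields are assumptions made for illustration; the
authoritative schema is the one in "database/create.sql".

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE Resource (
        id      INTEGER PRIMARY KEY,
        literal BOOLEAN DEFAULT 0,  -- label holds a literal value
        uriref  BOOLEAN DEFAULT 0,  -- label holds an external uriref
        label   TEXT                -- otherwise: internal class name
    );
    -- Hypothetical table for an internal resource class.
    CREATE TABLE Message (
        id      INTEGER PRIMARY KEY REFERENCES Resource (id),
        title   TEXT,                             -- literal property
        creator INTEGER REFERENCES Resource (id)  -- resource reference
    );
""")


def describe(resource_id):
    """Classify a Resource record as literal, external, or internal."""
    literal, uriref, label = db.execute(
        "SELECT literal, uriref, label FROM Resource WHERE id = ?",
        (resource_id,)).fetchone()
    if literal:
        return ("literal", label)
    if uriref:
        return ("external", label)
    return ("internal", label)  # label names the class-specific table


db.execute("INSERT INTO Resource (id, literal, label)"
           " VALUES (1, 1, 'Some text.')")
db.execute("INSERT INTO Resource (id, uriref, label)"
           " VALUES (2, 1, 'http://purl.org/dc/elements/1.1/title')")
db.execute("INSERT INTO Resource (id, label) VALUES (3, 'Message')")
db.execute("INSERT INTO Message (id, title) VALUES (3, 'Test Message')")
```

Here describe(3) reports an internal resource of class "Message",
whose internal properties are then read from the "Message" table.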

Additional information about internal resources and all information
about non-internal resources is extracted from statements about
properties of the resource, as described below.

Properties

RDF statements about properties of resources are represented in SQL
either as field values in a record of the table corresponding to the
resource class, with the id of the record referencing the
<resource-id> record in the "Resource" table, or as {subject,
predicate, object} triples in the "Statement" table. Since every such
triple is treated as a reified statement in RDF semantics, i.e. as a
resource of its own, it is also assigned a <resource-id> and a record
in the "Resource" table.
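
The following sketch shows one way a triple could be stored so that the
statement itself becomes a resource. It again uses sqlite3; the column
types, the helper names, and the choice of "Statement" as the label of
the statement's own "Resource" record are assumptions, and the focus
uriref is treated as external purely for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE Resource (
        id INTEGER PRIMARY KEY, literal BOOLEAN DEFAULT 0,
        uriref BOOLEAN DEFAULT 0, label TEXT
    );
    CREATE TABLE Statement (
        id        INTEGER PRIMARY KEY REFERENCES Resource (id),
        subject   INTEGER NOT NULL REFERENCES Resource (id),
        predicate INTEGER NOT NULL REFERENCES Resource (id),
        object    INTEGER NOT NULL REFERENCES Resource (id)
    );
""")


def add_resource(label, literal=False, uriref=False):
    """Insert a Resource record and return its <resource-id>."""
    cur = db.execute(
        "INSERT INTO Resource (literal, uriref, label) VALUES (?, ?, ?)",
        (int(literal), int(uriref), label))
    return cur.lastrowid


def add_statement(subject, predicate, obj):
    """Store a triple; the reified statement gets its own resource id."""
    stmt_id = add_resource("Statement")
    db.execute("INSERT INTO Statement VALUES (?, ?, ?, ?)",
               (stmt_id, subject, predicate, obj))
    return stmt_id


msg = add_resource("Message")
relation = add_resource("http://purl.org/dc/elements/1.1/relation",
                        uriref=True)
quality = add_resource("http://www.nongnu.org/samizdat/rdf/focus#Quality",
                       uriref=True)
stmt = add_statement(msg, relation, quality)
```

Because stmt is itself a resource id, further statements (such as a
rating) can use it as their subject.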

To determine what information about an internal resource can be stored
in and extracted from a class-specific table, Samizdat storage should
consult the site-specific mapping of RDF property names to SQL table
and field names.

The default mapping of internal properties is located in the "Map"
section of the "config.yaml" file. This mapping should be kept
consistent with the database schema ("database/create.sql"), and should
be backed up with constraints and triggers that maintain references
from internal resource tables to the "Resource" table, as well as
relations between resources as defined by the RDF schema described in
the Concepts document.
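
To make the role of the property mapping concrete, here is a minimal
Python sketch of such a lookup. The dictionary contents, the table and
field names, and the function name are hypothetical; the actual entries
live in the "Map" section of "config.yaml".

```python
# Hypothetical excerpt of a property map: RDF property -> (table, field).
PROPERTY_MAP = {
    "http://purl.org/dc/elements/1.1/title": ("Message", "title"),
    "http://purl.org/dc/elements/1.1/creator": ("Message", "creator"),
    "http://www.nongnu.org/samizdat/rdf/schema#fullName":
        ("Member", "full_name"),
}


def locate_property(predicate):
    """Return (table, field) for a mapped internal property; any other
    property falls back to a triple in the Statement table."""
    return PROPERTY_MAP.get(predicate, ("Statement", "object"))
```

A property absent from the map is not an error in this sketch: it is
simply stored and looked up as a {subject, predicate, object} triple.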


RDF Query Language
------------------

The RDF graph of all resources and properties stored by the site (the
site knowledge base, or site KB) can be queried using an RDF query
language. Samizdat storage uses Squish as the basis for its query
language and extends it with facilities for adding information to the
site KB, and with some SQL-level constructs for enhanced access to
internal resources stored in relational tables.

Synopsis

    SELECT ?node [, ...]
    WHERE (predicate subject object [ FILTER condition ]) [...]
    [ OPTIONAL (predicate subject object [ FILTER condition ]) [...] ]
    [ LITERAL condition ]
    [ ORDER BY expression ]
    [ USING prefix FOR namespace [...] ]

Must-bind List

The SELECT section of a query contains a comma-separated list of one or
more blank nodes that must be grounded to resources or literals in the
query answer.

Query Pattern

The main part of a Squish query is the query pattern, recorded as a
sequence of "(predicate subject object)" triples in the WHERE section.
The query pattern defines an RDF graph that is matched against the site
KB to resolve the query.

The query pattern must include at least one mandatory triple. It may
also include optional triples following the OPTIONAL keyword: if the
optional part of the pattern does not match, it creates no bindings but
does not eliminate the solution. All blank nodes from the must-bind
list must be defined in the mandatory part of the pattern.
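
The difference between mandatory and optional triples can be shown with
a small in-memory matcher in Python. This is only an illustration of
the semantics over a toy list of triples, not the way Samizdat resolves
queries against SQL tables; the function names and the sample data are
assumptions.

```python
def match(pattern, triples, binding=None):
    """Yield every binding of ?nodes under which all pattern triples
    occur in `triples` (both in (predicate, subject, object) order)."""
    binding = binding or {}
    if not pattern:
        yield binding
        return
    first, rest = pattern[0], pattern[1:]
    for triple in triples:
        new = dict(binding)
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the blank node, or check an existing binding.
                if new.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield from match(rest, triples, new)


def solutions(mandatory, optional, triples):
    """Extend each mandatory match with optional bindings when possible."""
    result = []
    for binding in match(mandatory, triples):
        # A failed optional part adds no bindings but keeps the solution.
        result.extend(list(match(optional, triples, binding)) or [binding])
    return result


kb = [("dc::title", "msg1", "Hello"),
      ("dc::title", "msg2", "World"),
      ("s::rating", "msg1", "2")]
answers = solutions([("dc::title", "?msg", "?title")],
                    [("s::rating", "?msg", "?rating")], kb)
```

In this toy data only msg1 has a rating, so its solution gains a
?rating binding, while msg2 is still returned without one.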

Each triple in the pattern may be followed by a FILTER condition that is
imposed on the triple during query resolution (see also "SQL-level
Constructs" below).

Following Squish syntax, blank node names are denoted by a question mark
prefix, and namespace prefixes defined in the USING section are expanded
to produce full resource urirefs from the shortcut <prefix>::<name>
form.
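
Prefix expansion amounts to string concatenation of the namespace and
the local name. A minimal Python sketch, in which the function name and
the dictionary representation of the USING section are assumptions:

```python
import re


def expand_prefixes(node, namespaces):
    """Expand <prefix>::<name> into a full uriref using the USING
    namespaces; blank nodes and unknown prefixes are left untouched."""
    found = re.fullmatch(r"(\w+)::(\S+)", node)
    if found and found.group(1) in namespaces:
        return namespaces[found.group(1)] + found.group(2)
    return node


namespaces = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "base": "http://localhost/samizdat/",
}
```

With these namespaces, dc::title expands to
http://purl.org/dc/elements/1.1/title and base::5 expands to
http://localhost/samizdat/5, while ?msg is returned unchanged.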

Query Result

In response to an RDF query, Samizdat storage should return a sequence
or stream of SQL tuples representing all possible query answers. Each
query answer is a binding of all must-bind blank nodes to literals and
resources which, when applied to the query pattern, produces an answer
pattern entailed by the site KB.

SQL-level Constructs

Samizdat extends Squish query syntax with two optional sections
containing constructs in SQL syntax: a conditional expression in the
LITERAL section may be used to place arbitrary constraints on the values
of query pattern blank nodes; the ORDER BY section may be used to sort
query answers in ascending or descending order by the value of any blank
node or by an expression derived from blank node values.

In the current implementation, both sections are executed directly at
the database level and rely on the underlying database capabilities for
type conversion and operations on literals; in literal context,
resources are represented by their integer "id" reference to the
"Resource" table.
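
Since these sections are passed to the database, one plausible step is
to replace each blank node name by the SQL column it is bound to before
the expression is embedded in the generated SQL. The sketch below
assumes such a substitution; the function name and the column name
"stmt.rating" are hypothetical and do not describe the actual
implementation.

```python
import re


def substitute_nodes(expression, columns):
    """Replace every ?node in an SQL-level expression by the column it
    is bound to; an unbound node raises KeyError."""
    return re.sub(r"\?(\w+)", lambda m: columns[m.group(0)], expression)


columns = {"?rating": "stmt.rating"}
where_clause = substitute_nodes("?rating >= -1", columns)
order_clause = substitute_nodes("?rating DESC", columns)
```

The rest of each expression is left as written, which is why type
conversion and operators behave exactly as the underlying database
defines them.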

Example

    SELECT ?msg, ?title, ?name, ?date, ?rating
    WHERE (dc::title ?msg ?title)
          (dc::creator ?msg ?creator)
          (s::fullName ?creator ?name)
          (dc::date ?msg ?date)
          (rdf::subject ?stmt ?msg)
          (rdf::predicate ?stmt dc::relation)
          (rdf::object ?stmt focus::Quality)
          (s::rating ?stmt ?rating)
    LITERAL ?rating >= -1
    ORDER BY ?rating DESC
    USING rdf FOR http://www.w3.org/1999/02/22-rdf-syntax-ns#
          dc FOR http://purl.org/dc/elements/1.1/
          s FOR http://www.nongnu.org/samizdat/rdf/schema#
          focus FOR http://www.nongnu.org/samizdat/rdf/focus#


RDF Data Manipulation Language
------------------------------

Unlike SQL, Squish operates on the whole site KB and can span multiple
relations. Thus, a Samizdat Squish data manipulation (assertion)
statement can include both INSERT and UPDATE sections, and can insert
several resources and update property values of existing and newly
created resources, all in one query.

Synopsis

    [ INSERT node [, ...] ]
    [ UPDATE node = value [, ...] ]
    WHERE (predicate subject object [ FILTER condition ]) [...]
    [ USING prefix FOR namespace [...] ]

Inserted Nodes List

The INSERT section contains a comma-separated list of blank nodes in
the assertion that must not be grounded to resources already present in
the site KB, and should be added to the site KB as new resources
instead.

Assignment Section

The UPDATE section contains a comma-separated list of assignments of
new values to blank nodes in the assertion.

Assertion

The assertion is recorded as a sequence of "(predicate subject object)"
triples in the WHERE section, the same as in a Squish query. All nodes
in the INSERT section, and the logical difference between the assertion
and the site KB (the part of the assertion that is not entailed by the
site KB), are inserted into the site KB; the expanded site KB is then
updated according to the UPDATE section for all possible bindings of
blank nodes in the assertion.

Examples

    UPDATE ?rating = 2
    WHERE (rdf::subject ?stmt base::5)
          (rdf::predicate ?stmt dc::relation)
          (rdf::object ?stmt focus::Quality)
          (s::voteProposition ?vote ?stmt)
          (s::voteMember ?vote base::1)
          (s::voteRating ?vote ?rating)
    USING rdf FOR http://www.w3.org/1999/02/22-rdf-syntax-ns#
          dc FOR http://purl.org/dc/elements/1.1/
          s FOR http://www.nongnu.org/samizdat/rdf/schema#
          focus FOR http://www.nongnu.org/samizdat/rdf/focus#
          base FOR http://localhost/samizdat/

    UPDATE ?email = 'angdraug@debian.org', ?name = 'Dmitry Borodaenko'
    WHERE (s::email base::1 ?email)
          (s::fullName base::1 ?name)
    USING s FOR http://www.nongnu.org/samizdat/rdf/schema#
          base FOR http://localhost/samizdat/

    INSERT ?msg
    UPDATE ?title = 'Test Message', ?content = 'Some text.'
    WHERE (dc::creator ?msg base::1)
          (dc::title ?msg ?title)
          (s::content ?msg ?content)
          (s::thread ?msg ?msg)
    USING dc FOR http://purl.org/dc/elements/1.1/
          s FOR http://www.nongnu.org/samizdat/rdf/schema#
          base FOR http://localhost/samizdat/

