[Versa] re: Deserializing Document Definition Statements to the User Model
Chimezie Ogbuji
chimezie at gmail.com
Sun Oct 30 09:19:21 MST 2005
> That's my thought. We've known for quite a while that the single
> statement table persistence in 4RDF would not handle really large
> models, regardless of splitting the models. It's rumored that Oracle
> is using their spatial cartridge to make a native RDF store, which could
> be very cool...
I just wanted to update this old but important thread on the state of
the art in RDF persistence. Oracle's approach ended up not being that
revolutionary (see:
http://download.oracle.com/otndocs/tech/semantic_web/pdf/rdfrm.pdf).
Their schema isn't context-aware right now (i.e. it stores all RDF
triples in a single model); perhaps it will be in the next iteration.
The only major advantages are that literals are interned, that there is
a native query language that is very much like SPARQL and can be used
interchangeably with vanilla SQL, and that there are stored procedures
for managing RDF (adding and removing triples). Otherwise, their
approach isn't far removed from how most RDBMS / SQL databases are
implementing RDF.
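For anyone unfamiliar with the term, "interning" literals just means each distinct literal value is stored once and statements point at it by id. Below is a minimal sketch of that general idea using Python's built-in sqlite3; the table and column names are hypothetical and this is not Oracle's actual schema.

```python
import sqlite3

# Hypothetical schema: each distinct literal is stored once in `literals`,
# and statements refer to it by integer id (generic interning, not Oracle's schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE literals (
    id    INTEGER PRIMARY KEY,
    value TEXT NOT NULL UNIQUE
);
CREATE TABLE statements (
    subject    TEXT NOT NULL,
    predicate  TEXT NOT NULL,
    literal_id INTEGER NOT NULL REFERENCES literals(id)
);
""")


def intern_literal(value):
    """Return the id for `value`, inserting a new row only if it is unseen."""
    conn.execute("INSERT OR IGNORE INTO literals (value) VALUES (?)", (value,))
    row = conn.execute("SELECT id FROM literals WHERE value = ?", (value,)).fetchone()
    return row[0]


def add(subject, predicate, literal):
    """Store a statement whose object is the interned id of `literal`."""
    conn.execute(
        "INSERT INTO statements VALUES (?, ?, ?)",
        (subject, predicate, intern_literal(literal)),
    )


add("urn:a", "urn:label", "hello")
add("urn:b", "urn:label", "hello")  # reuses the existing "hello" row
add("urn:c", "urn:label", "world")

literal_count = conn.execute("SELECT COUNT(*) FROM literals").fetchone()[0]
statement_count = conn.execute("SELECT COUNT(*) FROM statements").fetchone()[0]
print(literal_count, statement_count)  # prints: 2 3
```

The UNIQUE constraint plus INSERT OR IGNORE is what keeps repeated literals from being duplicated.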
> Another thing I found that did help (with large models) is splitting
> the statement table (this would be at the 4RDF level). Have one called
> "resource_statements" and one called "literal_statements". It's a
> bummer for some forms of complete, but for others (ones I've found to
> be more common) it is a huge speed increase. Some of the "bummer"
> forms of complete can go away if there is a little bit of typing in the
> model (or at least the queries), e.g. complete(resource,rdf:type,None)
> can be smart enough to only look in the resource statement table.
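To make the quoted split concrete, here is a rough sketch of a resource/literal table split with a complete()-style lookup that only reads the tables a pattern can match. It uses sqlite3 as a stand-in for the real store; the Literal marker class is hypothetical, and it assumes rdf:type objects are always resources.

```python
import sqlite3

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


class Literal(str):
    """Hypothetical marker type that distinguishes literal objects from resources."""


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE resource_statements (subject TEXT, predicate TEXT, object TEXT);
CREATE TABLE literal_statements  (subject TEXT, predicate TEXT, object TEXT);
""")


def add(subject, predicate, obj):
    """Store a statement in the table matching the kind of its object."""
    table = "literal_statements" if isinstance(obj, Literal) else "resource_statements"
    # Table names come from the fixed set above, so formatting them in is safe.
    conn.execute(f"INSERT INTO {table} VALUES (?, ?, ?)", (subject, predicate, str(obj)))


def tables_for(predicate, obj):
    """Pick the table(s) a pattern can match; None is a wildcard."""
    if obj is not None:
        return ["literal_statements" if isinstance(obj, Literal) else "resource_statements"]
    if predicate == RDF_TYPE:
        # Assumes rdf:type objects are always resources (classes), never literals.
        return ["resource_statements"]
    # Nothing is known about the object: the "bummer" case, search both tables.
    return ["resource_statements", "literal_statements"]


def complete(subject, predicate, obj):
    """Return matching (s, p, o) tuples, reading only the tables that can match."""
    bound = [(col, str(val)) for col, val in
             (("subject", subject), ("predicate", predicate), ("object", obj))
             if val is not None]
    where = " AND ".join(f"{col} = ?" for col, _ in bound)
    where = " WHERE " + where if where else ""
    params = [val for _, val in bound]
    results = []
    for table in tables_for(predicate, obj):
        sql = f"SELECT subject, predicate, object FROM {table}{where}"
        results.extend(conn.execute(sql, params).fetchall())
    return results


add("urn:alice", RDF_TYPE, "urn:Person")
add("urn:alice", "urn:name", Literal("Alice"))
add("urn:alice", "urn:knows", "urn:bob")
print(complete("urn:alice", RDF_TYPE, None))
```

The rdf:type lookup touches only resource_statements, while a fully unbound object still has to scan both tables.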
This is almost exactly how I implemented the rdflib MySQL backend
(http://svn.rdflib.net/trunk/rdflib/backends/MySQL.py). I have two
tables: one for non-rdf:type statements and the other for rdf:type
statements. The driver has to monitor the predicate in queries in
order to determine which table to select from. I changed the
fschema:type predicate to rdf:type, initialized the repository
using this experimental driver, and afterwards noticed quite a speed
increase.
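The rdflib backend itself isn't reproduced here. What follows is only a minimal sketch of the same predicate-based dispatch, with sqlite3 standing in for MySQL and hypothetical table/column names (type_statements, other_statements, member, klass).

```python
import sqlite3

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- rdf:type statements need no predicate column: the predicate is implied.
CREATE TABLE type_statements  (member TEXT, klass TEXT);
CREATE TABLE other_statements (subject TEXT, predicate TEXT, object TEXT);
""")


def add(subject, predicate, obj):
    """Route the statement to a table based solely on its predicate."""
    if predicate == RDF_TYPE:
        conn.execute("INSERT INTO type_statements VALUES (?, ?)", (subject, obj))
    else:
        conn.execute("INSERT INTO other_statements VALUES (?, ?, ?)", (subject, predicate, obj))


def _where(pairs):
    """Build a WHERE clause from (column, value) pairs, skipping wildcards (None)."""
    bound = [(col, val) for col, val in pairs if val is not None]
    sql = " AND ".join(f"{col} = ?" for col, _ in bound)
    return (" WHERE " + sql if sql else ""), [val for _, val in bound]


def triples(subject, predicate, obj):
    """Yield matching (s, p, o) tuples; the predicate decides which table(s) are read."""
    if predicate is None or predicate == RDF_TYPE:
        where, params = _where([("member", subject), ("klass", obj)])
        for member, klass in conn.execute(f"SELECT member, klass FROM type_statements{where}", params):
            yield (member, RDF_TYPE, klass)
    if predicate != RDF_TYPE:
        where, params = _where([("subject", subject), ("predicate", predicate), ("object", obj)])
        yield from conn.execute(f"SELECT subject, predicate, object FROM other_statements{where}", params)


add("urn:alice", RDF_TYPE, "urn:Person")
add("urn:bob", RDF_TYPE, "urn:Person")
add("urn:alice", "urn:knows", "urn:bob")
for triple in triples(None, RDF_TYPE, "urn:Person"):
    print(triple)
```

Only a wildcard predicate forces the driver to read both tables; a bound predicate always selects exactly one.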
I imagine I could have taken it a step further and broken out
separate tables for statements where the object is a literal and
statements where the object is a resource, but that would make the
logic for dispatching SQL queries even more complex.
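As a rough illustration of that extra complexity, here is a hypothetical table-selection function for a three-table layout. It assumes type_statements holds rdf:type statements only, resource_statements holds other statements with resource objects, and literal_statements holds other statements with literal objects. It also assumes rdf:type objects are never literals. None of this is what rdflib ships.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


class Literal(str):
    """Hypothetical marker type for literal objects."""


def tables_for(predicate, obj):
    """Return the hypothetical tables a (?, predicate, obj) pattern must touch."""
    if predicate == RDF_TYPE:
        # Assumes type_statements only ever holds resource objects.
        return [] if isinstance(obj, Literal) else ["type_statements"]
    if obj is not None:
        table = "literal_statements" if isinstance(obj, Literal) else "resource_statements"
        if predicate is None and not isinstance(obj, Literal):
            # Unknown predicate with a resource object: it might be an rdf:type statement.
            return ["type_statements", table]
        return [table]
    if predicate is None:
        # Fully open pattern: every table has to be searched.
        return ["type_statements", "resource_statements", "literal_statements"]
    # Bound non-type predicate, unbound object: without schema knowledge
    # both non-type tables must be searched.
    return ["resource_statements", "literal_statements"]


print(tables_for(None, "urn:bob"))
```

Each additional table multiplies the predicate/object combinations the dispatcher has to reason about, which is the trade-off described above.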
Chimezie