[4suite-dev] Substantial (unit-tested) refactoring of Server, Server/Drivers, Server/SCore

Chimezie Ogbuji ogbujic at bio.ri.ccf.org
Tue Mar 21 14:59:47 MST 2006


I recently submitted a substantial patch to the Ft.Server source tree that 
essentially refactors the repository driver (as well as SCore) to use 
precompiled XPath evaluation against the metadata documents associated 
with each resource instead of RDF queries (which do not scale well).  The 
motivation was mostly to allow the repo to scale in a production 
environment where, as it was set up (very heavily dependent on RDF), it 
was simply not responsive at large volumes.  In particular, 
statements with the predicates below (which were previously being 
generated for *every* resource) are no longer generated:

- Schema.CREATION_DATE
- Schema.MODIFIED_DATE
- Schema.IMT
- Schema.CONTENT_SIZE
- Schema.OWNER

The metadata associated with Server resources is also no longer 
reflected into RDF:

- Schema.SERVER_NAME
- Schema.SERVER_HANDLER
- Schema.SERVER_RUNNING

So, APIs that used to rely on RDF queries to fetch system-level metadata 
now use precompiled XPath evaluations (set up in 
Ft.Server.Server.Drivers.Constants) against the metadata document DOM. 
The speed increase is very evident, mainly because FtssDriver already 
heavily caches these metadata DOMs, so they are readily available for 
*fast* XPath evaluation.  The space savings (and the query performance at 
large volumes) are also evident when you consider the number of 
statements (per resource) that are no longer (redundantly) reflected 
into RDF.
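
To illustrate the idea, here is a minimal sketch, not the actual FtssDriver code. The metadata document layout, the element names, and the `METADATA_XPATHS` and `MetadataCache` names are all hypothetical, and Python's standard `xml.etree.ElementTree` (which supports only a limited XPath subset and caches compiled paths internally) stands in for the 4Suite XPath engine. The point it shows: each expression is defined once, each metadata document is parsed once and cached, and every later lookup is a cheap evaluation against the cached tree rather than an RDF query.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for the precompiled expressions kept in
# Ft.Server.Server.Drivers.Constants (names and paths are assumptions).
METADATA_XPATHS = {
    "creation_date": "./creation-date",
    "modified_date": "./modified-date",
    "imt": "./imt",
    "content_size": "./size",
    "owner": "./owner",
}


class MetadataCache:
    """Parses each resource's metadata document once and reuses the tree."""

    def __init__(self, loader):
        # loader(path) returns the metadata document as an XML string
        # (a hypothetical hook standing in for the driver's storage layer).
        self._loader = loader
        self._trees = {}

    def _tree(self, path):
        # Parse only on first access; later lookups hit the cached tree.
        if path not in self._trees:
            self._trees[path] = ET.fromstring(self._loader(path))
        return self._trees[path]

    def get(self, path, field):
        # Evaluate the predefined path expression against the cached tree.
        return self._tree(path).findtext(METADATA_XPATHS[field])


# Example metadata document in a made-up format.
DOCS = {
    "/docs/a.xml": """<metadata>
  <creation-date>2006-03-21T14:59:47</creation-date>
  <modified-date>2006-03-21T15:00:00</modified-date>
  <imt>text/xml</imt>
  <size>1024</size>
  <owner>admin</owner>
</metadata>""",
}

loads = []


def load(path):
    # Record every load so we can see how often documents are parsed.
    loads.append(path)
    return DOCS[path]


cache = MetadataCache(load)
print(cache.get("/docs/a.xml", "imt"))    # text/xml
print(cache.get("/docs/a.xml", "owner"))  # admin
print(len(loads))                         # 1: parsed once, queried twice
```

Keeping the parsed trees in a dictionary mirrors the post's observation that the speedup comes largely from the DOMs already being cached; the expressions themselves are cheap once the tree is in memory.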

Also, the Schema.Type predicate was deprecated in favor of rdf:type. 
Finally, the (unavoidable) system RDF queries now use the scope = 
Schema.SYSTEM_SOURCE_URI constraint to further optimize these very 
low-level (and very frequently dispatched) queries.
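
The effect of the scope constraint can be sketched with a toy in-memory triple store that indexes statements by scope. This is an illustration under stated assumptions, not the 4Suite RDF API: `ScopedStore` and `match` are hypothetical names, and `SYSTEM_SOURCE_URI` below is a placeholder value, not the real Schema.SYSTEM_SOURCE_URI. Only the rdf:type URI is the standard one.

```python
from collections import defaultdict

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Placeholder value; the real Schema.SYSTEM_SOURCE_URI is defined in Ft.Server.
SYSTEM_SOURCE_URI = "urn:example:system-source"


class ScopedStore:
    """Toy triple store that indexes statements by scope (source URI)."""

    def __init__(self):
        self._by_scope = defaultdict(list)

    def add(self, subject, predicate, obj, scope):
        self._by_scope[scope].append((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None, scope=None):
        # With a scope, only that scope's statements are scanned;
        # without one, every statement in the store is a candidate.
        if scope is not None:
            candidates = self._by_scope.get(scope, [])
        else:
            candidates = [t for ts in self._by_scope.values() for t in ts]
        return [
            (s, p, o)
            for (s, p, o) in candidates
            if (subject is None or s == subject)
            and (predicate is None or p == predicate)
            and (obj is None or o == obj)
        ]


store = ScopedStore()
# One system statement, using rdf:type rather than a custom type predicate.
store.add("/docs/a.xml", RDF_TYPE, "urn:example:XmlDocument", SYSTEM_SOURCE_URI)
# Many user-level statements that a system query should never have to scan.
for i in range(1000):
    store.add(f"/docs/u{i}", "urn:example:note", str(i), f"/docs/u{i}")

# Scoped system query: scans 1 statement instead of all 1001.
system_types = store.match(predicate=RDF_TYPE, scope=SYSTEM_SOURCE_URI)
```

The scoped and unscoped queries return the same answer here; the difference is how many statements each has to examine, which matters for queries dispatched as often as the system-level ones.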

Other changes:

- Ft.Server.Server.Drivers.Constants has been updated to include 
precompiled XPath expressions for extracting metadata.
- The signatures for the low-level APIs (ResourceManager's xupdateContent, 
MetadataManager's xupdateMetaData/setMetaData/updateMetaData) have been 
updated to include a keyword parameter that skips the attempt to recreate 
system RDF statements associated with the resource content (not the 
metadata).  This is useful for resources that *don't* have any statements 
associated with their content (where this step is redundant).
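
A minimal sketch of how such a keyword parameter might behave. The post does not name the parameter, so `reflect_content_rdf` is a hypothetical name, and this `MetadataManager` class is an illustrative stand-in whose method name follows the post but whose signature and internals are assumptions, not the real Ft.Server code.

```python
class MetadataManager:
    """Hypothetical sketch; not the real Ft.Server MetadataManager."""

    def __init__(self):
        self.metadata = {}
        self.regenerated = []  # paths whose content statements were rebuilt

    def _regenerate_content_statements(self, path):
        # Stand-in for the expensive step that re-derives system RDF
        # statements from the resource's content.
        self.regenerated.append(path)

    def setMetaData(self, path, metadata, reflect_content_rdf=True):
        # 'reflect_content_rdf' is an assumed name for the new keyword.
        # Defaulting to True keeps existing callers' behavior unchanged.
        self.metadata[path] = metadata
        if reflect_content_rdf:
            self._regenerate_content_statements(path)


mgr = MetadataManager()
mgr.setMetaData("/docs/a.xml", "<metadata/>")
# A raw file with no content-derived statements: skip the redundant step.
mgr.setMetaData("/files/logo.png", "<metadata/>", reflect_content_rdf=False)
print(mgr.regenerated)  # ['/docs/a.xml']
```

Making the new parameter a keyword with a backward-compatible default means only the callers that know a resource has no content-derived statements need to change.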


I plan on posting follow-up patches to 
Ft.Server.Common.Install.InstallUtil and the test suites (which were 
updated in order to get this refactoring effort through the repository 
unit tests).  InstallUtil was relying on an upfront fetch of all the IMTs, 
owners, and DocDefs (via an inefficient RDF query) to prepopulate a cache 
used by the install command.

The files affected by this patch:

Index: Server/Controller.py
Index: Server/Drivers/Constants.py
Index: Server/Drivers/FtssDriver.py
Index: Server/Drivers/MetadataManager.py
Index: Server/Drivers/ResourceManager.py
Index: Server/Drivers/Util.py
Index: Server/SCore/AliasImp.py
Index: Server/SCore/ContainerImp.py
Index: Server/SCore/DocumentDefinitionImp.py
Index: Server/SCore/GroupImp.py
Index: Server/SCore/RawFileImp.py
Index: Server/SCore/RepositoryImp.py
Index: Server/SCore/ResourceMetaDataImp.py
Index: Server/SCore/UriReferenceFileImp.py
Index: Server/SCore/XmlDocumentImp.py
Index: Server/SCore/XsltDocumentImp.py


Chimezie Thomas-Ogbuji
Lead Systems Analyst
Thoracic and Cardiovascular Surgery
Cleveland Clinic Foundation
9500 Euclid Avenue/ W26
Cleveland, Ohio 44195
Office: (216)444-8593
ogbujic at ccf.org


