[4suite-dev] Substantial (unit-tested) refactoring of Server, Server/Drivers, Server/SCore
Chimezie Ogbuji
ogbujic at bio.ri.ccf.org
Tue Mar 21 14:59:47 MST 2006
I recently submitted a substantial patch to the Ft.Server source tree that
essentially refactors the repository driver (as well as SCore) to use
precompiled XPath evaluation against the metadata documents associated
with each resource instead of RDF queries (which do not scale well). The
motivation was mostly to allow the repo to scale in a production
environment, where, as it was set up (very heavily dependent on RDF), it
was simply not responsive at large volumes. In particular, statements
with the predicates below (which were previously generated for *every*
resource) are no longer generated:
- Schema.CREATION_DATE
- Schema.MODIFIED_DATE
- Schema.IMT
- Schema.CONTENT_SIZE
- Schema.OWNER
Likewise, the following metadata associated with Server resources is no
longer reflected into RDF:
- Schema.SERVER_NAME
- Schema.SERVER_HANDLER
- Schema.SERVER_RUNNING
So, APIs that used to rely on RDF queries to fetch system-level metadata
now use precompiled XPath evaluations (set up in
Ft.Server.Server.Drivers.Constants) against the metadata document DOM.
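The pattern can be sketched with Python's standard xml.etree.ElementTree
instead of 4Suite's own XPath engine. Everything specific here is an
assumption: the namespace URI, the element names in the metadata document,
and names such as GET_IMT and system_metadata are hypothetical stand-ins
for whatever Ft.Server.Server.Drivers.Constants actually defines. Binding
each expression once at import time only approximates precompilation;
lxml.etree.XPath would give true compiled XPath objects.

```python
import operator
import xml.etree.ElementTree as ET

# Hypothetical namespace and element names for a resource's metadata
# document; the real 4Suite metadata format may differ.
NS = {"ftss": "urn:example:ftss-metadata"}

# Each expression is bound once, at import time, and reused for every
# lookup (a stand-in for the precompiled expressions in Drivers/Constants).
GET_CREATION_DATE = operator.methodcaller("findtext", "ftss:CreationDate", None, NS)
GET_MODIFIED_DATE = operator.methodcaller("findtext", "ftss:LastModifiedDate", None, NS)
GET_IMT = operator.methodcaller("findtext", "ftss:Imt", None, NS)
GET_CONTENT_SIZE = operator.methodcaller("findtext", "ftss:Size", None, NS)
GET_OWNER = operator.methodcaller("findtext", "ftss:Owner", None, NS)


def system_metadata(metadata_root):
    """Read system-level metadata straight from the metadata DOM
    instead of issuing one RDF query per predicate."""
    size = GET_CONTENT_SIZE(metadata_root)
    return {
        "creation_date": GET_CREATION_DATE(metadata_root),
        "modified_date": GET_MODIFIED_DATE(metadata_root),
        "imt": GET_IMT(metadata_root),
        "content_size": int(size) if size is not None else None,
        "owner": GET_OWNER(metadata_root),
    }


# Hypothetical sample metadata document.
SAMPLE = """<ftss:MetaData xmlns:ftss="urn:example:ftss-metadata">
  <ftss:CreationDate>2006-03-21T14:59:47Z</ftss:CreationDate>
  <ftss:LastModifiedDate>2006-03-21T15:10:02Z</ftss:LastModifiedDate>
  <ftss:Imt>text/xml</ftss:Imt>
  <ftss:Size>1024</ftss:Size>
  <ftss:Owner>admin</ftss:Owner>
</ftss:MetaData>"""

if __name__ == "__main__":
    print(system_metadata(ET.fromstring(SAMPLE)))
```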
The speed increase is very evident, mainly because FtssDriver already
heavily caches these metadata DOMs, so they are already available for
*fast* XPath evaluation. The space saved (and the query performance gained
at large volumes) is also evident when you consider the number of
statements per resource that are no longer (redundantly) reflected into
RDF.
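To illustrate why the cached DOMs matter, here is a minimal sketch of a
metadata DOM cache, assuming a plain dict keyed by resource path and a
hypothetical load_metadata_xml callback; it is not FtssDriver's actual
caching code. Parsing happens once per resource, and every later lookup is
only an evaluation against the cached tree.

```python
import operator
import xml.etree.ElementTree as ET

# Hypothetical namespace/element name, bound once up front.
NS = {"ftss": "urn:example:ftss-metadata"}
GET_IMT = operator.methodcaller("findtext", "ftss:Imt", None, NS)


class MetadataCache:
    """Hypothetical stand-in for a driver-level metadata DOM cache."""

    def __init__(self, load_metadata_xml):
        # load_metadata_xml(path) -> str fetches the raw metadata
        # document from the backing store (hypothetical callback).
        self._load = load_metadata_xml
        self._doms = {}
        self.parse_count = 0

    def get_dom(self, path):
        root = self._doms.get(path)
        if root is None:  # Element truthiness is unreliable; test for None.
            root = ET.fromstring(self._load(path))
            self.parse_count += 1
            self._doms[path] = root
        return root

    def invalidate(self, path):
        # Must be called whenever the resource's metadata is rewritten.
        self._doms.pop(path, None)

    def imt(self, path):
        return GET_IMT(self.get_dom(path))


STORE = {
    "/docs/a.xml": '<ftss:MetaData xmlns:ftss="urn:example:ftss-metadata">'
                   "<ftss:Imt>text/xml</ftss:Imt></ftss:MetaData>",
}
cache = MetadataCache(STORE.__getitem__)
for _ in range(1000):
    cache.imt("/docs/a.xml")
print(cache.parse_count)  # 1
```

The cost of the design is that every write path has to call invalidate(),
otherwise readers see stale metadata.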
Also, the Schema.Type predicate was deprecated in favor of rdf:type, and
finally, the (unavoidable) system RDF queries now use a scope =
Schema.SYSTEM_SOURCE_URI constraint to further optimize these very
low-level (and very frequently dispatched) queries.
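A toy model of the scope constraint, assuming statements are stored as
(subject, predicate, object, scope) quads partitioned by scope. ScopedStore,
find_statements and the value of SYSTEM_SOURCE_URI are hypothetical and are
not the 4Suite RDF API; only the rdf:type URI is standard. Because system
statements sit in their own partition, a query constrained to the system
scope never scans user statements.

```python
from typing import Dict, List, NamedTuple, Optional

# rdf:type is the standard RDF term; SYSTEM_SOURCE_URI is a hypothetical
# placeholder for 4Suite's Schema.SYSTEM_SOURCE_URI.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SYSTEM_SOURCE_URI = "urn:example:system-source"


class Statement(NamedTuple):
    subject: str
    predicate: str
    obj: str
    scope: str


class ScopedStore:
    """Toy statement store indexed by scope (hypothetical, not Ft.Rdf)."""

    def __init__(self) -> None:
        self._by_scope: Dict[str, List[Statement]] = {}

    def add(self, stmt: Statement) -> None:
        self._by_scope.setdefault(stmt.scope, []).append(stmt)

    def find_statements(self, subject: Optional[str] = None,
                        predicate: Optional[str] = None,
                        obj: Optional[str] = None,
                        scope: Optional[str] = None) -> List[Statement]:
        # With a scope, only that partition is scanned; without one,
        # every statement in the store is a candidate.
        if scope is not None:
            candidates = self._by_scope.get(scope, [])
        else:
            candidates = [s for part in self._by_scope.values() for s in part]
        return [s for s in candidates
                if (subject is None or s.subject == subject)
                and (predicate is None or s.predicate == predicate)
                and (obj is None or s.obj == obj)]


store = ScopedStore()
# rdf:type (not the deprecated Schema.Type) records the system type.
store.add(Statement("/docs/a.xml", RDF_TYPE, "urn:example:XmlDocument", SYSTEM_SOURCE_URI))
# A user-level statement about the same subject, in another scope.
store.add(Statement("/docs/a.xml", RDF_TYPE, "urn:example:Invoice", "/docs/a.xml"))

system_types = store.find_statements(subject="/docs/a.xml", predicate=RDF_TYPE,
                                     scope=SYSTEM_SOURCE_URI)
print([s.obj for s in system_types])  # ['urn:example:XmlDocument']
```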
Other changes:
- Ft.Server.Server.Drivers.Constants has been updated to include
precompiled XPath expressions for extracting metadata
- The signatures of the low-level APIs (ResourceManager's xupdateContent,
MetadataManager's xupdateMetaData/setMetaData/updateMetaData) have been
updated to include a keyword parameter that skips the attempt to recreate
the system RDF statements associated with the resource content (not the
metadata). This is useful for resources that *don't* have any statements
associated with their content (so this step would be redundant).
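A minimal sketch of how such a keyword parameter can gate the expensive
step. The real parameter name is not given in this post, so
reflect_content_rdf, along with MetadataManagerSketch and Resource, is
hypothetical; the placeholder _reflect_content_rdf only counts how often the
regeneration step would run.

```python
class Resource:
    """Hypothetical, minimal resource record."""

    def __init__(self, path, content, has_content_statements):
        self.path = path
        self.content = content
        self.has_content_statements = has_content_statements
        self.metadata_xml = None


class MetadataManagerSketch:
    """Hypothetical stand-in for MetadataManager's setMetaData."""

    def __init__(self):
        self.rdf_rebuilds = 0

    def _reflect_content_rdf(self, resource):
        # Placeholder for the expensive step: deleting and regenerating
        # the system RDF statements derived from the resource *content*.
        self.rdf_rebuilds += 1

    def set_metadata(self, resource, metadata_xml, reflect_content_rdf=True):
        # The metadata document is always stored; the content-derived RDF
        # is only rebuilt when the caller asks for it (the default).
        resource.metadata_xml = metadata_xml
        if reflect_content_rdf:
            self._reflect_content_rdf(resource)


mgr = MetadataManagerSketch()
raw = Resource("/files/logo.png", b"\x89PNG", has_content_statements=False)
# A raw file has no content-derived statements, so skip the rebuild.
mgr.set_metadata(raw, "<MetaData/>", reflect_content_rdf=raw.has_content_statements)
print(mgr.rdf_rebuilds)  # 0
```

Keeping the default at True preserves the old behavior for existing
callers; only code that knows a resource has no content statements opts out.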
I plan on posting follow-up patches to
Ft.Server.Common.Install.InstallUtil and the test suites (which were
updated in order to get this refactoring effort through the repository
unit tests). The latter was relying on an upfront fetch of all the IMTs,
owners, and DocDefs (via an inefficient RDF query) to prepopulate a cache
used by the install command.
The files affected by this patch:
Index: Server/Controller.py
Index: Server/Drivers/Constants.py
Index: Server/Drivers/FtssDriver.py
Index: Server/Drivers/MetadataManager.py
Index: Server/Drivers/ResourceManager.py
Index: Server/Drivers/Util.py
Index: Server/SCore/AliasImp.py
Index: Server/SCore/ContainerImp.py
Index: Server/SCore/DocumentDefinitionImp.py
Index: Server/SCore/GroupImp.py
Index: Server/SCore/RawFileImp.py
Index: Server/SCore/RepositoryImp.py
Index: Server/SCore/ResourceMetaDataImp.py
Index: Server/SCore/UriReferenceFileImp.py
Index: Server/SCore/XmlDocumentImp.py
Index: Server/SCore/XsltDocumentImp.py
Chimezie Thomas-Ogbuji
Lead Systems Analyst
Thoracic and Cardiovascular Surgery
Cleveland Clinic Foundation
9500 Euclid Avenue/ W26
Cleveland, Ohio 44195
Office: (216)444-8593
ogbujic at ccf.org