[4suite-dev] URIs in the repo, revisited
Mike Brown
mike at skew.org
Sun Dec 8 14:55:36 MST 2002
Last night I was working on URI stuff, trying to make things work more
correctly, e.g. making the BaseUriResolver use the absolutize() function I
checked in, which does (I think) 100% correct normalization. Things were going
OK until it came to the repo, in particular FtssInputSource.
What's happening is this: the default normalization routine for URIs now (after
my changes) always returns an absolute (scheme-having) URI. So if you feed it
'/some/schemeless/base' and '../path/to/foo.xml' you get
'file:///some/path/to/foo.xml', whereas you used to get '/some/path/to/foo.xml',
which was generally repo-safe.
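
To make the difference concrete, here is a minimal sketch of the two
behaviours. It uses the present-day Python standard library (urllib.parse) as
a stand-in, not the absolutize() code itself, and the function names
resolve_old_style and resolve_strict are made up for illustration.

```python
from urllib.parse import urljoin, urlsplit


def resolve_old_style(base, ref):
    # Old behaviour as described above: a schemeless base gives a
    # schemeless result.
    return urljoin(base, ref)


def resolve_strict(base, ref):
    # New behaviour as described above: a schemeless base is assumed to be
    # a 'file:' URI, so the result always has a scheme.
    # (Assumption: the schemeless base is an absolute path starting with '/'.)
    if not urlsplit(base).scheme:
        base = 'file://' + base
    return urljoin(base, ref)


print(resolve_old_style('/some/schemeless/base', '../path/to/foo.xml'))
# /some/path/to/foo.xml
print(resolve_strict('/some/schemeless/base', '../path/to/foo.xml'))
# file:///some/path/to/foo.xml
```

The first result still looks like a repo path; the second no longer does,
which is where the trouble with FtssInputSource starts.
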
In general, if I'm not mistaken, any FtssInputSource that we initially create
will have a relative (schemeless) URI like '/repo/path/to/file' or an absolute
URI that's a URN like 'urn:uuid:...'. I really don't like the former (repo
paths masquerading as URIs) because they break principles like the definition
of relative URIs as well as the interface of the InputSource base class.
When the entity that the FtssInputSource wraps contains a URI reference that
needs to be resolved (xsl:include/import, document(), xi:include, external
entities, etc.), the FtssInputSource's _openStream() method is called, and a
new input source of some type is created for that other resource.
FtssInputSource's _openStream() tests whether the scheme and host parts of
the URI ref are the same as those of the original URI. If they are the same,
then it is assumed to be a reference to something in the repo.
Otherwise, it is assumed to be something outside of the repo, so it falls back
to the regular InputSource's _openStream().
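
The scheme-and-host test can be sketched roughly as follows. This is only an
illustration of the rule as I described it, written against urllib.parse; the
function name looks_like_repo_ref is hypothetical and is not the actual
FtssInputSource code.

```python
from urllib.parse import urlsplit


def looks_like_repo_ref(base_uri, resolved_uri):
    # A reference is treated as a repo resource when its scheme and host
    # match those of the input source's own URI.
    base = urlsplit(base_uri)
    target = urlsplit(resolved_uri)
    return (base.scheme, base.netloc) == (target.scheme, target.netloc)


# Schemeless base, schemeless ref: both have empty scheme and host.
print(looks_like_repo_ref('/repo/path/to/file', '/repo/other.xml'))      # True
# Schemeless base, 'file' ref: schemes differ.
print(looks_like_repo_ref('/repo/path/to/file', 'file:///etc/passwd'))   # False
# 'file' base, 'file' ref: same scheme, same (empty) host.
print(looks_like_repo_ref('file:///repo/path/to/file', 'file:///etc/passwd'))  # True
```
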
If it was assumed to be a reference to something in the repo, then the URI is
converted to a repo path (PathImp) (although this seems like it would break on
absolute (scheme-having) URIs), and a fetchResource() attempt is made. If the
fetchResource() fails, then, if I am interpreting correctly, the regular
InputSource is given a crack at it.
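
Put together, the flow as I read it looks something like the sketch below.
Everything here is a stand-in: the dict plays the role of the repo,
the lookup plays the role of fetchResource(), and fallback_open plays the role
of the regular InputSource's _openStream(). None of these names come from the
real code.

```python
from urllib.parse import urlsplit


def open_stream(base_uri, uri, repo, fallback_open):
    base, target = urlsplit(base_uri), urlsplit(uri)
    if (base.scheme, base.netloc) == (target.scheme, target.netloc):
        try:
            # Stand-in for converting the URI to a repo path and calling
            # fetchResource() on it.
            return repo[target.path]
        except KeyError:
            pass  # not found in the repo: fall through to the fallback
    # Either the ref was judged to be outside the repo, or the repo lookup
    # failed. In both cases the generic opener gets the URI.
    return fallback_open(uri)


repo = {'/repo/a.xsl': '<a/>'}   # hypothetical repo contents
seen_by_fallback = []


def fake_fallback(uri):
    # Records what reaches the generic opener instead of touching any
    # real filesystem or network.
    seen_by_fallback.append(uri)
    return 'fallback:' + uri


open_stream('/repo/main.xsl', '/repo/a.xsl', repo, fake_fallback)
open_stream('/repo/main.xsl', '/repo/missing.xsl', repo, fake_fallback)
open_stream('/repo/main.xsl', 'file:///etc/passwd', repo, fake_fallback)
print(seen_by_fallback)
# ['/repo/missing.xsl', 'file:///etc/passwd']
```

Note that both the failed repo lookup and the 'file:' ref end up at the
generic opener.
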
Something about that last part doesn't seem cool. The regular InputSource
would have access to the local filesystem. And indeed, I just loaded an XSLT
doc on my server that can read /etc/passwd via an xinclude with
href="file:///etc/passwd" parse="text".
So anyway, the conundrum is that a schemeless URI, in repo land, means a
repo resource. If URI resolution is cleaned up as I'm trying to do, such
that it's impossible to use a schemeless URI as a base (e.g., if we really do
assume 'file:' like we say we do), then, effectively, 'file:' URIs in the repo
will refer to repo resources 100% of the time. Currently, they ('file'
URI refs) refer to a repo resource only if the base (the URI of the
FtssInputSource) was also a 'file' URI with the same host part, if any.
So one matter that I need clarification on is whether 'file' URIs are ever
supposed to be referencing the local filesystem when you're in the repo. If
they aren't, then we need to close that hole.
If they are referencing the local filesystem, then I think we need to revisit
the idea of using a repo-specific URI scheme so that we can differentiate
between the two and also not have this weird fallback behaviour of "if it's
not in the repo, check the local filesystem". I think the way we have it now,
it's almost as if there is an implicit repo-specific URI scheme that's not
'file'. We would also need to figure out how to handle the situation when the
FtssInputSource is created with a 'file' URI (because then refs with the same
scheme & host would, by the current rules, also be considered repo resources).
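
As a rough illustration of the repo-specific scheme idea: with a dedicated
scheme, the decision could be made from the URI alone, with no comparison
against the base and no "try the repo, then the filesystem" step. The scheme
name 'repo' below is a placeholder I made up for the sketch, as is the
classify function.

```python
from urllib.parse import urlsplit

REPO_SCHEME = 'repo'  # placeholder name, not an existing scheme


def classify(uri):
    # Decide where a fully resolved URI should be looked up, based only on
    # its scheme.
    scheme = urlsplit(uri).scheme
    if scheme == REPO_SCHEME:
        return 'repo'
    if scheme == 'file':
        return 'filesystem'
    if not scheme:
        return 'relative'   # should not occur after strict normalization
    return 'other'


print(classify('repo:///path/to/file'))   # repo
print(classify('file:///etc/passwd'))     # filesystem
print(classify('urn:uuid:...'))           # other
```

Whether 'filesystem' should be allowed at all from inside the repo is the
separate question raised above.
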
The other option I was thinking of was to just override the FtssInputSource's
URI normalization such that it continues to behave in the old way. The thing
is, I don't think this idea will really work because there are too many
chances for it to fall back on strict normalization when it decides to use the
regular InputSource.
Mike
--
Mike J. Brown | http://skew.org/~mike/resume/
Denver, CO, USA | http://skew.org/xml/