[4suite-dev] URIs in the repo, revisited

Mike Brown mike at skew.org
Sun Dec 8 14:55:36 MST 2002


Last night I was working on URI stuff, trying to make things work more 
correctly, e.g. making the BaseUriResolver use the absolutize() function I 
checked in, which does (I think) 100% correct normalization. Things were going 
OK until I got to the repo, in particular FtssInputSource.

What's happening is this: the default normalization routine for URIs now (after 
my changes) always returns an absolute (scheme-having) URI. So if you feed it 
the base '/some/schemeless/base' and the reference '../path/to/foo.xml', you 
get 'file:///some/path/to/foo.xml', whereas you used to get 
'/some/path/to/foo.xml', which was generally repo-safe.
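
To make the difference concrete, here is a rough sketch of the two behaviours 
using Python's standard urllib.parse rather than the actual absolutize() code. 
The function names and the "assume 'file' when the base has no scheme" rule are 
my illustration of the behaviour described above, not the real implementation.

```python
from urllib.parse import urljoin, urlsplit


def resolve_old_style(base, ref):
    """Hypothetical stand-in for the old behaviour: a schemeless base
    yields a schemeless result."""
    return urljoin(base, ref)


def resolve_strict(base, ref, default_scheme="file"):
    """Hypothetical stand-in for the new behaviour: the result always has
    a scheme. Assumes a schemeless base is an absolute path ('/...')."""
    if not urlsplit(base).scheme:
        base = default_scheme + "://" + base
    return urljoin(base, ref)


old = resolve_old_style("/some/schemeless/base", "../path/to/foo.xml")
new = resolve_strict("/some/schemeless/base", "../path/to/foo.xml")
print(old)
print(new)
```

The first result can be read directly as a repo path; the second cannot be 
told apart from a reference to the local filesystem.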

In general, if I'm not mistaken, any FtssInputSource that we initially create 
will have a relative (schemeless) URI like '/repo/path/to/file' or an absolute 
URI that's a URN like 'urn:uuid:...'. I really don't like the former (repo 
paths masquerading as URIs) because they break principles like the definition 
of relative URIs as well as the interface of the InputSource base class.

When the entity that the FtssInputSource wraps contains a URI reference that 
needs to be resolved (xsl:include/import, document(), xi:include, external 
entities, etc.), the FtssInputSource's _openStream() method is called, and a 
new input source of some type is created for that other resource.

FtssInputSource's _openStream() tests to see if the scheme and host parts of 
the URI ref are the same as those of the original URI. If they are the same, 
then it is assumed to be a reference to something in the repo.

Otherwise, it is assumed to be something outside of the repo, so it falls back 
to the regular InputSource's _openStream() to try.

If it was assumed to be a reference to something in the repo, then the URI is 
converted to a repo path (PathImp) (although this seems like it would break on 
absolute (scheme-having) URIs), and a fetchResource() attempt is made. If the 
fetchResource() fails, then, if I am interpreting correctly, the regular 
InputSource is given a crack at it.
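
The flow described above, as I read it, can be modelled roughly like this. It 
is a simplified sketch, not the real _openStream(): the repo is faked with a 
dict keyed by path, and open_stream(), same_scheme_and_host() and the fallback 
callable are hypothetical names.

```python
from urllib.parse import urlsplit


def same_scheme_and_host(base_uri, ref_uri):
    """True when both URIs share the same scheme and host parts."""
    b, r = urlsplit(base_uri), urlsplit(ref_uri)
    return (b.scheme, b.netloc) == (r.scheme, r.netloc)


def open_stream(base_uri, ref_uri, repo, fallback):
    """Model of the decision: try the repo first when scheme and host
    match, otherwise (or on failure) hand off to the regular resolver."""
    if same_scheme_and_host(base_uri, ref_uri):
        path = urlsplit(ref_uri).path  # stand-in for the repo path conversion
        try:
            return ("repo", repo[path])  # stand-in for fetchResource()
        except KeyError:
            pass  # not in the repo: fall through to the regular resolver
    return ("fallback", fallback(ref_uri))


fake_repo = {"/repo/path/to/file": "<doc/>"}
regular = lambda uri: "opened outside repo: " + uri

print(open_stream("/repo/a.xsl", "/repo/path/to/file", fake_repo, regular))
print(open_stream("/repo/a.xsl", "/repo/missing", fake_repo, regular))
print(open_stream("/repo/a.xsl", "file:///etc/passwd", fake_repo, regular))
```

Both the failed repo lookup and the 'file:' reference end up in the fallback 
branch, which is the part that matters here.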

Something about that last part doesn't seem cool: the regular InputSource 
has access to the local filesystem. And indeed, I just loaded an XSLT doc on 
my server that can read /etc/passwd via an XInclude with 
href="file:///etc/passwd" parse="text".

So anyway, the conundrum is this: in repo land, a schemeless URI means a repo 
resource. If URI resolution is cleaned up as I'm trying to do, such that it's 
impossible to use a schemeless URI as a base (e.g., if we really do assume 
'file:' like we say we do), then, effectively, 'file:' URIs in the repo will 
refer to repo resources 100% of the time. Currently, 'file' URI refs refer to 
a repo resource only if the base (the URI of the FtssInputSource) was also a 
'file' URI with the same host part, if any.

So one matter that I need clarification on is whether 'file' URIs are ever 
supposed to be referencing the local filesystem when you're in the repo. If 
they aren't, then we need to close that hole.
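
If the answer is "never", one way to close the hole would be a check in front 
of the fallback, something like the sketch below. This is only an assumption 
about what a fix could look like; RepoAccessError and check_fallback_allowed() 
are made-up names, and blocking schemeless URIs as well as 'file' is my guess 
at what would be needed.

```python
from urllib.parse import urlsplit


class RepoAccessError(Exception):
    """Hypothetical error for references the repo refuses to resolve."""


def check_fallback_allowed(uri, blocked_schemes=("file", "")):
    """Refuse to pass local-filesystem style URIs to the regular resolver.

    The empty string covers schemeless URIs, which a regular resolver
    would typically treat as local paths.
    """
    scheme = urlsplit(uri).scheme.lower()
    if scheme in blocked_schemes:
        raise RepoAccessError("refusing to resolve outside the repo: " + uri)
    return uri


check_fallback_allowed("http://example.org/doc.xml")  # passes through
try:
    check_fallback_allowed("file:///etc/passwd")
except RepoAccessError as exc:
    print(exc)
```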

If they are referencing the local filesystem, then I think we need to revisit 
the idea of using a repo-specific URI scheme so that we can differentiate 
between the two and also not have this weird fallback behaviour of "if it's 
not in the repo, check the local filesystem". I think the way we have it now, 
it's almost as if there is an implicit repo-specific URI scheme that's not 
'file'. We would also need to figure out how to handle the situation when the 
FtssInputSource is created with a 'file' URI (because then refs with the same 
scheme & host would, by the current rules, also be considered repo resources).
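
For comparison, with an explicit repo scheme the decision could be made from 
the URI alone, with no base comparison and no fallback. A sketch, where the 
scheme name 'ftss' and the route() function are purely hypothetical:

```python
from urllib.parse import urlsplit

REPO_SCHEME = "ftss"  # hypothetical name for a repo-specific scheme


def route(uri):
    """Decide where a fully resolved URI should be looked up."""
    scheme = urlsplit(uri).scheme.lower()
    if not scheme:
        raise ValueError("expected an absolute URI, got: " + uri)
    if scheme == REPO_SCHEME:
        return "repo"
    if scheme == "file":
        return "filesystem"
    return "other"


print(route("ftss:///repo/path/to/file"))
print(route("file:///etc/passwd"))
print(route("http://example.org/doc.xml"))
```

Whether "filesystem" should then be allowed at all from inside the repo is the 
separate question raised above.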

The other option I was thinking of was to just override the FtssInputSource's 
URI normalization such that it continues to behave in the old way. The thing 
is, I don't think this idea will really work because there are too many 
chances for it to fall back on strict normalization when it decides to use the 
regular InputSource.


Mike

-- 
  Mike J. Brown   |  http://skew.org/~mike/resume/
  Denver, CO, USA |  http://skew.org/xml/


