[4suite-dev] checkin - URIs URIs URIs

Mike Brown mike at skew.org
Tue Dec 17 17:09:44 MST 2002


What follows is a discussion between Uche and me on IRC the other night,
preserved here for posterity. I reformatted and edited it a bit for
readability. The quoted text (marked with >) is Uche and the rest is me.

> Help me understand your reasoning for making all file: access point
> to the repo. There was nothing in the discussion on the list that
> suggested this is something that should be done. Jeremy suggested it,
> but both MikeO and I said we found it confusing. So unless I'm
> missing something, you should not have checked that in.

It sounded to me like Jeremy and MikeO agreed that the security concerns 
trumped the usability concerns, in general (Jeremy clarified his position to 
me off-list). The idea is not to get rid of anything but just to have a 
consistent basis that we can poke holes in where we need to.

However, the real reason that change was needed was due to FtssInputSource 
breaking the InputSource API by blurring the distinction between URIs and repo 
paths, and relying on the broken normalization and resolution of the 
InputSource it extends and falls back upon.

The FtssInputSource was incompatible with proper file: URIs or handled them 
unpredictably (depending on various factors like how it was created, what was 
in the repo, and the behavior of urllib, etc.). Absolute, properly-formatted 
file URIs are much more ubiquitous now that the normalization, resolution, and 
path conversion stuff in Uri.py has been tightened up. I could not tighten up 
BaseUriResolver without also tightening up FtssInputSource.

If you really want to go back to what prompted this endeavour, it was a post 
on xsl-list from Ken Holman re: an XSLT compliance issue, involving the 
generated ID of the stylesheet root node. When I tested w/4xslt and 
investigated the odd results I was getting, it became apparent that file: URIs 
and URI resolution in general were still problematic, and that it would be 
difficult to engineer workarounds if we continued to rely on the looseness of 
the facilities in Uri.py, which were based partly on the looseness of/bugs in 
urllib and partly on what I believe was a misguided hope that repo paths and 
various types of URI could all be handled generically.

I'd like to succinctly enumerate the problems with FtssInputSource and URIs, 
but it's difficult to condense, in part because the problems are not discrete.

> I think you're still mixing up the two issues.
>
> 1) Should file: URIs mean local file system or repo path?
> 2) *Independent of 1*, should users have access to the file system?
>
> Mike and Jeremy did agree on (2), but Mike agreed with me on (1),
> i.e. that file: should mean local file system, even if it is
> inaccessible to the user. Read
> http://lists.fourthought.com/pipermail/4suite-dev/2002-December/000902.html
> carefully to see what I mean.
>
> I have no problem with a consistent basis; I just don't want the
> basis that you happen to have chosen.

The thing is, the meaning of a scheme changes when you create an 
FtssInputSource. For the lifespan of the FtssInputSource, the scheme+host from 
the wrapped doc are used as the basis for determining whether a URI reference 
encountered within the doc is supposed to be in the repo.
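
To make the same-scheme-and-host rule concrete, here is a minimal sketch. It
is not the actual FtssInputSource code: it assumes modern Python's
urllib.parse rather than our Uri.py, and the function name is_repo_reference
is hypothetical.

```python
from urllib.parse import urljoin, urlsplit


def is_repo_reference(base_uri, uri_ref):
    """Hypothetical helper: resolve uri_ref against base_uri and report
    whether the result shares the base's scheme and host."""
    resolved = urljoin(base_uri, uri_ref)
    base = urlsplit(base_uri)
    target = urlsplit(resolved)
    # hostname is None when the URI has an empty authority (file:///...)
    return (base.scheme, base.hostname) == (target.scheme, target.hostname)


# A relative reference inherits the base's scheme and host, so it is
# treated as a repo reference; an absolute http: reference is not.
print(is_repo_reference("file:///docs/a.xml", "b.xml"))
print(is_repo_reference("file:///docs/a.xml", "http://example.org/c.xml"))
```

With a file: base URI, every relative reference in the doc resolves to
another file: URI, so under this rule they all look like repo references.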

If you create an FtssInputSource with a file: URI, as will happen when relying 
on the tightened-up resolver, you are making file URIs always mean repo 
paths... notwithstanding the dubious fallback behavior if there's nothing in 
the repo at that path, which was that the normal resolver would be invoked, 
and you'd get access to the filesystem.

Normally we don't create an FtssInputSource with a file: URI (with scheme).. 
but it can clone itself when a URI ref is encountered, and the URI it uses for 
the new FtssInputSource is one obtained via the generic normalizer.. which now 
isn't so loose.

> We can always curtail the fallback behavior we do not want. That's
> easy enough that I don't think it affects the meaning of "file:".
>
> Don't get me wrong, I agree with you that the play between the
> various resolvers is a mess. And I do appreciate the work you're
> doing to straighten things out. I just think there are several ways
> in which we can clean it up.

OK.. well I think I made it clear that I didn't see my changes as being the 
whole story. Much like exception handling, it's an iterative process, 
sometimes two steps forward, one step back.

> I think the crux of the problem is that we need a *proper* URI for
> repo paths. I think we're in agreement here. I just don't think it
> should be file:///. I did suggest file://host/, but MikeO thought
> that confusing.
>
> Should we brainstorm on possible ways to spell that? Maybe we can
> find one people don't find confusing.
>
> I just thought of: file://localhost:8803/
> since 8803 is the FtRpc port. Maybe still confusing, except that I
> don't know why Mike finds it confusing.
>
> We could also invent our own scheme, ftss:///, but this is not
> kosher. Though since we won't expect in-repo URIs to be portable, it
> may not matter.
>
> We could also use URN space: urn:ftss:///
> which is marginally more kosher, but the problem is a repo path is
> really a locator, not a name... aliases and all that.

Yes, as I was suggesting, completely separate URI schemes are my preference... 
One that always means local filesystem, one that always means repo (and that's 
what I meant in my email by completely unambiguous, not overloading the 
meaning of 'file:'). But this would not really alleviate the issue with the 
repo needing to treat same-scheme-and-host as another repo reference, thus 
making it possible to overload either scheme anyway.

And yeah, I don't like the URN space because then you need a custom 
normalizer.

> If we settle on a scheme, then I would ditch the current
> same-scheme-and-host behavior.

> The deadlock issue MikeO pointed out will not apply if we have a
> specific scheme for repo paths, I think. Or rather, we would be able
> to see and avoid it. The deadlock breaker would be that we normalize
> all HTTP URIs to a ftss:/// or whatever URI, and if they compared the
> same, we would not seek a new lock.

Why are you singling out HTTP?

> Because that was the deadlock issue
> http://lists.fourthought.com/pipermail/4suite-dev/2002-December/000889.html
>
> The problem is that the system does not recognize that
> http://localhost/b is just aliased for b.

Oh I would think that it would occur no matter what server was being used... 
many ways into the repo = many different URI schemes ... possibly all mapping 
to the same repo path. It seems like there needs to be an intermediate step.

> Yes.  But my point is that if we had a canonical scheme, and thus
> mapping from each other scheme to the canonical one, we could detect
> deadlock and break it.

Yes that's what I mean. OK, we agree.
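
A rough sketch of the canonical-scheme idea, under several assumptions: the
scheme is spelled ftss:///, the alias table REPO_ALIASES and the names
canonicalize and needs_new_lock are all hypothetical, and modern Python's
urllib.parse stands in for our own URI code.

```python
from urllib.parse import urlsplit

# Hypothetical table of (scheme, host) pairs known to be ways into the repo.
REPO_ALIASES = {("http", "localhost")}


def canonicalize(uri_or_path):
    """Map a bare repo path or an aliased URI to an assumed ftss:/// form."""
    parts = urlsplit(uri_or_path)
    if not parts.scheme:
        # No scheme: treat the string as a bare repo path.
        path = uri_or_path if uri_or_path.startswith("/") else "/" + uri_or_path
        return "ftss://" + path
    if (parts.scheme, parts.hostname) in REPO_ALIASES:
        return "ftss://" + parts.path
    # Already canonical, or not a known way into the repo: leave untouched.
    return uri_or_path


def needs_new_lock(held, requested):
    """A new lock is needed only if the canonical forms differ."""
    return canonicalize(held) != canonicalize(requested)


# http://localhost/b and the repo path b compare equal once canonicalized,
# so no second lock would be sought for them.
print(canonicalize("http://localhost/b"), needs_new_lock("b", "http://localhost/b"))
```

The comparison only works if every way into the repo has an entry in the
table, which is the intermediate step mentioned above.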

> The problem is that now we have no way of detecting deadlock. That's
> why I'm cooking up all the ftss:///, urn:ftss:/// and all that to be
> the canonical scheme. My *only* problem with what you did is that you
> chose file:/// to be the canonical scheme. So yes, I think we agree
> :-)

Just to be clear, it wasn't really that I relish file:/// being canon, it was 
more just that I needed to plug the holes and make the FtssInputSource work 
consistently with the improved/more strict normalization and resolution, while 
still retaining its loose API (where it's often initially constructed with 
repo paths where URIs should be).

> OK. We're on the same page now. I'm just eager to find a permanent
> replacement for the file: workaround  :-)

* mjb imagines ftss://user:pass@host:ftrpcport/
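
One point in favor of that shape is that generic URI parsers can already
take it apart. A minimal sketch, assuming modern Python's urllib.parse; the
helper name split_repo_uri and the example path are hypothetical, and 8803
is the FtRpc port mentioned earlier.

```python
from urllib.parse import urlsplit


def split_repo_uri(uri):
    """Hypothetical helper: split an ftss:// URI into its components."""
    parts = urlsplit(uri)
    if parts.scheme != "ftss":
        raise ValueError("not an ftss: URI: %r" % uri)
    return {
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        "repo_path": parts.path,
    }


info = split_repo_uri("ftss://user:pass@localhost:8803/some/doc.xml")
print(info["host"], info["port"], info["repo_path"])
```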

> I too am leaning heavily towards ftss://user:pass@host:ftrpcport/.
> Yes it is strictly bogus, but unless we are willing to register a
> protocol, we have a choice between bogosity and file:.
>
> I think all the options of using file: are too confusing and/or
> limiting. Ergo, I'd rather be somewhat bogus with the ftss: scheme.
>
> As I said, I don't foresee any interop problems, because that scheme
> would never be used outside repo context.

Why is it bogus?

> Well, I might be wrong. I thought you can't just invent your own
> top-level scheme.

...

Research into scheme name selection followed, leading me to post
http://lists.w3.org/Archives/Public/uri/2002Dec/0006.html. So far, we got a
response from Dan Connolly, but no definitive advice as to choosing a good
scheme name. It seems the process of giving the IANA authority over
vendor-specific scheme names has stalled indefinitely, so we can probably
squat on whatever we want as long as we publish an Internet Draft about it for
the IETF. If they pick it up again, they'd probably want us to use vnd.ft.ftss
or vnd.fourthought.ftss rather than just ftss. So do we play nice or use the
short name?


Mike

-- 
  Mike J. Brown   |  http://skew.org/~mike/resume/
  Denver, CO, USA |  http://skew.org/xml/


