Several issues with EXPath File spec #131

Open
abelbraaksma opened this issue Feb 21, 2019 · 4 comments

@abelbraaksma

I'm not 100% sure whether I should report this here or on the W3 mailing list for EXPath. If the latter is preferred, I can continue on the mailing list instead.

I found a few issues with the File Module (http://expath.org/spec/file) by glancing over it. Hopefully there's still room to address them (I may find more if I look deeper at the spec).

  1. The functions file:is-file and file:is-dir: the descriptions do not say whether they return true only if an actual file or directory exists (they use the phrase "points to", but this is not defined and is open to interpretation). If nothing needs to physically exist, the functions are impossible to implement, because, for instance, file:is-file('c:\dir\test') could refer to a file named test or a dir named test. If it hasn't been created yet, it is not possible to determine which. The one exception is perhaps a path with a trailing slash or backslash. The same problem exists for XPath's fn:resolve-uri, which essentially removes the file part if and only if it ends in a trailing slash, but a URI has no notion of directory vs file.

  2. Several functions define an action that returns empty-sequence(). This may lead to problems for implementations, as in XPath, anything that returns an empty sequence can be optimized away. There's currently no section that explains how to deal with this special-casing of functions. Perhaps this could be solved by having them return a boolean instead, with false on failure. Since they will also raise an error, this won't really change the behavior, and it potentially adds a feature on top where processors can allow an implementation-defined setting to always return and not raise, for a try/catch-free programming style. Alternatively, let them just return the input, or return the filename they are writing to. If such a change is not an option, then I suggest we devote a section to explaining the expectations for implementors.

  3. Some functions seem under-specified, for instance, file:create-temp-dir has arguments $prefix and $suffix, but these are not explained in the text. And from the names I don't know what they are supposed to do. Same is true for file:create-temp-file

  4. A small chapter on encodings would be nice. Something like: implementers should support the same encodings as the host language (XQuery, XSLT), or just a minimum as in XPath: UTF-8, UTF-16 and ASCII.

  5. It would be nice, if still possible, to make the 'current working directory' (CWD) dependent on the static base URI, or the current base URI (i.e. from xml:base) from XPath's DC or SC. Also, I'd suggest we add a line saying that the CWD can be absent (e.g., if a processor runs in a protected env, or is created from a memory stream), in which case most functions with relative paths would fail, while absolute paths would still be allowed. A processor can, of course, provide an option to set the CWD.

I apologize if these comments come too late in the process, I'm realizing you are close to publishing the Module to Final stage.

@michaelhkay
Member

michaelhkay commented Feb 21, 2019 via email

@abelbraaksma
Author

The EXPath file spec has been stable for some years and there are a number of implementations in use in production applications. So clarifications of the text are certainly possible, but don't expect any changes in function signatures or radical changes in semantics. You may find that there are test cases in the test suite that help to resolve some of your questions.

Thanks, I'll have a look.

I suspect the author chose the phrase "points to" to make it clear that the function returns true only if the name is the name of an actually existing file or directory; the function is not simply looking at the syntax of the name and asking whether it takes the syntactic form of a file name or a directory name.

Clear, thanks.
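
To make the distinction concrete, here is a minimal Python sketch (an analogy only, not the EXPath API). `os.path.isfile` and `os.path.isdir` consult the file system, so a name that does not exist yet is neither a file nor a directory, while a purely syntactic check can only look at something like a trailing separator. The helpers `looks_like_dir` and `classify` are hypothetical names introduced for this example.

```python
import os
import tempfile


def looks_like_dir(path: str) -> bool:
    """Purely syntactic guess (hypothetical helper): a trailing separator suggests a directory."""
    return path.endswith(("/", "\\"))


def classify(path: str) -> str:
    """Existence-based classification: asks the file system what the name currently refers to."""
    if os.path.isfile(path):
        return "file"
    if os.path.isdir(path):
        return "dir"
    return "missing"


with tempfile.TemporaryDirectory() as root:
    target = os.path.join(root, "test")
    before = classify(target)          # nothing has been created yet

    os.mkdir(target)
    as_dir = classify(target)          # the name now refers to an existing directory
    os.rmdir(target)

    with open(target, "w", encoding="utf-8") as handle:
        handle.write("x")
    as_file = classify(target)         # the same name now refers to an existing file
```

The same name is classified differently at each step, which is only possible because the check depends on what exists at the time of the call rather than on the form of the path.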

The file module defines functions with side-effects. Implementations that don't have a mechanism to cope with this should probably not be implementing the module.

Just realizing that in XSLT, xsl:result-document also "returns" an empty sequence. I'll have to check how this would work with the optimizers, but as you point out, they have side effects and somehow we just need a mechanism that prevents these functions from being optimized away.

I think the spec authors were wise to avoid getting bogged down in this quagmire, and to just leave it to implementors to do the best they could.

"quagmire", made me smile ;). But yes, short of defining something like a monadic execution context in the spec, this is probably best.

Some functions seem under-specified, for instance, file:create-temp-dir has arguments $prefix and $suffix, but these are not explained in the text. Same is true for file:create-temp-file

There's certainly room for clearer descriptions. You may find that the test suite aids understanding. For the Saxon implementation we've added our own documentation, see for example http://www.saxonica.com/documentation/index.html#!functions/expath-file/create-temp-dir which at least shows how we have read the spec; you might find that useful. (I don't claim that our implementation is definitive, though we do pass all the tests.)

Thanks, just looked up your description, that helps. Though I would like to propose these as candidates for fixing in the File Module, even if it is late in the game. Would be nice if the spec, at a minimum, describes the expected input to these functions.
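
For comparison, this is how one host library treats such arguments. It is a small Python sketch using `tempfile.mkdtemp`, and it is only an assumption that file:create-temp-dir is meant to behave similarly; that is exactly what the spec text would need to state. In Python, the prefix and suffix are literal text placed before and after a generated unique part of the directory name. The values "report-" and ".tmp" are made up for the example.

```python
import os
import shutil
import tempfile

# Assumption for illustration: prefix and suffix are literal strings that
# surround a generated unique part of the name (Python's tempfile convention).
path = tempfile.mkdtemp(prefix="report-", suffix=".tmp")
name = os.path.basename(path)

try:
    starts = name.startswith("report-")   # prefix comes first in the generated name
    ends = name.endswith(".tmp")          # suffix comes last in the generated name
    exists = os.path.isdir(path)          # the directory is created, not just named
finally:
    shutil.rmtree(path)                   # clean up the directory created above
```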

A small chapter on encodings would be nice.

Would implementors take any notice of such a statement?

Probably not ;)

The notion of current directory is certainly problematic, especially on Windows, which doesn't really have this concept. But your implementation could align the concept with static base URI if that's what you want to do. (Actually, static base URI is a pretty horrible concept itself; it's the current directory at run-time that we're interested in, not the current directory at compile time).

Thanks. I agree that static base URI might be a bad choice, it was the first that came to mind. Windows, btw, does have Environment.CurrentDirectory which is per-process and comes pretty close.
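
A minimal Python sketch of the behavior suggested above, under the assumption that the current directory is an optional run-time value: absolute paths resolve without it, relative paths fail when it is absent. The function `resolve` and the sample path are hypothetical and are not taken from the spec or from any implementation.

```python
import os
from typing import Optional


def resolve(path: str, base_dir: Optional[str]) -> str:
    """Resolve path against base_dir; absolute paths never need a base."""
    if os.path.isabs(path):
        return os.path.normpath(path)
    if base_dir is None:
        # No current directory available (e.g. a sandboxed processor).
        raise ValueError(f"no current directory available to resolve {path!r}")
    return os.path.normpath(os.path.join(base_dir, path))


process_cwd = os.getcwd()                      # per-process value, known only at run time
relative = resolve("data/in.xml", process_cwd)

# An absolute path still resolves when no current directory is available.
absolute = resolve(os.path.abspath(os.sep + "tmp"), None)
```

A processor option to set the CWD would then simply supply a different `base_dir` value.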

I apologize if these comments come too late in the process, I'm realizing you are close to publishing the Module to Final stage.

I think the spec has been "close to final" for several years and is likely to remain so for several more, unless you want to volunteer to move it forwards...

Michael Kay
Saxonica

I could certainly try to come up with a few lines for the missing bits of information in the mentioned functions. I think those are relatively unobtrusive editorial changes, and I'd hope you agree, these ought to be in the spec, instead of coming from interpretation of the test sets.

@ChristianGruen
Member

Thanks for the discussion. I’ll give some feedback when I’m online again (in approx. a week).

@ChristianGruen
Member

Hi Abel,

First, thanks to Michael for giving a comprehensive reply; if I see this correctly, most of your questions have already been answered.

As Michael stated, the specification has been in a final stage for a few years now; it’s mostly the last editorial steps that haven’t been tackled so far. If you want to contribute missing information, your input is welcome, and I’ll be happy to update the public document (with some help from @fgeorges). The current version of the spec files can be found here; you can either send pull requests or pass some raw lines of text on to us.

Christian
