Several issues with EXPath File spec #131
The EXPath file spec has been stable for some years and there are a number of implementations in use in production applications. So clarifications of the text are certainly possible, but don't expect any changes in function signatures or radical changes in semantics.
You may find that there are test cases in the test suite that help to resolve some of your questions.
On 21 Feb 2019, at 21:09, Abel Braaksma wrote:
I'm not 100% sure whether I should report it here or on the W3 mailing list for EXPath, if the latter is preferred, I can continue on the mailing list instead.
I found a few issues with the File Module (http://expath.org/spec/file) by glancing over it. Hopefully there's still room to address them (I may find more if I look deeper at the spec).
The functions file:is-file and file:is-dir: the description does not say whether it returns true only if there is an actual file (it uses the phrase "points to", but this is not specified and open for interpretation). If no file needs to physically exist, it is impossible to implement, because, for instance, file:is-file('c:\dir\test') can be a file named test or a dir named test. If it isn't already created, then it is not possible to determine. The one exception is perhaps if there's a trailing slash or backslash. The same problem exists for XPath's fn:resolve-uri, which essentially removes the file part if and only if it ends in a trailing slash, but a URI has no notion of directory vs file.
I suspect the author chose the phrase "points to" to make it clear that the function returns true only if the name is the name of an actually existing file or directory; the function is not simply looking at the syntax of the name and asking whether it takes the syntactic form of a file name or a directory name.
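The distinction drawn here, between asking the file system what exists and merely inspecting the shape of the name, can be sketched outside XPath. The following is an analogy in Python, not EXPath code: `classify` and `looks_like_dir` are hypothetical helpers, and the standard-library calls only stand in for what `file:is-file` and `file:is-dir` would presumably do under this reading of "points to".

```python
import os
import tempfile
from pathlib import Path


def looks_like_dir(path: str) -> bool:
    """Purely syntactic guess (hypothetical helper): trailing slash or backslash."""
    return path.endswith(("/", "\\"))


def classify(path: str) -> str:
    """Ask the file system what actually exists at `path`."""
    p = Path(path)
    if p.is_file():
        return "file"
    if p.is_dir():
        return "dir"
    return "missing"


with tempfile.TemporaryDirectory() as root:
    target = os.path.join(root, "test")
    before = classify(target)            # nothing has been created yet
    Path(target).write_text("x")
    as_file = classify(target)           # now a regular file
    os.remove(target)
    os.mkdir(target)
    as_dir = classify(target)            # same name, now a directory
    syntactic = looks_like_dir(target)   # the name alone decides nothing

print(before, as_file, as_dir, syntactic)  # missing file dir False
```

The same name is reported as missing, as a file, and as a directory depending only on what exists on disk, which is why a purely syntactic answer is not possible.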
Several functions define an action that returns empty-sequence(). This may lead to problems for implementations, as in XPath, anything that returns an empty sequence can be optimized away. There's currently no section that explains how to deal with this special-casing of functions. Perhaps this could be solved by having them return a boolean instead, with false on fail. Since it will also raise, this won't really change the behavior, and it potentially adds a feature on top where processors can allow an implementation defined setting to always return and not raise, for a try/catch free programming style. Alternatively, let them just return the input, or return the filename they are writing to. If such change is not an option, then I suggest we devote a section on explaining the expectations for implementors.
The file module defines functions with side-effects. Implementations that don't have a mechanism to cope with this should probably not be implementing the module. I think the spec authors were wise to avoid getting bogged down in this quagmire, and just leaving it to implementors to do the best they could. Defining a boolean as the result type wouldn't solve the problem that the functions have side effects and calls need to occur in a particular order of execution.
Some functions seem under-specified, for instance, file:create-temp-dir has arguments $prefix and $suffix, but these are not explained in the text. And from the names I don't know what they are supposed to do. Same is true for file:create-temp-file
There's certainly room for clearer descriptions. You may find that the test suite aids understanding. For the Saxon implementation we've added our own documentation, see for example http://www.saxonica.com/documentation/index.html#!functions/expath-file/create-temp-dir which at least shows how we have read the spec; you might find that useful. (I don't claim that our implementation is definitive, though we do pass all the tests.)
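As an analogy only: Python's `tempfile.mkdtemp` and `tempfile.mkstemp` also accept a prefix and a suffix, which become the leading and trailing parts of a generated unique name. Whether the File Module intends exactly this for `$prefix` and `$suffix` is an assumption here, since the spec text under discussion does not say. The values `job-`, `.tmp`, `out-` and `.xml` are arbitrary.

```python
import os
import tempfile

# Create a uniquely named directory: <prefix><random part><suffix>
dir_path = tempfile.mkdtemp(prefix="job-", suffix=".tmp")

# Create a uniquely named file inside it; mkstemp returns an open descriptor
fd, file_path = tempfile.mkstemp(prefix="out-", suffix=".xml", dir=dir_path)
os.close(fd)

dir_name = os.path.basename(dir_path)
file_name = os.path.basename(file_path)
print(dir_name.startswith("job-"), dir_name.endswith(".tmp"))
print(file_name.startswith("out-"), file_name.endswith(".xml"))

# Clean up what the example created
os.remove(file_path)
os.rmdir(dir_path)
```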
A small chapter on encodings would be nice. Something like: implementers should support the same encodings as the hosting language (XQuery, XSLT), or just a minimum like in XPath: UTF8/16/ASCII.
Would implementors take any notice of such a statement?
It would be nice, if still possible, to make the 'current working directory' (CWD) dependent on the static base URI, or the current base URI (i.e. from xml:base) from XPath's DC or SC. Also, I'd suggest we add a line saying that CWD can be absent (i.e., if a processor runs in a protected env, or is created from a memory stream), in which most functions with relative paths would fail, while still allowing absolute paths. A processor can, of course, provide an option to set the CWD.
The notion of current directory is certainly problematic, especially on Windows, which doesn't really have this concept. But your implementation could align the concept with static base URI if that's what you want to do. (Actually, static base URI is a pretty horrible concept itself; it's the current directory at run-time that we're interested in, not the current directory at compile time).
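The suggestion that the CWD may be absent, with relative paths failing and absolute paths still working, can be sketched as follows. This is a hypothetical illustration in Python, not something the spec prescribes: `resolve` is an invented helper, and POSIX-style paths are used so the example behaves the same on every platform.

```python
from pathlib import PurePosixPath
from typing import Optional


def resolve(path: str, cwd: Optional[str]) -> str:
    """Resolve `path` against `cwd`; `cwd` may be None (no current directory)."""
    p = PurePosixPath(path)
    if p.is_absolute():
        # Absolute paths never need a current directory.
        return str(p)
    if cwd is None:
        # Relative path, but the processor has no current directory.
        raise ValueError(f"relative path {path!r} but no current directory is set")
    return str(PurePosixPath(cwd) / p)


print(resolve("/data/a.xml", None))   # /data/a.xml
print(resolve("a.xml", "/work"))      # /work/a.xml
```

A processor could fill `cwd` from whatever source it prefers (a configuration option, or a base URI), which keeps that choice out of the path-resolution logic itself.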
I apologize if these comments come too late in the process, I'm realizing you are close to publishing the Module to Final stage.
I think the spec has been "close to final" for several years and is likely to remain so for several more, unless you want to volunteer to move it forwards...
Michael Kay
Saxonica
Thanks, I'll have a look.
Clear, thanks.
Just realizing that in XSLT,
"quagmire", made me smile ;). But yes, short of defining something like a monadic execution context in the spec, this is probably best.
Thanks, just looked up your description, that helps. Though I would like to propose these as candidates for fixing in the File Module, even if it is late in the game. Would be nice if the spec, at a minimum, describes the expected input to these functions.
Probably not ;)
Thanks. I agree that static base URI might be a bad choice, it was the first that came to mind. Windows, btw, does have
I could certainly try to come up with a few lines for the missing bits of information in the mentioned functions. I think those are relatively unobtrusive editorial changes, and I'd hope you agree, these ought to be in the spec, instead of coming from interpretation of the test sets.
Thanks for the discussion. I'll give some feedback when I'm online again (in approx. a week).
Hi Abel,
First, thanks to Michael for giving a comprehensive reply; if I see this correctly, most of your questions have already been answered. As Michael stated, the specification has been in a final stage for a few years now; it's mostly the last editorial steps that haven't been tackled so far. If you want to contribute missing information, your input is welcome, and I'll be happy to update the public document (with some help from @fgeorges). The current version of the spec files can be found here; you can either send pull requests or pass some raw lines of text on to us.
Christian
I'm not 100% sure whether I should report it here or on the W3 mailing list for EXPath, if the latter is preferred, I can continue on the mailing list instead.
I found a few issues with the File Module (http://expath.org/spec/file) by glancing over it. Hopefully there's still room to address them (I may find more if I look deeper at the spec).
The functions file:is-file and file:is-dir: the description does not say whether it returns true only if there is an actual file (it uses the phrase "points to", but this is not specified and open for interpretation). If no file needs to physically exist, it is impossible to implement, because, for instance, file:is-file('c:\dir\test') can be a file named test or a dir named test. If it isn't already created, then it is not possible to determine. The one exception is perhaps if there's a trailing slash or backslash. The same problem exists for XPath's fn:resolve-uri, which essentially removes the file part if and only if it ends in a trailing slash, but a URI has no notion of directory vs file.
Several functions define an action that returns empty-sequence(). This may lead to problems for implementations, as in XPath, anything that returns an empty sequence can be optimized away. There's currently no section that explains how to deal with this special-casing of functions. Perhaps this could be solved by having them return a boolean instead, with false on fail. Since it will also raise, this won't really change the behavior, and it potentially adds a feature on top where processors can allow an implementation defined setting to always return and not raise, for a try/catch free programming style. Alternatively, let them just return the input, or return the filename they are writing to. If such change is not an option, then I suggest we devote a section on explaining the expectations for implementors.
Some functions seem under-specified, for instance, file:create-temp-dir has arguments $prefix and $suffix, but these are not explained in the text. And from the names I don't know what they are supposed to do. Same is true for file:create-temp-file
A small chapter on encodings would be nice. Something like: implementers should support the same encodings as the hosting language (XQuery, XSLT), or just a minimum like in XPath: UTF8/16/ASCII.
It would be nice, if still possible, to make the 'current working directory' (CWD) dependent on the static base URI, or the current base URI (i.e. from xml:base) from XPath's DC or SC. Also, I'd suggest we add a line saying that CWD can be absent (i.e., if a processor runs in a protected env, or is created from a memory stream), in which most functions with relative paths would fail, while still allowing absolute paths. A processor can, of course, provide an option to set the CWD.
I apologize if these comments come too late in the process, I'm realizing you are close to publishing the Module to Final stage.