Introduce an f-specification argument in ltcmd #1525

Draft
wants to merge 4 commits into
base: develop

josephwright
Member

See latex3/latex3#591.

Internal housekeeping

Status of pull request

  • Feedback wanted
  • Under development
  • Ready to merge

Checklist of required changes before merge will be approved

  • Test file(s) added
  • Version and date string updated in changed source files
  • Relevant \changes entries in source included
  • Relevant changes.txt updated
  • Rollback provided (if necessary)?
  • ltnewsX.tex (and/or latexchanges.tex) updated

@josephwright josephwright marked this pull request as draft November 1, 2024 12:56
@josephwright
Member Author

Currently this is a draft: there are some clear open questions to answer before it is ready to merge

  • Do we want to cover chars 0-32 as filecontents does?
  • Do we want a more complex approach to a preceding optional argument? For example, fancyvrb sets the catcode of ^^M prior to the \@ifnextchar to ensure it is tokenized in a recognisable way
  • Do we want to auto-apply ! to any preceding optional arg?
  • Is the letter f the best choice? I can see an argument for c ('collected'), x (charcode of b + v), w (v-squaredv), ...

@Skillmon
Contributor

Skillmon commented Nov 2, 2024

I'd maybe drop the warning on +f and make f just behave correctly. There is no good reason not to, imho, it's also not +b for multiline bodies of environments.

@Skillmon
Contributor

Skillmon commented Nov 2, 2024

Is it possible in the architecture of ltcmd to change the behaviour of an argument grabber based on the next argument type? If so the optional argument should be searched with the full category code regime of f in place, and if it's found the regime should be reverted, the optional argument collected, the verbatim catcode regime reactivated and the body collected. And an optional argument should have otherwise the same observable rules as it would have with a b body (so should be found even with a new line between \begin{foo} and the optional argument).

Cf. the handling of the optional argument in enverb (which as the only strange point also finds the opening bracket of the optional argument after multiple blank lines -- maybe I should revise this :)).

@Skillmon
Contributor

Skillmon commented Nov 2, 2024

As for the argument name, what about an upper case V?

@josephwright
Member Author

I'd maybe drop the warning on +f and make f just behave correctly. There is no good reason not to, imho, it's also not +b for multiline bodies of environments.

Thinking here is that reading the arg. spec. should be clear: collecting an environment will allow newlines, so should be +f not just f. That's why I've currently made it a warning - at a technical level, it's not an issue.

@josephwright
Member Author

As for the argument name, what about an upper case V?

I thought about that - but the pattern we have at the moment is all uppercase specs relate to the matching lowercase in that they take an extra {<default>} argument. That's not the situation here, which is why at the moment I went with something else.
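
The uppercase/lowercase pattern referred to here can be seen with the existing `o` and `O` specifiers. The following is a minimal sketch using only released ltcmd features; the command names `\greet` and `\Greet` are made up for illustration.

```latex
\documentclass{article}

% Lowercase o: optional argument with no default;
% its absence is detected with \IfNoValueTF.
\NewDocumentCommand{\greet}{o m}{%
  \IfNoValueTF{#1}{Hello}{#1}, #2!}

% Uppercase O: the same argument shape, plus an extra {<default>}.
\NewDocumentCommand{\Greet}{O{Hello} m}{#1, #2!}

\begin{document}
\greet{world} \greet[Hi]{world}

\Greet{world} \Greet[Hi]{world}
\end{document}
```

A hypothetical `V` would, by the same pattern, be read as "`v` with a default", which is not what the verbatim-environment specifier is meant to do.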

@josephwright
Member Author

Is it possible in the architecture of ltcmd to change the behaviour of an argument grabber based on the next argument type? If so the optional argument should be searched with the full category code regime of f in place, and if it's found the regime should be reverted, the optional argument collected, the verbatim catcode regime reactivated and the body collected. And an optional argument should have otherwise the same observable rules as it would have with a b body (so should be found even with a new line between \begin{foo} and the optional argument).

Whilst we don't currently do this, it would be possible to do a 'pre-scan' to establish if special settings are needed. The issue I think is that unlike say fancyvrb or enverb, we don't know the nature of the optional argument(s) - they don't have to be delimited by [..]. So for example if someone has +d^$ +f, we need to be able to cope with that in a (somewhat) predictable way. We also have the case of multiple optional arguments, so it's all a bit more 'interesting' - in the end this could all be done, it's a question of what we want to provide and how well that fits more widely into the LaTeX landscape.

@FrankMittelbach
Member

As for the argument name, what about an upper case V?

I thought about that - but the pattern we have at the moment is all uppercase specs relate to the matching lowercase in that they take an extra {<default>} argument. That's not the situation here, which is why at the moment I went with something else.

that pattern is strong and easy to understand so I don't think we should break it, thus that would rule V out in my opinion

@FrankMittelbach
Member

Maybe a dumb Saturday evening idea, but what about using a special sort of processor? Right now >{...} is a processor that is applied after grabbing, but we could have <{...} for before, or less general c{named-catcode-regime}. Then you would have c{verbatim}b+ to process the body with "verbatim" catcodes. And fancyvrb could define its own named set. The whole thing would then work also with other arg specifiers. Handwaving ...
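
For comparison, this is how an existing post-grabbing processor is used today. It is a minimal sketch with the documented `\SplitList` processor and `\ProcessList`; the names `\showitem` and `\splitdemo` are made up for illustration. The `<{...}` and `c{...}` forms floated above are only ideas in this thread and are not shown.

```latex
\documentclass{article}

% Helper applied to each item of the split list.
\NewDocumentCommand{\showitem}{m}{[#1]}

% >{\SplitList{;}} runs after the m argument has been grabbed
% and turns it into a list that \ProcessList can map over.
\NewDocumentCommand{\splitdemo}{ >{\SplitList{;}} m }
  {\ProcessList{#1}{\showitem}}

\begin{document}
\splitdemo{a;b;c}
\end{document}
```

Because the processor only sees the argument after it has been tokenized, it cannot change category codes for the grab itself, which is why a "before" variant would be needed for verbatim material.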

@FrankMittelbach
Member

Whilst we don't currently do this, it would be possible to do a 'pre-scan' to establish if special settings are needed. The issue I think is that unlike say fancyvrb or enverb, we don't know the nature of the optional argument(s) - they don't have to be delimited by [..]. So for example if someone has +d^$ +f, we need to be able to cope with that in a (somewhat) predictable way. We also have the case of multiple optional arguments, so it's all a bit more 'interesting' - in the end this could all be done, it's a question of what we want to provide and how well that fits more widely into the LaTeX landscape.

I would think this is bound to produce problems forever. Besides, what do you do with respect to chars being %? Could be done, as you say, but would be slow and probably riddled with errors in weird cases that we hadn't thought about.

@josephwright
Member Author

Maybe a dumb Saturday evening idea, but what about using a special sort of processor? Right now >{...} is a processor that is applied after grabbing, but we could have <{...} for before, or less general c{named-catcode-regime}. Then you would have c{verbatim}b+ to process the body with "verbatim" catcodes. And fancyvrb could define its own named set. The whole thing would then work also with other arg specifiers. Handwaving ...

I see the idea, but we have a few things that mean I suspect it's not quite right

  • We need to fiddle with \endlinechar
  • We may want to 'set and reset' the catcodes (see other replies)
  • It leaves finding the end point a bit awkward
  • There's the entire question of nesting

However, it does suggest something we could do - see parallel reply.

@josephwright
Member Author

Whilst we don't currently do this, it would be possible to do a 'pre-scan' to establish if special settings are needed. The issue I think is that unlike say fancyvrb or enverb, we don't know the nature of the optional argument(s) - they don't have to be delimited by [..]. So for example if someone has +d^$ +f, we need to be able to cope with that in a (somewhat) predictable way. We also have the case of multiple optional arguments, so it's all a bit more 'interesting' - in the end this could all be done, it's a question of what we want to provide and how well that fits more widely into the LaTeX landscape.

I would think this is bound to produce problems forever. Besides, what do you do with respect to chars being %? Could be done, as you say, but would be slow and probably riddled with errors in weird cases that we hadn't thought about.

I think @Skillmon's approach could work, with a logic for seeking an optional arg:

  1. Set all specials and ^^M verbatim
  2. Reset the catcode of the peek token to normal
  3. Peek ahead - loop to remove spaces/newlines if needed
  4. Reset catcodes and grab the optional arg if found

This could be done by adding an additional specifier (let's say *) which means 'use a verbatim-safe approach to grabbing an optional arg'. That would allow us to avoid needing to unpick everything to decide whether to use this slower approach.

@Skillmon
Contributor

Skillmon commented Nov 3, 2024

Performance is not that bad, it's just \begingroup\expanded{\noexpand\dospecials\catcode\peekchar=\the\catcode\peekchar}\@ifnextchar\peekchar (sort of) and then, if the argument is found \endgroup and collect it, else grab the verbatim argument. And: To match current behaviour of ltcmd you should only strip the first newline (because two consecutives are normally a \par hence don't match the optional argument's start-token). And depending on the stripping of the first newline in the f-argument you need to conditionally add the ignored newline to the verbatim argument if no optional argument is found.
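
The group-based peek described here can be sketched in plain LaTeX2e as follows. This is an illustration under simplifying assumptions, not the ltcmd implementation: the optional-argument delimiter is hard-wired to `[`, the loop over spaces and a single ^^M is omitted, and the names `\peekdemo`, `\peekdemo@opt` and `\peekdemo@noopt` are hypothetical.

```latex
\documentclass{article}
\makeatletter
\newcommand\peekdemo{%
  \begingroup
    \let\do\@makeother \dospecials % specials become catcode 12
    \catcode`\^^M=12 %               so does the end-of-line character
    \@ifnextchar[%                   peek; [ is catcode 12 in either regime
      {\endgroup\peekdemo@opt}%      found: restore catcodes, grab normally
      {\endgroup\peekdemo@noopt}}%   not found: a real grabber would go on
                                   % to collect the verbatim body here
\def\peekdemo@opt[#1]{\textbf{[#1]}}
\def\peekdemo@noopt{\textbf{[none]}}
\makeatother

\begin{document}
\peekdemo[abc] and \peekdemo!
\end{document}
```

Note that the token seen by the peek keeps the category code it was given under the verbatim regime, even after `\endgroup`. That is harmless when the optional argument is found (its opening delimiter is an ordinary character anyway) and is exactly what is wanted when it is not found and the verbatim body is collected next.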

@josephwright
Member Author

Performance is not that bad, it's just \begingroup\expanded{\noexpand\dospecials\catcode\peekchar=\the\catcode\peekchar}\@ifnextchar\peekchar (sort of) and then, if the argument is found \endgroup and collect it, else grab the verbatim argument. And: To match current behaviour of ltcmd you should only strip the first newline (because two consecutives are normally a \par hence don't match the optional argument's start-token). And depending on the stripping of the first newline in the f-argument you need to conditionally add the ignored newline to the verbatim argument if no optional argument is found.

I don't think we can use a group, and we have to deal with looping for spaces, but yes, I guess you are broadly right about performance.

@josephwright
Member Author

And: To match current behaviour of ltcmd you should only strip the first newline (because two consecutives are normally a \par hence don't match the optional argument's start-token). And depending on the stripping of the first newline in the f-argument you need to conditionally add the ignored newline to the verbatim argument if no optional argument is found.

Good point about multiple ^^M. I suspect space skipping here is not desirable anyway. Perhaps then best to go with my * suggestion, which could also be applied before v?

@josephwright
Member Author

I'd maybe drop the warning on +f and make f just behave correctly. There is no good reason not to, imho, it's also not +b for multiline bodies of environments.

Thinking here is that reading the arg. spec. should be clear: collecting an environment will allow newlines, so should be +f not just f. That's why I've currently made it a warning - at a technical level, it's not an issue.

One might well argue that both b and f should act the same way, + or not - an easy adjustment if that's the overall agreement. If we do that, I'd normalise to one agreed form.

@Skillmon
Contributor

Skillmon commented Nov 4, 2024

I don't think we can use a group, and we have to deal with looping for spaces, but yes, I guess you are broadly right about performance.

Of course you can use a group! You're doing that right now for the f-argument as well. Again see enverb (use of \enverb@body@setup and \enverb@search@oarg). And yes, you need to loop, obviously, but it's a tight loop that only needs to check three tokens: active space, ^^M, and the optional argument's begin-token. As I said, it also needs to keep track of the tokens it ignored and whether it already gobbled up a ^^M, and re-add those if it doesn't find the optional argument.

Good point about multiple ^^M. I suspect space skipping here is not desirable anyway. Perhaps then best to go with my * suggestion, which could also be applied before v?

Yes, applying * to v would then also be desirable. Though ltcmd could check for that itself I guess it's much less complicated to let the user decide and not let ltcmd peek ahead for the structure of (possibly all) following arguments.

One might well argue that both b and f should act the same way, + or not - an easy adjustment if that's the overall agreement. If we do that, I'd normalise to one agreed form.

My argumentation is that +v means multiple lines are allowed, that meaning doesn't make sense for f, as restricting an environment to a single line sounds rather odd. And +b means that b might take a \par token. That's not applicable to f as ^^M is set to other. So the only valid meaning of f vs. +f would be f not allowing empty/blank lines in the body and +f doing so. Still odd for verbatim material, but at least understandable. The current behaviour is just odd (if it still is the current behaviour, haven't taken another look at the changes).
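
The two existing meanings of `+` contrasted here can be seen side by side in a minimal sketch that uses only released ltcmd features; the names `quoted` and `\code` are made up for illustration.

```latex
\documentclass{article}

% +b: the environment body may contain \par (blank lines).
% With plain b, the blank line in the body below raises an error.
\NewDocumentEnvironment{quoted}{+b}
  {\begin{quote}#1\end{quote}}{}

% +v: the verbatim argument may span several input lines.
% With plain v, the line break inside the argument raises an error.
\NewDocumentCommand{\code}{+v}{{\ttfamily #1}}

\begin{document}
\begin{quoted}
First paragraph.

Second paragraph.
\end{quoted}

\code|one line
second line|
\end{document}
```

Neither meaning maps cleanly onto a verbatim environment body, where ^^M is already an ordinary character and a single-line restriction would be odd.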

@josephwright
Member Author

Good point about multiple ^^M. I suspect space skipping here is not desirable anyway. Perhaps then best to go with my * suggestion, which could also be applied before v?

Yes, applying * to v would then also be desirable. Though ltcmd could check for that itself I guess it's much less complicated to let the user decide and not let ltcmd peek ahead for the structure of (possibly all) following arguments.

I think I'll implement the suggestion, we can look at it then decide if we need automation - that would mainly sit in a different part of the code so would still require the underlying * code.

@josephwright josephwright force-pushed the verb-env branch 2 times, most recently from 79ff942 to 4600c98 Compare December 11, 2024 16:32
Contributor

@muzimuzhi muzimuzhi left a comment

Some typos. Only read the non-code part.

@josephwright
Member Author

I've made two adjustments here:

  • f-type are always 'long'
  • Use @Skillmon's suggestion to grab optional arguments 'verb safe'

The second idea is partly done. At present, you have to turn this on using a * specifier. If the mechanism looks OK, I can automate this - but probably only for an immediately preceding D-type argument. I'll do that, and document, once we have agreement this looks OK. See the test file for demos.

@josephwright
Member Author

We could make life a little easier e.g. by saying that optional args before f-type have to use 'other' chars, or using a charcode peek test (more risky). But on balance, I'd rather go with an automated version of what I already have -

@Skillmon
Contributor

For the tokens in front of \end{env}: enverb uses their count for its auto gobbling feature which makes it automatically work in .dtx files or in natural indentation. I'm not suggesting to automatically do this for f-type, but making something similar possible would be nice (maybe by putting the contents into a documented str-variable, or via an argument processor).

Regarding the tokens after it: There are two possibilities, imho, either throw an error or throw a warning (I think listings does that).

@Skillmon
Contributor

Also, we currently have no special handling of tokens in the same line as \begin{env} here. Many (most?) of the established packages/solutions for verbatim environments do throw an error or a warning for these (and filecontents throws a Missing \begin{document}).

@josephwright
Member Author

For the tokens in front of \end{env}: enverb uses their count for its auto gobbling feature which makes it automatically work in .dtx files or in natural indentation. I'm not suggesting to automatically do this for f-type, but making something similar possible would be nice (maybe by putting the contents into a documented str-variable, or via an argument processor).

Based on the general ltcmd model, I'd expect the (start of the) last line to simply form part of #1 if passed through. But as that would not end in a \obeyedline, it should be easy enough to pick up if required. If more structure is required, I'd expect it to be within #1 using some marker token: \finalline perhaps?

Regarding the tokens after it: There are two possibilities, imho, either throw an error or throw a warning (I think listings does that).

I think it's reasonable to exclude stuff after \end{foo} from #1, the question is whether simply to rescan it or to throw it away, and then as you say if that's an error or simply a warning/info situation. If it's say info, a consumer could redirect to an error if they want.

@josephwright
Member Author

Also, we currently have no special handling of tokens in the same line as \begin{env} here. Many (most?) of the established packages/solutions for verbatim environments do throw an error or a warning for these (and filecontents throws a Missing \begin{document}).

Like the end-of-env case, we get a choice here. We could special-case line one, in which case I would be minded to say that optional arguments don't need to worry about spaces (\begin{Verbatim} [...] for example works fine). On the other hand, we could simply collect everything and leave to the consumer to make a decision: here we are talking tokens before the first \obeyedline so easy to pick up.

I can see an argument for special-casing both the first and last line, but want to be sure we have consensus - all of this is doable, as I say.
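
How a consumer might pick off the tokens before the first `\obeyedline` can be sketched with an existing `+v` argument and a delimited macro. This assumes a kernel recent enough that line ends in a `+v` argument are stored as `\obeyedline` tokens; the names `\showsplit`, `\split@firstline` and `\split@stop` are hypothetical, and the proposed f-type itself is not used.

```latex
\documentclass{article}
\makeatletter
% #1 = everything before the first \obeyedline, #2 = the rest.
% \split@stop is only a delimiter and is never executed.
\long\def\split@firstline#1\obeyedline#2\split@stop{%
  \par First line: \texttt{#1}\par
  Rest: {\ttfamily #2}}

% An extra \obeyedline is appended so that the delimited macro
% still matches when the argument has only one line.
\NewDocumentCommand{\showsplit}{+v}
  {\split@firstline#1\obeyedline\split@stop}
\makeatother

\begin{document}
\showsplit|alpha
beta
gamma|
\end{document}
```

The same delimiter trick, applied from the other end, would isolate a final line that does not end in `\obeyedline`.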

@davidcarlisle
Member

For the tokens in front of \end{env}: enverb uses their count for its auto gobbling feature which makes it automatically work in .dtx files or in natural indentation. I'm not suggesting to automatically do this for f-type, but making something similar possible would be nice (maybe by putting the contents into a documented str-variable, or via an argument processor).

Based on the general ltcmd model, I'd expect the (start of the) last line to simply form part of #1 if passed through. But as that would not end in a \obeyedline, it should be easy enough to pick up if required. If more structure is required, I'd expect it to be within #1 using some marker token: \finalline perhaps?

If I understand @Skillmon's comment correctly the command handling the grabbed f argument would need to know the last line before processing of the returned argument starts (so the prefix could be stripped from all lines) so returning an inline markup would be usable but not especially convenient

@Skillmon
Contributor

Just an idea: Special handling of the first and last line could also be implemented as an argument processor and then split the f-argument in three arguments (the contents on the first line, the body minus the two special lines, the contents on the last line). If the processor isn't used you just get the entire contents in a single argument.

Not sure how the policy is on argument processors only really usable on a single argument type.

@josephwright
Member Author

Just an idea: Special handling of the first and last line could also be implemented as an argument processor and then split the f-argument in three arguments (the contents on the first line, the body minus the two special lines, the contents on the last line). If the processor isn't used you just get the entire contents in a single argument.

Yes, that is certainly a possibility, although if the feeling is there should be a 'kernel position' on what syntax makes sense, then this would not quite be the way.

Not sure how the policy is on argument processors only really usable on a single argument type.

I think provided we can find a suitable non-clashing name, that's not a major concern.

@josephwright
Member Author

My feeling on the various questions I've left open

  • I'd be happy to say that the first line cannot have 'content', only optional argument(s) - would make some of the parsing easier
  • I would go with allowing tokens before but not after the \end{foo} marker: any tokens after the end should I think be a kernel-level error (i.e. go for consistency)
  • As a result, I'd not support ! - with the two positions above, it's not needed
  • I would automate adding my current * hack - but only for the immediately preceding D-type: I can see how to do this automatically (track the last-added argument type and allow unwinding) - I would not allow it to be set manually as something like *d^^ *d__ would be a nightmare to support

but I am happy to listen to alternative views.

Member

@FrankMittelbach FrankMittelbach left a comment

approve (perhaps with a bit more docu around the issues when using it in certain ways)

{ \@@_grab_D_call:Nw #1 }
{ \@@_add_arg:o \c_novalue_tl }
\cs_set_eq:NN \do \char_set_catcode_other:N
\dospecials
Member

What exactly is the definition of "the first line"? The line on which \begin appears? In a document it is easy to say no tokens on that line are supported, but if the environment is placed into another definition (as we have seen in one example), there is no "line ending" within the definition.

\group_begin:
\@@_grab_D_verb_safe:N #1
}
#5 #1
Member

Might be enough to document the technical issue, but perhaps with a bit more information than just saying "for technical reasons". More along the lines of "o f and the like is not a good idea, because it means ... in such and such ... situations, so better always use ..."

@FrankMittelbach
Member

In other news - do we want rollback or rather roll-forward? I'm not looking forward to it ...

it is a straight extension that allows documents to use syntax that wasn't allowed before, so doesn't really need a rollback.
Roll forward is kind of desirable, but it is rather difficult and often doesn't quite work. So I think probably go without is ok.

@FrankMittelbach
Member

My feeling on the various questions I've left open

I don't have any strong views on that (and haven't thought about it much to be honest)

* I'd be happy to say that the first line cannot have 'content', only optional argument(s) - would make some of the parsing easier

might be ok (to keep it simple), but it might need explaining that if "content" is delivered as part of a macro definition it is always on the "first line". I think the starting point was exactly such a scenario wasn't it? (with the environment being inside some other definition and some intro text inside) -- if so what would be the documented advice?

* I would go with allowing tokens before but not after the `\end{foo}` marker: any tokens after the end should I think be a kernel-level error (i.e. go for consistency)

You have to allow for tokens and treat it as the last line of the content, but more importantly I think you need to allow for tokens after the \end{foo} and handle them.

Again, my argument is usage in an outer environment definition where you may reasonably end up with \end{foo}<more code> and you then have that <more code> automatically on the same line as far as TeX is concerned

* As a result, I'd not support `!` - with the two positions above, it's not needed

guess that still applies

* I would automate adding my current `*` hack - but only for the immediately preceding `D`-type: I can see how to do this automatically (track the last-added argument type and allow unwinding) - I would not allow it to be set manually as something like `*d^^ *d__` would be a nightmare to support

Write up the user documentation and see if this is nicely explainable (and then decide if this syntax addition is really a good idea).

@josephwright
Member Author

* I'd be happy to say that the first line cannot have 'content', only optional argument(s) - would make some of the parsing easier

might be ok (to keep it simple), but it might need explaining that if "content" is delivered as part of a macro definition it is always on the "first line". I think the starting point was exactly such a scenario wasn't it? (with the environment being inside some other definition and some intro text inside) -- if so what would be the documented advice?

As the idea here is 'filecontents-like', the putative f-type envs can't go into macro args - we need line ends. That should be documented, I guess.

* I would go with allowing tokens before but not after the `\end{foo}` marker: any tokens after the end should I think be a kernel-level error (i.e. go for consistency)

You have to allow for tokens and treat it as the last line of the content, but more importantly I think you need to allow for tokens after the \end{foo} and handle them.

Again, my argument is usage in an outer environment definition where you may reasonably end up with \end{foo}<more code> and you then have that <more code> automatically on the same line as far as TeX is concerned

Same point as above: you can't hide verb-like stuff in macro args.

@FrankMittelbach
Member

Same point as above: you can't hide verb-like stuff in macro args.

I'm not talking about verb in argument, I'm talking about a case like this:

\documentclass{article}

\NewDocumentEnvironment{foo}{+b}{{\ttfamily #1}}{}

\NewDocumentEnvironment{baz}{}{\begin{foo} AAA}{xxx\end{foo} more code}

\begin{document}

\begin{foo}
 abc
\end{foo}

\begin{baz}
  xyz
\end{baz}
\end{document}

@josephwright
Member Author

Same point as above: you can't hide verb-like stuff in macro args.

I'm not talking about verb in argument, I'm talking about a case like this:

\documentclass{article}

\NewDocumentEnvironment{foo}{+b}{{\ttfamily #1}}{}

\NewDocumentEnvironment{baz}{}{\begin{foo} AAA}{xxx\end{foo} more code}

\begin{document}

\begin{foo}
 abc
\end{foo}

\begin{baz}
  xyz
\end{baz}
\end{document}

Try that with filecontents (or verbatim) ;)

@FrankMittelbach
Member

FrankMittelbach commented Dec 30, 2024

Try that with filecontents (or verbatim) ;)

there is no expectation that filecontents can be used or should be used for this (that is a simple packaging tool) and verbatim works (with the verbatim package) if you use the syntax without \begin, eg from the doc:

\newenvironment{myverbatim}%
{\endgraf\noindent MYVERBATIM:%
 \endgraf\verbatim}%
 {\endverbatim}

This is a very common use case and it handles tokens after \endverbatim inside the definition.
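
A complete document exercising that pattern might look as follows. It is a minimal sketch: the environment definition is the one quoted above, extended with a trailing line after `\endverbatim` (the "END OF MYVERBATIM" text is made up for illustration) to show tokens following the end marker inside the definition.

```latex
\documentclass{article}
\usepackage{verbatim}

\newenvironment{myverbatim}%
  {\endgraf\noindent MYVERBATIM:%
   \endgraf\verbatim}%
  {\endverbatim
   \noindent END OF MYVERBATIM\endgraf}% tokens after \endverbatim

\begin{document}
\begin{myverbatim}
\foo{bar} % $ & # are all printed as typed
\end{myverbatim}
\end{document}
```

This works because the wrapper calls `\verbatim` and `\endverbatim` directly rather than `\begin{verbatim}` and `\end{verbatim}`, so the package looks for the end of the outer environment.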

@josephwright
Member Author

Try that with filecontents (or verbatim) ;)

there is no expectation that filecontents can be used or should be used for this (that is a simple packaging tool) and verbatim works (with the verbatim package) if you use the syntax without \begin, eg from the doc:

\newenvironment{myverbatim}%
{\endgraf\noindent MYVERBATIM:%
 \endgraf\verbatim}%
 {\endverbatim}

This is a very common use case and it handles tokens after \endverbatim inside the definition.

I guess we need to sort this first then :)

I modelled the current code on filecontents or verbatim from the kernel, which both grab entire lines and require ^^M to be present. That is I think the same as fancyvrb or listings, both of which don't allow hiding the env. in another: you are expected to use a 'generator' to make new verb-like envs.

I've not looked at how the verbatim package works - is this done by using a delimited arg.? I'm not super-keen on suggesting \foo ... \endfoo.

@Skillmon
Contributor

You have to allow for tokens and treat it as the last line of the content, but more importantly I think you need to allow for tokens after the \end{foo} and handle them.

Again, my argument is usage in an outer environment definition where you may reasonably end up with \end{foo}<more code> and you then have that <more code> automatically on the same line as far as TeX is concerned

The grabbing of a nested f-type (if supported) would have to be done using a non-environment form (like for verbatim) because otherwise it's not distinguishable from \begin{minipage}{\linewidth}\begin{myverbatim}... and wouldn't know for which \end to look. That said it doesn't need to support things following the matching \end, the nested usage is something completely different from it (as the things will already be tokenised and not grabbed during parsing in the verbatim category code regime).

@josephwright
Member Author

I've squashed and rebased the work so far: I plan to add some new commits to adjust behaviour so they can be reviewed with a clear starting point :)
