-
Notifications
You must be signed in to change notification settings - Fork 249
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
commas in activity lead to strange behavior #753
Comments
This is one of the underestimated regressions introduced by #663, |
TL;DR: I think the new behavior is problematic. Details: Apology: Sorry, I only looked for related existing issues after the 3.0.3 release. Therefore I did not know of the existing extensive discussion. Also, I found GUI user: As being mainly a user of the GUI, I have to state that unfortunately the restrictions that are put on the input for particular fields are not obvious/intuitive. Actually, as far as I can see, the GUI does not indicate any restriction at all and does not prevent input of unsupported characters. Why commas in activities?: In order to add to the statistics, let me summarize my use case: I use two categories named Tags in description in GUI buggy: After reading the help and skimming through the pages linked by @ederag, I learned that tags are now also extracted from the description. So I dried putting a #-prefixed word into the description and unfortunately I ended up with a non-clickable Save button again, as described in the initial report. Thoughts about tag extraction: I know, also the extraction of tags from the description was discussed before, but let me try making another argument against them: Until now, I assumed that the description is free text. The GUI suggest that, as it provides a multi-line input field, where I actually can type anything without any side effect except defining a description. (After reading the discussions about various delimiters, especially double-comma, I am not sure anymore how careful I have to be when writing a description.) In general, I think automatic extraction of tags is a bad idea as words starting with hash might not be meant as tag. For example, I actually often put links into my descriptions and links might contain anchors; e.g., try putting How to resolve?: Eventually, if tags should be extracted from the description is a matter of taste. So I focus on the restrictions on fields and the cmdline parsing issue here. My impression is, the cmdline is a central intermediate representation of activities even when using the GUI. Hence, the cmdline must be able to encode anything that can be input in the other fields (description, start, end, category, activity, tags). A simple (however IMO not preferable solution) would be to restrict the inputs in the other fields. Unfortunately, this seems not intuitive and might break existing use cases. Alternatively, the cmdline format could be changed. My initial idea was to introduce a way for escaping delimiters or meta symbols in general. I still think this would be a good idea as this would allow to encode arbitrary strings in any field giving the user maximal freedom. However, I see two downsides/complications: 1. Users might need to adapt existing cmdlines (e.g., when they put cmdlines in scripts) when they happened to have used the yet-to-define escape character. 2. I noticed that the cmdline is currently processed by regular expressions. This is probably not possible, at least not easily, when dealing with escaping. So the cmdline processing would need a fundamental change. In my experience, often a simple hand-woven state machine processing the input string from left to right is often enough to parse simple syntax. However, this might not be the case here since start/end seem to allow various ways of being specified. Nevertheless, a simple left-to-right approach might be useful for a kind of lexing step where the cmdline is just disassembled into the start/end, category, activity, tags, and description parts while undoing the escaping. The detailed processing of the parts can then be done in separate steps. Anyways, these are just some thoughts after having a bird eye’s view so I apologize if they do not make any sense at all and I only wasted your time while you read this. :) |
@Flupp: Nice report and detailed follow-up analysis. Realistically, I fear that neither the reversal suggested by ederag nor the parsing improvements you suggest are going to happen any time soon. Therefore, your short term options are to go back to 3.0.2 or stop modifying or adding activities with an embedded comma. Note that old activities with commas are fine as long as you don't try to edit and save them. They should continue to appear as is in reports and data extractions. Replacing all instances of commas in activity names in the hamster.db by some other character is fairly easy if you care to go that route. Find your database from a terminal session with the command I confirm that typing a #tag in the Description field of the fact editor is broken, as you have reported. This should be filed as a separate issue in my opinion. |
Pinging @rhertzog concerning #tag harvesting. |
The comma in activities has gone after extensive discussion. We can't change the entry syntax back and forth all the time. But the different behavior between typing and pasting into the activities field should be fixed. I believe the greyed-out "save" button plus error tooltip is the right approach. Also, even if fixing the DB is relatively simple, typing SQL commands is not everybody's favorite pastime. We may want to fix our database backend such that it detects commas and removes them automatically (or replaces them by some other character, ";" for example) when a user opens an entry with commmas; after all, in the DB itself, commas don't do any harm. |
@Flupp I fully agree with your nice report and analysis. hamster/src/hamster/lib/parsing.py Line 34 in ff1ab0e
|
Yes it is. This dates back to the original design of hamster. Before hamster 3.0, there was no complex activity entry dialog like we have now. The activity entry was just a single text input field, similar to what we still have e.g. in the GNOME extension. The design was focused on keyboard and autocompletion. It was possible to start new activities extremely quickly with just a few key strokes, with minimal distraction from actual work. I think this was brilliant, the best UI hamster ever had. Also, it worked just as well on the terminal command line as in the GUI. With this design came deliberately simplified syntax, like forbidding a comma in the description. The rationale was not to keep the parser code simple, but to make it as hassle-free as possible to type the command line (by avoiding the need for quoting characters). This suited the need of some people (like myself) perfectly, but others less so. Then came the Separate input fields PR, which is basically what we still have. Rather providing a fully graphical UI, the design tried to combine the simplicity of the "command line" input field with the versatility of the separate fields, and IMO that's what has lead to the current confusion: the "command line" field must be able to express the same content that is expressed by the 6 other fields. With the separate input fields in place, in theory there are no syntax restrictions any more for any field except date & time. We could simply drop the command line field, or at least the necessity for it to be semantically equivalent to the other input methods. It would be possible e.g. to grey out the command line field as soon as some characters were entered in other fields that the command line doesn't support, but still allow saving the activity. Alternatively, someone could design an extended command line syntax with quote characters and automatically re-format the command line field as soon as something was entered in other fields that couldn't be expressed in simple syntax. In this case, it would be very important to make sure that the original simple syntax still works. So the parse would need to be able to figure out which Syntax is being used. Also, the complex syntax should use standard quoting techniques like single or double quote characters rather than invent some hamster-specific idosyncracies like the "double comma". |
I'm here to give informations, The double comma may be called "idiosyncrasy" (it is indeed special to hamster) and be disliked by some people,
|
As of v3.0.3 hamster harvests single word #hash tags from the description field. This PR tweaks the regex used to extract the tags, requiring whitespace before the # character. This will prevent harvesting a tag from a pattern like example.com/page#anchor, as mentioned here: projecthamster#753 (comment). This does not affect any of the test cases in test_stuff.py nor the example from the input.page documentation.
Hey, @ederag's 👎 is back 😁 Not sure which part of my commend earned me the thumbs down this time, but I'm kind of used to getting it from ederag, so whatever. I have no interest in restarting the UI discussion, either. I just wanted to explain that the command line is important for historic reasons, and provide my €0.02 on how I think the current situation could be resolved without making yet another U-turn wrt the "double comma". |
I have opened #757 and #758 which are two valid issues that have been reported by @Flupp in this ticket. I certainly don't agree with the request to switch back the logic, but we should fix those small inconsistencies. I'm not convinced of the need of any automatic database upgrade for the persons who have comma in their activity field thus I don't have opened such an issue, but I will certainly not forbid anyone to work on this if they believe it to be important (in which case please open a separate issue). Also I don't support the idea to add quoting to be able to support arbitrary strings anywhere... the beauty of hamster is its simplicity of use, such a complicated syntax would undermine the user-friendliness of the tool. I believe this issue should be closed now that we have extracted the important bits of this useful user feedback. Thanks @Flupp !
FWIW, I don't understand why you didn't chose to put your "title" in the description field instead of in the activity field... |
@rhertzog wrote:
Yeah, quoting might be a bit too much. That’s why I originally suggested escaping, which would be rather lightweight IMO. For example, using
If a user does not want to put any meta-symbols into the fields, no escaping would be needed and the cmdline would look as it looks with the current implementation. Unfortunately I have not found the time yet to implement a prototype to see how complicated the parsing code would actually get.
The reason is simple: Because the hamster GUI provides nice sums for activities and categories. This allows me to easily see how much I worked in general (by looking at the category sum for |
... and you (the human being) are able to parse this correctly? |
@mwilck wrote:
I suppose you mean
Well, as a computer scientist, I can hardly say no if I want to keep my job. 😉 |
Today I managed to prototypically implement some escaping support: Flupp/hamster@master...Flupp:hamster:escaping. I am not really satisfied with the code yet; anyways, I think it’s good enough to evaluate the feasibility of the idea. Nevertheless, I do not expect that you adopt my idea if I am the only affected user. While doing this, I noticed some more edge cases with the current implementation (which are also still present in the prototype):
As far as I can tell, currently, there is no fail mechanism in the current parser implementation. The parser always returns something and IMO it is hard to tell for a user how the parser interprets some edge cases. However, fixing any of the above edge cases breaks backwards compatibility (again). So I do not really have an idea how to proceed here. For doing this right, one would probably have to entirely rethink the command line syntax considering parseability for humans and computers from the beginning. But this is probably not a real option. So 🤷. Nevertheless, I did not want to leave my findings undocumented. |
Complements for this work @Flupp. If it allows users to embed
|
The shell extension performs only (very trivial) parsing to separate the start time and the rest of the command line. It uses hamster |
@Flupp, can you please share the syntax rules you implemented here? How exactly does your escaping algorithm work?
I don't follow. In my version of hamster, if I enter |
I haven't found time yet to fully dive into this issue, but I've been reading the comments with interest. Wrt escaping: Adding that could potentially make the format and parser even more simpler and remove all limitations. As for the escape character, one thought I had was to consider doubling characters instead of prefixing them with
Note the As for allowing newlines - I'm all for allowing as much as possible, so allowing newlines makes sense. It might be tricky to render/edit when splitting into fields, though. |
Ah, ok, thanks. I wonder if it's possible to enter a value like this in the GUI. I haven't found a way to do it.
I tend to disagree. Writing a robust parser that allows using control characters is notoriously hard. We shouldn't do this without and ample set of hard test cases. I think doing this right would require lots of effort with rather little benefit. And even if done right, it might confuse users more than it would help. On my system, the UI displays ... and it looks yet different in the GNOME extension's UI, of course. |
I'm with @mwilck here, we should not aim to support special characters (i.e. I would rather disallow this than officialize its support, I consider it more a bug that it sort of works), that way lies madness. I'm also against the idea of adding escaping. KISS is a good principle. If hamster doesn't work for the 0,1% of persons that absolutely want an activity description with newline characters and commas, then so be it. |
I guess what I meant is that we should aim to support as much as possible, especially wrt UTF-8/unicode characters. I.e. it can be tempting to have whitelist of allowed characters and thereby excluding tons of (non-western) useful characters. If we would just explicitly disallow ASCII control characters and allow everything else, that would be fine by me (but we should probably make sure that this is then enforced at all entry points, and we should accept that existing databases can still contain these characters). |
For unicode, I think we'd be on the safe side if we cleanly use python utf8 decode/encode everywhere. I am not sure if we do, though. Add-ons like the GNOME extension are a separate matter. AFAICS, the GNOME extension's input field doesn't support entering special characters using either the compose key or the unicode ctrl-shift-u method, while Hamster itself supports both. |
In the GUI, when adding/editing an activity, and when putting a comma in the activity field, the “Save” button gets disabled and the hover-tooltip shows “Fact could not be parsed back”. When copying and reinserting the (broken?) cmdline field, it seems to be interpreted wrongly and a part of the cmdline appears in the description, however, the Save button is enabled again.
For example:
baz
13:37
Work
foo, bar
leads to cmdline
13:37 foo, bar@Work
. Reinserting this cmdline leads tobar@Work
(replacing the previous existing description)13:37
Work
foo
(lacking, bar
)I started noticing this problem with version 3.0.3. I did not notice any problems in that regard until version 3.0.2 (and I actually have a lot of old database entries with commas in the activity).
The text was updated successfully, but these errors were encountered: