Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

commas in activity lead to strange behavior #753

Open
Flupp opened this issue Jan 9, 2024 · 22 comments
Open

commas in activity lead to strange behavior #753

Flupp opened this issue Jan 9, 2024 · 22 comments

Comments

@Flupp
Copy link

Flupp commented Jan 9, 2024

In the GUI, when adding/editing an activity, and when putting a comma in the activity field, the “Save” button gets disabled and the hover-tooltip shows “Fact could not be parsed back”. When copying and reinserting the (broken?) cmdline field, it seems to be interpreted wrongly and a part of the cmdline appears in the description, however, the Save button is enabled again.

For example:

  • description: baz
  • start: 13:37
  • end: empty
  • categpory: Work
  • activity: foo, bar
  • tags: empty

leads to cmdline 13:37 foo, bar@Work. Reinserting this cmdline leads to

  • description: bar@Work (replacing the previous existing description)
  • start: 13:37
  • end: empty
  • categpory: Work
  • activity: foo (lacking , bar)
  • tags: empty

I started noticing this problem with version 3.0.3. I did not notice any problems in that regard until version 3.0.2 (and I actually have a lot of old database entries with commas in the activity).

@ederag
Copy link
Collaborator

ederag commented Jan 10, 2024

This is one of the underestimated regressions introduced by #663,
see in particular #663 (comment).
Given that the v3 parser was both more capable and perfectly robust,
I'd advise to revert #663.

@Flupp
Copy link
Author

Flupp commented Jan 10, 2024

TL;DR: I think the new behavior is problematic.

Details:

Apology: Sorry, I only looked for related existing issues after the 3.0.3 release. Therefore I did not know of the existing extensive discussion. Also, I found help/C/input.page only now. (Unfortunately, as KDE user, the GUI help button opens khelpcenter informing me that “Documentation not Found”. I managed to view the help after installing yelp and using this to open the help manually. Anyways, that’s different topic.)

GUI user: As being mainly a user of the GUI, I have to state that unfortunately the restrictions that are put on the input for particular fields are not obvious/intuitive. Actually, as far as I can see, the GUI does not indicate any restriction at all and does not prevent input of unsupported characters.

Why commas in activities?: In order to add to the statistics, let me summarize my use case: I use two categories named Work (for work-related activities) and Non-Work (e.g., for breaks). For activities, I use the syntax “⟨identifier⟩: ⟨title⟩” where ⟨identifier⟩ and ⟨title⟩ are literal copies from another system so I can later match my hamster entries with entries in that system. Of course, I could manually remove commas from ⟨title⟩ manually, as the the ⟨title⟩ is only informative in my case. However, that would be tedious and/or would reduce readability.

Tags in description in GUI buggy: After reading the help and skimming through the pages linked by @ederag, I learned that tags are now also extracted from the description. So I dried putting a #-prefixed word into the description and unfortunately I ended up with a non-clickable Save button again, as described in the initial report.

Thoughts about tag extraction: I know, also the extraction of tags from the description was discussed before, but let me try making another argument against them: Until now, I assumed that the description is free text. The GUI suggest that, as it provides a multi-line input field, where I actually can type anything without any side effect except defining a description. (After reading the discussions about various delimiters, especially double-comma, I am not sure anymore how careful I have to be when writing a description.) In general, I think automatic extraction of tags is a bad idea as words starting with hash might not be meant as tag. For example, I actually often put links into my descriptions and links might contain anchors; e.g., try putting https://example.com/#example into a description. There might be other non-tag use cases for hash-prefixed words so any workaround for a particular use case might still not be enough for another.

How to resolve?: Eventually, if tags should be extracted from the description is a matter of taste. So I focus on the restrictions on fields and the cmdline parsing issue here. My impression is, the cmdline is a central intermediate representation of activities even when using the GUI. Hence, the cmdline must be able to encode anything that can be input in the other fields (description, start, end, category, activity, tags). A simple (however IMO not preferable solution) would be to restrict the inputs in the other fields. Unfortunately, this seems not intuitive and might break existing use cases. Alternatively, the cmdline format could be changed. My initial idea was to introduce a way for escaping delimiters or meta symbols in general. I still think this would be a good idea as this would allow to encode arbitrary strings in any field giving the user maximal freedom. However, I see two downsides/complications: 1. Users might need to adapt existing cmdlines (e.g., when they put cmdlines in scripts) when they happened to have used the yet-to-define escape character. 2. I noticed that the cmdline is currently processed by regular expressions. This is probably not possible, at least not easily, when dealing with escaping. So the cmdline processing would need a fundamental change.

In my experience, often a simple hand-woven state machine processing the input string from left to right is often enough to parse simple syntax. However, this might not be the case here since start/end seem to allow various ways of being specified. Nevertheless, a simple left-to-right approach might be useful for a kind of lexing step where the cmdline is just disassembled into the start/end, category, activity, tags, and description parts while undoing the escaping. The detailed processing of the parts can then be done in separate steps. Anyways, these are just some thoughts after having a bird eye’s view so I apologize if they do not make any sense at all and I only wasted your time while you read this. :)

@GeraldJansen
Copy link
Contributor

@Flupp: Nice report and detailed follow-up analysis. Realistically, I fear that neither the reversal suggested by ederag nor the parsing improvements you suggest are going to happen any time soon. Therefore, your short term options are to go back to 3.0.2 or stop modifying or adding activities with an embedded comma. Note that old activities with commas are fine as long as you don't try to edit and save them. They should continue to appear as is in reports and data extractions.

Replacing all instances of commas in activity names in the hamster.db by some other character is fairly easy if you care to go that route. Find your database from a terminal session with the command find ~ -name hamster.db and make a backup copy before proceeding. Assuming the location is ~/.local/share/hamster, you could use the following commands from a terminal session:
sqlite3 ~/.local/share/hamster/hamster.db
sqlite> update activities set name=replace(name,',',';') where name like '%,%';

I confirm that typing a #tag in the Description field of the fact editor is broken, as you have reported. This should be filed as a separate issue in my opinion.

@GeraldJansen
Copy link
Contributor

Pinging @rhertzog concerning #tag harvesting.

@mwilck
Copy link
Contributor

mwilck commented Jan 11, 2024

The comma in activities has gone after extensive discussion. We can't change the entry syntax back and forth all the time.

But the different behavior between typing and pasting into the activities field should be fixed. I believe the greyed-out "save" button plus error tooltip is the right approach.

Also, even if fixing the DB is relatively simple, typing SQL commands is not everybody's favorite pastime. We may want to fix our database backend such that it detects commas and removes them automatically (or replaces them by some other character, ";" for example) when a user opens an entry with commmas; after all, in the DB itself, commas don't do any harm.

@ederag
Copy link
Collaborator

ederag commented Jan 11, 2024

@Flupp I fully agree with your nice report and analysis.
The approach I came up with in v3.0 was a kind of a "left to right" solution you describe in the end,
except the string is consumed from the left for range, then from the right for tags and description,
the remainder being parsed for activity@category:

def parse_fact(text, range_pos="head", default_day=None, ref="now"):

@mwilck
Copy link
Contributor

mwilck commented Jan 12, 2024

My impression is, the cmdline is a central intermediate representation of activities even when using the GUI.

Yes it is. This dates back to the original design of hamster. Before hamster 3.0, there was no complex activity entry dialog like we have now. The activity entry was just a single text input field, similar to what we still have e.g. in the GNOME extension. The design was focused on keyboard and autocompletion. It was possible to start new activities extremely quickly with just a few key strokes, with minimal distraction from actual work. I think this was brilliant, the best UI hamster ever had. Also, it worked just as well on the terminal command line as in the GUI.

With this design came deliberately simplified syntax, like forbidding a comma in the description. The rationale was not to keep the parser code simple, but to make it as hassle-free as possible to type the command line (by avoiding the need for quoting characters). This suited the need of some people (like myself) perfectly, but others less so.

Then came the Separate input fields PR, which is basically what we still have. Rather providing a fully graphical UI, the design tried to combine the simplicity of the "command line" input field with the versatility of the separate fields, and IMO that's what has lead to the current confusion: the "command line" field must be able to express the same content that is expressed by the 6 other fields.

With the separate input fields in place, in theory there are no syntax restrictions any more for any field except date & time. We could simply drop the command line field, or at least the necessity for it to be semantically equivalent to the other input methods. It would be possible e.g. to grey out the command line field as soon as some characters were entered in other fields that the command line doesn't support, but still allow saving the activity.

Alternatively, someone could design an extended command line syntax with quote characters and automatically re-format the command line field as soon as something was entered in other fields that couldn't be expressed in simple syntax. In this case, it would be very important to make sure that the original simple syntax still works. So the parse would need to be able to figure out which Syntax is being used. Also, the complex syntax should use standard quoting techniques like single or double quote characters rather than invent some hamster-specific idosyncracies like the "double comma".

@ederag
Copy link
Collaborator

ederag commented Jan 13, 2024

I'm here to give informations,
but do not intend to enter that kind of discussion about the UI with @mwilck again,
so let me refer to #574 (comment), in particular the "skepticism" section and the given link.

The double comma may be called "idiosyncrasy" (it is indeed special to hamster) and be disliked by some people,
but the end of #657 (comment) shed some light on it:

Clear and simple, except perhaps for someone used to a single comma before description.

The parser then became fully robust and general enough.
The only drawback is that the same key has to be hit twice where once was needed (before description).
Apart from that, the new parser accepts all of the old syntax.

GeraldJansen added a commit to GeraldJansen/hamster that referenced this issue Jan 13, 2024
As of v3.0.3 hamster harvests single word #hash tags from the description
field. This PR tweaks the regex used to extract the tags, requiring
whitespace before the # character. This will prevent harvesting a tag from
a pattern like example.com/page#anchor, as mentioned here:
projecthamster#753 (comment).

This does not affect any of the test cases in test_stuff.py nor the example
from the input.page documentation.
@mwilck
Copy link
Contributor

mwilck commented Jan 15, 2024

Hey, @ederag's 👎 is back 😁 Not sure which part of my commend earned me the thumbs down this time, but I'm kind of used to getting it from ederag, so whatever.

I have no interest in restarting the UI discussion, either. I just wanted to explain that the command line is important for historic reasons, and provide my €0.02 on how I think the current situation could be resolved without making yet another U-turn wrt the "double comma".

@rhertzog
Copy link
Contributor

I have opened #757 and #758 which are two valid issues that have been reported by @Flupp in this ticket. I certainly don't agree with the request to switch back the logic, but we should fix those small inconsistencies.

I'm not convinced of the need of any automatic database upgrade for the persons who have comma in their activity field thus I don't have opened such an issue, but I will certainly not forbid anyone to work on this if they believe it to be important (in which case please open a separate issue).

Also I don't support the idea to add quoting to be able to support arbitrary strings anywhere... the beauty of hamster is its simplicity of use, such a complicated syntax would undermine the user-friendliness of the tool.

I believe this issue should be closed now that we have extracted the important bits of this useful user feedback. Thanks @Flupp !

For activities, I use the syntax “⟨identifier⟩: ⟨title⟩” where ⟨identifier⟩ and ⟨title⟩ are literal copies from another system so I can later match my hamster entries with entries in that system. Of course, I could manually remove commas from ⟨title⟩ manually, as the the ⟨title⟩ is only informative in my case. However, that would be tedious and/or would reduce readability.

FWIW, I don't understand why you didn't chose to put your "title" in the description field instead of in the activity field...

@Flupp
Copy link
Author

Flupp commented Jan 26, 2024

@rhertzog wrote:

[…]

Also I don't support the idea to add quoting to be able to support arbitrary strings anywhere... the beauty of hamster is its simplicity of use, such a complicated syntax would undermine the user-friendliness of the tool.

Yeah, quoting might be a bit too much. That’s why I originally suggested escaping, which would be rather lightweight IMO. For example, using \ as escape character you could write

  • cmdline 13:37 act\@act@cat\, cat, #tag1 #tag2 for
  • start 13:37,
  • category cat, cat,
  • activity act@act, and
  • tags tag1 and tag2.

If a user does not want to put any meta-symbols into the fields, no escaping would be needed and the cmdline would look as it looks with the current implementation. Unfortunately I have not found the time yet to implement a prototype to see how complicated the parsing code would actually get.

[…]

For activities, I use the syntax “⟨identifier⟩: ⟨title⟩” where ⟨identifier⟩ and ⟨title⟩ are literal copies from another system so I can later match my hamster entries with entries in that system. Of course, I could manually remove commas from ⟨title⟩ manually, as the the ⟨title⟩ is only informative in my case. However, that would be tedious and/or would reduce readability.

FWIW, I don't understand why you didn't chose to put your "title" in the description field instead of in the activity field...

The reason is simple: Because the hamster GUI provides nice sums for activities and categories. This allows me to easily see how much I worked in general (by looking at the category sum for Work) and how much time I spent for particular activities.

@mwilck
Copy link
Contributor

mwilck commented Jan 26, 2024

cmdline 13:37 act@act@cat, cat, #tag1 #tag2 for

... and you (the human being) are able to parse this correctly?

@Flupp
Copy link
Author

Flupp commented Jan 28, 2024

@mwilck wrote:

cmdline 13:37 act@act@cat, cat, #tag1 #tag2 for

I suppose you mean 13:37 act\@act@cat\, cat, #tag1 #tag2.

... and you (the human being) are able to parse this correctly?

Well, as a computer scientist, I can hardly say no if I want to keep my job. 😉
More seriously: I think it is readable enough for the edge case when a user really wants to use special characters in the activity or category. When they use only “regular” characters, they do not have to bother.

@Flupp
Copy link
Author

Flupp commented Jan 28, 2024

Today I managed to prototypically implement some escaping support: Flupp/hamster@master...Flupp:hamster:escaping. I am not really satisfied with the code yet; anyways, I think it’s good enough to evaluate the feasibility of the idea. Nevertheless, I do not expect that you adopt my idea if I am the only affected user.

While doing this, I noticed some more edge cases with the current implementation (which are also still present in the prototype):

  • When (accidentally?) putting commas between tags and forgetting a # for a tag, it is silently ignored. For example, hamster add 'act, #foo, bar, #baz' results in activity act and tags foo and baz, but bar is ignored. It might also be surprising that the commas after tags are removed considering that tags are supposed to be separated by whitespace (if I understand the syntax correctly).
  • The parser does not complain about control characters in activity, category, and tags. I am not sure if this is intended. For example try hamster add $'act\nti\tvity@ca\tte\ngory, #tag\nnl\tab'. It might be surprising that, e.g., you can add newlines but no commas in the activity.
  • Leading commas in the description are ignored, e.g., try hamster add 'act,,,,,desc' results in activity act and description desc.
  • The parser accepts additional @ characters and considers them part of the activity, e.g., hamster add 'one@two@three@four' results in activity one@two@three and category four.

As far as I can tell, currently, there is no fail mechanism in the current parser implementation. The parser always returns something and IMO it is hard to tell for a user how the parser interprets some edge cases.

However, fixing any of the above edge cases breaks backwards compatibility (again). So I do not really have an idea how to proceed here. For doing this right, one would probably have to entirely rethink the command line syntax considering parseability for humans and computers from the beginning. But this is probably not a real option. So 🤷. Nevertheless, I did not want to leave my findings undocumented.

@GeraldJansen
Copy link
Contributor

Complements for this work @Flupp. If it allows users to embed \, and \@ without breaking current usage, why not. If you care to make a pull request I would be willing to play with it. A few comments:

  • Help -> Input page should have a footnote about this (which doesn't confuse things even more for new users)
  • need to check behavior from command line and in GUI?
  • need to check that hamster-shell-extension and xfce4-hamster-plugin don't break?
  • tags can have spaces, just not those collected from the description field; tags cannot have commas
  • the other edge cases you discovered are of marginal interest in my opinion

@mwilck
Copy link
Contributor

mwilck commented Jan 29, 2024

need to check that hamster-shell-extension and xfce4-hamster-plugin don't break?

The shell extension performs only (very trivial) parsing to separate the start time and the rest of the command line. It uses hamster GetActvities() DBus method to obtain a list of activities for autocompletion. When the user hints Enter, it calls hamster's DBus AddFactRemote() method to do the parsing.

@mwilck
Copy link
Contributor

mwilck commented Jan 29, 2024

@Flupp, can you please share the syntax rules you implemented here? How exactly does your escaping algorithm work?

The parser does not complain about control characters

I don't follow. In my version of hamster, if I enter foo\nbar@baz, I don't see any control characters embedded. I see literal \ and n characters in the activity.

@matthijskooijman
Copy link
Member

I haven't found time yet to fully dive into this issue, but I've been reading the comments with interest.

Wrt escaping: Adding that could potentially make the format and parser even more simpler and remove all limitations. As for the escape character, one thought I had was to consider doubling characters instead of prefixing them with \. Even though the \ is commonly used for escaping, it is also quite ugly (IMHO). Also, using it also means that \ itself must now also be escaped. By doubling, I mean e.g. one,, two,, three@category (instead of `one, two, three@category)

I don't follow. In my version of hamster, if I enter foo\nbar@baz, I don't see any control characters embedded. I see literal \ and n characters in the activity.

Note the $'' in the hamster add $'act\nti\tvity@ca\tte\ngory, #tag\nnl\tab' command that @Flupp posted. This is bash syntax that interprets \n and passes an actual newline character to the hamster command.

As for allowing newlines - I'm all for allowing as much as possible, so allowing newlines makes sense. It might be tricky to render/edit when splitting into fields, though.

@mwilck
Copy link
Contributor

mwilck commented Jan 29, 2024

Note the $'' in the hamster add $'act\nti\tvity@ca\tte\ngory, #tag\nnl\tab' command that @Flupp posted. This is bash syntax that interprets \n and passes an actual newline character to the hamster command.

Ah, ok, thanks. I wonder if it's possible to enter a value like this in the GUI. I haven't found a way to do it.

As for allowing newlines - I'm all for allowing as much as possible

I tend to disagree. Writing a robust parser that allows using control characters is notoriously hard. We shouldn't do this without and ample set of hard test cases. I think doing this right would require lots of effort with rather little benefit. And even if done right, it might confuse users more than it would help.

On my system, the UI displays \n using an arrow symbol similar to , but it can't display most other control characters. I tried hamster add $'a\ab\be\ef\fn\nr\rt\tv\v@Nonsense', see result below.

Screenshot from 2024-01-29 16-50-46

... and it looks yet different in the GNOME extension's UI, of course.

@rhertzog
Copy link
Contributor

rhertzog commented Feb 1, 2024

I'm with @mwilck here, we should not aim to support special characters (i.e. I would rather disallow this than officialize its support, I consider it more a bug that it sort of works), that way lies madness.

I'm also against the idea of adding escaping. KISS is a good principle. If hamster doesn't work for the 0,1% of persons that absolutely want an activity description with newline characters and commas, then so be it.

@matthijskooijman
Copy link
Member

I guess what I meant is that we should aim to support as much as possible, especially wrt UTF-8/unicode characters. I.e. it can be tempting to have whitelist of allowed characters and thereby excluding tons of (non-western) useful characters. If we would just explicitly disallow ASCII control characters and allow everything else, that would be fine by me (but we should probably make sure that this is then enforced at all entry points, and we should accept that existing databases can still contain these characters).

@mwilck
Copy link
Contributor

mwilck commented Feb 1, 2024

For unicode, I think we'd be on the safe side if we cleanly use python utf8 decode/encode everywhere. I am not sure if we do, though. Add-ons like the GNOME extension are a separate matter. AFAICS, the GNOME extension's input field doesn't support entering special characters using either the compose key or the unicode ctrl-shift-u method, while Hamster itself supports both.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants