Use winnow instead of nom #9

spoutn1k · 2024-09-18T19:18:15Z

Changes the parser combinator library to use the more actively maintained winnow fork of nom. The changes in winnow allow for lesser nesting, and better error reporting - this PR aims to enable those features for the nodeset parsing.

fdiakh · 2024-09-19T17:28:11Z

Hi,

Thank you for contributing !

I'm not against switching to winnow, especially if it improves error reporting which is not great at the moment. However please also consider performance while refining this PR: I ran a test in which I feed a file containing a1b1 a2b1 ... a1000b100 to ns fold (through stdin): on my machine the time went from 70ms to 1.9s by switching to this branch. Do you see the same performance regression ?

Thanks !

spoutn1k · 2024-09-21T17:06:21Z

Hey ! Thank you for a great library !

I am noticing it as well. It looks like most of the time lost is spent copying stuff in vecs - most likely an oversight in my implementation. Flamegraph attached for that ..a1000b100 test.

spoutn1k · 2024-09-21T17:20:48Z

Found the issue. An error that needed a string made me copy a subslice of the input, and the framework just ran with it. Removing this temporarily for refactoring, the performance drops back to the same range:

nodeset> cat all.list | /dev/shm/cargo/release/ns fold
a[1-1000]b[1-100]
nodeset> hyperfine "cat all.list | /dev/shm/cargo/release/ns fold"
Benchmark 1: cat all.list | /dev/shm/cargo/release/ns fold
  Time (mean ± σ):      53.2 ms ±   1.9 ms    [User: 46.6 ms, System: 7.3 ms]
  Range (min … max):    49.2 ms …  58.5 ms    53 runs

spoutn1k · 2024-09-25T15:05:32Z

So there is a entire parser for nodelists, with precedence climbing and stateful parsing, but reading the clustershell configuration is done using sh and sed ? I wanted to implement errors for undefined groups and I came upon this. Is it subject to change ? Or is it a design choice and would rather keep it that way ?

fdiakh · 2024-09-25T18:54:17Z

Mmmh, I'm not sure what you mean exactly ?

Clustershell uses a mix of ini and yaml files which we parse with Rust crates.

The only use of sed I can see in this repo is in the configuration files that are used as input for some tests.

Clustershell allows you to configure dynamic group sources for which groups and their members are generated dynamically by calling commands that the user provides in configuration files. By default clustershell ships with a configuration file defining a dynamic group source that calls sed to find group definitions in a file, as you can see here: https://github.com/cea-hpc/clustershell/blob/3ab6cbd043ee5f451be25a46f6abea162761b0a0/conf/groups.conf

As part of the tests we validate that the default configuration file of clustershell is interpreted correctly.

spoutn1k · 2024-09-25T20:02:02Z

That is exactly it. I had no idea the default group configuration from the /etc/clustershell/groups.d/local.cfg was parsed by a sed call internally. It made it kind of unreliable to differentiate between an empty and non-existing group.

nodeset> strace 2>&1 -ff -s8192 /dev/shm/cargo/release/ns list @oss | grep sed                   ~/Projects/nodeset::master
read(3, "# ClusterShell node groups main configuration file\n#\n# Please see `man 5 groups.conf` and\n# http://clustershell.
[pid 15631] execve("/bin/sh", ["/bin/sh", "-c", "[ -f /etc/clustershell/groups ] && f=/etc/clustershell/groups || f=/etc/clu
[pid 15631] newfstatat(AT_FDCWD, "/home/jb/.cargo/bin/sed", 0x7ffe481224b0, 0) = -1 ENOENT (No such file or directory)
[pid 15631] newfstatat(AT_FDCWD, "/home/jb/.local/bin/sed", 0x7ffe481224b0, 0) = -1 ENOENT (No such file or directory)
[pid 15631] newfstatat(AT_FDCWD, "/usr/local/bin/sed", 0x7ffe481224b0, 0) = -1 ENOENT (No such file or directory)
[pid 15631] newfstatat(AT_FDCWD, "/usr/bin/sed", {st_mode=S_IFREG|0755, st_size=108912, ...}, 0) = 0
[pid 15631] newfstatat(AT_FDCWD, "/usr/bin/sed", {st_mode=S_IFREG|0755, st_size=108912, ...}, 0) = 0
[pid 15631] access("/usr/bin/sed", X_OK) = 0
[pid 15631] newfstatat(AT_FDCWD, "/usr/bin/sed", {st_mode=S_IFREG|0755, st_size=108912, ...}, 0) = 0
[pid 15631] access("/usr/bin/sed", R_OK) = 0
[pid 15631] execve("/usr/bin/sed", ["sed", "-n", "s/^oss:\\(.*\\)/\\1/p", "/etc/clustershell/groups.d/local.cfg"], 0x5f0c9a

Makes a lot more sense with your explanation of why I was seeing sed calls being made. Not convinced this is good practice from clustershell ...

fdiakh · 2024-09-29T09:59:25Z

Clustershell has been able parse static group source definition in yaml files for a while now (see the autodirs configuration), but I guess they are keeping the old format by default for backwards compatibility. There is indeed a risk that new users may keep using the less efficient method...

Speaking of backwards compatibility, I think the current behaviour for non-existing group/sources was implemented to match clustershell (only return an error for non-existing sources). Do you see a strong rationale for changing this ?

spoutn1k · 2024-10-01T14:37:43Z

No I do not. My rationale was to go bottom-up for errors, as I wanted to make sure parse errors include more information, and I though the error was missing and not omitted by choice. If you are telling me this is the latter, it will be reverted.

This is going really slow because I am spread thin at the moment, sorry.

Use winnow instead of nom

867cbb8

Temporarily remove padding mismatch information

63a9c3d

spoutn1k added 2 commits September 25, 2024 18:18

Error on unknown group

51d667e

Better padding error

f4dd163

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use winnow instead of nom #9

Use winnow instead of nom #9

spoutn1k commented Sep 18, 2024

fdiakh commented Sep 19, 2024

spoutn1k commented Sep 21, 2024

spoutn1k commented Sep 21, 2024

spoutn1k commented Sep 25, 2024

fdiakh commented Sep 25, 2024

spoutn1k commented Sep 25, 2024

fdiakh commented Sep 29, 2024

spoutn1k commented Oct 1, 2024

Use winnow instead of nom #9

Are you sure you want to change the base?

Use winnow instead of nom #9

Conversation

spoutn1k commented Sep 18, 2024

fdiakh commented Sep 19, 2024

spoutn1k commented Sep 21, 2024

spoutn1k commented Sep 21, 2024

spoutn1k commented Sep 25, 2024

fdiakh commented Sep 25, 2024

spoutn1k commented Sep 25, 2024

fdiakh commented Sep 29, 2024

spoutn1k commented Oct 1, 2024