fix: use portable C++ RNG #11627

Open
wants to merge 1 commit into
base: master

bryanhonof
Member

Motivation

Fixes #10541

Context

I took inspiration from "A Tour of C++, Third Edition" (ISBN-10 0136816487) to implement the Random class, which makes it a bit easier to generate random numbers using C++'s portable RNG facilities. The use of std::random_device might be overkill and unnecessary, but I was having fun. 😅

I'm also aware that this constantly creates instances of Random, and then instantly throws them away again. I thought about making this a singleton in some way, but then again, maybe the compiler is smart enough about this.

There are other distributions that we could use; I just went with std::uniform_int_distribution, since that's what the book uses.
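For readers without the diff at hand, here is a minimal sketch of what such a functor could look like. It is not the PR's actual code: the member names, the `int` value type, and the choice of `std::default_random_engine` are assumptions made for illustration.

```cpp
#include <random>

// Hypothetical sketch; the class in the PR may differ in names and details.
class Random
{
public:
    // Draws integers uniformly from the closed range [low, high].
    Random(int low, int high)
        : engine(std::random_device{}())
        , dist(low, high)
    {
    }

    // Functor call: returns the next number in the range.
    int operator()() { return dist(engine); }

    // Reseeds the engine, e.g. to get a reproducible sequence in tests.
    void seed(unsigned int s) { engine.seed(s); }

private:
    std::default_random_engine engine;
    std::uniform_int_distribution<int> dist;
};
```

With this shape, `Random{1, 6}()` constructs a generator, draws one value, and discards the generator again, so every such call also consults `std::random_device`. That is the cost the description alludes to.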

Priorities and Process

Add 👍 to pull requests you find important.

The Nix maintainer team uses a GitHub project board to schedule and track reviews.

@github-actions bot added the "store" label (Issues and pull requests concerning the Nix store) on Oct 1, 2024
@Ericson2314
Member

@bryanhonof can you try the thing I suggested in the issue of making IndirectRootStore own the random number generator?

Also, can you remove the srand and srandom we don't need anymore?

@bryanhonof
Member Author

bryanhonof commented Oct 1, 2024

> @bryanhonof can you try the thing I suggested in the issue of making IndirectRootStore own the random number generator?

Sure, although I didn't quite understand why it'd need to be part of that t.b.h. I liked the way you could use this implementation as a functor, and have the constructor define the range, giving people working on the Nix codebase a nice way to generate random numbers in the future, without doing the whole init dance. We'll lose that behavior if I'm going to instantiate it in IndirectRootStore, I think.

> Also, can you remove the srand and srandom we don't need anymore?

I believe I already did: rg -- 'srand\(' returned nothing for me from the root of the project.

@bryanhonof
Member Author

bryanhonof commented Oct 1, 2024

Added a generic Random as a public member. Let me know what you think.

@bryanhonof bryanhonof force-pushed the bryanhonof.use-cpp-prng branch 2 times, most recently from 8400a1f to 5d60ccc Compare October 1, 2024 21:29
@@ -258,7 +259,7 @@ void handleSQLiteBusy(const SQLiteBusy & e, time_t & nextWarning)
        is likely to fail again. */
     checkInterrupt();
     /* <= 0.1s */
-    std::this_thread::sleep_for(std::chrono::milliseconds { rand() % 100 });
+    std::this_thread::sleep_for(std::chrono::milliseconds { Random{0, 100}() });
Member

@Ericson2314 Ericson2314 Oct 1, 2024

Can you make SQLite require a reference to a Random and then for the LocalStore use the one you just added, and then add one on NarInfoDiskCacheImpl (the other user of SQLite)?

Make sure to add some doxygen to the Random & rng saying what it is used for :)
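One possible reading of this request, sketched under assumptions: the back-off helper receives a generator by reference instead of constructing one, and carries a short Doxygen comment. The name `busyDelay` and the use of `std::default_random_engine` are hypothetical and not taken from the PR. Note also that `std::uniform_int_distribution` bounds are inclusive, so if `Random{0, 100}` forwards its arguments to that distribution it can yield 100, whereas `rand() % 100` never exceeds 99. The sketch uses [0, 99] to keep the old range.

```cpp
#include <chrono>
#include <random>

/**
 * Picks a back-off delay below 0.1s for retrying a busy database.
 *
 * Hypothetical helper: the caller owns the engine and passes it in by
 * reference, so no global random state is consulted here.
 * Not suitable for cryptographic purposes.
 */
std::chrono::milliseconds busyDelay(std::default_random_engine & engine)
{
    // Inclusive bounds: [0, 99] matches the range of `rand() % 100`.
    std::uniform_int_distribution<int> dist(0, 99);
    return std::chrono::milliseconds{dist(engine)};
}
```

A caller would then write something like `std::this_thread::sleep_for(busyDelay(engine))`, where `engine` belongs to whichever object owns the SQLite handle.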

Member Author

@bryanhonof bryanhonof Oct 2, 2024

The function calling that Random, handleSQLiteBusy(), isn't a member of the SQLite struct. Would it be okay to just keep the functor call here?

Member Author

I think I'll need a bit of help on how to resolve this one. I'm not quite sure where to initialize the RandomNumberGenerator when it comes to this specific function. Maybe we could create a thread_local global rng for use-cases like this, but is that a suitable solution?
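For reference, the thread_local idea floated here could look roughly like the sketch below. The function names are hypothetical, and whether a per-thread global is acceptable is exactly the open question in this thread; the sketch only shows the mechanics.

```cpp
#include <random>

// Hypothetical helper: one lazily initialised engine per thread.
// Each thread seeds its own engine on first use, so no locking is needed.
inline std::default_random_engine & threadLocalEngine()
{
    thread_local std::default_random_engine engine{std::random_device{}()};
    return engine;
}

// Example use: an integer in the closed range [low, high].
inline int randomInt(int low, int high)
{
    std::uniform_int_distribution<int> dist(low, high);
    return dist(threadLocalEngine());
}
```

A free function such as handleSQLiteBusy() could call `randomInt(0, 99)` without any object having to hand it a generator, at the price of reintroducing hidden per-thread state.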

@bryanhonof
Member Author

Should I also replace the code in src/libstore/filetransfer.cc with a Random?

std::random_device rd;
std::mt19937 mt19937;

@Ericson2314
Member

@bryanhonof

> Sure, although I didn't quite understand why it'd need to be part of that t.b.h. I liked the way you could use this implementation as a functor, and have the constructor define the range, giving people working on the Nix codebase a nice way to generate random numbers in the future, without doing the whole init dance. We'll lose that behavior if I'm going to instantiate it in IndirectRootStore, I think.

I think it is good if the call site specifies the distribution; the state is there only to avoid consulting the somewhat icky global random-device variable. Feel free to change Random to separate the "storing the seed" part from the "choosing what distribution to sample from" part.
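A minimal sketch of that separation, assuming a hypothetical `RandomState` type that only stores a seeded engine while each call site brings its own distribution. None of these names come from the PR.

```cpp
#include <random>

// Hypothetical split: this type only stores the seeded engine...
struct RandomState
{
    std::default_random_engine engine;

    explicit RandomState(unsigned int seed)
        : engine(seed)
    {
    }

    // ...and the call site passes in whichever distribution it wants.
    template<typename Distribution>
    typename Distribution::result_type sample(Distribution & dist)
    {
        return dist(engine);
    }
};
```

A store object could own one `RandomState`, and a caller would write, for example, `std::uniform_int_distribution<int> d(0, 99); auto ms = rng.sample(d);`. The range stays visible where the number is used, and the seed lives in exactly one place.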

@Ericson2314
Member

@bryanhonof Sure, feel free to change FileTransfer.

I guess getFileTransfer and makeFileTransfer would need to be changed to pass in the seed?

@NaN-git

NaN-git commented Oct 1, 2024

I don't think that using std::random_device is a good choice. Its behavior is basically undefined:

> std::random_device may be implemented in terms of an implementation-defined pseudo-random number engine if a non-deterministic source (e.g. a hardware device) is not available to the implementation. In this case each std::random_device object may generate the same number sequence.

- The question is: which properties shall this RNG have? For testing, a PRNG is useful, i.e. the same seed shall generate the same sequence of numbers. If the generator is non-deterministic, then this is not possible.
- Is anything other than uniformly distributed numbers needed?
- Please stay away from Mersenne Twister, especially std::mt19937, because it's huge and slow. If it is not seeded correctly, it will output a long sequence of "bad" random numbers.

@bryanhonof
Member Author

@NaN-git

> I don't think that using std::random_device is a good choice. Its behavior is basically undefined:

I did see that comment, but I honestly wouldn't know what else to use. Do you think a second constructor that accepts a std::seed_seq might be better? Or perhaps even a plain int, used in combination with some time mechanism?
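As an illustration of the std::seed_seq option, the sketch below builds an engine from several caller-supplied words. `makeEngine` is a hypothetical helper, and where the words come from (configuration, a test fixture, an entropy source) is left open here.

```cpp
#include <cstdint>
#include <initializer_list>
#include <random>

// Hypothetical helper: std::seed_seq mixes the supplied words and fills the
// engine's whole state, which matters for large-state engines like mt19937.
template<typename Engine = std::minstd_rand>
Engine makeEngine(std::initializer_list<std::uint32_t> words)
{
    std::seed_seq seq(words);
    return Engine(seq);
}
```

The same words always produce the same engine state, so a constructor taking a seed sequence keeps the generator reproducible while letting callers supply more than a single integer of seed material.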

> The question is: Which properties shall this RNG have? For testing a PRNG is useful, i.e. the same seed shall generate the same sequence of numbers. If the generator is non-deterministic, then this is not possible.

I'm honestly not sure; maybe @Ericson2314 knows?
When it comes to testing, though, the class has a seed() member function, so we can set the seed to a known value.
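The testing property under discussion can be shown with the standard engines alone. In this sketch `drawSequence` is a hypothetical helper, and `std::minstd_rand` is picked only because it is small.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Draws `n` values in [0, 99] from an engine seeded with `seed`.
std::vector<int> drawSequence(unsigned int seed, std::size_t n)
{
    std::minstd_rand engine(seed);
    std::uniform_int_distribution<int> dist(0, 99);
    std::vector<int> out;
    out.reserve(n);
    for (std::size_t i = 0; i < n; ++i)
        out.push_back(dist(engine));
    return out;
}
```

Two calls with the same seed return identical vectors within one build. The standard specifies the engines' output exactly but leaves the algorithms of the distributions to the implementation, so the concrete values may differ between standard libraries; tests should therefore compare runs against each other rather than against hard-coded numbers.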

> Is anything other than uniformly distributed numbers needed?

As far as I can see, no. I don't expect to support generating random characters any time soon. Maybe floats or doubles.

> Please stay away from Mersenne-Twister, especially std::mt19937, because it's huge and slow. If not seeded correctly, then it will output a long sequence of "bad" random numbers.

I've used std::default_random_engine, which seems to default to std::minstd_rand, at least on LLVM. The code in src/libstore/filetransfer.cc does use std::mt19937, but I'm trying to replace that with just the Random class.

@Ericson2314
Member

If it uses std::mt19937 today, I think it could be fine to keep it that way (but also fine to change it). I care more about avoiding global variables for seeding than sampling at this point, since the rand/random baseline is also somewhat vaguely defined.

@bryanhonof bryanhonof force-pushed the bryanhonof.use-cpp-prng branch 4 times, most recently from 42acd1c to 0b0d5d8 Compare October 3, 2024 01:27
src/libutil/rng.hh (outdated)

// Inspired by the book "A Tour of C++, Third Edition" (ISBN-10 0136816487)
template<typename T, typename Distribution, typename Engine>
struct RandomNumberGenerator
Member

@Mic92 Mic92 Oct 6, 2024

This is a pseudorandom number generator, right? We should probably put this in the name so people do not use it in the wrong place.

Member Author

@bryanhonof bryanhonof Oct 7, 2024

Is any random number generator really a true random number generator? 😁
No, in all seriousness, I'm not sure. If people want to use this for cryptographic purposes, I don't think this header is the right choice; linking against OpenSSL and using its facilities would be far better. But PseudoRandomNumberGenerator is getting very verbose as well. Maybe the comments in this header could mention that it isn't meant for cryptographic use, perhaps as Doxygen docs?

Member

Mentioning in the header that this should not be used for cryptographic purposes would be a good start.

tmpPath (the replacement), so we have to move it out of the
way first. We'd better not be interrupted here, because if
we're repairing (say) Glibc, we end up with a broken system. */
Path oldPath = fmt("%1%.old-%2%-%3%", storePath, getpid(), getLocalStore().rng());
Member

@Mic92 Mic92 Oct 6, 2024

This shouldn't be part of this PR, but on Linux we can use renameat2 with RENAME_EXCHANGE to make the whole thing atomic without using a temporary directory.

Member Author

Could you make a separate issue for this?

Member

@Ericson2314
Member

I'm going to push a commit with the things I want.

@Ericson2314
Member

@bryanhonof I am afraid I am a bit back to the drawing board now that I don't see any PRNG "splitting" of the sort mentioned in https://www.tweag.io/blog/2020-06-29-prng-test/ in the C++ standard library.

@Enzime
Member

Enzime commented Nov 13, 2024

There's a related Lix commit that may fix #7273 that you may wish to backport instead

https://gerrit.lix.systems/c/lix/+/2100

I used "A Tour of C++, Third Edition" (ISBN-10 0136816487) as inspiration for the Random class.
This replaces all occurrences of `rand()` and `srand()`.

Co-authored-by: John Ericson <[email protected]>
Co-authored-by: Eelco Dolstra <[email protected]>
Successfully merging this pull request may close these issues.

Use portable C++ Pseudorandom number generator
6 participants