
Could not remove pg_stat - unsatisfied constraints (Directory not empty) #251

Open · ocharles opened this issue Feb 5, 2020 · 12 comments

ocharles commented Feb 5, 2020

I have been happily using tmp-postgres locally, but on CI I got:

Exception: /tmp/tmp-postgres-data-4de8de8b7e43c151/pg_stat: removeDirectoryRecursive:removeContentsRecursive:removePathRecursive:removeContentsRecursive:removeDirectory: unsatisfied constraints (Directory not empty)

I've never seen this before, any idea what could have happened?


jfischoff commented Feb 5, 2020

ah crap.

This is a bug I'm really struggling to kill.

I think that while removeDirectoryRecursive is running, new files are being added to the directory.

This is the silly hack I have left in.

https://github.com/jfischoff/tmp-postgres/blob/master/src/Database/Postgres/Temp/Internal/Config.hs#L393

The other thing I have tried is renaming the folder before removing it, but that also did not work.

I guess I should try the idea I had of locking the directories ... though I'm not really sure what I was thinking, since I would be locking in the same process (I see now that it is postgres writing into the folder, so it would be a different process).

I could also just try deleting over and over again for some amount of time.
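Something like this, maybe (just a sketch, not what is in the library): retry removeDirectoryRecursive a bounded number of times, sleeping between attempts, in case postgres is still flushing files into the directory.

    import Control.Concurrent (threadDelay)
    import Control.Exception (IOException, throwIO, try)
    import System.Directory (removeDirectoryRecursive)

    -- Hypothetical helper: keep retrying the recursive delete, because new
    -- files may appear in the directory while the delete is running.
    retryRemove :: Int -> FilePath -> IO ()
    retryRemove attempts dir = do
      result <- try (removeDirectoryRecursive dir) :: IO (Either IOException ())
      case result of
        Right () -> pure ()
        Left e
          | attempts <= 1 -> throwIO e   -- out of retries, rethrow
          | otherwise -> do
              threadDelay 100000         -- wait 100 ms before the next attempt
              retryRemove (attempts - 1) dir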


ocharles commented Feb 5, 2020

Ah, you have experienced this too then?


jfischoff commented Feb 5, 2020

Not recently but yes. I was pretty sure I had not solved it.


ocharles commented Feb 21, 2021

We're hitting this a lot at the moment, so I've been doing some investigation. Some findings:

  • Disabling copy-on-write doesn't stop the problem from happening. More precisely, both copyOnWrite = False and copyOnWrite = cowCheck trigger the bug (I haven't confirmed what cowCheck evaluates to on the failing machine yet).

  • The tests are failing in CI builds, which are all Nix builds. Interestingly, when we don't have the Nix sandbox enabled (currently our default), tests fail with this error about 90% of the time. With the sandbox enabled, I have yet to see the bug happen. So it seems to be sensitive to a tainted environment.

  • Next, I note that find /tmp -name 'tmp-postgres-data-*' -type d | wc -l reports 2653, so there's a lot of junk lying around from previous runs!

  • I ran a test suite until it failed with this bug. I already had a list of all existing /tmp/tmp-postgres-data directories, and the directory that failed to be removed wasn't in the list - so it's not that the random directory names are colliding with existing directories.

  • I moved /tmp/tmp-postgres-* to /tmp/old-tmp-postgres and ran a failing test suite again. This test suite still failed, and I was left with 6 tmp-postgres-data directories. This is a somewhat surprising number - this test suite uses snapshots and opens 11 connections, so why are almost half of the directories still around? I only observed one exception, too! This makes me think that sometimes the cleanup function runs to completion and then PostgreSQL writes more files. So are we running the cleanup function too early?

  • I notice that it's only ever a complaint about the pg_stat directory. Reading https://www.postgresql.org/docs/current/monitoring-stats.html, we see

    When the server shuts down cleanly, a permanent copy of the statistics data is stored in the pg_stat subdirectory, so that statistics can be retained across server restarts.

    So I'm again wondering if we're doing the cleanup too early.

  • Looking at the cleanup code, I see that the cleanup routine is (probably) ultimately called from:

    Async.concurrently_ (stopPlan dbPostgresProcess) $ cleanupConfig dbResources
    Now I'm suspicious - we know that shutting the server down can write files. This change came in with parallel stop #214 #215, which landed before this issue was filed - so could it be the culprit? (A sketch of the sequential alternative is at the end of this comment.)

  • I tried reverting parallel stop #214 #215 (circuithub@b90ed91) and rebuilt using my fork.

  • With this change, I've managed to run the test suite 50 times in succession; 48 passed without exception. Something is still going wrong in the other two runs, but it appears to be a different problem (entirely unrelated to tmp-postgres).

I think my sandboxing observation is just luck: the sandbox happens to make the exception occur less frequently.

I suggest that #215 is reverted, or at least made configurable. I'll live with an 8 ms penalty if tests actually work reliably. Also, this time is spent per test, and we run tests in parallel anyway.
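For reference, the revert just sequences the two actions instead of running them with Async.concurrently_. A rough sketch of the ordering it restores, with placeholder arguments rather than the library's actual types:

    -- Sketch only: stop postgres first, and remove the temporary data
    -- directory only after the server has finished writing (e.g. the
    -- pg_stat dump that happens on a clean shutdown).
    stopThenCleanup :: IO () -> IO () -> IO ()
    stopThenCleanup stopPostgres cleanupDataDir = do
      stopPostgres     -- wait for the server to shut down cleanly
      cleanupDataDir   -- nothing is writing into the directory any more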

@jfischoff

Thanks @ocharles. I think you are probably right that #215 is the problem. I'll revert it.


the-dr-lazy commented Mar 14, 2021

As a temporary hack for anyone who has the same issue:

Just catch the close action.
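Something along these lines (only a sketch; closeAction stands for whatever stop/close call you already make against tmp-postgres):

    import Control.Exception (IOException, catch)

    -- Workaround sketch: swallow the "Directory not empty" IOException from
    -- cleanup so the test run doesn't fail on it.
    ignoringCleanupErrors :: IO () -> IO ()
    ignoringCleanupErrors closeAction = closeAction `catch` handler
      where
        handler :: IOException -> IO ()
        handler _ = pure ()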

If there is a better solution, I'd be happy to see it.

@ocharles

The better solution is to just revert b90ed91

@ocharles

@jfischoff Do you still plan to revert the above-mentioned commit? We haven't had any more problems since we reverted it (we've been running https://github.com/circuithub/tmp-postgres since my last comment).

@codygman

@jfischoff My team is also affected by this; it looks like our fix will be a temporary fork of tmp-postgres reverting b90ed91, since @ocharles said it's been working for some time.

This error started showing up for us seemingly out of nowhere.

codygman added a commit to EdutainmentLIVE/tmp-postgres that referenced this issue Sep 16, 2021

At least, from context that seems an accurate summary of what's happening. This was recommended by @ocharles and has been working for CircuitHub per: jfischoff#251 (comment)

ismaelbouyaf added a commit to ismaelbouyaf/docker-nix that referenced this issue Oct 14, 2021

The sandbox was (probably) disabled because of the .stack/shell.nix files: they made use of buildStackProject, which requires running outside of the sandbox for some reason ( https://github.com/NixOS/nixpkgs/blob/master/pkgs/development/haskell-modules/generic-stack-builder.nix#L25 ). Now that this file is not used anymore, we can re-enable the sandbox.

In particular, it should help fix this recent issue we're facing in CI: jfischoff/tmp-postgres#251

@ocharles

@jfischoff Polite ping. I still think reverting #215 is worth doing.


jfischoff commented Aug 15, 2022

@ocharles I haven't had time to work on this project recently, but I should have time this week. I'll take a look. Thanks for the ping.

@parsonsmatt

I just received our first report on this. Is there anything I can do to help diagnose or get the #215 revert moving along?
