Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent parallel writes to rotational media #176

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

zmeyc
Copy link
Contributor

@zmeyc zmeyc commented Mar 8, 2021

During parallel plotting when copying multiple plots to dest simultaneously, if dest is HDD it'll result in slowdown and FS fragmentation.

This patch tries to detect whether dest is rotational media and if so, uses flock() to prevent multiple simultaneous writes to dest.

Linux only, on other platforms the behavior is unchanged.

Adds a log message notifying if parallel writing is enabled:

Starting plotting progress into temporary dirs: . and .
Final dir: . (parallel writing: disabled)
ID: 23023404333717545b0a6f0c0dde9710e4d3fe2d5cc6cc0a090a0b818bab0f17
Plot size is: 18
Buffer size is: 11MiB
Using 16 buckets
Using 2 threads of stripe size 4000

@zmeyc zmeyc force-pushed the no-parallel-writes-on-rotational-media branch 3 times, most recently from 61db4da to e1208a5 Compare March 8, 2021 11:08
src/plotter_disk.hpp Outdated Show resolved Hide resolved
src/disk_util.hpp Outdated Show resolved Hide resolved
src/disk_util.hpp Outdated Show resolved Hide resolved
src/disk_util.hpp Outdated Show resolved Hide resolved
@zmeyc zmeyc force-pushed the no-parallel-writes-on-rotational-media branch from bdc9db5 to c2f2cb6 Compare March 8, 2021 23:34
@arvidn
Copy link
Contributor

arvidn commented Mar 9, 2021

since there's no way to gracefully abort plotting (yet), it's likely to be killed with SIGKILL. Is there any risk the lock on the directory will be orphaned and just keep it locked in that case? or is the lock associated with the process and released automatically if the process is killed?

@zmeyc
Copy link
Contributor Author

zmeyc commented Mar 9, 2021

@arvidn Yes, it's released automatically if process is killed.

@zmeyc zmeyc force-pushed the no-parallel-writes-on-rotational-media branch from c2f2cb6 to c77eb7a Compare March 9, 2021 10:41
@zmeyc
Copy link
Contributor Author

zmeyc commented Mar 9, 2021

@arvidn Added logging of time taken to acquire the lock / repushed.

Copy link
Contributor

@arvidn arvidn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks reasonable to me. I just have some minor nitpicks. Let's see what @wjblanke thinks about it

src/disk_util.hpp Outdated Show resolved Hide resolved
src/disk_util.hpp Outdated Show resolved Hide resolved
src/disk_util.hpp Outdated Show resolved Hide resolved
src/disk_util.hpp Outdated Show resolved Hide resolved
src/plotter_disk.hpp Outdated Show resolved Hide resolved
@zmeyc zmeyc force-pushed the no-parallel-writes-on-rotational-media branch from c77eb7a to f069c53 Compare March 9, 2021 11:56
@zmeyc
Copy link
Contributor Author

zmeyc commented Mar 9, 2021

@arvidn Added all the changes and cleaned up the commits.

src/disk_util.hpp Outdated Show resolved Hide resolved
src/disk_util.hpp Outdated Show resolved Hide resolved
@zmeyc zmeyc requested a review from arvidn March 10, 2021 00:30
Base automatically changed from master to main March 15, 2021 18:39
@zmeyc zmeyc force-pushed the no-parallel-writes-on-rotational-media branch from e1e5f98 to adffb0d Compare March 21, 2021 10:18
@zmeyc
Copy link
Contributor Author

zmeyc commented Mar 21, 2021

Fixed merge conflicts.

@zmeyc zmeyc marked this pull request as draft March 21, 2021 21:56
@zmeyc
Copy link
Contributor Author

zmeyc commented Mar 21, 2021

Please do not merge, found a problem with symlink (i.e. /mnt/dest -> /mnt/real-hdd) being incorrectly being detected as non-rotational media.

When multiple plotter instances are working, use flock to prevent
them from copying files to destination disk simultaneously
on rotational media.

Assume media is non-rotational by default.

Linux only, on Windows and MacOS the behavior is unchanged.
@zmeyc zmeyc force-pushed the no-parallel-writes-on-rotational-media branch from adffb0d to 7988adc Compare March 24, 2021 00:55
@zmeyc zmeyc marked this pull request as ready for review March 24, 2021 00:57
@zmeyc
Copy link
Contributor Author

zmeyc commented Mar 24, 2021

Now works correctly with symlinks.

@zmeyc
Copy link
Contributor Author

zmeyc commented Mar 24, 2021

@arvidn It was incorrect to zero-out minor device id. I changed the logic to generate a device path using both major and minor id, resolve the resulting symlink then traverse the final path up to the root looking for "queue/rotational". If not found, media is assumed to be non-rotational.

Example:

  1. /mnt/testlink has major id 65, minor id 178

  2. Generate path (symlink): "/sys/dev/block/65:178"

  3. Resolve symlink to:
    /sys/devices/pci0000:00/0000:00:03.1/0000:09:00.0/host2/target2:1:4/2:1:4:0/block/sde/sde2

  4. Append "/queue/rotational" and check if file exists:

/sys/devices/pci0000:00/0000:00:03.1/0000:09:00.0/host2/target2:1:4/2:1:4:0/block/sde/sde2/queue/rotational - not found

  1. Try root path until found:

/sys/devices/pci0000:00/0000:00:03.1/0000:09:00.0/host2/target2:1:4/2:1:4:0/block/sde/queue/rotational - found

  1. If file contains '1' - media is rotational

@zmeyc
Copy link
Contributor Author

zmeyc commented Apr 5, 2021

@arvidn Would it be possible to merge this PR? If any changes needed, please let me know.

Copy link
Contributor

@arvidn arvidn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally there would be unit tests for as much of this as possible. A test that makes sure the directory lock works shouldn't be too difficult, and a test that ensures that the device ID returns something looking like what is expected also shouldn't be too hard.

Is there anything else that could be unit tested on CI?

#if defined(__APPLE__) || defined(_WIN32)
return false;
#else
struct stat s{};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this function is linux specific, right?
It should be conditioned by __linux__, since right now it's for everything except MacOS and Windows.
I also think it should have a comment explaining what it does and why it works. Ideally with links to documentation stating that the assumptions made by this function are safe to make.

namespace DiskUtil {

#if !defined(__APPLE__) && !defined(_WIN32)
inline fs::path DevicePath(dev_t dev_id)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this function also looks linux specific, it should be conditioned by __linux__.
Is the assumptions made by this function a part of the kernel or user space distro? Presumably the former, some links to documentations would be nice for readers.

}

inline int LockDirectory(
std::string dirname)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a good reason for this to be string instead of fs::path?

@@ -373,8 +381,14 @@ class DiskPlotter {
}
} else {
if (!bCopied) {
bool should_lock = DiskUtil::ShouldLock(final_dirname);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should probably be const

@@ -153,9 +156,14 @@ class DiskPlotter {
}
#endif /* defined(_WIN32) || defined(__x86_64__) */

const char is_parallel_writing_enabled = !DiskUtil::ShouldLock(final_dirname);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the name suggests that this should be a bool, yet, it's a char. Is there a good reason for this?

@github-actions
Copy link

'This PR has been flagged as stale due to no activity for over 60
days. It will not be automatically closed, but it has been given
a stale-pr label and should be manually reviewed.'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants