-
Notifications
You must be signed in to change notification settings - Fork 173
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(swordfish): Memory manager #3599
base: main
Are you sure you want to change the base?
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3599 +/- ##
==========================================
+ Coverage 77.80% 77.84% +0.03%
==========================================
Files 718 719 +1
Lines 88250 88359 +109
==========================================
+ Hits 68666 68785 +119
+ Misses 19584 19574 -10
|
CodSpeed Performance ReportMerging #3599 will degrade performances by 34.14%Comparing Summary
Benchmarks breakdown
|
src/common/system-info/src/lib.rs
Outdated
info: sysinfo::System, | ||
} | ||
|
||
impl Default for SystemInfo { | ||
impl Default for SystemInfoInternal { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why add this extra nesting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SystemInfo is feature flagged for python, didn't want to propagate that in the local executor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it looks like it uses cfg_attr
rather than the feature flag. That means the second part pyclass(module = "daft.daft", frozen)
is only set if the feature is enabled.
You should be able to use SystemInfo
without the python feature
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#[cfg(feature = "python")]
#[pymethods]
impl SystemInfo {
#[new]
#[must_use]
pub fn new() -> Self {
Self::default()
}
#[must_use]
pub fn cpu_count(&self) -> Option<u64> {
self.info.physical_core_count().map(|x| x as u64)
}
#[must_use]
pub fn total_memory(&self) -> u64 {
if let Some(cgroup) = self.info.cgroup_limits() {
cgroup.total_memory
} else {
self.info.total_memory()
}
}
}
its the methods that are flagged. I moved the internal logic to non feature flagged methods.
.await??; | ||
let result = { | ||
// MemoryPermit will be automatically dropped at the end of this scope | ||
let _permit = if let Some(mr) = memory_request { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So an issue I see here is that we do the following:
- Grab a permit for the resources we're trying to allocate
- Schedule the task
- Task starts
- Task completes
- Permit is dropped
But instead I think you want the following
- Schedule the Task
- Grab a permit for the resources we're trying to allocate
- Task starts
- Permit is dropped
- Task completes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the current case we are requesting resources for the right to submit rather than the right to run.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So ideally you actually ask for the resource in the spawn of the tokio task!
use lazy_static::lazy_static; | ||
use tokio::sync::Notify; | ||
|
||
lazy_static! { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
favor a OnceLock
None | ||
} | ||
|
||
pub fn get_memory_manager() -> Arc<MemoryManager> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would instead return a Arc<MemoryManager>
, up to the consume to clone if they need it
MEMORY_MANAGER.clone() | ||
} | ||
|
||
pub struct MemoryPermit { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can name this OwnedMemoryPermit
and then in many cases you can also use:
MemoryPermit<'a> {
bytes: u64,
manager: &'a MemoryManager,
}
since if you are requesting the memory request in the tokio spawn, you don't need something with a static lifetime.
available_bytes: u64, | ||
} | ||
|
||
pub struct MemoryManager { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can limit visibility of much of this to pub(crate)
} | ||
|
||
pub async fn request_bytes(self: &Arc<Self>, bytes: u64) -> DaftResult<MemoryPermit> { | ||
if bytes == 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why can't we?
} | ||
|
||
loop { | ||
let can_allocate = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be cleaner to put this code in
fn try_ request_bytes(self: &Arc<Self>, bytes: u64) -> Option<MemoryPermit> {
}
which attempts to get a permit and will otherwise return None.
THen you can loop that method in here.
Simple global memory manager that allows UDFs with memory requests to requests bytes before spawning the task.