
Add ability to reboot pivot binary and completely wipe its memory #124

Open
Tracked by #122
emostov opened this issue Sep 20, 2022 · 3 comments

Comments

@emostov
Contributor

emostov commented Sep 20, 2022

Add QOS functionality to regularly reboot and wipe the memory of an Enclave Application.

A predecessor to this would be some utility that can take an arbitrary binary, spawn it, and then kill it, making sure its memory is wiped when the process gets killed.

ref: #122
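A minimal sketch of such a utility in Rust (the helper name `spawn_then_kill` is hypothetical, not existing QOS code): spawn an arbitrary binary, kill it, and reap it so the kernel can reclaim its pages. Note that reaping alone does not scrub the reclaimed pages; guaranteeing the memory is actually wiped is a separate problem.

```rust
use std::process::{Child, Command};
use std::{thread, time::Duration};

/// Spawn an arbitrary binary, let it run briefly, then kill it and
/// reap the zombie so the kernel reclaims its memory.
/// (Hypothetical helper; killing does NOT scrub the freed pages.)
pub fn spawn_then_kill(program: &str, args: &[&str]) -> std::io::Result<()> {
    let mut child: Child = Command::new(program).args(args).spawn()?;
    thread::sleep(Duration::from_millis(100)); // give it time to start
    child.kill()?; // SIGKILL on Unix
    let status = child.wait()?; // reap, so pages return to the kernel
    assert!(!status.success()); // it was killed, not a clean exit
    Ok(())
}

fn main() {
    spawn_then_kill("sleep", &["30"]).expect("spawn/kill failed");
    println!("child terminated and reaped");
}
```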

@cr-tk
Collaborator

cr-tk commented Nov 24, 2022

To clarify: reboot in this context means an actual full re-initialization of the QOS instance from the ground up, correct? Or is this targeting a lesser, partial system re-initialization, for example for performance reasons?

With regards to the memory cleanup: it is likely not sufficient to check which memory the pivot binary currently holds and overwrite that, because the pivot binary plausibly allocated, used, and de-allocated other memory regions before (regions that are no longer referenced at the moment of wiping). Hence we may have to wipe all free memory after killing the process to be sure no data remains in memory.
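The userspace half of such a wipe could look like the following sketch: zeroize a buffer with volatile writes so the compiler cannot elide the stores as "dead" after the last read. Actually wiping *all* free enclave memory would additionally require kernel cooperation, which this sketch does not attempt.

```rust
use std::ptr;
use std::sync::atomic::{fence, Ordering};

/// Overwrite a buffer with zeros using volatile writes so the
/// compiler cannot optimize the stores away. Illustrative only:
/// this scrubs one buffer, not the system's free page pool.
pub fn scrub(buf: &mut [u8]) {
    for b in buf.iter_mut() {
        // Volatile write: never elided, even if `buf` is unused later.
        unsafe { ptr::write_volatile(b, 0) };
    }
    // Fence so later operations are not reordered before the wipe.
    fence(Ordering::SeqCst);
}

fn main() {
    let mut secret = vec![0xAAu8; 4096]; // pretend this held key material
    scrub(&mut secret);
    assert!(secret.iter().all(|&b| b == 0));
    println!("buffer scrubbed");
}
```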

@emostov
Contributor Author

emostov commented Nov 25, 2022

> To clarify: reboot in this context means an actual full re-initialization of the QOS instance from the ground up, correct? Or is this targeting a lesser, partial system re-initialization, for example for performance reasons?

@cr-tk Just a restart of the process we spawned to execute the pivot binary. So QOS would keep running and not be restarted, but it would halt the pivot binary process, wipe its memory, and then restart it.

My thinking is that it would be too much strain on the system to constantly re-attest to a QOS Node, but re-spawning the Pivot Binary seems more reasonable since it's a self-contained process and the Enclave Apps we have thus far are generally designed to not need any persistent data in memory.
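That restart-without-re-attestation loop could be sketched as below (the `restart_loop` helper is hypothetical; the wipe step between runs is only marked as a comment). The parent process keeps running the whole time, analogous to QOS surviving across pivot restarts.

```rust
use std::process::Command;

/// Repeatedly (re)spawn a pivot-like child while the parent
/// ("QOS" in this analogy) keeps running. Hypothetical sketch.
pub fn restart_loop(program: &str, restarts: u32) -> std::io::Result<()> {
    for i in 0..restarts {
        let status = Command::new(program).status()?; // spawn + wait
        // A real implementation would wipe the child's memory here
        // before spawning the next instance.
        println!("run {} exited with {}", i, status);
    }
    Ok(())
}

fn main() {
    restart_loop("true", 3).expect("restart loop failed");
}
```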

> Hence we may have to wipe all free memory after killing the process to be sure no data remains in memory.

What do you mean by "free memory" here? Does this mean all memory or just the memory that is marked as not in use?

@cr-tk
Collaborator

cr-tk commented Nov 28, 2022

> @cr-tk Just a restart of the process we spawned to execute the pivot binary. So QOS would keep running and not be restarted, but it would halt the pivot binary process, wipe its memory, and then restart it.

Understood. In that case, I think we have to be very deliberate about the wording of any "restart" or "reboot" so that it is always clear to readers which steps are and aren't performed automatically or with certainty.

I previously talked with @lrvick about various schemes of small- or large-scale restarts, and as you say they all bring their own trade-offs in terms of performance, complexity, risk, and so on. Especially if we want to reset state often, for example after important operations, we need solutions that don't introduce a huge overhead.

> What do you mean by "free memory" here? Does this mean all memory or just the memory that is marked as not in use?

In my mind, some small part of QOS running as pid0/pid1 holds a small amount (a few megabytes?) of the enclave VM memory as it continues running "across" the pivot binary kill and re-initialization. Therefore, not all memory in the enclave can be wiped, only the "free" memory that's not allocated to any running process (i.e., marked as not in use) at the point of the memory scrubbing.

I can think of a potential edge case that could be relevant to this scheme in terms of security.
Exaggerated scenario, random example numbers:

  • enclave starts up, QOS pid0/pid1 allocates 10MB of RAM for a bit of stack + heap
  • Pivot Binary gets started, allocates 100MB of RAM initially
  • Pivot Binary does various computations, allocates an additional 1000MB of heap memory and fills it with various secret computation results
  • Pivot Binary releases the 1000MB of heap memory without properly overwriting all memory pages with non-sensitive values (e.g., because this was left out in some code paths accidentally or because the compiler optimized it out)
  • for some reason, the QOS process needs some additional memory pages and grabs some unused ones that were previously filled with sensitive data by the Pivot Binary
  • the pivot target restart functionality is triggered, Pivot Binary exits voluntarily or is killed, all free memory pages in the system are overwritten with non-sensitive values
  • depending on what the QOS process has done with the special memory pages it inherited, some parts (or all) of the sensitive data values of those pages are still there and allocated
  • -> problem: not all memory was properly sanitized across the pivot target restart
  • this could become a practical issue if the Pivot Target is exploited at a later point in time and either
    • a) the attacker is able to escalate privileges and read all system memory, including the still-allocated pages, or
    • b) the sensitive memory pages were released without proper wiping and are available for the Pivot Binary to allocate and analyze/misuse

I think it would be interesting to discuss how we can prevent or mitigate this situation. For example, if the QOS process never allocates new heap memory after it has started the Pivot Binary, I think this situation is avoided. Similarly, if there is a special mechanism to divide and tag memory into separate pools (pool 1: QOS system, pool 2: Pivot Binary) so that they can't share discarded memory, that would also be interesting. My intuition is that normally Linux doesn't need or provide such a functionality (which is more VM-hypervisor-like) and only offers limitations in terms of numerical resource limits, but maybe there's a way to enforce this with special cgroups settings or other native but less-known mechanisms.
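The "QOS never allocates after it has started the Pivot Binary" mitigation could be approximated with a fixed, pre-allocated arena that refuses to grow rather than request fresh pages from the kernel. This is a hypothetical sketch of the idea, not QOS code; a real supervisor would need this discipline for every allocation path, not just one buffer.

```rust
/// Pre-allocate all heap the supervisor will ever need up front,
/// so it never requests fresh pages after the pivot binary starts.
/// (Hypothetical mitigation sketch, not QOS code.)
pub struct FixedArena {
    buf: Vec<u8>,
}

impl FixedArena {
    pub fn with_capacity(cap: usize) -> Self {
        FixedArena { buf: Vec::with_capacity(cap) }
    }

    /// Returns false instead of growing, which would allocate.
    pub fn push(&mut self, byte: u8) -> bool {
        if self.buf.len() == self.buf.capacity() {
            return false;
        }
        self.buf.push(byte);
        true
    }
}

fn main() {
    let mut arena = FixedArena::with_capacity(1024);
    let base = arena.buf.as_ptr();
    for i in 0..1024 {
        assert!(arena.push(i as u8));
    }
    assert!(!arena.push(0)); // full: refuse rather than reallocate
    assert_eq!(base, arena.buf.as_ptr()); // buffer never moved
    println!("arena never re-allocated");
}
```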
