-
Notifications
You must be signed in to change notification settings - Fork 59
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Storing an object as &Header, but reading the data past the end of the header #256
Comments
Anecdotally, I think why I thought this was allowed is it mirror how some other functionality of references behave. In particular converting Additionally, the fact that in many other cases we're allowed to "pun" memory and have it just work the way we expect gave me a false sense of security here. |
At least if using C FFIs then you want |
As you noted, this looks like a duplicate of #134. Note that all of these things are allowed to raw pointers. So all of these APIs that you describe can be implemented, as long as only raw pointer and no references are used. But for references, this would be a serious problem as it breaks a very powerful optimization: fn optimized(x: &mut i32, f: impl FnOnce()) {
*x = 5; // This write is redundant and can be removed, according to Stacked Borrows (under unwind=abort)
f();
*x = 6;
} If we allow reads outside the bounds of the given reference, we can no longer do this optimization! fn print_neighbor(x: &mut i32) {
// use some unsafe code to read the memory at `x - 4` and print that to stdout
}
fn counterexample() {
let mut pair = (0, 0);
optimized(&mut pair.0, || print_neighbor(&mut pair.1));
} The optimization changes what gets printed to stdout. Optimizations that change program behavior are illegal, so we cannot do this optimization -- except if the program has UB. That's why it is very important that the above program has UB. (Things get even worse if we also allow writes outside the range indicated by the type. At that point I do not know how to still have any useful optimizations.) |
Hmm, I have code which does this because using raw pointers everywhere results in really horrible code. It implements a "thin vec" type, where the length and capacity are stored inside the allocation. #[repr(C)]
#[repr(align(4))]
struct Header {
len: usize,
cap: usize,
} (The items are stored inline following the header) All the internal methods are implemented on the This is one of several "thin" wrapper types that each have their own header type. Rewriting them all to use raw pointers would result in very messy and hard to read code, not to mention less safe (since atm the unsafe parts can be fairly contained). This
|
|
One difference between my example and the desired use-cases is that not only is the pointer used "out of bounds", there also is another pointer that actually accesses those out-of-bounds parts. Maybe that could help. Here's the rough idea: if we have a mutable reference I am sure interesting corner cases will show up when this gets actually implemented. But this will require a significant rework of Stacked Borrows -- basically a whole new model, based on what we learned with the first one. I certainly hope to pursue this project at some point, but unfortunately that won't happen in the near future. |
As for implementation corner cases, there exists some consensus in the RFCs linked above that |
@RalfJung Yeah, I would have assumed your example was invalid, and you figured it out but it's much broader and more dodgy than what I'm asking for. That said, your suggested solution of tracking this based on the locations the references may use sounds extremely nice and easy to reason about! I think I'm already a huge fan, since it matches my mental model very closely — That is, in a situation like this where I have a Also, it feels like this model would allow #243, which is... important for memory allocators.
Honestly, just the notion that there's a potential model in the distant future that fits better with the code that is out there in the wild makes me very optimistic, since I had thought stacked borrows was mostly in a final tweaking phase and major changes weren't in the cards. I was kind of getting pretty worried about how bad the the fallout was going to be, so it makes me feel better. (Off topic, but related to you not having time: I saw you finished your PhD recently, congrats! Hope you're able to relax at some point and manage to land somewhere nice) |
Ah yes, that would be nice. I am also worried about how much the current semantics relies on computing the size of a value; if we ever get custom DST that would be a total nightmare. So something more based on "what locations is this actually used for" would also help here. OTOH some optimizations rely on introducing extra reads/writes that were not present before; I cannot imagine how those would work without taking into account the size of a type. Basically, everything related to protectors seems to rely pretty fundamentally on saying in advance that some region of memory "belongs to" this reference. But maybe it suffices to consider these a lower bound for what the reference may do, as opposed to now where this region also serves as an upper bound.
More research will be needed to show if this model is indeed viable. But as far as I am concerned, Stacked Borrows is the first word in terms of (precisely worked out) aliasing models for Rust, not the last.
Thanks! :) But what is that "relax" thing you are talking about? ;) |
Yes, that seems totally reasonable, and also fits with what unsafe code I've seen in the wild does/assumes (as well as my personal mental model). It's also easy to teach/explain why a too-large reference is bad (compiler allowed to insert speculative and spurious reads/writes), whereas it seems much harder to explain a too-small one (providence, stacked borrows). Even C tends not to have a pointer refer to a larger type than reality — although I believe it is legal for |
One suggestion I have is to go with a similar idea as C++ has with pointer-interconvertibility and reachability.
I'd propose something similar for references. If you have a reference r to a type T, it can access a byte b if the reference includes b, or if some preceding borrow item provides the access to b, such that the item is pointer-interconvertible with r, as follows (where pointer includes reference):
One thing I'd want to ask is if this should include the "immediately enclosing array" rule. |
How would |
The Slice reference sr can access size_of_val(sr) bytes.
Conversion to a pointer is not considered by these rules, nor does conversion between raw pointer types. Basically, my proposed rules would leave the provenance of referenced intact as-is, but allow for restricted extension of that provenance. The existing cases are covered by "the reference includes b"IE. r already has provenance for b (I should probably rephrase that portion of the definition to something similar to this). Any code that is currently well-defined would remain well-defined. |
Note that the question is not about how raw ptrs interact with each other. I think this is the only place where the C++ rules could possibly be useful, but Rust is maximally permissive here currently (more permissive than C++) and I see no problem with this. The hard questions are all about cases where one side uses raw pointers and the other side uses references, and we want to maintain strong aliasing guarantees on the reference side. I don't think C++ can be much of an inspiration here since C++ has nothing close to the kind of aliasing guarantees Rust does. (C has something, namely |
No C+, does not have aliasing guarantees (beyond strict-aliasing). However, this is more of a provenance question. In this case, rust is less permissive then C++ and C, where if you have the type struct Layout{
struct Header h;
std:;string name;
std::uint64_t rest;
}; The rules I mentioned with C++ allow you to |
Rust is more permissive than C and C++. This is assuming you translate C/C++ pointers to Rust raw pointers. Rust references are more restrictive, of course, but C/C++ do not really have an equivalent type. (Maybe one could make the point about C++ references? That seems unlikely though. Those references still correspond more to raw pointers in Rust than to references. In particular, they do not come with any aliasing guarantees.)
Rust does not permit this when mixing references (with strong aliasing guarantees) and raw pointers (without any aliasing guarantees). This question does not even come up in C++, so it is impossible to draw a parallel here. The issues discussed in this thread are a consequence of the aliasing guarantees, so this is very much not a "different problem". The only thing in Miri that leads to a complain about here is Stacked Borrows, which is all about aliasing. If it wasn't for the aliasing rules enforced by Stacked Borrows, the |
One thing I just thought of: when someone goes to try out the "delayed"/"on-use" aliasing restrictions, be wary of reborrows. Reborrows happen fairly often (which is obvious for I don't expect this to be an issue, but it popped into my head as a worry so I wanted to write it down. |
Currently, when not configured to emit union SomeUnion {
int32_t int32;
double float64;
void *ptr;
}; into something like the following Rust: #[repr(C)]
pub struct __BindgenUnionField<T>(::core::marker::PhantomData<T>);
impl<T> __BindgenUnionField<T> {
#[inline]
pub const fn new() -> Self {
__BindgenUnionField(::core::marker::PhantomData)
}
#[inline]
pub unsafe fn as_ref(&self) -> &T {
::core::mem::transmute(self)
}
#[inline]
pub unsafe fn as_mut(&mut self) -> &mut T {
::core::mem::transmute(self)
}
}
#[repr(C)]
pub struct SomeUnion {
pub int32: __BindgenUnionField<i32>,
pub float64: __BindgenUnionField<f64>,
pub ptr: __BindgenUnionField<*mut core::ffi::c_void>,
pub bindgen_union_field: u64,
} in order to emulate That said, this feels different in a few ways to the other examples, and my gut is that were it not common in the wild due to tooling output, it wouldn't be that important to support (especially when other alternatives exist). |
…ayout, r=oli-obk Warn on references casting to bigger memory layout This PR extends the [`invalid_reference_casting`](https://doc.rust-lang.org/rustc/lints/listing/deny-by-default.html#invalid-reference-casting) lint (*deny-by-default*) which currently lint on `&T -> &mut T` casting to also lint on `&(mut) A -> &(mut) B` where `size_of::<B>() > size_of::<A>()` (bigger memory layout requirement). The goal is to detect such cases: ```rust let u8_ref: &u8 = &0u8; let u64_ref: &u64 = unsafe { &*(u8_ref as *const u8 as *const u64) }; //~^ ERROR casting references to a bigger memory layout is undefined behavior let mat3 = Mat3 { a: Vec3(0i32, 0, 0), b: Vec3(0, 0, 0), c: Vec3(0, 0, 0) }; let mat3 = unsafe { &*(&mat3 as *const _ as *const [[i64; 3]; 3]) }; //~^ ERROR casting references to a bigger memory layout is undefined behavior ``` This is added to help people who write unsafe code, especially when people have matrix struct that they cast to simple array of arrays. EDIT: One caveat, due to the [`&Header`](rust-lang/unsafe-code-guidelines#256) uncertainty the lint only fires when it can find the underline allocation. ~~I have manually tested all the new expressions that warn against Miri, and they all report immediate UB.~~ r? `@est31`
…ayout, r=oli-obk Warn on references casting to bigger memory layout This PR extends the [`invalid_reference_casting`](https://doc.rust-lang.org/rustc/lints/listing/deny-by-default.html#invalid-reference-casting) lint (*deny-by-default*) which currently lint on `&T -> &mut T` casting to also lint on `&(mut) A -> &(mut) B` where `size_of::<B>() > size_of::<A>()` (bigger memory layout requirement). The goal is to detect such cases: ```rust let u8_ref: &u8 = &0u8; let u64_ref: &u64 = unsafe { &*(u8_ref as *const u8 as *const u64) }; //~^ ERROR casting references to a bigger memory layout is undefined behavior let mat3 = Mat3 { a: Vec3(0i32, 0, 0), b: Vec3(0, 0, 0), c: Vec3(0, 0, 0) }; let mat3 = unsafe { &*(&mat3 as *const _ as *const [[i64; 3]; 3]) }; //~^ ERROR casting references to a bigger memory layout is undefined behavior ``` This is added to help people who write unsafe code, especially when people have matrix struct that they cast to simple array of arrays. EDIT: One caveat, due to the [`&Header`](rust-lang/unsafe-code-guidelines#256) uncertainty the lint only fires when it can find the underline allocation. ~~I have manually tested all the new expressions that warn against Miri, and they all report immediate UB.~~ r? ``@est31``
Rollup merge of rust-lang#118983 - Urgau:invalid_ref_casting-bigger-layout, r=oli-obk Warn on references casting to bigger memory layout This PR extends the [`invalid_reference_casting`](https://doc.rust-lang.org/rustc/lints/listing/deny-by-default.html#invalid-reference-casting) lint (*deny-by-default*) which currently lint on `&T -> &mut T` casting to also lint on `&(mut) A -> &(mut) B` where `size_of::<B>() > size_of::<A>()` (bigger memory layout requirement). The goal is to detect such cases: ```rust let u8_ref: &u8 = &0u8; let u64_ref: &u64 = unsafe { &*(u8_ref as *const u8 as *const u64) }; //~^ ERROR casting references to a bigger memory layout is undefined behavior let mat3 = Mat3 { a: Vec3(0i32, 0, 0), b: Vec3(0, 0, 0), c: Vec3(0, 0, 0) }; let mat3 = unsafe { &*(&mat3 as *const _ as *const [[i64; 3]; 3]) }; //~^ ERROR casting references to a bigger memory layout is undefined behavior ``` This is added to help people who write unsafe code, especially when people have matrix struct that they cast to simple array of arrays. EDIT: One caveat, due to the [`&Header`](rust-lang/unsafe-code-guidelines#256) uncertainty the lint only fires when it can find the underline allocation. ~~I have manually tested all the new expressions that warn against Miri, and they all report immediate UB.~~ r? ``@est31``
…oli-obk Warn on references casting to bigger memory layout This PR extends the [`invalid_reference_casting`](https://doc.rust-lang.org/rustc/lints/listing/deny-by-default.html#invalid-reference-casting) lint (*deny-by-default*) which currently lint on `&T -> &mut T` casting to also lint on `&(mut) A -> &(mut) B` where `size_of::<B>() > size_of::<A>()` (bigger memory layout requirement). The goal is to detect such cases: ```rust let u8_ref: &u8 = &0u8; let u64_ref: &u64 = unsafe { &*(u8_ref as *const u8 as *const u64) }; //~^ ERROR casting references to a bigger memory layout is undefined behavior let mat3 = Mat3 { a: Vec3(0i32, 0, 0), b: Vec3(0, 0, 0), c: Vec3(0, 0, 0) }; let mat3 = unsafe { &*(&mat3 as *const _ as *const [[i64; 3]; 3]) }; //~^ ERROR casting references to a bigger memory layout is undefined behavior ``` This is added to help people who write unsafe code, especially when people have matrix struct that they cast to simple array of arrays. EDIT: One caveat, due to the [`&Header`](rust-lang/unsafe-code-guidelines#256) uncertainty the lint only fires when it can find the underline allocation. ~~I have manually tested all the new expressions that warn against Miri, and they all report immediate UB.~~ r? ``@est31``
…oli-obk Warn on references casting to bigger memory layout This PR extends the [`invalid_reference_casting`](https://doc.rust-lang.org/rustc/lints/listing/deny-by-default.html#invalid-reference-casting) lint (*deny-by-default*) which currently lint on `&T -> &mut T` casting to also lint on `&(mut) A -> &(mut) B` where `size_of::<B>() > size_of::<A>()` (bigger memory layout requirement). The goal is to detect such cases: ```rust let u8_ref: &u8 = &0u8; let u64_ref: &u64 = unsafe { &*(u8_ref as *const u8 as *const u64) }; //~^ ERROR casting references to a bigger memory layout is undefined behavior let mat3 = Mat3 { a: Vec3(0i32, 0, 0), b: Vec3(0, 0, 0), c: Vec3(0, 0, 0) }; let mat3 = unsafe { &*(&mat3 as *const _ as *const [[i64; 3]; 3]) }; //~^ ERROR casting references to a bigger memory layout is undefined behavior ``` This is added to help people who write unsafe code, especially when people have matrix struct that they cast to simple array of arrays. EDIT: One caveat, due to the [`&Header`](rust-lang/unsafe-code-guidelines#256) uncertainty the lint only fires when it can find the underline allocation. ~~I have manually tested all the new expressions that warn against Miri, and they all report immediate UB.~~ r? ``@est31``
core: avoid `extern type`s in formatting infrastructure `@RalfJung` [said](https://rust-lang.zulipchat.com/#narrow/stream/219381-t-libs/topic/Use.20of.20.60extern.20type.60.20in.20formatting.20machinery/near/446552837): >How attached are y'all to using `extern type` in the formatting machinery? Seems like this was introduced a [long time ago](rust-lang@34ef8f5). However, it's also [not really compatible with Stacked Borrows](rust-lang/unsafe-code-guidelines#256), and only works currently because we effectively treat references-to-extern-type almost like raw pointers in Stacked Borrows -- which of course is unsound, it's not how LLVM works. I was planning to make Miri emit a warning when this happens to avoid cases like [this](rust-lang#126814 (comment)) where people use extern type specifically to silence Miri without realizing what happens. but with the formatting machinery using extern type, this warning would just show up everywhere... > > The "proper" way to do this in Stacked Borrows is to use raw pointers (or `NonNull`). This PR does just that. r? `@RalfJung`
core: avoid `extern type`s in formatting infrastructure ``@RalfJung`` [said](https://rust-lang.zulipchat.com/#narrow/stream/219381-t-libs/topic/Use.20of.20.60extern.20type.60.20in.20formatting.20machinery/near/446552837): >How attached are y'all to using `extern type` in the formatting machinery? Seems like this was introduced a [long time ago](rust-lang@34ef8f5). However, it's also [not really compatible with Stacked Borrows](rust-lang/unsafe-code-guidelines#256), and only works currently because we effectively treat references-to-extern-type almost like raw pointers in Stacked Borrows -- which of course is unsound, it's not how LLVM works. I was planning to make Miri emit a warning when this happens to avoid cases like [this](rust-lang#126814 (comment)) where people use extern type specifically to silence Miri without realizing what happens. but with the formatting machinery using extern type, this warning would just show up everywhere... > > The "proper" way to do this in Stacked Borrows is to use raw pointers (or `NonNull`). This PR does just that. r? ``@RalfJung``
core: avoid `extern type`s in formatting infrastructure ```@RalfJung``` [said](https://rust-lang.zulipchat.com/#narrow/stream/219381-t-libs/topic/Use.20of.20.60extern.20type.60.20in.20formatting.20machinery/near/446552837): >How attached are y'all to using `extern type` in the formatting machinery? Seems like this was introduced a [long time ago](rust-lang@34ef8f5). However, it's also [not really compatible with Stacked Borrows](rust-lang/unsafe-code-guidelines#256), and only works currently because we effectively treat references-to-extern-type almost like raw pointers in Stacked Borrows -- which of course is unsound, it's not how LLVM works. I was planning to make Miri emit a warning when this happens to avoid cases like [this](rust-lang#126814 (comment)) where people use extern type specifically to silence Miri without realizing what happens. but with the formatting machinery using extern type, this warning would just show up everywhere... > > The "proper" way to do this in Stacked Borrows is to use raw pointers (or `NonNull`). This PR does just that. r? ```@RalfJung```
Rollup merge of rust-lang#126956 - joboet:fmt_no_extern_ty, r=RalfJung core: avoid `extern type`s in formatting infrastructure ```@RalfJung``` [said](https://rust-lang.zulipchat.com/#narrow/stream/219381-t-libs/topic/Use.20of.20.60extern.20type.60.20in.20formatting.20machinery/near/446552837): >How attached are y'all to using `extern type` in the formatting machinery? Seems like this was introduced a [long time ago](rust-lang@34ef8f5). However, it's also [not really compatible with Stacked Borrows](rust-lang/unsafe-code-guidelines#256), and only works currently because we effectively treat references-to-extern-type almost like raw pointers in Stacked Borrows -- which of course is unsound, it's not how LLVM works. I was planning to make Miri emit a warning when this happens to avoid cases like [this](rust-lang#126814 (comment)) where people use extern type specifically to silence Miri without realizing what happens. but with the formatting machinery using extern type, this warning would just show up everywhere... > > The "proper" way to do this in Stacked Borrows is to use raw pointers (or `NonNull`). This PR does just that. r? ```@RalfJung```
We stumbled upon a weird interaction between this issue and interior mutability: use core::cell::Cell;
struct Header;
#[repr(C)]
struct Data {
header: Header,
value: Cell<i32>,
}
fn main() {
let data = Data {
header: Header,
value: Cell::new(0),
};
dbg!(data.value.get());
let header = &data.header;
let data = unsafe { &*(header as *const Header as *const Data) };
data.value.set(1);
dbg!(data.value.get());
}
As far as I can see, the problem is that I'm not sure if I can even classify this as a bug, but it surely is a counterintuitive interaction. |
A |
Casting I actually did a very comprenensive writeup including diagrams what I did there, as it cost me several days if not even weeks to get it all Miri safe: From figure I use the |
@RalfJung IMO, TB is inconsistent here. I'd be fine with "references don't shorten provenance" (which is what TB currently does in most cases, as I understand it), and I'd be equally fine with "references do shorten provenance" (SB-style), but what's happening here is that the rules are different depending on whether @phip1611 I don't believe this is always UB. It's UB under Stacked Borrows, but Tree Borrows allows this, IIUIC. |
So TB rejects the code as written, but accepts it if you add an UnsafeCell? That is exactly the intended behavior. If there is no UnsafeCell in the type, we also assume there is no UnsafeCell "around" it. This is important, otherwise we would have to ditch all the shared reference optimizations. If you use type erasure schemes (like accessing the full data with just the Header) mixed with references, it is your responsibility to preserve all the important type information, like UnsafeCell. The recommended approach is to use only raw ptrs, no references. By using references mixed with raw pointers, you are opting in to "hard mode Rust".
Code accepted by TB but not by SB should be considered "not at immediate risk of miscomputation, but also not in the supported fragment, and it may be declared UB in the future". |
This is related to #2 but the read is not out of bounds of the allocation, not being written to by other threads, not the bytes of a
&mut Blah
, etc. That is to say, really the code is trying to model a dynamically sized type, that for one reason or another does not support (Note that ther are a number of custom DST proposals).So, I heard that it was UB for you to have a &T and read outside the bounds of that T, even if conceptually it's a totally in-bounds read. E.g.
T
here may be a ZST, or it may be a header after which a trailing array is expected, or standing that sits at the head of a trailing array, or it may be a struct that's the common shared fields of some set of other struct... These are pretty common in unsafe code as it's a pattern which is both legal and useful in C and C++.It's pretty common in Rust too:
It's not unheard of in C apis to use a
#[repr(C)] struct Foo { _priv: [u8; 0] }
, as this is what bindgen uses. Some of these APIs then go on use&Foo
in the Rust code. (This is essentially a workaround for a lack of a stableextern Type
). This code doesn't read the data, so the only issue would be if we told LLVM it could assume things about the pointer that turn out to be untrue in a situation like cross-lang LTO, probably.Similarly, I've seen other FFI code that used a
struct CStr([u8; 0])
for a similar purpose — as a version ofstd::ffi::CStr
that you can actually pass to C directly. (I even almost did this for ffi_support::FfiStr, but went with a pointer inside so I could easily check for code passing in null).bitvec
has aBitSlice
type which acts a lot like a slice that magically has bit-level indexing. Internally it's something likestruct BitSlice { _mem: [()] }
which lets it behave like an unsized type, The "pointer" and length are both specially encoded values that contain both the actual pointer/length as well as bit-level offsets for tracking where withing byte things are. There are a lot of reasons this might be illegal, but I had not thoughtmem::size_of_val
returning the wrong value was the actual one.anyhow::Error
internally wraps aBox<ErrorImpl<()>>
, whereErrorImpl<T>
contains a vtable, a backtrace, and then theT
.ErrorImpl<()>
is used as it behaves as the "common header" for real ErrorImpl values. On construction,Box<ErrorImpl<T>>
is converted toBox<ErrorImpl<()>>
, when stored in the Error.Whenever a method is called that needs to delegate to the vtable, the
Box<ErrorImpl<()>>
is converted into the right pointer type for the vtable function (one of&ErrorImpl<()>
,&mut ErrorImpl<()>
,Box<ErrorImpl<()>>
) which is called with that pointer. The first thing the vtable function generally does is convert the reference to e.g.&ErrorImpl<T>
, example: https://github.com/dtolnay/anyhow/blob/99c982128458fecb8d1d7aff9478dd77dac0ee3b/src/error.rs#L538-L545. (I had always kind of thought it wasn't okay to useBox<T>
here, but I'm surprised that stuff like&ErrorImpl<()>
to&ErrorImpl<RealType>
isn't okay either).wio-rs
containsVariableSizedBox
which provides this pattern in a library form, and IIUC is mostly intended for the flexible-array-member case. The API attempts to launder pointers to the object, which is... very non-obvious. It seems like it plausibly avoids the issue here, though, but it's insanely subtle, and if this is the recommended pattern, I suspect it will need a very good nomicon entry. https://github.com/retep998/wio-rs/blob/9bf021178b2d02485f1bd35e6cff41bf52d4a9a2/src/vsb.rs#L98-L113I do something similar in
arcstr
, where there's a header and a variable length segment that trails it. I avoided issues here by luck, as I took great care to avoid ever putting the inner type behind a reference. This was lucky since I wasn't aware of this at all, and did it for other reasons. This was painful as it required field hard-coding offsets.This isn't to say anything of the numerous C or C++ apis which expose polymorphism in this way — In c++ this is how single non-virtual inheritance is represented, so it's especially common, although it was common in C too. Additionally, C code with a flexible array member is in tons of places, and not just windows APIs.
This is just a few off the top — there's a lot of unsafe code that does this. Personally, I had thought it was allowed so long as you don't go past the actual bounds of the allocation, it makes some sense that it's not though, unfortunately. (Somehow, I don't think I've ever had miri trouble me about it, but it's seeming like it's just because of luck && coincidence more than anything else).
Anyway, I think if this is UB we should start being way more vocal about it, because it's a totally legal pattern in C and C++, and common.
The text was updated successfully, but these errors were encountered: