Database Schema #2
I'd be interested in helping out with this if I can get a bit more info on how you'd like things organized. I'm familiar with Postgres and SQLite but haven't yet had an excuse to go in-depth with Diesel.
Great! This issue is more about the design side of things and probably requires more understanding of the Rust HIR than of database work (with or without Diesel). You can look at the (very sparse) internal compiler docs at Manish's GitHub page: https://manishearth.github.io/rust-internals-docs/std/index.html – look for the …
How stable is the HIR at the moment? I know there are lots of changes going on right now with respect to the MIR – will any of these have a noticeable impact on NSA's internal representation? In other words, how stable can the schema be at this point in time?
HIR changes slightly around once a week.
I don't think that will be a problem for our schema – Generics aren't likely to change much (until we get HKTs, that is – but then I'll gladly extend the schema to make it work 😄)
Sounds good. I'll try and get a first attempt posted here sometime tomorrow night (I'm in PST).
Looking at …
Yes. Plain …
Sorry, maybe my question was unclear. I was trying to ask if …

For example, you list in the README several goals, including determining if a function may panic, is pure, is recursive, or allocates memory. Following the call graph to determine any of these properties is relatively simple in the case of statically dispatched functions, but much more complicated in the case of trait objects, where it would involve pointer analysis to determine all the different types a given trait object could possibly point to. Because of this, the handling of statically dispatched methods is going to be much more similar to that of plain function calls than that of methods on trait objects.
Yes. Luckily, a) dynamic dispatch isn't all that common in Rust and b) for most intents & purposes it's sufficient to look up all possible impls and see if their methods panic, allocate, call back or whatever. This also means we'll need to store trait impls for later lookup.
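To make the distinction concrete, here is a tiny standalone sketch (the trait and type names are made up for illustration, not taken from the project): a statically dispatched call has a single known callee, while a call through a trait object forces a conservative analysis to consider every impl it has collected.

```rust
trait Greet {
    fn greet(&self) -> String;
}

struct English;
impl Greet for English {
    fn greet(&self) -> String { "hello".to_string() }
}

struct German;
impl Greet for German {
    fn greet(&self) -> String { "hallo".to_string() }
}

// Static dispatch: within any given instantiation, the callee is
// `<T as Greet>::greet` for a concrete `T`, so its properties
// (panics, allocates, ...) can be looked up directly.
fn greet_static<T: Greet>(g: &T) -> String {
    g.greet()
}

// Dynamic dispatch: the concrete type behind `&dyn Greet` is unknown here,
// so the analysis has to consider every recorded impl of `Greet`
// (here: `English` and `German`) and combine their properties.
fn greet_dynamic(g: &dyn Greet) -> String {
    g.greet()
}

fn main() {
    println!("{}", greet_static(&English));
    println!("{}", greet_dynamic(&German));
}
```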
Obviously very early, but here's what I've got so far. The visualizer didn't quite get my foreign keys right, so I'll just explain them here:

- Paths foreign key to (nullable) parent paths to create a hierarchical structure (…).
- Function calls can be static or not, and this decides whether or not they have a foreign key to a function implementation or a trait function.
- Function call arguments are foreign keyed to the type, the function call instance, and the patkind (I wasn't sure what information from …).
- Function implementations have a path foreign key, a name, and (if they are an implementation of a trait function) a foreign key to the relevant trait and type. Eventually, this is where I foresee we will attach attributes such as whether or not a function can panic, which can then be propagated upwards after all the data has been entered.
- Traits currently just have an id, name and path.
- Trait bounds are a many-to-many linking traits to their children (e.g. id of Eq -> id of PartialEq).
- Trait functions are the functions that need to be implemented in order to satisfy a trait.
- The trait implementation list is another many-to-many linking traits to the types that implement them.
- Finally, types have a path, name, and other associated metadata.

Let me know your thoughts – am I headed in the right direction? Should I be sticking more closely to the exact representation used in HIR? Does anything not make sense?
I left this out in my description, but obviously there will also be a many-to-many piece describing what functions are called by what other functions.
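To make the layout above a bit more tangible, here is a minimal sketch of a few of the tables as Diesel `table!` declarations. All table and column names are invented for the sketch and are not taken from the project:

```rust
#[macro_use]
extern crate diesel;

table! {
    paths (id) {
        id -> Integer,
        parent_id -> Nullable<Integer>, // self-referencing FK: parent path segment
        segment -> Text,
    }
}

table! {
    functions (id) {
        id -> Integer,
        path_id -> Integer,                     // FK into `paths`
        name -> Text,
        trait_function_id -> Nullable<Integer>, // set when this impl satisfies a trait fn
        may_panic -> Nullable<Bool>,            // derived attribute, filled in after collection
    }
}

table! {
    function_calls (id) {
        id -> Integer,
        caller_id -> Integer,                        // FK into `functions`
        // exactly one of these is set, depending on static vs. trait dispatch
        callee_function_id -> Nullable<Integer>,
        callee_trait_function_id -> Nullable<Integer>,
    }
}

table! {
    trait_impls (id) {
        id -> Integer,
        trait_id -> Integer, // FK into a `traits` table
        type_id -> Integer,  // FK into a `types` table
    }
}
```

A real schema would also need the remaining tables (traits, trait functions, trait bounds, types, call arguments) plus the usual `joinable!` / `allow_tables_to_appear_in_same_query!` declarations.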
@cramertj Just a heads-up – I renamed the project to metacollect due to some people feeling uneasy about the old name. No need to stoke emotions.
@llogiq Yeah, I saw the Reddit discussion. Some very strong opinions out there... That aside, it seems like a good move as it's a far more descriptive name.
@llogiq Did you get a chance to check out the schema I posted above?
Sorry, not yet. I'll need to catch some sleep first – maybe tomorrow.
This has languished for some time. Sorry for the delays. There have been a few developments in the meantime:
Looking into the schema, I like the recursive path idea. I'd like to see the …
Can you explain what you mean by "type contents"? WRT the specialization concern, I'm wondering exactly how you'd like to handle generics then. It seems to me that the easiest route would be to include monomorphization as a step in the analysis. That would allow us to treat all our static calls as non-generic. Otherwise, we'll have to come up with some way of tracking the flow of types down through the stack.

We could also take the less accurate but more general approach of examining all versions of a function that could occur (based on the trait impls we know about). By starting at the lowest level of fn calls (those that don't call other generic functions), we could propagate our metrics upwards to all generic functions that could potentially call that function (either directly or through a trait impl). The …
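As a rough sketch of that bottom-up propagation idea (all names hypothetical, and operating on an in-memory call graph rather than the eventual database), marking callers of panicking functions until a fixed point is reached could look like this:

```rust
use std::collections::{HashMap, HashSet};

type FnId = u32;

/// Propagate a "may panic" flag from callees to callers until nothing changes.
/// `calls` maps each function to the functions it calls (static edges plus,
/// for trait-object calls, every impl we know about).
fn propagate_may_panic(
    calls: &HashMap<FnId, Vec<FnId>>,
    directly_panics: &HashSet<FnId>,
) -> HashSet<FnId> {
    let mut may_panic: HashSet<FnId> = directly_panics.clone();
    loop {
        let mut changed = false;
        for (&caller, callees) in calls {
            if !may_panic.contains(&caller)
                && callees.iter().any(|c| may_panic.contains(c))
            {
                may_panic.insert(caller);
                changed = true;
            }
        }
        if !changed {
            return may_panic;
        }
    }
}

fn main() {
    // 1 calls 2, 2 calls 3, and only 3 panics directly;
    // propagation marks all three as "may panic".
    let mut calls = HashMap::new();
    calls.insert(1, vec![2]);
    calls.insert(2, vec![3]);
    let directly_panics: HashSet<FnId> = [3].iter().cloned().collect();
    assert_eq!(propagate_may_panic(&calls, &directly_panics).len(), 3);
}
```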
With "type contents" I mean the composition of structs and enums. I'm wary of monomorphization, because a) there are existential types and trying to monomorphize can give us an exponential of monomorphized types (for example, see https://github.com/paholg/typenum – and there's code out there mishandling the type system in worse ways!) whereas tracking the "flow" of generics (via generics and substitutions, as it is done in the compiler) avoids this pitfall. There are some ugly corner cases though. Still I think it's better to think them through when we encounter them. Better a 90% solution now than a 100% solution never. Ok, I think I get your idea of trait bounds now. This requires us to mix existential and composed types in one table; we should make sure that we can distinguish those. |
So, through flow analysis, the goal is to only analyze a function's metadata for call conditions that may exist in the current code? How should this interact with library crates calling something like …
With into() and deref() calls, we record the exact receiving types, and thus can pinpoint the trait impls. As long as all dependencies (and yes, that probably includes …
Only if we're analyzing binary code, right? This won't work for library code?
(because not all dependencies of the actual execution context could be analyzed, only the dependencies of the library)
OK, we don't analyze FFI code. We only analyze Rust code for the crate and all dependencies. Since we expect metacollect to run on the whole dependency tree, this should give us a large part of the whole picture.
Perhaps I'm not explaining my question correctly. Yes, we can't work with FFI code. However, when writing a library, there are often exposed APIs which are generic on some input. When calling a trait method on one of these inputs, we can't tell (without having access to the client's crate) what the concrete implementation of the trait method is, and therefore can't tell whether or not it will panic. For example:

```rust
struct Person {
    name: String,
}

impl Person {
    fn new<S>(name: S) -> Person where S: Into<String> {
        Person {
            // This into() fn might panic or be impure, but we don't know, because we didn't write it.
            name: name.into()
        }
    }
}
```

There's really no way to get around this, so far as I can tell, aside from giving some sort of warning about possible client-crate-defined behavior.
That's why we collect (as in "meta_collect_") the generics and substitutions. Say, metacollect records that …

Remember, metacollect is not about reporting, but about collection.
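Purely as an illustration of that idea – these structures are invented for the example and are not metacollect's actual representation – the `name.into()` call from `Person::new` above could be recorded as a generic call on the library side and only resolved to a concrete impl once a client crate's substitution is known:

```rust
/// Illustrative-only records; not metacollect's real schema.
#[derive(Debug)]
enum Callee {
    /// A call whose receiver type is a still-unsubstituted generic parameter.
    GenericTraitMethod { trait_path: String, method: String, param: String },
    /// The same call after a client crate fixed the parameter to a concrete type.
    ResolvedImpl { trait_path: String, method: String, for_type: String },
}

fn main() {
    // Collected while analyzing the library: `name.into()` inside `Person::new<S>`.
    let library_record = Callee::GenericTraitMethod {
        trait_path: "core::convert::Into<String>".into(),
        method: "into".into(),
        param: "S".into(),
    };

    // Collected while analyzing a client crate that calls `Person::new("Bob")`,
    // which substitutes `S = &str`; now the concrete impl (and whether it can
    // panic) can be looked up.
    let client_substitution = Callee::ResolvedImpl {
        trait_path: "core::convert::Into<String>".into(),
        method: "into".into(),
        for_type: "&str".into(),
    };

    println!("{:?}\n{:?}", library_record, client_substitution);
}
```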
Ah, I see. Sorry for the confusion – I understand now.
Since I'd like to use Diesel with a SQLite database to store the data, I need a database schema that can do the following: