-
Notifications
You must be signed in to change notification settings - Fork 118
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Can we support inputs filter in bazel aquery? #93
Comments
Hey, @xinzhengzhang. Interesting. Yeah, there's a lot of flexibility to aquery that we're wrapping without exposing. Could I ask for your "goal backtrace"--that is, what goal you're trying to accomplish and how? I think that'll help us get to the best solution. My thoughts offhand, without knowing more details: Having a single file filter regex parameter seems reasonable if it's needed, since people can just I'm tagging also @alexander-born, just in case it's relevant to the "current file" use case he'd been describing in another issue. |
In bilibili we have a monorepo organized by bazel with tens of millions of lines of code and it is impossible to extract all target(//...) to the root project. I have tried that a 4GB compile_commands is too hard for parsing not matter source kit-lsp or clangd. So I plan to implement a plugin based on file-level. It will observer event of text editor and auto generate file-based compile_commands.json. Of course it will sacrifice some features like For source file(.swift .m .mm) we can find the corresponding target which guide how the file should be compiled. I have done a very very early version for internal use https://github.com/xinzhengzhang/bis |
I see, so the problem is an overwhelmingly large codebase/compile_commands.json. 4GB without headers is definitely overwhelmingly huge, but do I think it's kinda entertaining that it sounds like this tool managed to generate it. I presume Xcode isn't giving you references, since it, like sourcekit-lsp, is overwhelmed. Just to make sure you're avoiding a trap here: Presumably the downstream swift, cc, objc_libraries require inherited configuration from the binary rules used to build them--and the *_library rules therefore aren't configured correctly if built or aquery'd on their own. They'd be configured for the host (macOS) instead of the target (iOS, for example). Therefore it wouldn't work to just find the target that uses a given source or header file; instead you'd have to find a root target that transitively depends on the library and get the commands by aquery'ing the deps of that target. (Maybe that's what you mean by "top-level" target--but I want to make sure.) I assume it wouldn't work to just resolve this by having people specify the top-level targets they're working on in I also want to make sure there isn't some easy way to slim down compile_commands.json without making you do all this work. This is already with header extraction turned off, on a focused list of targets in `refresh_compile_commands", right? Could I ask you to also check that the overwhelming json size isn't just because of duplicate listings of files? That is, go into the 4gb file and make sure that each source file is listed only at most a few times (once per architecture), rather than like a 1000 times, or something that would lead to easily avoidable large file size. |
Thanks for the reminder and I didn't step on this pit and there are two reason why I need a input filter
The 4gb file contains the header file, because the habit of OC is one source file and one header file, removing the header file can greatly speed up the extractor but the size is still about 2gb. There are indeed some useless things such as protoc codegen tools, but there is no large-scale repetition |
Makes sense! To rehash: Sounds like you understand the trap; are finding that you want entries for headers; and want to use this to cut down processing and header search to something you can do in real time. I just wish I could help find an easier solution for you. Two last other, divergent ideas, in case they're useful: There's some chance you might be able to hand write (or generate) a single clangd config file that would cover all files well enough. And I assume you've already tried the exclude_external_sources = True and exclude_headers = "external" families of options, as quick ways of trying to filter things down. (I'm not sure if you've already considered using the set subtraction syntax in aquery, like deps(:a) - deps(:b), but IIRC, that won't work correctly here because configuration issue means that the actions will be different.) If those two ideas don't seem better--and you think this is of general use and can be done well, I'd be open to a good PR! [One other side thought: It occurs to me that you might want to be able to pass targets as runtime flags instead of needing to modify a build file every time. I'd be open to that, too, if this works out.] |
Seems that how the design looks, it will confuse users. How about we just open
And about the runtime flag if we open aquery directly like above do you mind if I remove
instead of using params like |
Hmm, that seems more complicated to me than just using an aquery string_dict, though maybe there's something I'm missing? |
What'd you think of the other proposal to keep things simple and correct for users? That is, something like: I assume neither of the other workarounds did it for you? |
So will we make aquery like this?
|
Roughly! Yeah, so the idea is to have an interface that solves your problem, while also being useful to others. I think this package list interface might do that--output should be small enough that I'm hoping it'll work for you, and more generally, it'll make it easy for others to filter down overwhelmingly large codebases, while not being too hard to implement (I think you mean that as a rough quick example, but just to make sure we're on the same page: It'd need to be |
[I'll check quickly to make sure that windows aquery doesn't need backslash separators. Assume we just need to think about forward slash unless I get back to you on that.] |
There is a little confused here. |
I'm not quite understanding what you're asking. But doesn't And I think headers are listed as inputs to the action by Bazel--the problem for users is that a header will also be an input to all rules that depend on it, potentially leading to way more matches than the one expected. Also, I checked the Windows thing and we're good to go: Bazel emits forward slashes there, too. [It's past midnight my time, so I'm going to sleep soon, but I'll be back.] |
These two ways will have the same result under package-based intercae like
I think unless we need to filter things other than package information, such as which compiler used, whether to use modules, or directly filter in the query regardless of semantics or performance. |
Yep--you're totally right. Sorry I pushed us towards the wrong solution, and thanks for pushing back. Please proceed if that interface would work for you, too! (Backtrace: I hadn't realized there was indeed already a target-based I (probably) should have built the external source filtering this way! If it's easy to fix my mistake when you're implementing this, I'd love it. If not, I can do that after. |
My bad I just gave one line of code without explaining more context. No problem, I will submit a pr to implement source_filter_packages and modify the implementation of external source based on the above discussion |
Hi @cpsauer In my own project, the speed is still not ideal without excluding the header. Because there are too many sources in some targets. But it's okay, this is our own problem, I will add additional file-level filtering based on packages filter in my fork :) And there is another problem about filtering external source, we can use syntax like
Do you have any other ideas? |
Hey, @xinzhengzhang! Thanks for your good efforts and code. I'm bummed to hear that it doesn't end up solving things efficiently enough for you. The goal had been to get something into the mainline that both solved things for you and was useful to folks more broadly. Given the current direction doesn't solve things for you, I'm going to propose we change course. How about if we instead add a runtime flag that lets you query a the command for just a single file, getting you exactly what you want. Something like Some notes:
What do you think? |
[This is less important, but I should say also that the previous approach's implementation doesn't work as cleanly as one might have thought. Some examples: We're actually matching targets, not packages, so //foo:... doesn't actually match all packages underneath, but rather all targets within the //foo package with a name at least three characters long. //foo behaves like one might have expected //foo/... to, //foo/... doesn't match targets in //foo, etc. These are probably all fixable, but still, the other isn't a slam dunk merge as is.] |
"I'd propose we only allow the --file= to simplify parsing. Though if you anticipate wanting to batch files (probably not?) then we could instead just do a --files=comma,separated,list? Using positional arguments or --files foo bar is going to get painful, given we also pass through arguments to Bazel, hence the thought of just accepting the = form."
"I'd propose we that the output be updating compile_commands.json in place, removing previous entries for the file(s) in question and replacing them, and thus handling the incremental update logic here, too, rather than pushing it to the user.
"Headers remain tricky. Consider the case where you've got mixed languages, like an Objective-C compile action at the top level target and a C++ header up the chain. Even if the top level target could technically access the C++ header with Bazel (but doesn't because that would lead to a build error), we wouldn't want to use the top level command for the header. I imagine you might have already hit this case with the plugin. Any ideas on how to handle it well?"
|
That's the best I can think of, too! Fixed in a426081 Thanks for alerting me to filter and better ways of doing this :) |
Ah, so the usual case where there are multiple compile commands for a file is when it's being recompiled for multiple OS's or architectures. This means that there are different generated directories under bazel-out. You certainly raise an interesting issue, though. I hadn't thought about that case.
I think some trickiness remains: How do we automatically find a source file that depends on a given header? Presumably you don't want to make the user manually find a source file that depends upon it... Hmm. One (not great) solution would be to input filter to all actions that could include the header, search the source files (maybe in path-closeness order) for one that contains the filename of the header, and verify that it's the right source by extracting the header. Any better ideas? Maybe there's an (a)query filtering mechanism for actions that directly depend on the header? Update: Oh, yeah! How about using attr to at least find the target that directly uses the header in srcs or hdrs? |
it is good! Very simple and effective I will implemente it like this |
(Will continue discussion over there!) |
Bazel's aquery functions can combine many interesting usage.
In my case, I am writing an extension integrating with compile_commands based on file granularity and I need a file filter from the root target.
I have done a commit xinzhengzhang@56e4147 since the target configuration is a dictionary now I have not consider the use case of multiple targets.
Back to the issue, do we need to support this feature? If there is any suggestion how to design the interface? I'd be happy to implement it.
The text was updated successfully, but these errors were encountered: