Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove some trivial leaks in codebase #5012

Merged
merged 11 commits into from
Nov 13, 2024
Merged

Remove some trivial leaks in codebase #5012

merged 11 commits into from
Nov 13, 2024

Conversation

asl
Copy link
Contributor

@asl asl commented Nov 12, 2024

The codebase contains some trivial leaks, e.g. when a vector is allocated on heap for no particular reason (it is further used by value) and then just dropped.

Try to fix this in certain places of frontend and midend.

@asl asl added core Topics concerning the core segments of the compiler (frontend, midend, parser) run-sanitizer Use this tag to run a Clang+Sanitzers CI run. labels Nov 12, 2024
@asl asl requested review from vlstill, ChrisDodd and fruffy November 12, 2024 05:59
@asl
Copy link
Contributor Author

asl commented Nov 12, 2024

This removed ~10k memory leaks in P4CParserUnroll.switch_20160512 test.

Before:
Screenshot 2024-11-12 at 09 00 36

After
Screenshot 2024-11-12 at 09 00 49

Some cleanup while there.

Copy link
Contributor

@ChrisDodd ChrisDodd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It bugs me that the count+at method accessing maps is so much less efficient than find+ != end, as it is much more readable. A lost opportunity to improve the C++ compiler.

@@ -615,7 +624,7 @@ class ComputeWriteSet : public Inspector, public IHasDbPrint {
exitDefinitions(new Definitions()),
lhs(false),
virtualMethod(false),
cached_locs(*new std::unordered_set<loc_t>) {
cached_locs(new CachedLocs) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be std::make_shared<CachedLocs> instead?

@@ -9,7 +9,6 @@ namespace P4 {

// Insert explicit type specializations where they are missing
class DoBindTypeVariables : public Transform {
IR::IndexedVector<IR::Node> *newTypes;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fact that this can be removed indicates this pass is buggy -- it is not actually inserting any type specializations. Maybe that's just a bug in the comment and it is not actually supposed to insert explicit specializations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it was copied from some other pass (this newTypes and insert) and then modified in place.

else
globals.emplace(key, flow_clone());
other->second.flow_merge(*this);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could do this as

if (auto other = globals->find(key); other != globals->end())
    other->second.flow_merge(*this);
else
    globals->emplace(key, flow_clone());

ir/visitor.h Outdated
@@ -528,7 +528,7 @@ class ControlFlowVisitor : public virtual Visitor {
* edge are never join points.
*/
virtual bool filter_join_point(const IR::Node *) { return false; }
ControlFlowVisitor() : globals(*new std::map<cstring, ControlFlowVisitor &>) {}
ControlFlowVisitor() : globals(new std::map<cstring, ControlFlowVisitor &>) {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another place make_shared may be appropriate

tables(new std::map<cstring, TableInfo>),
actions(new std::map<cstring, FuncInfo>),
methods(new std::map<cstring, FuncInfo>),
states(new std::map<cstring, FuncInfo>),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4 more make_shared

body.insert(body.end(), m.begin(), m.end());
body.append(action->body->components);
action->body = new IR::BlockStatement(action->body->srcInfo, action->body->annotations,
std::move(body));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it actually worth it to use std::move in this case? I understand for the case where you passing along parameters. But here, couldn't the compiler even optimize this? Genuine question.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not see why compiler would generate something like move ctor here. W/o explicit move it's just plain copy as body technically is live after this call, so we need to explicitly "borrow" it. And when copy ctor is generated there is no way back :) copies are elided for certain cases only.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thought was that you could see that the local variable is not used after it was passed in here (with some use-def analysis). Then you could elide the copy. But maybe this is too tricky to implement.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not what C++ standard says. See https://en.cppreference.com/w/cpp/language/copy_elision

Without this, copy ctor call is emitted. And it is too late for such optimizations as explicit copies and memory allocation is performed that you cannot remove as they are having side effects.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the C++ compiler is permitted to use move instead of copy. The two constructors are allowed to have observably different behavior, even disregarding the state of moved-from object. As far as I know, all optimizations except for copy elision preserve semantics (at least from single-threaded view). And copy elision can only remove copy, it cannot replace it with move. The only place where move is implicit is in return foo; (where in fact this is suitable for copy elision, but in cases it does not happen there will be move). Compiler actually warns about redundant move in that case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case the constructors are inline, so the compiler could determine that they don't have observably different behavior, so could be used interchangeably, but it is not required to do so, and it would not surprise me if it did not.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case the constructors are inline, so the compiler could determine that they don't have observably different behavior, so could be used interchangeably, but it is not required to do so, and it would not surprise me if it did not.

Yeah I was thinking something along that line but I haven't really seen any discussion on this particular matter online. Mostly for my own curiosity because using std::move in this particular scenario seems onerous.

@vlstill
Copy link
Contributor

vlstill commented Nov 12, 2024

It bugs me that the count+at method accessing maps is so much less efficient than find+ != end, as it is much more readable. A lost opportunity to improve the C++ compiler.

We can use the P4::get in many cases... I'm always forgetting we have it. It is not that easy to give a good semantics to find that returns value and not iterator, especially in absence of optional (and that brings overhead). We kind of can take this shortcut (with P4::get) because most of our maps contain pointers anyway, and nullptr is usually not a valid outcome that can be stored in the map.

Copy link
Contributor

@vlstill vlstill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! The GC really does lend itself to writing a code that overuses it :-(. I know I am guilty of that too.

Just a few nitpicks...

std::vector<IR::Vector<IR::Declaration> *> toMove;
void push() { toMove.push_back(new IR::Vector<IR::Declaration>()); }
// FIXME: Do we really need IR::Vector here?
std::vector<IR::Vector<IR::Declaration>> toMove;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think IR::Vector is that much worse than std::vector when used as a value? I can see there will be some overhead as it has to be part of Node hierarchy, but would that matter that much? Maybe in this case it will because we have many of them, so it will add up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Frankly speaking, I do not know. Vector is 112 bytes for me (as compared to 24 bytes of just std::vector), so it is definitely large compared to plain vector. The only question is how many Vector's we might have for large apps.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm -- IR::Vector should be 64 bytes larger on a 64 byte machine. Most of that (48 bytes) is for the SourceInfo which is larger than it should be. If the difference is more than that, it implies the C++ compiler is doing something odd (such as using unnecessary redundant pointers in the object for mulitple inheritance).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ChrisDodd

sizeof(Node) == 88 for me:

  • 24 bytes of INode
  • 48 bytes of SourceInfo
  • 2x4 bytes for id and cloneId
  • And 8 bytes for vtable pointer (essentially from RTTI::Base)

INode is 24 bytes because it is having 3 base classes each with virtual methods, so need to hold 3 pointers to vtables.

Vector is:

  • VectorBase: 64 bytes (essentially, Node w/o INode)
  • safe_vector: 24 bytes
  • INode: 24 bytes

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, INode should be 0 bytes for a reasonable C++ implementation -- everything for the virtual base classes can be in Node's vtable. The fact that it is wasting 24 bytes in every object is annoying.

Copy link
Contributor Author

@asl asl Nov 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How it will be implemented then? E.g.:

class IFoo {
 virtual void foo() = 0;
}

class IBar {
 virtual void bar() = 0;
}

class IBaz {
 virtual void baz() = 0;
}

class IBig : public IFoo, public IBar, public IBaz {
};

class Foo : public virtual IBig {
  some fields;
  virtual void aaaa();
  void foo() override;
  void bar() override;
  void baz() override;
}

Foo f;

void callBar(IBar *b) {
  b->bar();
}

void callBar2(Foo *f) {
  f->bar();
}

void callAAAA(Foo *f) {
  f->aaaa();
}

callBar(&b);
callBar(f);
callAAAA(f);

How we'd make all cases work?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each of the bases has to have its own vptr if it has any virtual member functions, there is no way around this (as a pointer through the base type has to be valid for each base and has to be able to access vptr and members of the base). The slide below1 demonstrates the layout in a particular clang implementation (I'd say GCC on Linux has to do the same).

image

All the offsets (for casting etc.) are kept in vptr and so are the member functions, but there must be multiple vptrs. The implementation is actually pretty close to what Bjarne Stroustrup published in his paper on multiple inheritance back before C++ was standardized.

Footnotes

  1. I created the slice few years ago for an advanced C++ course at fi.muni.cz.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How we'd make all cases work?

It requires laying out the vtables at link time. That requires either a smarter linker, or a common virtual base class for IFoo/IBar/IBaz, in which case you can have a linker section for laying out the vtables of all subclasses of that common virtual base class with even a dumb traditional linker.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But you can load a class from a .so file at runtime and use it in a program, and it can have a combination of bases from the original program and from the .so. And you have to know sizes of the structs much earlier than that happens.

frontends/p4/parserControlFlow.cpp Outdated Show resolved Hide resolved
frontends/p4/parserControlFlow.cpp Outdated Show resolved Hide resolved
ir/visitor.h Outdated Show resolved Hide resolved
midend/local_copyprop.cpp Outdated Show resolved Hide resolved
auto &local = available[var->name];
auto [it, inserted] = available.emplace(var->name, VarInfo());
if (!inserted) BUG("duplicate var declaration for %s", var->name);
auto &local = it->second;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Too bad we don't have a proper nested pattern matching in C++ :-(.

midend/local_copyprop.cpp Outdated Show resolved Hide resolved
Comment on lines +80 to +83
std::shared_ptr<std::map<cstring, TableInfo>> tables;
std::shared_ptr<std::map<cstring, FuncInfo>> actions;
std::shared_ptr<std::map<cstring, FuncInfo>> methods;
std::shared_ptr<std::map<cstring, FuncInfo>> states;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not directly related, but maybe use string_map?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, there are lots of similar places in codebase. I do have stuff like this in todo list :)

asl and others added 10 commits November 13, 2024 09:01
Signed-off-by: Anton Korobeynikov <[email protected]>
Signed-off-by: Anton Korobeynikov <[email protected]>
Signed-off-by: Anton Korobeynikov <[email protected]>
Co-authored-by: Vladimír Štill <[email protected]>
Signed-off-by: Anton Korobeynikov <[email protected]>
Copy link
Contributor

@vlstill vlstill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the make_shared suggested by Chris would also make sense as it should avoid the extra allocation for reference count.

Signed-off-by: Anton Korobeynikov <[email protected]>
@asl asl added this pull request to the merge queue Nov 13, 2024
Merged via the queue into p4lang:main with commit 7348ad9 Nov 13, 2024
19 checks passed
@asl asl deleted the trivial-leaks branch November 13, 2024 13:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Topics concerning the core segments of the compiler (frontend, midend, parser) run-sanitizer Use this tag to run a Clang+Sanitzers CI run.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants