Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: release unused memory in CNetMsgMaker::Make() #6233

Merged
merged 1 commit into from
Aug 31, 2024

Conversation

UdjinM6
Copy link

@UdjinM6 UdjinM6 commented Aug 27, 2024

Issue being fixed or feature implemented

We reserve capacity for large messages in CNetMsgMaker::Make() but most messages are small yet we never release unused memory here. Discovered while debugging 28165 backport issues.

What was done?

How Has This Been Tested?

Breaking Changes

n/a

Checklist:

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have added or updated relevant unit/integration/functional/e2e tests
  • I have made corresponding changes to the documentation
  • I have assigned this pull request to a milestone

@UdjinM6 UdjinM6 added this to the 21.2 milestone Aug 27, 2024
Copy link
Collaborator

@kwvg kwvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK c2c4b2b

@knst
Copy link
Collaborator

knst commented Aug 27, 2024

How long these objects are alive? And how many of them can be alive at the same time?

4kb doesn't seems as big overhead.

@kwvg
Copy link
Collaborator

kwvg commented Aug 27, 2024

@knst, when backporting bitcoin#28165, there's an assertion (source) to make sure that after data in queue is processed, the memory utilization is 0 (counter incremented when added to queue, decremented when removed from queue). This works as expected in Clang but causes Dash Core to crash with GCC.

With additional logging and bitcoin#28165, this is what it looks like.

Clang build:
2024-08-09T15:19:30.702137Z (mocktime: 2014-12-05T10:33:01Z) [httpworker.2] [net.cpp:153] [GetMemoryUsage] (PushMessage) net.cpp:4323 (node.m_send_memusage) static (56) + dynamic (4112) = 4168, msg_size=148
2024-08-09T15:19:30.702142Z (mocktime: 2014-12-05T10:33:01Z) [httpworker.2] [net.cpp:153] [GetMemoryUsage] (PushMessage) net.cpp:4324 (node.m_send_memusage) static (56) + dynamic (4112) = 4168, msg_size=148
2024-08-09T15:19:30.702147Z (mocktime: 2014-12-05T10:33:01Z) [httpworker.2] [net.cpp:4324] [PushMessage] node.m_send_memusage=4168, msg_memsize=4168, msg_size=148, msg_type=version
2024-08-09T15:19:30.702168Z (mocktime: 2014-12-05T10:33:01Z) [httpworker.2] [net.cpp:153] [GetMemoryUsage] (GetSendMemoryUsage) net.cpp:1022 (node.m_send_memusage) static (56) + dynamic (0) = 56, msg_size=0
2024-08-09T15:19:30.702226Z (mocktime: 2014-12-05T10:33:01Z) [       net] [net.cpp:1034] [SocketSendData] (node.m_send_memusage) node.vSendMsg.size() = 1, node.vSendMsg[0].size() = 148
2024-08-09T15:19:30.702264Z (mocktime: 2014-12-05T10:33:01Z) [       net] [net.cpp:153] [GetMemoryUsage] (SocketSendData) net.cpp:1040 (node.m_send_memusage) static (56) + dynamic (4112) = 4168, msg_size=148
2024-08-09T15:19:30.702274Z (mocktime: 2014-12-05T10:33:01Z) [       net] [net.cpp:153] [GetMemoryUsage] (SetMessageToSend) net.cpp:972 (node.m_send_memusage) static (56) + dynamic (4112) = 4168, msg_size=148
2024-08-09T15:19:30.702280Z (mocktime: 2014-12-05T10:33:01Z) [       net] [net.cpp:972] [SetMessageToSend] (node.m_send_memusage) msg=version, size=4168
2024-08-09T15:19:30.702286Z (mocktime: 2014-12-05T10:33:01Z) [       net] [net.cpp:1044] [SocketSendData] node.m_send_memusage = 0, it_pos=0
2024-08-09T15:19:30.702319Z (mocktime: 2014-12-05T10:33:01Z) [       net] [net.cpp:1095] [SocketSendData] node.m_send_memusage = 0
2024-08-09T15:19:30.702332Z (mocktime: 2014-12-05T10:33:01Z) [       net] [net.cpp:153] [GetMemoryUsage] (GetSendMemoryUsage) net.cpp:1022 (node.m_send_memusage) static (56) + dynamic (0) = 56, msg_size=0
2024-08-09T15:19:30.702341Z (mocktime: 2014-12-05T10:33:01Z) [       net] [net.cpp:1099] [SocketSendData] asserting node.m_send_memusage == 0
GCC build:
2024-08-12T09:06:54.135629Z (mocktime: 2014-12-05T10:33:01Z) [   msghand] [net.cpp:152] [GetMemoryUsage] (PushMessage) net.cpp:4313 (node.m_send_memusage) static (88) + dynamic (4112) = 4200, msg_size=148
2024-08-12T09:06:54.135638Z (mocktime: 2014-12-05T10:33:01Z) [   msghand] [net.cpp:152] [GetMemoryUsage] (PushMessage) net.cpp:4314 (node.m_send_memusage) static (88) + dynamic (4112) = 4200, msg_size=148
2024-08-12T09:06:54.135646Z (mocktime: 2014-12-05T10:33:01Z) [   msghand] [net.cpp:4314] [PushMessage] node.m_send_memusage=4200, msg_memsize=4200, msg_size=148, msg_type=version
2024-08-12T09:06:54.135654Z (mocktime: 2014-12-05T10:33:01Z) [   msghand] [net.cpp:152] [GetMemoryUsage] (GetSendMemoryUsage) net.cpp:1022 (node.m_send_memusage) static (88) + dynamic (0) = 88, msg_size=0
2024-08-12T09:06:54.135687Z (mocktime: 2014-12-05T10:33:01Z) [       net] [net.cpp:1033] [SocketSendData] (node.m_send_memusage) node.vSendMsg.size() = 1, node.vSendMsg[0].size() = 148
2024-08-12T09:06:54.135701Z (mocktime: 2014-12-05T10:33:01Z) [       net] [net.cpp:152] [GetMemoryUsage] (SocketSendData) net.cpp:1040 (node.m_send_memusage) static (88) + dynamic (176) = 264, msg_size=148
2024-08-12T09:06:54.135733Z (mocktime: 2014-12-05T10:33:01Z) [       net] [net.cpp:152] [GetMemoryUsage] (SetMessageToSend) net.cpp:972 (node.m_send_memusage) static (88) + dynamic (176) = 264, msg_size=148
2024-08-12T09:06:54.135743Z (mocktime: 2014-12-05T10:33:01Z) [       net] [net.cpp:972] [SetMessageToSend] (node.m_send_memusage) msg=version, size=264
2024-08-12T09:06:54.135754Z (mocktime: 2014-12-05T10:33:01Z) [       net] [net.cpp:1044] [SocketSendData] node.m_send_memusage = 3936, it_pos=0
2024-08-12T09:06:54.135788Z (mocktime: 2014-12-05T10:33:01Z) [       net] [net.cpp:1095] [SocketSendData] node.m_send_memusage = 3936
2024-08-12T09:06:54.135801Z (mocktime: 2014-12-05T10:33:01Z) [       net] [net.cpp:152] [GetMemoryUsage] (GetSendMemoryUsage) net.cpp:1022 (node.m_send_memusage) static (88) + dynamic (0) = 88, msg_size=0
2024-08-12T09:06:54.135810Z (mocktime: 2014-12-05T10:33:01Z) [       net] [net.cpp:1099] [SocketSendData] asserting node.m_send_memusage == 0
2024-08-12T09:06:54.871872Z (mocktime: 2014-12-05T10:33:01Z) [       net] [stacktraces.cpp:528] [PrintCrashInfo] Posix Signal: Aborted
   0#: (0x55C6F5AF2B78) stacktraces.cpp:777  - HandlePosixSignal
   1#: (0x7FB2E8792420) sigaction.c          - ???
   2#: (0x7FB2E7BC100B) raise.c:51           - __GI_raise
   3#: (0x7FB2E7BA0859) abort.c:81           - __GI_abort
   4#: (0x7FB2E7BA0729) loadmsgcat.c:509     - get_sysdep_segment_value
   5#: (0x7FB2E7BA0729) loadmsgcat.c:970     - _nl_load_domain
   6#: (0x7FB2E7BB1FD6) <unknown-file>       - ???
   7#: (0x55C6F2C85B9D) net.cpp:1102         - CConnman::SocketSendData(CNode&) const
   8#: (0x55C6F2C8A76A) net.cpp:2240         - operator()
   9#: (0x55C6F2C8A76A) net.cpp:2240         - CConnman::SocketHandlerConnected(std::__debug::set<unsigned int, std::less<unsigned int>, std::allocator<unsigned int> > const&, std::__debug::set<unsigned int, std::less<unsigned int>, std::allocator<unsigned int> > const&, std::__debug::set<unsigned int, std::less<unsigned int>, std::allocator<unsigned int> > const&)
  10#: (0x55C6F2CA1461) net.cpp:2124         - CConnman::SocketHandler(CMasternodeSync&)
  11#: (0x55C6F2CA24CE) net.cpp:2400         - CConnman::ThreadSocketHandler(CMasternodeSync&)
  12#: (0x55C6F2CA28E9) std_function.h:292   - _M_invoke
  13#: (0x55C6F5D1289B) std_function.h:591   - std::function<void ()>::operator()() const
  14#: (0x55C6F5D1289B) thread.cpp:18        - util::TraceThread(char const*, std::function<void ()>)
  15#: (0x55C6F2BEB4EB) invoke.h:61          - __invoke_impl<void, void (*)(char const*, std::function<void()>), char const*, CConnman::Start(CDeterministicMNManager&, CMasternodeMetaMan&, CMasternodeSync&, CScheduler&, const Options&)::<lambda()> >
  16#: (0x55C6F2BEB4EB) invoke.h:96          - __invoke<void (*)(char const*, std::function<void()>), char const*, CConnman::Start(CDeterministicMNManager&, CMasternodeMetaMan&, CMasternodeSync&, CScheduler&, const Options&)::<lambda()> >
  17#: (0x55C6F2BEB4EB) std_thread.h:292     - _M_invoke<0, 1, 2>
  18#: (0x55C6F2BEB4EB) std_thread.h:299     - operator()
  19#: (0x55C6F2BEB4EB) std_thread.h:244     - _M_run
  20#: (0x7FB2E8602B03) <unknown-file>       - ???
  21#: (0x7FB2E8786609) pthread_create.c:478 - start_thread

This PR shrinks the vector down to size so that memory accounting works like it should and allows for reintroduction of bitcoin#28165, a blocker for BIP324 work, to dash#6067.

@UdjinM6
Copy link
Author

UdjinM6 commented Aug 27, 2024

How long these objects are alive? And how many of them can be alive at the same time?

4kb doesn't seems as big overhead.

The overhead of holding more memory than needed is small indeed and messages are alive only until they are sent so shouldn't be a huge issue from that point of view. The problem is that not releasing unused memory breaks memory accounting like @kwvg mentioned above. Also, shrink_to_fit() should do no reallocation in this case if I understand it correctly so we should have no overhead i.e. we just clean things up here a bit with no downsides.

@knst
Copy link
Collaborator

knst commented Aug 27, 2024

The problem is that not releasing unused memory breaks memory accounting like @kwvg mentioned above.

Could you show how exactly memory is accounting?

@UdjinM6
Copy link
Author

UdjinM6 commented Aug 27, 2024

The problem is that not releasing unused memory breaks memory accounting like @kwvg mentioned above.

Could you show how exactly memory is accounting?

check kwvg@aa3c25d

Copy link
Member

@PastaPastaPasta PastaPastaPasta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK c2c4b2b

@@ -21,6 +21,7 @@ class CNetMsgMaker
msg.m_type = std::move(msg_type);
msg.data.reserve(4 * 1024);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe better to drop reserve then?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't think so, check commit message here

Copy link
Collaborator

@knst knst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK c2c4b2b

@PastaPastaPasta PastaPastaPasta merged commit d16e953 into dashpay:develop Aug 31, 2024
41 checks passed
PastaPastaPasta added a commit that referenced this pull request Sep 4, 2024
, bitcoin#24021, bitcoin#24543, bitcoin#26844, bitcoin#25325, bitcoin#28165, partial bitcoin#20524, bitcoin#26036, bitcoin#27981 (networking backports: part 7)

76a458e fmt: apply formatting suggestions from `clang-format-diff.py` (Kittywhiskers Van Gogh)
63962ec merge bitcoin#28165: transport abstraction (Kittywhiskers Van Gogh)
c6b9186 merge bitcoin#25325: Add pool based memory resource (Kittywhiskers Van Gogh)
8c986d6 partial bitcoin#27981: Fix potential network stalling bug (Kittywhiskers Van Gogh)
13f6dc1 merge bitcoin#26844: Pass MSG_MORE flag when sending non-final network messages (Kittywhiskers Van Gogh)
caaa0fd net: use `std::deque` for `vSendMsg` instead of `std::list` (Kittywhiskers Van Gogh)
2ecba6b partial bitcoin#26036: add NetEventsInterface::g_msgproc_mutex (Kittywhiskers Van Gogh)
f6c9439 merge bitcoin#24543: Move remaining globals into PeerManagerImpl (Kittywhiskers Van Gogh)
dbe41ea refactor: move object request logic to `PeerManagerImpl` (Kittywhiskers Van Gogh)
112c4e0 merge bitcoin#24021: Rename and move PoissonNextSend functions (Kittywhiskers Van Gogh)
6d690ed merge bitcoin#23970: Remove pointless and confusing shift in RelayAddress (Kittywhiskers Van Gogh)
87205f2 merge bitcoin#21327: ignore transactions while in IBD (Kittywhiskers Van Gogh)
51ad8e4 merge bitcoin#21148: Split orphan handling from net_processing into txorphanage (Kittywhiskers Van Gogh)
cbff29a partial bitcoin#20524: Move MIN_VERSION_SUPPORTED to p2p.py (Kittywhiskers Van Gogh)

Pull request description:

  ## Additional Information

  * Dependent on #6098

  * Dependent on #6233

  * `p2p_ibd_txrelay.py` was first introduced in [bitcoin#19423](bitcoin#19423) to test feefilter logic but on account of Dash not having feefilter capabilities, that backport was skipped over but on account of the tests introduced in [bitcoin#21327](bitcoin#21327) that test capabilities present in Dash, a minimal version of `p2p_ibd_txrelay.py` has been committed in.

  * `vSendMsg` is originally a `std::deque` and as an optimization, was changed to a `std::list` in 027a852 ([dash#3398](#3398)) but this renders us unable to backport [bitcoin#26844](bitcoin#26844) as it introduces build failures. The optimization has been reverted to make way for the backport.

    <details>

    <summary>Compile failure:</summary>

    ```
    net.cpp:959:20: error: invalid operands to binary expression ('iterator' (aka '_List_iterator<std::vector<unsigned char, std::allocator<unsigned char>>>') and 'int')
                if (it + 1 != node.vSendMsg.end()) {
                    ~~ ^ ~
    /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/stl_bvector.h:303:3: note: candidate function not viable: no known conversion from 'iterator' (aka '_List_iterator<std::vector<unsigned char, std::allocator<unsigned char>>>') to 'ptrdiff_t' (aka 'long') for 1st argument
      operator+(ptrdiff_t __n, const _Bit_iterator& __x)
    [...]
    1 error generated.
    make[2]: *** [Makefile:11296: libbitcoin_server_a-net.o] Error 1
    make[2]: *** Waiting for unfinished jobs....
    make[2]: Leaving directory '/src/dash/src'
    make[1]: *** [Makefile:19171: all-recursive] Error 1
    make[1]: Leaving directory '/src/dash/src'
    make: *** [Makefile:799: all-recursive] Error 1
    ```

    </details>

  * The collection of `CNode` pointers in `CConnman::SocketHandlerConnected` has been changed to a `std::set` to allow for us to erase elements from `vReceivableNodes` if the node is _also_ in the set of sendable nodes and the send hasn't entirely succeeded to avoid a deadlock (i.e. backport [bitcoin#27981](bitcoin#27981))

  * When backporting [bitcoin#28165](bitcoin#28165), `denialofservice_tests` has been modified to still check with `vSendMsg` instead of `Transport::GetBytesToSend()` as changes in networking code to support LT-SEMs (level-triggered socket events mode) mean that the message doesn't get shifted from `vSendMsg` to `m_message_to_send`, as the test expects.
    * Specifically, the changes made for LT-SEM support result in the function responsible for making that shift (`Transport::SetMessageToSend()` through `CConnman::SocketSendData()`), not being called during the test runtime.

  * As checking `vSendMsg` (directly or through `nSendMsgSize`) isn't enough to determine if the queue is empty, we now also check with `to_send` from `Transport::GetBytesToSend()` to help us make that determination. This mirrors the change present in the upstream backport ([source](https://github.com/bitcoin/bitcoin/pull/28165/files#diff-00021eed586a482abdb09d6cdada1d90115abe988a91421851960e26658bed02R1324-R1327)).

  ## Breaking Changes

  * `bandwidth.message.*.bytesSent` will no longer include overhead and will now only report message size as specifics that let us calculate the overhead have been abstracted away.

  ## Checklist:

  - [x] I have performed a self-review of my own code
  - [x] I have commented my code, particularly in hard-to-understand areas **(note: N/A)**
  - [x] I have added or updated relevant unit/integration/functional/e2e tests
  - [x] I have made corresponding changes to the documentation **(note: N/A)**
  - [x] I have assigned this pull request to a milestone _(for repository code-owners and collaborators only)_

ACKs for top commit:
  PastaPastaPasta:
    utACK 76a458e

Tree-SHA512: 2e47c207c1f854cfbd5b28c07dd78e12765ddb919abcd7710325df5d253cd0ba4bc30aa21545d88519e8acfe65638a57db4ca66853aca82fc355542210f4b394
@UdjinM6 UdjinM6 modified the milestones: 21.2, 22 Oct 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants