[nvidia] Skip SAI discovery on ports on fast-boot #1416

Open: wants to merge 2 commits into base: master


stepanblyschak
Contributor

@stepanblyschak stepanblyschak commented Aug 21, 2024

Modern systems have many ports, so SAI discovery takes a very long time, e.g. 8 seconds on a 256-port system. This has a big impact on fast-boot downtime, and the discovery itself is not required for fast-boot on the Nvidia platform.

The same applies to Nvidia fastfast-boot (aka warm-boot), but that needs to be tested separately.

Signed-off-by: Stepan Blyschak <[email protected]>
Collaborator

@kcudnik kcudnik left a comment

this will lead to an inconsistency between ASIC_DB and what's actually on the device, which will later lead to a crash

@@ -89,7 +89,8 @@ namespace syncd

             virtual void onPostPortCreate(
                     _In_ sai_object_id_t port_rid,
-                    _In_ sai_object_id_t port_vid) = 0;
+                    _In_ sai_object_id_t port_vid,
+                    _In_ bool discoverPortObjects = true) = 0;
Collaborator

this is very specific to ports; if we later decide to do something similar for other objects, this is not an optimal solution

Contributor Author

The function is meant to be used on ports. Given the current approach, I assume there will be onPostXCreate() functions for other object types. If needed, they can accept a boolean flag in the same way. This is simple and gives the required granularity.
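
As a rough illustration of the flag-per-hook approach described in this comment, here is a minimal sketch. Only `onPostPortCreate` and `discoverPortObjects` come from the diff; `PortHook`, `RecordingHook`, `createPorts` and `exampleUsage` are hypothetical names, and `sai_object_id_t` is stubbed as a plain integer so the snippet builds without the SAI headers. It is not the syncd implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for the SAI typedef so the sketch builds without SAI headers.
using sai_object_id_t = std::uint64_t;

// Hypothetical interface mirroring the signature shown in the diff.
class PortHook
{
    public:
        virtual ~PortHook() = default;

        virtual void onPostPortCreate(
                sai_object_id_t port_rid,
                sai_object_id_t port_vid,
                bool discoverPortObjects = true) = 0;
};

// Hypothetical implementation that only records which ports were discovered.
class RecordingHook : public PortHook
{
    public:
        void onPostPortCreate(
                sai_object_id_t port_rid,
                sai_object_id_t port_vid,
                bool discoverPortObjects) override
        {
            (void)port_vid; // unused in this sketch
            m_created++;
            if (discoverPortObjects)
            {
                m_discovered.push_back(port_rid);
            }
        }

        std::size_t m_created = 0;
        std::vector<sai_object_id_t> m_discovered;
};

// Calls the hook once per port, passing the caller's discovery choice.
inline void createPorts(PortHook& hook, sai_object_id_t count, bool discover)
{
    for (sai_object_id_t i = 0; i < count; i++)
    {
        hook.onPostPortCreate(0x1000 + i, 0x2000 + i, discover);
    }
}

// A fast-boot style caller skips discovery; a caller that omits the flag does not.
inline void exampleUsage()
{
    RecordingHook hook;
    createPorts(hook, 4, false);
    assert(hook.m_created == 4 && hook.m_discovered.empty());

    PortHook& base = hook;
    base.onPostPortCreate(0x1, 0x2); // flag omitted: defaults to true
    assert(hook.m_discovered.size() == 1);
}
```

One C++ detail worth keeping in mind with this shape: default arguments on virtual functions are taken from the static type of the call expression, so callers going through the base interface get `true` unless they pass the flag explicitly.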

Comment on lines +5304 to +5308
#ifdef SKIP_SAI_PORT_DISCOVERY_ON_FAST_BOOT
const bool discoverPortObjectsInFastBoot = false;
#else
const bool discoverPortObjectsInFastBoot = true;
#endif
Collaborator

fast boot can be initiated after the code was compiled, in which case this check will be hardcoded

Collaborator

also there are no tests for testing this code

Contributor Author

@stepanblyschak stepanblyschak Nov 26, 2024

> fast boot can be initiated after the code was compiled, in which case this check will be hardcoded

This was the intention: for Nvidia, skip discovery on ports in fast boot. The runtime check for fast boot is done in the condition below.
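
To make the split between the build-time policy and the runtime fast-boot check concrete, here is a minimal sketch. The macro and `discoverPortObjectsInFastBoot` are taken from the diff; `shouldDiscoverPortObjects` and its `isFastBoot` parameter are hypothetical, since the actual condition is not shown in this thread.

```cpp
// Build-time policy, as in the diff: a platform build defines the macro to
// opt out of port discovery during fast boot.
#ifdef SKIP_SAI_PORT_DISCOVERY_ON_FAST_BOOT
constexpr bool discoverPortObjectsInFastBoot = false;
#else
constexpr bool discoverPortObjectsInFastBoot = true;
#endif

// Hypothetical helper (not in the PR): combines the runtime boot type with
// the build-time policy. isFastBoot is only known at runtime.
constexpr bool shouldDiscoverPortObjects(
        bool isFastBoot,
        bool discoverInFastBoot = discoverPortObjectsInFastBoot)
{
    return !isFastBoot || discoverInFastBoot;
}

// Cold boot always discovers, whatever the build-time policy is.
static_assert(shouldDiscoverPortObjects(false, false), "cold boot discovers");
static_assert(shouldDiscoverPortObjects(false, true), "cold boot discovers");

// Fast boot follows the build-time policy.
static_assert(!shouldDiscoverPortObjects(true, false), "fast boot skips");
static_assert(shouldDiscoverPortObjects(true, true), "fast boot discovers");
```

In this sketch the macro only decides what happens if a fast boot occurs; whether one is in progress is still evaluated when the code runs.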

Contributor Author

stepanblyschak commented Nov 26, 2024

> this will lead to an inconsistency between ASIC_DB and what's actually on the device, which will later lead to a crash

@kcudnik Yes, the current design leads to performance problems on devices with lots of ports (could be 512, 1024 or more, which means tens of thousands of keys to insert into ASIC_DB on init).
Could you suggest a test that would cause a crash?
