-
Notifications
You must be signed in to change notification settings - Fork 159
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added SmartSwitch support in chassisd and enabling chassisd #467
base: master
Are you sure you want to change the base?
Conversation
d5ed9fc
to
0785bb1
Compare
platform identifier.
temp fix to get the build passing.
What are the states supported by the DPUs in the Smart Switch? |
”dpu_midplane_link_state” |
accidental removal of a few lines from the original chassisd file
sonic-chassisd/scripts/chassisd
Outdated
@@ -184,7 +188,7 @@ class ModuleUpdater(logger.Logger): | |||
self.chassis = chassis |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that instead of modifying the existing ModuleUpdater
class, we should implement SmartSwitchModuleUpdater(ModuleUpdater)
. SmartSwitchModuleUpdater
should be derived from ModuleUpdater
and overwrite the methods that should behave differently for the Smart Switch. This approach allows us to keep the original implementation untouched and guarantees full backward compatibility with the chassisd.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will consider this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implemented this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as commented
sonic-chassisd/scripts/chassisd
Outdated
self.update_dpu_state(key, "UP") | ||
|
||
elif midplane_access is False and current_midplane_state == 'False': | ||
if self.is_module_reboot_system_up_expired(module_key): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this function call?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gpunathilell No we don't, cleaned it.
sonic-chassisd/scripts/chassisd
Outdated
midplane_state = 'False' | ||
if isinstance(fvs, list) and fvs[0] is True: | ||
fvs = dict(fvs[-1]) | ||
midplane_state = fvs[CHASSIS_MIDPLANE_INFO_ACCESS_FIELD] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are checking for the midplane_state
here in the DB (and if it is not present in the DB we are assuming that the module is not reachable since midplane_state=False
by default) but in case of Nvidia platform the midplane_state
is being updated to the DB by Chassisd itself, so chassisd here is checking for midplane state in DB before it is supposed to update the midplane state itself. Either we need to re order code so that we update midplane_state to db before we check the DB for values, or instead of checking the DB we can directly invoke the midplane reachability API
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because this may cause issues, assuming the dpu is powered on initially, by default if there is no data in DB we assume the midplane_state is false, if no config is present the DPU is supposed to power off, but admin_state == 'down' and midplane_state is false, so the operation is skipped, causing DPU to stay on
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gpunathilell Using dpu operational_state and the API to get this as planned.
… to call the platform API
30fe529
to
364f522
Compare
25cfe16
to
3d35649
Compare
|
||
formatted_time = dt_obj.strftime("%a %b %d %I:%M:%S %p UTC %Y") | ||
|
||
reboot_cause_dict = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The get_reboot_cause
is supposed to return a tuple - a reboot cause and an accompanying description (from HLD) when we perform json.dump
with a tuple, it is saved as a list can we split here as "cause" and "comment" separately? otherwise the # Publish the reboot cause information to CHASSIS_STATE_DB
step fails in chassisd because we are performing hset
to chassis_state_db with a list (Example return value)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gpunathilell Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gpunathilell Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resolved
|
||
# Prepare the updated data | ||
updates = { | ||
"dpu_midplane_link_state": state, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Formatting suggestion, Can we have the midplane state similar to admin state and have in lower case? the show interfaces status
also displays in lower case, Can be changed later if it takes time, also the time format can be changed to "%Y-%m-%d %H:%M:%S"
(here or in sonic-utilities system_health command - since we are displaying this data to user)
if admin_state == 'up' and operational_state != ModuleBase.MODULE_STATUS_ONLINE: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gpunathilell Made dpu_midplane_link_state lowercase. The time is already in the format "%Y-%m-%d %H:%M:%S" right?
The REBOOT_CAUSE_DIR is not accessible from the pmon docker where chassisd is running, (theres a diffierent /host/reboot created in the docker ) |
@gpunathilell Have you used the PR? This is needed for pmon docker to mount the volume. |
No, I see that the location is mounted in pmon in that PR, will include, please ignore comment |
updates = { | ||
"dpu_midplane_link_state": state, | ||
"dpu_midplane_link_reason": "", | ||
"dpu_midplane_link_time": datetime.now().strftime("%Y%m%d %H:%M:%S"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rameshraghupathy I mean this line
"dpu_midplane_link_time": datetime.now().strftime("%Y%m%d %H:%M:%S"), | |
"dpu_midplane_link_time": datetime.now().strftime("%Y-%m-%d %H:%M:%S"), |
Will be displayed as such when we do show system-health dpu
:
DPU2 Online dpu_midplane_link_state up 2024-11-25 23:38:21
PLATFORM_ENV_CONF_FILE = "/usr/share/sonic/platform/platform_env.conf" | ||
PLATFORM_JSON_FILE = "/usr/share/sonic/platform/platform.json" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see platform.json file exists in /usr/share/sonic/devices//platform.json, am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vvolam The file exists, we are accessing it here
Added SmartSwitch support in chassisd and enabling chassisd for fixed SmartSwitches
Description
chassisd is enabled only for modular chassis. Smartswitch is a fixed chassis. However it has been treated like a modular chassis to manage the DPU cards just like the line-cards of a modular chassis. chassisd will be enabled only on the smartswitch NPU and hence the scope of these changes are limited only to NPU and not applicable to DPU.
Motivation and Context
chassisd is enabled only for modular chassis. Smartswitch is a fixed chassis. However it has been treated like a modular chassis to manage the DPU cards just like the line-cards of a modular chassis. Hence, chassid is needed for SmartSwitch and enabled here. Also, some of the table updates and clean up are not required for SmartSwitch platform and hence using the is_smartswitch API to selectively enable it. the text "fixes #xxxx", "closes #xxxx" or "resolves #xxxx" here
How Has This Been Tested?
Additional Information (Optional)