Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add DASH HA session API design. #532

Merged
merged 22 commits into from
Apr 4, 2024
Merged

Conversation

r12f
Copy link
Collaborator

@r12f r12f commented Mar 7, 2024

This change is made to add HA session API HLD and P4 changes for SAI API generation.

To describe how the HA session API works, the HLD in this change contains:

  • The key fundamental components that used to form the HA set topology and their SAI API design.
  • How we could implement HA in DASH behavior model and how each components works with each other.
  • The life of the packet for inline flow sync.
  • The SAI API call sequence of several typical HA related work flows.

The change also contains the P4 code that generates the SAI APIs. And the generated APIs can be previewed in this branch here: https://github.com/r12f/SAI/tree/user/r12f/ha-api/experimental.

#ifndef _DASH_STAGE_HA_P4_
#define _DASH_STAGE_HA_P4_

control ha_stage(inout headers_t hdr,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@r12f, as discussed can we add a capability attribute for DPU scope versus ENI scope HA.

Copy link
Collaborator Author

@r12f r12f Mar 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Mukesh, sure thing! And as we chatted, it will be better to add it in another PR.

The DPU HA capability will make more sense to be added along with the DPU API or a dedicated one, if needed. Putting the DPU HA capability into ENI HA API will be really weird...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, let me get some capacities in there, we can probably model this in a generic way.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @mukeshmv , I have added the capabilities in the PR :D. Feel free to help take a look!

typedef enum _sai_ha_set_event_t
{
/** Any HA set state is changed, such as data plane channel goes down. */
SAI_HA_SET_STATE_CHANGED,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This enum should have the possible states to remove the need to read them from the attribute.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, yea. i am mostly following what other SAI events is doing, but please correct me if my understanding is not right.

yes, the event enum doesn't contain the detailed state, but we don't need to read them from HA set attribute using query. the reason is the changed attributes will be reported as attr_count and attr in the sai_ha_set_event_data, hence we can directly fetch them there.

I found this approach being a repetitive pattern in existing SAI APIs, so i am mostly following this pattern. It does simplify the event data definition and won't cause state explosion on the enum value.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Notification data doesn't carry attributes but the type that attribute has. See https://github.com/opencomputeproject/SAI/blob/master/inc/saiport.h#L90C5-L90C28 for reference.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea, i know ACL and Port notification is doing it, but it might not be extendable, because we need to change the API whenever a new field needs to be added.

on the otherhand, FDB, packet event are using attributes directly, which seems to be a more flexiable approach, but i guess the thing I am missing here is the bar, FDB needs to report all its attributes, hence designed like that.

so no problem at all, let me get the event updated.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FDB attribute is used to indicate a bridge ID and an entry type - https://github.com/sonic-net/sonic-swss/blob/master/orchagent/fdborch.cpp#L1054, not the state. State is passed as part of the notification data separately. Why would it be harder to extend if both are using the same type?

_In_ const sai_ha_set_event_data_t *ha_set_event_data);
```

#### 4.6.2. HA scope event notifications
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems redundant, HA scope does not define liveliness API for this event, it's in the HA_SET.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, the HA scope state change event is used for reporting the HA state changes. For example, whenever we call SAI API to update the HA role to "Active". It might take some time for the implementation to process the request and really switch to active. Then whenever it is completed, we can send a HA scope event with the HA role set to "Active" in its event data.

This also helps the case where DPU decides to drive the HA state machine without using NPU (HLD here). Whenever the HA role is changed by DPU, it will be reported using this event notification as well.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting HA scope role is synchronous. the set_attribute call should not return until it is completed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea, it could be synchronous for the first case. But for the second case, this notification will be required, otherwise we won't be able to know the state being changed.

typedef enum _sai_ha_set_event_t
{
/** Any HA set state is changed, such as data plane channel goes down. */
SAI_HA_SET_STATE_CHANGED,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Notification data doesn't carry attributes but the type that attribute has. See https://github.com/opencomputeproject/SAI/blob/master/inc/saiport.h#L90C5-L90C28 for reference.

_In_ const sai_ha_set_event_data_t *ha_set_event_data);
```

#### 4.6.2. HA scope event notifications
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting HA scope role is synchronous. the set_attribute call should not return until it is completed.

@r12f
Copy link
Collaborator Author

r12f commented Apr 4, 2024

Woot! Thanks @mukeshmv and @marian-pritsak for reviewing! I am merging it now.

@r12f r12f merged commit 31b94c8 into sonic-net:main Apr 4, 2024
12 checks passed
@r12f r12f deleted the user/r12f/ha-session-api branch April 4, 2024 20:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants