-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Drop-0.5] Complete implementation for slowpath and CI based on the new architecture of opi-evpn-br #377
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #377 +/- ##
===========================================
- Coverage 58.92% 24.93% -33.99%
===========================================
Files 30 44 +14
Lines 1816 5575 +3759
===========================================
+ Hits 1070 1390 +320
- Misses 650 4022 +3372
- Partials 96 163 +67 ☔ View full report in Codecov by Sentry. |
af8ad2b
to
14be5a5
Compare
617c746
to
6f9fdcd
Compare
Can we split this patch to manageable chunks? It is hard to review +6,528 −2,984 |
Hello @artek-koltun, Unfortunately we cannot split this PR to multiple patches. It has been agreed on the TSC that because this is a rearchitecture effort of the opi-evpn-br the whole code will be developed in a fork and will be introduced as a single PR for review when significant milestone is reached. The milestone that we have reached is that we have a fully working CI env and support for IPU slowpth configuration |
a739620
to
8fcb968
Compare
Co-authored-by: Vemula Venkatesh <[email protected]> Co-authored-by: Patel Atul <[email protected]> Co-authored-by: Saikumar Banoth <[email protected]> Co-authored-by: Jambekar Vishakha <[email protected]> Signed-off-by: Dimitrios Markou <[email protected]>
Signed-off-by: Dimitrios Markou <[email protected]>
Hello @sandersms @artek-koltun Can you please give us some reviews on this PR ? Thank you in advance |
I completely agree with @artek-koltun |
Hello @glimchb, In the TSC call in 14th of Dec 2023 has been agreed that because this is a rearchitecture effort we will start with a big PR when a significant Milestone (this PR here) has been reached and when this is merged more smaller PRs will follow with additional code changes. TSC has agreed to that plan so we have followed this plan. So unfortunately I cannot break this PR to smaller ones. Here you can find the recording with the plan explanation (00:36:20) and the decision (00:43:50) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some comments
Makefile
Outdated
@@ -13,6 +13,10 @@ build: | |||
@echo " > Building binaries..." | |||
@CGO_ENABLED=0 go build -o ${PROJECTNAME} ./cmd/... | |||
|
|||
build-arm: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's unify with godpu
opiproject/godpu@6e3f317
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@atulpatel261194 will take care of this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
ipu_linux.Init() | ||
frr.Init() | ||
|
||
case "ci": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you describe in readme what those modes used for?
It would be nice to describe there also the main components, how they interact with each other etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @artek-koltun I will take a look on that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @artek-koltun I will add a readme in the docs folder the where I will describe the architecture and also how the different modes are used. I will also add some flow digramms there. I hope that will do the work. The reason that I add that in the docs folder and not in the main Readme is because I do not want to pollute that with too much information.
Hope that is alright
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @artek-koltun I have added the markdowns with the architecture and flow chart examples in the docs folder. You can take a look if you want.
cmd/main.go
Outdated
var redisAddress string | ||
flag.StringVar(&redisAddress, "redis_addr", "127.0.0.1:6379", "Redis address in ip_address:port format") | ||
switch config.GlobalConfig.Buildenv { | ||
case "ipu": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IPU is only Intel specific term. If I am not mistaken, it was agreed to use more widespread dpu
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code is generic enough and can be used for different IPU/dpus that you have underneath and as well for the CI env where you have no IPU/DPU. Here the case : ipu is when you have an IPU smartnic underneath. Even though that the code is generic some modules need to be provided by the vendor to address the specific smartnic that you have underneath so with this case: ipu we capture the case and we load some of the specific modules that are realted to the IPU card. If for instance you had a dpu bluefield as a card the vendor will need to provide some specific modules for that card and then we will create a case dpu bluefield for that specific card where the ppropriate modules will be loaded
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand. I mean, let's change the name and use more widespread terminology, like "dpu"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not think this is clear. According to the card that you have underneath you need to load certain modules. If you have an intel ipu for example you need to load the ipu_linux module. If you have a Marvell card you need to load the Marvel_linux. If you have a Bluefield you need to load a bluefield_linux module. So this is what this case does it loads the correct module related to the card that you have. So for intel IPU we have an ipu_linux module that we need to load. We cannot just change that to dpu because dpu can be Marvell or Bluefield. BTW by ipu here we are talking only for an Intel IPU. So the IPU case is pretty unique but the dpu case it isn't because you can have a Marvell dpu or a Bluefield dpu.
Is that more clear now ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've heard that the term ipu is also used to other dpus/xpus from different vendors within Intel.
Then let's be more specific and name it ipu_mev
/mev
since the description says Mount Evans or ipu_intel
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes ok let me think about this a bit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @artek-koltun what do you think about intel_e2000? E2000 is the part name for a mount evans. So this means that all the vendor related files in the code will have the same name.
If you agree with this change we can start do the changes on the naming.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
intel_e2000
sounds good
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @atulpatel261194 Can you please update where is needed the name ipu with the name intel_e2000 ? Not only updating the config but also the golang filenames.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
cmd/main.go
Outdated
options.Codec = utils.ProtoCodec{} | ||
store, err := redis.NewClient(options) | ||
rootCmd.PersistentFlags().StringVarP(&config.GlobalConfig.CfgFile, "config", "c", "config.yaml", "config file path") | ||
rootCmd.PersistentFlags().IntVar(&config.GlobalConfig.GRPCPort, "grpcport", 50151, "The gRPC server port") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uint16Var
for this port and below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@atulpatel261194 will take care of this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
cmd/main.go
Outdated
rootCmd.PersistentFlags().IntVar(&config.GlobalConfig.HTTPPort, "httpport", 8082, "The HTTP server port") | ||
rootCmd.PersistentFlags().StringVar(&config.GlobalConfig.TLSFiles, "tlsfiles", "", "TLS files in server_cert:server_key:ca_cert format.") | ||
rootCmd.PersistentFlags().StringVar(&config.GlobalConfig.DBAddress, "dbaddress", "127.0.0.1:6379", "db address in ip_address:port format") | ||
rootCmd.PersistentFlags().StringVar(&config.GlobalConfig.FRRAddress, "frraddress", "127.0.0.1", "Frr address in ip_address format, no port") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IPVar
for ip addresses?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@atulpatel261194 will take care of this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After carefully considering your suggestion and evaluating our current implementation,currently sticking with using a string for IP addresses.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and what is the justification?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
currently FRRAddress isn't getting used, removed it as of now.
Dockerfile
Outdated
mkdir -p /etc/iproute2/ && \ | ||
echo "255 opi_evpn_br" > /etc/iproute2/rt_protos && \ | ||
cat /etc/iproute2/rt_protos && \ | ||
ls -al / |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are those cat
and ls
needed only for debug?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I believe this is for debug. We can remove those.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@atulpatel261194 will take care of this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
cmd/main.go
Outdated
viper.SetConfigType("yaml") | ||
viper.SetConfigName("config.yaml") | ||
} | ||
config.LoadConfig() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if LoadConfig
failed? Should it be called only when a config file found? Should we return an error and print a message here if a config is malformed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@atulpatel261194 will take care of this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added error log.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it enough? Should we return an error to the top level an exit, since if a config is malformed, the user might get the unexpected results?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added error handling before unmarshal and also added panic logs
cmd/main.go
Outdated
go runGatewayServer(grpcPort, httpPort) | ||
runGrpcServer(grpcPort, tlsFiles, frrAddress, store) | ||
// validateConfigs validates the config parameters | ||
func validateConfigs() error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we have a dedicated configs
pkg, should we have config related things there?
also I see that we have only one rootCmd. Do we need a cobra then? To read config file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@atulpatel261194 will take care of this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
config validation is still in main...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i believe that is pre run for the cmd line argumnent which are necessary for setup bring up.
validating config file content is not intended as configs may differ for different vendors.
and if any error occurs we have Fatal log for read and load of the config file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we have that validateConfigs
in main, how the vendors can provide their customization?
Are we going the core part? Would it make sense to move only that part into configs pkg?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as suggested, moved validateConfigs to config pkg.
cmd/main.go
Outdated
flag.IntVar(&httpPort, "http_port", 8082, "The HTTP server port") | ||
err := infradb.NewInfraDB(config.GlobalConfig.DBAddress, config.GlobalConfig.Database) | ||
if err != nil { | ||
log.Println("error in creating db", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we proceed running here? Can it work without DB?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@atulpatel261194 will take care of this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the same as for validateConfigs
probably a new patch has not been uploaded yet
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added panic log if db creation fails.
cmd/main.go
Outdated
flag.StringVar(&frrAddress, "frr_addr", "127.0.0.1", "Frr address in ip_address format, no port") | ||
// Create GRD VRF configuration during startup | ||
if err := createGrdVrf(); err != nil { | ||
log.Printf("Error in creating GRD VRF %+v\n", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can it function if grd vrf is not created?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No it cannot function because the GRD represents the underlay configuration which will be needed in the next drop [drop-1.0] where we will provide the fastpath modules which are responsible for offloading the slowpath configuration to the fastpath. So in order to configure correctly the fastpath for the underlay the GRD vrf is needed to be there so the faspath modules can find the correct inofrmation and offload the rules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so let's return an error from this function. In the current state, it will continue executing, which is not coorect
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes ok we wil take care of this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added the error log and exit call if grd vrf is not created
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Signed-off-by: Dimitrios Markou <[email protected]>
Makefile
Outdated
@@ -13,6 +13,10 @@ build: | |||
@echo " > Building binaries..." | |||
@CGO_ENABLED=0 go build -o ${PROJECTNAME} ./cmd/... | |||
|
|||
build-arm: | |||
@echo " > Building binaries..." | |||
@CGO_ENABLED=0 env GOOS=linux GOARCH=arm64 go build -o ${PROJECTNAME} ./cmd/... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As @artek-koltun indicated to unify with godpu and you can also have GOOS=$(uname) to default to the local OS. If a different OS is needed such as cross-compile, then
GOOS ?= $(shell go env GOOS) # detect the os target from the environment variables
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@atulpatel261194 will take care of this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
cmd/main.go
Outdated
viper.SetConfigType("yaml") | ||
viper.SetConfigName("config.yaml") | ||
} | ||
config.LoadConfig() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the LoadConfig validate the file information that it is reading in case of errors in the information? Should that be part of ValidateConfigs operation below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@atulpatel261194 will reply on this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added error handling.
@@ -293,8 +293,6 @@ services: | |||
/entrypoint.sh ls localhost:50151 opi_api.network.evpn_gw.v1alpha1.BridgePortService.UpdateBridgePort -l && \ | |||
echo toremove && \ | |||
apt update && apt install iproute2 -y && \ | |||
ip link add br-tenant type bridge vlan_default_pvid 0 vlan_filtering 1 vlan_protocol 802.1Q && \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this being removed since the configuration will be done through the API? Will this have implications on the CI testing and automation that need to be handled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only reason that we remove that is because the br-tenant is created now when the opi-evpn-br starts. That means that there is no need to create the br-tenant beforehand as docker compose command. It is not created through API but is created with linux netlink command when the Linux General Module (lgm) starts. There will be no implecations on the CI testing and automation.
opi-evpn-bridge/pkg/LinuxGeneralModule/lgm.go
Line 331 in d5501d6
func setUpTenantBridge() { |
There is a ipu.go module that is under the LinuxVendorModule and the code looks to be relatively generic with the use of netlink for the controls for the network settings. Can this be considered more generic? Are there specifics that are related to the Intel IPU that I am missing here? |
Access = iota | ||
// Trunk bridge port type | ||
Trunk = iota | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be able to use the definitions for the Bridge Port Type from the VLAN protobuf that is in the review in the network area in the future to keep things common. Also that various status indicators. Eventually we can import that protobuf for the definitions that overlap.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general we use the pb types. What you see here is the internal translation that happens from a PB object to an internal infradb domain object. So the protobufs types internally are translated to an infradb domain object which is more ruch and we can capture on that internal information that the user doesn't want to see but it is crucial fro the system to work. So we have chosen to do an internal translation to the BP type from the PB types to the internal infradb object. This way we will not need to impor the Protobufs everywhere but ony in the place where the actual transation happens. This transation from pb types ton the internal domain types (infradb) and vise versa is an architecture/design choice
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may just want to track this for refactoring/optimization with the PB schemes for the future as an item.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I didn't get that fully. Can you please explain what do you mean ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the new PB types that will cover that type of operation (common protobufs), we can open an issue to track the future change that would handle this from a common perspective as well as any definitions in order to simplify the definitions and not have them spread and duplicated in various locations. Nothing to do for now until the common protobufs are reviewed/merged into the repo. Once they are merged, then we open an issue to track migration to a common definition.
The reason that this module exist is because it captures the specific configuration that needs to be handled for each card specifics. So the LinuxVendoModules do exactly that. They take care the non generic part of the card. So if you have a Bluefiled card a coresponding module is needed to capture the Bluefiled specifics. |
Signed-off-by: atulpatel261194 <[email protected]>
4a32acf
to
9ffd3be
Compare
// NewTaskQueue initializes a TaskQueue | ||
func NewTaskQueue() *TaskQueue { | ||
return &TaskQueue{ | ||
channel: make(chan *Task, 200), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why 200?
Also, I see that it is used in a very strange way...
go t.taskQueue.Enqueue(task)
If a component cannot keep up with the requests, it should block the producing threads, rather than allowing them to generate billions of goroutines, which will never be handled due to upcoming Out Of Memory
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The 200 number was a bit arbitrary in the beginning but afterwards has been tested and works without any problem for the amount of the objects that we expect to be created.
Also the number of the objects that we expect for the system to serve is not so many. In an EVPN system you do not generate so many VRFs, LBs, SVIs and BPs in parallel and in big numbers. Normally is a handfull of those. So we expect the current system to work really well in most of the cases and no OOM event to be produced. We have also ran some basic tests by creating 100 VRFs one after the other immediately and the system behaved correctly without any problem. The 100 VRFs are allready a lot. We do not expect to have in a real system more than 100. So I believe we are safe for now. If we face any problem in the future we can revisit this topic I believe. We have plans to make it multi-queue and parallel processing of the queues if we find any bottlenecks and is necessary. Of course if you have suggestions we a happy to examine them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still do not understand why we cannot block if there is no space in the channel
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's assume that there is no space in the channel and the buffer is full with all the 200 slots of the queue occupied which is highly unlikely but let's assume that. If there is more requests comming that means that we will have some outstanding go routines that just wait to find space in the buffer to enqueue their tasks. So even if that is the case the number of those go routines will not be too many. So what exactly you mean to block ? What exactly should we block ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if no space, we block and wait while we have a free space in the channel -> a client cannot provide more requests and flood us with too many of them
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@artek-koltun Yes ok that makes sense. We can take a look on that but I was thinking as this is not a major problem right now we can probably open an issue on that and take a look in a later time so we can move a bit faster. WDYT ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@artek-koltun I will create an issue after merging this PR as without having this code in the tree the issue doesn't have any value
// translation of pb to domain object | ||
domainLB := infradb.NewLogicalBridge(lb) | ||
// Note: The status of the object will be generated in infraDB operation not here | ||
if err := infradb.CreateLB(domainLB); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I am not mistaken, CreateLB
will send a msg to task manager, which will find someone to really create a bridge. But why do we return here ok and subsequently from the server's CreateLogicalBridge
. Should we wait when it is really created? What if creation fails?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here by createLB you create a LogicalBridge intent in the DB. The actual intent is getting realized asychronously by the different components that put the configuration in place.
It is the same paradigm as in K8s. In K8s you create an intent in the database and then that intent get's realized by the different controllers. It is the same thing here.
If the intent is created in the DB then the call is succesful. We should not wait until the intent get's realized. If an error is occured during the realization of the intent then that is reported asychronously in the status of the object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If an error is occured during the realization of the intent then that is reported asychronously in the status of the object
How will a client know that? By trying to get an object until it is returned successfully?
Also, our API does not say anything about intents. If a bridge returns ok, it implies that an object was created and functional.
If we introduce a concept of intents, it should be clearly stated in API. But I am not sure if it is a good idea to add intents 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not think we should state something in the API. An API is an API how do we choose to implement the backend that supports that API is implementation specific to my view. In our case when the bridge for instane returns ok it means that the object has been created in the DB and the realization of the intent is ongoing. We do have Pending states in the API today that demostrates that functionality
https://github.com/opiproject/opi-api/blob/main/network/evpn-gw/component.proto#L37
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so we have API update to indicate that pending state, ok
|
||
subscribers := eventbus.EBus.GetSubscribers("logical-bridge") | ||
if subscribers == nil { | ||
log.Println("CreateLB(): No subscribers for Logical Bridge objects") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
error since no subscribers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes Maybe we should return an error if we do not have any subscribers. I will discuss this internally
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes @atulpatel261194 will take care of this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
pkg/infradb/infradb.go
Outdated
vpns := make(map[uint32]bool) | ||
|
||
subscribers := eventbus.EBus.GetSubscribers("logical-bridge") | ||
if subscribers == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if GetSubscribers
returns not nil, but an empty array? Will it be better to use len(subscribers) == 0
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Inbanoth @atulpatel261194 can you please reply on this ? Is the nil check enough or we should use the length ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
len
works withnil
slices and with empty slices- comparison with
nil
works only with nil slices.
I think option 1 is just safer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@artek-koltun what you said is right.
But here, we always expect slice to be non-empty unless there is no key (ex: "logical-bridge") exist. and if there is no key exist then it return the zero value for the slice type, which is nil for slices. In such a case, both the nil check and the length check will return true.
So we can use either of the comparisons, also i verified both works for us.
i think we can keep nil comparison as it is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if someone update the code and by accident return an empty slice, the code will work differently, and it can take time to identify the root cause. I think it is better to just use len
from the very beginning
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Inbanoth will do the update
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
lbs := make(map[string]bool) | ||
_, err = infradb.client.Get("lbs", &lbs) | ||
if err != nil { | ||
log.Println(err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need to remove fall back and remove set vpns
err = infradb.client.Set("vpns", &vpns)
if err != nil {
log.Println(err)
return err
}
lb
and other objects?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No we do not believe we should fall back in the operations with the DB. If there is a need for this we can discuss it in the future. We can live with this right now. I believe if we see any need we can revisit in the future. I would say let's not be too picky with the code right now. Let's make sure that the basic functionality is working currently and we can make more picky changes and enhancements in additional PRs.
WDYT ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then at least an issue should be created after merging this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes we can create an issue just to register it as point for future discussions if that is needed just so we do not forget.
cmd/main.go
Outdated
var redisAddress string | ||
flag.StringVar(&redisAddress, "redis_addr", "127.0.0.1:6379", "Redis address in ip_address:port format") | ||
switch config.GlobalConfig.Buildenv { | ||
case "ipu": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've heard that the term ipu is also used to other dpus/xpus from different vendors within Intel.
Then let's be more specific and name it ipu_mev
/mev
since the description says Mount Evans or ipu_intel
?
return nil, err | ||
} | ||
|
||
log.Printf("UpdateBridgePort(): Bridge Port with id %v is not found so it will be created", in.BridgePort.Name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the Update really do an auto-create if the port is not found or should it just return not found? The update may not have all the parameters needed since it typically has a update mask of parameters being changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It depends. According to the google AIP guidelines if we have the AllowMissing to true then it will create the object otherwise it will return an error. Check line 103.
Also the createBridgePort function internally validates also the received Spec so if a parameter is missing that needs to be there it will return error.
The Update functions are not yet tested. We have only the implementation there but is untested for the update case. We will test that in the future but right now is not first priority to have a fully working update. We have tested the Create, Delete, Get and List but not the update yet
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, so, if the check in line 103 doesn't execute it falls through and then calls the "createBridgePort (function in line 27) and handles the processing there based on the parameters passed in and doesn't worry about the update mask settings.
My concern is that if only the update mask parameters are provided in the update request, then somewhere the defaults need to be set for the interface being created.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually in order to allow the creation of an object using the field mask we first validate that the filed mask includes all the mandatory parameters. We do not really have any defaults. If the mandatory parameters are not set then we will return an error to the user.
But the update call is an untested call so far. I would like to test that in the future and revisit the matter. However I think that we are pretty safe with the current implementation that we have now
return nil, err | ||
} | ||
|
||
log.Printf("UpdateSvi(): Svi with id %v is not found so it will be created", in.Svi.Name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the Update really do an auto-create if the port is not found or should it just return not found? (see port/grpc.go comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Answered on that above
return nil, err | ||
} | ||
|
||
log.Printf("UpdateVrf(): Vrf with id %v is not found so it will be created", in.Vrf.Name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the Update really do an auto-create if the port is not found or should it just return not found? (see port/grpc.go comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Answered on that above
It makes sense; but, the current file doesn't seem to have any IPU specifics in the file, instead all of the calls are netlink commands. That seems to be more generic in usage unless I am missing something. |
Yes there are netlink commands but the names of the interfaces that are passed as parameters and the way that those interfaces are handled are IPU specific. For instance here the way that we handle these mux interfaces as we call them is something that is IPU specific and is not the same as DPU Bluefield for instance where they use representor interfaces. |
Signed-off-by: Atul Patel <[email protected]>
Signed-off-by: Saikumar, Banoth <[email protected]>
Signed-off-by: Saikumar, Banoth <[email protected]>
Signed-off-by: Atul Patel <[email protected]>
Signed-off-by: Dimitrios Markou <[email protected]>
Signed-off-by: Dimitrios Markou <[email protected]>
Signed-off-by: Atul Patel <[email protected]>
Signed-off-by: Saikumar, Banoth <[email protected]>
pkg/LinuxCIModule/lci.go
Outdated
log.Printf("LVM : VlanID %v value passed in Logical Bridge create is greater than 16 bit value\n", BrObj.Spec.VlanID) | ||
return false | ||
} | ||
//TODO: Update opi-api to change vlanid to uint16 in LogiclaBridge |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unsigned integers are not recommended by AIP, since not all languages have a good support for them https://linter.aip.dev/141/forbidden-types
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@atulpatel261194 Can you please change the comment there from uint16 to int16 and put also this link in the comment: https://linter.aip.dev/141/forbidden-types
We can create an issue to update also the other fields of the API to use signed integers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@artek-koltun opened the issue. opiproject/opi-api#451
config-intele2000.yaml
Outdated
@@ -0,0 +1,31 @@ | |||
grpcport: 50151 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the file name: config-intele2000.yaml -> config-intel2000.yaml
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the name of the chip is intel-e2000 no intel-2000 so this extra "e" is crucial. Maybe the name of the file can be config-intel-e2000.yaml
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
config-intel-e2000.yaml
will be fine
Makefile
Outdated
compile: get build | ||
|
||
build: | ||
@echo " > Building binaries..." | ||
@CGO_ENABLED=0 go build -o ${PROJECTNAME} ./cmd/... | ||
@CGO_ENABLED=0 GOARCH=$(GOARCH) go build -o ${PROJECTNAME} ./cmd/... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mark also proposed to add GOOS
#377 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
pkg/infradb/bridge.go
Outdated
// build time check that struct implements interface | ||
var _ EvpnObject[*pb.LogicalBridge] = (*LogicalBridge)(nil) | ||
|
||
// NewLogicalBridge creates new Logica Bridge object from protobuf message |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LogicaL
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
||
subscribers := eventbus.EBus.GetSubscribers("logical-bridge") | ||
if len(subscribers) == 0 { | ||
log.Println("NewLogicalBridge(): No subscribers for Logical Bridge objects") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cannot understand if missing subscribers is an error here... If it is an error, we need to return here. If it is not, do we need to print, since it is not an error?
Here, we treat it as an error https://github.com/opiproject/opi-evpn-bridge/pull/377/files#r1562602607
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I guess we missed that. @atulpatel261194 can you please return an error also in these cases here as we do in the other cases where we do not have subscribers ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mardim91 , @artek-koltun New functions for each object just returns the domain specific object. we handle the error in infra db if there is no subscribers while create, delete and update of perticular object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@atulpatel261194 If we handle the no sunscribers as error we should return the error also in the NewObject functions. There is no point allowing zero subscribers in the NewObjects and then retunr the error later in the CreateLB function for example. Let's handle the zero subscribers as error for now everywhere
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, fixed
Signed-off-by: Atul Patel <[email protected]>
Signed-off-by: atulpatel261194 <[email protected]>
Signed-off-by: Saikumar, Banoth <[email protected]>
Signed-off-by: atulpatel261194 <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good with the comments addressed and associated issues opened.
Introducing the complete implementation for CI and slowpath based on the new opi-evpn-br architecture that has been presented to opi-tsc in 15 Feb 2024 link
This PR includes:
This PR is versioned as drop-0.5 because it includes only the CI and slowpath implementation for an xPU.
The drop-1.0 that will come in a later PR will include also the fastpath implementation for an xPU