Support zos4 deployments #3462

Open
AhmedHanafy725 opened this issue Sep 25, 2024 · 30 comments

@AhmedHanafy725
Contributor

AhmedHanafy725 commented Sep 25, 2024

Which package/s are you suggesting this feature for?

Dashboard, grid_client

Is your feature request related to a problem? Please describe

Nodes now run two different zos versions (zos3 and zos4), so the dashboard and grid client should identify each node's zos version and use the corresponding workloads when deploying to it.

zos4 currently doesn't support WireGuard, Yggdrasil, or public IPs, so supporting it requires these filters on the grid proxy: threefoldtech/tfgrid-sdk-go#1209

Describe the solution you'd like

  • Two new workloads should be added.

  • Before starting a deployment, the grid client should check the node's version and decide which workloads to use to deploy the VM and the network.

  • Adjust the automatic and manual filters to use the newly added grid proxy queries so that only valid nodes are suggested for deployment.
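As a rough sketch of the version check, assuming the grid proxy reports each node's supported features as strings and that a zos4 node advertises `zmachine-light` (the feature name that comes up later in this thread), the client could choose workload types as below. The `NodeInfo` shape, the `zos3`/`zos4` labels, and the helper names are illustrative assumptions, not the grid client's actual API.

```typescript
// Hypothetical shape of a node record returned by the grid proxy (assumption).
interface NodeInfo {
  nodeId: number;
  features: string[];
}

type ZosVersion = "zos3" | "zos4";

interface WorkloadTypes {
  vm: string;
  network: string;
}

// Assumption: zos4 nodes advertise "zmachine-light"; zos3 nodes advertise "zmachine".
function detectZosVersion(node: NodeInfo): ZosVersion {
  if (node.features.includes("zmachine-light")) return "zos4";
  if (node.features.includes("zmachine")) return "zos3";
  throw new Error(`Node ${node.nodeId} reports no supported VM workload`);
}

// Pick the workload types used for the VM and its network on this node.
function workloadTypesFor(node: NodeInfo): WorkloadTypes {
  return detectZosVersion(node) === "zos4"
    ? { vm: "zmachine-light", network: "network-light" }
    : { vm: "zmachine", network: "network" };
}

const zos4Node: NodeInfo = { nodeId: 255, features: ["zmachine-light", "network-light", "mycelium"] };
console.log(workloadTypesFor(zos4Node)); // { vm: 'zmachine-light', network: 'network-light' }
```

Deciding once per node keeps the rest of the deployment code free of version checks.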

@ramezsaeed added this to the 2.6.0 milestone Sep 29, 2024
@ramezsaeed moved this to Blocked in 3.15.x Sep 29, 2024
@ramezsaeed moved this from Blocked to Accepted in 3.15.x Sep 29, 2024
@ramezsaeed removed the status in 3.15.x Sep 29, 2024
@0oM4R
Contributor

0oM4R commented Sep 29, 2024

  • We just need to add the new network filters to the grid client; for manual selection, we need to check whether the node supports the selected options.
  • We need to handle listing as well: how should zos4 workloads be listed?

Start with the workloads until the filters are ready on the grid proxy.

@maayarosama
Contributor

  1. Work Completed:
  • Added workloads for zmachine light and network light
  • Added unit tests for both
  2. Work in Progress (WIP):
  • Trying to figure out why I can't import the newly added workloads even after building
  • Adding new filters in grid_client
  • Checking the node features when deploying

@maayarosama moved this from Accepted to In Progress in 3.15.x Oct 3, 2024
@xmonader
Contributor

xmonader commented Oct 8, 2024

Should we move it to in progress?

@maayarosama moved this to In Progress in 3.15.x Oct 8, 2024
@maayarosama
Contributor

Work in Progress (WIP):

Added a helper method to match filters with features; I'll be adding more network filters so the method covers all cases.

Added a Vmlight primitive class.
Currently working on the network light primitive class.
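A helper that maps the user's selected network options to the node features they require might look like the following. The option names (`mycelium`, `planetary`, `wireguard`, `publicIp`) and the feature strings they map to are assumptions for illustration, not the actual `FiltersOptions` fields or grid proxy feature names.

```typescript
// Hypothetical subset of the network options a user can select (assumption).
interface NetworkOptions {
  mycelium?: boolean;
  planetary?: boolean;
  wireguard?: boolean;
  publicIp?: boolean;
}

// Assumed mapping from each option to the node feature it requires.
const OPTION_FEATURES: Record<keyof NetworkOptions, string> = {
  mycelium: "mycelium",
  planetary: "yggdrasil",
  wireguard: "wireguard",
  publicIp: "ipv4",
};

// Collect the features a node must support for the selected options.
function getFeaturesFromFilters(options: NetworkOptions): string[] {
  const features: string[] = [];
  for (const key of Object.keys(OPTION_FEATURES) as (keyof NetworkOptions)[]) {
    if (options[key]) features.push(OPTION_FEATURES[key]);
  }
  return features;
}

console.log(getFeaturesFromFilters({ mycelium: true, wireguard: true })); // [ 'mycelium', 'wireguard' ]
```

Keeping the mapping in one table means a new filter only needs one new entry.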

@maayarosama
Contributor

Added a network light class to the primitives; it still needs some adjustments.

@maayarosama
Contributor

The network light class is ready. While trying to deploy zos4 machines, I get several different errors.

@AhmedHanafy725 removed this from 3.15.x Nov 3, 2024
@AhmedHanafy725 modified the milestones: 2.6.0, 2.7.0 Nov 3, 2024
@maayarosama
Contributor

Deployed a machine with the new workloads and worked on the grid client method that extracts features from filters.
Currently adding filters such as mycelium, wireguard, and planetary to FiltersOptions in the dashboard.

@maayarosama
Contributor

Work Completed:

  • Added mycelium and planetary to FiltersOptions; still struggling with wireguard
  • Added features to FiltersOptions
  • Updated automated and manual selection so that feature validation is taken into consideration

Work in Progress (WIP):

  • Working on adding wireguard to FiltersOptions
  • Updating the deploy method to support zos4 workloads

@maayarosama
Contributor

Work Completed:

  • Added wireguard to FiltersOptions
  • Updated the GetFeaturesFromFilters method
  • Updated the Networklight primitive class to include several required methods
  • Updated _createDeployment and create methods to support zos4 deployments

Work in Progress (WIP):

  • Trying to deploy both a zos4 and a zos3 VM, I get the following error, although my network is stable:
       Maximum number of attempts exceeded
       Please check your internet connection and try again. If the problem persists, please contact our support.

@maayarosama
Contributor

Work Completed:
I deployed a zos3 machine successfully.
Work in Progress (WIP):

Trying to deploy a zos4 machine, but I get an error: the node ID isn't set on the network; it's undefined.

@maayarosama
Contributor

Work Completed:

  • Adjusted the load method to get the contracts and save their node IDs to a nodeIds array
  • Adjusted the addNode method to check whether the node exists
  • Adjusted the nodeExists method to check whether the node ID is included in the nodeIds array
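The three adjustments above amount to keeping an array of the node IDs the network already spans and consulting it before adding a node. A stripped-down sketch, assuming each contract exposes the node it lives on; `NetworkLightSketch` and `ContractInfo` are hypothetical names, not the grid client's classes.

```typescript
// Hypothetical minimal contract record (assumption).
interface ContractInfo {
  contractId: number;
  nodeId: number;
}

class NetworkLightSketch {
  nodeIds: number[] = [];

  // Record the node of every contract that belongs to this network (no duplicates).
  load(contracts: ContractInfo[]): void {
    for (const c of contracts) {
      if (!this.nodeExists(c.nodeId)) this.nodeIds.push(c.nodeId);
    }
  }

  nodeExists(nodeId: number): boolean {
    return this.nodeIds.includes(nodeId);
  }

  // Returns false when the node is already part of the network, so no new workload is needed.
  addNode(nodeId: number): boolean {
    if (this.nodeExists(nodeId)) return false;
    this.nodeIds.push(nodeId);
    return true;
  }
}

const net = new NetworkLightSketch();
net.load([{ contractId: 1, nodeId: 255 }, { contractId: 2, nodeId: 255 }]);
console.log(net.addNode(255)); // false
console.log(net.addNode(271)); // true
console.log(net.nodeIds); // [ 255, 271 ]
```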

Work in Progress (WIP):

  • When calling the getFreeSubnet method in the networklight primitive, I get the following error because the network instance isn't set. I'm currently tracing through the network primitives to figure out when the network instance should be set:
    TypeError: Cannot read properties of undefined (reading 'node_id')

@maayarosama
Contributor

maayarosama commented Nov 10, 2024

Work in Progress (WIP):

  • Am I correct in assuming that every time I try to deploy a zos4 machine, the ip value will always be undefined, so userIPsubnet will always be undefined?

    let userIPsubnet;
    let accessNodeSubnet;
    // ip is the machine's own IP; we validate the subnet in both network and network light
    if (ip) {
      userIPsubnet = network.ValidateFreeSubnet(Addr(ip).mask(24).toString());
    }

    If so, will the subnet passed to the addNode method always be undefined?

    const znet_workload = await network.addNode(nodeId, mycelium, description, userIPsubnet, myceliumNetworkSeeds);

  • I get the following error while trying to deploy a zos4 instance:

Failed to send request to twinId 9458 with command: zos.deployment.deploy, payload: 
{
   "version":0,
   "twin_id":79,
   "contract_id":169871,
   "expiration":0,
   "metadata":"{\"version\":4,\"type\":\"network-light\",\"name\":\"hello9test\",\"projectName\":\"vm/new9MY\"}",
   "description":"test deploying single VM with mycelium via ts grid3 client",
   "workloads":[
      {
         "version":0,
         "name":"hello9test",
         "type":"network",
         "data":{
            "subnet":"10.249.2.0/24",
            "mycelium":{
               "hex_key":"6ee79969926454b19b0862c8948f7e23ab186ac7005d4597862f34be29411d9f",
               "peers":[
                  
               ]
            },
            "ip_range":"10.249.0.0/24",
            "node_id":259
         },
         "metadata":"{\"version\":4}",
         "description":"test deploying single VM with mycelium via ts grid3 client"
      }
   ],
   "signature_requirement":{
      "requests":[
         {
            "twin_id":79,
            "required":false,
            "weight":1
         }
      ],
      "weight_required":1,
      "signatures":[
         {
            "twin_id":79,
            "signature":"fc4fe32a01767cd8b605583130f4457951baee1d26bbd7ffc2c3de22c259c072774e269259537d1acba6bd8c5f0bff9e41b5edea49a0df5a83dde6a75ece9d81",
            "signature_type":"sr25519"
         }
      ]
   }
}
0 invalid reservation type 'network'

@AhmedHanafy725
Contributor Author

> Am I correct in assuming that every time I try to deploy a zos4 machine, the IP value will always be undefined, so userIPsubnet will always be undefined?
> So the subnet passed to the addNode method will always be undefined?

No, the user can pick the VM IP, so the subnet can have a value.
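For context, the `Addr(ip).mask(24)` call in the question turns a user-chosen VM IP into the /24 subnet that contains it, which is then validated and used for the node. A dependency-free sketch of the same derivation for IPv4; `subnetFromIp` and `userSubnet` are hypothetical helpers, not part of the grid client.

```typescript
// Derive the subnet (e.g. "10.249.2.0/24") that contains an IPv4 address.
function subnetFromIp(ip: string, prefix = 24): string {
  const octets = ip.split(".").map(Number);
  if (octets.length !== 4 || octets.some((o) => !Number.isInteger(o) || o < 0 || o > 255)) {
    throw new Error(`Invalid IPv4 address: ${ip}`);
  }
  // Pack the octets into an unsigned 32-bit integer.
  const value = octets.reduce((acc, o) => ((acc << 8) | o) >>> 0, 0);
  // Clear the host bits; a prefix of 0 keeps nothing.
  const mask = prefix === 0 ? 0 : (0xffffffff << (32 - prefix)) >>> 0;
  const network = (value & mask) >>> 0;
  const parts = [24, 16, 8, 0].map((shift) => (network >>> shift) & 0xff);
  return `${parts.join(".")}/${prefix}`;
}

// Without a user-selected IP there is simply no user subnet to validate.
function userSubnet(ip?: string): string | undefined {
  return ip ? subnetFromIp(ip) : undefined;
}

console.log(userSubnet("10.249.2.5")); // 10.249.2.0/24
console.log(userSubnet(undefined)); // undefined
```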

> I get the following error while trying to deploy a zos4 instance:
> Failed to send request to twinId 9458 with command: zos.deployment.deploy, payload: (identical to the payload quoted above)
> 0 invalid reservation type 'network'

Wasn't node 259 having issues while you were testing the new workloads?

@maayarosama
Contributor

Work in Progress (WIP):

  • Node 259 was down, so I was blocked for a while waiting for the zos4 nodes to be rebooted. Deploying on node 270 gives me the same error as above, whereas deploying on the same node from the deployment script succeeds.
  • While testing on the dashboard, no nodes were found. After some debugging, I noticed that the requested features sometimes include both zmachine and zmachine-light. If we want to show both zos4 and zos3 nodes where applicable, I suggest using two feature arrays and making two requests: one for zos4 and one for zos3.
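The two-request suggestion could look roughly like this. `fetchNodes` stands in for whatever grid proxy query the dashboard uses and is hypothetical, as is the assumption that a node advertises only one of `zmachine` and `zmachine-light`; the point is that each request asks for one VM feature, and the results are merged and de-duplicated by node ID.

```typescript
interface NodeSummary {
  nodeId: number;
  features: string[];
}

// Hypothetical query function: returns nodes that support *all* requested features.
type FetchNodes = (features: string[]) => Promise<NodeSummary[]>;

// Query zos3 and zos4 nodes separately (assumed: no node advertises both VM features),
// then merge the results, keyed by node ID.
async function findNodesForBothVersions(
  fetchNodes: FetchNodes,
  networkFeatures: string[],
): Promise<NodeSummary[]> {
  const [zos3, zos4] = await Promise.all([
    fetchNodes(["zmachine", ...networkFeatures]),
    fetchNodes(["zmachine-light", ...networkFeatures]),
  ]);
  const byId = new Map<number, NodeSummary>();
  for (const node of [...zos3, ...zos4]) byId.set(node.nodeId, node);
  return Array.from(byId.values());
}

// In-memory stand-in for the grid proxy, for demonstration only.
const sampleNodes: NodeSummary[] = [
  { nodeId: 11, features: ["zmachine", "mycelium", "wireguard"] },
  { nodeId: 255, features: ["zmachine-light", "mycelium"] },
];
const fakeFetch: FetchNodes = async (features) =>
  sampleNodes.filter((n) => features.every((f) => n.features.includes(f)));

findNodesForBothVersions(fakeFetch, ["mycelium"]).then((nodes) =>
  console.log(nodes.map((n) => n.nodeId)), // [ 11, 255 ]
);
```

Running both queries with `Promise.all` keeps the extra request from doubling the wait time.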

@maayarosama
Copy link
Contributor

Work in Progress (WIP):

@maayarosama
Contributor

maayarosama commented Nov 14, 2024

Blocked on the "0 invalid reservation type 'network'" error.

@maayarosama
Contributor

Work Completed:

  • Fixed the invalid reservation error. The issue was in the network light and VM light primitives, where the workload type wasn't set correctly
  • Deployed successfully on node 255; the contract ID is 172018
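This diagnosis fits the payload quoted earlier, which sends a workload with `"type":"network"` (with `network-light` only in the metadata) to a zos4 node. A hedged sketch of a pre-flight check that would surface such a mismatch before the request is sent, assuming, as the feature discussion in this thread suggests, that a node's features list the workload types it accepts; `assertWorkloadTypesSupported` is a hypothetical helper.

```typescript
interface Workload {
  name: string;
  type: string;
}

// Throws if any workload type is not among the node's advertised features.
// Assumption: node features include the workload types the node accepts.
function assertWorkloadTypesSupported(nodeId: number, nodeFeatures: string[], workloads: Workload[]): void {
  const unsupported = workloads.filter((w) => !nodeFeatures.includes(w.type));
  if (unsupported.length > 0) {
    const list = unsupported.map((w) => `${w.name} (${w.type})`).join(", ");
    throw new Error(`Node ${nodeId} does not support workload type(s): ${list}`);
  }
}

const zos4Features = ["zmachine-light", "network-light", "mycelium"];
try {
  assertWorkloadTypesSupported(259, zos4Features, [{ name: "hello9test", type: "network" }]);
} catch (e) {
  console.log((e as Error).message); // Node 259 does not support workload type(s): hello9test (network)
}
```

A local check like this gives a clearer message than the node's "invalid reservation type" response.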

Work in Progress (WIP):

  • Separating contracts for network light and zmachine light
  • Documenting the new feature

Note:

  • Deploying on node 259 takes a long time and then fails; the same happens for @rawdaGastan when deploying from the Go client:
    throw new TimeoutError(`Deployment with contract_id: ${contract_id} failed to be ready after ${timeout} minutes.`);
    ^
    TimeoutError: Deployment with contract_id: 172007 failed to be ready after 10 minutes.
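The quoted error comes from a wait loop that gives up after a fixed time. The general shape of such a loop is sketched below; `waitForDeployment`, `checkReady`, and this `TimeoutError` class are hypothetical stand-ins, not the grid client's implementation, and the interval and timeout values are arbitrary.

```typescript
class TimeoutError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "TimeoutError";
  }
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Poll `checkReady` until it resolves to true or `timeoutMs` elapses.
async function waitForDeployment(
  contractId: number,
  checkReady: () => Promise<boolean>,
  timeoutMs: number,
  intervalMs = 1000,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (await checkReady()) return;
    await sleep(intervalMs);
  }
  throw new TimeoutError(`Deployment with contract_id: ${contractId} failed to be ready after ${timeoutMs} ms.`);
}

// Demo: a deployment that reports ready on the third check.
let checks = 0;
waitForDeployment(172018, async () => ++checks >= 3, 5000, 10).then(() =>
  console.log(`ready after ${checks} checks`), // ready after 3 checks
);
```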

@maayarosama
Contributor

Work Completed:

  • Separated contracts for network light and zmachine light
  • Added a single_vm_zos4.ts script
  • Documented the deployment flow

@maayarosama
Contributor

Investigation and Solution:

I had been deleting zos4 deployments with the delete_all_contracts.ts script. Testing instead according to the test scenarios that ramez added, this is what happens:

  • When deleting from the deployment list in the dashboard the network-light contract isn't deleted
  • When canceling the deployment from the grid_client scripts, no contracts are deleted; they remain visible in my contracts list

@maayarosama
Contributor

Work in Progress (WIP):

  • I ran into another error while deploying a new VM with the same IP range as an already deployed one:
    return this.network.subnet;
    ^
    TypeError: Cannot read properties of undefined (reading 'subnet')
  • The grid client now deletes the VM contract too, but not the network contract

@maayarosama
Contributor

Work in Progress (WIP):

  • I tried to deploy on node 271, but it failed. After investigating, I found that the getFreeIp method fails because some values are undefined
  • Added some methods to the network light primitive so they can be used for deleting contracts
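For reference, a free-IP lookup only has to walk the host addresses of the node's subnet and return the first one that isn't reserved. A minimal IPv4 sketch, assuming a /24 subnet, assuming .0 and .1 are never handed to VMs and .255 is excluded, and guarding explicitly against the undefined inputs described above; `getFreeIpSketch` is hypothetical.

```typescript
// Return the first unreserved host IP in a /24 subnet such as "10.249.2.0/24".
// Assumptions: .0 and .1 are not assignable to VMs, and .255 is excluded.
function getFreeIpSketch(subnet: string | undefined, reserved: string[] | undefined): string {
  if (!subnet) throw new Error("Subnet is undefined; was the network loaded for this node?");
  const [base, prefix] = subnet.split("/");
  if (prefix !== "24") throw new Error(`Only /24 subnets are handled in this sketch, got /${prefix}`);
  const networkPrefix = base.split(".").slice(0, 3).join(".");
  const taken = new Set(reserved ?? []);
  for (let host = 2; host < 255; host++) {
    const candidate = `${networkPrefix}.${host}`;
    if (!taken.has(candidate)) return candidate;
  }
  throw new Error(`No free IPs left in ${subnet}`);
}

console.log(getFreeIpSketch("10.249.2.0/24", ["10.249.2.2", "10.249.2.3"])); // 10.249.2.4
```

Failing fast with a descriptive message is easier to debug than a later "reading properties of undefined" error.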

@maayarosama
Contributor

Work Completed:

  • The issues resulted from my last commit; I reverted it and everything works fine
  • Deployed on 271 and 255 successfully

Work in Progress (WIP):

  • Deleting the networkLight contract with the VM contract

@maayarosama
Contributor

Work in Progress (WIP):

  • Adjusting the _deleteMachineNetwork function so it can also delete networklight workloads
  • Fixed the load function in the networklight primitive
  • Added deleteReservedIp and getNodeReservedIps functions to the networklight primitive
  • Working on the deleteNode function in the network primitive, which retrieves the contract ID so the contract can be deleted
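The deletion flow being worked on can be summarized as: cancel the VM's contract, release its reserved IP, and cancel the network-light contract on that node only once no VM there still uses it. A sketch of that decision with hypothetical record shapes and made-up contract IDs; the "last VM on the node" rule is an assumption, and this is not the grid client's `_deleteMachineNetwork` implementation.

```typescript
interface VmRecord {
  name: string;
  nodeId: number;
  ip: string;
  contractId: number;
}

// Hypothetical view of a network: its contract per node and the IPs reserved on each node.
interface NetworkState {
  contracts: Map<number, number>; // nodeId -> network contract ID
  reservedIps: Map<number, string[]>; // nodeId -> reserved VM IPs
}

// Returns the contract IDs to cancel when deleting `vm`, updating reserved IPs in place.
function contractsToCancel(vm: VmRecord, network: NetworkState): number[] {
  const toCancel = [vm.contractId];
  const remaining = (network.reservedIps.get(vm.nodeId) ?? []).filter((ip) => ip !== vm.ip);
  network.reservedIps.set(vm.nodeId, remaining);
  const networkContract = network.contracts.get(vm.nodeId);
  // Assumption: the network contract on a node is removed only once no VM there uses it.
  if (remaining.length === 0 && networkContract !== undefined) {
    toCancel.push(networkContract);
    network.contracts.delete(vm.nodeId);
  }
  return toCancel;
}

const state: NetworkState = {
  contracts: new Map([[271, 500]]),
  reservedIps: new Map([[271, ["10.249.2.2", "10.249.2.3"]]]),
};
console.log(contractsToCancel({ name: "vm1", nodeId: 271, ip: "10.249.2.2", contractId: 501 }, state)); // [ 501 ]
console.log(contractsToCancel({ name: "vm2", nodeId: 271, ip: "10.249.2.3", contractId: 502 }, state)); // [ 502, 500 ]
```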

@maayarosama
Contributor

Work in Progress (WIP):

Made some changes to the networklight primitive and some further adjustments to the _deleteMachineNetwork function.
Now blocked because TFChain is down.

@maayarosama
Contributor

Work Completed:

  • Handled the undefined public_ip error that was intermittently shown while deleting from the deployment list
  • Adjusted the manual selection to show which features are missing
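Showing which features are missing reduces to a set difference between the features the user's selection requires and those the node reports. A small sketch; the feature strings and message wording are illustrative, and `missingFeatures`/`describeNode` are hypothetical helpers.

```typescript
// Features the selection requires but the node does not report.
function missingFeatures(required: string[], nodeFeatures: string[]): string[] {
  const available = new Set(nodeFeatures);
  return required.filter((f) => !available.has(f));
}

// A manual-selection hint built from the difference (wording is illustrative).
function describeNode(nodeId: number, required: string[], nodeFeatures: string[]): string {
  const missing = missingFeatures(required, nodeFeatures);
  return missing.length === 0
    ? `Node ${nodeId} supports all selected options`
    : `Node ${nodeId} is missing: ${missing.join(", ")}`;
}

console.log(describeNode(255, ["zmachine-light", "mycelium", "wireguard"], ["zmachine-light", "mycelium"]));
// Node 255 is missing: wireguard
```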

Work in Progress (WIP):

  • Finding a way to only show mycelium as an option when managing domains. While debugging, I noticed that loadVms gets the contracts from GraphQL, so I need to check whether GraphQL returns the workload type
  • Adjusting the features filter in all needed scripts

@maayarosama
Contributor

Investigation and Solution:

Still debugging why reserved IPs aren't saved, even though everything else, from nodes to network deployments, is saved and retrieved correctly.

@maayarosama
Contributor

maayarosama commented Dec 8, 2024

Work Completed:

  • Adjusted the features filter in all relevant scripts

@maayarosama
Contributor

Work Completed:

  • Adjusted the Supported interface in manage domains for zmachine-light
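The manage-domains adjustment amounts to narrowing the offered interfaces by workload type. A hedged sketch, assuming (per the earlier comment about showing only mycelium) that zmachine-light VMs should only offer mycelium; the zos3 interface list and all names here are hypothetical, not taken from the dashboard code.

```typescript
type DomainInterface = "mycelium" | "planetary" | "wireguard" | "publicIp";

// Hypothetical: interfaces a domain can point to, per VM workload type.
function supportedDomainInterfaces(workloadType: string): DomainInterface[] {
  if (workloadType === "zmachine-light") {
    // zos4 light VMs: only mycelium, as discussed in this thread.
    return ["mycelium"];
  }
  return ["mycelium", "planetary", "wireguard", "publicIp"];
}

console.log(supportedDomainInterfaces("zmachine-light")); // [ 'mycelium' ]
```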

@maayarosama
Contributor

Work Completed:

  • Adjusted the zos4 script
  • Merged the delete logic for zmachine and zmachine light into one condition to remove redundant code

Work in Progress (WIP):

  • Adding a features enum
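A features enum might look like the following. The member names and string values are assumptions based on the features mentioned in this thread (zmachine, zmachine-light, network-light, mycelium, wireguard, yggdrasil); the grid proxy's exact strings should be checked before relying on them.

```typescript
// Hypothetical string enum of node features referenced in this thread.
enum Features {
  zmachine = "zmachine",
  zmachinelight = "zmachine-light",
  network = "network",
  networklight = "network-light",
  mycelium = "mycelium",
  wireguard = "wireguard",
  yggdrasil = "yggdrasil",
}

// A string enum makes feature filters typo-proof at compile time.
const zos4VmFeatures: Features[] = [Features.zmachinelight, Features.networklight, Features.mycelium];

// Narrow arbitrary strings (e.g. from a grid proxy response) to known features.
function toFeatures(raw: string[]): Features[] {
  const known = new Set<string>(Object.values(Features));
  return raw.filter((f): f is Features => known.has(f));
}

console.log(zos4VmFeatures.join(",")); // zmachine-light,network-light,mycelium
console.log(toFeatures(["mycelium", "unknown"])); // [ 'mycelium' ]
```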

@maayarosama
Contributor

Work Completed:

  • Adjusted features everywhere to use the newly created enum
  • Refactored the VM primitive file to have a base abstract class that the other two classes extend, overriding its methods
  • Debugged why node 249 was valid in manual selection but not in automated selection: manual selection doesn't validate CRU, MRU, and SRU, which is unrelated to this issue
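The refactor described above follows the usual template-method shape: an abstract base class holds the shared steps, and each concrete class supplies what differs. A sketch with hypothetical class, method, and field names (`BaseVmPrimitive`, `workloadType`, `buildData`, and the data fields), not the actual grid client code.

```typescript
interface WorkloadSpec {
  name: string;
  type: string;
  data: Record<string, unknown>;
}

abstract class BaseVmPrimitive {
  // Each subclass declares which workload type it produces.
  protected abstract readonly workloadType: string;

  // Subclasses provide the version-specific workload data.
  protected abstract buildData(name: string): Record<string, unknown>;

  // Shared logic lives once in the base class.
  create(name: string): WorkloadSpec {
    return { name, type: this.workloadType, data: this.buildData(name) };
  }
}

class VmPrimitive extends BaseVmPrimitive {
  protected readonly workloadType = "zmachine";
  protected buildData(name: string): Record<string, unknown> {
    return { hostname: name, planetary: true };
  }
}

class VmLightPrimitive extends BaseVmPrimitive {
  protected readonly workloadType = "zmachine-light";
  protected buildData(name: string): Record<string, unknown> {
    return { hostname: name, mycelium: true };
  }
}

for (const p of [new VmPrimitive(), new VmLightPrimitive()]) {
  console.log(p.create("vm1").type);
}
// zmachine
// zmachine-light
```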


5 participants