Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

performance monitor package #2046

Merged
merged 10 commits into from
Sep 26, 2023
12 changes: 12 additions & 0 deletions cmds/modules/noded/main.go
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,7 @@ import (
"github.com/threefoldtech/zos/pkg/environment"
"github.com/threefoldtech/zos/pkg/events"
"github.com/threefoldtech/zos/pkg/monitord"
"github.com/threefoldtech/zos/pkg/perf"
"github.com/threefoldtech/zos/pkg/registrar"
"github.com/threefoldtech/zos/pkg/stubs"
"github.com/threefoldtech/zos/pkg/utils"
Expand Down Expand Up @@ -193,6 +194,17 @@ func action(cli *cli.Context) error {
return hypervisor, nil
})

log.Info().Msg("Start Perf scheduler")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use lowercase in the message

performanceMonitor := perf.NewPerformanceMonitor("/var/run/redis.sock")
Copy link
Collaborator

@xmonader xmonader Sep 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prefer to use shorter var names e.g perfMon

performanceMonitor.InitScheduler()
err = performanceMonitor.RunScheduler(ctx)
if err != nil {
log.Error().Err(err).Msg("fails in scheduler")
}
bus.WithHandler("zos.perf.get", func(ctx context.Context, payload []byte) (interface{}, error) {
return performanceMonitor.Get(ctx, payload)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

handling of user input is part of the public API and then it should handled here not in the performanceMonitor library or object.

This breaks separation of concerns and isolation of your design.

})

// answer calls for dmi
go func() {
if err := bus.Run(ctx); err != nil {
Expand Down
1 change: 1 addition & 0 deletions go.mod
Original file line number Diff line number Diff line change
Expand Up @@ -32,6 +32,7 @@ require (
github.com/gorilla/mux v1.8.0
github.com/gtank/merlin v0.1.1
github.com/hashicorp/golang-lru v0.5.5-0.20210104140557-80c98217689d
github.com/jasonlvhit/gocron v0.0.1
github.com/jbenet/go-base58 v0.0.0-20150317085156-6237cf65f3a6
github.com/joncrlsn/dque v0.0.0-20200702023911-3e80e3146ce5
github.com/lestrrat-go/jwx v1.1.7
Expand Down
5 changes: 5 additions & 0 deletions go.sum
Original file line number Diff line number Diff line change
Expand Up @@ -161,6 +161,7 @@ github.com/go-logfmt/logfmt v0.3.0/go.mod h1:Qt1PoO58o5twSAckw1HlFXLmHsOX5/0LbT9
github.com/go-logfmt/logfmt v0.4.0/go.mod h1:3RMwSq7FuexP4Kalkev3ejPJsZTpXXBr9+V4qmtdjCk=
github.com/go-ole/go-ole v1.2.6 h1:/Fpf6oFPoeFik9ty7siob0G6Ke8QvQEuVcuChpwXzpY=
github.com/go-ole/go-ole v1.2.6/go.mod h1:pprOEPIfldk/42T2oK7lQ4v4JSDwmV0As9GaiUsvbm0=
github.com/go-redis/redis v6.15.5+incompatible/go.mod h1:NAIEuMOZ/fxfXJIrKDQDz8wamY7mA7PouImQ2Jvg6kA=
github.com/go-redis/redis v6.15.9+incompatible h1:K0pv1D7EQUjfyoMql+r/jZqCLizCGKFlFgcHWWmHQjg=
github.com/go-redis/redis v6.15.9+incompatible/go.mod h1:NAIEuMOZ/fxfXJIrKDQDz8wamY7mA7PouImQ2Jvg6kA=
github.com/go-stack/stack v1.8.0/go.mod h1:v0f6uXyyMGvRgIKkXu+yp6POWl0qKG85gN/melR3HDY=
Expand Down Expand Up @@ -259,6 +260,8 @@ github.com/holiman/uint256 v1.2.2-0.20230321075855-87b91420868c/go.mod h1:SC8Ryt
github.com/hpcloud/tail v1.0.0/go.mod h1:ab1qPbhIpdTxEkNHXyeSf5vhxWSCs/tWer42PpOxQnU=
github.com/inconshreveable/mousetrap v1.0.0/go.mod h1:PxqpIevigyE2G7u3NXJIT2ANytuPF1OarO4DADm73n8=
github.com/j-keck/arping v0.0.0-20160618110441-2cf9dc699c56/go.mod h1:ymszkNOg6tORTn+6F6j+Jc8TOr5osrynvN6ivFWZ2GA=
github.com/jasonlvhit/gocron v0.0.1 h1:qTt5qF3b3srDjeOIR4Le1LfeyvoYzJlYpqvG7tJX5YU=
github.com/jasonlvhit/gocron v0.0.1/go.mod h1:k9a3TV8VcU73XZxfVHCHWMWF9SOqgoku0/QlY2yvlA4=
github.com/jbenet/go-base58 v0.0.0-20150317085156-6237cf65f3a6 h1:4zOlv2my+vf98jT1nQt4bT/yKWUImevYPJ2H344CloE=
github.com/jbenet/go-base58 v0.0.0-20150317085156-6237cf65f3a6/go.mod h1:r/8JmuR0qjuCiEhAolkfvdZgmPiHTnJaG0UXCSeR1Zo=
github.com/jgautheron/goconst v0.0.0-20170703170152-9740945f5dcb/go.mod h1:82TxjOpWQiPmywlbIaB2ZkqJoSYJdLGPgAJDvM3PbKc=
Expand Down Expand Up @@ -377,9 +380,11 @@ github.com/nxadm/tail v1.4.8 h1:nPr65rt6Y5JFSKQO7qToXr7pePgD6Gwiw05lkbyAQTE=
github.com/nxadm/tail v1.4.8/go.mod h1:+ncqLTQzXmGhMZNUePPaPqPvBxHAIsmXswZKocGu+AU=
github.com/oklog/ulid v1.3.1/go.mod h1:CirwcVhetQ6Lv90oh/F+FBtV6XMibvdAFo93nm5qn4U=
github.com/onsi/ginkgo v1.6.0/go.mod h1:lLunBs/Ym6LB5Z9jYTR76FiuTmxDTDusOGeTQH+WWjE=
github.com/onsi/ginkgo v1.10.1/go.mod h1:lLunBs/Ym6LB5Z9jYTR76FiuTmxDTDusOGeTQH+WWjE=
github.com/onsi/ginkgo v1.12.1/go.mod h1:zj2OWP4+oCPe1qIXoGWkgMRwljMUYCdkwsT2108oapk=
github.com/onsi/ginkgo v1.16.4 h1:29JGrr5oVBm5ulCWet69zQkzWipVXIol6ygQUe/EzNc=
github.com/onsi/ginkgo v1.16.4/go.mod h1:dX+/inL/fNMqNlz0e9LfyB9TswhZpCVdJM/Z6Vvnwo0=
github.com/onsi/gomega v1.7.0/go.mod h1:ex+gbHU/CVuBBDIJjb2X0qEXbFg53c61hWP/1CpauHY=
github.com/onsi/gomega v1.7.1/go.mod h1:XdKZgCCFLUoM/7CFJVPcG8C1xQ1AJ0vpAezJrB7JYyY=
github.com/onsi/gomega v1.10.1/go.mod h1:iN09h71vgCQne3DLsj+A5owkum+a2tYe+TOCB1ybHNo=
github.com/onsi/gomega v1.10.3/go.mod h1:V9xEwhxec5O8UDM77eCW8vLymOMltsqPVYWrpDsH8xc=
Expand Down
5 changes: 5 additions & 0 deletions pkg/perf/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
# Perf

Perf is a performance test module than is scheduled to run and cache those tests results in redis which can be retrieved later over RMB call.

Perf tests are monitored by `noded` service from zos modules.
49 changes: 49 additions & 0 deletions pkg/perf/cache.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,49 @@
package perf

import (
"context"
"encoding/json"
"time"

"github.com/pkg/errors"
)

// TestResultData the result test schema
type TestResultData struct {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now after checking the rest of the files, this should be TaskResult struct only task name and the result

TestName string
TestNumber uint64
Result interface{}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fields missing json tags

}

// GetRequest the get request struct
type GetRequest struct {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this struct can be removed completely and use the task name always to get the task result from the cache

TestName string `json:"test_name"`
TestNumber uint64 `json:"test_number"`
}

// CacheResult set result in redis
func (pm *PerformanceMonitor) CacheResult(ctx context.Context, resultKey string, resultData TestResultData) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this a public method. I don't expect anyone should ever call this method by himself. this should an internal method. I see many attributes and methods that are public for no obvious reason. It make the code subtle to abuse and bad usage. And allows bad design

data, err := json.Marshal(resultData)
if err != nil {
return errors.Wrap(err, "Error marshaling data to JSON")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

errors should be lowercased

}

return pm.RedisClient.Set(resultKey, data, 10*time.Second).Err()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extract the expiration into an unexported constant resultTTL, also 10 is too low imo,

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was gonna say that data should never expire, but instead have the timestamp of when the test was run. but never expire. Some tests run every 6 hours our once per day. Hence expiration is not needed. so 0 is the right value

}

// GetCachedResult get data from redis
func (pm *PerformanceMonitor) GetCachedResult(resultKey string) (TestResultData, error) {
var res TestResultData

data, err := pm.RedisClient.Get(resultKey).Result()
if err != nil {
return res, errors.Wrap(err, "Failed getting the cached result")
}

err = json.Unmarshal([]byte(data), &res)
if err != nil {
return res, errors.Wrap(err, "Failed unmarshal data from json")
}

return res, nil
}
100 changes: 100 additions & 0 deletions pkg/perf/monitor.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,100 @@
package perf

import (
"context"
"encoding/json"
"time"

"github.com/go-redis/redis"
"github.com/jasonlvhit/gocron"
"github.com/pkg/errors"
)

// TaskMethod is the task method signature
type TaskMethod func(ctx context.Context) (interface{}, error)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TaskAction or TaskFunc


type Task struct {
Key string
Method TaskMethod
Interval time.Duration // interval in seconds
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IntervalSeconds

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's wrong to use Duration in seconds. Duration is represented internally in nano seconds! Means all Golang std and other libraries that accepts Duration it's ALWAYS in nano-seconds and you can't break this contract.

then a duration of 5 seconds is dur = 5 * time.Second dur in this case.

If you want a type that represents time in Seconds then use a uint16 or any unassigned numerical type. or create a new type type Seconds uint32

Another way to do this is make duration a private field, then the New function will then take uint32 (in seconds) and does the following

type Test {
dur Duration
}

func NewTest(seconds uint32) Test {
   return Test {
    dur: seconds * time.Seconds 
   }
}

ExecutionCount uint64
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this mean?

}

// PerformanceMonitor holds the module data
type PerformanceMonitor struct {
Scheduler *gocron.Scheduler
RedisClient *redis.Client
Tasks []Task
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why these fields are public? they open the door to bad design since you will end up accessing an internal implementation details to an object. A better approach is that all fields become private and then functionality is either defined by interfaces or public methods.

Some fields can become public of course, but usually in Data only types. Not operational types. For example there is noway someone will ever need to access the redis client on the Monitor structure because that is implementation details of how the Monitor caches the result!

}

// NewPerformanceMonitor returns PerformanceMonitor instance
func NewPerformanceMonitor(redisAddr string) *PerformanceMonitor {
redisClient := redis.NewClient(&redis.Options{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already using a redis package in zos so no need to add a new dependency. Please move to github.com/gomodule/redigo you can already create a new pool directly by calling this util function here https://github.com/threefoldtech/zos/blob/main_perf_package/pkg/utils/redis.go#L11

where the url is just unix://path/to/socket

Network: "unix",
Addr: redisAddr,
})

scheduler := gocron.NewScheduler()

return &PerformanceMonitor{
Scheduler: scheduler,
RedisClient: redisClient,
Tasks: []Task{},
}
}

// AddTask a simple helper method to add new tasks
func (pm *PerformanceMonitor) AddTask(taskName string, interval time.Duration, task TaskMethod) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point behind using gocron is not giving it an interval but instead a cron specs instead. I don't want to specify how often does it run but instead when it runs using the cron syntax.

A task can be secduled to run every sunday at 12pm, or 3 times per day on 0 8 and 16, etc.. So specifying an interval is not the right way.

I also don't like that the task is TaskMethod is just a function not an interface. This already forces you to create internal types to maintain the "task" why not instead use an interface as

type Task interface {
  ID() string
  Cron() string 
  Run(ctx context.Context) (any, error)
}

pm.Tasks = append(pm.Tasks, Task{
Key: taskName,
Method: task,
Interval: interval,
ExecutionCount: 0,
})
}

// InitScheduler adds all the test to the scheduler queue
func (pm *PerformanceMonitor) InitScheduler() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove this

pm.AddTask("TestLogging", 5, TestLogging)
}

// RunScheduler adds the tasks to the corn queue and start the scheduler
func (pm *PerformanceMonitor) RunScheduler(ctx context.Context) error {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the name RunScheduler shouldn't be introduced, a simple Run or Start should have been enough, now whoever is using this API will need to know it's using go-cron scheduler

for _, task := range pm.Tasks {
err := pm.Scheduler.Every(uint64(task.Interval)).Seconds().Do(func() error {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see the added value of using go-cron, simple goroutines and error groups should have been more than enough?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I already commented on the AddTask method regarding exactly that

testResult, err := task.Method(ctx)
if err != nil {
return errors.Wrapf(err, "failed running test: %s", task.Key)
}

task.ExecutionCount++
err = pm.CacheResult(ctx, task.Key, TestResultData{
TestName: task.Key,
TestNumber: task.ExecutionCount,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay now I understand the point of TestNumber, Please remove it from the test data completely, we only care about the latest test results. it doesn't mean much given it gets invalidated every 10 seconds according to the current code, so there's no point in storing that.

Result: testResult,
})
if err != nil {
return errors.Wrap(err, "failed setting cache")
}

return nil
})
if err != nil {
return errors.Wrapf(err, "failed scheduling the job: %s", task.Key)
}
}

pm.Scheduler.Start()
return nil
}

// Get handles the request to get data from cache
func (pm *PerformanceMonitor) Get(ctx context.Context, payload []byte) (interface{}, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is so wrong! Why a function in a library knows about an API request format ? A library should only expose a library interface (Say a Get(test)) and that' it

An public API interface (http, rmb, or whatever) then can handle user requests and then just make a proper call to the library

var req GetRequest
err := json.Unmarshal([]byte(payload), &req)
if err != nil {
return nil, errors.Wrapf(err, "failed to unmarshal payload: %v", payload)
}

return pm.GetCachedResult(req.TestName)
}
37 changes: 37 additions & 0 deletions pkg/perf/tasks.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,37 @@
package perf

import (
"context"
"fmt"
"time"

"github.com/rs/zerolog/log"
)

// TestLogging simple helper method that ensure scheduler is working
func TestLogging(ctx context.Context) (interface{}, error) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's remove this one, there're other logs we can see.

log.Info().Msgf("TestLogging: time is: %v", time.Now())
return fmt.Sprintf("time is: %v", time.Now()), nil
}

// should run every 5 min
// ping 12 point, each 3 times and make average result
// iperf 12 grid node with ipv4
// iperf 12 grid node with ipv6
// uses the iperf binary
func TestNetworkPerformance(ctx context.Context) (interface{}, error) {
return nil, nil
}

// should run every 6min
// upload/download a 1 MB file to any point
func TestNetworkLoading(ctx context.Context) (interface{}, error) {
return nil, nil
}

// should run every 6 hours
// measure cpu, mem, disk performance and usage
// uses some tool that do the mentoring
func TestResourcesPerformance(ctx context.Context) (interface{}, error) {
return nil, nil
}
13 changes: 13 additions & 0 deletions qemu/Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -14,3 +14,16 @@ start:
bash vm.sh -n node1 -c "runmode=dev farmer_id=$(FARMERID)"
test:
bash vm.sh -n node1 -c "runmode=test farmer_id=$(FARMERID)"

auth:
@echo "Copying your public ssh to machine rootfs"
mkdir -p overlay/root/.ssh
cp ~/.ssh/id_rsa.pub overlay/root/.ssh/authorized_keys

net:
@echo "Creating a virtual natted network"
bash ../docs/development/net.sh
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the script be instead moved here, and docs then refers to it in this location instead of the other way around?


run:
@echo "Running your node"
sudo ./vm.sh -g -n myzos-01 -c "farmer_id=$(id) version=v3 printk.devmsg=on runmode=dev nomodeset"