Develop to beta #950

Merged
merged 91 commits into from
Jun 21, 2023
91 commits
c8e796b
Add RPC_PORT to IMA env
badrogger Jan 30, 2023
863ec1c
Add IMA_RPC to internal ports
badrogger Jan 30, 2023
402c98e
Update skale.py to 5.8b1
badrogger Jan 30, 2023
57b0efc
Fix ImaEnv class
badrogger Jan 31, 2023
6c6ed96
Fix tests
badrogger Feb 6, 2023
d5dd69f
Fix route test
badrogger Feb 6, 2023
2cbb513
Merge branch 'develop' into add-rpc-port-to-ima-env
badrogger Feb 13, 2023
c8d388a
Fix firewall tests
badrogger Feb 13, 2023
a4c02eb
Remove OS version check
alexgex Apr 10, 2023
84ad70e
Update dependencies
alexgex Apr 10, 2023
85c6abc
Remove OS version check tests
alexgex Apr 10, 2023
272fd84
Merge pull request #940 from skalenetwork/remove-os-version-check
DmytroNazarenko Apr 11, 2023
a18f17a
Add monitor tasks module
badrogger May 22, 2023
852fb3b
Merge branch 'develop' into add-rpc-port-to-ima-env
badrogger May 24, 2023
d91df36
Add fail after error flag for install_python_dependencies
badrogger May 24, 2023
c93b125
Bump codecov to 2.1.13
badrogger May 24, 2023
48df022
Bump flask and cryptography
badrogger May 24, 2023
a50daeb
Bump Werkzeug to 2.2.2
badrogger May 24, 2023
faec41f
Bump pyOpenSSL to 2.1.1
badrogger May 24, 2023
1e234eb
Update docker.py version to 6.1.2
badrogger May 25, 2023
c919b92
Merge pull request #914 from skalenetwork/add-rpc-port-to-ima-env
badrogger May 25, 2023
9ebea08
Split checks into separate classes for each task
badrogger May 26, 2023
615d0e9
Bump requirements
badrogger May 26, 2023
91489a2
Add action module. Add config monitor module
badrogger May 29, 2023
70a5210
Add tests for config actions
badrogger May 30, 2023
9b9ac74
Add tests for container actions
badrogger May 30, 2023
a14f9b3
Extend action tests
badrogger May 30, 2023
f45541c
Rename container -> skaled
badrogger May 30, 2023
171cb9d
Extract secret_key from config fixture in conftest
badrogger May 30, 2023
5349ff5
Save config to a new path with timestamp and rotation_id
badrogger Jun 1, 2023
d7d433b
Introduce new monitor flow
badrogger Jun 7, 2023
1291c33
Fix config checks
badrogger Jun 8, 2023
d337640
Add the rest of monitor types
badrogger Jun 8, 2023
5a90e8b
Improve checks naming
badrogger Jun 10, 2023
0eeb576
Clean config path commands logic
badrogger Jun 10, 2023
ae2663e
Improve config related actions
badrogger Jun 10, 2023
fb76437
Updated to new config check names
badrogger Jun 10, 2023
19ef38f
Handle exceptions properly for Task
badrogger Jun 10, 2023
73801b8
Add skaled_monitor module
badrogger Jun 10, 2023
9a9a9d5
Fix config actions tests
badrogger Jun 10, 2023
dc23cf6
Bump pytest version to 7.x.x
badrogger Jun 10, 2023
ad49e10
Improve skaled action test
badrogger Jun 11, 2023
7e9e7d6
Handle empty skaled_status file. Fix skaled_action tests
badrogger Jun 12, 2023
a1218d8
Fix upstream config file determination
badrogger Jun 13, 2023
2fb33a4
Change logging format
badrogger Jun 13, 2023
7007692
Add update config test
badrogger Jun 13, 2023
7f6fb79
Raise custom exception for setExitTime request
badrogger Jun 13, 2023
123c8ba
Improve actions logging
badrogger Jun 13, 2023
db18071
Fix config check
badrogger Jun 13, 2023
ed74250
Add process name to cleaner
badrogger Jun 13, 2023
74a6a9f
Upgrade predeployed versions for web3 6.3.0 compatibility
badrogger Jun 14, 2023
f427866
Move from camel case web3 calls
badrogger Jun 14, 2023
3531fa5
Add config updated check
badrogger Jun 14, 2023
06914be
Restructure config monitor execution
badrogger Jun 14, 2023
c32adad
Add NoConfigMonitor. Restructure skaled monitor execution
badrogger Jun 14, 2023
f4fa99b
Get finish_ts from config. Add missing actions
badrogger Jun 15, 2023
5712319
Download snapshot if volume was just created
badrogger Jun 15, 2023
138d477
Save upstream config in new format
badrogger Jun 16, 2023
56b31b8
Handle rotation new node
badrogger Jun 16, 2023
31b89db
Fix cleaner
badrogger Jun 16, 2023
2691ee9
Fix DKG
badrogger Jun 16, 2023
ef8ad10
Fix and improve tests
badrogger Jun 16, 2023
0923042
Remove old monitor structure modules
badrogger Jun 16, 2023
2c9bf3c
Fix get_finish_ts
badrogger Jun 16, 2023
df6febe
Remove old monitor choosing logic
badrogger Jun 16, 2023
bc53d6e
Various tests fixes
badrogger Jun 16, 2023
2246774
Remove old structure rotation tests
badrogger Jun 16, 2023
7b2eb0e
Enable terminate_stuck_schain_processes
badrogger Jun 16, 2023
fd762eb
Add missing monitor tests
badrogger Jun 16, 2023
dd2af01
Fix health routes
badrogger Jun 16, 2023
8af3152
Merge branch 'develop' into separate-thread-for-containers
badrogger Jun 16, 2023
4da1afa
Remove old base_monitor_test
badrogger Jun 16, 2023
361638f
Update skale.py to 6.0dev1 with fixed SkaledPorts
badrogger Jun 17, 2023
9dc3eb4
Add new node monitor
badrogger Jun 17, 2023
4dee255
Fix is_new_node_monitor
badrogger Jun 18, 2023
5b61501
Minor logging improvements
badrogger Jun 19, 2023
21acb95
Fix no config monitor condition
badrogger Jun 19, 2023
dd2094d
Fix retrieving finish ts. NewNode monitor condition
badrogger Jun 19, 2023
2ba3f2a
Fix no upstream config file handling
badrogger Jun 19, 2023
082c060
Handle update properly
badrogger Jun 20, 2023
eeed2e4
Add missing changes
badrogger Jun 20, 2023
b6463e9
Fix update config monitor condition
badrogger Jun 20, 2023
c6e49b6
Fix new node monitor condition
badrogger Jun 20, 2023
dd155d3
Fix NewNodeSkaledMonitor
badrogger Jun 20, 2023
15c78f2
Fix leaving node condition
badrogger Jun 20, 2023
a5b4b99
Remove unused structures
badrogger Jun 20, 2023
ac756e8
Improve logging in actions
badrogger Jun 20, 2023
2dff12a
Remove unused new_schain check
badrogger Jun 20, 2023
5cdc5bc
Fix repair monitor. Improve logs
badrogger Jun 20, 2023
497cad7
Merge pull request #945 from skalenetwork/separate-thread-for-containers
DmytroNazarenko Jun 20, 2023
af10e39
Merge pull request #949 from skalenetwork/v2.5.x
DmytroNazarenko Jun 20, 2023
2 changes: 1 addition & 1 deletion Dockerfile
@@ -1,4 +1,4 @@
FROM python:3.8-buster
FROM python:3.9-buster

RUN apt-get update && apt-get install -y wget git libxslt-dev iptables kmod swig3.0
RUN ln -s /usr/bin/swig3.0 /usr/bin/swig
7 changes: 1 addition & 6 deletions core/node.py
@@ -25,7 +25,6 @@
import logging
import platform
import hashlib
import distro

import requests

@@ -314,8 +313,6 @@ def get_node_hardware_info() -> dict:
system_release = f'{platform.system()}-{platform.release()}'
uname_version = platform.uname().version
attached_storage_size = get_block_device_size()
os_name = distro.id()
os_version = distro.version()
return {
'cpu_total_cores': psutil.cpu_count(logical=True),
'cpu_physical_cores': psutil.cpu_count(logical=False),
@@ -325,9 +322,7 @@ def get_node_hardware_info() -> dict:
'mem_available': psutil.virtual_memory().available,
'system_release': system_release,
'uname_version': uname_version,
'attached_storage_size': attached_storage_size,
'os_name': os_name,
'os_version': os_version
'attached_storage_size': attached_storage_size
}


220 changes: 186 additions & 34 deletions core/schains/checks.py
@@ -17,28 +17,37 @@
# You should have received a copy of the GNU Affero General Public License
# along with this program. If not, see <https://www.gnu.org/licenses/>.

import filecmp
import os
import time
import logging
from abc import ABC, abstractmethod
from typing import Any, Dict

from core.schains.config.directory import (
upstreams_for_rotation_id_version,
get_schain_check_filepath,
get_schain_config,
schain_config_dir,
schain_config_filepath,
get_schain_check_filepath
schain_config_filepath
)
from core.schains.config.helper import (
get_base_port_from_config,
get_node_ips_from_config,
get_own_ip_from_config,
get_local_schain_http_endpoint
)
from core.schains.config.main import schain_config_version_match
from core.schains.config.main import (
get_upstream_config_filepath,
get_rotation_ids_from_config_file
)
from core.schains.dkg.utils import get_secret_key_share_filepath
from core.schains.firewall.types import IRuleController
from core.schains.process_manager_helper import is_monitor_process_alive
from core.schains.rpc import (
check_endpoint_alive, check_endpoint_blocks, get_endpoint_alive_check_timeout
check_endpoint_alive,
check_endpoint_blocks,
get_endpoint_alive_check_timeout
)
from core.schains.runner import get_container_name
from core.schains.skaled_exit_codes import SkaledExitCodes
@@ -74,27 +83,37 @@ def __init__(self, status: bool, data: dict = None):
self.status = status
self.data = data if data else {}

def __bool__(self) -> bool:
return self.status

def __str__(self) -> str:
return f'CheckRes<{self.status}>'
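Review note: the new `__bool__` method is what allows later hunks in this file to write `if self.config:` in place of `if self.config.status:`. A minimal standalone sketch of that behaviour follows; the class body mirrors the diff, while the sample values (`ok`, `failed`, the path in `data`) are illustrative assumptions only.

```python
class CheckRes:
    """Result of a single check: a boolean status plus optional details."""

    def __init__(self, status: bool, data: dict = None):
        self.status = status
        self.data = data if data else {}

    def __bool__(self) -> bool:
        # Lets callers write `if check:` instead of `if check.status:`
        return self.status

    def __str__(self) -> str:
        return f'CheckRes<{self.status}>'


# Illustrative values, not taken from the repository
ok = CheckRes(True, {'path': '/tmp/config.json'})
failed = CheckRes(False)

if not failed:
    message = f'{failed} is falsy, {ok} is truthy'
```

Python requires `__bool__` to return an actual `bool`, so a `CheckRes` built with a non-boolean `status` would raise `TypeError` when used in a condition.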


class IChecks(ABC):
@abstractmethod
def get_all(self, log=True, save=False, checks_filter=None) -> Dict:
pass

def is_healthy(self) -> bool:
checks = self.get_all()
return False not in checks.values()

class SChainChecks:

class ConfigChecks(IChecks):
def __init__(
self,
schain_name: str,
node_id: int,
schain_record: SChainRecord,
rule_controller: IRuleController,
rotation_id: int = 0,
*,
ima_linked: bool = True,
dutils: DockerUtils = None
rotation_id: int,
stream_version: str
):
self.name = schain_name
self.node_id = node_id
self.schain_record = schain_record
self.rotation_id = rotation_id
self.dutils = dutils or DockerUtils()
self.container_name = get_container_name(SCHAIN_CONTAINER, self.name)
self.ima_linked = ima_linked
self.rc = rule_controller
self.stream_version = stream_version

@property
def config_dir(self) -> CheckRes:
@@ -112,14 +131,108 @@ def dkg(self) -> CheckRes:
return CheckRes(os.path.isfile(secret_key_share_filepath))

@property
def config(self) -> CheckRes:
"""Checks that sChain config file exists"""
config_filepath = schain_config_filepath(self.name)
if not os.path.isfile(config_filepath):
def upstream_config(self) -> CheckRes:
"""Checks that config exists for rotation id and stream"""
upstreams = upstreams_for_rotation_id_version(
self.name,
self.rotation_id,
self.stream_version
)
logger.debug('Upstream configs for %s: %s', self.name, upstreams)
return CheckRes(len(upstreams) > 0)

def get_all(self, log=True, save=False, checks_filter=None) -> Dict:
if not checks_filter:
checks_filter = API_ALLOWED_CHECKS
checks_dict = {}
for check in checks_filter:
if hasattr(self, check):
if check not in API_ALLOWED_CHECKS:
logger.warning('Check %s is not allowed or does not exist', check)
else:
checks_dict[check] = getattr(self, check).status
if log:
log_checks_dict(self.name, checks_dict)
if save:
save_checks_dict(self.name, checks_dict)
return checks_dict

def is_healthy(self) -> bool:
checks = self.get_all()
return False not in checks.values()
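Review note: `ConfigChecks.get_all` only reports checks that are both in the allow-list and implemented on the class, so a filter can name checks that belong to another class without causing an error. The sketch below illustrates that filtering pattern in isolation. `ALLOWED_CHECKS` and `DemoConfigChecks` are hypothetical stand-ins; the real `API_ALLOWED_CHECKS` is defined elsewhere in the repository and is not shown in this diff.

```python
from typing import Dict, Iterable, Optional

# Hypothetical allow-list standing in for API_ALLOWED_CHECKS
ALLOWED_CHECKS = ['config_dir', 'dkg', 'config', 'rpc']


class CheckRes:
    def __init__(self, status: bool):
        self.status = status


class DemoConfigChecks:
    """Stand-in class that implements only some of the allowed checks."""

    @property
    def config_dir(self) -> CheckRes:
        return CheckRes(True)

    @property
    def dkg(self) -> CheckRes:
        return CheckRes(False)

    def get_all(self, checks_filter: Optional[Iterable[str]] = None) -> Dict[str, bool]:
        checks_filter = checks_filter or ALLOWED_CHECKS
        result = {}
        for check in checks_filter:
            if check not in ALLOWED_CHECKS:
                continue  # not exposed through the API
            # Look the name up on the class so the property body is not executed here
            if hasattr(type(self), check):
                result[check] = getattr(self, check).status
        return result

    def is_healthy(self) -> bool:
        return False not in self.get_all().values()


checks = DemoConfigChecks()
all_results = checks.get_all()
filtered = checks.get_all(['dkg', 'unknown'])
```

One difference from the diff is deliberate: `hasattr(self, check)` on an instance evaluates the property, so each check body would run once for `hasattr` and again for `getattr`. Looking the name up on `type(self)` avoids the extra run, which may matter for checks that touch the filesystem or network.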


class SkaledChecks(IChecks):
def __init__(
self,
schain_name: str,
schain_record: SChainRecord,
rule_controller: IRuleController,
*,
ima_linked: bool = True,
dutils: DockerUtils = None
):
self.name = schain_name
self.schain_record = schain_record
self.dutils = dutils or DockerUtils()
self.container_name = get_container_name(SCHAIN_CONTAINER, self.name)
self.ima_linked = ima_linked
self.rc = rule_controller

def get_all(self, log=True, save=False, checks_filter=None) -> Dict:
if not checks_filter:
checks_filter = API_ALLOWED_CHECKS
checks_dict = {}
for check in checks_filter:
if check == 'ima_container' and (DISABLE_IMA or not self.ima_linked):
logger.info(f'Check {check} will be skipped - IMA is not linked')
elif check not in API_ALLOWED_CHECKS:
logger.warning(f'Check {check} is not allowed or does not exist')
else:
if hasattr(self, check):
checks_dict[check] = getattr(self, check).status
if log:
log_checks_dict(self.name, checks_dict)
if save:
save_checks_dict(self.name, checks_dict)
return checks_dict

@property
def upstream_exists(self) -> CheckRes:
upstream_path = get_upstream_config_filepath(self.name)
return CheckRes(upstream_path is not None)

@property
def rotation_id_updated(self) -> CheckRes:
if not self.config:
return CheckRes(False)
return CheckRes(
schain_config_version_match(self.name, self.schain_record)
upstream_path = get_upstream_config_filepath(self.name)
config_path = schain_config_filepath(self.name)
upstream_rotations = get_rotation_ids_from_config_file(upstream_path)
config_rotations = get_rotation_ids_from_config_file(config_path)
logger.debug(
'Comparing rotation_ids between upstream %s and %s',
upstream_path,
config_path
)
return CheckRes(upstream_rotations == config_rotations)

@property
def config_updated(self) -> CheckRes:
if not self.config:
return CheckRes(False)
upstream_path = get_upstream_config_filepath(self.name)
config_path = schain_config_filepath(self.name)
logger.debug('Checking if %s updated according to %s', config_path, upstream_path)
if not upstream_path:
return CheckRes(True)
return CheckRes(filecmp.cmp(upstream_path, config_path))
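Review note: `config_updated` relies on `filecmp.cmp` with its default `shallow=True`, which treats two files as equal when their `os.stat()` signatures (file type, size, modification time) match and only reads contents otherwise. The sketch below shows the same decision logic with an explicit content comparison. The helper name `config_matches_upstream` and the file contents are assumptions made for illustration, not code from this PR.

```python
import filecmp
import os
import tempfile


def config_matches_upstream(upstream_path, config_path) -> bool:
    """Return True when there is no upstream or when both files have equal contents."""
    if not upstream_path:
        return True
    if not os.path.isfile(config_path):
        return False
    # shallow=False compares file contents instead of trusting os.stat() data alone
    return filecmp.cmp(upstream_path, config_path, shallow=False)


with tempfile.TemporaryDirectory() as tmp:
    upstream = os.path.join(tmp, 'upstream.json')
    config = os.path.join(tmp, 'config.json')
    with open(upstream, 'w') as f:
        f.write('{"rotation_id": 1}')
    with open(config, 'w') as f:
        f.write('{}')
    before = config_matches_upstream(upstream, config)

    # Simulate the monitor copying the upstream config into place
    with open(config, 'w') as f:
        f.write('{"rotation_id": 1}')
    after = config_matches_upstream(upstream, config)

    no_upstream = config_matches_upstream(None, config)
```

Whether the shallow default is acceptable here depends on how the upstream and active config files are written; if one is always a fresh copy of the other, the two modes should agree in practice.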

@property
def config(self) -> CheckRes:
""" Checks that sChain config file exists """
config_path = schain_config_filepath(self.name)
return CheckRes(os.path.isfile(config_path))

@property
def volume(self) -> CheckRes:
@@ -129,7 +242,7 @@ def volume(self) -> CheckRes:
@property
def firewall_rules(self) -> CheckRes:
"""Checks that firewall rules are set correctly"""
if self.config.status:
if self.config:
conf = get_schain_config(self.name)
base_port = get_base_port_from_config(conf)
node_ips = get_node_ips_from_config(conf)
@@ -167,7 +280,7 @@ def ima_container(self) -> CheckRes:
def rpc(self) -> CheckRes:
"""Checks that local skaled RPC is accessible"""
res = False
if self.config.status:
if self.config:
http_endpoint = get_local_schain_http_endpoint(self.name)
timeout = get_endpoint_alive_check_timeout(
self.schain_record.failed_rpc_count
@@ -178,7 +291,7 @@ def rpc(self) -> CheckRes:
@property
def blocks(self) -> CheckRes:
"""Checks that local skaled is mining blocks"""
if self.config.status:
if self.config:
http_endpoint = get_local_schain_http_endpoint(self.name)
return CheckRes(check_endpoint_blocks(http_endpoint))
return CheckRes(False)
@@ -188,22 +301,61 @@ def process(self) -> CheckRes:
"""Checks that sChain monitor process is running"""
return CheckRes(is_monitor_process_alive(self.schain_record.monitor_id))


class SChainChecks(IChecks):
def __init__(
self,
schain_name: str,
node_id: int,
schain_record: SChainRecord,
rule_controller: IRuleController,
stream_version: str,
rotation_id: int = 0,
*,
ima_linked: bool = True,
dutils: DockerUtils = None
):
self._subjects = [
ConfigChecks(
schain_name=schain_name,
node_id=node_id,
schain_record=schain_record,
rotation_id=rotation_id,
stream_version=stream_version
),
SkaledChecks(
schain_name=schain_name,
schain_record=schain_record,
rule_controller=rule_controller,
ima_linked=ima_linked,
dutils=dutils
)
]

def __getattr__(self, attr: str) -> Any:
for subj in self._subjects:
if attr in dir(subj):
return getattr(subj, attr)
raise AttributeError(f'No such attribute {attr}')
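Review note: the reworked `SChainChecks` no longer stores `name` or `ima_linked` itself; attribute access falls through `__getattr__` to the first subject that has the attribute. A reduced sketch of that delegation pattern is below. `ConfigPart`, `SkaledPart`, and `Facade` are hypothetical names used only to keep the example self-contained.

```python
from typing import Any


class ConfigPart:
    def __init__(self, name: str):
        self.name = name

    @property
    def config_dir(self) -> bool:
        return True


class SkaledPart:
    def __init__(self, name: str):
        self.name = name
        self.ima_linked = False

    @property
    def rpc(self) -> bool:
        return False


class Facade:
    def __init__(self, name: str):
        self._subjects = [ConfigPart(name), SkaledPart(name)]

    def __getattr__(self, attr: str) -> Any:
        # Invoked only when normal lookup fails, so `_subjects` is found directly
        for subj in self._subjects:
            if attr in dir(subj):
                return getattr(subj, attr)
        raise AttributeError(f'No such attribute {attr}')


facade = Facade('test-chain')
resolved = (facade.name, facade.config_dir, facade.rpc, facade.ima_linked)
```

When two subjects share an attribute name, the first one in `_subjects` wins, so list order is part of the behaviour. The pattern also assumes `_subjects` is assigned before any other attribute is read; otherwise the lookup of `_subjects` inside `__getattr__` would recurse.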

def get_all(self, log=True, save=False, checks_filter=None):
if not checks_filter:
checks_filter = API_ALLOWED_CHECKS
checks_dict = {}
for check in checks_filter:
if check == 'ima_container' and (DISABLE_IMA or not self.ima_linked):
logger.info(f'Check {check} will be skipped - IMA is not linked')
elif check not in API_ALLOWED_CHECKS:
logger.warning(f'Check {check} is not allowed or does not exist')
else:
checks_dict[check] = getattr(self, check).status

plain_checks = {}
for subj in self._subjects:
subj_checks = subj.get_all(
log=False,
save=False,
checks_filter=checks_filter
)
plain_checks.update(subj_checks)

if log:
log_checks_dict(self.name, checks_dict)
log_checks_dict(self.name, plain_checks)
if save:
save_checks_dict(self.name, checks_dict)
return checks_dict
save_checks_dict(self.name, plain_checks)
return plain_checks
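Review note: the aggregate `get_all` asks each subject for its own results with logging and saving turned off, merges them into one flat dict, and then logs or saves once. The merge step can be sketched as follows; `FakeChecks` and `merge_checks` are hypothetical helpers, and the check names are sample data.

```python
from typing import Dict, List


class FakeChecks:
    """Hypothetical stand-in that returns a fixed set of check results."""

    def __init__(self, results: Dict[str, bool]):
        self._results = results

    def get_all(self, checks_filter: List[str]) -> Dict[str, bool]:
        return {k: v for k, v in self._results.items() if k in checks_filter}


def merge_checks(subjects, checks_filter: List[str]) -> Dict[str, bool]:
    plain_checks = {}
    for subj in subjects:
        # A later subject overwrites an earlier one if check names collide
        plain_checks.update(subj.get_all(checks_filter=checks_filter))
    return plain_checks


config_checks = FakeChecks({'config_dir': True, 'dkg': True})
skaled_checks = FakeChecks({'rpc': False, 'blocks': False, 'volume': True})

merged = merge_checks([config_checks, skaled_checks], ['dkg', 'rpc', 'volume'])
healthy = False not in merged.values()
```

Because the result is a flat dict, the scheme works cleanly only while the subjects expose disjoint check names.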

def is_healthy(self):
checks = self.get_all()
20 changes: 16 additions & 4 deletions core/schains/cleaner.py
@@ -24,6 +24,7 @@

from sgx import SgxClient

from core.node import get_skale_node_version
from core.schains.checks import SChainChecks
from core.schains.config.directory import schain_config_dir
from core.schains.dkg.utils import get_secret_key_share_filepath
@@ -58,7 +59,7 @@


def run_cleaner(skale, node_config):
process = Process(target=monitor, args=(skale, node_config))
process = Process(name='cleaner', target=monitor, args=(skale, node_config))
process.start()
logger.info('Cleaner process started')
process.join(JOIN_TIMEOUT)
@@ -202,22 +203,33 @@ def remove_schain(skale, node_id, schain_name, msg, dutils=None) -> None:
terminate_schain_process(schain_record)
delete_bls_keys(skale, schain_name)
sync_agent_ranges = get_sync_agent_ranges(skale)
cleanup_schain(node_id, schain_name, sync_agent_ranges, dutils=dutils)
rotation_data = skale.node_rotation.get_rotation(schain_name)
rotation_id = rotation_data['rotation_id']
cleanup_schain(
node_id,
schain_name,
sync_agent_ranges,
rotation_id=rotation_id,
dutils=dutils
)


def cleanup_schain(node_id, schain_name, sync_agent_ranges, dutils=None) -> None:
def cleanup_schain(node_id, schain_name, sync_agent_ranges, rotation_id, dutils=None) -> None:
dutils = dutils or DockerUtils()
schain_record = upsert_schain_record(schain_name)

rc = get_default_rule_controller(
name=schain_name,
sync_agent_ranges=sync_agent_ranges
)
stream_version = get_skale_node_version()
checks = SChainChecks(
schain_name,
node_id,
rule_controller=rc,
schain_record=schain_record
stream_version=stream_version,
schain_record=schain_record,
rotation_id=rotation_id
)
if checks.skaled_container.status or is_exited(
schain_name,