Towards better inference: bits → nibbles #3808

Draft: wants to merge 81 commits into base: main

Commits (81)
ae00a8e  Introducing nibbles (originalsouth, Aug 27, 2024)
c90fcb0  Prototyping (originalsouth, Aug 28, 2024)
d57cf19  Merge remote-tracking branch 'origin/main' into feature/nibbles (originalsouth, Sep 18, 2024)
64ece62  Merge remote-tracking branch 'origin/main' into feature/nibbles (originalsouth, Sep 18, 2024)
bba22a3  Merge remote-tracking branch 'origin/main' into feature/nibbles (originalsouth, Sep 18, 2024)
0896eba  set default in model (noamblitz, Sep 19, 2024)
964b89b  remove default bit (noamblitz, Sep 19, 2024)
5915f03  fix test (noamblitz, Sep 19, 2024)
ed7be58  Fix Octopoes tests for patch related changes (originalsouth, Sep 19, 2024)
efa3c97  Merge branch 'set-default-risk-in-model' of github.com:minvws/nl-kat-… (originalsouth, Sep 19, 2024)
663a9bb  Fix Octopoes tests for patch related changes II (originalsouth, Sep 19, 2024)
bd78ed9  Merge branch 'main' into set-default-risk-in-model (originalsouth, Sep 19, 2024)
b5ba90a  Fix Octopoes tests for patch related changes III (originalsouth, Sep 19, 2024)
f885652  Merge branch 'set-default-risk-in-model' of github.com:minvws/nl-kat-… (originalsouth, Sep 19, 2024)
b05283e  Prevent race conditions between Octopoes' event manager and the sched… (originalsouth, Sep 19, 2024)
06d1080  Merge branch 'main' into set-default-risk-in-model (underdarknl, Sep 20, 2024)
5bf8b35  Merge branch 'main' into set-default-risk-in-model (originalsouth, Sep 23, 2024)
967d41b  Merge branch 'main' into set-default-risk-in-model (underdarknl, Sep 23, 2024)
d30b33f  Merge remote-tracking branch 'origin/main' into feature/nibbles (originalsouth, Sep 23, 2024)
86fe7d5  Merge branch 'fix/prevent_race_conditions_between_event_manager_and_s… (originalsouth, Sep 23, 2024)
dca2b20  Merge branch 'set-default-risk-in-model' into feature/nibbles (originalsouth, Sep 23, 2024)
7699d93  Fixes for idle run (originalsouth, Sep 23, 2024)
0eb106f  Merge branch 'main' into feature/nibbles (originalsouth, Sep 24, 2024)
2ed89fb  Manual merge (originalsouth, Oct 14, 2024)
d9c9fa2  Revert "Set default findingtype risk in model instead of in bit (#3562)" (originalsouth, Oct 14, 2024)
20c5abf  Pre-commit after revert (originalsouth, Oct 14, 2024)
2d09141  Remove bogus rlu_cache (originalsouth, Oct 15, 2024)
6adeffe  Merge remote-tracking branch 'origin/main' into feature/nibbles (originalsouth, Oct 16, 2024)
f3f4277  Register origins and add parameters begins (originalsouth, Oct 16, 2024)
ef9ad80  Merge remote-tracking branch 'origin/main' into feature/nibbles (originalsouth, Oct 16, 2024)
5546cd8  Merge remote-tracking branch 'origin/main' into feature/nibbles (originalsouth, Oct 16, 2024)
cf2f04c  Merge remote-tracking branch 'origin/main' into feature/nibbles (originalsouth, Oct 16, 2024)
6fd5f74  Merge remote-tracking branch 'origin/main' into feature/nibbles (originalsouth, Oct 29, 2024)
1b49c3b  Merge remote-tracking branch 'origin/main' into feature/nibbles (originalsouth, Oct 30, 2024)
b28ae84  Merge remote-tracking branch 'origin/main' into feature/nibbles (originalsouth, Oct 30, 2024)
8b0f50d  Merge remote-tracking branch 'origin/main' into feature/nibbles (originalsouth, Oct 31, 2024)
f140e87  Merge remote-tracking branch 'origin/main' into feature/nibbles (originalsouth, Nov 4, 2024)
be03bf8  Add blocklist and ooi reuse to inference (originalsouth, Nov 4, 2024)
852ec3e  Fix runner (originalsouth, Nov 4, 2024)
ed4c40a  Merge remote-tracking branch 'origin/main' into feature/nibbles (originalsouth, Nov 4, 2024)
df9a329  Basic nibbler (originalsouth, Nov 6, 2024)
5908b42  Merge remote-tracking branch 'origin/main' into feature/nibbles (originalsouth, Nov 6, 2024)
2de975d  Merge remote-tracking branch 'origin/main' into feature/nibbles (originalsouth, Nov 7, 2024)
a67b297  Add more boilerplating (originalsouth, Nov 7, 2024)
f20cb4b  Check clearance for seed OOI in nibbles (originalsouth, Nov 7, 2024)
d706b35  Merge remote-tracking branch 'origin/main' into feature/nibbles (originalsouth, Nov 7, 2024)
49e1116  Add unit test (originalsouth, Nov 7, 2024)
8ff6fac  Add unit test (originalsouth, Nov 7, 2024)
a9da549  Merge remote-tracking branch 'origin/main' into feature/nibbles (originalsouth, Nov 7, 2024)
6fbcf12  Make SonarClaus Happier (originalsouth, Nov 7, 2024)
bd7b82d  Merge remote-tracking branch 'origin/main' into feature/nibbles (originalsouth, Nov 12, 2024)
13400b3  More testing and fixing (originalsouth, Nov 12, 2024)
137b687  Merge remote-tracking branch 'origin/main' into feature/nibbles (originalsouth, Nov 12, 2024)
aa66104  Moves towards a new niddles (originalsouth, Nov 13, 2024)
4d9baa2  Purge NMAX (originalsouth, Nov 13, 2024)
63cdaec  Another day another design (originalsouth, Nov 14, 2024)
f337ee3  Merge remote-tracking branch 'origin/main' into feature/nibbles (originalsouth, Nov 14, 2024)
a18929b  Merge remote-tracking branch 'origin/main' into feature/nibbles (originalsouth, Nov 17, 2024)
e35b101  Add multivariable support (originalsouth, Nov 18, 2024)
0c8a6bb  Merge remote-tracking branch 'origin/main' into feature/nibbles (originalsouth, Nov 18, 2024)
d320be2  Refactor (originalsouth, Nov 19, 2024)
87909ae  Fix typing (originalsouth, Nov 19, 2024)
4b853d9  Refactor (originalsouth, Nov 19, 2024)
d084a38  Merge remote-tracking branch 'origin/main' into feature/nibbles (originalsouth, Nov 19, 2024)
5266ccd  Merge remote-tracking branch 'origin/main' into feature/nibbles (originalsouth, Nov 20, 2024)
e7b3a5a  Merge remote-tracking branch 'origin/main' into feature/nibbles (originalsouth, Nov 20, 2024)
bd59705  Mostly fix nibble-origins -> nibblettes (originalsouth, Nov 21, 2024)
e9a4576  Merge remote-tracking branch 'origin/main' into feature/nibbles (originalsouth, Nov 21, 2024)
8c6d6e5  Add comment (originalsouth, Nov 21, 2024)
9890402  Merge remote-tracking branch 'origin/main' into feature/nibbles (originalsouth, Nov 26, 2024)
ac80ae0  Give me the $$$ AWK input (originalsouth, Nov 26, 2024)
d978519  Merge remote-tracking branch 'origin/main' into feature/nibbles (originalsouth, Nov 26, 2024)
6272afc  Faster serialization (originalsouth, Nov 27, 2024)
82b6ad4  Skip encoding (originalsouth, Nov 27, 2024)
dee4a4a  Revert "Faster serialization" (originalsouth, Nov 27, 2024)
c40537d  nibblette -> nibblet (originalsouth, Nov 27, 2024)
9e0a0ca  Merge remote-tracking branch 'origin/main' into feature/nibbles (originalsouth, Nov 27, 2024)
d19812d  Test re-evaluation (originalsouth, Nov 27, 2024)
5e5ff0b  Merge remote-tracking branch 'origin/main' into feature/nibbles (originalsouth, Nov 27, 2024)
cc73cf0  Fix double dict entry "bug" (originalsouth, Nov 28, 2024)
986b32d  Run all nibbles not touched by nibblets (originalsouth, Nov 28, 2024)
2 changes: 1 addition & 1 deletion octopoes/.ci/docker-compose.yml
@@ -17,7 +17,7 @@ services:
       args:
         ENVIRONMENT: dev
       context: .
-    command: pytest tests/integration --timeout=300
+    command: pytest -s tests/integration/test_nibbles.py --timeout=300
     depends_on:
       - xtdb
       - ci_octopoes
Empty file added octopoes/nibbles/__init__.py
Empty file.
96 changes: 96 additions & 0 deletions octopoes/nibbles/definitions.py
@@ -0,0 +1,96 @@
import importlib
import pkgutil
from collections.abc import Iterable
from pathlib import Path
from types import MethodType, ModuleType

import structlog
from pydantic import BaseModel

from octopoes.models import OOI

NIBBLES_DIR = Path(__file__).parent
NIBBLE_ATTR_NAME = "NIBBLE"
NIBBLE_FUNC_NAME = "nibble"
logger = structlog.get_logger(__name__)


class NibbleParameter(BaseModel):
    object_type: type
    parser: str = "[]"

    def __eq__(self, other):
        if isinstance(other, NibbleParameter):
            return vars(self) == vars(other)
        elif isinstance(other, type):
            return self.object_type == other
        else:
            return False


class NibbleDefinition:
Review comment (Contributor): I was wondering, what is the reason this isn't implemented as, e.g., a Pydantic class, but instead as a POJO-like class?

Reply (Contributor Author): Somehow the Pydantic class does not work well with importlib yielding the payload... not sure why, but using a plain class fixed the issues, so I moved on -- perhaps hoping one day you would fix it ;)

    id: str
    signature: list[NibbleParameter]
    query: str | None = None
    min_scan_level: int = 1
    default_enabled: bool = True
    config_ooi_relation_path: str | None = None
    payload: MethodType | None = None

    def __init__(
        self,
        name: str,
        signature: list[NibbleParameter],
        query: str | None = None,
        min_scan_level: int = 1,
        default_enabled: bool = True,
        config_ooi_relation_path: str | None = None,
    ):
        self.id = name
        self.signature = signature
        self.query = query
        self.min_scan_level = min_scan_level
        self.default_enabled = default_enabled
        self.config_ooi_relation_path = config_ooi_relation_path

    def __call__(self, args: Iterable[OOI]) -> OOI | Iterable[OOI | None] | None:
        if self.payload is None:
            raise NotImplementedError
        else:
            return self.payload(*args)


def get_nibble_definitions() -> dict[str, NibbleDefinition]:
    nibble_definitions = {}

    for package in pkgutil.walk_packages([str(NIBBLES_DIR)]):
        if package.name in ["definitions", "runner"]:
            continue

        try:
            module: ModuleType = importlib.import_module(".nibble", f"{NIBBLES_DIR.name}.{package.name}")

            if hasattr(module, NIBBLE_ATTR_NAME):
                nibble_definition: NibbleDefinition = getattr(module, NIBBLE_ATTR_NAME)

                try:
                    payload: ModuleType = importlib.import_module(
                        f".{package.name}", f"{NIBBLES_DIR.name}.{package.name}"
                    )
                    if hasattr(payload, NIBBLE_FUNC_NAME):
                        nibble_definition.payload = getattr(payload, NIBBLE_FUNC_NAME)
                    else:
                        logger.warning('module "%s" has no function %s', package.name, NIBBLE_FUNC_NAME)

                except ModuleNotFoundError:
                    logger.warning('package "%s" has no function nibble', package.name)

                nibble_definitions[nibble_definition.id] = nibble_definition

            else:
                logger.warning('module "%s" has no attribute %s', package.name, NIBBLE_ATTR_NAME)

        except ModuleNotFoundError:
            logger.warning('package "%s" has no module nibble', package.name)

    return nibble_definitions
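
Note on `NibbleParameter.__eq__`: the runner tests `type(ooi) in x.signature`, which only works because a parameter compares equal to a bare type. The following is a minimal sketch of that behaviour, assuming a plain-class stand-in (`Param`) instead of the Pydantic model, and two hypothetical placeholder types (`Hostname`, `IPAddress`) that are not the real OOI classes.

```python
class Param:
    """Hypothetical stand-in for NibbleParameter (plain class, no pydantic)."""

    def __init__(self, object_type: type, parser: str = "[]"):
        self.object_type = object_type
        self.parser = parser

    def __eq__(self, other):
        # Same comparison rules as NibbleParameter.__eq__ in the diff.
        if isinstance(other, Param):
            return vars(self) == vars(other)
        if isinstance(other, type):
            return self.object_type == other
        return False


class Hostname:  # placeholder type for illustration only
    pass


class IPAddress:  # placeholder type for illustration only
    pass


signature = [Param(Hostname), Param(IPAddress)]

# `in` falls back to Param.__eq__ because comparing a class to a Param
# is not handled by the class itself.
hostname_in_signature = Hostname in signature
str_in_signature = str in signature
```

Because membership relies on the reflected `__eq__`, a nibble is selected as soon as any one of its parameters has the OOI's type.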
138 changes: 138 additions & 0 deletions octopoes/nibbles/runner.py
@@ -0,0 +1,138 @@
import json
from collections.abc import Callable, Iterable
from datetime import datetime
from typing import TypeVar

from xxhash import xxh3_128_hexdigest as xxh3  # INFO: xxh3_64_hexdigest is faster but has a higher collision probability

from nibbles.definitions import NibbleDefinition, get_nibble_definitions
from octopoes.models import OOI
from octopoes.models.origin import Origin, OriginType
from octopoes.models.types import type_by_name
from octopoes.repositories.ooi_repository import OOIRepository
from octopoes.repositories.origin_repository import OriginRepository
from octopoes.repositories.scan_profile_repository import ScanProfileRepository

T = TypeVar("T")
U = TypeVar("U")


def ooi_type(ooi: OOI) -> type[OOI]:
    return type_by_name(ooi.get_ooi_type())


def merge_with(func: Callable[[set[T], set[T]], set[T]], d1: dict[U, set[T]], d2: dict[U, set[T]]) -> dict[U, set[T]]:
    return {k: func(d1.get(k, set()), d2.get(k, set())) for k in set(d1) | set(d2)}


def flatten(items: Iterable[OOI | Iterable[OOI | None] | None]) -> Iterable[OOI]:
    for item in items:
        if isinstance(item, OOI):
            yield item
        elif item is None:
            continue
        else:
            yield from flatten(item)
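
Note on `flatten`: a nibble payload may return a single OOI, `None`, or an arbitrarily nested iterable of both, and `flatten` normalises all of these into a flat stream of OOIs. A small sketch of that behaviour, assuming a hypothetical stand-in `OOI` class in place of `octopoes.models.OOI`:

```python
class OOI:
    """Hypothetical stand-in for octopoes.models.OOI."""

    def __init__(self, name: str):
        self.name = name


def flatten(items):
    # Same logic as in the diff: keep OOIs, skip None, recurse into iterables.
    for item in items:
        if isinstance(item, OOI):
            yield item
        elif item is None:
            continue
        else:
            yield from flatten(item)


a, b, c = OOI("a"), OOI("b"), OOI("c")

single = list(flatten([a]))                     # payload returned one OOI
nested = list(flatten([[a, None, (b, [c])]]))   # payload returned a nested iterable
empty = list(flatten([None]))                   # payload returned None
```

This is why the runner can always wrap the payload result as `flatten([nibble(arg)])`, whatever shape the payload returns.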


def nibble_hasher(data: Iterable) -> str:
    return xxh3(
        "".join(
            [
                json.dumps(json.loads(ooi.model_dump_json()), sort_keys=True)
                if isinstance(ooi, OOI)
                else json.dumps(ooi, sort_keys=True)
                for ooi in data
            ]
        )
    )
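
Note on `nibble_hasher`: serialising with `sort_keys=True` makes the hash independent of key order, so the comparison against `parameters_hash` only fires when the argument values actually change. A minimal sketch of that idea, assuming plain dicts instead of OOIs and swapping in `hashlib.blake2b` (16-byte digest) for `xxh3_128_hexdigest` so the example needs no third-party package; `stable_hash` is a hypothetical name:

```python
import hashlib
import json


def stable_hash(data) -> str:
    # Canonical JSON per item (sorted keys), concatenated, then hashed.
    # blake2b is a stand-in here; the PR uses xxhash's xxh3_128_hexdigest.
    canonical = "".join(json.dumps(item, sort_keys=True) for item in data)
    return hashlib.blake2b(canonical.encode(), digest_size=16).hexdigest()


h_original = stable_hash([{"a": 1, "b": 2}, "x"])
h_reordered = stable_hash([{"b": 2, "a": 1}, "x"])  # same content, other key order
h_changed = stable_hash([{"a": 1, "b": 3}, "x"])    # one value differs
```

Here `h_original` and `h_reordered` are identical, while `h_changed` differs, which is the property the re-evaluation check in `_run` depends on.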


class NibblesRunner:
    def __init__(
        self,
        ooi_repository: OOIRepository,
        origin_repository: OriginRepository,
        scan_profile_repository: ScanProfileRepository,
        perform_writes: bool = True,
    ):
        self.ooi_repository = ooi_repository
        self.origin_repository = origin_repository
        self.scan_profile_repository = scan_profile_repository
        self.perform_writes = perform_writes
        self.update_nibbles()

    def update_nibbles(self):
        self.nibbles: dict[str, NibbleDefinition] = get_nibble_definitions()

    def _run(self, ooi: OOI, valid_time: datetime) -> dict[str, dict[tuple, set[OOI]]]:
        return_value: dict[str, dict[tuple, set[OOI]]] = {}
        nibblets = self.origin_repository.list_origins(
            valid_time, origin_type=OriginType.NIBBLET, parameters_references=[ooi.reference]
        )
        if nibblets:
            for nibblet in nibblets:
                # INFO: we do not strictly need this if statement because origins of type
                # OriginType.NIBBLET always have parameters_references, but it keeps the linters happy
                if nibblet.parameters_references:
                    nibble = self.nibbles[nibblet.method]
                    args = self.ooi_repository.nibble_query(
                        ooi,
                        nibble,
                        valid_time,
                        nibblet.parameters_references
                        if nibble.query is not None and nibble.query.count("$") > 0
                        else [],
                    )
                    results = {
                        tuple(arg): set(flatten([nibble(arg)]))
                        for arg in args
                        if nibblet.parameters_hash != nibble_hasher(arg)
                    }
                    return_value |= {nibble.id: results}
        nibblet_nibbles = {self.nibbles[nibblet.method] for nibblet in nibblets}
        for nibble in filter(lambda x: type(ooi) in x.signature and x not in nibblet_nibbles, self.nibbles.values()):
            args = self.ooi_repository.nibble_query(ooi, nibble, valid_time)
            results = {tuple(arg): set(flatten([nibble(arg)])) for arg in args}
            return_value |= {nibble.id: results}
        # TODO: we could cache the writes for single OOI nibbles
        self._write({ooi: return_value}, valid_time)
        return return_value

    def _cleared(self, ooi: OOI, valid_time: datetime) -> bool:
        ooi_level = self.scan_profile_repository.get(ooi.reference, valid_time).level.value
        target_nibbles = filter(lambda x: type(ooi) in x.signature, self.nibbles.values())
        return any(nibble.min_scan_level < ooi_level for nibble in target_nibbles)

    def _write(self, inferences: dict[OOI, dict[str, dict[tuple, set[OOI]]]], valid_time: datetime):
        if self.perform_writes:
            for source_ooi, results in inferences.items():
                self.ooi_repository.save(source_ooi, valid_time)
                for nibble_id, run_result in results.items():
                    for arg, result in run_result.items():
                        nibble_origin = Origin(
                            method=nibble_id,
                            origin_type=OriginType.NIBBLET,
                            source=source_ooi.reference,
                            result=[ooi.reference for ooi in result],
                            parameters_hash=nibble_hasher(arg),
                            # TODO: What to do if a is not an OOI?
                            parameters_references=[a.reference for a in arg if isinstance(a, OOI)],
                        )
                        for ooi in result:
                            self.ooi_repository.save(ooi, valid_time=valid_time)
                        self.origin_repository.save(nibble_origin, valid_time=valid_time)

    def infer(self, stack: list[OOI], valid_time: datetime) -> dict[OOI, dict[str, dict[tuple, set[OOI]]]]:
        inferences: dict[OOI, dict[str, dict[tuple, set[OOI]]]] = {}
        blockset = set(stack)
        if stack and self._cleared(stack[-1], valid_time):
            while stack:
                ooi = stack.pop()
                results = self._run(ooi, valid_time)
                if results:
                    blocks = set.union(set(), *[ooiset for result in results.values() for _, ooiset in result.items()])
                    stack += [o for o in blocks if o not in blockset]
                    blockset |= blocks
                    inferences |= {ooi: results}
        return inferences
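
Note on `infer`: the method is a stack-based worklist traversal in which `blockset` records every OOI already queued, so nibbles that produce each other's inputs cannot loop forever. A simplified sketch of that control flow, assuming strings instead of OOIs and a hypothetical `run` callback in place of `NibblesRunner._run` (the clearance check and the per-nibble result nesting are left out):

```python
def infer(seed, run):
    """Worklist traversal; run(node) returns the set of nodes derived from node."""
    stack = [seed]
    seen = {seed}  # plays the role of `blockset` in the diff
    inferences = {}
    while stack:
        node = stack.pop()
        derived = run(node)
        if derived:
            # Only queue nodes that were never queued before.
            stack += [d for d in derived if d not in seen]
            seen |= derived
            inferences[node] = derived
    return inferences


# Hypothetical rule table with a cycle: url -> hostname -> url.
rules = {"url": {"hostname"}, "hostname": {"ip", "url"}, "ip": set()}
result = infer("url", lambda node: rules.get(node, set()))
```

The traversal terminates even though `hostname` derives `url` again, because `url` is already in the seen set when it reappears.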
14 changes: 11 additions & 3 deletions octopoes/octopoes/core/service.py
@@ -8,6 +8,7 @@
 import structlog
 from bits.definitions import get_bit_definitions
 from bits.runner import BitRunner
+from nibbles.runner import NibblesRunner
 from pydantic import TypeAdapter

 from octopoes.config.settings import (
@@ -76,6 +77,7 @@ def __init__(
         self.origin_repository = origin_repository
         self.origin_parameter_repository = origin_parameter_repository
         self.scan_profile_repository = scan_profile_repository
+        self.nibbler = NibblesRunner(ooi_repository, origin_repository, scan_profile_repository)
         self.session = session

     @overload
@@ -170,10 +172,10 @@ def save_origin(
             self.ooi_repository.get(origin.source, valid_time)
         except ObjectNotFoundException:
             if (
-                origin.origin_type not in [OriginType.DECLARATION, OriginType.AFFIRMATION]
+                origin.origin_type not in [OriginType.DECLARATION, OriginType.AFFIRMATION, OriginType.NIBBLET]
                 and origin.source not in origin.result
             ):
-                raise ValueError("Origin source of observation does not exist")
+                raise ValueError(f"Origin source [{origin.source}] does not exist")
             elif origin.origin_type == OriginType.AFFIRMATION:
                 logger.debug("Affirmation source %s already deleted", origin.source)
                 return
@@ -200,6 +202,7 @@ def save_origin(
             self.origin_repository.delete(origin, valid_time=valid_time)

     def _run_inference(self, origin: Origin, valid_time: datetime) -> None:
+        # The bit part of inferring
         bit_definition = get_bit_definitions().get(origin.method, None)

         if bit_definition is None:
@@ -234,6 +237,7 @@ def _run_inference(self, origin: Origin, valid_time: datetime) -> None:
             if len(configs) != 0:
                 config = configs[-1].config

+        resulting_oois: list[OOI] = []
         try:
             if isinstance(self.session, XTDBSession):
                 start = perf_counter()
@@ -252,10 +256,14 @@ def _run_inference(self, origin: Origin, valid_time: datetime) -> None:
                 self.session.client.submit_transaction(ops)
             else:
                 resulting_oois = BitRunner(bit_definition).run(source, parameters, config=config)
-            self.save_origin(origin, resulting_oois, valid_time)
         except Exception as e:
             logger.exception("Error running inference", exc_info=e)

+        self.save_origin(origin, resulting_oois, valid_time)
+
+        # The nibble part of inferring
+        self.nibbler.infer([source], valid_time)
+
     @staticmethod
     def check_path_level(path_level: int | None, current_level: int):
         return path_level is not None and path_level >= current_level
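
Note on the `_run_inference` change: as far as this diff shows, `save_origin` moves out of the `try` block and `resulting_oois` is initialised to an empty list beforehand, so the origin is now saved even when the bit raises. A stripped-down sketch of that control flow, with hypothetical callables `run_bit` and `save_origin` standing in for `BitRunner(...).run` and `OctopoesService.save_origin`:

```python
def run_inference(run_bit, save_origin):
    resulting: list[str] = []  # initialised up front, as in the diff
    try:
        resulting = run_bit()
    except Exception:
        pass  # the PR logs the exception here instead of silently passing
    # Now outside the try block: runs whether or not run_bit() raised.
    save_origin(resulting)


def failing_bit():
    raise RuntimeError("boom")


saved: list[list[str]] = []
run_inference(lambda: ["finding"], saved.append)
run_inference(failing_bit, saved.append)
```

After both calls `saved` holds the successful result followed by an empty list, which illustrates that a failing bit now leads to `save_origin` being called with no results rather than being skipped.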
20 changes: 19 additions & 1 deletion octopoes/octopoes/models/origin.py
@@ -11,6 +11,7 @@ class OriginType(Enum):
     OBSERVATION = "observation"
     INFERENCE = "inference"
     AFFIRMATION = "affirmation"
+    NIBBLET = "nibblet"


 class Origin(BaseModel):
@@ -19,6 +20,8 @@ class Origin(BaseModel):
     source: Reference
     source_method: str | None = None  # None for bits and normalizers
     result: list[Reference] = Field(default_factory=list)
+    parameters_hash: str | None = None  # None for anything other than a nibblet
+    parameters_references: list[Reference] | None = None  # None for anything other than a nibblet
     task_id: UUID | None = None

     def __sub__(self, other) -> set[Reference]:
@@ -29,7 +32,22 @@ def __sub__(self, other) -> set[Reference]:

     @property
     def id(self) -> str:
-        if self.source_method is not None:
+        if self.origin_type == OriginType.NIBBLET:
+            return "|".join(
+                map(
+                    str,
+                    [
+                        self.__class__.__name__,
+                        self.origin_type.value,
+                        self.method,
+                        self.source,
+                        self.result,
+                        self.parameters_hash,
+                        self.parameters_references,
+                    ],
+                )
+            )
+        elif self.source_method is not None:
             return (
                 f"{self.__class__.__name__}|{self.origin_type.value}|{self.method}|{self.source_method}|{self.source}"
             )
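
Note on the nibblet `id`: unlike the other origin types, the id of a nibblet includes `result`, `parameters_hash` and `parameters_references`, so two runs of the same nibble on the same source get different ids when their arguments differ. A rough sketch of the string construction, assuming plain strings in place of `Reference` objects and a hypothetical helper name `nibblet_id`:

```python
def nibblet_id(method, source, result, parameters_hash, parameters_references) -> str:
    # Mirrors the "|".join(map(str, [...])) construction in the diff;
    # "Origin" and "nibblet" stand for the class name and OriginType.NIBBLET.value.
    parts = ["Origin", "nibblet", method, source, result, parameters_hash, parameters_references]
    return "|".join(map(str, parts))


# Hypothetical values for illustration only.
first = nibblet_id("example_nibble", "src", ["res"], "hash-1", ["src", "other"])
second = nibblet_id("example_nibble", "src", ["res"], "hash-2", ["src", "other"])
```

The two ids share the `Origin|nibblet|example_nibble|src|` prefix but differ in the hash segment, so they refer to distinct origins.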