feat: search edge identities by trait (WIP) #4493
I've done a quick performance test of this implementation and I don't think it's usable as it stands. The test created 5M identities (1M in each of 5 environments), each with 5 traits, and searched for one specific trait key/value combination. On my local machine, this query took 20s to return. Here is the (crude) test I used:

```python
import logging
import random
import time
import uuid

from flag_engine.identities.traits.models import TraitModel

from environments.models import Environment
from projects.models import Project

# EdgeIdentityMeta is the model added in this PR; import it from the module the PR adds.

logger = logging.getLogger(__name__)


def test_search_edge_api_identity_meta(project: Project) -> None:
    # Given
    num_environments = 5
    num_identities = 1_000_000
    num_traits_per_identity = 5

    for _environment in (
        Environment.objects.create(name=f"environment_{i}", project=project)
        for i in range(1, num_environments + 1)
    ):
        to_create = []
        start = time.time()
        for j in range(1, num_identities + 1):
            edge_identity_meta = EdgeIdentityMeta(
                identifier=f"identity_{j}",
                edge_identity_uuid=str(uuid.uuid4()),
                environment=_environment,
            )
            # Keys embed the environment name and identity number, so every
            # key/value pair is unique across the whole data set.
            traits = [
                TraitModel(
                    trait_key=f"{_environment.name}_trait_{j}_{k}",
                    trait_value=f"value_{j}_{k}",
                )
                for k in range(1, num_traits_per_identity + 1)
            ]
            edge_identity_meta.update_searchable_traits(traits)
            to_create.append(edge_identity_meta)
        EdgeIdentityMeta.objects.bulk_create(to_create)
        logger.info(
            "Created EdgeIdentityMeta objects for environment %s in %s seconds",
            _environment.name,
            time.time() - start,
        )

    # Pick a random environment, identity and trait to search for.
    environment_id_to_search_for = random.choice(
        list(Environment.objects.values_list("id", flat=True))
    )
    environment_name_to_search_for = Environment.objects.get(
        id=environment_id_to_search_for
    ).name
    identity_id_to_search_for = random.randint(1, num_identities)
    trait_id_to_search_for = random.randint(1, num_traits_per_identity)
    # Build the key exactly as above: the environment name already starts
    # with "environment_", so it must not be prefixed a second time.
    trait_key_to_search_for = (
        f"{environment_name_to_search_for}_trait_"
        f"{identity_id_to_search_for}_{trait_id_to_search_for}"
    )
    trait_value_to_search_for = f"value_{identity_id_to_search_for}_{trait_id_to_search_for}"
    search_string = (
        f"{trait_key_to_search_for}"
        f"{EdgeIdentityMeta._EQUALITY_CHARS}"
        f"{trait_value_to_search_for}"
    )

    # When
    start = time.time()
    result_exists = EdgeIdentityMeta.objects.filter(
        environment_id=environment_id_to_search_for,
        searchable_traits__search=search_string,
    ).exists()
    end = time.time()

    # Then
    assert result_exists, search_string
    query_time_ms = (end - start) * 1000
    logger.info("query time = %sms", query_time_ms)
    assert query_time_ms < 1
```
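The test above times the query with `time.time()`, which follows the system wall clock, is not monotonic, and may have lower resolution than `time.perf_counter()`. For a budget as tight as `query_time_ms < 1`, a minimal sketch of a steadier measurement uses `time.perf_counter()` and takes the median over several runs. The helper names `Timer`, `timed` and `median_query_time_ms` are hypothetical and not part of this PR:

```python
import statistics
import time
from contextlib import contextmanager
from typing import Callable, Iterator


class Timer:
    """Holds the elapsed time of a timed block, in milliseconds."""

    def __init__(self) -> None:
        self.elapsed_ms: float = 0.0


@contextmanager
def timed() -> Iterator[Timer]:
    # perf_counter() is monotonic and high-resolution, so it is not affected
    # by wall-clock adjustments the way time.time() can be.
    timer = Timer()
    start = time.perf_counter()
    try:
        yield timer
    finally:
        # Only set once the block has finished.
        timer.elapsed_ms = (time.perf_counter() - start) * 1000


def median_query_time_ms(run_query: Callable[[], object], repeats: int = 5) -> float:
    """Run `run_query` `repeats` times and return the median duration in ms.

    The median is less sensitive than a single run to one-off noise, such
    as a slower first execution against a cold cache.
    """
    durations = []
    for _ in range(repeats):
        with timed() as timer:
            run_query()
        durations.append(timer.elapsed_ms)
    return statistics.median(durations)
```

In the test, the `# When` block could pass `lambda: EdgeIdentityMeta.objects.filter(...).exists()` to `median_query_time_ms`, keeping the separate `assert result_exists` check. Repeating the query also shows whether the first run behaves differently from later ones.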
By adding the index in this commit, I've brought the query time down to ~500ms, but even that still seems slow for this query.
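The index itself isn't reproduced in this thread. Assuming it is a PostgreSQL GIN index on `searchable_traits` (GIN, the Generalized Inverted Index, is the index type PostgreSQL documents for full-text search), the speed-up comes from it being an inverted index: it maps each token to the rows that contain it, so a lookup no longer has to examine every row. A minimal pure-Python sketch of that idea follows. The `=` separator, `make_token` and the other function names are hypothetical, and the sketch ignores how PostgreSQL's text-search parser actually tokenizes the string or how the PR's `_EQUALITY_CHARS` encoding works:

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical separator; the PR's EdgeIdentityMeta._EQUALITY_CHARS may differ.
EQUALITY_CHARS = "="

Rows = Dict[int, List[Tuple[str, str]]]  # row id -> list of (trait_key, trait_value)


def make_token(key: str, value: str) -> str:
    """Encode one trait as a single searchable token."""
    return f"{key}{EQUALITY_CHARS}{value}"


def scan_search(rows: Rows, key: str, value: str) -> Set[int]:
    """No index: examine every trait of every row on each query."""
    target = make_token(key, value)
    return {
        row_id
        for row_id, traits in rows.items()
        if any(make_token(k, v) == target for k, v in traits)
    }


def build_inverted_index(rows: Rows) -> Dict[str, Set[int]]:
    """Map each token to the ids of the rows containing it (built once)."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for row_id, traits in rows.items():
        for k, v in traits:
            index[make_token(k, v)].add(row_id)
    return index


def index_search(index: Dict[str, Set[int]], key: str, value: str) -> Set[int]:
    """Indexed: a single hash lookup, regardless of how many rows exist."""
    # .get() avoids inserting empty entries into the defaultdict.
    return set(index.get(make_token(key, value), set()))
```

The trade-off mirrors the database case: the index costs time and space to build and maintain on every write, in exchange for lookups that don't grow with the number of identities.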
Codecov Report
Attention: Patch coverage is

| Coverage diff | main | #4493 | +/- |
|---|---|---|---|
| Coverage | 96.91% | 96.93% | +0.02% |
| Files | 1176 | 1180 | +4 |
| Lines | 39349 | 39526 | +177 |
| Hits | 38135 | 38316 | +181 |
| Misses | 1214 | 1210 | -4 |
Changes
This is a WIP implementation of #4016
How did you test this code?
N/A (so far)