Add MongoDB indices for analysis fields #2079

Open · wants to merge 1 commit into base: master
44 changes: 32 additions & 12 deletions lib/cuckoo/core/startup.py
@@ -92,20 +92,40 @@ def check_webgui_mongo():
client = connect_to_mongo()
if not client:
sys.exit(
"You have enabled webgui but mongo isn't working, see mongodb manual for correct installation and configuration\nrun `systemctl status mongodb` for more info"
"You have enabled webgui but mongo isn't working, see mongodb "
"manual for correct installation and configuration\n"
"run `systemctl status mongodb` for more info"
)

# Create separate index for certain fields to enable efficient keyword
# searches for large amounts of data in the 'analysis' collection.
# NOTE: Silently ignores the creation if the index already exists.
items = [
"info.id",
"info.parent_sample.sha256",
"target.file.sha256",
"dropped.sha256",
"CAPE.payloads.sha256",
"procdump.sha256",
"procmemory.sha256",
"target.file.extracted_files.sha256",
"dropped.extracted_files.sha256",
"CAPE.payloads.extracted_files.sha256",
"procdump.extracted_files.sha256",
"procmemory.extracted_files.sha256",
"target.file.file_ref",
"dropped.file_ref",
"CAPE.payloads.file_ref",
"procdump.file_ref",
"procmemory.file_ref",
]
for item in items:
mongo_create_index(
Contributor:
This will take a very long time for a lot of CAPE installations - terabyte scale is common. Probably better to do this in the utils somewhere and emit a warning if the indices aren't found.

Contributor (Author):
Where exactly do you suggest we should put this?

Contributor @nbargnesi (May 14, 2024):
I'd suggest as a new module in utils, mongodb_indices or something appropriately named.

In the startup module you can check the indices are there and emit a warning if they're missing.

Note too the difference between doing it in startup and utils. Putting it in utils means we can add indices out-of-band, while CAPE continues to run. Great way of making things faster incrementally without bringing CAPE down by touching startup.

collection="analysis",
index=item,
name=f"{item}_1"
)

# Create an index based on the info.id dict key. Increases overall scalability
# with large amounts of data.
# Note: Silently ignores the creation if the index already exists.
mongo_create_index("analysis", "info.id", name="info.id_1")
# mongo_create_index([("target.file.sha256", TEXT)], name="target_sha256")
# We perform a lot of SHA256 hash lookups, so we need this index
# mongo_create_index(
# "analysis",
# [("target.file.sha256", TEXT), ("dropped.sha256", TEXT), ("procdump.sha256", TEXT), ("CAPE.payloads.sha256", TEXT)],
# name="ALL_SHA256",
# )
mongo_create_index("files", [("_task_ids", 1)])

elif repconf.elasticsearchdb.enabled:
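The reviewer's suggestion above, checking that the expected indexes exist at startup and warning instead of building them inline, could be sketched roughly as follows. This is a hypothetical helper, not code from the PR: the names `missing_index_names` and `warn_if_missing` are invented for illustration, the field list is trimmed to a few entries from the PR's `items` list, and the only pymongo API assumed is `Collection.index_information()`, which maps index names to their details.

```python
# Hypothetical sketch of the out-of-band approach: detect missing indexes
# on the 'analysis' collection and warn, leaving the (potentially very slow)
# index builds to be run separately while CAPE keeps serving requests.
import logging

log = logging.getLogger(__name__)

# A few of the fields from the PR's `items` list; the full list would go here.
ANALYSIS_INDEX_FIELDS = [
    "info.id",
    "target.file.sha256",
    "dropped.sha256",
    "CAPE.payloads.sha256",
]


def missing_index_names(index_info, fields=ANALYSIS_INDEX_FIELDS):
    """Return the expected single-field index names (``<field>_1``) that are
    absent from ``index_info``, the dict returned by pymongo's
    ``Collection.index_information()``."""
    expected = {f"{field}_1" for field in fields}
    return sorted(expected - set(index_info))


def warn_if_missing(collection):
    """Log a warning for each expected index the collection lacks."""
    missing = missing_index_names(collection.index_information())
    for name in missing:
        log.warning(
            "analysis collection is missing index %s; "
            "create it out-of-band while CAPE keeps running", name,
        )
    return missing
```

Because `warn_if_missing` never calls `create_index`, startup stays fast on terabyte-scale installations; the actual builds can then be kicked off from a utils module at a convenient time.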