-
Notifications
You must be signed in to change notification settings - Fork 450
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CDR][GCP] Add related.entity to GCP Audit Logs #11762
base: main
Are you sure you want to change the base?
Conversation
🚀 Benchmarks reportPackage
|
Data stream | Previous EPS | New EPS | Diff (%) | Result |
---|---|---|---|---|
audit |
1862.2 | 1400.56 | -461.64 (-24.79%) | 💔 |
compute |
18518.52 | 14285.71 | -4232.81 (-22.86%) | 💔 |
firewall |
4424.78 | 2680.97 | -1743.81 (-39.41%) | 💔 |
loadbalancing_logs |
4347.83 | 2475.25 | -1872.58 (-43.07%) | 💔 |
vpcflow |
3906.25 | 3086.42 | -819.83 (-20.99%) | 💔 |
To see the full report comment with /test benchmark fullreport
dc812e8
to
88a3007
Compare
@tinnytintin10 I'd appreciate you taking a look at test cases. You can find examples of how |
Pinging @elastic/security-service-integrations (Team:Security-Service Integrations) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a proposed commit message.
packages/gcp/data_stream/audit/elasticsearch/ingest_pipeline/default.yml
Outdated
Show resolved
Hide resolved
packages/gcp/data_stream/audit/elasticsearch/ingest_pipeline/default.yml
Outdated
Show resolved
Hide resolved
} | ||
|
||
void addField(Set entities, String fieldName) { | ||
addValue(entities, field(fieldName).get(null)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this doing? I think this needs explanation for maintainers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Essentially this is syntax sugar for adding a value to a set. There a few layers to it because of what we use.
- The most basic is
addValue
, where we add one specific value to the set. That's is the foundation because it checks for empty and null values, removing the need to clean the set afterwards. AddValues
is a layer on top, syntax sugar to not call multipleaddValue
, but oneaddValues
with all the values you know you want to add.- But then, a very common situation is to extract a value from a field inside of the document. That what the
addField
is there for. It extracts a field value withfield(fieldName).get(null)
and adds its value usingaddValue
. - And on the same reasoning of
addValues
, anaddFields
was added.
With that said, I would appreciate feedback on how would you approach this! I first did it because it felt better to code and read throughout the script. Specially AWS one that is huge.
But if you believe we should add comments, might be a smell of unclarity and unreadable code. How would you approach it? Do you have suggestions of better naming?
I personally am a bit against adding a comment here (just a bit, if we need we do it). This is a function made to make life easier. If we need to add comments to make it clear, might not be worth to have it at all. Specially in a script language inside of a yaml without proper highlighting. So I rather have more readable code if that's possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The patterns that are used here are not common in the remainder of the integrations code base. This means that people need to spend time looking at the code to understand what it's doing. The absence of commentary exacerbates this, as does the indirection of arising from the helpers. I'd be happy for the code to be replaces with explicit iteration rather than the streaming iterator approach that's used here. But I would also like to have an explanation of what the intention of the code is.
void addValue(Set entities, def value) {
if (value != null && value != "") {
entities.add(value);
}
}
boolean isKubernetes = false;
if (ctx.json?.resource?.type != null) {
String typ = ctx.json.resource.type;
isKubernetes = (typ == "k8s_cluster" || typ == "gke_cluster" || typ == "kubernetes");
}
// Using tree set to ensure a sorting is kept (testing purposes)
TreeSet entities = new TreeSet();
addValue(entities, ctx.json?.protoPayload?.request?.parent);
if (!isKubernetes) {
addValue(entities, ctx.json?.protoPayload?.resourceName);
addValue(entities, ctx.json?.protoPayload?.response?.user);
}
HashMap authInfo = ctx.json?.protoPayload?.authenticationInfo ?: new HashMap();
if (!isKubernetes) {
addValue(entities, authInfo.principalEmail);
}
addValue(entities, authInfo.principalSubject);
addValue(entities, authInfo.serviceAccountKeyName);
// I think this section is what was intended; the code that's there doesn't make sense,
// but this looks like it does what someone might want it to do based on the vibe of
// the original.
if (authInfo.serviceAccountDelegationInfo instanceof List) {
for (def i: authInfo.serviceAccountDelegationInfo) {
addValue(entities, i.principalSubject);
addValue(entities, i.firstPartyPrincipal?.principalEmail);
addValue(entities, i.thirdPartyPrincipal?.principalEmail);
}
}
String serviceName = ctx.json?.protoPayload?.serviceName ?: '';
if (serviceName == "compute.googleapis.com") {
if (ctx.json?.protoPayload?.request?.networkInterfaces instanceof List) {
for (def e: ctx.json.protoPayload.request.networkInterfaces) {
addValue(entities, e.network);
}
}
if (ctx.json?.protoPayload?.request?.serviceAccounts instanceof List) {
for (def e: ctx.json.protoPayload.request.serviceAccounts) {
addValue(entities, e.email);
}
}
if (ctx.json?.protoPayload?.request?.disks instanceof List) {
for (def e: ctx.json.protoPayload.request.disks) {
addValue(entities, e.source);
}
}
} else if (serviceName == "cloudresourcemanager.googleapis.com") {
if (ctx.json?.protoPayload?.request?.policy?.bindings instanceof List) {
for (def e: ctx.json.protoPayload.request.policy.bindings) {
addValue(entities, e.role);
for (def m: e.members) {
addValue(entities, m);
}
}
}
if (ctx.json?.protoPayload?.response?.bindings instanceof List) {
for (def e: ctx.json.protoPayload.response.bindings) {
addValue(entities, e.role);
for (def m: e.members) {
addValue(entities, m);
}
}
}
} else if (serviceName == "iamcredentials.googleapis.com") {
if (ctx.json?.protoPayload?.metadata?.identityDelegationChain instanceof List) {
for (def e: ctx.json.protoPayload.metadata.identityDelegationChain) {
addValue(entities, e);
}
}
}
ctx.related = ctx.related ?: [:];
ctx.related.entity = entities;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't disagree with you. But this is really taste based, right. Previously, the code was on loops and add values. I received the comment suggesting to add another function to sugar coat it.
If we do it, we gotta update this PR, which is a lot of code, and refactoring painless is very painful.
Could you please double check with your team what is your preference?
I personally don't mind too much either. I do believe that consistency is the key here, and both have its readability ok.
Let us know what is the preferred way to go from your team's standpoint please.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@efd6 I appreciate your concerns about clarity of code's purpose. Thank you for including a neat rework with iterative approach.
It would be great if we could establish what is the preferred approach for this and future PRs. Given Cloud Security's roadmap and plans to enrich related
, target
, and source
for Cloud Provider logs, it would help avoid us doing the guesswork and introducing inconsistent code. It would also help answer your questions and alleviate any concerns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's partially taste, but the rest of the integration does not use the fluent™ style and the additional code that was required to make it work (the bool returns for example) indicate that the code is fighting against the intention. The previous code was also allocating in some paths in order get the behaviour that an if
statement gives. The fact that a bug was hiding in the original code and was not apparent until the rewrite further argues against it.
Sugar is good in limited quantities, when it becomes a complete glaze, you end up with Dunkin Donuts, and I for one don't want that. With for
loops and if
statements, we have a commonly understood language, the fluent™ approach is another set of things that need to be understood.
If we do it, we gotta update this PR
That PR is in a package and data stream that is owned by another team, so their opinions matter more. I will note though that the language features that are used in this PR and that one are used nowhere else in the repo except in code that was introduced in #11115. There are uses of stream()
in some places for filtering, but largely not in cases where a for
loop would suffice as is the case here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for taking the time to refactor the code. I've applied the changes and fixed the tests where new code changed related entities' order.
boolean addedAll = true; | ||
for (String value : values) { | ||
addedAll = addedAll && addValue(entities, value); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will return true for the case that values.length
is zero. This does not seem right to me.
However, the function is never used in an expression context. Why is it returning anything at all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once I implemented this on the previous PR, it was initially void. But once I called addValues
(or addValue
) inside of a stream().forEach()
the loop was not scanning all the list members unless I returned true.
That, however doesn't make any sense, given the interface of stream (docs)
We could test more, maybe something I did was wrong, and not the return.
But I remember at the time being just wanting to solve the issue and thinking "Well, I don't really know the underlying implementation of painless, could be that lambdas have a different behaviour than plain java".
If it turns out I was wrong, I can fix here too: #11245
packages/gcp/data_stream/audit/elasticsearch/ingest_pipeline/default.yml
Outdated
Show resolved
Hide resolved
packages/gcp/data_stream/audit/elasticsearch/ingest_pipeline/default.yml
Outdated
Show resolved
Hide resolved
packages/gcp/data_stream/audit/elasticsearch/ingest_pipeline/default.yml
Outdated
Show resolved
Hide resolved
packages/gcp/data_stream/audit/elasticsearch/ingest_pipeline/default.yml
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good stuff! Left some improvement comments and typo, but overall is looking good!
💚 Build Succeeded
History
cc @kubasobon |
Quality Gate passedIssues Measures |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but please wait for at least one other since I'm essentially reviewing my own code.
Thank you @efd6. @elastic/obs-infraobs-integrations & @elastic/obs-ds-hosted-services, I'm kindly requesting a review. |
Proposed commit message
This PR introduces an ECS field called
related.entity
containing IDs of GCP resources tied to any audit log event.Closes https://github.com/elastic/security-team/issues/10392
Background and reasoning
Quoting the original explanation @romulets gave in #11115:
tldr; Adding
related.entity
to cloudtrail is part of the initial steps of CDR (epic overview). That enables our customers to better correlate events based on entities and enhances the investigation workflows for the Cloud Security use case.ECS
related.entity
PRBackground
Elastic Cloud Security Team has been focusing, this past year, on Cloud Detection and Response (CDR). One of the first steps towards the CDR vision is to enhance investigation workflows for the Cloud Security use-case in SIEM.
As part of enhancing investigation workflows it's necessary to be able to correlate events and entities. Meaning, if an alert is triggered on the ec2 instance
i-000000000
, it is of great value to easily be able to search all the events related to that entity, across multiple indices, with one query. Therefore we are working on extracting entities and enabling them to be correlated.Why
related.entity
With this background, we've researched a few options on what would be the best approach to enable such feature (discussions https://github.com/elastic/security-team/issues/10026 and https://github.com/elastic/security-team/issues/9798, and outcomes https://github.com/elastic/security-team/issues/10152), and arrived at the ecs field
related
.Based on the
related
description:To add a broad
related.entity
field that can hold any needed identifier to pivot data on seems to be well fitted. This would enable customers to simply runrelated.entity: "i-000000000"
and get all the hits to that specific cloud resource.What is an
entity
?An "entity" in our context refers to any discrete component within an IT environment that can be uniquely identified and monitored. This broad term encompasses both managed and unmanaged elements.
The term "entity" is broader than the current set of available fields under
related
. Althoughip
,user
andhosts
can be identities, there is a lack of space to represent messaging queues, load balancers, storage systems, databases and others. Therefore the proposal to add a new field.