Only run DlsFlsValveImpl.invoke on indices requests #4937

Draft
wants to merge 5 commits into
base: main
from

cwperks
Member

@cwperks cwperks commented Nov 25, 2024

Description

Companion PR in ISM: opensearch-project/index-management#1311

Fixes an issue seen in ISM's tests after the merge of #4380.

Failing test example: https://github.com/opensearch-project/index-management/actions/runs/12013427116/job/33486922892?pr=1310

Failing test error message:

[2024-11-25T12:58:24,055][DEBUG][o.o.i.t.a.g.TransportGetTransformAction] [smoketestnode] User and roles string from thread context: john|helpdesk_staff|own_index,helpdesk_role
[2024-11-25T12:58:24,357][DEBUG][o.o.i.t.a.g.TransportGetTransformAction] [smoketestnode] User and roles string from thread context: testUser|test_role_backend|own_index,test_role
[2024-11-25T12:58:24,365][INFO ][o.o.s.p.PrivilegesEvaluator] [smoketestnode] No cluster-level perm match for User [name=testUser, backend_roles=[test_role_backend], requestedTenant=null] Resolved [aliases=[*], allIndices=[*], types=[*], originalRequested=[*], remoteIndices=[]] [Action [cluster:admin/opendistro/transform/stop]] [RolesChecked [own_index, test_role]]. No permissions for [cluster:admin/opendistro/transform/stop]
[2024-11-25T12:58:24,369][WARN ][r.suppressed             ] [smoketestnode] path: /_plugins/_transform/id_1/_stop, params: {transformID=id_1}
org.opensearch.OpenSearchSecurityException: Update is not supported when FLS or DLS or Fieldmasking is activated
	at org.opensearch.security.configuration.DlsFlsValveImpl.invoke(DlsFlsValveImpl.java:268) [opensearch-security-3.0.0.0-SNAPSHOT.jar:3.0.0.0-SNAPSHOT]
	at org.opensearch.security.filter.SecurityFilter.apply0(SecurityFilter.java:392) [opensearch-security-3.0.0.0-SNAPSHOT.jar:3.0.0.0-SNAPSHOT]
	at org.opensearch.security.filter.SecurityFilter.apply(SecurityFilter.java:166) [opensearch-security-3.0.0.0-SNAPSHOT.jar:3.0.0.0-SNAPSHOT]
	at org.opensearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:218) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.action.support.TransportAction.execute(TransportAction.java:190) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.action.support.TransportAction.execute(TransportAction.java:109) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.client.node.NodeClient.executeLocally(NodeClient.java:112) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.client.node.NodeClient.doExecute(NodeClient.java:99) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.client.support.AbstractClient.execute(AbstractClient.java:486) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.indexmanagement.transform.resthandler.RestStopTransformAction.prepareRequest$lambda$0(RestStopTransformAction.kt:41) [opensearch-index-management-3.0.0.0-SNAPSHOT.jar:3.0.0.0-SNAPSHOT]
	at org.opensearch.rest.BaseRestHandler.handleRequest(BaseRestHandler.java:128) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.security.filter.SecurityRestFilter$AuthczRestHandler.handleRequest(SecurityRestFilter.java:192) [opensearch-security-3.0.0.0-SNAPSHOT.jar:3.0.0.0-SNAPSHOT]
	at org.opensearch.rest.RestController.dispatchRequest(RestController.java:381) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.rest.RestController.tryAllHandlers(RestController.java:467) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
	at org.opensearch.rest.RestController.dispatchRequest(RestController.java:287) [opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]

ISM has requests such as StopTransformRequest that subclass UpdateRequest because, behind the scenes, an entry in the ISM system index is updated and the request must carry the ID of the transform. The request is a cluster request, meaning that it is authorized based on the action name alone, without being associated with any indices (the plugin itself updates its system index entry).

DlsFlsValveImpl.invoke is called on this request and fails because the request resolves to all indices, while the user calling the Stop Transform API only has access to a subset of indices (airlines-*).

This PR updates DlsFlsValveImpl.invoke so that it does not run when the action name is not prefixed with "indices:".
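As a rough illustration of the guard described above, the following self-contained sketch classifies action names by prefix. The class and method names (ActionPrefixGuard, shouldEvaluateDlsFls) are hypothetical and exist only for this example; they are not part of the security plugin. The two action names are taken from this PR's discussion.

```java
public final class ActionPrefixGuard {
    // Prefix shared by index-level action names, e.g. "indices:data/write/update".
    public static final String INDEX_ACTION_PREFIX = "indices:";

    private ActionPrefixGuard() {}

    /**
     * Hypothetical helper: returns true when DLS/FLS evaluation should run
     * for the given action name, i.e. only for index-level actions.
     */
    public static boolean shouldEvaluateDlsFls(String action) {
        return action != null && action.startsWith(INDEX_ACTION_PREFIX);
    }

    public static void main(String[] args) {
        String[] actions = {
            "cluster:admin/opendistro/transform/stop", // cluster action from the failing test
            "indices:data/write/update"                // index-level update action
        };
        for (String action : actions) {
            System.out.println(action + " -> evaluate DLS/FLS: " + shouldEvaluateDlsFls(action));
        }
    }
}
```

Under this sketch, the Stop Transform action is skipped because its name starts with "cluster:", regardless of which request class carries it.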

  • Category (Enhancement, New feature, Bug fix, Test fix, Refactoring, Maintenance, Documentation)

Bug fix

Issues Resolved

Resolves: opensearch-project/index-management#1305

Check List

  • New functionality includes testing
  • New functionality has been documented
  • New Roles/Permissions have a corresponding security dashboards plugin PR
  • API changes companion pull request created
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@cwperks
Member Author

cwperks commented Nov 25, 2024

FYI @nibix

Signed-off-by: Craig Perkins <[email protected]>
@@ -135,6 +137,10 @@ public DlsFlsValveImpl(
*/
@Override
public boolean invoke(PrivilegesEvaluationContext context, final ActionListener<?> listener) {
if (!isIndexPerm(context.getAction())) {
Collaborator

@nibix nibix Nov 25, 2024

I would guess that the valve code needs to be called at least for scroll requests as well; otherwise FLS and field masking might not be applied correctly.

Generally, as the issue seems to be of a very specific kind, I am wondering whether the fix should be equally specific.

Member Author

I just pushed a commit to check the action prefix alone. If the action prefix is not indices:, then DlsFlsValveImpl would be skipped.

Collaborator

@nibix nibix Nov 25, 2024

Still, I think the fault is rather on the ISM side: It defines an action which has a request which extends UpdateRequest and thus in turn extends IndicesRequest - while the type string is cluster:admin/opendistro/transform/stop, i.e. it indicates it to be a cluster request. That seems to be a contradiction in itself.

I am a bit surprised that this should only occur after the merge of #4380. At least this basic behavior of DlsFlsValveImpl did not change. If this is really related to #4380, we should find out why #4380 changes this behavior.

Member Author

Before #4380, it would be short-circuited here because the helpdesk_role has no restrictions. The new code does not use that data structure; instead it calls IndexNameExpressionResolver.getConcreteIndices and passes in *, because the resolved request shows * as you pointed out in #4937 (comment)

@nibix
Collaborator

nibix commented Nov 25, 2024

Two things are interesting IMHO:

  • [aliases=[*], allIndices=[*], types=[*], originalRequested=[*], remoteIndices=[]]: Why does the request resolve to * - all indices? Should it not be just its system index?
  • No permissions for [cluster:admin/opendistro/transform/stop]: Why does the DLS valve get called anyway after it was found that there are no permissions for cluster:admin/opendistro/transform/stop?

@cwperks
Member Author

cwperks commented Nov 25, 2024

> Two things are interesting IMHO:
>
>   • [aliases=[*], allIndices=[*], types=[*], originalRequested=[*], remoteIndices=[]]: Why does the request resolve to * - all indices? Should it not be just its system index?
>   • No permissions for [cluster:admin/opendistro/transform/stop]: Why does the DLS valve get called anyway after it was found that there are no permissions for cluster:admin/opendistro/transform/stop?

For the second one, that's actually because of the test itself: https://github.com/opensearch-project/index-management/blob/main/src/test/kotlin/org/opensearch/indexmanagement/TransformSecurityBehaviorIT.kt#L260-L261

The test calls the API first with a user that doesn't have permission and then again with a user that does.

This is where the helpdesk_role is created and assigned: https://github.com/opensearch-project/index-management/blob/main/src/test/kotlin/org/opensearch/indexmanagement/TransformSecurityBehaviorIT.kt#L44-L71

@nibix
Collaborator

nibix commented Nov 26, 2024

Ok, I start to understand now. Still, I think the best fix would be to remove the UpdateRequest super class from StopTransformRequest and StartTransformRequest, as it is semantically wrong.

If that's not feasible, one could change this line to if (context.getAction().equals(UpdateAction.NAME)):
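A minimal sketch of why the two checks behave differently, using simplified stand-in classes rather than the real OpenSearch and ISM types. The nested classes and the helper method names are hypothetical; the constant is assumed to mirror the value of UpdateAction.NAME.

```java
public final class UpdateCheckSketch {
    // Stand-in for org.opensearch.action.update.UpdateRequest (simplified, hypothetical).
    public static class UpdateRequest {}

    // Stand-in for ISM's StopTransformRequest, which subclasses UpdateRequest per this thread.
    public static class StopTransformRequest extends UpdateRequest {}

    // Assumed to mirror the value of UpdateAction.NAME.
    public static final String UPDATE_ACTION_NAME = "indices:data/write/update";

    private UpdateCheckSketch() {}

    // Type-based check: also matches every subclass of UpdateRequest.
    public static boolean blockedByTypeCheck(Object request) {
        return request instanceof UpdateRequest;
    }

    // Name-based check: matches only the genuine update action.
    public static boolean blockedByNameCheck(String action) {
        return UPDATE_ACTION_NAME.equals(action);
    }

    public static void main(String[] args) {
        Object stop = new StopTransformRequest();
        String stopAction = "cluster:admin/opendistro/transform/stop";
        System.out.println("type check blocks stop transform: " + blockedByTypeCheck(stop));
        System.out.println("name check blocks stop transform: " + blockedByNameCheck(stopAction));
    }
}
```

The type-based check catches the ISM request only because of its superclass, whereas the name-based check ignores the class hierarchy and looks solely at the action being executed.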

@cwperks
Member Author

cwperks commented Nov 26, 2024

> Ok, I start to understand now. Still, I think the best fix would be to remove the UpdateRequest super class from StopTransformRequest and StartTransformRequest, as it is semantically wrong.
>
> If that's not feasible, one could change this line to if (context.getAction().equals(UpdateAction.NAME)):

Looks like this pattern is prevalent in the ISM codebase: StartRollupRequest, StopRollupRequest, StartSMRequest, StopSMRequest, StartTransformRequest and StopTransformRequest all use it.

Code search: https://github.com/search?q=repo%3Aopensearch-project%2Findex-management%20%22%3A%20UpdateRequest%22&type=code

@cwperks
Member Author

cwperks commented Nov 27, 2024

@nibix There's another issue in ISM tests w/ security around cluster state updates:

  1> [2024-11-27T07:31:39,126][INFO ][o.o.i.i.r.RestRetryFailedManagedIndexActionIT] [test index not failed] after test
  2> REPRODUCE WITH: ./gradlew ':integTest' --tests "org.opensearch.indexmanagement.indexstatemanagement.resthandler.RestRetryFailedManagedIndexActionIT.test index not failed" -Dtests.seed=5DCAF5AB70EDDFFC -Dtests.security.manager=false -Dtests.locale=lg-Latn-UG -Dtests.timezone=Pacific/Johnston -Druntime.java=21
  2> java.lang.AssertionError: Still active threadpools in cluster: {rejected=0, node_name=smoketestnode, name=generic, active=1, queue=0}

When dumping out hot threads I see this:

> curl -XGET https://admin:myStrongPassword123\!@localhost:9200/_nodes/hot_threads -k
::: {smoketestnode}{LA7SwEzSSPiEB8vh0qAS4Q}{zlZj--YNQ8SPr-vFy8H4Cg}{192.168.1.152}{192.168.1.152:9300}{dimr}{shard_indexing_pressure_enabled=true}
   Hot threads at 2024-11-27T17:36:11.911Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:

    0.0% (201micros out of 500ms) cpu usage by thread 'opensearch[smoketestnode][generic][T#7]'
     10/10 snapshots sharing following 11 elements
       java.base/java.lang.Thread.sleep0(Native Method)
       java.base/java.lang.Thread.sleep(Thread.java:509)
       org.opensearch.security.privileges.ClusterStateMetadataDependentPrivileges.lambda$updateClusterStateMetadataAsync$0(ClusterStateMetadataDependentPrivileges.java:52)
       org.opensearch.security.privileges.ClusterStateMetadataDependentPrivileges$$Lambda/0x000000c801de76a0.run(Unknown Source)
       java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
       java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
       app//org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:932)
       java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
       java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
       java.base/java.lang.Thread.runWith(Thread.java:1596)
       java.base/java.lang.Thread.run(Thread.java:1583)

During the tests it's failing due to active threads in the generic threadpool. ISM tests frequently perform index operations such as creating indices and deleting all indices in the cluster.

Edit: At a certain point the currently used metadata version is getting reset to -1

[2024-11-27T13:05:01,853][INFO ][stdout                   ] [smoketestnode] ClusterStateMetadataDependentPrivileges.this.getCurrentlyUsedMetadataVersion(): 35
[2024-11-27T13:05:01,948][INFO ][stdout                   ] [smoketestnode] Thread ID: opensearch[smoketestnode][generic][T#2]
[2024-11-27T13:05:01,948][INFO ][stdout                   ] [smoketestnode] metadata.version(): 35
[2024-11-27T13:05:01,948][INFO ][stdout                   ] [smoketestnode] ClusterStateMetadataDependentPrivileges.this.getCurrentlyUsedMetadataVersion(): -1

Example test to run:

./gradlew integTest -Dtests.rest.cluster=localhost:9200 -Dtests.cluster=localhost:9200 -Dtests.clustername=docker-cluster -Dsecurity=true -Dhttps=true -Duser=admin -Dpassword=myStrongPassword123\! -i -x :ktlint --tests "org.opensearch.indexmanagement.indexstatemanagement.resthandler.RestRetryFailedManagedIndexActionIT.test index not failed"

Signed-off-by: Craig Perkins <[email protected]>
Signed-off-by: Craig Perkins <[email protected]>

codecov bot commented Nov 27, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 71.43%. Comparing base (a8447cc) to head (1369726).
Report is 1 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #4937      +/-   ##
==========================================
- Coverage   71.46%   71.43%   -0.03%     
==========================================
  Files         334      334              
  Lines       22517    22550      +33     
  Branches     3586     3589       +3     
==========================================
+ Hits        16091    16108      +17     
- Misses       4634     4648      +14     
- Partials     1792     1794       +2     
Files with missing lines Coverage Δ
...search/security/configuration/DlsFlsValveImpl.java 63.79% <100.00%> (-0.25%) ⬇️
...urity/privileges/dlsfls/DlsFlsProcessedConfig.java 100.00% <100.00%> (ø)

... and 10 files with indirect coverage changes

@cwperks
Member Author

cwperks commented Nov 27, 2024

There's one other flavor of failure too:

Example test:

./gradlew integTest -Dtests.rest.cluster=localhost:9200 -Dtests.cluster=localhost:9200 -Dtests.clustername=docker-cluster -Dsecurity=true -Dhttps=true -Duser=admin -Dpassword=myStrongPassword123\! -i -x :ktlint --tests "RestRemovePolicyActionIT.test remove policy on read only index update auto_manage setting"

It fails on this line: https://github.com/opensearch-project/index-management/blob/main/src/test/kotlin/org/opensearch/indexmanagement/indexstatemanagement/resthandler/RestRemovePolicyActionIT.kt#L203-L207

The response is an InternalServerError and it's making the test exit early.

method [POST], host [https://localhost:9200], URI [/_plugins/_ism/remove/*index], status line [HTTP/2.0 500 Internal Server Error]
    {"error":{"root_cause":[{"type":"index_management_exception","reason":"Failed to remove policy because ISM auto_manage setting update requests failed with exception:"}],"type":"index_management_exception","reason":"Failed to remove policy because ISM auto_manage setting update requests failed with exception:","caused_by":{"type":"exception","reason":"java.lang.Exception: Failed to remove policy because ISM auto_manage setting update requests failed with exception:"}},"status":500}

Edit: This last error may be caused by having

if (!context.getAction().startsWith("indices:")) {
    return true;
}

at the top of DlsFlsValveImpl.invoke

Signed-off-by: Craig Perkins <[email protected]>
@@ -264,7 +265,7 @@ public boolean invoke(PrivilegesEvaluationContext context, final ActionListener<
}
}

-if (request instanceof UpdateRequest) {
+if (UpdateAction.NAME.equals(context.getAction())) {
Member Author

This change wouldn't be needed if opensearch-project/index-management#1311 is merged

@nibix
Collaborator

nibix commented Dec 2, 2024

@cwperks

> There's another issue in ISM tests w/ security around cluster state updates:

Interesting. I tried to find more context in the logs linked at opensearch-project/index-management#1305 but could not really find the failure. Where did you see it?

@nibix
Collaborator

nibix commented Dec 2, 2024

Good catch regarding 9de9c24 ... that might be already fixing the thread pool issue?

@cwperks
Member Author

cwperks commented Dec 2, 2024

> @cwperks
>
> > There's another issue in ISM tests w/ security around cluster state updates:
>
> Interesting. I tried to find more context in the logs linked at opensearch-project/index-management#1305 but could not really find the failure. Where did you see it?

After running the test in the comment

./gradlew integTest -Dtests.rest.cluster=localhost:9200 -Dtests.cluster=localhost:9200 -Dtests.clustername=docker-cluster -Dsecurity=true -Dhttps=true -Duser=admin -Dpassword=myStrongPassword123\! -i -x :ktlint --tests "org.opensearch.indexmanagement.indexstatemanagement.resthandler.RestRetryFailedManagedIndexActionIT.test index not failed"

you will see a thread remaining in the generic threadpool when calling GET _cat/thread_pool?pretty (documentation). If you try to kill the OpenSearch node, it will hang because there is still an active thread in the cluster. When there was an active thread in the generic threadpool, I called /_nodes/hot_threads to understand where it was getting stuck.

Development

Successfully merging this pull request may close these issues.

[AUTOCUT] Integration Test Failed for index-management-2.19.0
2 participants