Access denied to clusters that don't have Unity Catalog enabled #175

Open
sw33zy opened this issue Oct 23, 2024 · 15 comments

sw33zy commented Oct 23, 2024

Hello,

I was following the CI/CD setup guide: I set up the service principals, assigned them to each workspace, and set up the secrets on the repo. I am getting the error below in the CI tests and can't work out why. Could it be a permission issue? The service principals were created using an adaptation of https://github.com/databricks/terraform-databricks-mlops-azure-infrastructure-with-sp-creation

Run databricks bundle validate -t prod > ../validate_output.txt
Error: unexpected error handling request: invalid character 'I' looking for beginning of value. This is likely a bug in the Databricks SDK for Go or the underlying REST API. Please report this issue with the following debugging information to the SDK issue tracker at https://github.com/databricks/databricks-sdk-go/issues. Request log:

GET /api/2.0/preview/scim/v2/Me
> * Host: 
> * Accept: application/json
> * Authorization: REDACTED
> * Traceparent: 00-7075d2979495e648a7ccd4406c563630-6ed9d4deea375c14-01
> * User-Agent: cli/0.221.0 databricks-sdk-go/0.42.0 go/1.21.10 os/linux cmd/bundle_validate auth/azure-client-secret cicd/github
> * X-Databricks-Azure-Sp-Management-Token: ***
< HTTP/2.0 403 Forbidden
< * Content-Length: 20
< * Content-Type: text/html; charset=utf-8
< * Date: Wed, 23 Oct 2024 09:06:27 GMT
< * Server: databricks
< * X-Request-Id: dfd848c3-f791-40f7-82e6-ca8e576f4219
< Invalid access token
arpitjasa-db (Collaborator) commented:

@sw33zy can you try using the PAT yourself (via the REST API, for example)? That way we can isolate whether the issue is with the PAT or the CLI.
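For example, something along these lines (the workspace URL is a placeholder; use the same token your CI injects). A valid token should come back as a JSON SCIM user, while the failing case returns the 20-byte plain-text body Invalid access token — which is exactly the invalid character 'I' the Go SDK is choking on:

# Hypothetical workspace URL; substitute your own and the token from CI.
curl -H "Authorization: Bearer $DATABRICKS_TOKEN" \
  "https://adb-1234567890123456.7.azuredatabricks.net/api/2.0/preview/scim/v2/Me"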

sw33zy (Author) commented Oct 24, 2024

@arpitjasa-db here is what I did. First I created a Microsoft Entra ID access token:

curl -X POST -H 'Content-Type: application/x-www-form-urlencoded' \
https://login.microsoftonline.com/<tenant-id>/oauth2/v2.0/token \
-d 'client_id=<client-id>' \
-d 'grant_type=client_credentials' \
-d 'scope=2ff814a6-3304-4ab8-85cb-cd0e6f879c1d%2F.default' \
-d 'client_secret=<client-secret>'

I added a new profile with this token and then ran databricks clusters list -p <profile-name-that-references-azure-ad-access-token>.
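The profile was along these lines (the host is a placeholder; the token is the one returned by the curl call above):

# Append a profile to ~/.databrickscfg that uses the Entra ID token directly.
cat >> ~/.databrickscfg <<'EOF'
[azure-ad-sp]
host  = https://adb-1234567890123456.7.azuredatabricks.net
token = <azure-ad-access-token>
EOF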

The clusters list call returned:

GET /api/2.1/clusters/list
> * Host: 
> * Accept: application/json
> * Authorization: REDACTED
> * Traceparent: 00-f76158419d499361c4e265dffb9294e9-b95b19b64594917c-01
> * User-Agent: cli/0.230.0 databricks-sdk-go/0.48.0 go/1.22.7 os/linux cmd/clusters_list sdk-feature/pagination auth/pat
< HTTP/2.0 403 Forbidden
< * Access-Control-Allow-Headers: Authorization, X-Databricks-Azure-Workspace-Resource-Id, X-Databricks-Org-Id, Content-Type
< * Access-Control-Allow-Origin: *
< * Cache-Control: no-cache, no-store, must-revalidate
< * Content-Length: 20
< * Content-Type: text/html; charset=utf-8
< * Date: Thu, 24 Oct 2024 10:02:26 GMT
< * Expires: 0
< * Pragma: no-cache
< * Server: databricks
< * X-Request-Id: de7abb05-0612-4b4d-a89f-300bfa0d1fcb
< Invalid access token

Update: it is something related to my Terraform. I created a service principal manually and it worked. The only difference I can see is that the Terraform-created SPs did not have the User.Read API permission.

Is there any way you could update https://github.com/databricks/terraform-databricks-mlops-azure-infrastructure-with-sp-creation? I adapted it since it was a little outdated, but clearly I must be missing something.
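In the meantime, adding the permission by hand with the Azure CLI looks roughly like this (the two GUIDs are the well-known Microsoft Graph application ID and User.Read scope ID; the application ID is a placeholder):

# Add the Microsoft Graph User.Read delegated permission to the app registration.
az ad app permission add \
  --id <application-client-id> \
  --api 00000003-0000-0000-c000-000000000000 \
  --api-permissions e1fe6dd8-ba31-4d61-89e7-88639da4683d=Scope

# Grant consent so the permission takes effect.
az ad app permission grant \
  --id <application-client-id> \
  --api 00000003-0000-0000-c000-000000000000 \
  --scope User.Read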

arpitjasa-db (Collaborator) commented:

Yeah, we set those manually in the module. The module is just Terraform code, so you can adapt it yourself, but we plan to deprecate these modules anyway since we don't really support them anymore.

sw33zy (Author) commented Oct 28, 2024

OK, it was in fact something related to my Terraform: I was missing a managed application in the local directory for each service principal.
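(For reference, creating that managed application for an existing app registration is a one-liner with the Azure CLI; the ID below is a placeholder:)

# Create the service principal object ("managed application in local directory")
# for an existing app registration, identified by its application (client) ID.
az ad sp create --id <application-client-id>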

Now I'm getting a new error when running databricks bundle run model_training_job -t test within cibus_ai_factory-run-tests.yml

PERMISSION_DENIED: Access denied to clusters that don't have Unity Catalog enabled

This happens when logging the model. The cluster created to run the job has its access mode set to Custom by default, which does not have access to UC. I can't find how to change this access mode in the new_cluster job settings.

Edit:

Added

    data_security_mode: SINGLE_USER
    single_user_name: ${workspace.current_user.userName}

to the new_cluster settings but I'm getting a different error:

RestException: PERMISSION_DENIED: request not authorized
File <command-2648993314149653>, line 5
      2 input_example = X_train.iloc[[0]]
      4 # Log the trained model with MLflow
----> 5 mlflow.lightgbm.log_model(
      6     model, 
      7     artifact_path="lgb_model", 
      8     # The signature is automatically inferred from the input example and its predicted output.
      9     input_example=input_example,    
     10     registered_model_name=model_name
     11 )
     13 # The returned model URI is needed by the model deployment notebook.
     14 model_version = get_latest_model_version(model_name)
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-139e38f2-1f4a-4ba5-b3af-f75269017c1e/lib/python3.11/site-packages/mlflow/lightgbm/__init__.py:367, in log_model(lgb_model, artifact_path, conda_env, code_paths, registered_model_name, signature, input_example, await_registration_for, pip_requirements, extra_pip_requirements, metadata, **kwargs)
    278 @format_docstring(LOG_MODEL_PARAM_DOCS.format(package_name=FLAVOR_NAME))
    279 def log_model(
    280     lgb_model,
   (...)
    291     **kwargs,
    292 ):
    293     """
    294     Log a LightGBM model as an MLflow artifact for the current run.
    295 
   (...)
    365                     'model/requirements.txt']
    366     """
--> 367     return Model.log(
    368         artifact_path=artifact_path,
    369         flavor=mlflow.lightgbm,
    370         registered_model_name=registered_model_name,
    371         lgb_model=lgb_model,
    372         conda_env=conda_env,
    373         code_paths=code_paths,
    374         signature=signature,
    375         input_example=input_example,
    376         await_registration_for=await_registration_for,
    377         pip_requirements=pip_requirements,
    378         extra_pip_requirements=extra_pip_requirements,
    379         metadata=metadata,
    380         **kwargs,
    381     )
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-139e38f2-1f4a-4ba5-b3af-f75269017c1e/lib/python3.11/site-packages/mlflow/models/model.py:670, in Model.log(cls, artifact_path, flavor, registered_model_name, await_registration_for, metadata, run_id, **kwargs)
    668         _logger.debug("", exc_info=True)
    669     if registered_model_name is not None:
--> 670         mlflow.tracking._model_registry.fluent._register_model(
    671             f"runs:/{run_id}/{mlflow_model.artifact_path}",
    672             registered_model_name,
    673             await_registration_for=await_registration_for,
    674             local_model_path=local_path,
    675         )
    676 return mlflow_model.get_model_info()
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-139e38f2-1f4a-4ba5-b3af-f75269017c1e/lib/python3.11/site-packages/mlflow/tracking/_model_registry/fluent.py:112, in _register_model(model_uri, name, await_registration_for, tags, local_model_path)
    109     source = RunsArtifactRepository.get_underlying_uri(model_uri)
    110     (run_id, _) = RunsArtifactRepository.parse_runs_uri(model_uri)
--> 112 create_version_response = client._create_model_version(
    113     name=name,
    114     source=source,
    115     run_id=run_id,
    116     tags=tags,
    117     await_creation_for=await_registration_for,
    118     local_model_path=local_model_path,
    119 )
    120 eprint(
    121     f"Created version '{create_version_response.version}' of model "
    122     f"'{create_version_response.name}'."
    123 )
    124 return create_version_response
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-139e38f2-1f4a-4ba5-b3af-f75269017c1e/lib/python3.11/site-packages/mlflow/tracking/client.py:2714, in MlflowClient._create_model_version(self, name, source, run_id, tags, run_link, description, await_creation_for, local_model_path)
   2706     # NOTE: we can't easily delete the target temp location due to the async nature
   2707     # of the model version creation - printing to let the user know.
   2708     eprint(
   2709         f"=== Source model files were copied to {new_source}"
   2710         + " in the model registry workspace. You may want to delete the files once the"
   2711         + " model version is in 'READY' status. You can also find this location in the"
   2712         + " `source` field of the created model version. ==="
   2713     )
-> 2714 return self._get_registry_client().create_model_version(
   2715     name=name,
   2716     source=new_source,
   2717     run_id=run_id,
   2718     tags=tags,
   2719     run_link=run_link,
   2720     description=description,
   2721     await_creation_for=await_creation_for,
   2722     local_model_path=local_model_path,
   2723 )
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-139e38f2-1f4a-4ba5-b3af-f75269017c1e/lib/python3.11/site-packages/mlflow/tracking/_model_registry/client.py:215, in ModelRegistryClient.create_model_version(self, name, source, run_id, tags, run_link, description, await_creation_for, local_model_path)
    213 arg_names = _get_arg_names(self.store.create_model_version)
    214 if "local_model_path" in arg_names:
--> 215     mv = self.store.create_model_version(
    216         name,
    217         source,
    218         run_id,
    219         tags,
    220         run_link,
    221         description,
    222         local_model_path=local_model_path,
    223     )
    224 else:
    225     # Fall back to calling create_model_version without
    226     # local_model_path since old model registry store implementations may not
    227     # support the local_model_path argument.
    228     mv = self.store.create_model_version(name, source, run_id, tags, run_link, description)
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-139e38f2-1f4a-4ba5-b3af-f75269017c1e/lib/python3.11/site-packages/mlflow/store/_unity_catalog/registry/rest_store.py:731, in UcModelRegistryStore.create_model_version(self, name, source, run_id, tags, run_link, description, local_model_path)
    727 model_version = self._call_endpoint(
    728     CreateModelVersionRequest, req_body, extra_headers=extra_headers
    729 ).model_version
    730 version_number = model_version.version
--> 731 scoped_token = self._get_temporary_model_version_write_credentials(
    732     name=full_name, version=version_number
    733 )
    734 store = get_artifact_repo_from_storage_info(
    735     storage_location=model_version.storage_location, scoped_token=scoped_token
    736 )
    737 store.log_artifacts(local_dir=local_model_dir, artifact_path="")
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-139e38f2-1f4a-4ba5-b3af-f75269017c1e/lib/python3.11/site-packages/mlflow/store/_unity_catalog/registry/rest_store.py:490, in UcModelRegistryStore._get_temporary_model_version_write_credentials(self, name, version)
    474 """
    475 Get temporary credentials for uploading model version files
    476 
   (...)
    483     temporary model version credentials.
    484 """
    485 req_body = message_to_json(
    486     GenerateTemporaryModelVersionCredentialsRequest(
    487         name=name, version=version, operation=MODEL_VERSION_OPERATION_READ_WRITE
    488     )
    489 )
--> 490 return self._call_endpoint(
    491     GenerateTemporaryModelVersionCredentialsRequest, req_body
    492 ).credentials
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-139e38f2-1f4a-4ba5-b3af-f75269017c1e/lib/python3.11/site-packages/mlflow/store/model_registry/base_rest_store.py:44, in BaseRestStore._call_endpoint(self, api, json_body, call_all_endpoints, extra_headers)
     42 else:
     43     endpoint, method = self._get_endpoint_from_method(api)
---> 44     return call_endpoint(
     45         self.get_host_creds(), endpoint, method, json_body, response_proto, extra_headers
     46     )
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-139e38f2-1f4a-4ba5-b3af-f75269017c1e/lib/python3.11/site-packages/mlflow/utils/rest_utils.py:220, in call_endpoint(host_creds, endpoint, method, json_body, response_proto, extra_headers)
    218     call_kwargs["json"] = json_body
    219     response = http_request(**call_kwargs)
--> 220 response = verify_rest_response(response, endpoint)
    221 js_dict = json.loads(response.text)
    222 parse_dict(js_dict=js_dict, message=response_proto)
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-139e38f2-1f4a-4ba5-b3af-f75269017c1e/lib/python3.11/site-packages/mlflow/utils/rest_utils.py:152, in verify_rest_response(response, endpoint)
    150 if response.status_code != 200:
    151     if _can_parse_as_json_object(response.text):
--> 152         raise RestException(json.loads(response.text))
    153     else:
    154         base_msg = (
    155             f"API request to endpoint {endpoint} "
    156             f"failed with error code {response.status_code} != 200"
    157         )
    

@sw33zy sw33zy changed the title Invalid access token Access denied to clusters that don't have Unity Catalog enabled Oct 28, 2024
arpitjasa-db (Collaborator) commented:

Yeah, this is similar to #173, where the type of UC cluster used creates some confusion. Generally single-user mode is the most feature-compatible, but the permissions around it create some issues. When you got the above error, were you running the cluster as the SP, and is the owner of the cluster also the SP?

sw33zy (Author) commented Oct 30, 2024

The cluster was created and run as the SP, but it shows me as the owner.

arpitjasa-db (Collaborator) commented:

If you try re-assigning the cluster owner to the SP, does that resolve the issue? It would also help to confirm that the SP has all the right permissions on the catalog/schema/model.
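If the cluster persists between runs, the re-assignment can be sketched like this (cluster ID and SP application ID are placeholders):

# Re-assign cluster ownership to the SP via the CLI...
databricks clusters change-owner <cluster-id> <sp-application-id>

# ...or via the REST API directly.
curl -X POST "$DATABRICKS_HOST/api/2.1/clusters/change-owner" \
  -H "Authorization: Bearer $DATABRICKS_TOKEN" \
  -d '{"cluster_id": "<cluster-id>", "owner_username": "<sp-application-id>"}'

On the UC side, the SP typically needs USE CATALOG and USE SCHEMA on the containing objects, plus privileges on the registered model itself.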

sw33zy (Author) commented Oct 30, 2024

How can I achieve that? A new cluster is created each time the job is run.

new_cluster: &new_cluster
  new_cluster:
    data_security_mode: SINGLE_USER
    single_user_name: ${workspace.current_user.userName}
    num_workers: 3
    spark_version: 15.3.x-cpu-ml-scala2.12
    node_type_id: Standard_D3_v2
    custom_tags:
      clusterSource: mlops-stacks_0.4

common_permissions: &permissions
  permissions:
    - level: CAN_VIEW
      group_name: users

resources:
  jobs:
    model_training_job:
      name: ${bundle.target}-cibus_ai_factory-model-training-job
...

Unless I create a dedicated SP cluster, re-assign the cluster owner to the SP, and then use it instead of creating a new one each time.

arpitjasa-db (Collaborator) commented:

Oh, so then how were you able to see that you're the owner of the cluster?

One more thing to confirm: once you deploy the bundle to a Databricks workspace, you can see the details of what gets deployed by going into the home directory of the SP and clicking into the .bundle folder to inspect the Terraform state (the terraform.tfstate file).
[Screenshot 2024-10-30 at 9:29 AM: the SP home directory showing the .bundle folder]
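Something like this should also dump that state file from the CLI (the path pattern is assumed from the bundle layout; adjust the project and target names):

# Print the deployed Terraform state from the SP's bundle folder.
databricks workspace export \
  "/Users/<sp-application-id>/.bundle/<project>/<target>/state/terraform.tfstate"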

sw33zy (Author) commented Oct 30, 2024

[Screenshot: job cluster details]

I checked the details of the cluster associated with the job; the ownership shows up in the automatically added tags.

@arpitjasa-db I checked, and I do also have that .bundle folder with the Terraform state within the SP directory.

arpitjasa-db (Collaborator) commented:

> @arpitjasa-db I checked, and I do also have that .bundle folder with the Terraform state within the SP directory.

Could you confirm the Terraform state deploys as expected? And who it specifies as the creator/what policy is attached?

sw33zy (Author) commented Nov 5, 2024

{
      "mode": "managed",
      "type": "databricks_job",
      "name": "model_training_job",
      "provider": "provider[\"registry.terraform.io/databricks/databricks\"]",
      "instances": [
        {
          "schema_version": 2,
          "attributes": {
            "always_running": false,
            "continuous": [],
            "control_run_state": false,
            "dbt_task": [],
            "deployment": [
              {
                "kind": "BUNDLE",
                "metadata_file_path": "/Workspace/Users/097f46c6-a018-4758-b083-9a0150d547ed/.bundle/cibus_ai_factory/test/state/metadata.json"
              }
            ],
            "description": null,
            "edit_mode": "UI_LOCKED",
            "email_notifications": [
              {
                "no_alert_for_skipped_runs": false,
                "on_duration_warning_threshold_exceeded": [],
                "on_failure": [],
                "on_start": [],
                "on_streaming_backlog_exceeded": [],
                "on_success": []
              }
            ],
            "environment": [],
            "existing_cluster_id": null,
            "format": "MULTI_TASK",
            "git_source": [],
            "health": [],
            "id": "1006074957047174",
            "job_cluster": [
              {
                "job_cluster_key": "model_training_job_cluster",
                "new_cluster": [
                  {
                    "apply_policy_default_values": false,
                    "autoscale": [],
                    "aws_attributes": [],
                    "azure_attributes": [
                      {
                        "availability": "ON_DEMAND_AZURE",
                        "first_on_demand": 0,
                        "log_analytics_info": [],
                        "spot_bid_max_price": 0
                      }
                    ],
                    "cluster_id": "",
                    "cluster_log_conf": [],
                    "cluster_mount_info": [],
                    "cluster_name": "",
                    "custom_tags": {
                      "clusterSource": "mlops-stacks_0.4"
                    },
                    "data_security_mode": "SINGLE_USER",
                    "docker_image": [],
                    "driver_instance_pool_id": "",
                    "driver_node_type_id": "",
                    "enable_elastic_disk": true,
                    "enable_local_disk_encryption": false,
                    "gcp_attributes": [],
                    "idempotency_token": "",
                    "init_scripts": [],
                    "instance_pool_id": "",
                    "library": [],
                    "node_type_id": "Standard_D3_v2",
                    "num_workers": 3,
                    "policy_id": "",
                    "runtime_engine": "",
                    "single_user_name": "097f46c6-a018-4758-b083-9a0150d547ed",
                    "spark_conf": {},
                    "spark_env_vars": {},
                    "spark_version": "15.3.x-cpu-ml-scala2.12",
                    "ssh_public_keys": [],
                    "workload_type": []
                  }
                ]
              }
            ],
            "library": [],
            "max_concurrent_runs": 1,
            "max_retries": 0,
            "min_retry_interval_millis": 0,
            "name": "test-cibus_ai_factory-model-training-job",
            "new_cluster": [],
            "notebook_task": [],
            "notification_settings": [],
            "parameter": [],
            "pipeline_task": [],
            "python_wheel_task": [],
            "queue": [
              {
                "enabled": true
              }
            ],
            "retry_on_timeout": false,
            "run_as": [
              {
                "service_principal_name": "097f46c6-a018-4758-b083-9a0150d547ed",
                "user_name": ""
              }
            ],
            "run_job_task": [],
            "schedule": [
              {
                "pause_status": "UNPAUSED",
                "quartz_cron_expression": "0 0 9 * * ?",
                "timezone_id": "UTC"
              }
            ],
            "spark_jar_task": [],
            "spark_python_task": [],
            "spark_submit_task": [],
            "tags": null,
            "task": [
              {
                "condition_task": [],
                "dbt_task": [],
                "depends_on": [
                  {
                    "outcome": "",
                    "task_key": "ModelValidation"
                  }
                ],
                "description": "",
                "disable_auto_optimization": false,
                "email_notifications": [
                  {
                    "no_alert_for_skipped_runs": false,
                    "on_duration_warning_threshold_exceeded": [],
                    "on_failure": [],
                    "on_start": [],
                    "on_streaming_backlog_exceeded": [],
                    "on_success": []
                  }
                ],
                "environment_key": "",
                "existing_cluster_id": "",
                "for_each_task": [],
                "health": [],
                "job_cluster_key": "model_training_job_cluster",
                "library": [],
                "max_retries": 0,
                "min_retry_interval_millis": 0,
                "new_cluster": [],
                "notebook_task": [
                  {
                    "base_parameters": {
                      "env": "test",
                      "git_source_info": "url:https://github.com/omniumai/cibus-ai-factory; branch:; commit:d36174fe093680ae9b97584842a7a6a257c51dbc"
                    },
                    "notebook_path": "/Workspace/Users/097f46c6-a018-4758-b083-9a0150d547ed/.bundle/cibus_ai_factory/test/files/deployment/model_deployment/notebooks/ModelDeployment",
                    "source": "WORKSPACE",
                    "warehouse_id": ""
                  }
                ],
                "notification_settings": [],
                "pipeline_task": [],
                "python_wheel_task": [],
                "retry_on_timeout": false,
                "run_if": "ALL_SUCCESS",
                "run_job_task": [],
                "spark_jar_task": [],
                "spark_python_task": [],
                "spark_submit_task": [],
                "sql_task": [],
                "task_key": "ModelDeployment",
                "timeout_seconds": 0,
                "webhook_notifications": []
              },
              {
                "condition_task": [],
                "dbt_task": [],
                "depends_on": [
                  {
                    "outcome": "",
                    "task_key": "Train"
                  }
                ],
                "description": "",
                "disable_auto_optimization": false,
                "email_notifications": [
                  {
                    "no_alert_for_skipped_runs": false,
                    "on_duration_warning_threshold_exceeded": [],
                    "on_failure": [],
                    "on_start": [],
                    "on_streaming_backlog_exceeded": [],
                    "on_success": []
                  }
                ],
                "environment_key": "",
                "existing_cluster_id": "",
                "for_each_task": [],
                "health": [],
                "job_cluster_key": "model_training_job_cluster",
                "library": [],
                "max_retries": 0,
                "min_retry_interval_millis": 0,
                "new_cluster": [],
                "notebook_task": [
                  {
                    "base_parameters": {
                      "custom_metrics_loader_function": "custom_metrics",
                      "enable_baseline_comparison": "false",
                      "evaluator_config_loader_function": "evaluator_config",
                      "experiment_name": "/Users/097f46c6-a018-4758-b083-9a0150d547ed/test-cibus_ai_factory-experiment",
                      "git_source_info": "url:https://github.com/omniumai/cibus-ai-factory; branch:; commit:d36174fe093680ae9b97584842a7a6a257c51dbc",
                      "model_type": "regressor",
                      "run_mode": "dry_run",
                      "targets": "fare_amount",
                      "validation_input": "SELECT * FROM delta.`dbfs:/databricks-datasets/nyctaxi-with-zipcodes/subsampled`",
                      "validation_thresholds_loader_function": "validation_thresholds"
                    },
                    "notebook_path": "/Workspace/Users/097f46c6-a018-4758-b083-9a0150d547ed/.bundle/cibus_ai_factory/test/files/validation/notebooks/ModelValidation",
                    "source": "WORKSPACE",
                    "warehouse_id": ""
                  }
                ],
                "notification_settings": [],
                "pipeline_task": [],
                "python_wheel_task": [],
                "retry_on_timeout": false,
                "run_if": "ALL_SUCCESS",
                "run_job_task": [],
                "spark_jar_task": [],
                "spark_python_task": [],
                "spark_submit_task": [],
                "sql_task": [],
                "task_key": "ModelValidation",
                "timeout_seconds": 0,
                "webhook_notifications": []
              },
              {
                "condition_task": [],
                "dbt_task": [],
                "depends_on": [],
                "description": "",
                "disable_auto_optimization": false,
                "email_notifications": [
                  {
                    "no_alert_for_skipped_runs": false,
                    "on_duration_warning_threshold_exceeded": [],
                    "on_failure": [],
                    "on_start": [],
                    "on_streaming_backlog_exceeded": [],
                    "on_success": []
                  }
                ],
                "environment_key": "",
                "existing_cluster_id": "",
                "for_each_task": [],
                "health": [],
                "job_cluster_key": "model_training_job_cluster",
                "library": [],
                "max_retries": 0,
                "min_retry_interval_millis": 0,
                "new_cluster": [],
                "notebook_task": [
                  {
                    "base_parameters": {
                      "env": "test",
                      "experiment_name": "/Users/097f46c6-a018-4758-b083-9a0150d547ed/test-cibus_ai_factory-experiment",
                      "git_source_info": "url:https://github.com/omniumai/cibus-ai-factory; branch:; commit:d36174fe093680ae9b97584842a7a6a257c51dbc",
                      "model_name": "test.cibus_ai_factory.cibus_ai_factory-model",
                      "training_data_path": "/databricks-datasets/nyctaxi-with-zipcodes/subsampled"
                    },
                    "notebook_path": "/Workspace/Users/097f46c6-a018-4758-b083-9a0150d547ed/.bundle/cibus_ai_factory/test/files/training/notebooks/Train",
                    "source": "WORKSPACE",
                    "warehouse_id": ""
                  }
                ],
                "notification_settings": [],
                "pipeline_task": [],
                "python_wheel_task": [],
                "retry_on_timeout": false,
                "run_if": "ALL_SUCCESS",
                "run_job_task": [],
                "spark_jar_task": [],
                "spark_python_task": [],
                "spark_submit_task": [],
                "sql_task": [],
                "task_key": "Train",
                "timeout_seconds": 0,
                "webhook_notifications": []
              }
            ],
            "timeout_seconds": 0,
            "timeouts": null,
            "trigger": [],
            "url": "https://adb-1601428151337150.10.azuredatabricks.net/#job/1006074957047174",
            "webhook_notifications": [
              {
                "on_duration_warning_threshold_exceeded": [],
                "on_failure": [],
                "on_start": [],
                "on_streaming_backlog_exceeded": [],
                "on_success": []
              }
            ]
          },
          "sensitive_attributes": [],
          "private": "eyJlMmJmYjczMC1lY2FhLTExZTYtOGY4OC0zNDM2M2JjN2M0YzAiOnsiY3JlYXRlIjoxODAwMDAwMDAwMDAwLCJ1cGRhdGUiOjE4MDAwMDAwMDAwMDB9LCJzY2hlbWFfdmVyc2lvbiI6IjIifQ=="
        }
      ]
    },

This is what I found regarding the model training job. The Terraform state does not mention a creator anywhere. The data_security_mode is set to SINGLE_USER, which matches the changes I made to the cluster creation.

arpitjasa-db (Collaborator) commented:

@sw33zy can you confirm that the value for single_user_name is the same as your SP application ID? What happens if you omit this field?
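A quick way to cross-check the two IDs, sketched with placeholders:

# Azure side: the application (client) ID of the service principal.
az ad sp show --id <application-client-id> --query appId -o tsv

# Databricks side: how the workspace lists the SP.
databricks service-principals list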

sw33zy (Author) commented Nov 6, 2024

Yes, it is the same. When I omit it, nothing changes; I get the same PERMISSION_DENIED: request not authorized exception (I also confirmed in the Terraform state that this change was deployed correctly).

arpitjasa-db (Collaborator) commented:

Hmm, we're not able to reproduce this error on our end, so let's try to isolate which part is causing the issue. Could you try running the job manually from the UI? And if that has the same error, could you try running the notebook directly (maybe with Serverless or a different UC cluster)?
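For the first part, you can also trigger the deployed job from the CLI (the job ID below is taken from the Terraform state you posted):

# Run the deployed job outside CI to see whether the error reproduces.
databricks jobs run-now 1006074957047174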
