Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature eng dev mmt #157

Merged
merged 8 commits into from
Jul 23, 2024
Merged

Conversation

hengrumay
Copy link
Collaborator

Updating code with feature store (fs) to feature engineering(fe) wrt https://databricks.atlassian.net/browse/MLOPS-675

@@ -103,7 +103,7 @@ raw_data = spark.read.format("delta").load(input_table_path)
# Compute the features. This is done by dynamically loading the features module.
from importlib import import_module

mod = import_module(features_module)
mod = import_module(features_module) ##??
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this comment for?

Copy link
Collaborator Author

@hengrumay hengrumay May 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! Should have been omitted -- maybe I deleted after the commit -- will recommit


fs = feature_store.FeatureStoreClient()

from databricks.feature_engineering import FeatureEngineeringClient
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should only do this if UC is enabled, you can see other examples where we use:
{{ if (eq .input_include_models_in_unity_catalog `no`) }}, or does this client work for non-UC cases as well?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK -- I think need to update with conditionals. I was wondering where they were but they need to be added!

@@ -116,11 +116,11 @@ features_df = compute_features_fn(
# COMMAND ----------
# DBTITLE 1, Write computed features.

{{- if (eq .input_include_models_in_unity_catalog "no") }}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason you changed this from your last commit?

Suggested change
{{- if (eq .input_include_models_in_unity_catalog "no") }}
{{- if (eq .input_include_models_in_unity_catalog `no`) }}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There were some failed tests and I noted previous conditionals in section above related to dbutils.wigets with {{- if (eq .input_include_models_in_unity_catalog "no") }} so thought to replace... should it be 'no' instead of "no" ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm generally we use backticks (`) instead of quotes (") or ('). You can see other examples here: https://github.com/search?q=repo%3Adatabricks%2Fmlops-stacks%20input_include_models_in_unity_catalog%20&type=code

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's inconsistent within the existing repo. I will use backticks in my updates :)

label="fare_amount",
exclude_columns=exclude_columns,
)
{{end -}}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the - at the end means to remove space after this line. You should try generating this notebook in your local Stacks repo by using databricks bundle init and add screenshots of both UC and non-UC to make sure they look as expected

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will try generating -- I didn't realize I could! Thanks

@arpitjasa-db
Copy link
Collaborator

@hengrumay FYI only two tests seem to be failing now and they're about links. We can wait once you're ready to merge and then fix broken links together

@hengrumay
Copy link
Collaborator Author

@arpitjasa-db: updated /deployment/batch_inference/predict.py.tmpl to include UC--FE conditional however still encountering links related failed tests -- might be good to figure that out together. Thanks!

{{ if (eq .input_include_feature_store `yes`) }}
from databricks.feature_store import FeatureStoreClient

{{ if (eq .input_include_feature_store `yes`) }}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit space

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see latest updates -- will be adding links to screenshots

output_df = (
prediction_df.withColumn("prediction", prediction_df["prediction"])
.withColumn("model_version", lit(model_version))
.withColumn("inference_timestamp", to_timestamp(lit(ts)))
)
{{ else }}
{{ else }}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit space

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see latest updates -- will be adding links to screenshots

@@ -233,6 +264,30 @@ training_set = fs.create_training_set(
exclude_columns=exclude_columns,
)

{{else}}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add hyphen

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see latest updates -- will be adding links to screenshots

@hengrumay
Copy link
Collaborator Author

Well that was 'unexpected' wrt passed checks lol

@arpitjasa-db FYI: Screenshots for recent local init and review updates:
https://docs.google.com/presentation/d/1gs16VoVrQFOo1shsPq2eZNDLFQOOZfklqFXEgj1KEcE/edit?usp=sharing

@arpitjasa-db arpitjasa-db merged commit f932312 into databricks:main Jul 23, 2024
1 check passed
@hengrumay hengrumay self-assigned this Jul 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants