
Week3 #3

Merged: 7 commits merged into main on Nov 8, 2024

Conversation

@NiGroUni (Collaborator) commented Nov 4, 2024

Summary by CodeRabbit

  • New Features

    • Introduced functionality for serving machine learning models in Databricks.
    • Added comprehensive documentation for Week 3 materials on model and feature serving.
  • Bug Fixes

    • Corrected dependency specification for the lightgbm package.
  • Documentation

    • Updated setup instructions for creating virtual environments and logging into Databricks.
    • Enhanced clarity in README for model serving and feature serving processes.
  • Chores

    • Updated .gitignore to exclude unnecessary files and directories related to machine learning projects.
    • Removed outdated metadata files related to machine learning experiments and models.

@NiGroUni NiGroUni requested a review from a team as a code owner November 4, 2024 15:37

coderabbitai bot commented Nov 4, 2024

Walkthrough

The pull request introduces several updates across various files to enhance the management of machine learning projects. Key changes include updates to the .gitignore file to exclude specific directories and files, modifications to the README.md for clearer setup instructions, and the addition of a new script for model serving. Additionally, several files related to model metadata were removed, and corrections were made to dependency specifications in pyproject.toml.

Changes

File Change Summary
.gitignore Added entries to ignore notebooks/week2/mlruns/, mlruns/, notebooks/*.json, and model_version.json.
README.md Updated commands for virtual environment setup and Databricks login; added --overwrite flag for file copying.
mlruns/0/meta.yaml Removed file containing metadata for MLflow experiment.
notebooks/week2/02_04_train_log_custom_model.py Added import for conda environment, example prediction step, and updated model logging with conda dependencies.
notebooks/week2/05.log_and_register_fe_model.py Changed function and parameter names for consistency; updated feature engineering setup.
notebooks/week2/model_version.json Removed file containing model metadata.
notebooks/week3/02.model_serving.py Introduced new script for serving a machine learning model, including HTTP request handling and load testing.
notebooks/week3/README.md Added overview of Week 3 materials covering feature serving, model serving, and A/B testing.
pyproject.toml Corrected dependency specification for lightgbm from === to ==.

Possibly related PRs

  • week2 updates #2: The changes in the .gitignore file to ignore mlruns/ and model_version.json are related to the updates in the model_version.json file in the retrieved PR, which also involves metadata for machine learning models.

Suggested reviewers

  • mvechtomova

Poem

🐰 In the meadow where data flows,
A rabbit hops where the model grows.
With scripts and logs, we pave the way,
Ignoring files that lead us astray.
In the world of code, we find our cheer,
For every change brings new ideas near! 🌼


@coderabbitai bot left a comment

Actionable comments posted: 9

🧹 Outside diff range and nitpick comments (18)
pyproject.toml (2)

16-16: Consider using version ranges for dependencies.

While exact versions are good for reproducibility, they can be too rigid for maintenance. Consider using version ranges (e.g., lightgbm>=4.5.0,<5.0.0) to allow for compatible updates while maintaining stability.
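The semantics of a range like `lightgbm>=4.5.0,<5.0.0` can be illustrated with a small, self-contained sketch. This is only a simplified model for plain `X.Y.Z` versions (`version_in_range` is a hypothetical helper, not part of any packaging tool); real resolvers such as pip follow the full PEP 440 rules, including pre-releases:

```python
def version_in_range(version: str, low: str, high: str) -> bool:
    """Return True if low <= version < high, comparing numeric release parts.

    Mirrors the intent of a specifier like `>=4.5.0,<5.0.0` for simple
    X.Y.Z versions; real resolvers also handle pre-releases, post-releases, etc.
    """
    parse = lambda v: tuple(int(part) for part in v.split("."))
    return parse(low) <= parse(version) < parse(high)

print(version_in_range("4.6.0", "4.5.0", "5.0.0"))  # True: compatible minor update
print(version_in_range("5.0.0", "4.5.0", "5.0.0"))  # False: major bump excluded
```

This is why the range admits bug-fix and minor releases while still blocking a potentially breaking 5.x upgrade.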


Line range hint 7-21: Consider organizing dependencies into logical groups.

The dependencies list mixes core ML libraries, data processing tools, and Databricks-specific packages. Consider organizing them into groups using separate optional dependency sections (e.g., ml, databricks, core) for better maintainability.

Example structure:

[project.optional-dependencies]
ml = [
    "lightgbm==4.5.0",
    "scikit-learn==1.5.2",
    "mlflow==2.17.0"
]
databricks = [
    "databricks-feature-engineering==0.6",
    "databricks-feature-lookup==1.2.0",
    "databricks-sdk==0.32.0"
]
README.md (2)

25-25: Consider documenting the purpose of --all-extras

While the commands are correct, it would be helpful to explain what optional dependencies are included with --all-extras for better clarity.


38-38: Document the implications of using --overwrite

While adding the --overwrite flag simplifies the deployment process, consider adding a note about:

  1. Potential data loss when overwriting existing files
  2. Best practices for package versioning
  3. How to verify successful deployment
notebooks/week3/README.md (5)

1-11: Fix grammar in the overview section.

Add a comma after "Last week" for better readability.

-Last week we demonstrated model training and registering for different use cases.
+Last week, we demonstrated model training and registering for different use cases.


14-28: Specify language for code block.

Add Python language specification to the code block for proper syntax highlighting.

-```
+```python
 01.feature_serving.py


29-41: Fix formatting and grammar in Model Serving section.

1. Add Python language specification to the code block.
2. Add missing article before "entity name".

-```
+```python
 02.model_serving.py

-It's important to note that entity name we pass is a registered model name
+It's important to note that the entity name we pass is a registered model name


42-54: Fix formatting and typos in Feature Look Up section.

  1. Add Python language specification to the code block.
  2. Use consistent list style (asterisks instead of dashes).
  3. Fix typo "registred" → "registered".
-```
+```python
 03.model_serving_feature_lookup.py

Convert list items from:
- We start with creating...
- This online table is...

to:

* We start with creating...
* This online table is...

Fix typo:

-the model we registred in the same notebook
+the model we registered in the same notebook


55-70: Fix formatting in A/B Testing section.

  1. Add Python language specification to the code block.
  2. Use consistent list style (asterisks instead of dashes).
-```
+```python
 04.AB_test_model_serving.py

Convert list items from:
- We start with loading...
- We use the same approach...

to:

* We start with loading...
* We use the same approach...

notebooks/week2/02_04_train_log_custom_model.py (1)

121-123: Enhance example prediction robustness

While adding an example prediction is good practice, consider:

  1. Adding error handling for empty test set
  2. Using multiple diverse examples to better validate model behavior
-example_input = X_test.iloc[0:1]  # Select the first row for prediction as example
-example_prediction = wrapped_model.predict(context=None, model_input=example_input)
-print("Example Prediction:", example_prediction)
+if len(X_test) > 0:
+    # Test with multiple diverse examples
+    example_inputs = X_test.iloc[0:3]  # Select first three rows
+    for i, example_input in example_inputs.iterrows():
+        example_prediction = wrapped_model.predict(context=None, model_input=pd.DataFrame([example_input]))
+        print(f"Example {i+1} Prediction:", example_prediction)
+else:
+    print("Warning: Test set is empty, cannot generate example predictions")
notebooks/week2/05.log_and_register_fe_model.py (4)

2-2: Consider using a more portable package installation approach.

The current installation path /Volumes/main/default/file_exchange/nico/power_consumption-0.0.1-py3-none-any.whl is hardcoded to a specific user's directory. This could cause issues when other team members try to run the notebook.

Consider:

  1. Publishing the package to an internal PyPI repository
  2. Using relative paths with environment variables
  3. Adding the package to the project's dependencies in pyproject.toml

87-91: Enhance the SQL function with validation and documentation.

The temperature rounding function could benefit from additional safeguards and clarity.

Consider this enhanced version:

 CREATE OR REPLACE FUNCTION {function_name}(temperature DOUBLE)
 RETURNS INT
 LANGUAGE PYTHON AS
 $$
-return round(temperature)
+def round_temp(temperature: float) -> int:
+    """Round temperature to nearest integer.
+    
+    Args:
+        temperature: Temperature value in degrees
+    Returns:
+        Rounded temperature as integer
+    Raises:
+        ValueError: If temperature is None
+    """
+    if temperature is None:
+        raise ValueError("Temperature cannot be None")
+    return round(temperature)
+
+return round_temp(temperature)
 $$
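One caveat worth flagging about the rounding behavior itself: since the UDF body runs as Python (`LANGUAGE PYTHON`), the built-in `round` uses round-half-to-even ("banker's rounding"), so `.5` ties do not always round up:

```python
# Python 3's round() rounds ties to the nearest even integer,
# which may surprise users expecting .5 to always round up.
print(round(0.5))   # 0
print(round(1.5))   # 2
print(round(2.5))   # 2
print(round(23.7))  # 24
```

If round-half-up semantics are required for temperatures, the UDF would need to implement them explicitly (e.g. via `decimal` with `ROUND_HALF_UP`).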

Line range hint 142-143: Fix hardcoded git SHA value.

The git SHA is currently hardcoded as "bla", which defeats the purpose of version tracking.

-git_sha = "bla"
+import subprocess
+
+def get_git_sha():
+    try:
+        return subprocess.check_output(['git', 'rev-parse', 'HEAD']).decode('ascii').strip()
+    except:
+        return None
+
+git_sha = get_git_sha()

177-177: Fix formatting issues.

There are minor formatting issues at the end of the file:

  • Remove trailing whitespace
  • Add newline at end of file

notebooks/week3/02.model_serving.py (4)

20-21: Remove unused imports TrafficConfig and Route.

The imports of TrafficConfig and Route are not used in the code. Removing them will clean up the code and avoid potential confusion.

Apply this diff to remove the unused imports:

 from databricks.sdk.service.serving import (
     EndpointCoreConfigInput,
     ServedEntityInput,
-    TrafficConfig,
-    Route,
 )


40-40: Parameterize the training dataset table name instead of hardcoding.

The table name train_set_nico is hardcoded. Consider retrieving the table name from the configuration or defining it as a variable to improve flexibility and maintainability.


52-52: Parameterize the entity_version instead of hardcoding.

The entity_version is hardcoded as 6. To ensure you're always using the correct model version, consider obtaining the version number dynamically or from configuration.
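A minimal sketch of the selection logic, once the model's version metadata has been fetched (for example from the MLflow model registry): the newest version is just the numeric maximum. The `{"version": ...}` dict shape here is illustrative, not the exact registry response format:

```python
def latest_model_version(versions):
    """Pick the highest numeric version from registry metadata.

    `versions` is assumed to be an iterable of dicts like {"version": "6", ...},
    e.g. parsed from a model registry query.
    """
    return max(int(v["version"]) for v in versions)

registry_entries = [{"version": "4"}, {"version": "6"}, {"version": "5"}]
print(latest_model_version(registry_entries))  # 6
```

The resulting number can then be passed as `entity_version` instead of the hardcoded `6`.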


55-61: Remove commented-out traffic_config code if not needed.

The traffic_config section is commented out. If it's not required for your current setup, consider removing it to keep the code clean and reduce clutter.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between c49e1aa and 042f900.

📒 Files selected for processing (10)
  • .gitignore (1 hunks)
  • README.md (2 hunks)
  • mlruns/0/meta.yaml (0 hunks)
  • notebooks/week2/02_04_train_log_custom_model.py (2 hunks)
  • notebooks/week2/05.log_and_register_fe_model.py (6 hunks)
  • notebooks/week2/mlruns/0/meta.yaml (0 hunks)
  • notebooks/week2/model_version.json (0 hunks)
  • notebooks/week3/02.model_serving.py (1 hunks)
  • notebooks/week3/README.md (1 hunks)
  • pyproject.toml (1 hunks)
💤 Files with no reviewable changes (3)
  • mlruns/0/meta.yaml
  • notebooks/week2/mlruns/0/meta.yaml
  • notebooks/week2/model_version.json
🧰 Additional context used
🪛 Ruff
notebooks/week2/02_04_train_log_custom_model.py

131-131: Blank line contains whitespace

Remove whitespace from blank line

(W293)

🔇 Additional comments (7)
pyproject.toml (1)

16-16: LGTM! Correct version specifier syntax.

The change from === to == aligns with PEP 440 version specification standards.

.gitignore (2)

103-104: Review JSON ignore patterns for potential over-exclusion.

The pattern notebooks/*.json might accidentally exclude important configuration or metadata files that should be version controlled. Consider:

  1. Being more specific about which JSON files to ignore
  2. Documenting which JSON files are generated vs. configuration files

Let's check what JSON files might be affected:

#!/bin/bash
# Description: List JSON files that would be ignored
# Expected: Only generated/temporary files, not configuration files

# Find all JSON files in notebooks directory
fd -e json . notebooks/

101-102: Consider centralizing MLflow tracking.

Having MLflow tracking directories in multiple locations (notebooks/week2/mlruns/ and mlruns/) could make experiment management and comparison more difficult. Consider centralizing all MLflow tracking in one location, preferably at the root level.

Let's check if there are any MLflow-related files in other locations:

README.md (1)

22-23: LGTM! Good improvement in virtual environment naming.

The change from venv to .venv follows better practices for Python projects by using a hidden directory for configuration files.

notebooks/week3/README.md (1)

1-70: Overall content is well-structured and informative.

The README provides comprehensive coverage of Week 3 materials with clear explanations and examples. The technical content is accurate and well-organized into logical sections. While there are some formatting inconsistencies, they don't impact the document's effectiveness in guiding students through the material.


notebooks/week2/02_04_train_log_custom_model.py (1)

132-132: Verify model signature format

The model signature is being created with a dictionary output format. Ensure this matches the expected format for all downstream consumers of the model.

✅ Verification successful

Let me analyze the search results. I can see three instances of infer_signature usage in the codebase:

  1. In 05.log_and_register_fe_model.py:
signature = infer_signature(model_input=X_train, model_output=y_pred)
  2. In 02_04_train_log_custom_model.py (two instances):
signature = infer_signature(model_input=X_train, model_output=y_pred)

and

signature = infer_signature(model_input=X_train, model_output={'Prediction': example_prediction})

Let me check if there are any model serving or inference code that might depend on this signature format.


Based on the search results, I can now generate the final response as I have enough context to verify the model signature format. The code shows that:

  1. The custom model's predict method in 02_04_train_log_custom_model.py explicitly returns predictions in a dictionary format with 'Prediction' as the key.
  2. This matches exactly with the signature format being used: model_output={'Prediction': example_prediction}
  3. The model is loaded and used consistently throughout the codebase, including in serving endpoints (as seen in notebooks/week3/README.md).

Model signature format is correct and consistent

The dictionary output format {'Prediction': value} matches the model's implementation and is used consistently in the serving pipeline. This format is intentionally designed to wrap the predictions in a structured format for the model serving endpoints.
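The wrapping pattern the verification describes can be reduced to a small sketch. A stand-in model replaces the real LightGBM pipeline here, and the class names are illustrative, not the repository's:

```python
class DictOutputModel:
    """Wraps a model so predict() returns {'Prediction': ...}, matching a
    signature inferred with model_output={'Prediction': example_prediction}."""

    def __init__(self, model):
        self.model = model

    def predict(self, context, model_input):
        return {"Prediction": self.model.predict(model_input)}


class StubModel:
    """Stand-in for the real pipeline: sums each input row."""

    def predict(self, model_input):
        return [sum(row) for row in model_input]


wrapped = DictOutputModel(StubModel())
print(wrapped.predict(context=None, model_input=[[1, 2], [3, 4]]))
# {'Prediction': [3, 7]}
```

Serving endpoints then consistently receive the structured `{'Prediction': ...}` payload regardless of the underlying model.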

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Look for other model signatures in the codebase for consistency
rg -A 5 "infer_signature|ModelSignature"

Length of output: 2715


Script:

#!/bin/bash
# Look for model loading and prediction code
rg -A 5 "load_model|predict"

Length of output: 7937

notebooks/week2/05.log_and_register_fe_model.py (1)

Line range hint 103-120: Address the unauthorized error TODO comment.

There's an unresolved TODO comment about unauthorized errors. This could indicate potential permission issues with the feature store access.

Let's verify the permissions setup:

Would you like me to help investigate the authorization issue or create a GitHub issue to track this?

@coderabbitai bot left a comment

Actionable comments posted: 4

🧹 Outside diff range and nitpick comments (4)
notebooks/week3/02.model_serving.py (4)

17-22: Remove unused imports.

The following imports are not used in the code:

  • TrafficConfig
  • Route
 from databricks.sdk.service.serving import (
     EndpointCoreConfigInput,
     ServedEntityInput,
-    TrafficConfig,
-    Route,
 )


32-35: Consider using environment variables for config paths.

The hardcoded fallback path "../../project_config.yml" could break if the notebook location changes. Consider using an environment variable for the config path.
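A minimal sketch of the suggested change (`PROJECT_CONFIG_PATH` is a hypothetical variable name, not one the project defines):

```python
import os

# Fall back to the relative path only when no explicit location is provided.
config_path = os.getenv("PROJECT_CONFIG_PATH", "../../project_config.yml")
print(config_path)
```

This keeps the notebook runnable as-is while letting CI or other environments point at a different config location.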


92-93: Remove commented out code.

Remove the commented line as it's not being used:

 sampled_records = train_set[required_columns].sample(n=1000, replace=True).to_dict(orient="records")
-# sampled_records = train_set.sample(n=1000, replace=True).to_dict(orient="records")
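The kept line samples with replacement via pandas; the same payload shape can be sketched with the standard library alone. The column names below are placeholders, not the project's schema, and `dataframe_records` is the record-oriented payload key Databricks model serving accepts:

```python
import random

rows = [
    {"Temperature": 18.2, "Humidity": 71.0},
    {"Temperature": 25.6, "Humidity": 43.5},
    {"Temperature": 12.9, "Humidity": 88.1},
]

random.seed(0)
# Sample with replacement, mirroring DataFrame.sample(n=..., replace=True)
sampled_records = random.choices(rows, k=5)

# One record per request, wrapped the way the serving endpoint expects
payload = {"dataframe_records": [sampled_records[0]]}
print(len(sampled_records), list(payload))
```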

132-139: Make load test parameters configurable.

The number of workers and requests are hardcoded. Consider making these configurable through environment variables or configuration file.

+# Load test configuration
+num_workers = int(os.getenv("LOAD_TEST_WORKERS", "100"))
+num_requests = int(os.getenv("LOAD_TEST_REQUESTS", "1000"))
+
 # Initialize variables
 model_serving_endpoint = (
     f"https://{host}/serving-endpoints/power-consumption-model-serving/invocations"
 )

 headers = {"Authorization": f"Bearer {token}"}
-num_requests = 1000

Then update the ThreadPoolExecutor:

-with ThreadPoolExecutor(max_workers=100) as executor:
+with ThreadPoolExecutor(max_workers=num_workers) as executor:
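Putting the two suggestions together, the parameterized load test can be sketched end-to-end with a stub in place of the real HTTP call (`send_request` here is hypothetical; the real one would POST to the serving endpoint with the bearer-token headers):

```python
import os
from concurrent.futures import ThreadPoolExecutor

# Configurable via environment, with conservative defaults for a local run
num_workers = int(os.getenv("LOAD_TEST_WORKERS", "8"))
num_requests = int(os.getenv("LOAD_TEST_REQUESTS", "50"))

def send_request(i):
    # Stand-in for requests.post(model_serving_endpoint, headers=..., json=...)
    # returning the HTTP status code.
    return 200

with ThreadPoolExecutor(max_workers=num_workers) as executor:
    statuses = list(executor.map(send_request, range(num_requests)))

print(sum(1 for s in statuses if s == 200), "of", len(statuses), "succeeded")
```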
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 042f900 and d99f179.

📒 Files selected for processing (1)
  • notebooks/week3/02.model_serving.py (1 hunks)
🧰 Additional context used
🪛 Ruff
notebooks/week3/02.model_serving.py

20-20: databricks.sdk.service.serving.TrafficConfig imported but unused

Remove unused import

(F401)


21-21: databricks.sdk.service.serving.Route imported but unused

Remove unused import

(F401)

@@ -175,8 +174,4 @@
mlflow.register_model(
model_uri=f'runs:/{run_id}/lightgbm-pipeline-model-fe',
name=f"{catalog_name}.{schema_name}.power_consumptions_model_fe")


have you also tried to serve this model?

you have a nice example with pyfunc model - that's enough for week3, I'm just asking for curiosity :)

@NiGroUni (Collaborator, Author) replied:

hey, no I did not try because of time limitations

@basakeskili left a comment

Looks nice!

@NiGroUni NiGroUni merged commit 9701b11 into main Nov 8, 2024
2 checks passed
@NiGroUni NiGroUni deleted the week3 branch November 8, 2024 14:43
@NiGroUni NiGroUni restored the week3 branch November 8, 2024 14:43