wandb · tcapelle · Nov 18, 2024
diff --git a/colabs/wandb-artifacts/WandB_Artifact_Tags.ipynb b/colabs/wandb-artifacts/WandB_Artifact_Tags.ipynb
@@ -0,0 +1,347 @@
+{
+  "cells": [
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "dV-HDTDiWHjk"
+      },
+      "source": [
+        "<img src=\"http://wandb.me/logo-im-png\" width=\"400\" alt=\"Weights & Biases\" />\n",
+        "<!--- @wandbcode{artifact-tags} -->"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "4ojAULSUWHjk"
+      },
+      "source": [
+        "## W&B Artifact version tags\n",
+        "\n",
+        "W&B Models supports finding and retrieving artifacts from Registry or ML projects using version tags. Filtering by artifact version tags directly in the SDK retrieves only the artifacts you need. The alternatives are less efficient: fetching every artifact in a collection and parsing them manually, or relying on aliases, which are harder to use for this purpose because each alias must be unique within a collection.\n",
+        "\n",
+        "In this notebook, you will create multiple model versions in a collection in the Model registry. Each version is assigned multiple tags, which makes it easy to discover and retrieve with the SDK.\n",
+        "\n",
+        "You can find Registry in your W&B account by clicking on the Registry link in the left sidebar in the Applications section.\n",
+        "<br><br>\n",
+        "<img src=\"https://rratshin-images.s3.us-west-2.amazonaws.com/colab_artifact_registry.png\" width=\"800\" alt=\"Weights & Biases\" border=\"1\" />"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "rnSykLMYWHjk"
+      },
+      "source": [
+        "## Prerequisites\n",
+        "\n",
+        "Install the W&B Python SDK and log in:"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "!pip install wandb -qU"
+      ],
+      "metadata": {
+        "id": "lOKDB7KZiFAo"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "# Log in to your W&B account\n",
+        "import wandb\n",
+        "wandb.login()"
+      ],
+      "metadata": {
+        "id": "vq1qM18bvJaF"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "## Initialize a W&B run\n",
+        "\n",
+        "Import additional Python libraries and initialize a W&B run to generate demo artifacts:"
+      ],
+      "metadata": {
+        "id": "FJnhsfWgspgm"
+      }
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "0J12ZezVWHjk"
+      },
+      "outputs": [],
+      "source": [
+        "import random\n",
+        "import math\n",
+        "import pandas as pd\n",
+        "import numpy as np\n",
+        "import os\n",
+        "\n",
+        "PROJECT = \"artifacts-example\"\n",
+        "JOB_TYPE = \"generate_artifacts\"\n",
+        "\n",
+        "run = wandb.init(\n",
+        "  project=PROJECT,\n",
+        "  job_type=JOB_TYPE\n",
+        ")\n"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "3KERmoDMWHjk"
+      },
+      "source": [
+        "## Generate demo artifact versions with tags\n",
+        "\n",
+        "The following code block will generate four artifact versions for each AWS region. Each artifact version represents the result of fine-tuning a base model. The artifact versions will be tagged with the following:\n",
+        "\n",
+        "1. Region tag: Specifies the AWS region where the model is stored or deployed\n",
+        "2. Base model tag: Indicates which base model was fine-tuned\n",
+        "3. Status tag: Specifies the model's status, when applicable (e.g., production, candidate, archived)\n",
+        "\n",
+        "To create the artifact versions, simply execute the code block."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "E55OQXADWHjk"
+      },
+      "outputs": [],
+      "source": [
+        "import random\n",
+        "import math\n",
+        "\n",
+        "\n",
+        "\n",
+        "# set up tag values\n",
+        "# AWS regions (us-west-2 = Oregon, eu-central-1 = Frankfurt, me-central-1 = UAE, ap-east-1 = Hong Kong)\n",
+        "regions = [\"us-west-2\", \"eu-central-1\", \"me-central-1\", \"ap-east-1\"]\n",
+        "# add base model tags to each artifact version\n",
+        "artifact_base_model_by_region = {\n",
+        "    \"us-west-2\": [\"Llama 3 Instruct - 70B\", \"Llama 3 Instruct - 70B\", \"Gemini 1_5 Pro\", \"Claude 3_5 Sonnet\"],\n",
+        "    \"eu-central-1\": [\"Llama 3 Instruct - 70B\", \"Gemini 1_5 Pro\", \"Claude 3_5 Sonnet\", \"Llama 3 Instruct - 70B\"],\n",
+        "    \"me-central-1\": [\"Llama 3 Instruct - 70B\", \"Gemini 1_5 Pro\", \"Gemini 1_5 Pro\", \"Claude 3_5 Sonnet\"],\n",
+        "    \"ap-east-1\": [\"Claude 3_5 Sonnet\", \"Gemini 1_5 Pro\", \"Claude 3_5 Sonnet\", \"Gemini 1_5 Pro\"]\n",
+        "}\n",
+        "# add performance metadata to each artifact version\n",
+        "artifact_metadata_by_region = {\n",
+        "    \"us-west-2\": [0.70, 0.73, 0.77, 0.71],\n",
+        "    \"eu-central-1\": [0.76, 0.81, 0.79, 0.77],\n",
+        "    \"me-central-1\": [0.77, 0.68, 0.74, 0.83],\n",
+        "    \"ap-east-1\": [0.82, 0.79, 0.76, 0.76]\n",
+        "}\n",
+        "# add model status tags to each artifact version\n",
+        "artifact_status_by_region = {\n",
+        "    \"us-west-2\": [\"production\", \"candidate\", \"archived\", \"archived\"],\n",
+        "    \"eu-central-1\": [\"production\", \"candidate\", \"archived\", \"archived\"],\n",
+        "    \"me-central-1\": [\"production\", \"candidate\", \"candidate\", \"archived\"],\n",
+        "    \"ap-east-1\": [\"production\", \"candidate\", \"archived\", \"archived\"]\n",
+        "}\n",
+        "\n",
+        "\n",
+        "# create 4 artifact versions for each region\n",
+        "for region in regions:\n",
+        "\n",
+        "  # enumerate supplies the version index i (0-3) within each region\n",
+        "  for i, base_model in enumerate(artifact_base_model_by_region[region]):\n",
+        "\n",
+        "    # create a random dataset to stand in for a model file\n",
+        "    df = pd.DataFrame(np.random.randint(0, 100, size=(100, 4)), columns=list('ABCD'))\n",
+        "    df.to_json(\"test_model.pt\", orient='records', lines=True)\n",
+        "\n",
+        "    model_name = f\"test_model_{region}_{i}\"\n",
+        "    at = wandb.Artifact(\n",
+        "        name=model_name,\n",
+        "        type=\"model\",\n",
+        "    )\n",
+        "    at.add_file(\"test_model.pt\")\n",
+        "    arti = run.log_artifact(at)\n",
+        "    # wait for logging to finish before editing tags and metadata\n",
+        "    arti.wait()\n",
+        "\n",
+        "    # every version gets region and base model tags\n",
+        "    tags = [region, base_model]\n",
+        "    # the 4th version (i == 3) is deliberately left without a status tag\n",
+        "    if i != 3:\n",
+        "      tags.append(artifact_status_by_region[region][i])\n",
+        "    arti.tags = tags\n",
+        "    arti.metadata = {\"accuracy\": artifact_metadata_by_region[region][i]}\n",
+        "    arti.save()\n",
+        "\n",
+        "    # link the artifact to the model registry\n",
+        "    run.link_artifact(at, \"wandb-registry-model/Artifact Demo Models\")\n",
+        "\n",
+        "\n",
+        "# mark the run as finished\n",
+        "run.finish()"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "hfRK9Ns-WHjl"
+      },
+      "source": [
+        "## Filter artifacts using version tags\n",
+        "\n",
+        "After generating the 16 artifacts using the code block above, the Artifact Demo Models collection in your model registry should look like this:\n",
+        "<br><br>\n",
+        "\n",
+        "<img src=\"https://rratshin-images.s3.us-west-2.amazonaws.com/colab_artifact_version_tags5.png\" width=\"800\" alt=\"Weights & Biases\" border=\"1\" />\n",
+        "\n",
+        "This colab assumes that a collection holds not one production model but several, one for each AWS region. When a collection contains only a single production model, a unique alias is generally the right way to identify it. The enforced uniqueness of aliases makes them extremely valuable for finding and retrieving one specific artifact version. Aliases can also trigger automated workflows in W&B Models, called Automations, which are often used for model testing and deployment as part of a CI/CD pipeline. Event-based triggers are not always necessary, however. Sometimes you want to find and retrieve multiple artifact versions that match a search filter, such as when multiple production models exist. That is when tags are the right tool.\n",
+        "\n",
+        "Artifact version tags come in handy, for example, when deploying models with Amazon SageMaker. The Amazon S3 bucket where the model artifacts are stored must be in the same AWS Region as the model that you are creating. When specific models must be deployed to specific regions, retrieving them by an AWS region tag helps ensure that the right model exists and that the right model is deployed.\n",
+        "\n",
+        "Attaching tags to artifact versions also helps with compliance requirements. Filtering by artifact tag returns a specific model of interest and, from there, it is easy to trace the detailed lineage of this model, including all input and output artifacts, using W&B Registry. For example, an audit might require you to produce the exact dataset used to train the model deployed in the Central Europe region, to show that it did not contain any PII or other sensitive information.\n",
+        "\n",
+        "\n",
+        "We have compiled a number of use cases that require filtering by artifact version tags to retrieve the right artifacts. To see version tag filtering using the SDK in action, just execute the following code block:"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "api = wandb.Api()\n",
+        "# if you belong to multiple orgs, prefix the 'name' value in the api.artifacts call with the org you are fetching from:\n",
+        "# name=f\"{INSERT_ORG_NAME}/wandb-registry-model/Artifact Demo Models\"\n",
+        "\n",
+        "##########################################################\n",
+        "##### Artifact Version Tag Use Cases #####################\n",
+        "##########################################################\n",
+        "\n",
+        "# 1. As an ML Platform Engineer, I need to ensure the correct models are deployed in each region.\n",
+        "#    Action: Filter by `production` and `us-west-2` tags, then check the base model tag in the output. (Base model should be `Llama 3 Instruct - 70B`).\n",
+        "#    Scenario: Verify that only production-ready models are deployed in the US-West region, using the correct base model.\n",
+        "#    Benefit: Ensures accurate deployment for specific regions and base models.\n",
+        "\n",
+        "print(\"\\nArtifact Version Tag Use Case #1\")\n",
+        "print(\"As an ML Platform Engineer, I need to ensure the correct models are deployed in each region.\")\n",
+        "print(\"---------------------------------------\\n\")\n",
+        "artifact_versions = api.artifacts(type_name=\"model\", name=\"wandb-registry-model/Artifact Demo Models\", tags=['production', 'us-west-2'])\n",
+        "for av in artifact_versions:\n",
+        "  print(\"Artifact Version: \" + str(av.name) + \"\\nTags: \" + str(av.tags) + \"\\nVersion: \" + str(av.version) + \"\\nMetadata: \" + str(av.metadata))\n",
+        "\n",
+        "# 2. As a Data Scientist, I want to compare model performance in US-West and EU-Central regions.\n",
+        "#    Action: Filter by `production` and `us-west-2` tags, then by `production` and `eu-central-1` tags.\n",
+        "#    Scenario: Compare models deployed in US-West and EU-Central based on performance metrics.\n",
+        "#    Benefit: Identifies performance variations between regions for optimization.\n",
+        "\n",
+        "print(\"\\n\\nArtifact Version Tag Use Case #2\")\n",
+        "print(\"As a Data Scientist, I want to compare model performance in US-West and EU-Central regions.\")\n",
+        "print(\"---------------------------------------\\n\")\n",
+        "artifact_versions_us_west_2 = api.artifacts(type_name=\"model\", name=\"wandb-registry-model/Artifact Demo Models\", tags=['production', 'us-west-2'])\n",
+        "artifact_versions_eu_central_1 = api.artifacts(type_name=\"model\", name=\"wandb-registry-model/Artifact Demo Models\", tags=['production', 'eu-central-1'])\n",
+        "for av in artifact_versions_us_west_2:\n",
+        "  print(\"Artifact Version: \" + str(av.name) + \"\\nTags: \" + str(av.tags) + \"\\nVersion: \" + str(av.version) + \"\\nMetadata: \" + str(av.metadata))\n",
+        "for av in artifact_versions_eu_central_1:\n",
+        "  print(\"Artifact Version: \" + str(av.name) + \"\\nTags: \" + str(av.tags) + \"\\nVersion: \" + str(av.version) + \"\\nMetadata: \" + str(av.metadata))\n",
+        "\n",
+        "# 3. As an ML Ops Engineer, I need to check the production model in EU-Central to troubleshoot an issue.\n",
+        "#    Action: Filter by `production` and `eu-central-1` tags.\n",
+        "#    Scenario: Quickly find the production model version deployed in EU-Central for debugging.\n",
+        "#    Benefit: Saves time and ensures the correct model is under investigation.\n",
+        "\n",
+        "print(\"\\n\\nArtifact Version Tag Use Case #3\")\n",
+        "print(\"As an ML Ops Engineer, I need to check the production model in EU-Central to troubleshoot an issue.\")\n",
+        "print(\"---------------------------------------\\n\")\n",
+        "artifact_versions = api.artifacts(type_name=\"model\", name=\"wandb-registry-model/Artifact Demo Models\", tags=['production', 'eu-central-1'])\n",
+        "for av in artifact_versions:\n",
+        "  print(\"Artifact Version: \" + str(av.name) + \"\\nTags: \" + str(av.tags) + \"\\nVersion: \" + str(av.version) + \"\\nMetadata: \" + str(av.metadata))\n",
+        "\n",
+        "# 4. As a Product Manager, I want to review all candidate models in ME-Central for possible promotion.\n",
+        "#    Action: Filter by `candidate` and `me-central-1` tags.\n",
+        "#    Scenario: Gather models tagged as candidates for the ME-Central region and evaluate for production.\n",
+        "#    Benefit: Streamlines the process of selecting models for promotion.\n",
+        "\n",
+        "print(\"\\n\\nArtifact Version Tag Use Case #4\")\n",
+        "print(\"As a Product Manager, I want to review all candidate models in ME-Central for possible promotion.\")\n",
+        "print(\"---------------------------------------\\n\")\n",
+        "artifact_versions = api.artifacts(type_name=\"model\", name=\"wandb-registry-model/Artifact Demo Models\", tags=['candidate', 'me-central-1'])\n",
+        "for av in artifact_versions:\n",
+        "  print(\"Artifact Version: \" + str(av.name) + \"\\nTags: \" + str(av.tags) + \"\\nVersion: \" + str(av.version) + \"\\nMetadata: \" + str(av.metadata))\n",
+        "\n",
+        "# 5. As a Compliance Officer, I need to audit archived models in US-West and ME-Central.\n",
+        "#    Action: Filter by `archived` and `us-west-2` tags, then by `archived` and `me-central-1` tags.\n",
+        "#    Scenario: Find older models in these regions to ensure they meet compliance requirements.\n",
+        "#    Benefit: Efficiently audits the models without needing manual searches.\n",
+        "\n",
+        "print(\"\\n\\nArtifact Version Tag Use Case #5\")\n",
+        "print(\"As a Compliance Officer, I need to audit archived models in US-West and ME-Central.\")\n",
+        "print(\"---------------------------------------\\n\")\n",
+        "artifact_versions_us_west_2 = api.artifacts(type_name=\"model\", name=\"wandb-registry-model/Artifact Demo Models\", tags=['archived', 'us-west-2'])\n",
+        "artifact_versions_me_central_1 = api.artifacts(type_name=\"model\", name=\"wandb-registry-model/Artifact Demo Models\", tags=['archived', 'me-central-1'])\n",
+        "for av in artifact_versions_us_west_2:\n",
+        "  print(\"Artifact Version: \" + str(av.name) + \"\\nTags: \" + str(av.tags) + \"\\nVersion: \" + str(av.version) + \"\\nMetadata: \" + str(av.metadata))\n",
+        "for av in artifact_versions_me_central_1:\n",
+        "  print(\"Artifact Version: \" + str(av.name) + \"\\nTags: \" + str(av.tags) + \"\\nVersion: \" + str(av.version) + \"\\nMetadata: \" + str(av.metadata))\n",
+        "\n",
+        "# 6. As an ML Engineer, I want to evaluate models using the `Claude 3_5 Sonnet` base model in AP-East and EU-Central regions.\n",
+        "#    Action: Filter by `Claude 3_5 Sonnet` and `ap-east-1` tags, then by `Claude 3_5 Sonnet` and `eu-central-1` tags.\n",
+        "#    Scenario: Review and compare models fine-tuned on `Claude 3_5 Sonnet` across these regions.\n",
+        "#    Benefit: Helps assess and improve performance for models based on the `Claude 3_5 Sonnet` base.\n",
+        "\n",
+        "print(\"\\n\\nArtifact Version Tag Use Case #6\")\n",
+        "print(\"As an ML Engineer, I want to evaluate models using the `Claude 3_5 Sonnet` base model in AP-East and EU-Central regions.\")\n",
+        "print(\"---------------------------------------\\n\")\n",
+        "artifact_versions_ap_east_1 = api.artifacts(type_name=\"model\", name=\"wandb-registry-model/Artifact Demo Models\", tags=['Claude 3_5 Sonnet', 'ap-east-1'])\n",
+        "artifact_versions_eu_central_1 = api.artifacts(type_name=\"model\", name=\"wandb-registry-model/Artifact Demo Models\", tags=['Claude 3_5 Sonnet', 'eu-central-1'])\n",
+        "for av in artifact_versions_ap_east_1:\n",
+        "  print(\"Artifact Version: \" + str(av.name) + \"\\nTags: \" + str(av.tags) + \"\\nVersion: \" + str(av.version) + \"\\nMetadata: \" + str(av.metadata))\n",
+        "for av in artifact_versions_eu_central_1:\n",
+        "  print(\"Artifact Version: \" + str(av.name) + \"\\nTags: \" + str(av.tags) + \"\\nVersion: \" + str(av.version) + \"\\nMetadata: \" + str(av.metadata))\n",
+        "\n"
+      ],
+      "metadata": {
+        "id": "Xi-2476K_6pK"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "## Artifact version filtering results\n",
+        "\n",
+        "As you can see from the output of the script, it is possible to use one or more tags to retrieve one or more artifact versions from your registry collection.\n",
+        "\n",
+        "It is also possible to search by collection names, tags, and version tags from within the UI. Just use the search bar inside of any registry to find the artifacts that you need. The following screenshot shows the results for a search on \"ap-east-1\" in our Model registry.\n",
+        "<br><br>\n",
+        "\n",
+        "<img src=\"https://rratshin-images.s3.us-west-2.amazonaws.com/colab_artifact_ui_search3.png\" width=\"800\" alt=\"Weights & Biases\" border=\"1\" />\n",
+        "\n",
+        "You can read more about organizing artifacts with tags in the W&B documentation:\n",
+        "https://docs.wandb.ai/guides/registry/organize-with-tags/\n"
+      ],
+      "metadata": {
+        "id": "vsJGA-bvCxEj"
+      }
+    }
+  ],
+  "metadata": {
+    "colab": {
+      "provenance": [],
+      "private_outputs": true,
+      "toc_visible": true
+    },
+    "kernelspec": {
+      "display_name": "Python 3",
+      "name": "python3"
+    }
+  },
+  "nbformat": 4,
+  "nbformat_minor": 0
+}
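The tag filters in the notebook's use cases can be reasoned about offline before running them against a registry. What follows is a minimal sketch, not the SDK's implementation: it assumes that `api.artifacts(..., tags=[...])` returns only versions carrying every listed tag, and it rebuilds the notebook's tag assignments in memory. `build_versions` and `filter_by_tags` are hypothetical helpers introduced for illustration; no W&B calls are made.

```python
# Offline sketch of the notebook's tagging scheme; no W&B API calls are made.
# build_versions and filter_by_tags are hypothetical helpers, not SDK functions.

REGIONS = ["us-west-2", "eu-central-1", "me-central-1", "ap-east-1"]

# Base model and status tables copied from the notebook's demo data.
BASE_MODELS = {
    "us-west-2": ["Llama 3 Instruct - 70B", "Llama 3 Instruct - 70B", "Gemini 1_5 Pro", "Claude 3_5 Sonnet"],
    "eu-central-1": ["Llama 3 Instruct - 70B", "Gemini 1_5 Pro", "Claude 3_5 Sonnet", "Llama 3 Instruct - 70B"],
    "me-central-1": ["Llama 3 Instruct - 70B", "Gemini 1_5 Pro", "Gemini 1_5 Pro", "Claude 3_5 Sonnet"],
    "ap-east-1": ["Claude 3_5 Sonnet", "Gemini 1_5 Pro", "Claude 3_5 Sonnet", "Gemini 1_5 Pro"],
}
STATUSES = {
    "us-west-2": ["production", "candidate", "archived", "archived"],
    "eu-central-1": ["production", "candidate", "archived", "archived"],
    "me-central-1": ["production", "candidate", "candidate", "archived"],
    "ap-east-1": ["production", "candidate", "archived", "archived"],
}


def build_versions():
    """Mirror the notebook loop: the 4th version per region gets no status tag."""
    versions = []
    for region in REGIONS:
        for i, base_model in enumerate(BASE_MODELS[region]):
            tags = {region, base_model}
            if i != 3:
                tags.add(STATUSES[region][i])
            versions.append({"name": f"test_model_{region}_{i}", "tags": tags})
    return versions


def filter_by_tags(versions, tags):
    """Keep versions whose tag set contains every requested tag (assumed AND semantics)."""
    wanted = set(tags)
    return [v for v in versions if wanted <= v["tags"]]


versions = build_versions()
production_us = filter_by_tags(versions, ["production", "us-west-2"])
archived_me = filter_by_tags(versions, ["archived", "me-central-1"])
claude_ap = filter_by_tags(versions, ["Claude 3_5 Sonnet", "ap-east-1"])

print([v["name"] for v in production_us])
print([v["name"] for v in archived_me])
print([v["name"] for v in claude_ap])
```

Under these assumptions, the `archived` plus `me-central-1` query comes back empty: the only `archived` entry for that region belongs to the fourth version, which the notebook leaves without a status tag. An empty result from use case #5 for ME-Central would therefore be expected rather than a sign of a failed query.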