Merge branch 'master' into andrew/ts_common

wandb · Oct 10, 2024 · 9115bde
2 parents e15272b + 5579795
commit 9115bde

Showing 41 changed files with 4,558 additions and 78 deletions.
diff --git a/.github/workflows/test.yaml b/.github/workflows/test.yaml
@@ -181,7 +181,7 @@ jobs:
       - name: Checkout
         uses: actions/checkout@v3
       - name: Set up Python
-        uses: actions/setup-python@v4
+        uses: actions/setup-python@v5
         with:
           python-version: 3.9
       - name: Install dependencies
@@ -203,6 +203,7 @@ jobs:
             '10',
             '11',
             '12',
+            '13',
             #
           ]
         nox-shard:
@@ -220,6 +221,7 @@ jobs:
             'llamaindex',
             'mistral0',
             'mistral1',
+            'notdiamond',
             'openai',
           ]
       fail-fast: false
@@ -246,7 +248,7 @@ jobs:
       - name: Checkout
         uses: actions/checkout@v3
       - name: Set up Python ${{ matrix.python-version-major }}.${{ matrix.python-version-minor }}
-        uses: actions/setup-python@v4
+        uses: actions/setup-python@v5
         with:
           python-version: ${{ matrix.python-version-major }}.${{ matrix.python-version-minor }}
       - name: Install dependencies
@@ -272,7 +274,8 @@ jobs:
           WF_CLICKHOUSE_HOST: weave_clickhouse
           WEAVE_SERVER_DISABLE_ECOSYSTEM: 1
         run: |
-          nox -e "tests-${{ matrix.python-version-major }}.${{ matrix.python-version-minor }}(shard='${{ matrix.nox-shard }}')" -- -n4
+          nox -e "tests-${{ matrix.python-version-major }}.${{ matrix.python-version-minor }}(shard='${{ matrix.nox-shard }}')" -- \
+            -n4
   trace-tests-matrix-check: # This job does nothing and is only used for the branch protection
     if: always()
 

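The `run:` step builds one nox session name per matrix cell from the Python version and the shard. The sketch below expands only the values visible in these hunks, so the real matrix in `test.yaml` may contain more minors and shards. It also assumes `python-version-major` is `3`, which these hunks do not show. Arguments after `--` reach the nox session as `session.posargs`; that the noxfile forwards `-n4` to pytest is an assumption. The backslash continuation added in the last hunk is joined by the shell, so the executed command is unchanged.

```python
from itertools import product
from typing import List

# Partial matrix values copied from the hunks above (assumed major version "3").
PYTHON_MAJOR = "3"
PYTHON_MINORS = ["10", "11", "12", "13"]
NOX_SHARDS = ["llamaindex", "mistral0", "mistral1", "notdiamond", "openai"]


def nox_command(major: str, minor: str, shard: str) -> str:
    """Mirror the workflow template: nox -e "tests-<major>.<minor>(shard='<shard>')" -- -n4."""
    session = f"tests-{major}.{minor}(shard='{shard}')"
    return f'nox -e "{session}" -- -n4'


def expand_matrix(major: str, minors: List[str], shards: List[str]) -> List[str]:
    """One command per (minor, shard) pair, minors in the outer loop."""
    return [nox_command(major, m, s) for m, s in product(minors, shards)]


commands = expand_matrix(PYTHON_MAJOR, PYTHON_MINORS, NOX_SHARDS)
for cmd in commands[:2]:
    print(cmd)
```

Adding `'13'` and `'notdiamond'` to the matrix therefore adds one new job per shard and one new job per Python minor respectively, because `fail-fast: false` keeps every cell running independently.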
diff --git a/docs/docs/guides/core-types/evaluations.md b/docs/docs/guides/core-types/evaluations.md
@@ -106,6 +106,32 @@ asyncio.run(evaluation.evaluate(model))
 
 This will run `predict` on each example and score the output with each scoring function.
 
+#### Custom Naming
+
+You can change the name of the Evaluation itself by passing a `name` parameter to the `Evaluation` class.
+
+```python
+evaluation = Evaluation(
+    dataset=examples, scorers=[match_score1], name="My Evaluation"
+)
+```
+
+You can also change the display name of an individual evaluation run by passing a `__weave` dictionary with a `display_name` key to `evaluate`.
+
+:::note
+
+Using the `__weave` dictionary sets the call display name, which is distinct from the Evaluation object name. In the
+UI, you will see the display name if it is set; otherwise, the Evaluation object name is used.
+
+:::
+
+```python
+evaluation = Evaluation(
+    dataset=examples, scorers=[match_score1]
+)
+asyncio.run(evaluation.evaluate(model, __weave={"display_name": "My Evaluation Run"}))
+```
+
 ### Define a function to evaluate
 
 Alternatively, you can also evaluate a function that is wrapped in a `@weave.op()`.

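To make the naming rules concrete, here is a toy model of how a per-call `__weave` dictionary could be consumed by a decorated function. `toy_op` and `CALL_LOG` are hypothetical stand-ins, not Weave's implementation. The only behavior taken from these docs is the precedence: a per-call `display_name` wins over the op-level (or Evaluation object) name. Falling back to the function's `__name__` when neither is set is an assumption of this sketch.

```python
import functools
from typing import Any, Callable, Dict, List, Optional

# Hypothetical record of calls, standing in for Weave's trace storage.
CALL_LOG: List[Dict[str, Any]] = []


def toy_op(display_name: Optional[str] = None) -> Callable:
    """Hypothetical stand-in for an op decorator, illustrating display-name precedence."""
    def decorator(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            # Remove the reserved kwarg so the wrapped function never receives it.
            attrs = kwargs.pop("__weave", None) or {}
            # Per-call name first, then the op-level name, then (assumed) the function name.
            name = attrs.get("display_name") or display_name or fn.__name__
            result = fn(*args, **kwargs)
            CALL_LOG.append({"display_name": name, "output": result})
            return result
        return wrapper
    return decorator


@toy_op(display_name="Greeting Op")
def my_function(name: str) -> str:
    return f"Hello, {name}!"


@toy_op()
def double(x: int) -> int:
    return x * 2


my_function("World")  # logged as "Greeting Op"
my_function("World", __weave={"display_name": "My Custom Display Name"})  # per-call name wins
double(3)  # no names set, falls back to "double"
```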
diff --git a/docs/docs/guides/integrations/imgs/notdiamond/api-keys.png b/docs/docs/guides/integrations/imgs/notdiamond/api-keys.png
diff --git a/docs/docs/guides/integrations/imgs/notdiamond/evaluations.png b/docs/docs/guides/integrations/imgs/notdiamond/evaluations.png
diff --git a/docs/docs/guides/integrations/imgs/notdiamond/router-preferences.png b/docs/docs/guides/integrations/imgs/notdiamond/router-preferences.png
diff --git a/docs/docs/guides/integrations/imgs/notdiamond/weave-trace.png b/docs/docs/guides/integrations/imgs/notdiamond/weave-trace.png
diff --git a/docs/docs/guides/integrations/notdiamond.md b/docs/docs/guides/integrations/notdiamond.md
@@ -0,0 +1,114 @@
+# Not Diamond ¬◇
+
+When building complex LLM workflows, you may need to prompt different models depending on accuracy,
+cost, or call latency. You can use [Not Diamond][nd] to route prompts in these workflows to the
+right model for your needs, helping maximize accuracy while saving on model costs.
+
+## Getting started
+
+Make sure you have [created an account][account] and [generated an API key][keys], then set the API
+key in your environment as `NOTDIAMOND_API_KEY`.
+
+![Create an API key](imgs/notdiamond/api-keys.png)
+
+From here, you can
+
+- try the [quickstart guide],
+- [build a custom router][custom router] with W&B Weave and Not Diamond, or
+- [chat with Not Diamond][chat] to see routing in action.
+
+## Tracing
+
+Weave integrates with [Not Diamond's Python library][python] to [automatically log API calls][ops].
+You only need to run `weave.init()` at the start of your workflow, then continue using the routed
+provider as usual:
+
+```python
+from notdiamond import NotDiamond
+
+import weave
+weave.init('notdiamond-quickstart')
+
+client = NotDiamond()
+session_id, provider = client.chat.completions.model_select(
+    messages=[
+        {"role": "system", "content": "You are a helpful assistant."},
+        {"role": "user", "content": "Concisely explain merge sort."}
+    ],
+    model=['openai/gpt-4o', 'anthropic/claude-3-5-sonnet-20240620']
+)
+
+print("LLM called: ", provider.provider)  # openai, anthropic, etc
+print("Provider model: ", provider.model) # gpt-4o, claude-3-5-sonnet-20240620, etc
+```
+
+## Custom routing
+
+You can also train your own [custom router] on [Evaluations][evals], allowing Not Diamond to route prompts
+according to eval performance for specialized use cases.
+
+Start by training a custom router:
+
+```python
+import asyncio
+import os
+
+import weave
+from weave.flow.eval import EvaluationResults
+from weave.integrations.notdiamond.custom_router import train_router
+
+# Build an Evaluation on gpt-4o and Claude 3.5 Sonnet
+evaluation = weave.Evaluation(...)
+gpt_4o = weave.Model(...)
+sonnet = weave.Model(...)
+
+# get_eval_results is a coroutine, so run it to collect each model's results
+model_evals: dict[str, EvaluationResults] = {
+    'openai/gpt-4o': asyncio.run(evaluation.get_eval_results(gpt_4o)),
+    'anthropic/claude-3-5-sonnet-20240620': asyncio.run(evaluation.get_eval_results(sonnet)),
+}
+preference_id = train_router(
+    model_evals=model_evals,
+    prompt_column="prompt",
+    response_column="actual",
+    language="en",
+    maximize=True,
+    # read the key set in "Getting started"
+    api_key=os.environ["NOTDIAMOND_API_KEY"],
+)
+```
+
+By reusing this preference ID in any `model_select` request, you can route your prompts
+to maximize performance and minimize cost on your evaluation data:
+
+```python
+from notdiamond import NotDiamond
+
+import weave
+weave.init('notdiamond-quickstart')
+
+client = NotDiamond()
+
+session_id, provider = client.chat.completions.model_select(
+    messages=[
+        {"role": "system", "content": "You are a helpful assistant."},
+        {"role": "user", "content": "Concisely explain merge sort."}
+    ],
+    model=['openai/gpt-4o', 'anthropic/claude-3-5-sonnet-20240620'],
+
+    # passing this preference ID reuses your custom router
+    preference_id=preference_id
+)
+
+print("LLM called: ", provider.provider)  # openai, anthropic, etc
+print("Provider model: ", provider.model) # gpt-4o, claude-3-5-sonnet-20240620, etc
+```
+
+## Additional support
+
+Visit the [docs] or [send us a message][support] for further support.
+
+[account]: https://app.notdiamond.ai
+[chat]: https://chat.notdiamond.ai
+[custom router]: https://docs.notdiamond.ai/docs/router-training-quickstart
+[docs]: https://docs.notdiamond.ai
+[evals]: ../../guides/core-types/evaluations.md
+[keys]: https://app.notdiamond.ai/keys
+[nd]: https://www.notdiamond.ai/
+[ops]: ../../guides/tracking/ops.md
+[python]: https://github.com/Not-Diamond/notdiamond-python
+[quickstart guide]: https://docs.notdiamond.ai/docs/quickstart
+[support]: mailto:[email protected]
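
Both the `model` list passed to `model_select` and the keys of `model_evals` identify models as `provider/model` strings, and the routed result is reported back as separate `provider` and `model` fields. A small pre-flight check along those lines can catch a missing `NOTDIAMOND_API_KEY` or a malformed model ID before any request is sent. `preflight`, `parse_model_id`, and `ModelId` are hypothetical helpers written for this note; the Not Diamond client performs its own validation, which this sketch does not reproduce.

```python
import os
from typing import List, Mapping, NamedTuple, Optional


class ModelId(NamedTuple):
    provider: str  # e.g. "openai", the part printed as provider.provider
    model: str     # e.g. "gpt-4o", the part printed as provider.model


def parse_model_id(model_id: str) -> ModelId:
    """Split a 'provider/model' string such as 'openai/gpt-4o' at the first slash."""
    provider, sep, model = model_id.partition("/")
    if not (sep and provider and model):
        raise ValueError(f"expected 'provider/model', got {model_id!r}")
    return ModelId(provider, model)


def preflight(model_ids: List[str], environ: Optional[Mapping[str, str]] = None) -> List[ModelId]:
    """Fail fast if the API key is missing or any model ID is malformed."""
    env = os.environ if environ is None else environ
    if not env.get("NOTDIAMOND_API_KEY"):
        raise RuntimeError("Set NOTDIAMOND_API_KEY before calling Not Diamond")
    return [parse_model_id(m) for m in model_ids]


candidates = ['openai/gpt-4o', 'anthropic/claude-3-5-sonnet-20240620']
parsed = preflight(candidates, environ={"NOTDIAMOND_API_KEY": "example-key"})
print(parsed)
```

Passing `environ` explicitly keeps the check testable; in a real workflow you would omit it so that `os.environ` is read.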
diff --git a/docs/docs/guides/tracking/tracing.mdx b/docs/docs/guides/tracking/tracing.mdx
@@ -104,7 +104,19 @@ instance.my_method.call(instance, "World")
 
 #### Call Display Name
 
-Sometimes you may want to override the display name of a call. You can achieve this in one of three ways:
+Sometimes you may want to override the display name of a call. You can achieve this in one of four ways:
+
+0. Change the display name at the time of calling the op:
+
+```python showLineNumbers
+result = my_function("World", __weave={"display_name": "My Custom Display Name"})
+```
+
+:::note
+
+Using the `__weave` dictionary sets the call display name, which takes precedence over the Op display name.
+
+:::
 
 1. Change the display name on a per-call basis. This uses the [`Op.call`](../../reference/python-sdk/weave/trace/weave.trace.op.md#function-call) method to return a `Call` object, which you can then use to set the display name using [`Call.set_display_name`](../../reference/python-sdk/weave/trace/weave.trace.weave_client.md#method-set_display_name).
 ```python showLineNumbers