Commit 2671be2: merge master

gtarpenning committed Aug 16, 2024
2 parents e3c966a + 6c74337
Showing 155 changed files with 10,855 additions and 3,031 deletions.
50 changes: 50 additions & 0 deletions docs/CONTRIBUTING_DOCS.md
@@ -18,6 +18,9 @@ Satisfy the following dependencies to create, build, and locally serve Weave Doc
```shell
npm install --global yarn
```
- Make sure your Python environment is set up by running the following from the repo root:
- `pip install -r requirements.dev.txt`
- `pip install -e .`
- Install an IDE (e.g. VS Code) or Text Editor (e.g. Sublime)

 
@@ -77,3 +80,50 @@ git push origin <your-feature-branch>
```

8. Open a pull request from the new branch to the original repo.

## DocGen
Currently, we have 3 forms of doc generation:
1. Python Doc Gen
2. Service Doc Gen
3. Notebook Doc Gen

Assuming you have node and python packages installed, these can all be generated by running `make generate_reference_docs`.

Let's review some details about each process:

### Python Doc Gen

See: `docs/scripts/generate_python_sdk_docs.py` and `./docs/reference/python-sdk`

Python doc gen uses `lazydocs` as the core library for building markdown docs from our symbols. There are a few things to keep in mind:

1. `docs/scripts/generate_python_sdk_docs.py` contains an allow-list of modules to document. Since the Weave codebase is massive, it is far easier to just select what modules are useful for docs.
2. If the module does not have a `__docspec__` list of symbols, all non-underscore symbols will be documented. If it does have a `__docspec__`, documentation is narrowed to just those symbols.
3. Documentation itself:
1. Module-level: Put a triple double quote (""") comment as the first line of the module to add module-level documentation
2. Classes: Put a triple double quote (""") comment as the first line of the class to add class-level docs
3. Methods / Functions: Put a triple double quote (""") comment as the first line of the implementation to add method/function-level docs
1. Attributes are not automatically documented; use the `@property` pattern to expose them instead.
2. For classes that inherit from `BaseModel`, a field list is generated automatically to work around this limitation.

### Service Doc Gen

See `docs/scripts/generate_service_api_spec.py` and `./docs/reference/service-api`

Service doc generation loads the `openapi.json` file describing the server, processes it, then uses the `docusaurus-plugin-openapi-docs` plugin to generate markdown files from that specification.

To improve these docs, follow FastAPI's guidance for producing good Swagger docs: add field-level and endpoint-level descriptions using its APIs. Once you have made changes, `docs/scripts/generate_service_api_spec.py` needs a running server to point at. You can either deploy to prod, or run the server locally and point the script at it. From there, `docs/scripts/generate_service_api_spec.py` will download the spec, clean it up, and build the docs.

### Notebook Doc Gen

See `docs/scripts/generate_notebooks.py`, `./docs/notebooks`, and `./docs/reference/gen_notebooks`.

This script loads every notebook in `./docs/notebooks` and converts it into a markdown doc in `./docs/reference/gen_notebooks`, which Docusaurus can then reference like any other markdown file. If you need header metadata, add a markdown block at the top of your notebook with:
```
<!-- docusaurus_head_meta::start
---
title: Head Metadata
---
docusaurus_head_meta::end -->
```

10 changes: 9 additions & 1 deletion docs/Makefile
@@ -1,13 +1,21 @@
generate_service_api_docs:
mkdir -p ./docs/reference/service-api
rm -rf ./docs/reference/service-api
mkdir -p ./docs/reference/service-api
python scripts/generate_service_api_spec.py
yarn docusaurus gen-api-docs all

generate_python_sdk_docs:
mkdir -p ./docs/reference/python-sdk
rm -rf ./docs/reference/python-sdk
mkdir -p ./docs/reference/python-sdk
python scripts/generate_python_sdk_docs.py

generate_reference_docs: generate_service_api_docs generate_python_sdk_docs
generate_notebooks_docs:
mkdir -p ./docs/reference/gen_notebooks
rm -rf ./docs/reference/gen_notebooks
mkdir -p ./docs/reference/gen_notebooks
python scripts/generate_notebooks.py

generate_reference_docs: generate_service_api_docs generate_python_sdk_docs generate_notebooks_docs
yarn build
10 changes: 5 additions & 5 deletions docs/docs/guides/integrations/llamaindex.md
@@ -31,7 +31,7 @@ In the example above, we are creating a simple LlamaIndex chat engine which unde

## Tracing

LlamaIndex is known for it's ease of connecting data with LLM. A simple RAG application requires an embedding step, retrieval step and a response synthesis step. With the increasing complexity, it becomes important to store traces of individual steps in a central database during both development and production.
LlamaIndex is known for its ease of connecting data with LLM. A simple RAG application requires an embedding step, retrieval step and a response synthesis step. With the increasing complexity, it becomes important to store traces of individual steps in a central database during both development and production.

These traces are essential for debugging and improving your application. Weave automatically tracks all calls made through the LlamaIndex library, including prompt templates, LLM calls, tools, and agent steps. You can view the traces in the Weave web interface.

@@ -68,7 +68,7 @@ Our integration leverages this capability of LlamaIndex and automatically sets [

Organizing and evaluating LLMs in applications for various use cases is challenging with multiple components, such as prompts, model configurations, and inference parameters. Using the [`weave.Model`](/guides/core-types/models), you can capture and organize experimental details like system prompts or the models you use, making it easier to compare different iterations.

The following example demonstrates building a LlamaIndex query engine in a `WeaveModel`:
The following example demonstrates building a LlamaIndex query engine in a `WeaveModel`, using data that can be found in the [weave/data](https://github.com/wandb/weave/tree/master/data) folder:

```python
import weave
Expand All @@ -84,7 +84,7 @@ You are given with relevant information about Paul Graham. Answer the user query
User Query: {query_str}
Context: {context_str}
Answer:
Answer:
"""

# highlight-next-line
@@ -123,11 +123,12 @@ class SimpleRAGPipeline(weave.Model):
llm=llm,
text_qa_template=prompt_template,
)

# highlight-next-line
@weave.op()
def predict(self, query: str):
llm = self.get_llm()
query_engine = self.get_query_engine(
# This data can be found in the weave repo under data/paul_graham
"data/paul_graham",
)
response = query_engine.query(query)
@@ -145,7 +146,6 @@ This `SimpleRAGPipeline` class subclassed from `weave.Model` organizes the impor

[![llamaindex_model.png](imgs/llamaindex_model.png)](https://wandb.ai/wandbot/test-llamaindex-weave/weave/calls?filter=%7B%22traceRootsOnly%22%3Atrue%7D&peekPath=%2Fwandbot%2Ftest-llamaindex-weave%2Fcalls%2Fa82afbf4-29a5-43cd-8c51-603350abeafd%3Ftracetree%3D1)


## Doing Evaluation with `weave.Evaluation`

Evaluations help you measure the performance of your applications. By using the [`weave.Evaluation`](/guides/core-types/evaluations) class, you can capture how well your model performs on specific tasks or datasets, making it easier to compare different models and iterations of your application. The following example demonstrates how to evaluate the model we created: