diff --git a/docs/docs/development.md b/docs/docs/development.md
index 34ce4900..799d58c0 100644
--- a/docs/docs/development.md
+++ b/docs/docs/development.md
@@ -24,7 +24,7 @@ There are several demos available in the `demo\` folder, as Jupyter notebooks. T
$ poetry install --with demo
```
-If using macOS, please also install this additional group:
+If using macOS, also install this additional group:
```bash
$ poetry install --with demo-mac
@@ -32,13 +32,13 @@ $ poetry install --with demo-mac
## Project Development Commands
-You can run most project commands (to format sources, lint, etc.), in two ways: using the commands in the included Makefile, or running things manually. Using the Makefile works on UNIX-like systems (or anywhere `make` is available), and is shorter to type. Alternatively, you can run each command manually. The sections below describe how to run commands in both ways.
+You can run most project commands (e.g., format sources, lint) in two ways: using the targets in the included Makefile, or running each command manually. Using the Makefile works on UNIX-like systems (or anywhere `make` is available), and is shorter to type. The sections below describe how to run commands in both ways.
Also, the commands below do not assume that you have your virtual environment enabled. Calling `poetry run` ensures things run in the current virtual environment even if it is not activated. If you manually activate your virtual environment with `source .venv/bin/activate` (see above), you can run all the commands below without the `poetry run` prefix.
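+
+For example, the following two invocations are equivalent (using the `isort` target from the next section as an illustration):
+
+```bash
+# Run through Poetry, without activating the virtual environment
+$ poetry run make isort
+
+# Or activate the environment once, then drop the prefix
+$ source .venv/bin/activate
+$ make isort
+```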
### Import Sorting
-We sort all Python imports code in this project with `isort`. Assuming you have followed the instructions in the [Quickstart](#quickstart), you can run this locally with:
+We sort all Python imports in this project with `isort`. Assuming you have followed the instructions in the [Quickstart](#quickstart), you can run this locally with:
```bash
$ poetry run make isort
@@ -126,7 +126,7 @@ Alternatively, you can run the tests manually from the project root:
$ poetry run pytest test
```
-Unit tests failures result in build failures in CI.
+Unit test failures result in build failures in CI.
-To test the Juypter notebooks present in the demo folders, run:
+To test the Jupyter notebooks present in the demo folders, run:
@@ -154,8 +154,8 @@ Unit test failures result in build failures in CI.
There are a couple of shorthand commands in the Makefile to run several of the above commands at the same time. The most useful ones include:
-* `poetry run make qa`: execues the source sorting, formatting, source linting, and static type checking commands.
-* `poetry run make ci`: execues the same commands as `qa`, but also runs `gen` to generate updated schemas if needed, and runs `test` to execute the unit tests.
+* `poetry run make qa`: executes the source sorting, formatting, source linting, and static type checking commands.
+* `poetry run make ci`: executes the same commands as `qa`, but also runs `gen` to generate updated schemas if needed, and runs `test` to execute the unit tests.
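+
+For example, to run the full set of checks locally before pushing changes:
+
+```bash
+$ poetry run make ci
+```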
## Front End
@@ -214,7 +214,7 @@ We utilize GitHub A
We build documentation with `mkdocs` and host documentation on ReadTheDocs. A webhook is set up in the MLTE repository to trigger an integration effect on ReadTheDocs when certain changes to the repo are made.
-We maintain a group of requirements for building the documentation under asthe `docs` optional group. They are installed if you follow the general instructions to set up the environment. But if you only want to build the documentation locally, install the requirements from this group, either in the same dev environment or a separate one:
+We maintain a group of requirements for building the documentation under the `docs` optional group. They are installed if you follow the general instructions to set up the environment. But if you only want to build the documentation locally, install the requirements from this group, either in the same dev environment or a separate one:
```bash
$ poetry install --with docs
@@ -258,15 +258,15 @@ $ bash build.sh
You can also do this manually:
1. Build the static distribution for the front end; the command below assumes that you have the dependencies for frontend builds installed:
-```bash
-$ cd mlte/frontend/nuxt-app && npm run build
-```
+ ```bash
+ $ cd mlte/frontend/nuxt-app && npm run build
+ ```
2. Create the source distribution and wheel from the main repo folder:
-```bash
-$ poetry build
-```
+ ```bash
+ $ poetry build
+ ```
Once the package is built, publish the package to `PyPi` using a PyPi API token:
@@ -290,9 +290,9 @@ Run the containers with:
bash start.sh
```
-This exposes the backend on the host at `localhost:8080`, and the frontend at `localhost:8080`. By default, PostgreSQL database is used in a container, and the data is mapped to the local `./pgdata` folder.
+This exposes the backend on the host at `localhost:8080`, and the frontend at `localhost:8000`. By default, a PostgreSQL database is used in a container, and the data is mapped to the local `./pgdata` folder.
-You can CTRL+C to stop seeing the output in the console, but the containers will continue running. You can check back the current logs at any time with:
+You can press CTRL+C to stop seeing the output in the console, but the containers will continue running. You can check the current logs at any time with:
```bash
# From inside the docker/deployment folder
@@ -314,9 +314,9 @@ Currently, `MLTE` supports the following Python versions:
- `3.10`
- `3.11`
-`pyenv` can be used to manage multiple Python versions locally. The following procedure can be used to ensure you are running the Python version you need. This procedure only needs to be performed once, during initial version establishment, meaning you _probably_ don't need to be repeating this step in order to contribute to `MLTE`.
+`pyenv` can be used to manage multiple Python versions locally. The following procedure can be used to ensure you are running the Python version you need. This procedure only needs to be performed once, during initial version establishment, meaning you _probably_ don't need to repeat this step to contribute to `MLTE`.
-### Establishing Depdencies for a Particular Python Version
+### Establishing Dependencies for a Particular Python Version
Install the desired version with:
@@ -345,4 +345,4 @@ Once all QA checks and unit tests pass, we can be assured that the environment d
## Contributing
-To contribute to `MLTE`, check out our GitHub!
\ No newline at end of file
+To contribute to `MLTE`, check out our GitHub repository!
\ No newline at end of file
diff --git a/docs/docs/img/schema_example.png b/docs/docs/img/schema_example.png
deleted file mode 100644
index 80e349c5..00000000
Binary files a/docs/docs/img/schema_example.png and /dev/null differ
diff --git a/docs/docs/index.md b/docs/docs/index.md
index 8d5971ef..0fbfce24 100644
--- a/docs/docs/index.md
+++ b/docs/docs/index.md
@@ -10,19 +10,19 @@
*Diagram to be updated October 2024*
### Continuous Negotiation
-To begin, model developers and project stakeholders meet to determine mission and system requirements that will influence model development such as the deployment environment, available data, model requirements, and system requirements. Throughout the process, teams continue to have meetings to update their assumptions and requirements.
+To begin, model developers and project stakeholders meet to determine mission, business, and system requirements that will influence model development, such as the deployment environment, available data, model requirements, and system requirements. Throughout the process, teams continue to have meetings to update their assumptions and requirements.
#### MLTE Negotiation Card
-As part of the negotiation, teams fill out a `MLTE` [negotiation card](negotiation_card.md) which allows them to record agreements and drives model development and testing.
+As part of the negotiation, teams fill out a `MLTE` [Negotiation Card](negotiation_card.md) which allows them to record agreements and drives model development and testing.
#### Quality Attribute Scenarios
Quality attributes are a way to specify a system’s structural and behavioral requirements; MLTE leverages this approach during negotiations to help teams move from vague statements to concrete requirements.
### Initial Model Testing (IMT)
-Teams use information from the [negotiation card](negotiation_card.md) during initial model development to inform model requirements and thresholds. Once initial development is complete, model teams do initial testing during this step to determine when the model exceeds their baselines.
+Teams use information from the [Negotiation Card](negotiation_card.md) during initial model development to inform model requirements and thresholds. Once initial development is complete, model teams do initial testing during this step to determine when the model exceeds their baselines.
### System Dependent Model Testing (SDMT)
-Once a model passes its baseline requirements in IMT, teams can then focus on ensuring that it passes the larger set of system and model requirements. To do so, teams use system requirement and quality attribute information from the [negotiation card](negotiation_card.md) to develop a test specification, which contains code that will evaluate each model or system requirement.
+Once a model passes its baseline requirements in IMT, teams can then focus on ensuring that it passes the larger set of system and model requirements. To do so, teams use system requirement and quality attribute information from the [Negotiation Card](negotiation_card.md) to develop a test specification, which contains code that will evaluate each model or system requirement.
#### Test Catalog
The `MLTE` Test Catalog contains reusable — local or organizational — examples of test cases organized by quality attribute. Model developers can use it to find examples of tests (like looking for code examples on StackOverflow).
@@ -30,7 +30,7 @@ The `MLTE` Test Catalog contains reusable — local or organizational — exampl
#### Communicating Results
Once SDMT has provided evidence of how a model performs against required model and system qualities, a MLTE Report can be generated to communicate test results and provide the context for requirements and results.
-If results are satisfactory, the output is a production-ready model (meaning that is meets defined system and model requirements), along with all testing evidence (code, data, and results).
+If results are satisfactory, the output is a production-ready model (meaning that it meets defined system and model requirements), along with all testing evidence (code, data, and results).
If results are not satisfactory, more negotiation is required to determine if requirements are realistic, whether more experimentation is required, or whether results triggered additional requirements or tests.
@@ -39,23 +39,38 @@ If results are not satisfactory, more negotiation is required to determine if re
- [MLTE Process](mlte_process.md) (A more detailed guide than above)
- [Setting Up MLTE](setting_up_mlte.md)
- [Development](development.md)
-- MLTE Paper (ICSE 2023)
+- MLTE Paper (ICSE 2023 - 45th International Conference on Software Engineering)
+- Using Quality Attribute Scenarios for ML Model Test Case Generation (SAML 2024 - 3rd International Workshop on Software Architecture and Machine Learning)
## MLTE Metadata
-- Version: 0.3.0
+- Version: 1.0
- Contact Email: mlte dot team dot info at gmail dot com
-- Citation: While not required, it is highly encouraged and greatly appreciated if you cite our paper when you use `MLTE` for academic research.
-
-```
-@article{maffey2023mlteing,
- title={MLTEing Models: Negotiating, Evaluating, and Documenting
- Model and System Qualities},
- author={Maffey, Katherine R and Dotterrer, Kyle and Niemann, Jennifer
- and Cruickshank, Iain and Lewis, Grace A and K{\"a}stner, Christian},
- journal={arXiv preprint arXiv:2303.01998},
- year={2023}
-}
-```
+- Citations: While not required, it is highly encouraged and greatly appreciated if you cite our paper when you use `MLTE` for academic research.
+
+ ```
+ @inproceedings{maffey2023,
+ title={{MLTEing Models: Negotiating, Evaluating, and Documenting Model and System Qualities}},
+ author={Maffey, Katherine R and Dotterrer, Kyle and Niemann, Jennifer and Cruickshank, Iain
+ and Lewis, Grace A and K{\"a}stner, Christian},
+ booktitle={2023 IEEE/ACM 45th International Conference on Software Engineering: New Ideas and
+ Emerging Results (ICSE-NIER)},
+ pages={31--36},
+ year={2023},
+ organization={IEEE}
+ }
+ ```
+ ... or if you use, or are inspired by, quality attribute scenarios for ML model test case generation.
+ ```
+ @inproceedings{brower2024,
+ author={Brower-Sinning, Rachel and Lewis, Grace A. and Echeverría, Sebastián and Ozkaya, Ipek},
+ booktitle={2024 IEEE 21st International Conference on Software Architecture Companion (ICSA-C)},
+ title={Using Quality Attribute Scenarios for ML Model Test Case Generation},
+ year={2024},
+ pages={307--310},
+ organization={IEEE}
+ }
+ ```
+
## Check out `MLTE` on GitHub!
\ No newline at end of file
diff --git a/docs/docs/mlte_process.md b/docs/docs/mlte_process.md
index 25ac4883..822838f9 100644
--- a/docs/docs/mlte_process.md
+++ b/docs/docs/mlte_process.md
@@ -1,34 +1,33 @@
# MLTE Process
-![MLTE Diagram](img/placeholder_numbered_diagram.png)
-*Placeholder diagram*
+![MLTE Diagram](img/mlte_numbered_diagram.png)
## 1. Continuous Negotiation
The process starts with a negotiation step between model developers and project stakeholders where the goal is to share information about mission and system requirements that will influence model development, such as the deployment environment, available data, model requirements, and system requirements.
-This negotiation continues throughout model development, in response (for example) to missing information, unrealistic expectations, and/or test results that do not meet system requirements,
+This negotiation continues throughout model development, in response to factors such as missing information, unrealistic expectations, and/or test results that do not meet system requirements.
-### 2. Quality Attribute Scenarios
+## 2. Quality Attribute Scenarios
-Quality attributes are a way to specify a system’s structural and behavioral requirements; MLTE leverages this approach during negotiations to help teams move from vague statements to concrete requirements. More information on using quality attributes can be found by reading this paper.
+Quality attributes are a way to specify a system’s structural and behavioral requirements; MLTE leverages this approach during negotiations to help teams move from vague statements to concrete requirements. More information on using quality attributes can be found in the paper "Using Quality Attribute Scenarios for ML Model Test Case Generation".
-### 3. Negotiation Card
-As part of the negotiation, teams fill out a `MLTE` [negotiation card](negotiation_card.md) which allows them to record agreements and drives model development and testing.
+## 3. Negotiation Card
+As part of the negotiation, teams fill out a `MLTE` [Negotiation Card](negotiation_card.md) which allows them to record agreements and drives model development and testing.
## 4. Initial Model Testing (IMT)
-Teams use information from the [negotiation card](negotiation_card.md) during initial model development to inform model requirements and thresholds. Once initial development is complete, model teams do initial testing during this step to determine when the model exceeds their baselines.
+IMT recognizes the iterative and experimental nature of model development. Teams use information from the [Negotiation Card](negotiation_card.md) during initial model development to inform model requirements, performance thresholds, and design decisions. Once initial development is complete, model teams perform initial testing during this step to determine whether the model exceeds baselines. Once model performance exceeds baselines, or if additional testing is needed to validate assumptions, the process moves to SDMT.
## 5. System Dependent Model Testing (SDMT)
-Once a model passes its baseline requirements in IMT, teams can then focus on ensuring that it passes the larger set of system and model requirements. To do so, teams use system requirement and quality attribute information from the [negotiation card](negotiation_card.md) to develop a test specification, which contains code that will evaluate each model or system requirement.
+In SDMT, teams focus on ensuring that the model passes the larger set of system and model requirements. To do so, teams use system requirements and quality attribute information from the [Negotiation Card](negotiation_card.md) to develop a test specification, which contains code that will evaluate each model or system requirement.
-### 6. Test Catalog
-The `MLTE` Test Catalog contains reusable — local or organizational — examples of test cases organized by quality attribute. Model developers can use it to find examples of tests (like looking for code examples on StackOverflow). Model developers can also contribute test code back to the Test Catalog so that it can be used by others.
+## 6. Test Catalog
+The `MLTE` Test Catalog contains reusable — local or organizational — examples of test cases organized by quality attribute. Model developers can use the catalog to find examples of tests, similar to looking for code examples on StackOverflow. Model developers can also contribute test code back to the Test Catalog so that it can be used by others.
-### 7. Test Cases
+## 7. Test Cases
Test cases are derived from the test specification that defines metrics, measurement methods, and passing conditions.
-### 8. `MLTE` Report
+## 8. `MLTE` Report
Once test cases are executed, a `MLTE` Report can be generated to communicate test results and provide the context for requirements and results.
-If stakeholders consider the results to be satisfactory, the result is a production-ready model (meaning that is meets defined system and model requirements), along with all testing evidence (code, data, and results). This evidence can be used for stakeholders to repeat tests, expand tests, or make decisions about additional testing effort required. An additional benefit is support for regression testing in response to model maintenance and evolution.
+If stakeholders consider the results to be satisfactory, the outcome is a production-ready model (meaning that it meets defined system and model requirements), along with all testing evidence (code, data, and results). This evidence can be used by stakeholders to repeat tests, expand tests, or make decisions about additional testing effort required. An additional benefit is support for regression testing in response to model maintenance and evolution.
If stakeholders do not consider the results to be satisfactory, more negotiation is required to determine if requirements are realistic, whether more experimentation is required, or whether results triggered additional requirements or tests.
\ No newline at end of file
diff --git a/docs/docs/negotiation_card.md b/docs/docs/negotiation_card.md
index df7862c9..a9a53d58 100644
--- a/docs/docs/negotiation_card.md
+++ b/docs/docs/negotiation_card.md
@@ -224,7 +224,7 @@ Measures used to determine that the responses enumerated for the scenario have b
### System Quality Statement
-Scenario for {System Quality}: {Stimulus} from {Source} during {Environment}. {Response} {Response Measure}.
+As the information above is added, the text for the full scenario is constructed. Adjust the information until there is a clear statement of the quality attribute scenario.
#### Example: Response Time
diff --git a/docs/docs/setting_up_mlte.md b/docs/docs/setting_up_mlte.md
index 71ca7bca..eeca0de0 100644
--- a/docs/docs/setting_up_mlte.md
+++ b/docs/docs/setting_up_mlte.md
@@ -1,6 +1,6 @@
# Setting Up `MLTE`
-`MLTE` is a framework (a process to follow) and an infrastructure (a Python package) for machine learning model and system evaluation. This section focuses on setting up the infrastructure, which is an integral part of following the `MLTE` [process](mlte_framework.md).
+`MLTE` is a process and an infrastructure (a Python package) for machine learning model and system evaluation. This section focuses on setting up the infrastructure, which is an integral part of following the `MLTE` [process](mlte_framework.md).
## Installation
@@ -41,13 +41,13 @@ from mlte.report ... #importing from report subpackage
### Setting up a MLTE session
-Before most operations can be done on MLTE, a context and artifact store need to be set. When using MLTE as a library, there are two commands that can be executed once on a script to set this global state. They can be imported like so:
+Before most operations can be done on MLTE, a context and artifact store need to be set. When using MLTE as a library, there are two commands that can be executed once in a script to set this global state. They can be imported as follows:
```python
from mlte.session import set_context, set_store
```
-They are described used in the following way:
+They are used in the following way:
- ``set_context("model_name", "model_version")``: this command indicates the model and version you will be working on for the rest of the script. It is mostly used to point to the proper location in the store when saving and loading artifacts. The model name and version can be any string.
@@ -60,7 +60,7 @@ Alternatively, these two things can also be set by environment variables before
## Running the Backend and User Interface
-The web-based user interface (UI or frontend) allows you to edit some types of artifacts in the system (like a Negotiation Card), and review the existing models and test catalogs. It requires authentication to access, so it also allows admins to manage users. To access the UI, first you need to start the backend server. See details for running each component.
+The web-based user interface (UI or frontend) allows you to create and edit system artifacts, such as the Negotiation Card, and review existing models and test catalogs. It requires authentication for access and allows admins to manage users. To access the UI, you first need to start the backend server. See the sections below for details on running each component.
### Backend
@@ -74,19 +74,19 @@ Some common flags used with the backend include the following:
- **Artifact store**: the default artifact store will store any artifacts in a non-persistent, in-memory store. To change the store type, use the `--store-uri` flag. Please see the [Store URIs](#store-uris) section for details about each store type and corresponding URI. Note that this flag will also set the internal user store, used to handle users and permissions needed for the UI. To use a relational database to store artifacts, you will need to set up the database engine separately; see the [Using a Relational DB](#using-a-relational-db-engine-backend) section below for details. For example, to run the backend with a store located in a folder called `store` relative to the folder where you are running mlte, you can run the backend like this:
-```bash
-$ mlte backend --store-uri fs://store
-```
+ ```bash
+ $ mlte backend --store-uri fs://store
+ ```
- - **Test catalog stores**: Optionally, you can specify one or more test catalog stores to use by the system. This is done with the `--catalog-uris` flag, which is similar to the flag for artifact stores. Unlike that one, however, catalogs need to have an ID, and this flag allows you to specify more than one test catalog if required. The value of this flag is a string with a dictionary with the ids and the actual store URIs. For example, to run the backend with two catalogs, one called "cat1" and another one called "cat2", the first one being in memory and the second one being in a local folder called `store`, you would run this command:
+ - **Test catalog stores**: Optionally, you can specify one or more test catalog stores to be used by the system. This is done with the `--catalog-uris` flag, which is similar to the flag for artifact stores. Unlike that flag, however, catalogs need to have an ID, and this flag allows you to specify more than one test catalog if required. The value of this flag is a string containing a dictionary that maps catalog IDs to their store URIs. For example, to run the backend with two catalogs, one called "cat1" and another one called "cat2", the first one being in memory and the second one being in a local folder called `store`, you would run this command:
-```bash
-$ mlte backend --catalog-uris '{"cat1": "memory://", "cat2": "fs://store"}'
-```
+ ```bash
+ $ mlte backend --catalog-uris '{"cat1": "memory://", "cat2": "fs://store"}'
+ ```
- **Token key**: The backend comes with a default secret for signing authentication tokens. In real deployments, you should define a new secret to be used for token signing instead of the default one. This can be done by either passing it as a command line argument with the `--jwt-secret` flag, or creating an `.env` file with the secret string on the variable `JWT_SECRET_KEY=""`
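+
+ For example (the secret value below is illustrative only; use your own randomly generated string):
+
+ ```bash
+ $ mlte backend --jwt-secret "replace-with-a-random-secret"
+ ```
+
+ Or, equivalently, define it in an `.env` file:
+
+ ```bash
+ JWT_SECRET_KEY="replace-with-a-random-secret"
+ ```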
- - **Allowed origins**: In order for the frontend to be able to communicate with the backend, the frontend need to be allowed as an origin in the backend. This can be done by specifying the `--allowed-origins` flag when starting the backend. When ran through the mlte package, the frontend will be hosted at `http://localhost:8000`. This address is configured to be allowed by default, so the flag does not need to be used by default, but if the frontend is hosted on another address then this flag needs to be set with the correct address.
+ - **Allowed origins**: In order for the frontend to be able to communicate with the backend, the frontend needs to be allowed as an origin in the backend. This can be done by specifying the `--allowed-origins` flag when starting the backend. When run through the mlte package, the frontend will be hosted at `http://localhost:8080`. This address is allowed by default, so the flag is normally not needed; however, if the frontend is hosted at another address, this flag needs to be set with the correct address.
### Frontend
diff --git a/docs/docs/using_mlte.md b/docs/docs/using_mlte.md
index 49b4e2a7..12c5da45 100644
--- a/docs/docs/using_mlte.md
+++ b/docs/docs/using_mlte.md
@@ -2,19 +2,19 @@
After [setting up `MLTE`](setting_up_mlte.md), the process begins at the inception of a project with requirements definition.
-If your team has an existing project you'd like to test using the `MLTE` infrastructure, navigate to the [Internal Model Testing](#internal-model-testing-imt) section for a description of testing a model with `MLTE`.
+If your team has an existing project and you would like to test it using `MLTE`, navigate to the [Internal Model Testing](#internal-model-testing-imt) section for a description of testing a model with `MLTE`. However, if stakeholders have not been actively involved in the process, it is recommended to start with the Negotiation Card to make sure that both system and model requirements have been elicited and defined.
## Negotiate Model Quality Requirements
-To begin the `MLTE` process, teams hold a negotiation - a discussion about requirements - amongst stakeholders, software engineers, data scientists, and anyone else involved in the project.
+To begin the `MLTE` process, teams hold a negotiation (a discussion about requirements) with stakeholders that should include system/product owners, software engineers, data scientists, and anyone else involved in the project.
-- The negotiation facilitator should review the instructions and content for the negotiation card, which can be found in the `MLTE` user interface. To set up `MLTE`, see the [Setting Up `MLTE`](setting_up_mlte.md) page, and to view the content in the Negotiation Card, see this [page](negotiation_card.md).
-- The negotiation is a collaborative discussion where all involved parties can agree on project requirements and discuss technical details.
-- Once the negotiation is complete and the negotiation card is filled in as much as possible (it does not have to all be filled out at once), development can begin. The negotiation card gives the team a reference for project goals and allows them to plan out their development cycles appropriately.
+- The negotiation facilitator should review the instructions and content of the Negotiation Card, which can be found in the `MLTE` user interface. To set up `MLTE`, see the [Setting Up `MLTE`](setting_up_mlte.md) page, and to view the content in the Negotiation Card, see this [page](negotiation_card.md).
+- The negotiation is a collaborative discussion where all involved parties aim to agree on project requirements and discuss technical details.
+- Once the negotiation is complete and the Negotiation Card is filled in as much as possible (it does not have to all be filled out at once), development can begin. The Negotiation Card gives the team a reference for project goals and allows them to plan out their development cycles appropriately.
## Internal Model Testing (IMT)
-After initial model development has been completed, the team should have a model that is ready for preliminary testing. In IMT, the development team evaluates how the model performs against its baseline on the chosen performance metrics for each system goal. Evaluation in `MLTE` follows this process:
+After initial model development has been completed, the team should have a model that is ready for preliminary testing. In IMT, the development team evaluates how the model performs against the baselines for the defined performance metrics for each system goal. Evaluation in `MLTE` follows this process:
1. Initialize the `MLTE` context.
2. Define a specification.
@@ -34,7 +34,7 @@ set_store(f"local://{store_path}")
### 2. Define a `Specification`
-A `Specification` (or `Spec`) represents the requirements the completed model must meet in order to be acceptable for use in the system into which it will be integrated. Full `Spec` definition will be completed in [SDMT](#system-dependent-model-testing-sdmt); in IMT, we use it in a preliminary fashion so the development team can do an initial round of model testing. However, the process is the same for both stages. Here we define a `Spec` using accuracy as a performance metric. We also add in further initial testing capacity by including a confusion matrix and class distribution.
+A `Specification` (or `Spec`) represents the requirements the model must meet in order to be acceptable for use in the system into which it will be integrated. Full `Spec` definition will be completed in [SDMT](#system-dependent-model-testing-sdmt); in IMT, we use it in a preliminary fashion so the development team can do an initial round of model testing. However, the process is the same for both stages. Here we define a `Spec` using accuracy as a performance metric. We also add in further initial testing capacity by including a confusion matrix and class distribution.
```python
from mlte.spec.spec import Spec
@@ -147,17 +147,19 @@ IMT is an iterative process - the development team will likely repeat it several
## Negotiate Model Requirements Beyond Task Efficacy
-After completing IMT, development teams should have a sense of how their model performs on the core project performance metric against the chosen baseline. After they have this additional information, the team conducts another negotiation amongst everyone involved in the project: stakeholders, software engineers, data scientists, and anyone else involved such as a project manager.
+After completing IMT, development teams should have a sense of how their model performs on the core project performance metric against the chosen baseline. After they have this additional information, the team conducts another negotiation amongst everyone involved in the project: stakeholders, software engineers, data scientists, and anyone else involved such as a project manager or system/product owner.
-- The emphasis of this negotiation is to review the discussion from [requirements negotiation](#negotiate-model-quality-requirements) and update it based on the intial evaluation that was performed in [IMT](#internal-model-testing-imt).
+- The emphasis of this negotiation is to review the discussion from [requirements negotiation](#negotiate-model-quality-requirements) and update it based on the initial evaluation that was performed in [IMT](#internal-model-testing-imt).
-- It is also important to ensure that the development team has all the information they need to build out a `Specification` (`Spec`) after this negotiation.
-- It is likely that the first negotiation only resulted in some sections of the negotiation card being filled out, and a goal of this second negotiation should be to complete more of the sections and have a better picture of what project success will be.
+- It is also important to ensure that the development team has all the information they need to build a `Specification` (`Spec`) after this negotiation.
+- It is likely that the first negotiation only resulted in some sections of the Negotiation Card being filled out, and a goal of this second negotiation should be to complete more of the sections and have a better picture of what project success will be.
-Once the negotiation is complete and the contents of the negotiation card have been updated, the development team will conduct a comprehensive round of testing as part of System Dependent Model Testing.
+Once the negotiation is complete and the contents of the Negotiation Card have been updated, the development team will conduct a comprehensive round of testing as part of System Dependent Model Testing.
## System Dependent Model Testing (SDMT)
-SDMT ensures that a model will function as intended when it is part of a larger system. Using the updated negotiation card, development teams must define a `Specification` (`Spec`) that evaluates all relevant dimensions for the overall system to function. This follows the same process described in [IMT](#internal-model-testing-imt), with more emphasis on building out the specification. As such, this section will focus on the specification and will offer more detail on collecting different types of evidence.
+SDMT ensures that a model will function as intended when it is part of a larger system. Using the updated Negotiation Card, development teams must define a `Specification` (`Spec`) that evaluates all relevant dimensions for the overall system to function. This follows the same process described in [IMT](#internal-model-testing-imt), with more emphasis on building out the specification. As such, this section will focus on the specification and will offer more detail on collecting different types of evidence.
+
+Teams can search the Test Catalog to find examples of how different quality attributes were tested by other teams. Note that MLTE comes with a sample Test Catalog simply for reference. The goal is for organizations to create and populate their own Test Catalogs over time.
### Define a System-Oriented `Specification`
@@ -205,7 +207,7 @@ After building the `Spec`, teams must collect evidence to attest to whether or n
#### Evidence: MLTE Measurements
-The simplest use-case is to import a MLTE-defined `Measurement`, which is then invoked to produce a `Value`. This value can then be inspected and automatically saved to the artifact store. Following are two examples of this type of evidence collection.
+The simplest use case is to import a MLTE-defined `Measurement`, which is then invoked to produce a `Value`. This value can then be inspected and automatically saved to the artifact store. Following are two examples of this type of evidence collection.
```python
from mlte.measurement.storage import LocalObjectSize
@@ -314,7 +316,7 @@ img.save()
## Communicate ML Evaluation Results
-To communicate results and examine findings, `MLTE` produces a report that encapsulates all knowledge gained about the model and the system as a consequence of the evaluation process. Teams can import content from the negotiation card using the `MLTE` UI, and the fields can be customized as needed. The report is most easily generated using the UI, but can also be defined via code as demonstrated below.
+To communicate results and examine findings, `MLTE` produces a report that encapsulates all knowledge gained about the model and the system as a consequence of the evaluation process. Teams can import content from the Negotiation Card using the `MLTE` UI, and the fields can be customized as needed. The report is most easily generated using the UI, but can also be defined via code as demonstrated below.
```python
import time