Update readme to add more information (opensearch-project#81)

* Create JvmService instance on demand.
  Signed-off-by: Alex <[email protected]>
* Move the ml_parameters from XContent to the request parameters to avoid the conflict with search XContent input.
  Signed-off-by: Alex <[email protected]>
* Fix the security risks found by PenTest.
  1. unhandled 500 server error.
  2. Insecure Deserialization
  Signed-off-by: Alex <[email protected]>
* Remove unnecessary '*' from the welcome list of model deserializer.
  Signed-off-by: Alex <[email protected]>
* Update readme to add more information.
  Signed-off-by: Alex <[email protected]>
* Add developer guide to the document.
  Signed-off-by: Alex <[email protected]>
* Add documents for ml-commons.
  Signed-off-by: Alex <[email protected]>
* Sync the build scripts
  Signed-off-by: Alex <[email protected]>
* Remove the dependencies added to support Mleap.
  Signed-off-by: Alex <[email protected]>

Co-authored-by: Alex <[email protected]>
HenryL27 · Nov 16, 2021 · adf5dcd
1 parent 24a4c9c
commit adf5dcd
Showing 4 changed files with 183 additions and 106 deletions.
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -1,103 +1,4 @@
-# Contributing Guidelines
+## Contributing to this Project
 
-Thank you for your interest in contributing to our project. Whether it's a bug report, new feature, correction, or additional
-documentation, we greatly value feedback and contributions from our community.
-
-Please read through this document before submitting any issues or pull requests to ensure we have all the necessary
-information to effectively respond to your bug report or contribution.
-
-
-## Reporting Bugs/Feature Requests
-
-We welcome you to use the GitHub issue tracker to report bugs or suggest features.
-
-When filing an issue, please check existing open, or recently closed, issues to make sure somebody else hasn't already
-reported the issue. Please try to include as much information as you can. Details like these are incredibly useful:
-
-* A reproducible test case or series of steps
-* The version of our code being used
-* Any modifications you've made relevant to the bug
-* Anything unusual about your environment or deployment
-
-
-## Contributing via Pull Requests
-Contributions via pull requests are much appreciated. Before sending us a pull request, please ensure that:
-
-1. You are working against the latest source on the *main* branch.
-2. You check existing open, and recently merged, pull requests to make sure someone else hasn't addressed the problem already.
-3. You open an issue to discuss any significant work - we would hate for your time to be wasted.
-
-To send us a pull request, please:
-
-1. Fork the repository.
-2. Modify the source; please focus on the specific change you are contributing. If you also reformat all the code, it will be hard for us to focus on your change.
-3. Ensure local tests pass.
-4. Commit to your fork using clear commit messages.
-5. Send us a pull request, answering any default questions in the pull request interface.
-6. Pay attention to any automated CI failures reported in the pull request, and stay involved in the conversation.
-
-GitHub provides additional document on [forking a repository](https://help.github.com/articles/fork-a-repo/) and
-[creating a pull request](https://help.github.com/articles/creating-a-pull-request/).
-
-
-## Finding contributions to work on
-Looking at the existing issues is a great way to find something to contribute on. As our projects, by default, use the default GitHub issue labels (enhancement/bug/duplicate/help wanted/invalid/question/wontfix), looking at any 'help wanted' issues is a great place to start.
-
-
-## Code of Conduct
-This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct).
-For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact
-[email protected] with any additional questions or comments.
-
-## Developer Certificate of Origin
-
-OpenSearch is an open source product released under the Apache 2.0 license (see either [the Apache site](https://www.apache.org/licenses/LICENSE-2.0) or the [LICENSE.txt file](./LICENSE.txt)).  The Apache 2.0 license allows you to freely use, modify, distribute, and sell your own products that include Apache 2.0 licensed software.
-
-We respect intellectual property rights of others and we want to make sure all incoming contributions are correctly attributed and licensed. A Developer Certificate of Origin (DCO) is a lightweight mechanism to do that.
-
-The DCO is a declaration attached to every contribution made by every developer. In the commit message of the contribution, the developer simply adds a `Signed-off-by` statement and thereby agrees to the DCO, which you can find below or at [DeveloperCertificate.org](http://developercertificate.org/).
-
-```
-Developer's Certificate of Origin 1.1
-
-By making a contribution to this project, I certify that:
-
-(a) The contribution was created in whole or in part by me and I
-    have the right to submit it under the open source license
-    indicated in the file; or
-
-(b) The contribution is based upon previous work that, to the
-    best of my knowledge, is covered under an appropriate open
-    source license and I have the right under that license to
-    submit that work with modifications, whether created in whole
-    or in part by me, under the same open source license (unless
-    I am permitted to submit under a different license), as
-    Indicated in the file; or
-
-(c) The contribution was provided directly to me by some other
-    person who certified (a), (b) or (c) and I have not modified
-    it.
-
-(d) I understand and agree that this project and the contribution
-    are public and that a record of the contribution (including
-    all personal information I submit with it, including my
-    sign-off) is maintained indefinitely and may be redistributed
-    consistent with this project or the open source license(s)
-    involved.
- ```
-We require that every contribution to OpenSearch is signed with a Developer Certificate of Origin.  Additionally, please use your real name.  We do not accept anonymous contributors nor those utilizing pseudonyms.
-
-Each commit must include a DCO which looks like this
-
-```
-Signed-off-by: Jane Smith <[email protected]>
-```
-You may type this line on your own when writing your commit messages.  However, if your user.name and user.email are set in your git configs, you can use `-s` or `--signoff` to add the `Signed-off-by` line to the end of the commit message.
-
-## Security issue notifications
-If you discover a potential security issue in this project we ask that you notify AWS/Amazon Security via our [vulnerability reporting page](http://aws.amazon.com/security/vulnerability-reporting/). Please do **not** create a public github issue.
-
-
-## Licensing
-
-See the [LICENSE](LICENSE) file for our project's licensing. We will ask you to confirm the licensing of your contribution.
+OpenSearch is a community project that is built and maintained by people just like **you**.
+[This document](https://github.com/opensearch-project/.github/blob/main/CONTRIBUTING.md) explains how you can contribute to this and related projects.
diff --git a/DEVELOPER_GUIDE.md b/DEVELOPER_GUIDE.md
@@ -0,0 +1,65 @@
+- [Developer Guide](#developer-guide)
+    - [Forking and Cloning](#forking-and-cloning)
+    - [Install Prerequisites](#install-prerequisites)
+        - [JDK 14](#jdk-14)
+    - [Setup](#setup)
+    - [Build](#build)
+        - [Building from the command line](#building-from-the-command-line)
+        - [Building from the IDE](#building-from-the-ide)
+        - [Debugging](#debugging)
+
+## Developer Guide
+
+### Forking and Cloning
+
+Fork this repository on GitHub, and clone locally with `git clone`.
+
+### Install Prerequisites
+
+#### JDK 14
+
+OpenSearch components require Java 14 at a minimum to build. You must have JDK 14 installed, with the environment variable `JAVA_HOME` set to the Java home path of that installation, e.g. `JAVA_HOME=/usr/lib/jvm/jdk-14`.
+
+### Setup
+
+1. Clone the repository (see [Forking and Cloning](#forking-and-cloning))
+2. Make sure `JAVA_HOME` is pointing to a Java 14 JDK (see [Install Prerequisites](#install-prerequisites))
+3. Launch IntelliJ IDEA and choose Import Project.
+
+### Build
+
+This package uses the [Gradle](https://docs.gradle.org/current/userguide/userguide.html) build system. Gradle comes with excellent documentation that should be your first stop when trying to figure out how to operate or modify the build. We also use the OpenSearch build tools for Gradle. These tools are idiosyncratic and don't always follow the conventions and instructions for building regular Java code using Gradle, so not everything in this package will work the way it's described in the Gradle documentation. If you encounter such a situation, the OpenSearch build tools [source code](https://github.com/opensearch-project/OpenSearch/tree/main/buildSrc/src/main/groovy/org/opensearch/gradle) is your best bet for figuring out what's going on.
+
+#### Building from the command line
+
+1. `./gradlew build` builds and tests
+2. `./gradlew :run` launches a single node cluster with ml-commons plugin installed
+3. `./gradlew :integTest` launches a single node cluster with ml-commons plugin installed and runs all integration tests except security
+4. `./gradlew :integTest --tests="**.test execute foo"` runs a single integration test class or method
+5. `./gradlew spotlessApply` formats code. Alternatively, or in addition, import the formatting rules in `.eclipseformat.xml` into your IDE.
+
+When launching a cluster using one of the above commands, logs are placed in `/build/cluster/run node0/opensearch-<version>/logs`. Although the logs are teed to the console, in practice it's best to check the actual log file.
+
+#### Building from the IDE
+
+Currently, the only IDE we support is IntelliJ IDEA. It's free, it's open source, it works. The Gradle tasks above can also be launched from IntelliJ's Gradle toolbar, and extra parameters can be passed in via the Launch Configurations VM arguments.
+
+#### Debugging
+
+Sometimes it's useful to attach a debugger to either the OpenSearch cluster or the integ tests to see what's going on. When running unit tests, you can just hit 'Debug' from the IDE's gutter to debug the tests. To debug code running in an actual server, run:
+
+```
+./gradlew :integTest --debug-jvm # to start a cluster and run integ tests
+# OR
+./gradlew :run --debug-jvm # to just start a cluster that can be debugged
+```
+
+The OpenSearch server JVM will launch suspended and wait for a debugger to attach to `localhost:8000` before starting the OpenSearch server.
+
+To debug code running in an integ test (which exercises the server from a separate JVM) run:
+
+```
+./gradlew -Dtest.debug :integTest
+```
+
+The test runner JVM will start suspended and wait for a debugger to attach to `localhost:5005` before running the tests.
diff --git a/LICENSE b/LICENSE
@@ -173,3 +173,30 @@
       defend, and hold each Contributor harmless for any liability
       incurred by, or claims asserted against, such Contributor by reason
       of your accepting any such warranty or additional liability.
+
+   END OF TERMS AND CONDITIONS
+
+   APPENDIX: How to apply the Apache License to your work.
+
+      To apply the Apache License to your work, attach the following
+      boilerplate notice, with the fields enclosed by brackets "[]"
+      replaced with your own identifying information. (Don't include
+      the brackets!)  The text should be enclosed in the appropriate
+      comment syntax for the file format. We also recommend that a
+      file or class name and description of purpose be included on the
+      same "printed page" as the copyright notice for easier
+      identification within third-party archives.
+
+   Copyright [yyyy] [name of copyright owner]
+
+   Licensed under the Apache License, Version 2.0 (the "License");
+   you may not use this file except in compliance with the License.
+   You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
diff --git a/README.md b/README.md
@@ -1,12 +1,96 @@
-## OpenSearch Machine Learning
+<img src="https://opensearch.org/assets/brand/SVG/Logo/opensearch_logo_default.svg" height="64px"/>
 
-Machine Learning Framework for OpenSearch is a new solution that make it easy to develop new machine learning feature. It allows engineers to leverage existing opensource machine learning algorithms and reduce the efforts to build any new machine learning feature. It also removes the necessity from engineers to manage the machine learning tasks which will help to speed the feature developing process.
+<!-- TOC -->
+
+- [OpenSearch Machine Learning Commons](#opensearch-machine-learning-commons)
+- [Contributing](#contributing)
+- [Code of Conduct](#code-of-conduct)
+- [Security](#security)
+- [License](#license)
+- [Copyright](#copyright)
+
+<!-- /TOC -->
+
+## OpenSearch Machine Learning Commons
+
+Machine Learning Commons for OpenSearch is a new solution that makes it easy to develop new machine learning features. It allows engineers to leverage existing open-source machine learning algorithms and reduces the effort needed to build any new machine learning feature. It also removes the need for engineers to manage machine learning tasks themselves, which helps speed up feature development.
+
+### Problem Statement
+
+Until now, building a new machine learning feature inside OpenSearch has been a significant challenge. The reasons include:
+
+* **Disruption to OpenSearch core features.** Machine learning is computationally intensive, but there is currently no way to add dedicated computation resources in OpenSearch for machine learning jobs, so these jobs have to share the same resources with core features such as indexing and searching. That can increase the latency of search requests and trigger circuit breaker exceptions on memory usage. To address this, we have to carefully distribute models and limit the data size to run an anomaly detection (AD) job. As more and more ML features are added to OpenSearch, this will become much harder to manage.
+* **Lack of support for machine learning algorithms.** Customers need more algorithms within OpenSearch; otherwise the data first has to be exported out of OpenSearch, for example to S3, to do the job, which brings extra cost and latency.
+* **Lack of a resource management mechanism between multiple machine learning jobs.** It's hard to coordinate resources between multiple features.
+
+
+Meanwhile, we see more and more machine learning features that need to be supported in OpenSearch to power end users’ business needs. For instance:
+
+* **Forecasting**: Forecasting is very popular in time series data analysis. Although past data isn’t always an indicator of the future, it’s still a very powerful tool in some use cases, such as capacity planning to scale service hosts up or down in IT operations.
+* **Root Cause Analysis in DevOps**: Today some customers use OpenSearch for IT operations. Identifying the root cause of an outage or incident is becoming more and more complicated, since it requires gathering all the information in the ecosystem, such as logs, traces, and metrics. Machine learning techniques are a great fit for this problem: they can build topology models of the system automatically and uncover the similarity and causal relations between events.
+* **Machine Learning in SIEM**: SIEM (Security Information and Event Management) is another domain in OpenSearch. Machine learning is also very useful in SIEM: it helps facilitate security analytics, reduces the effort on sophisticated tasks, enables real-time threat analysis, and uncovers anomalies.
+
+### Solution
+The solution is to introduce a new Machine Learning library inside the OpenSearch cluster. The major functionalities in this solution include:
+
+* **Unified Client Interfaces:** Clients can use common interfaces for training and inference tasks, and then follow the algorithm interface to supply the right input parameters, such as input data and hyperparameters. A client library will be built for ease of use.
+* **ML Plugin:** The ML plugin will help initiate the ML nodes, choose the right nodes and allocate resources for each request, manage machine learning tasks with monitoring and failure handling support, and store the model results. It will be the bridge for communication between the OpenSearch process and the ML engine.
+* **ML Engine:** This engine will be the host for ML algorithms. Java-based machine learning algorithms will be supported in the first release.
+
+This solution makes it easy to develop new machine learning features. It allows engineers to leverage existing open-source machine learning algorithms and reduces the effort needed to build any new machine learning feature. It also removes the need for engineers to manage machine learning tasks themselves, which helps speed up feature development.
+
+### How to use it for new feature development
+
+As mentioned above, new interfaces for both prediction and training will be provided to customers through REST APIs, and to other plugins through transport actions. Here are the transport actions for the prediction and training interfaces.
+
+* Predict Transport Action for prediction job request
+  ```
+  Request: {
+      "algorithm": "ARIMA", // the name of the algorithm
+      "parameters": {"forecasts_en": 10, "seasonal": true}, // parameters of the algorithm; can be null or empty
+      "modelId": 123, // the ID of the trained model
+      "inputData": [[1.0, 2, 3.1, true, "v1"], [1.1, 4, 5.2, false, "v2"]] // internal data frame interface
+  }
+
+  Response: {
+      "taskId": "123", // the ID of the job request
+      "status": "SUCCESS", // the job execution status
+      "predictionResult": [[6.0], [7.0]] // internal data frame interface
+  }
+  ```
+* Training Transport Action to start training job request - Async Interface
+  ```
+  Request: {
+      "algorithm": "ARIMA", // the name of the algorithm
+      "parameters": {"forecasts_en": 10, "seasonal": true}, // parameters of the algorithm; can be null or empty
+      "inputData": [[1.0, 2, 3.1, true, "v1"], [1.1, 4, 5.2, false, "v2"]] // internal data frame interface
+  }
+
+
+  Response: {
+      "taskId": "123", // the ID of the job request
+      "status": "IN_PROGRESS" // the job execution status
+
+  }
+  ```
+
+## Contributing
+
+See [developer guide](DEVELOPER_GUIDE.md) and [how to contribute to this project](CONTRIBUTING.md).
+
+## Code of Conduct
+
+This project has adopted the [Amazon Open Source Code of Conduct](CODE_OF_CONDUCT.md). For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq), or contact [[email protected]](mailto:[email protected]) with any additional questions or comments.
 
 ## Security
 
-See [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications) for more information.
+If you discover a potential security issue in this project we ask that you notify AWS/Amazon Security via our [vulnerability reporting page](http://aws.amazon.com/security/vulnerability-reporting/). Please do **not** create a public GitHub issue.
 
 ## License
 
-See the [LICENSE](LICENSE) file for our project's licensing. We will ask you to confirm the licensing of your contribution.
+This project is licensed under the [Apache v2.0 License](LICENSE).
+
+## Copyright
+
+Copyright 2020-2021 Amazon.com, Inc. or its affiliates. All Rights Reserved.