Commit

Merge pull request caikit#235 from markstur/reranker
Add rerank and sentence-similarity tasks to text embedding module
gkumbhat authored Nov 20, 2023
2 parents 316ead6 + 6faabe2 commit d91ff47
Showing 11 changed files with 636 additions and 360 deletions.
16 changes: 9 additions & 7 deletions README.md
@@ -8,13 +8,15 @@ Caikit-NLP implements concept of "task" from `caikit` framework to define (and c

Capabilities provided by `caikit-nlp`:

| Task | Module(s) | Salient Feature(s) |
|----------------------|-------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Text Generation | 1. `PeftPromptTuning` <br> 2. `TextGeneration` | 1. Prompt Tuning, Multi-task Prompt tuning <br> 2. Fine-tuning Both modules above provide optimized inference capability using Text Generation Inference Server |
| Text Classification | 1. `SequenceClassification` | 1. (Work in progress..) |
| Token Classification | 1. `FilteredSpanClassification` | 1. (Work in progress..) |
| Tokenization | 1. `RegexSentenceSplitter` | 1. Demo purposes only |
| Embedding | [COMING SOON] | [COMING SOON] |
| Task | Module(s) | Salient Feature(s) |
|-----------------------------------------------------|------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| TextGenerationTask | 1. `PeftPromptTuning` <br> 2. `TextGeneration` | 1. Prompt Tuning, Multi-task Prompt tuning <br> 2. Fine-tuning Both modules above provide optimized inference capability using Text Generation Inference Server |
| TextClassificationTask | 1. `SequenceClassification` | 1. (Work in progress..) |
| TokenClassificationTask | 1. `FilteredSpanClassification` | 1. (Work in progress..) |
| TokenizationTask | 1. `RegexSentenceSplitter` | 1. Demo purposes only |
| EmbeddingTask <br> EmbeddingTasks | 1. `TextEmbedding` | 1. TextEmbedding returns a text embedding vector from a local sentence-transformers model <br> 2. EmbeddingTasks takes multiple input texts and returns a corresponding list of vectors. |
| SentenceSimilarityTask <br> SentenceSimilarityTasks | 1. `TextEmbedding` | 1. SentenceSimilarityTask compares one source_sentence to a list of sentences and returns similarity scores in order of the sentences. <br> 2. SentenceSimilarityTasks uses a list of source_sentences (each to be compared to same list of sentences) and returns corresponding lists of outputs. |
| RerankTask <br> RerankTasks | 1. `TextEmbedding` | 1. RerankTask compares a query to a list of documents and returns top_n scores in order of relevance with indexes to the source documents and optionally returning the documents. <br> 2. RerankTasks takes multiple queries as input and returns a corresponding list of outputs. The same list of documents is used for all queries. |
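
The sentence-similarity and rerank rows above can be illustrated with a minimal sketch. This is not the `TextEmbedding` module's code: the function names, the dictionary result shape, the `return_documents` flag, and the use of cosine similarity are all assumptions made for illustration, and the toy vectors stand in for embeddings that a sentence-transformers model would produce.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def sentence_similarity(source_vec, sentence_vecs):
    """Hypothetical helper: scores are returned in the order of the input sentences."""
    return [cosine(source_vec, v) for v in sentence_vecs]


def rerank(query_vec, doc_vecs, documents, top_n=None, return_documents=True):
    """Hypothetical helper: top_n results by descending score, each carrying the
    index of its source document and, optionally, the document itself."""
    scores = sentence_similarity(query_vec, doc_vecs)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    results = []
    for i in ranked[:top_n]:  # top_n=None keeps every document
        item = {"index": i, "score": scores[i]}
        if return_documents:
            item["document"] = documents[i]
        results.append(item)
    return results


# Toy 2-d "embeddings" (assumed values, not model output).
docs = ["cats", "dogs", "cars"]
doc_vecs = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]
query_vec = [1.0, 0.0]

top = rerank(query_vec, doc_vecs, docs, top_n=2)
```

Here `top` holds two entries, ordered by relevance, whose `index` fields point back into `docs`. Sentence similarity keeps input order while rerank sorts, which is the distinction the table draws between the two tasks.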

## Getting Started

3 changes: 1 addition & 2 deletions caikit_nlp/data_model/__init__.py
@@ -15,6 +15,5 @@
"""

# Local
from . import embedding_vectors, generation
from .embedding_vectors import *
from . import generation
from .generation import *
163 changes: 0 additions & 163 deletions caikit_nlp/data_model/embedding_vectors.py

This file was deleted.

17 changes: 16 additions & 1 deletion caikit_nlp/modules/text_embedding/__init__.py
@@ -12,6 +12,21 @@
# See the License for the specific language governing permissions and
# limitations under the License.

"""
Text Embedding Module
=====================
Implements the following tasks:
1. EmbeddingTask: Returns an embedding from an input text string
2. EmbeddingTasks: EmbeddingTask but with a list of inputs producing a list of outputs
3. SentenceSimilarityTask: Compare one source sentence to a list of sentences
4. SentenceSimilarityTasks: SentenceSimilarityTask but with a list of source sentences producing
a list of outputs
5. RerankTask: Return top_n documents ordered by relevance given a query
6. RerankTasks: RerankTask but with a list of queries producing a list of outputs
"""

# Local
from .embedding import EmbeddingModule
from .embedding_tasks import EmbeddingTask
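
The module docstring above pairs each task with a plural variant that takes a list of inputs and produces a list of outputs. A rough sketch of that pattern for reranking follows. The names `rerank_one` and `rerank_many` are hypothetical, and the word-overlap score is a deliberately simple stand-in for embedding similarity, not what the module does.

```python
def overlap_score(query, document):
    """Toy relevance: count of shared lowercase words (stand-in for embedding similarity)."""
    return len(set(query.lower().split()) & set(document.lower().split()))


def rerank_one(query, documents, top_n=None):
    """Singular form: one query in, one ranked list of (index, score) pairs out."""
    scores = [overlap_score(query, d) for d in documents]
    order = sorted(range(len(documents)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order[:top_n]]


def rerank_many(queries, documents, top_n=None):
    """Plural form: one output list per query; the same documents serve every query."""
    return [rerank_one(q, documents, top_n) for q in queries]


documents = ["the cat sat", "a fast car", "cat and car"]
batch = rerank_many(["cat sat", "fast car"], documents, top_n=1)
```

`batch` has one entry per query, in query order, and each entry is what the singular call would have returned for that query.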