CV2-5011 refactors for making alegre dual purpose on text encoding #103

Merged
merged 8 commits into from
Aug 16, 2024
2 changes: 1 addition & 1 deletion lib/model/generic_transformer.py
Contributor

I realize this file has existed for a while, but is there a reason we override the respond() function instead of process()?
Overriding respond() makes this model unique among the Presto endpoints, and it disables caching and error handling because the standard get_response() function in model.py is never called.

But I may be missing something?

Collaborator Author

Yes - the reason we override is that we want, from the get-go, to natively process jobs in batch on transformers instead of walking through items one by one as we do with the others. I'll look into a refactor to keep our caching and error handling working for text, though - good catch.
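The batching argument above can be sketched as follows. This is a minimal illustration with hypothetical names (FakeEncoder, BatchVectorizer are stand-ins, not the actual alegre/Presto classes): the whole batch goes through one encode() call instead of N per-item calls.

```python
from typing import List

class _FakeArray:
    """Tiny stand-in for a numpy array, exposing only .tolist()."""
    def __init__(self, rows):
        self.rows = rows
    def tolist(self):
        return self.rows

class FakeEncoder:
    """Stand-in for a SentenceTransformer-style model: one encode()
    call vectorizes the whole batch at once."""
    def encode(self, texts: List[str]) -> _FakeArray:
        return _FakeArray([[float(len(t))] * 3 for t in texts])

class BatchVectorizer:
    """Encodes a batch in a single call instead of looping item-by-item,
    mirroring the pattern discussed in this thread."""
    def __init__(self, model):
        self.model = model
    def vectorize(self, texts: List[str]) -> List[List[float]]:
        # One batched encode() call; on a GPU this is much cheaper
        # than N separate single-item calls.
        return self.model.encode(texts).tolist()

vectors = BatchVectorizer(FakeEncoder()).vectorize(["hi", "hello"])
print(vectors)  # [[2.0, 2.0, 2.0], [5.0, 5.0, 5.0]]
```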

@@ -33,4 +33,4 @@ def vectorize(self, texts: List[str]) -> List[List[float]]:
         """
         Vectorize the text! Run as batch.
         """
-        return {"hash_value": self.model.encode(texts).tolist()}
+        return self.model.encode(texts).tolist()
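The effect of the change above, sketched with hypothetical stand-in names (FakeModel, vectorize_old/vectorize_new are illustrative only): vectorize() now returns the bare list of vectors, and any {"hash_value": ...} wrapping is left to the caller.

```python
class _FakeArray:
    """Minimal stand-in for a numpy array, exposing only .tolist()."""
    def __init__(self, rows):
        self.rows = rows
    def tolist(self):
        return self.rows

class FakeModel:
    def encode(self, texts):
        return _FakeArray([[4, 5, 6], [7, 8, 9]])

def vectorize_old(model, texts):
    # Pre-refactor: the dict wrapping lived inside vectorize()
    return {"hash_value": model.encode(texts).tolist()}

def vectorize_new(model, texts):
    # Post-refactor: return the raw vectors; callers wrap as needed
    return model.encode(texts).tolist()

model = FakeModel()
texts = ["Hello, how are you?", "I'm doing great, thanks!"]
assert vectorize_old(model, texts)["hash_value"] == vectorize_new(model, texts)
print(vectorize_new(model, texts))  # [[4, 5, 6], [7, 8, 9]]
```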
2 changes: 1 addition & 1 deletion test/lib/model/test_fptg.py
@@ -16,7 +16,7 @@ def test_vectorize(self):
         texts = [schemas.parse_message({"body": {"id": "123", "callback_url": "http://example.com/callback", "text": "Hello, how are you?"}, "model_name": "fptg__Model"}), schemas.parse_message({"body": {"id": "123", "callback_url": "http://example.com/callback", "text": "I'm doing great, thanks!"}, "model_name": "fptg__Model"})]
         self.model.model = self.mock_model
         self.model.model.encode = MagicMock(return_value=np.array([[4, 5, 6], [7, 8, 9]]))
-        vectors = self.model.vectorize(texts)["hash_value"]
+        vectors = self.model.vectorize(texts)
         self.assertEqual(len(vectors), 2)
         self.assertEqual(vectors[0], [4, 5, 6])
         self.assertEqual(vectors[1], [7, 8, 9])
2 changes: 1 addition & 1 deletion test/lib/model/test_generic.py
@@ -16,7 +16,7 @@ def test_vectorize(self):
         texts = [schemas.parse_message({"body": {"id": "123", "callback_url": "http://example.com/callback", "text": "Hello, how are you?"}, "model_name": "fptg__Model"}), schemas.parse_message({"body": {"id": "123", "callback_url": "http://example.com/callback", "text": "I'm doing great, thanks!"}, "model_name": "fptg__Model"})]
         self.model.model = self.mock_model
         self.model.model.encode = MagicMock(return_value=np.array([[4, 5, 6], [7, 8, 9]]))
-        vectors = self.model.vectorize(texts)["hash_value"]
+        vectors = self.model.vectorize(texts)
         self.assertEqual(len(vectors), 2)
         self.assertEqual(vectors[0], [4, 5, 6])
         self.assertEqual(vectors[1], [7, 8, 9])
2 changes: 1 addition & 1 deletion test/lib/model/test_indian_sbert.py
@@ -16,7 +16,7 @@ def test_vectorize(self):
         texts = [schemas.parse_message({"body": {"id": "123", "callback_url": "http://example.com/callback", "text": "Hello, how are you?"}, "model_name": "indian_sbert__Model"}), schemas.parse_message({"body": {"id": "123", "callback_url": "http://example.com/callback", "text": "I'm doing great, thanks!"}, "model_name": "indian_sbert__Model"})]
         self.model.model = self.mock_model
         self.model.model.encode = MagicMock(return_value=np.array([[4, 5, 6], [7, 8, 9]]))
-        vectors = self.model.vectorize(texts)["hash_value"]
+        vectors = self.model.vectorize(texts)
         self.assertEqual(len(vectors), 2)
         self.assertEqual(vectors[0], [4, 5, 6])
         self.assertEqual(vectors[1], [7, 8, 9])
2 changes: 1 addition & 1 deletion test/lib/model/test_meantokens.py
@@ -16,7 +16,7 @@ def test_vectorize(self):
         texts = [schemas.parse_message({"body": {"id": "123", "callback_url": "http://example.com/callback", "text": "Hello, how are you?"}, "model_name": "mean_tokens__Model"}), schemas.parse_message({"body": {"id": "123", "callback_url": "http://example.com/callback", "text": "I'm doing great, thanks!"}, "model_name": "mean_tokens__Model"})]
         self.model.model = self.mock_model
         self.model.model.encode = MagicMock(return_value=np.array([[4, 5, 6], [7, 8, 9]]))
-        vectors = self.model.vectorize(texts)["hash_value"]
+        vectors = self.model.vectorize(texts)
         self.assertEqual(len(vectors), 2)
         self.assertEqual(vectors[0], [4, 5, 6])
         self.assertEqual(vectors[1], [7, 8, 9])