enable support within aws_bedrockagent_knowledge_base for embedding_model_configuration and supplemental_data_storage_configuration #40737
+284
−38
Description
Last month AWS introduced binary embedding support for Amazon Titan Text Embeddings V2. This PR makes it possible to choose that embedding data type as well as configure the dimensions. For good measure it also adds support for supplemental storage configuration.
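For readers unfamiliar with the underlying API shape, here is a minimal sketch of the `knowledgeBaseConfiguration` payload these new arguments correspond to, using the field names from the boto3 `create_knowledge_base` reference linked below. The helper name `build_kb_configuration`, its defaults, the example ARN region, and the S3 URI are hypothetical and only illustrative; this is not the provider's implementation.

```python
from typing import Any, Dict, Optional

# Embedding data types discussed in this PR.
VALID_DATA_TYPES = {"FLOAT32", "BINARY"}


def build_kb_configuration(
    embedding_model_arn: str,
    dimensions: int = 1024,
    embedding_data_type: str = "BINARY",
    supplemental_s3_uri: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a knowledgeBaseConfiguration dict (hypothetical helper).

    Only assembles the request payload; it makes no AWS calls.
    """
    if embedding_data_type not in VALID_DATA_TYPES:
        raise ValueError(f"unsupported embedding data type: {embedding_data_type!r}")

    vector_cfg: Dict[str, Any] = {
        "embeddingModelArn": embedding_model_arn,
        "embeddingModelConfiguration": {
            "bedrockEmbeddingModelConfiguration": {
                "dimensions": dimensions,
                "embeddingDataType": embedding_data_type,
            }
        },
    }

    # Supplemental data storage is optional; include it only when requested.
    if supplemental_s3_uri is not None:
        vector_cfg["supplementalDataStorageConfiguration"] = {
            "storageLocations": [
                {"type": "S3", "s3Location": {"uri": supplemental_s3_uri}}
            ]
        }

    return {"type": "VECTOR", "vectorKnowledgeBaseConfiguration": vector_cfg}


# Illustrative values only; the region and bucket are placeholders.
example = build_kb_configuration(
    "arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-embed-text-v2:0",
    dimensions=1024,
    embedding_data_type="BINARY",
    supplemental_s3_uri="s3://example-bucket/supplemental/",
)
```

The resulting dict would be passed as the `knowledgeBaseConfiguration` argument of the `bedrock-agent` client's `create_knowledge_base` call, alongside the name, role ARN, and storage configuration, which this sketch leaves out.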
I have created and successfully run a new acceptance test. I should note, however, that testing against OpenSearch Serverless collections (OSSC, which is what I was personally targeting) is difficult in general. A polished acceptance-test setup for them seems to have been deferred when this resource was created some months ago, so there was no automated model to follow and my testing was fairly manual. You will note that the existing OSSC tests are set to "skip", and that is how I am committing my new one (though it was not skipped for my actual testing). To test, I pointed the KB created by the acceptance test runs at an existing, external OSSC. The relevant parameters are XXX'd out in the acceptance test. I have successfully tested both the "BINARY" and "FLOAT32" index data types against real OSSC instances created out-of-band.
In real-world, fully automated environments I have used the aws_lambda_invocation resource to run post-creation OSSC manipulations that put an index in place. Bedrock KB creation fails if this underlying index does not exist, because validation fails without it. I think it would be reasonable to keep deferring this area as tech debt for now, but to circle back soon and get the acceptance tests into a better place across the board (not just mine; I am happy to help). At the moment, however, I am in a rather urgent situation: the impetus for this work is the need to convert vector DBs in an operational environment to the "BINARY" data type.
References
https://aws.amazon.com/blogs/machine-learning/build-cost-effective-rag-applications-with-binary-embeddings-in-amazon-titan-text-embeddings-v2-amazon-opensearch-serverless-and-amazon-bedrock-knowledge-bases/
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/bedrock-agent/client/create_knowledge_base.html
Output from Acceptance Testing