Commit

Merge remote-tracking branch 'origin'
DGaffney committed Dec 6, 2024
2 parents f871a21 + afb4431 commit b8a8b46
Showing 30 changed files with 338 additions and 149 deletions.
Binary file modified .env_file.enc
Binary file not shown.
4 changes: 4 additions & 0 deletions .env_file.example
@@ -59,6 +59,10 @@ PROVIDER_IMAGE_CLASSIFICATION=google
 # AWS_ACCESS_KEY_ID=
 # AWS_SECRET_ACCESS_KEY=
 # AWS_SESSION_TOKEN=
+S3_ENDPOINT=http://minio:9000
+AWS_DEFAULT_REGION=us-east-1
+AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
+AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY


 # Service host URLs
2 changes: 1 addition & 1 deletion .env_file.test
@@ -7,7 +7,7 @@ DATABASE_HOST=postgres
 DATABASE_USER=postgres
 DATABASE_PASS=postgres
 S3_ENDPOINT=http://minio:9000
-AWS_DEFAULT_REGION=eu-west-1
+AWS_DEFAULT_REGION=us-east-1
 AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
 AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY

177 changes: 177 additions & 0 deletions .github/workflows/ci-tests.yaml
@@ -0,0 +1,177 @@
+name: Build and Run Alegre Tests
+
+on:
+  schedule:
+    - cron: '0 9 * * *' #Run daily at 9 UTC
+  push:
+    branches:
+      - master
+      - develop
+
+  pull_request:
+    branches:
+      - develop
+
+env:
+  CC_TEST_REPORTER_ID: "${{ secrets.CC_TEST_REPORTER_ID }}"

+jobs:
+  unit-tests:
+    runs-on:
+      labels: alegre
+    steps:
+      - name: Set permissions for _work directory
+        run: |
+          sudo chown -R $USER:$USER $GITHUB_WORKSPACE
+          sudo chmod 755 $GITHUB_WORKSPACE
+      - uses: actions/checkout@v4
+
+      - name: Configure AWS Credentials
+        uses: aws-actions/configure-aws-credentials@v4
+        with:
+          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
+          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
+          role-to-assume: ${{ secrets.AWS_ROLE_TO_ASSUME }}
+          aws-region: eu-west-1
+
+      - name: Login to Amazon ECR
+        id: login-ecr
+        uses: aws-actions/amazon-ecr-login@v2
+
+      - name: Set up QEMU
+        uses: docker/setup-qemu-action@v3
+
+      - name: Set up Docker Buildx
+        uses: docker/setup-buildx-action@v3

+      - name: Decrypt env
+        env:
+          DECRYPTION_PASSWORD: ${{ secrets.DECRYPTION_PASSWORD }}
+        run: |
+          openssl enc -aes-256-cbc -d -in .env_file.enc -out .env_file -k $DECRYPTION_PASSWORD
+      - name: Decrypt Alegre credentials
+        env:
+          DECRYPTION_PASSWORD: ${{ secrets.DECRYPTION_PASSWORD }}
+        run: |
+          openssl aes-256-cbc -d -in google_credentials.json.enc -out google_credentials.json -k $DECRYPTION_PASSWORD
+      - name: Install redis tools
+        run: |
+          sudo apt-get -y install redis-tools
+      - name: Set up reporter
+        run: |
+          curl -L https://codeclimate.com/downloads/test-reporter/test-reporter-latest-linux-amd64 > ./cc-test-reporter
+          chmod +x ./cc-test-reporter
+      - name: Before script
+        run: |
+          mkdir -p ~/.docker/cli-plugins/ && curl -SL https://github.com/docker/compose/releases/download/v2.30.1/docker-compose-linux-x86_64 -o ~/.docker/cli-plugins/docker-compose && chmod +x ~/.docker/cli-plugins/docker-compose && docker compose version
+          ./cc-test-reporter before-build
+          docker compose build
+          docker compose -f docker-compose.yml -f docker-test.yml up -d
+          docker compose logs -t -f &
+          echo "Waiting for Elasticsearch indexes..." && until curl --silent --fail -I "http://localhost:9200/alegre_similarity_test"; do sleep 1; done
+          until curl --silent --fail -I "http://localhost:3100"; do sleep 1; done
+          echo "Waiting for model servers..."
+      - name: Run Unit Tests
+        id: unit-tests
+        run: |
+          docker compose exec alegre make test
+      - name: Generate Coverage Report
+        if: ${{ github.event_name != 'pull_request' }}
+        run: |
+          docker compose exec alegre coverage xml
+      - name: Upload Coverage Report
+        if: ${{ github.event_name != 'pull_request' }}
+        env:
+          CC_TEST_REPORTER_ID: ${{ secrets.CC_TEST_REPORTER_ID }}
+        run: ./cc-test-reporter after-build -t coverage.py --exit-code $?
+
+      - name: Cleanup Docker Resources
+        if: always()
+        run: |
+          echo "Cleaning up Docker resources..."
+          docker stop $(docker ps -q)
+          docker rm $(docker ps -aq)
+          docker rmi $(docker images -q)
+          docker volume rm $(docker volume ls -q)
+  contract-testing:
+    needs: unit-tests
+    runs-on:
+      labels: alegre
+    steps:
+      - name: Set permissions for _work directory
+        run: |
+          sudo chown -R $USER:$USER $GITHUB_WORKSPACE
+          sudo chmod 755 $GITHUB_WORKSPACE
+      - uses: actions/checkout@v4
+
+      - name: Configure AWS Credentials
+        uses: aws-actions/configure-aws-credentials@v4
+        with:
+          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
+          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
+          role-to-assume: ${{ secrets.AWS_ROLE_TO_ASSUME }}
+          aws-region: eu-west-1
+
+      - name: Login to Amazon ECR
+        id: login-ecr
+        uses: aws-actions/amazon-ecr-login@v2
+
+      - name: Set up QEMU
+        uses: docker/setup-qemu-action@v3
+
+      - name: Set up Docker Buildx
+        uses: docker/setup-buildx-action@v3

+      - name: Decrypt env
+        env:
+          DECRYPTION_PASSWORD: ${{ secrets.DECRYPTION_PASSWORD }}
+        run: |
+          openssl enc -aes-256-cbc -d -in .env_file.enc -out .env_file -k $DECRYPTION_PASSWORD
+      - name: Decrypt Alegre credentials
+        env:
+          DECRYPTION_PASSWORD: ${{ secrets.DECRYPTION_PASSWORD }}
+        run: |
+          openssl aes-256-cbc -d -in google_credentials.json.enc -out google_credentials.json -k $DECRYPTION_PASSWORD
+      - name: Install redis tools
+        run: |
+          sudo apt-get -y install redis-tools
+      - name: Set up reporter
+        run: |
+          curl -L https://codeclimate.com/downloads/test-reporter/test-reporter-latest-linux-amd64 > ./cc-test-reporter
+          chmod +x ./cc-test-reporter
+      - name: Before script
+        run: |
+          mkdir -p ~/.docker/cli-plugins/ && curl -SL https://github.com/docker/compose/releases/download/v2.30.1/docker-compose-linux-x86_64 -o ~/.docker/cli-plugins/docker-compose && chmod +x ~/.docker/cli-plugins/docker-compose && docker compose version
+          ./cc-test-reporter before-build
+          docker compose build
+          docker compose -f docker-compose.yml -f docker-test.yml up -d
+          docker compose logs -t -f &
+          echo "Waiting for Elasticsearch indexes..." && until curl --silent --fail -I "http://localhost:9200/alegre_similarity_test"; do sleep 1; done
+          until curl --silent --fail -I "http://localhost:3100"; do sleep 1; done
+          echo "Waiting for model servers..."
+      - name: Run contract Tests
+        id: contract-tests
+        run: |
+          docker compose exec alegre make contract_testing
+      - name: Cleanup Docker Resources
+        if: always()
+        run: |
+          echo "Cleaning up Docker resources..."
+          docker stop $(docker ps -q)
+          docker rm $(docker ps -aq)
+          docker rmi $(docker images -q)
+          docker volume rm $(docker volume ls -q)
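
Review note: the "Before script" steps in this workflow wait for Elasticsearch and the app with `until curl ...; do sleep 1; done`, which has no upper bound and will spin until the job itself times out if a service never comes up. A bounded version of the same readiness poll could look roughly like the sketch below. It is not part of the commit; `http_ready`, `wait_until`, and the 300-second default are illustrative assumptions.

```python
import time
import urllib.error
import urllib.request


def http_ready(url, timeout=2.0):
    """Return True if a HEAD request succeeds (roughly `curl --fail -I`)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout):
            return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a 4xx/5xx status all count as "not ready".
        return False


def wait_until(probe, deadline_seconds=300.0, interval=1.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call probe() until it returns True or the deadline passes.

    Returns True on success and False on timeout. `sleep` and `clock`
    are injectable so the loop can be exercised without real delays.
    """
    deadline = clock() + deadline_seconds
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A CI step could call `wait_until(lambda: http_ready("http://localhost:9200/alegre_similarity_test"))` and fail the job when it returns `False`, so a service that never starts is reported quickly.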
40 changes: 0 additions & 40 deletions .travis.yml

This file was deleted.

1 change: 0 additions & 1 deletion app/main/controller/audio_similarity_controller.py
@@ -33,5 +33,4 @@ class AudioSimilaritySearchResource(Resource):
     @api.doc(params={'url': 'audio URL to be stored or queried for similarity', 'threshold': 'minimum score to consider, between 0.0 and 1.0 (defaults to 0.9)', 'context': 'context'} )
     def post(self):
         args = request.json
-        app.logger.debug(f"Args are {args}")
         return jsonify({"message": "This endpoint is not implemented."}), 501
35 changes: 33 additions & 2 deletions app/main/controller/image_ocr_controller.py
@@ -18,6 +18,32 @@ def _after_log(retry_state):
 CLIENT = get_credentialed_google_client(vision.ImageAnnotatorClient)
 @api.route('/')
 class ImageOcrResource(Resource):
+    @staticmethod
+    def polygon_area(vertices):
+        area = 0
+        for i in range(len(vertices)):
+            x1, y1 = vertices[i]
+            x2, y2 = vertices[(i + 1) % len(vertices)]
+            area += (x1 * y2 - x2 * y1)
+        return abs(area) / 2
+
+    @staticmethod
+    def calculate_text_percentage(response):
+        bounds = []
+        for page in response.full_text_annotation.pages:
+            for block in page.blocks:
+                bounds.append(block.bounding_box)
+        total_text_area = 0
+        for annotation in bounds:
+            vertices = [(v.x, v.y) for v in annotation.vertices]
+            area = ImageOcrResource.polygon_area(vertices)
+            total_text_area += area
+        # response object contains the whole image width and height in response.full_text_annotation.pages[0]
+        # as we are sending images, response.full_text_annotation.pages is always 1 page only
+        image_area = response.full_text_annotation.pages[0].width * response.full_text_annotation.pages[0].height
+        text_percentage = (total_text_area / image_area) * 100
+        return text_percentage
+
     @api.response(200, 'text successfully extracted.')
     @api.doc('Perform text extraction from an image')
     @api.doc(params={'url': 'url of image to extract text from'})
@@ -37,8 +63,13 @@ def post(self):
         if not texts:
             return

-        app.logger.info(
-            f"[Alegre OCR] [image_uri {image.source.image_uri}] Image OCR response package looks like {convert_text_annotation_to_json(texts[0])}")
+        #### calculate bounding boxes areas.
+        try:
+            text_percentage = ImageOcrResource.calculate_text_percentage(response)
+            app.logger.info(
+                f"[Alegre OCR] [image_uri {image.source.image_uri}] [percentage of image area covered by text {text_percentage}%] Image OCR response package looks like {convert_text_annotation_to_json(texts[0])}")
+        except Exception as caught_exception:
+            app.logger.error(f"[image_uri {image.source.image_uri}] Error calculating percentage of image area covered by text. Error was {caught_exception}. Image OCR response package looks like {convert_text_annotation_to_json(texts[0])}")

         return {
             'text': texts[0].description
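
Review note: `polygon_area` in this file is the shoelace formula, and `calculate_text_percentage` divides the summed block areas by the page area. The standalone sketch below reproduces that arithmetic so it can be checked without a Vision API response. It assumes plain `(x, y)` tuples in place of the API's vertex objects, and the `text_percentage` helper and its arguments are illustrative names rather than code from the commit.

```python
def polygon_area(vertices):
    """Shoelace formula: area of a simple polygon from ordered (x, y) vertices."""
    area = 0
    for i in range(len(vertices)):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % len(vertices)]  # wrap around to the first vertex
        area += x1 * y2 - x2 * y1
    return abs(area) / 2


def text_percentage(block_polygons, image_width, image_height):
    """Hypothetical helper: share of the image covered by the given blocks.

    Block areas are simply summed, as in the committed code, so
    overlapping blocks are counted more than once.
    """
    total_text_area = sum(polygon_area(polygon) for polygon in block_polygons)
    return 100 * total_text_area / (image_width * image_height)


# Two 50 x 20 text blocks on a 100 x 100 image.
blocks = [
    [(0, 0), (50, 0), (50, 20), (0, 20)],
    [(10, 40), (60, 40), (60, 60), (10, 60)],
]
coverage = text_percentage(blocks, image_width=100, image_height=100)
print(f"{coverage:.1f}% of the image is covered by text blocks")
```

Each block has area 1000, so the two together cover 2000 of 10000 pixels. A zero-sized page would raise `ZeroDivisionError` here, which in the committed code is caught by the surrounding `try`/`except` and logged.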
7 changes: 3 additions & 4 deletions app/main/controller/presto_controller.py
@@ -25,10 +25,9 @@ class PrestoResource(Resource):
     def post(self, action, model_type):
         data = request.json
         item_id = data.get("body", {}).get("id")
-        app.logger.info(f"PrestoResource {action}/{model_type}")
+        app.logger.info(f"PrestoResource {action}/{model_type}, data is {data}")
         return_value = None
         if action == "add_item":
-            app.logger.info(f"Data looks like {data}")
             result = similarity.callback_add_item(data.get("body"), model_type)
             if data.get("body", {}).get("raw", {}).get("suppress_response"):
                 # requested not to reply to caller with similarity response, so suppress it
@@ -40,11 +39,11 @@ def post(self, action, model_type):
 if result:
 result["is_search_result_callback"] = True
 callback_url = data.get("body", {}).get("raw", {}).get("callback_url", app.config['CHECK_API_HOST']) or app.config['CHECK_API_HOST']
-if result and data.get("body", {}).get("raw", {}).get("requires_callback"):
+if result and result.get("results") is not None and data.get("body", {}).get("raw", {}).get("requires_callback"):
 app.logger.info(f"Sending callback to {callback_url} for {action} for model of {model_type} with body of {result}")
 Webhook.return_webhook(callback_url, action, model_type, result)
 return_value = {"action": action, "model_type": model_type, "data": result}
-app.logger.info(f"PrestoResource value is {return_value}")
+app.logger.info(f"PrestoResource {action}/{model_type}, data is {data}, return_value is {return_value}")
 r = redis_client.get_client()
 r.lpush(f"{model_type}_{item_id}", json.dumps(data))
 r.expire(f"{model_type}_{item_id}", 60*60*24)
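
Review note: the tightened condition in this file means a callback is sent only when the result actually carries a `results` key, even if that list is empty. The sketch below isolates that predicate. `should_send_callback` is a hypothetical helper, and the dict shapes are assumptions inferred from the condition in the diff, not taken from the rest of the codebase.

```python
def should_send_callback(result, raw):
    """Mirror of the updated guard, pulled out as a pure function.

    result: the similarity result dict, or None.
    raw:    the "raw" section of the request body.
    """
    return (
        bool(result)
        and result.get("results") is not None
        and bool(raw.get("requires_callback"))
    )


# An empty result list still triggers the callback...
print(should_send_callback({"results": []}, {"requires_callback": True}))
# ...but a result that has no "results" key at all does not.
print(should_send_callback({"is_search_result_callback": True}, {"requires_callback": True}))
```

Note that `is not None` treats an empty list as a valid answer, so "searched and found nothing" still reaches the caller, while a payload that never produced results is held back.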
3 changes: 2 additions & 1 deletion app/main/controller/similarity_async_controller.py
@@ -27,7 +27,7 @@ class AsyncSimilarityResource(Resource):
     @api.doc(params={'text': 'text to be stored or queried for similarity', 'threshold': 'minimum score to consider, between 0.0 and 1.0 (defaults to 0.9)', 'model': 'similarity model to use: "elasticsearch" (pure Elasticsearch, default) or the key name of an active model'})
     def post(self, similarity_type):
         args = request.json
-        app.logger.debug(f"Args are {args}")
+        app.logger.info(f"[AsyncSimilarityResource] Starting Request - args are {args}, similarity_type is {similarity_type}")
         if similarity_type == "text":
             package = similarity.get_body_for_text_document(args, 'query')
         else:
@@ -42,4 +42,5 @@ def post(self, similarity_type):
 result["is_shortcircuited_search_result_callback"] = True
 callback_url = args.get("callback_url", app.config['CHECK_API_HOST']) or app.config['CHECK_API_HOST']
 Webhook.return_webhook(callback_url, "search", similarity_type, result)
+app.logger.info(f"[AsyncSimilarityResource] Completing Request - args are {args}, similarity_type is {similarity_type}, reponse is {response}")
 return response
6 changes: 4 additions & 2 deletions app/main/controller/similarity_controller.py
@@ -46,5 +46,7 @@ class SimilaritySearchResource(Resource):
     @api.doc(params={'text': 'text to be stored or queried for similarity', 'threshold': 'minimum score to consider, between 0.0 and 1.0 (defaults to 0.9)', 'model': 'similarity model to use: "elasticsearch" (pure Elasticsearch, default) or the key name of an active model'})
     def post(self):
         args = request.json
-        app.logger.debug(f"Args are {args}")
-        return similarity.get_similar_items(similarity.get_body_for_text_document(args, mode='query'), "text")
+        app.logger.info(f"[SimilaritySearchResource] Args are {args}")
+        response = similarity.get_similar_items(similarity.get_body_for_text_document(args, mode='query'), "text")
+        app.logger.info(f"[SimilaritySearchResource] Args are {args}, response is {response}")
+        return response
8 changes: 5 additions & 3 deletions app/main/controller/similarity_sync_controller.py
@@ -23,10 +23,12 @@ class SyncSimilarityResource(Resource):
     @api.doc(params={'text': 'text to be stored or queried for similarity', 'threshold': 'minimum score to consider, between 0.0 and 1.0 (defaults to 0.9)', 'model': 'similarity model to use: "elasticsearch" (pure Elasticsearch, default) or the key name of an active model'})
     def post(self, similarity_type):
         args = request.json
-        app.logger.debug(f"Args are {args}")
+        app.logger.info(f"[SyncSimilarityResource] Starting Request - args are {args}, similarity_type is {similarity_type}")
         if similarity_type == "text":
             package = similarity.get_body_for_text_document(args, 'query')
-            return similarity.get_similar_items(package, similarity_type)
+            response = similarity.blocking_get_similar_items(package, similarity_type)
         else:
             package = similarity.get_body_for_media_document(args, 'query')
-            return similarity.blocking_get_similar_items(package, similarity_type)
+            response = similarity.blocking_get_similar_items(package, similarity_type)
+        app.logger.info(f"[SyncSimilarityResource] Completing Request - args are {args}, similarity_type is {similarity_type}, reponse is {response}")
+        return response