Add vector_search function for pipeline aggregation #30

Merged (6 commits) on Jun 1, 2024

Devasy23 (Owner)

This pull request adds a new function called vector_search to the Database class in database.py. The vector_search function performs a pipeline aggregation vector search on the MongoDB Atlas database using the provided embedding. It returns a list of results with the name, image, and score of the closest matches. This functionality is useful for performing similarity searches based on face embeddings.
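
As a rough illustration of how a caller might consume that list, the sketch below picks the top-scoring match and rejects it when the score falls under a cutoff. The helper name `best_match` and the `min_score` value are assumptions made for this example and are not part of the PR; the result keys (`Name`, `Image`, `score`) follow the projection used in `vector_search`.

```python
from typing import Optional


def best_match(results: list[dict], min_score: float = 0.5) -> Optional[dict]:
    """Return the highest-scoring result, or None if nothing clears min_score.

    `min_score` is a hypothetical cutoff; a real value would need tuning
    against the actual embeddings and the index's similarity setting.
    """
    if not results:
        return None
    top = max(results, key=lambda r: r["score"])
    return top if top["score"] >= min_score else None


# Made-up data shaped like the vector_search output.
sample = [
    {"Name": "Alice", "Image": "<base64>", "score": 0.91},
    {"Name": "Bob", "Image": "<base64>", "score": 0.42},
]
print(best_match(sample)["Name"])  # Alice
print(best_match([]))              # None
```

Returning `None` for an empty list also sidesteps the `IndexError` that indexing `result[0]` would raise when the search finds nothing.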

Devasy23 added the enhancement and Testing labels on Mar 16, 2024
Devasy23 added this to the Vector Search Enabled milestone on Mar 16, 2024

senior-dev-bot (bot) left a comment

Feedback from Senior Dev Bot

Comment on lines 22 to +50

    def update_one(self, collection, query, update):
        return self.db[collection].update_one(query, update)

    # add a function for pipeline aggregation vector search
    def vector_search(self, collection, embedding):

        result = self.db[collection].aggregate([
            {
                "$vectorSearch": {
                    "index": "vector_index",
                    "path": "face_embedding",
                    "queryVector": embedding,
                    "numCandidates": 20,
                    "limit": 20
                }
            }, {
                '$project': {
                    '_id': 0,
                    'Name': 1,
                    'Image': 1,
                    'score': {
                        '$meta': 'vectorSearchScore'
                    }
                }
            }
        ])
        result_arr = [i for i in result]
        return result_arr

Consider extracting the query and projection into variables for better readability and maintainability. This practice enhances code clarity and simplifies future modifications.

def vector_search(self, collection, embedding):
    query = {
        "$vectorSearch": {
            "index": "vector_index",
            "path": "face_embedding",
            "queryVector": embedding,
            "numCandidates": 20,
            "limit": 20
        }
    }
    projection = {
        '$project': {
            '_id': 0, 
            'Name': 1,
            'Image': 1,
            'score': {'$meta': 'vectorSearchScore'}
        }
    }
    result = self.db[collection].aggregate([query, projection])
    return [i for i in result]
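
Taking the refactor one step further, the stages could be built by a small pure function so that `limit` and `numCandidates` are no longer hard-coded. This is only a sketch: `build_vector_search_pipeline` is a hypothetical name, and the 10x default for `num_candidates` is an assumption for illustration (the PR uses 20 for both values). MongoDB's documentation recommends setting `numCandidates` higher than `limit` to improve accuracy, which is why the sketch rejects smaller values.

```python
def build_vector_search_pipeline(embedding, limit=20, num_candidates=None,
                                 index="vector_index", path="face_embedding"):
    """Build the two-stage aggregation pipeline used for the vector search.

    Index and path defaults mirror the PR; the 10x multiplier is an
    assumption made for this sketch.
    """
    if num_candidates is None:
        num_candidates = limit * 10
    if num_candidates < limit:
        raise ValueError("num_candidates must be >= limit")
    search_stage = {
        "$vectorSearch": {
            "index": index,
            "path": path,
            "queryVector": list(embedding),
            "numCandidates": num_candidates,
            "limit": limit,
        }
    }
    project_stage = {
        "$project": {
            "_id": 0,
            "Name": 1,
            "Image": 1,
            "score": {"$meta": "vectorSearchScore"},
        }
    }
    return [search_stage, project_stage]


pipeline = build_vector_search_pipeline([0.1, 0.2, 0.3], limit=5)
print(pipeline[0]["$vectorSearch"]["numCandidates"])  # 50
```

Inside the `Database` class the method body would then reduce to `return list(self.db[collection].aggregate(pipeline))`, and the builder can be unit-tested without a database connection.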

Requested changes have been made 🎉

API/route.py (resolved)
CHANGELOG.md Outdated
Comment on lines 48 to 65
- Resolved various bugs and issues identified during the testing process.

### Removed
- Removed deprecated code and unused dependencies from the project. No newline at end of file
- Removed deprecated code and unused dependencies from the project.

## [0.1.4] - 2024-03-16 - 23:00

### Added
- Implemented a new `recognize_Face` endpoint in [`route.py`](API/route.py). This endpoint accepts a base64 string as input, converts it into embeddings, and performs a vector search query on the MongoDB Atlas database. Changes made by @Devasy23.
- Added a new `vector_search` function in [`database.py`](API/database.py). This function performs a vector similarity search on the MongoDB Atlas database using Euclidean distance as the similarity measure. Changes made by @Devasy23.
- Updated [`index.ipynb`](index.ipynb) to include examples and usage of the new `recognize_Face` endpoint and `vector_search` function. Changes made by @Devasy23.

### Changed
- Updated the `Database` class in [`database.py`](API/database.py) to include the new `vector_search` function. Changes made by @Devasy23.

### Fixed
- Resolved various bugs and issues identified during the implementation and testing of the new features. Fixes made by @Devasy23.

Great work on extending the project's functionality with the new recognize_Face endpoint and vector_search function. Here's a little feedback:

  1. Code Reusability and Clarity: For the vector_search function, consider defining a separate utility for Euclidean distance calculation if not done already. This promotes code reuse, especially if other parts of the system perform similar computations.
# Suggested improvement for database.py
import numpy as np

def euclidean_distance(vector1, vector2):
    # Convert to arrays so that plain Python lists are accepted too.
    return np.linalg.norm(np.asarray(vector1) - np.asarray(vector2))

def vector_search(query_vector):
    # Use euclidean_distance in the search algorithm.
    ...
  2. Documentation and Examples: Ensure the updated index.ipynb has clear, concise examples that not only illustrate how to use the new features but also highlight any potential edge cases or limitations.

  3. Consistent Formatting: The addition of a newline at the end of files is a good practice; make sure this is consistently applied across all edited files.
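
A standard-library variant of the Euclidean distance helper suggested in the list above is sketched here, using `math.dist` in place of `np.linalg.norm`. One caveat worth stating: with `$vectorSearch` the similarity is computed by Atlas according to the index's `similarity` setting, so a client-side helper like this would mainly be useful for local checks and tests. The `distance_to_score` mapping reflects how MongoDB's documentation is understood to normalize the Euclidean score; treat it as an assumption to verify, not as part of the PR.

```python
import math


def euclidean_distance(vector1, vector2) -> float:
    """Euclidean (L2) distance between two equal-length sequences of numbers."""
    # math.dist raises ValueError if the two inputs differ in dimension.
    return math.dist(vector1, vector2)


def distance_to_score(distance: float) -> float:
    """Map a distance to (0, 1]; identical vectors score 1.0.

    Assumed to mirror the Euclidean normalization 1 / (1 + distance).
    """
    return 1.0 / (1.0 + distance)


d = euclidean_distance([0.0, 0.0], [3.0, 4.0])
print(d)  # 5.0
print(round(distance_to_score(d), 4))  # 0.1667
```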

Remember to consider unit tests for the new functionalities if not already included, ensuring robustness and future maintainability.
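
One possible shape for such a unit test is sketched below with `unittest.mock`, so no Atlas cluster is needed. The `Database` class here is a stand-in that takes an injected `db` handle; the constructor signature and the collection name `"faces"` are assumptions for the example, while the pipeline mirrors the one in the PR.

```python
from unittest.mock import MagicMock


class Database:
    """Stand-in for the PR's Database class; takes an injected db handle."""

    def __init__(self, db):
        self.db = db

    def vector_search(self, collection, embedding):
        pipeline = [
            {"$vectorSearch": {"index": "vector_index", "path": "face_embedding",
                               "queryVector": embedding,
                               "numCandidates": 20, "limit": 20}},
            {"$project": {"_id": 0, "Name": 1, "Image": 1,
                          "score": {"$meta": "vectorSearchScore"}}},
        ]
        return list(self.db[collection].aggregate(pipeline))


def test_vector_search_returns_list_and_sends_query_vector():
    fake_db = MagicMock()
    fake_rows = [{"Name": "Alice", "Image": "<base64>", "score": 0.9}]
    # aggregate() returns a cursor in PyMongo; an iterator is a close enough fake.
    fake_db["faces"].aggregate.return_value = iter(fake_rows)

    result = Database(fake_db).vector_search("faces", [0.1, 0.2])

    assert result == fake_rows
    (pipeline,), _ = fake_db["faces"].aggregate.call_args
    assert pipeline[0]["$vectorSearch"]["queryVector"] == [0.1, 0.2]
    fake_db.__getitem__.assert_called_with("faces")


test_vector_search_returns_list_and_sends_query_vector()
```

A test like this only checks that the right pipeline is sent and that the cursor is materialized into a list; the quality of the matches still has to be checked against a real index.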

Requested changes have been made 🎉

API/route.py Outdated
Comment on lines 272 to 308
    client.find_one_and_delete(collection, {"EmployeeCode": EmployeeCode})

    return {"Message": "Successfully Deleted"}


@router.post("/recognize_face", response_class=Response)
async def recognize_face(Face: UploadFile = File(...)):
    """
    Recognize a face from the provided image.

    Args:
        Face (UploadFile): The image file to be recognized.

    Returns:
        Response: A response object containing the recognized employee information in JSON format.

    Raises:
        HTTPException: If an internal server error occurs.
    """
    logging.info("Recognizing Face")
    try:
        img_data = await Face.read()
        with open("temp.png", "wb") as f:
            f.write(img_data)

        embedding = DeepFace.represent(img_path="temp.png", model_name="Facenet")
        result = client2.vector_search(collection2, embedding[0]['embedding'])
        logging.info(f"Result: {result}")
        os.remove("temp.png")
    except Exception as e:
        logging.error(f"Error: {e}")
        os.remove("temp.png")
        raise HTTPException(status_code=500, detail="Internal server error")
    return Response(
        content=bytes(json.dumps(result[0], default=str), "utf-8"),
        media_type="application/json",
    )

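  • Temporary File Creation: Writing every upload to the same fixed path (temp.png) means concurrent requests can overwrite each other's files, and a predictable path is a security concern. Use a temporary file with a context manager to ensure it gets cleaned up properly, even in case of errors.
from tempfile import NamedTemporaryFile

@router.post("/recognize_face", response_class=Response)
async def recognize_face(Face: UploadFile = File(...)):
    logging.info("Recognizing Face")
    try:
        img_data = await Face.read()
        # The context manager deletes the file on exit, even if an exception is raised.
        with NamedTemporaryFile(delete=True, suffix=".png") as temp_file:
            temp_file.write(img_data)
            temp_file.flush()  # make sure the bytes are on disk before DeepFace reads them
            embedding = DeepFace.represent(img_path=temp_file.name, model_name="Facenet")
            result = client2.vector_search(collection2, embedding[0]['embedding'])
    except Exception as e:
        logging.error(f"Error: {e}")
        raise HTTPException(status_code=500, detail="Internal server error")
    # Return the closest match, as in the original endpoint.
    return Response(
        content=bytes(json.dumps(result[0], default=str), "utf-8"),
        media_type="application/json",
    )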
  • Error Handling: The current handler catches a very broad range of exceptions, which can swallow unexpected errors and make debugging difficult. Catch the specific exception types you expect, or re-raise the ones you do not.

  • File Reading Directly in Endpoint: It's a good practice to separate out logic into service layers or utility functions. This aids in keeping your endpoint functions clean and more maintainable.

  • Use Environment Variables for file paths or model names to make the application more flexible and secure.

  • DRY Principle: Consider whether the pattern of removing a file is repeated elsewhere in your code. If so, abstract the cleanup logic into a utility function.
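
The points in the list above (unique temporary file, a single cleanup path, logic moved out of the endpoint, configuration via environment variables) could be combined roughly as follows. Everything named here is hypothetical: `recognize_from_bytes`, `NoMatchError`, and the `FACE_MODEL_NAME` variable do not appear in the PR. The embedding and search steps are injected as callables so the sketch runs without DeepFace or MongoDB, and `tempfile.mkstemp` is swapped in for `NamedTemporaryFile` because a file that is still open cannot always be reopened by name on every platform.

```python
import os
import tempfile
from typing import Callable, Sequence

# Hypothetical environment variable; the PR hard-codes "Facenet".
MODEL_NAME = os.getenv("FACE_MODEL_NAME", "Facenet")


class NoMatchError(LookupError):
    """Raised when the vector search returns no candidates."""


def recognize_from_bytes(
    img_data: bytes,
    embed: Callable[[str], Sequence[float]],
    search: Callable[[Sequence[float]], list],
) -> dict:
    """Write the upload to a unique temp file, embed it, and return the top hit."""
    fd, path = tempfile.mkstemp(suffix=".png")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(img_data)
        results = search(embed(path))
    finally:
        os.remove(path)  # single cleanup point, runs on success and on error
    if not results:
        raise NoMatchError("no matching face found")
    return results[0]


# Demo with fakes standing in for DeepFace and the database.
seen_paths = []


def fake_embed(path):
    seen_paths.append(path)
    return [0.1, 0.2]


hit = recognize_from_bytes(b"fake-image-bytes", fake_embed,
                           lambda v: [{"Name": "Alice", "score": 0.9}])
print(hit["Name"], os.path.exists(seen_paths[0]))  # Alice False
```

In the real endpoint, `embed` would wrap `DeepFace.represent(img_path=path, model_name=MODEL_NAME)[0]["embedding"]` and `search` would wrap `client2.vector_search(collection2, vector)`. The route could then translate `NoMatchError` into a 404 and keep the generic 500 for anything unexpected.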

Overall, ensure every aspect adheres to scalability, security, and maintainability principles.

Requested changes have been made 🎉

codeshwar-preview (bot) force-pushed the Vector-search-feature branch 20 times, most recently from 99f5c9d to 3feddf0, on March 16, 2024 17:58
codeshwar-preview (bot) force-pushed the Vector-search-feature branch 19 times, most recently from fa33e0b to 38e4d08, on March 16, 2024 18:09
Devasy23 force-pushed the Vector-search-feature branch from a28eb2a to a86697a on March 17, 2024 14:39

Quality Gate passed

Issues
4 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud

sonarqubecloud bot commented Jun 1, 2024

Quality Gate failed

Failed conditions
0.0% Coverage on New Code (required ≥ 80%)

See analysis details on SonarCloud

Devasy23 merged commit a0af3de into main on Jun 1, 2024
6 of 7 checks passed
Devasy23 deleted the Vector-search-feature branch on October 29, 2024 19:00
Labels: enhancement, Testing
Projects: Status: Done
Development

Successfully merging this pull request may close these issues:

Utility Function for Vector Similarity Search
Feature Request: New Endpoint for recognise_face()
2 participants