
Feat: Add functionality to calculate intra-cluster distances and compare them between original and fine-tuned models #49

Merged — 18 commits merged into main on Aug 2, 2024

Conversation

Devasy23 (Owner)
This pull request adds functionality to calculate intra-cluster distances and compare them between the original and fine-tuned models. It includes code changes to load the fine-tuned model, extract faces, calculate embeddings, and compute intra-cluster distances. The output reports the shift in intra-cluster distances, the mean distance change, the counts of positively and negatively affected clusters, and the number of unchanged clusters. These results give valuable insight into how the fine-tuned model performs relative to the original.

  • Add vector_search function for pipeline aggregation
  • Added Vector Search
  • chore: Add error handling for no match found in face recognition
  • Refactor face recognition code to use the Facenet512 model for better accuracy
  • Code added for recognize_face API
  • temp commit to resolve merge
  • Updated gitignore

Co-authored-by: Devansh Shah [email protected] @devansh-shah-11

@Devasy23 self-assigned this on Jul 29, 2024

@senior-dev-bot (bot) left a comment

Feedback from Senior Dev Bot

Comment on lines +1 to +78
import os

import numpy as np
from keras.models import load_model
from keras.preprocessing import image
from sklearn.metrics.pairwise import euclidean_distances

# Function to load and preprocess images
def load_and_preprocess_image(img_path, target_size=(160, 160)):
    img = image.load_img(img_path, target_size=target_size)
    img_array = image.img_to_array(img)
    img_array = np.expand_dims(img_array, axis=0)
    img_array /= 255.0
    return img_array

# Function to generate embeddings
def generate_embeddings(model, dataset_path):
    embeddings = []
    labels = []

    for class_name in os.listdir(dataset_path):
        class_path = os.path.join(dataset_path, class_name)
        if not os.path.isdir(class_path):
            continue

        for img_name in os.listdir(class_path):
            img_path = os.path.join(class_path, img_name)
            img_array = load_and_preprocess_image(img_path)
            embedding = model.predict(img_array)
            embeddings.append(embedding[0])
            labels.append(class_name)

    embeddings = np.array(embeddings)
    labels = np.array(labels)
    return embeddings, labels

# Function to calculate intra-cluster distances
def calculate_intra_cluster_distances(embeddings, labels):
    unique_labels = np.unique(labels)
    distances = []

    for label in unique_labels:
        cluster_embeddings = embeddings[labels == label]
        avg_embedding = np.mean(cluster_embeddings, axis=0)
        max_distance = np.max(euclidean_distances(cluster_embeddings, [avg_embedding]))
        distances.append(max_distance)

    return np.array(distances)

# Load the pre-trained FaceNet model (replace 'facenet_model.h5' with your model file)
model_path = 'facenet_model.h5'
model = load_model(model_path)

# Path to the dataset
dataset_path = 'path_to_your_dataset'

# Generate embeddings for the original model
embeddings_original, labels = generate_embeddings(model, dataset_path)

# Load the fine-tuned model (replace 'facenet_model_finetuned.h5' with your fine-tuned model file)
finetuned_model_path = 'facenet_model_finetuned.h5'
finetuned_model = load_model(finetuned_model_path)

# Generate embeddings for the fine-tuned model
embeddings_finetuned, _ = generate_embeddings(finetuned_model, dataset_path)

# Calculate intra-cluster distances for both models
intra_distances_original = calculate_intra_cluster_distances(embeddings_original, labels)
intra_distances_finetuned = calculate_intra_cluster_distances(embeddings_finetuned, labels)

# Compare intra-cluster distances: a negative change means the fine-tuned
# model produced a tighter (better) cluster for that identity
intra_distance_change = intra_distances_finetuned - intra_distances_original

# Output the results
print(f"Intra-Cluster Distance Change: {intra_distance_change}")
print(f"Mean Distance Change: {np.mean(intra_distance_change)}")
print(f"Positive Impact (distance decreased): {np.sum(intra_distance_change < 0)}")
print(f"Negative Impact (distance increased): {np.sum(intra_distance_change > 0)}")
print(f"No Change: {np.sum(intra_distance_change == 0)}")


CODE REVIEW

Consider modularizing the code further by creating separate utility functions and improving variable naming for better readability.

# Update imports
from os import listdir, path
import numpy as np
from keras.models import load_model
from keras.preprocessing import image
from sklearn.metrics.pairwise import euclidean_distances

# Function to calculate intra-cluster distances
def calculate_intra_cluster_distances(embeddings, labels):
    unique_labels = np.unique(labels)
    distances = []

    for label in unique_labels:
        cluster_embeddings = embeddings[labels == label]
        avg_embedding = np.mean(cluster_embeddings, axis=0)
        max_distance = np.max(euclidean_distances(cluster_embeddings, [avg_embedding]))
        distances.append(max_distance)

    return np.array(distances)

This refactoring separates concerns and improves code maintainability.
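As a quick sanity check (not part of the PR), the helper can be exercised with synthetic embeddings; the two "identities" and their coordinates below are made up for illustration:

```python
import numpy as np
from sklearn.metrics.pairwise import euclidean_distances

def calculate_intra_cluster_distances(embeddings, labels):
    unique_labels = np.unique(labels)
    distances = []
    for label in unique_labels:
        cluster_embeddings = embeddings[labels == label]
        avg_embedding = np.mean(cluster_embeddings, axis=0)
        # Max distance from any member to the cluster centroid
        distances.append(np.max(euclidean_distances(cluster_embeddings, [avg_embedding])))
    return np.array(distances)

# Two synthetic identities: a tight cluster near the origin, a looser one near (10, 10)
embeddings = np.array([[0.0, 0.0], [0.1, 0.0], [10.0, 10.0], [12.0, 10.0]])
labels = np.array(['a', 'a', 'b', 'b'])
distances = calculate_intra_cluster_distances(embeddings, labels)
print(distances)  # one max-distance value per identity, in sorted label order
```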

Comment on lines 8 to 14
from datetime import datetime
from io import BytesIO
from typing import List

from tensorflow.keras.models import load_model
from bson import ObjectId
from deepface import DeepFace
from dotenv import load_dotenv


CODE REVIEW

Import statements should be grouped in the following order:

  1. Standard library imports
  2. Related third party imports
  3. Local application/library specific imports
from datetime import datetime
from io import BytesIO
from typing import List
from bson import ObjectId

from dotenv import load_dotenv
from tensorflow.keras.models import load_model
from deepface import DeepFace

Comment on lines 143 to +146
FaceRec/static/Images/uploads/*
Images/dbImages/*
Images/Faces/*
Images/


CODE REVIEW

The changes seem to involve moving directories and adding an entire directory. It would be beneficial to provide more context and explanation behind these changes to ensure they are necessary. Consider breaking up these changes into smaller, more meaningful commits for better clarity and version control.

Consider providing more descriptive commit messages for clarity

API/route.py Outdated
Comment on lines 55 to 141
Department: str
Images: list[str]

def load_and_preprocess_image(img_path, target_size=(160, 160)):
    img = image.load_img(img_path, target_size=target_size)
    img_array = image.img_to_array(img)
    img_array = np.expand_dims(img_array, axis=0)
    img_array /= 255.0
    return img_array

def calculate_embeddings(image_filename):
    """
    Calculate embeddings for the provided image.

    Args:
        image_filename (str): The path to the image file.

    Returns:
        list: A list of embeddings for the image.
    """
    face_image_data = DeepFace.extract_faces(
        image_filename, detector_backend='mtcnn', enforce_detection=False,
    )
    new_image_path = 'Images/Faces/tmp.jpg'

    if face_image_data[0]['face'] is not None:
        plt.imsave(new_image_path, face_image_data[0]['face'])

    img_array = load_and_preprocess_image(new_image_path)
    model = load_model('Model/embedding_trial3.h5')
    embedding = model.predict(img_array)[0]
    embedding_list = embedding.tolist()
    logging.info('Embedding created')

    return embedding_list

@router.post('/recalculate_embeddings')
async def recalculate_embeddings():
    """
    Recalculate embeddings for all the images in the database.

    Returns:
        dict: A dictionary with a success message.

    Raises:
        None
    """
    logging.info('Recalculating embeddings')
    employees_mongo = client2.find(collection2)
    for employee in employees_mongo:
        print(employee, type(employee))
        embeddings = []

        # In the initial version, the images were stored in the 'Image' field
        if 'Images' in employee:
            images = employee['Images']
        else:
            images = employee['Image']

        for encoded_image in images:
            encoded_image = encoded_image.replace('data:image/png;base64,', '')
            encoded_image = encoded_image.strip()
            encoded_image += '=' * (-len(encoded_image) % 4)
            img_recovered = base64.b64decode(encoded_image)
            pil_image = Image.open(BytesIO(img_recovered))
            image_filename = f'{employee["Name"]}.png'
            pil_image.save(image_filename)
            logging.debug(f'Image saved {employee["Name"]}')
            embeddings.append(calculate_embeddings(image_filename))
            # os.remove(image_filename)

        logging.debug(f'About to update Embeddings: {embeddings}')
        # Store the data in the database
        client2.update_one(
            collection2,
            {'EmployeeCode': employee['EmployeeCode']},
            {'$set': {'embeddings': embeddings, 'Images': images}},
        )

    return {'message': 'Embeddings Recalculated successfully'}


# To create new entries of employee
@router.post('/create_new_faceEntry')


CODE REVIEW

The code changes look good. One suggestion is to handle potential errors during image processing (e.g., invalid image formats). Consider adding input validation and error handling to improve the robustness of the code.

try:
    pil_image = Image.open(BytesIO(img_recovered))
except IOError as e:
    logging.error(f'Error in opening image: {str(e)}')
    continue  # Skip to the next image
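The same guard can sit next to the base64 handling used in `recalculate_embeddings`. A minimal sketch of that combination, assuming the `decode_data_url` helper name (which is mine, not from the PR):

```python
import base64

def decode_data_url(encoded_image: str) -> bytes:
    """Strip an optional data-URL prefix and restore any stripped '=' padding."""
    encoded_image = encoded_image.replace('data:image/png;base64,', '').strip()
    # A valid base64 string's length must be a multiple of 4
    encoded_image += '=' * (-len(encoded_image) % 4)
    return base64.b64decode(encoded_image)

# Padding was stripped from this payload; the helper restores it before decoding
raw = decode_data_url('data:image/png;base64,aGVsbG8')
print(raw)  # b'hello'
```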

Comment on lines 207 to 213
        list[Employee]: A list of Employee objects containing employee information.
    """
    logging.info('Displaying all employees')
    employees_mongo = client.find(collection)
    employees_mongo = client2.find(collection2)
    logging.info(f'Employees found {employees_mongo}')
    employees = [
        Employee(


CODE REVIEW

Consider adding more descriptive variable names for clarity. Utilize type hints for better code readability.

    employees_mongo = client.find(collection)  # consider renaming to employees_mongo = client.find_employee_data(collection)
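One way to act on the type-hint suggestion is to convert the raw documents in a small typed helper. This is a sketch only: the `Employee` fields are reduced to two for brevity, and `build_employees` is a hypothetical name, not a function in the PR:

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class Employee:
    EmployeeCode: str
    Name: str

def build_employees(employee_docs: list[dict[str, Any]]) -> list[Employee]:
    """Convert raw MongoDB documents into typed Employee objects."""
    return [Employee(EmployeeCode=d['EmployeeCode'], Name=d['Name']) for d in employee_docs]

docs = [{'EmployeeCode': 'E001', 'Name': 'Alice'}]
employees = build_employees(docs)
print(employees[0].Name)  # Alice
```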

Comment on lines 363 to 369
"""
logging.info('Deleting Employee')
logging.debug(f'Deleting for EmployeeCode: {EmployeeCode}')
client.find_one_and_delete(collection, {'EmployeeCode': EmployeeCode})
client2.find_one_and_delete(collection2, {'EmployeeCode': EmployeeCode})

return {'Message': 'Successfully Deleted'}


CODE REVIEW

Consider abstracting the database client logic to improve modularity and maintainability.

def delete_employee(client, collection, EmployeeCode):
    client.find_one_and_delete(collection, {'EmployeeCode': EmployeeCode})
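Since the route deletes from two databases, the bot's sketch could be extended to run the same delete against each (client, collection) pair. This is a sketch with a hypothetical helper name and a stand-in fake client, not the project's Mongo wrapper:

```python
class _FakeClient:
    """Stand-in for the project's Mongo wrapper, for demonstration only."""
    def __init__(self):
        self.deleted = []
    def find_one_and_delete(self, collection, query):
        self.deleted.append((collection, query))

def delete_employee_everywhere(clients_and_collections, employee_code):
    """Run the same delete against every (client, collection) pair."""
    for client, collection in clients_and_collections:
        client.find_one_and_delete(collection, {'EmployeeCode': employee_code})

c1, c2 = _FakeClient(), _FakeClient()
delete_employee_everywhere([(c1, 'employees'), (c2, 'employees_v2')], 'E001')
print(c1.deleted, c2.deleted)
```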

Comment on lines 289 to 296
"""
logging.debug(f'Updating for EmployeeCode: {EmployeeCode}')
try:
user_id = client.find_one(
collection, {'EmployeeCode': EmployeeCode}, projection={'_id': True},
user_id = client2.find_one(
collection2, {'EmployeeCode': EmployeeCode}, projection={'_id': True},
)
print(user_id)
if not user_id:


CODE REVIEW

It's good practice to use meaningful variable names. Consider renaming client to something like previous_client for clarity. Also, ensure consistency in variable naming (collection vs collection2). Lastly, consider handling potential exceptions when calling client2.find_one.

previous_client = client
new_client = client2
user_id = new_client.find_one(
    new_collection, {'EmployeeCode': EmployeeCode}, projection={'_id': True},
)

Comment on lines 157 to 163
        '\r\n',
        '',
    ).replace('\n', '')
    EmployeeCode = Employee.EmployeeCode.replace('\r\n', '').replace('\n', '')
    EmployeeCode = Employee.EmployeeCode
    gender = Employee.gender.replace('\r\n', '').replace('\n', '')
    Department = Employee.Department.replace('\r\n', '').replace('\n', '')
    encoded_images = Employee.Images


CODE REVIEW

Consider simplifying and enhancing code readability by removing unnecessary .replace functions. Also, ensure code consistency by following a consistent naming convention.

EmployeeCode = Employee.EmployeeCode
gender = Employee.gender
Department = Employee.Department
encoded_images = Employee.Images
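If the newline stripping really is needed for some form-encoded inputs, it could be centralized in a single helper rather than repeated `.replace` chains. A sketch, with `strip_newlines` as a hypothetical name:

```python
def strip_newlines(value: str) -> str:
    """Remove CR/LF characters that can leak in from multipart form fields."""
    return value.replace('\r\n', '').replace('\n', '')

# Apply one helper to every field that needs normalizing
gender = strip_newlines('Female\r\n')
department = strip_newlines('HR\n')
print(gender, department)  # Female HR
```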

Comment on lines 308 to 326
            image_filename = f'{Employee.Name}.png'
            pil_image.save(image_filename)
            logging.debug(f'Image saved {Employee.Name}')
            face_image_data = DeepFace.extract_faces(
                image_filename, detector_backend='mtcnn', enforce_detection=False,
            )
            embedding = DeepFace.represent(
                image_filename, model_name='Facenet', detector_backend='mtcnn',
            )
            logging.debug(f'Embedding created {Employee.Name}')
            embeddings.append(embedding)
            os.remove(image_filename)

            # embedding = DeepFace.represent(
            #     image_filename, model_name='Facenet', detector_backend='mtcnn',
            # )

            embeddings.append(calculate_embeddings(image_filename))
            # os.remove(image_filename)

        Employee_data['embeddings'] = embeddings

        try:
            update_result = client.update_one(
                collection,
            update_result = client2.update_one(
                collection2,
                {'_id': ObjectId(user_id['_id'])},
                update={'$set': Employee_data},
            )


CODE REVIEW

Consider removing commented-out code for clarity. Simplify by extracting the logic for computing embeddings into a separate function for better separation of concerns.

def calculate_embeddings(image_filename):
    embedding = DeepFace.represent(
        image_filename, model_name='Facenet', detector_backend='mtcnn',
    )
    return embedding

API/route.py Outdated
Comment on lines 384 to 421
"""
logging.info('Recognizing Face')
try:
# Code to calculate embeddings via Original Facenet model

img_data = await Face.read()
with open('temp.png', 'wb') as f:
image_filename = 'temp.png'
with open(image_filename, 'wb') as f:
f.write(img_data)

embedding = DeepFace.represent(
img_path='temp.png', model_name='Facenet512', detector_backend='mtcnn',
# embedding = DeepFace.represent(
# img_path='temp.png', model_name='Facenet512', detector_backend='mtcnn',
# )

# Code to calculate embeddings via Finetuned Facenet model
face_image_data = DeepFace.extract_faces(
image_filename, detector_backend='mtcnn', enforce_detection=False,
)
result = client2.vector_search(collection2, embedding[0]['embedding'])
logging.info(f"Result: {result[0]['Name']}, {result[0]['score']}")
os.remove('temp.png')
if result[0]['score'] < 0.5:
return Response(
status_code=404, content=json.dumps({'message': 'No match found'}),
)

if face_image_data and face_image_data[0]['face'] is not None:

plt.imsave(f'Images/Faces/tmp.jpg', face_image_data[0]['face'])
face_image_path = f'Images/Faces/tmp.jpg'
img_array = load_and_preprocess_image(face_image_path)

model = load_model('Model/embedding_trial3.h5')
embedding_list = model.predict(img_array)[0] # Get the first prediction
print(embedding_list, type(embedding_list))
embedding = embedding_list.tolist()
result = client2.vector_search(collection2, embedding)
logging.info(f"Result: {result[0]['Name']}, {result[0]['score']}")
os.remove('temp.png')
if result[0]['score'] < 0.5:
return Response(
status_code=404, content=json.dumps({'message': 'No match found'}),
)
except Exception as e:
logging.error(f'Error: {e}')
os.remove('temp.png')


CODE REVIEW

Consider refactoring to extract common functionality into separate functions for better organization and maintainability.

# Extract common functionality into separate functions
async def calculate_face_embedding(img_data: bytes) -> List[float]:
    image_filename = 'temp.png'
    with open(image_filename, 'wb') as f:
        f.write(img_data)
    
    # Code to calculate embeddings via Original Facenet model
    
    embedding = calculate_original_facenet_embedding(image_filename)
    return embedding

async def recognize_face(img_data: bytes):
    try:
        embedding = await calculate_face_embedding(img_data)
        # Code to calculate embeddings via Finetuned Facenet model
        
        if face_image_data and face_image_data[0]['face'] is not None:
            process_and_search_face(face_image_data)
    except Exception as e:
        handle_recognition_error(e)

logging.info('Recalculating embeddings')
employees_mongo = client2.find(collection2)
for employee in employees_mongo:
    print(employee, type(employee))

Check failure

Code scanning / CodeQL

Clear-text logging of sensitive information (High)

This expression logs sensitive data (private) as clear text.
pil_image = Image.open(BytesIO(img_recovered))
image_filename = f'{employee["Name"]}.png'
pil_image.save(image_filename)
logging.debug(f'Image saved {employee["Name"]}')

Check failure

Code scanning / CodeQL

Clear-text logging of sensitive information (High)

This expression logs sensitive data (private) as clear text.

Quality Gate failed

Failed conditions
37.8% Coverage on New Code (required ≥ 80%)
E Security Rating on New Code (required ≥ A)

See analysis details on SonarCloud


@devansh-shah-11 merged commit 9f97d1a into main on Aug 2, 2024
6 of 8 checks passed