In this capstone project for Alexey Grigorev's Machine Learning Zoomcamp, I decided to tackle an image classification task on different kinds of fruit using the fruits-360 dataset. It provides 131 types of fruit, some of which look quite similar to the human eye. The dataset also presents the fruit images without any background, which makes training somewhat easier.
The problem: Fruits are hard to classify manually given the wide variety of types, and the sheer volume in which fruits are produced makes the task burdensome. An automated classifier could help so that this task can be done instantly.
The dataset I chose for my project can be found here; you can download it from the Kaggle website. To reproduce the notebooks, place it inside the `data` folder. Inside this dataset you can find tons of images of different types of fruit, with the following structure:
```
.
├── data
│   ├── fruits-360
│   │   ├── papers                # papers about the dataset
│   │   ├── preview               # folder I created to store the data augmentation previews
│   │   ├── Test                  # test folder with 22688 images of 131 fruits
│   │   ├── test-multiple_fruits  # images with multiple fruits; a good test for real-world detection
│   │   └── Training              # training folder with 67692 images of 131 fruits
│   └── fruits-360-original-size
```
Filename format: `image_index_100.jpg` (e.g. `32_100.jpg`), `r_image_index_100.jpg` (e.g. `r_32_100.jpg`), `r2_image_index_100.jpg`, or `r3_image_index_100.jpg`. "r" stands for rotated fruit, "r2" means the fruit was rotated around the 3rd axis, and "100" comes from the image size (100x100 pixels).
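As a quick illustration (my own sketch, not part of the repo), the filename convention above can be parsed with a small regex:

```python
import re

# Matches fruits-360 filenames: an optional rotation prefix (r, r2, r3),
# then the image index, then the image size (100 for 100x100 pixels).
FNAME_RE = re.compile(r"^(?:(r\d?)_)?(\d+)_(\d+)\.jpg$")

def parse_fname(name):
    """Split a fruits-360 filename into rotation prefix, index, and size."""
    m = FNAME_RE.match(name)
    if m is None:
        raise ValueError(f"unexpected filename: {name}")
    rotation, index, size = m.groups()
    return {"rotation": rotation, "index": int(index), "size": int(size)}
```

For example, `parse_fname("r2_32_100.jpg")` yields rotation `"r2"`, index `32`, and size `100`.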
NOTE: Don't use the data inside `fruits-360-original-size`. This is a new version that is not yet completed by the author.
Project Description:
| Description | Link |
|---|---|
| Notebook | Explanatory notebook with EDA and training of the XCeption model |
| Second Notebook | Explanatory notebook with training and tuning of the VGG16 model |
| Third Notebook | Notebook with the TF to TFLite conversion |
| DockerFile | dockerfile |
| XCeption Deployed Model | TFLite XCeption |
| Lambda Function | Lambda function |
| test.py | To test a fruit prediction |
Project Structure:
```
ML_Zoomcamp-Capstone-Project/         # main folder
├── data                              # directory where the data should be placed
│   ├── fruits-360                    # data directory
│   │   ├── papers
│   │   ├── preview
│   │   ├── Test
│   │   ├── test-multiple_fruits
│   │   └── Training
│   └── fruits-360-original-size      # alternative dataset still in the works (do not use)
├── img                               # images for the README
│   ├── deployment                    # images of the deployment
│   ├── notebooks                     # images from the notebooks
│   └── test_own_fruits               # URL images of fruits that can be tested
├── models                            # models (.tflite and .h5)
└── __pycache__
```
In my project I decided to use two CNN architectures: VGG16 and XCeption. Both are considered quite good for image classification tasks and should make classifying tons of fruit images doable.
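Both models are trained via transfer learning on ImageNet weights. A minimal sketch of the XCeption setup (layer sizes and hyperparameters here are illustrative assumptions, not the exact tuned values from the notebooks):

```python
from tensorflow import keras

def make_model(input_size=100, learning_rate=0.001, inner_size=100,
               droprate=0.2, num_classes=131, weights="imagenet"):
    # Frozen XCeption convolutional base; only the new head is trained.
    base = keras.applications.Xception(
        weights=weights, include_top=False,
        input_shape=(input_size, input_size, 3))
    base.trainable = False

    inputs = keras.Input(shape=(input_size, input_size, 3))
    x = base(inputs, training=False)
    x = keras.layers.GlobalAveragePooling2D()(x)
    x = keras.layers.Dense(inner_size, activation="relu")(x)
    x = keras.layers.Dropout(droprate)(x)          # regularization for tuning
    outputs = keras.layers.Dense(num_classes)(x)   # logits for 131 fruit classes

    model = keras.Model(inputs, outputs)
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate),
        loss=keras.losses.CategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"])
    return model
```

The VGG16 variant follows the same pattern with `keras.applications.VGG16` as the base.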
VGG16:
XCeption:
| Model | Train | Validation | Test |
|---|---|---|---|
| XCeption Vanilla | 99.60% | 94.48% | 90.45% |
| Tuned XCeption | 99.07% | 96.41% | 92.94% |
| VGG16 Vanilla | 99.60% | 96.50% | 91.99% |
| VGG16 Tuned | 99.46% | 97.33% | 95.53% |
Model chosen for deployment: XCeption. I chose XCeption because, even though VGG16 obtained a higher accuracy, it failed to generalize to images outside the main dataset; this should be further optimized.
The model was deployed using TFLite. The output for the image I prepared looks like this:
Python version: 3.8 🐍
Versions/requirements used inside the virtual environment:
keras-image-helper
tflite-aws-lambda
Before building the Docker image, please verify that the Docker daemon is running.
$ sudo systemctl start docker
OR:
$ sudo /etc/init.d/docker start
For Arch-based systems:
$ systemctl start docker.service
To build the Docker image I prepared for this project, move into the main directory and run the following command:
$ docker build -t fruits-model .
Now run the container, mapping port 8080 to your host machine:
$ docker run -it --rm -p 8080:8080 fruits-model
You should see:
Inside another terminal session, run the following command inside the main folder of the project:
python test.py
You should see (this output is really long):
This is the output for the image I prepared; we can see that it is indeed a 🍌
The test image was downloaded from Google (it's not from the dataset). You can try your own images too: just change `data = {'url': 'https://i.imgur.com/Wj4Lajm.png'}` inside test.py to the image URL of your choice.
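For reference, the request that test.py sends can be sketched with the standard library alone (the endpoint and payload below come from this README; the helper name `invoke` is my own):

```python
import json
from urllib import request

# Local endpoint exposed by the Lambda runtime inside the container.
url = "http://localhost:8080/2015-03-31/functions/function/invocations"
data = {"url": "https://i.imgur.com/Wj4Lajm.png"}

def invoke(endpoint, payload):
    """POST a JSON body with the image URL and decode the JSON response."""
    body = json.dumps(payload).encode("utf-8")
    req = request.Request(endpoint, data=body,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.loads(resp.read())
```

With the container running, `invoke(url, data)` returns the per-class scores for the image.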
The repository was created with the following command:
aws ecr create-repository --repository-name fruits-tflite-images
Output:
Pushing the docker image to the cloud:
Testing the aws lambda function:
In this fruits project, I used AWS Lambda to deploy my Docker container to the cloud, following the steps described in week 9 of Alexey Grigorev's zoomcamp. To test the deployed function, simply switch the URL in test.py from the localhost endpoint to the API Gateway endpoint:
#url = 'http://localhost:8080/2015-03-31/functions/function/invocations'
url= "https://w832b3ab81.execute-api.us-east-1.amazonaws.com/Test/predict"
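The Lambda function behind that endpoint presumably follows the standard handler shape from zoomcamp week 9; a hypothetical sketch with a stubbed-out `predict` (the real one runs the TFLite interpreter on the downloaded image):

```python
# Hypothetical sketch of lambda_function.py; `predict` is a placeholder
# standing in for the real TFLite inference on the image at `url`.
def predict(url):
    # The real function downloads the image, preprocesses it to 100x100,
    # and returns a {class_name: score} dict from the TFLite model.
    return {"Banana": 0.98}  # placeholder result, not real model output

def lambda_handler(event, context):
    # API Gateway passes the request body as `event`; the 'url' key
    # matches the payload sent by test.py.
    url = event["url"]
    return predict(url)
```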
Now, simply run
python test.py
Inside the main folder, you should see the following output:
[1] Alexey Grigorev, mlbookcamp-code: https://github.com/alexeygrigorev/mlbookcamp-code
[2] Chollet, F. (2021). Deep learning with Python. Simon and Schuster.