Cv2 3833 compute and save sscd embeddings for images #47
Open: ahmednasserswe wants to merge 28 commits into master from CV2-3833-Compute-and-save-SSCD-embeddings-for-images
Commits
cae6e6e adding `self,` to `def compute_pdq(self, iobytes: io.BytesIO) -> str:` (ahmednasserswe)
2dad0b6 create `image_sscd.py` and corresponding changes in Dockerfile (ahmednasserswe)
3c3fc9f removing `--extra-index-url https://download.pytorch.org/whl/cu113` f… (ahmednasserswe)
8224cfb installing sscd requirements directly in Dockerfile (ahmednasserswe)
b22b803 changing `pytorch-lightning` version to `1.5.10` (ahmednasserswe)
1797637 trying to build with sscd installation commented in dockerfile (ahmednasserswe)
295e0b8 Changing `Model.compute_pdq(io.BytesIO(image_content))` to `result = … (ahmednasserswe)
40e5bcf uncommenting lines to install sscd-copy-detection in Dockerfile (ahmednasserswe)
bcc77e3 installing sscd requirements directly in dockerfile instead from requ… (ahmednasserswe)
c4bb746 Comment git clone line - Update Dockerfile (computermacgyver)
1350858 Move packages to requirements.txt, remove possibly unneeded ones
4049828 add back model download
fef66c4 Remove possibly unused requirements
11fd99c Merge branch 'master' into CV2-3833-Compute-and-save-SSCD-embeddings-…
5372d57 Large refactor
4713a35 Adding missing files
1e9227b Move SSCD model download to __init__
d702e40 fix typo
3449286 Revert comments to lib/queue/worker.py
690d606 Revert comments to lib/queue/worker.py
2a4e158 update import in test
a799f21 changing `from lib.model.image import Model` to `from lib.model.image… (ahmednasserswe)
998d05f adding `test_image_sscd.py` and `img/presto_flowchart.jpg` (ahmednasserswe)
eb5f6f5 Merge branch 'CV2-3833-Compute-and-save-SSCD-embeddings-for-images' o… (ahmednasserswe)
d3dc05f Use numpy.allclose to accomodate OS/chipset differences
cd191d8 fix test
5697bee Update image_sscd.py with comments describing of normalization and re… (ahmednasserswe)
c433a0e drop file (DGaffney)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,2 +1,3 @@ | ||
*.cpython-39.pyc | ||
*.pyc | ||
sscd_disc_mixup.torchscript.pt |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,27 @@ | ||
from lib.model.model import Model | ||
|
||
from lib import schemas | ||
import urllib.request | ||
import io | ||
|
||
class GenericImageModel(Model): | ||
|
||
    def get_iobytes_for_image(self, image: schemas.Message) -> io.BytesIO: | ||
        """ | ||
        Read file as bytes after requesting based on URL. | ||
        """ | ||
        return io.BytesIO( | ||
            urllib.request.urlopen( | ||
                urllib.request.Request( | ||
                    image.body.url, | ||
                    headers={'User-Agent': 'Mozilla/5.0'} | ||
                ) | ||
            ).read() | ||
        ) | ||
|
||
    def process(self, image: schemas.Message) -> schemas.GenericItem: | ||
        """ | ||
        Generic function for returning the actual response. | ||
        """ | ||
|
||
        return self.compute_imagehash(self.get_iobytes_for_image(image)) |
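A possible variant of the download step in `get_iobytes_for_image`, sketched as a standalone function. The name `fetch_image_bytes` and the 10-second `timeout` default are hypothetical and not part of this PR. The sketch adds a timeout and closes the response through a context manager; the request itself is built the same way as above.

```python
import io
import urllib.request


def fetch_image_bytes(url: str, timeout: float = 10.0) -> io.BytesIO:
    """Download `url` and return its body as an in-memory byte stream.

    Hypothetical helper: the name and the timeout default are assumptions.
    """
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    # The context manager closes the response even if read() raises.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return io.BytesIO(response.read())
```

Without a timeout, a stalled remote host could block the worker indefinitely, so it may be worth passing one explicitly.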
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,20 @@ | ||
from typing import Dict | ||
import io | ||
|
||
from lib.model.generic_image import GenericImageModel | ||
|
||
from pdqhashing.hasher.pdq_hasher import PDQHasher | ||
from lib import schemas | ||
|
||
class Model(GenericImageModel): | ||
    def compute_pdq(self, iobytes: io.BytesIO) -> str: | ||
        """Compute perceptual hash using the PDQ hashing library | ||
        :param iobytes: io.BytesIO holding the raw image bytes | ||
        :returns: str, the hash as returned by dumpBitsFlat() | ||
        """ | ||
        pdq_hasher = PDQHasher() | ||
        hash_and_qual = pdq_hasher.fromBufferedImage(iobytes) | ||
        return hash_and_qual.getHash().dumpBitsFlat() | ||
|
||
    def compute_imagehash(self, iobytes: io.BytesIO) -> str: | ||
        return self.compute_pdq(iobytes) |
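For context on how the returned hash might be consumed downstream, here is a minimal sketch of a Hamming-distance comparison. It assumes `dumpBitsFlat()` yields a string of `'0'`/`'1'` characters, which this diff does not confirm. `hamming_distance` is a hypothetical helper, not something this PR adds.

```python
def hamming_distance(bits_a: str, bits_b: str) -> int:
    """Count the positions where two equal-length bit strings differ.

    Assumes each hash is a string of '0'/'1' characters (an assumption about
    the dumpBitsFlat() output, not verified against the library).
    """
    if len(bits_a) != len(bits_b):
        raise ValueError("hashes must have the same length")
    return sum(a != b for a, b in zip(bits_a, bits_b))
```

A smaller distance would indicate more similar images. Any match threshold is left to the caller.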
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,57 @@ | ||
from typing import Dict | ||
import io | ||
|
||
from lib.model.generic_image import GenericImageModel | ||
from lib import schemas | ||
from torchvision import transforms | ||
import torch | ||
from lib.logger import logger | ||
import numpy as np | ||
from PIL import Image | ||
import urllib.request | ||
|
||
class Model(GenericImageModel): | ||
    def __init__(self): | ||
        super().__init__() | ||
        #FIXME: Load from a Meedan S3 bucket | ||
        try: | ||
            self.model = torch.jit.load("sscd_disc_mixup.torchscript.pt") | ||
        except Exception: | ||
            # Model file missing or unreadable: download it, then load it again | ||
            logger.info("Downloading SSCD model...") | ||
            m=urllib.request.urlopen("https://dl.fbaipublicfiles.com/sscd-copy-detection/sscd_disc_mixup.torchscript.pt").read() | ||
            with open("sscd_disc_mixup.torchscript.pt","wb") as fh: | ||
                fh.write(m) | ||
            self.model = torch.jit.load("sscd_disc_mixup.torchscript.pt") | ||
        logger.info("SSCD model loaded") | ||
|
||
    def compute_sscd(self, iobytes: io.BytesIO) -> list: | ||
        """Compute the SSCD embedding of an image | ||
        :param iobytes: io.BytesIO holding the raw image bytes | ||
        :returns: list of floats, the embedding vector | ||
        """ | ||
        # from SSCD-copy-detection readme https://github.com/facebookresearch/sscd-copy-detection/tree/main#preprocessing | ||
        # Normalization using the mean and std of Imagenet | ||
        normalize = transforms.Normalize( | ||
            mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225], | ||
        ) | ||
        # The publishers of SSCD-copy-detection recommend preprocessing images for inference by either resizing the small edge to 288 or resizing the image to a square tensor. | ||
        # Resizing to a square tensor is more efficient on GPUs but can skew images and lose information, so we resize the small edge to 288. | ||
        small_288 = transforms.Compose([ | ||
            transforms.Resize(288), | ||
            transforms.ToTensor(), | ||
            normalize, | ||
        ]) | ||
        # Keeping the code example of resizing the image to a square tensor | ||
        # skew_320 = transforms.Compose([ | ||
        # transforms.Resize([320, 320]), | ||
        # transforms.ToTensor(), | ||
        # normalize, | ||
        # ]) | ||
|
||
        # Convert to RGB so the tensor has the three channels the normalization above assumes | ||
        image = Image.open(iobytes).convert("RGB") | ||
        batch = small_288(image).unsqueeze(0) | ||
        embedding = self.model(batch)[0, :] | ||
        return np.asarray(embedding.detach().numpy()).tolist() | ||
|
||
    def compute_imagehash(self, iobytes: io.BytesIO) -> list: | ||
        return self.compute_sscd(iobytes) |
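To make the trade-off in the preprocessing comments concrete, here is a small worked sketch of the geometry. It mirrors the rule documented for `transforms.Resize` with an integer size: the short edge is matched and the long edge is scaled and truncated. Both function names are hypothetical and used only for illustration; neither touches torchvision.

```python
from typing import Tuple


def short_edge_resize(width: int, height: int, target: int = 288) -> Tuple[int, int]:
    """Return (width, height) after scaling the shorter edge to `target`
    while keeping the aspect ratio (the long edge is truncated to an int)."""
    if width <= height:
        return target, int(target * height / width)
    return int(target * width / height), target


def square_resize_distortion(width: int, height: int) -> float:
    """Factor by which the aspect ratio changes when the image is forced
    into a square; 1.0 means no skew."""
    return max(width, height) / min(width, height)
```

For a 1920x1080 input, the short-edge rule gives 512x288 with no skew, whereas forcing the same image into 320x320 changes its aspect ratio by a factor of about 1.78.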
This is still to do