Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Image processors #196

Merged
merged 18 commits into from
Jan 16, 2017
Merged

Image processors #196

merged 18 commits into from
Jan 16, 2017

Conversation

jlibovicky
Copy link
Contributor

This PR contains:

  • image reader that reads a list of image files and ensures they have the same size (which replaces old image_utils.py)
  • simplifies the code of the CNN encoder, such that there only the convolutions (it requires more work anyway)
  • adds another test case with text recognition with CNN -> RNN deocder architure
  • the test case includes sample data with (40 images)

TBS
guises
poppadoms
SAVAGENESS
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SAVAGENESS!!!! :-D

@@ -172,50 +155,8 @@ def __init__(self, data_id, convolutions, rnn_layers,
last_layer, keep_prob=self.dropout_placeholder)

last_layer = last_layer * last_padding_masks
last_layer_size = last_n_channels * image_height * image_width
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Co dělal tenhle kód? (Ne že by to bylo důležitý..)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Byla za tim cnn ještě optional bidi rnn vrstva a tohle číslo se používalo k reshapování. Tu RNN jsem zahodil, protože mi to přišlo poněkud monster.



# pylint: disable=invalid-name
EditDistance = _EditDistance()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To asi neni moc chytrý, když budeš chtít změnit jméno (jeden z mála důvodů, proč to je class :-) ) tak budeš muset do konfiguráku dávat podtržítkovou třídu..
Můžeš třídu přejmenovat na EditDistanceEvaluator nebo tak nějak a helper tam nechat.


self.encoded = encoder_state

self.encoded = tf.reduce_mean(last_layer, [2, 3])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tady by pomoh koment řikající jaký má last_layer dimenze

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ten komentář pomoh hlavně mě, protože to průměruju přes blbý dimenze samozřejmě :-D

@@ -17,3 +17,14 @@ def untruecase(
yield [sentence[0].capitalize()] + sentence[1:]
else:
yield []


def pipeline(processors: List[Callable]) -> Callable:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Takže v datasetu dáš jako preprocess <něco> a v [něco] dáš class=pipeline a jako processors tomu dáš seznam preprocess objektů?

Nebylo by lepší to nějak vyřešit přímo v tom datasetu? Jakože teď je to nějaká n-tice nebo co, tak přidat možnost do tý entice dát místo jednoho procesoru seznam?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To je dobrej nápad, ale možná bych tam nechal oba způsoby. V tom INI file nejde zalamovat řády a napsat pak do jednoho seznamu víc vícenásobných preprocesorů, tak by se to mohlo dost znepřehlednit.

from typing import Callable, List, Optional
import os
import numpy as np
from scipy import ndimage
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

máme scipy v requirements? máme to tam přidat? máme to jen zmínit v dokumentaci? (podobně jako bychom to měli udělat u nltk bleu evaluátora, až/jestli se přimerǧne z dekoderiho refaktoru)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Scipy je docela velká věc a na překlad není potřeba. Tak bych napsal do dokumentace, že kdo chce dělat image processing, musí mít scipy.

NLTK bych tam asi nedával, protože NLTK je GPL a pak by NeMo taky musela být GPL.

if len(image.shape) == 2:
channels = 1
image = np.expand_dims(image, 2)
else:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

čistě z bezpečnostních důvodů: elif len(image.shape) == 3 nebo >2 a do else pak narvat nějakou bad file format error exception

pad_w: Optional[int]=None,
pad_h: Optional[int]=None,
rescale: bool=False,
mode: str='RGB') -> Callable:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docstring, nebo ho nechat za dalšího panáka Dušanovi :-)



def _rescale(image, pad_w, pad_h):
orig_w, orig_h = image.shape[:2]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tyhle tři funkce jsou ukázkoví kandidáti na unit testy.. :-) jestli víceméně souhlasíš s #197, tak založ test issue

# we average out by the image size -> shape is number
# channels from the last convolution
self.encoded = tf.reduce_mean(last_layer, [1, 2])
# TODO assert shape
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:D
Nechceš tam ten assert přidat?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Začal jsem si v jiný větvi dělat chytrý assertkovátko shapů, protože jsem koukal, že se to dělá často a furt se opakuje ten samej kód. A často jsou taky shapy v komentářích. To všechno přepíšu na asserty (když to jde pustit, je to nejlepší možná dokumentace).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

@jlibovicky jlibovicky force-pushed the image_processors branch 2 times, most recently from d74d84e to 282f135 Compare December 16, 2016 17:14
@jindrahelcl
Copy link
Member

Travis má blbej setup:
raise ImportError("Could not import the Python Imaging Library (PIL)

@jindrahelcl jindrahelcl reopened this Dec 20, 2016
Copy link
Member

@tomasmcz tomasmcz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion scipy should either be a dependency, or this code should be clearly separated from the rest of the package.

@jindrahelcl
Copy link
Member

As I mentioned in #214 - can we move image processing to the TF graph?

@jlibovicky
Copy link
Contributor Author

Yes, we can.

@jlibovicky jlibovicky mentioned this pull request Jan 12, 2017
@jlibovicky jlibovicky self-assigned this Jan 12, 2017
@jlibovicky
Copy link
Contributor Author

I investigated the Tensor flow image ops and don't think we want to use them. It is not capable of loading a batch of images, there must be one operation per image (i.e., we would need to know the batch size in advance). Having a reader like this allows easier cropping and reshaping the images to the same size if we don't know the image shape in advance.

I suggest merging the PR as is. @tomasmcz, @jindrahelcl, what do you say?

@jlibovicky jlibovicky merged commit 0b1d4c5 into master Jan 16, 2017
@jlibovicky jlibovicky deleted the image_processors branch January 16, 2017 11:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants