New document class #20

corentin-larose · 2014-01-04T23:14:42Z

Work in progress, just to know if you like this lead...

camspiers · 2014-01-05T01:14:37Z

I like the idea, but I am still thinking about the implications.

corentin-larose · 2014-01-05T08:43:16Z

I just realized that I put my explanations in the commit message, and I don't know if you saw them (but yes, there is a lot to think about... it's just a lead):

For instance, this document can be used directly in the classify() method, replacing the commented-out code and thus the related properties/accessors:

    public function classify(Document $document)
    {
        $results = array();

        /*
        if ($this->documentNormalizer) {
            $document = $this->documentNormalizer->normalize($document);
        }

        $tokens = $this->tokenizer->tokenize($document);

        if ($this->tokenNormalizer) {
            $tokens = $this->tokenNormalizer->normalize($tokens);
        }

        $tokens = array_count_values($tokens);
        */

        $tokens = $document;
        [...]

My pros

- Strong contract for Documents through an interface
- Document is in a frequency state ASAP
- Document API is very wide open (cf. unit tests)
- Document can still be manipulated as an array/iterable (a shame that the Symfony config component (DataStore) doesn't like ArrayObject)
- Since the document is an object, it is more RAM-efficient (no multiple copies as with an array)
- Agnostic approach using SPL
- One can even use closures/built-in functions for normalizers/tokenizers (faster?)
- Hydrators/extractors made simpler
- Some more document-level calculations could be done in the instance
- TokenCountByDocument no longer necessary
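
To make the points above concrete, here is a minimal sketch of a document that is built on SPL's `ArrayObject`, accepts plain callables for tokenizing and normalizing, and is stored in its frequency state right away. The names `DocumentInterface`, `Document` and `getFrequency()` and the constructor signature are assumptions for illustration only; the class in this pull request may be shaped differently.

```php
<?php

// Hypothetical contract: the names here are illustrative, not the PR's API.
interface DocumentInterface extends Countable
{
    /** @return int occurrences of $token in the document (0 if absent) */
    public function getFrequency($token);
}

class Document extends ArrayObject implements DocumentInterface
{
    /**
     * @param string        $text       raw document text
     * @param callable      $tokenizer  receives the text, returns an array of tokens
     * @param callable|null $normalizer applied to every token (optional)
     */
    public function __construct($text, $tokenizer, $normalizer = null)
    {
        $tokens = call_user_func($tokenizer, $text);

        if ($normalizer !== null) {
            $tokens = array_map($normalizer, $tokens);
        }

        // Keep the document in its "frequency state": token => count.
        parent::__construct(array_count_values($tokens));
    }

    public function getFrequency($token)
    {
        return isset($this[$token]) ? $this[$token] : 0;
    }
}

// A closure as tokenizer and a built-in function as normalizer.
$document = new Document(
    'The quick fox and the lazy dog',
    function ($text) {
        return preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);
    },
    'strtolower'
);

// The document still behaves like an array of token => frequency.
foreach ($document as $token => $frequency) {
    echo $token, ': ', $frequency, PHP_EOL;
}
```

With something along these lines, `classify()` could iterate over the document directly, which is what `$tokens = $document;` in the snippet above relies on.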

My cons

- Not sure whether it should be in a tokens state rather than a frequency state (needed for calculation/count? we could store this information as well)
- Loose contracts for normalizers/tokenizers, since it uses callables instead of classes with interfaces (this could still be enforced, but we would lose the closures/built-in functions advantage)
- Slower than arrays? (not sure, needs a benchmark, since SPL is incredibly fast and it removes a lot of surrounding logic/iterations)
- Static approach for accessors, which is sometimes hated by developers (unit tests...)
- Your cons?
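
On the loose-contract point, one possible middle ground is an adapter that wraps any callable behind an interface, so the contract is enforced while closures and built-in functions remain usable. This is only a sketch: `TokenizerInterface` and `CallableTokenizer` are hypothetical names, not classes from this pull request or from the library.

```php
<?php

// Hypothetical contract for tokenizers (illustrative name and signature).
interface TokenizerInterface
{
    /** @return string[] */
    public function tokenize($text);
}

// Adapter: lets any callable satisfy TokenizerInterface.
class CallableTokenizer implements TokenizerInterface
{
    private $callable;

    public function __construct($callable)
    {
        if (!is_callable($callable)) {
            throw new InvalidArgumentException('Tokenizer must be callable');
        }
        $this->callable = $callable;
    }

    public function tokenize($text)
    {
        $tokens = call_user_func($this->callable, $text);

        // Enforce at runtime the part of the contract a callable cannot declare.
        if (!is_array($tokens)) {
            throw new UnexpectedValueException('Tokenizer must return an array');
        }

        return $tokens;
    }
}

// A closure is still usable, but consumers can type-hint TokenizerInterface.
$tokenizer = new CallableTokenizer(function ($text) {
    return preg_split('/\W+/', $text, -1, PREG_SPLIT_NO_EMPTY);
});

$tokens = $tokenizer->tokenize('Hello, world! Hello.');
$frequencies = array_count_values($tokens);
```

The trade-off is one extra method call per document and a return-type check at runtime rather than a guarantee from the type system.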

