Add currency tests #53
Conversation
Hey @ivsanro1! It seems that in most cases the support is 1. What do you think about considering those as test cases instead? These examples seem very similar to the existing test cases we have (https://github.com/scrapinghub/price-parser/blob/master/tests/test_price_parsing.py), but with more variety in currencies.
My main worry about these things is the workflow. If we go with an evaluation approach, we should figure out how contributors should be using it. Probably we should run it on CI. But it's less clear to me what to do with the results.
Hey @kmike, thanks for the suggestion! I also thought about this, but how would we handle the failing cases? Should we add them in a different test and mark them as xfail? I think it would be nice to easily add cases that we know we fail until they're fixed.
Good point. I also think putting it in CI would be the best option if we decide to go with the evaluation and not move it to tests.
This sounds good! That's exactly how the current testing suite works: each example is a separate test (via pytest.mark.parametrize), and the ones which are failing are marked as xfail. The existing testing suite was actually created in a similar way: a few hundred example input strings from a dataset were annotated and converted to tests. Then there was a round of improving the parsing rules, and after that another set of examples was added to control the overfitting (mostly to get an impression; that's not very scientific).
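For reference, a minimal sketch of this parametrize-plus-xfail pattern could look like the following. The names and example strings here are illustrative only; the real tests/test_price_parsing.py is organized differently.

```python
# Simplified sketch of the "parametrize + xfail" pattern described above.
import pytest
from price_parser import Price

CURRENCY_EXAMPLES = [
    ("22,90 €", "€"),
    ("$ 1.99", "$"),
    # A hypothetical case the parser is assumed not to handle yet: it stays
    # in the suite as an expected failure and documents the desired behaviour.
    pytest.param("two dollars and fifty cents", "$", marks=pytest.mark.xfail),
]

@pytest.mark.parametrize("text,currency", CURRENCY_EXAMPLES)
def test_currency_extraction(text, currency):
    # Each example becomes its own test; fixing an xfail case later only
    # requires removing the xfail mark.
    assert Price.fromstring(text).currency == currency
```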
Unlike with standard ML problems, I'm not too worried about overfitting here, i.e. it seems fine to fix xfail tests one by one, as in most cases it'd just mean adding more aliases for currencies. It could still be good to do a second round afterwards, e.g. to confirm that the accuracy improves after the first round of fixes.
By the way, in the current testing suite there are a lot of overlapping or similar examples; that's because many websites show prices in a similar way. So it's more like a dataset than a hand-crafted testing suite, with a few exceptions. This allows us to make trade-off decisions: it could be the case that a rule fixes some examples and breaks others; we can use the test counts to help us decide which way to go in such situations.
I see! Thanks for the explanation 👍 Then, instead of adding this evaluation script, I'll add all the examples as tests, and mark the ones that fail as xfail. Please feel free to close the PR if you consider it appropriate.
You can also implement the new approach in this PR if you want, instead of creating a separate one. |
Nice tests, thanks @ivsanro1! 👍
Those were the last tests I wanted to add for this PR. I was also thinking of using this PR to implement these cases and then move the tests from xfail to pass, but it might take some time. Should we merge it and add the cases to price-parser in another PR, @kmike @Gallaecio?
Sounds good to me.
Before merging, could you please ensure that all tests pass, @ivsanro1? I think some of them currently fail.
Yes, of course! I made the changes in 7529b2e. These mypy errors were showing:

I didn't see any particular problems with these lines. I saw we could update […]. Also, I added the […]. Now everything is passing:
It may make sense to look into addressing #52 before merging here. |
Thanks @ivsanro1! |
Hello,
Since we might add some features for fuzzy search, especially when the currency is not found (see #28 (comment)), we think it'd be nice to have some evaluation script and dataset that we can use to compare the quality of the extractions (thanks for the idea, @lopuhin!).

The purpose of the evaluation is to compare the currency extraction quality of `Price.fromstring`. If the functionality of this function is extended via a PR (e.g. we perform a fuzzy search when the currency is not found), we should be able to see an improvement in the metrics.

I decided not to add it as a test because the evaluation metrics can be interpreted in several ways.

For now, the metrics are simply global accuracy and accuracy per symbol (ground truth) for quality evaluation, and total extraction time and extraction time per sample for performance evaluation. These metrics can be easily extended, as the script is quite simple (`evaluation/evaluate_currency_extraction.py`). If in the future we add other evaluation metrics or datasets, this script can be improved and generalized (e.g. by adding `argparse`). For now I decided to keep it simple.

The dataset (`dataset_eval.json`) can also be easily extended to include more cases. Here's a sample of `dataset_eval.json`:

The evaluation script will extract the currency from `string` using `Price.fromstring` and compare it with `currency`. A correct extraction happens when both are equal.

With the current dataset and price_parser versions, this is the output of the evaluation:
I'm open to suggestions
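For illustration, here is a minimal sketch of the kind of evaluation loop described above. It assumes the dataset is a JSON list of objects with `string` and `currency` keys, as described in this PR; the structure and output format are assumptions, and the actual `evaluation/evaluate_currency_extraction.py` may be organized differently.

```python
# Hypothetical sketch: load a JSON dataset of {"string": ..., "currency": ...}
# entries, run Price.fromstring on each string, and report global accuracy,
# per-symbol accuracy, and timing.
import json
import time
from collections import defaultdict

from price_parser import Price


def evaluate(dataset_path: str = "dataset_eval.json") -> None:
    with open(dataset_path, encoding="utf-8") as f:
        # e.g. [{"string": "22,90 €", "currency": "€"}, ...] (illustrative)
        dataset = json.load(f)

    correct = 0
    per_symbol = defaultdict(lambda: [0, 0])  # ground-truth symbol -> [correct, total]

    start = time.perf_counter()
    for sample in dataset:
        predicted = Price.fromstring(sample["string"]).currency
        truth = sample["currency"]
        per_symbol[truth][1] += 1
        if predicted == truth:  # a correct extraction means exact equality
            correct += 1
            per_symbol[truth][0] += 1
    elapsed = time.perf_counter() - start

    print(f"Global accuracy: {correct / len(dataset):.4f}")
    for symbol, (ok, total) in sorted(per_symbol.items()):
        print(f"  {symbol}: {ok}/{total}")
    print(f"Total extraction time: {elapsed:.4f}s "
          f"({elapsed / len(dataset):.6f}s per sample)")


if __name__ == "__main__":
    evaluate()
```

When comparing a parsing change, the global accuracy and timing numbers are the most useful before/after signals; the per-symbol breakdown mainly helps spot which currencies a change fixes or breaks.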