
Add currency tests #53

Merged · 8 commits · Nov 15, 2022

Conversation

ivsanro1 (Contributor)

Hello,

Since we might add some features for fuzzy search, especially when the currency is not found (see #28 (comment)), we think it'd be nice to have some evaluation script and dataset that we can use to compare the quality of the extractions (thanks for the idea @lopuhin !).

The purpose of the evaluation is to compare the currency extraction quality of Price.fromstring.
If the functionality of this function is extended via PR (e.g. we perform a fuzzy search if the currency is not found), we should be able to see an improvement in the metrics.

I decided not to add it as a test because the evaluation metrics can be interpreted in several ways.

For now, the metrics are simply global accuracy and accuracy per symbol (ground truth) for quality evaluation, and total extraction time and extraction time per sample for performance evaluation. These metrics can be easily extended, as the script is quite simple (evaluation/evaluate_currency_extraction.py).

If in the future we add other evaluation metrics or datasets, this script can be improved and generalized (e.g. adding argparse). For now I decided to keep it simple.

The dataset (dataset_eval.json) can also be easily extended to include more cases.

Here's a sample entry from dataset_eval.json:

{
    "string": "14.00 SGD / Each",
    "currency": "SGD"
}

The evaluation script extracts the currency from string using Price.fromstring and compares it with currency. An extraction counts as correct when the two are equal.
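
A minimal sketch of that comparison loop (hypothetical code - the actual evaluation/evaluate_currency_extraction.py may differ). The extractor is passed in as a callable so the metric logic stands on its own; with price-parser installed it would be `lambda s: Price.fromstring(s).currency`:

```python
from collections import defaultdict

def evaluate(samples, extract):
    """Return (global_accuracy, per_symbol_accuracy).

    samples: iterable of {"string": ..., "currency": ...} dicts,
    as in dataset_eval.json; extract: callable string -> currency.
    """
    per_symbol = defaultdict(lambda: [0, 0])  # target -> [correct, support]
    for sample in samples:
        target = sample["currency"]
        stats = per_symbol[target]
        stats[1] += 1
        if extract(sample["string"]) == target:
            stats[0] += 1
    correct = sum(c for c, _ in per_symbol.values())
    support = sum(s for _, s in per_symbol.values())
    return correct / support, {k: c / s for k, (c, s) in per_symbol.items()}
```

The per-symbol breakdown is what produces the "acc / support" table below; the global accuracy is just total correct over total support.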

With the current dataset and price_parser versions, this is the output of the evaluation:

./evaluation/evaluate_currency_extraction.py 
-----------------------------------
symbol (target)      acc      support   
-----------------------------------
           None      ‎1.0     13        
              $      ‎1.0     12        
             $U      ‎1.0     1         
             $b      ‎0.0     1         
           .د.ب      ‎0.0     1         
            AED      ‎1.0     1         
           AU $      ‎0.0     1         
            AU$      ‎1.0     1         
            AUD      ‎1.0     1         
             Ar      ‎0.0     1         
            B/.      ‎1.0     1         
             BD      ‎1.0     1         
            BZ$      ‎1.0     1         
             Br      ‎1.0     2         
             Bs      ‎1.0     1         
             C$      ‎1.0     2         
            CA$      ‎1.0     1         
           CAD$      ‎0.0     1         
           CAN$      ‎0.0     1         
            CFA      ‎1.0     1         
           CUC$      ‎1.0     1         
              D      ‎0.0     1         
            DKK      ‎1.0     1         
             DT      ‎1.0     1         
             Db      ‎1.0     1         
              E      ‎0.0     1         
            FBu      ‎1.0     1         
           FCFA      ‎1.0     1         
             FG      ‎1.0     1         
            FRw      ‎0.0     1         
            Fdj      ‎1.0     1         
             Ft      ‎1.0     1         
              G      ‎0.0     1         
            GEL      ‎1.0     1         
             Gs      ‎0.0     1         
             J$      ‎1.0     1         
              K      ‎0.0     1         
             KD      ‎1.0     1         
             KM      ‎1.0     1         
            KMF      ‎1.0     1         
            KSh      ‎0.0     1         
             Kr      ‎0.0     1         
             Kč      ‎1.0     1         
              L      ‎0.0     2         
             LD      ‎1.0     1         
            LEI      ‎1.0     1         
            Lek      ‎1.0     1         
             MK      ‎1.0     1         
           MOP$      ‎1.0     1         
             MT      ‎0.0     1         
            NT$      ‎1.0     1         
            Nfk      ‎1.0     1         
            Nu.      ‎1.0     1         
              P      ‎0.0     1         
              Q      ‎0.0     1         
              R      ‎0.0     1         
             R$      ‎1.0     2         
            RD$      ‎1.0     1         
             RF      ‎0.0     1         
             RM      ‎1.0     1         
             Rf      ‎1.0     1         
             Rp      ‎1.0     1         
             R₣      ‎0.0     1         
            S/.      ‎1.0     1         
            SGD      ‎1.0     1         
            SPL      ‎0.0     1         
             T$      ‎1.0     1         
             TL      ‎1.0     1         
            TSh      ‎1.0     1         
            TT$      ‎1.0     1         
             UM      ‎1.0     1         
            US$      ‎1.0     1         
            USD      ‎1.0     1         
            USh      ‎1.0     1         
             VT      ‎1.0     1         
             Z$      ‎1.0     1         
             ZK      ‎1.0     1         
          franc      ‎0.0     1         
         francs      ‎0.0     1         
             kn      ‎1.0     1         
             kr      ‎1.0     1         
            lei      ‎1.0     1         
           null      ‎0.0     1         
              q      ‎0.0     1         
            yen      ‎0.0     1         
             zl      ‎0.0     1         
             zł      ‎1.0     1         
              ¢      ‎0.0     1         
              £      ‎1.0     2         
              ƒ      ‎1.0     2         
             ЅM      ‎0.0     1         
            Дин      ‎0.0     1         
            ден      ‎0.0     1         
             лв      ‎1.0     1         
           руб.      ‎1.0     1         
              ֏      ‎1.0     1         
              ؋      ‎1.0     1         
           ج.س.      ‎0.0     1         
             د.      ‎0.0     1         
            د.إ      ‎0.0     2         
            د.ت      ‎0.0     1         
            د.ك      ‎0.0     1         
             دج      ‎0.0     1         
            ع.د      ‎0.0     1         
            ل.د      ‎0.0     1         
          نافكا      ‎0.0     1         
              ৳      ‎1.0     1         
              ლ      ‎0.0     1         
             ብር      ‎0.0     1         
            ናቕፋ      ‎0.0     1         
              ៛      ‎1.0     1         
              ₡      ‎1.0     1         
              ₣      ‎1.0     1         
              ₦      ‎1.0     1         
              ₨      ‎1.0     1         
              ₪      ‎1.0     1         
              ₫      ‎1.0     1         
              €      ‎1.0     5         
              ₭      ‎1.0     1         
              ₮      ‎1.0     1         
              ₱      ‎1.0     1         
              ₴      ‎1.0     1         
              ₹      ‎1.0     1         
              ₺      ‎1.0     1         
              ₼      ‎1.0     1         
              ₽      ‎1.0     1         
              ₿      ‎1.0     1         
              元      ‎0.0     1         
              ﷼      ‎1.0     1         

Global accuracy: 0.7117


Total processing time: 3.03 ms
Processing time per sample: 0.018604 ms

I'm open to suggestions.

@kmike (Member) commented Oct 18, 2022

hey @ivsanro1! It seems in most cases the support is 1. What do you think about considering those as test cases instead? It seems these examples are very similar to the existing test cases we have (https://github.com/scrapinghub/price-parser/blob/master/tests/test_price_parsing.py), but with more variety for currency.

@kmike (Member) commented Oct 18, 2022

My main worry about these things is the workflow. If we go with an evaluation approach, we should figure out how contributors should be using it. Probably we should run it on CI. But it's less clear to me what to do with the results.

@ivsanro1 (Contributor, Author)

What do you think about considering those as test cases instead? It seems these examples are very similar to the existing test cases we have

Hey @kmike, thanks for the suggestion! I also thought about this, but how would we handle the failing cases? Should we add them in a different test and mark it as xfail? I think it would be nice to be able to easily add cases that we know fail until they're fixed.

If we go with an evaluation approach, we should figure out how contributors should be using it

Good point. I also think putting it in CI would be the best option if we decide to go with the evaluation instead of moving it to tests.
My original idea was to use it merely as info, or just to check that the global accuracy didn't go down with each contribution - but that might be overkill.
However, I like the tests idea more, and maybe I'll just add the failing cases in a separate test marked as xfail until they get fixed. What do you think?

@kmike (Member) commented Oct 18, 2022

However, I like more the tests idea, and maybe I'll just add the failing cases in a different test marked as xfail until they get fixed. What do you think?

This sounds good! That's exactly how the current testing suite is working. In the current testing suite each example is a separate test (via pytest.mark.parametrize); the ones which are failing are marked as xfail.

The existing testing suite was actually created in a similar way - a few hundred example input strings from a dataset were annotated and converted to tests. Then there was a round of improving the parsing rules, and after that another set of examples was added to control for overfitting (mostly to get an impression; it's not very scientific).

@kmike (Member) commented Oct 18, 2022

Unlike standard ML problems, I'm not too worried about overfitting here, i.e. it seems fine to fix xfail tests one by one, as in most cases it'd just mean adding more aliases for currencies. It could still be good to do a second round afterwards, e.g. to confirm that the accuracy improves after the first round of fixes.

@kmike (Member) commented Oct 18, 2022

By the way, the current testing suite contains a lot of overlapping or similar examples; that's because many websites show prices in a similar way. So it's more like a dataset than a hand-crafted testing suite, with a few exceptions. This allows us to make trade-off decisions: it could be the case that a rule fixes some examples and breaks others; we can use the test count to help us decide which way to go in such situations.

@ivsanro1 (Contributor, Author)

That's exactly how the current testing suite is working. In the current testing suite each example is a separate test (via pytest.mark.parametrize); the ones which are failing are marked as xfail.

I see! Thanks for the explanation 👍 Then, instead of adding this evaluation script, I'll add all the examples as tests, and add the failing ones to PRICE_PARSING_EXAMPLES_XFAIL until they're fixed. I prefer this method since it's easier to document each case in the tests.

Please feel free to close the PR if you consider it appropriate.

@Gallaecio (Member)

You can also implement the new approach in this PR if you want, instead of creating a separate one.

Review comment on tests/test_price_parsing.py (outdated, resolved)
ivsanro1 changed the title from "Add evaluation script and data for currency extraction" to "[WIP] Add currencies and tests" on Nov 2, 2022
ivsanro1 changed the title from "[WIP] Add currencies and tests" to "Add currencies and tests" on Nov 2, 2022
Review comment on tests/test_price_parsing.py (outdated, resolved)
@kmike (Member) left a comment

Nice tests, thanks @ivsanro1! 👍

@ivsanro1 (Contributor, Author) commented Nov 3, 2022

Those were the last tests I wanted to add for this PR. I was also thinking of using this PR to implement these cases and then move the tests from xfail to pass, but it might take some time. Should we merge it and add the cases to price-parser in another PR, @kmike @Gallaecio?

@Gallaecio (Member)

Should we merge it and add the cases to price-parser in another PR?

Sounds good to me.

@kmike (Member) commented Nov 4, 2022

Before merging, could you please ensure that all tests pass @ivsanro1? I think some of them fail currently.

@ivsanro1 (Contributor, Author) commented Nov 4, 2022

Before merging, could you please ensure that all tests pass @ivsanro1?

Yes, of course! I made the changes in 7529b2e

mypy was showing these errors:

price_parser/_currencies.py:30: error: syntax error in type comment
price_parser/_currencies.py:1731: error: syntax error in type comment
price_parser/_currencies.py:1732: error: syntax error in type comment
price_parser/_currencies.py:1733: error: syntax error in type comment

I didn't see any particular problem with these lines. I saw we could update mypy from version 0.761 to 0.982, and after updating it, it showed a couple of type errors in the tests I added (now fixed).

Also, I added the -v flag to mypy in tox.ini so it would report the errors (it wasn't showing them before).
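
For reference, the relevant tox env might look roughly like this after the change (a hypothetical sketch - the actual section name, pinned version, and mypy targets come from the repository's tox.ini):

```ini
[testenv:mypy]
deps =
    mypy==0.982
commands =
    mypy -v price_parser tests
```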

Now everything is passing:

Success: no issues found in 5 source files
_________________________________________________________________________ summary _________________________________________________________________________
  py36: commands succeeded
  py37: commands succeeded
  py38: commands succeeded
  mypy: commands succeeded
  congratulations :)

ivsanro1 changed the title from "Add currencies and tests" to "Add currency tests" on Nov 4, 2022
@Gallaecio (Member)

It may make sense to look into addressing #52 before merging here.

@kmike (Member) commented Nov 15, 2022

Thanks @ivsanro1!

kmike merged commit 61eaf52 into master on Nov 15, 2022