This repository has been archived by the owner on Mar 31, 2020. It is now read-only.

Problem with fuzzy comparison where the Abstract is "Not Available" #54

Open
hollandjg opened this issue Feb 3, 2015 · 1 comment

@hollandjg

When importing these two papers:
http://adsabs.harvard.edu//abs/1916AnP...354..769E
http://adsabs.harvard.edu/abs/1917SPAW.......142E
the second replaces the first.

The abstract in each case is "Not Available" but the titles are very different.

@RuiPereira
Contributor

Actually, the titles are not that different. The standard cutoff of 0.7 needs to be raised to 0.76 to separate the two:

In [9]: difflib.get_close_matches(u'Die Grundlage der allgemeinen Relativitätstheorie', [u'Kosmologische Betrachtungen zur allgemeinen Relativitätstheorie'], n=1, cutoff=.7)
Out[9]: [u'Kosmologische Betrachtungen zur allgemeinen Relativit\xe4tstheorie']

In [10]: difflib.get_close_matches(u'Die Grundlage der allgemeinen Relativitätstheorie', [u'Kosmologische Betrachtungen zur allgemeinen Relativitätstheorie'], n=1, cutoff=.76)
Out[10]: []
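
To see why the two cutoffs behave differently, it can help to look at the raw similarity score instead of the filtered match list. The snippet below is a minimal sketch in Python 3 (the session above is Python 2); the variable names are only illustrative. It assumes the score that matters is `SequenceMatcher.ratio()`, which is what `difflib.get_close_matches` compares against `cutoff`.

```python
from difflib import SequenceMatcher

first = "Die Grundlage der allgemeinen Relativitätstheorie"
second = "Kosmologische Betrachtungen zur allgemeinen Relativitätstheorie"

# get_close_matches(word, possibilities) scores each candidate with the
# candidate as the first sequence and the word as the second, so the same
# argument order is used here.
score = SequenceMatcher(None, second, first).ratio()

# A candidate is kept only when its score is >= cutoff.
print(f"similarity: {score:.3f}")
```

The long shared tail "allgemeinen Relativitätstheorie" dominates the score, which is how two titles with different openings still pass the 0.7 cutoff.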

As mentioned in the code, the cutoff parameter could be moved to the preferences to prevent this type of corner case, but for a quick fix you could just change line 311 of your copy of adsbibdesk.py.
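
A rough sketch of what a configurable cutoff could look like is given here. `find_similar_title` and `DEFAULT_CUTOFF` are hypothetical names, not taken from adsbibdesk.py, and how the value would actually be read from the preferences is left out; only `difflib.get_close_matches` is a real library call.

```python
from difflib import get_close_matches

# Hypothetical default, mirroring the standard cutoff of 0.7.
DEFAULT_CUTOFF = 0.7


def find_similar_title(title, known_titles, cutoff=DEFAULT_CUTOFF):
    """Return the closest known title scoring at or above cutoff, or None."""
    matches = get_close_matches(title, known_titles, n=1, cutoff=cutoff)
    return matches[0] if matches else None


known = ["Kosmologische Betrachtungen zur allgemeinen Relativitätstheorie"]
new_title = "Die Grundlage der allgemeinen Relativitätstheorie"

# With the default cutoff the new title is treated as a match for the old one.
print(find_similar_title(new_title, known))

# With a stricter cutoff (for example, one supplied by the user) it is not.
print(find_similar_title(new_title, known, cutoff=0.76))
```

Passing the cutoff as an argument keeps the default behaviour unchanged while letting a stricter value be supplied for cases like this one.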
