As a second step, we can work on improving that example, maybe by using a different dataset. At the moment:
- it downloads a very large dataset
- the join with weather data does not improve predictions significantly IIRC
- the predictions are not super far from chance level which makes the example less compelling IMO
Suggest a potential alternative/fix
We could try to either:
- Find another dataset on which fuzzy joining would boost performance
- Change the task to joining, without focusing on learning. A common use case for people working with databases is to create "proxy keys" from multiple columns. This technique helps join tables coming from different storage systems or databases, where there is no foreign key to join on. Even without a downstream learning task, we can quantify the quality of the fuzzy join with e.g. precision and recall, counting incorrect joins as false positives and missed joins as false negatives.
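To make the second option more concrete, here is a rough sketch of how a join could be scored without any learning task. It uses only the standard library (`difflib`) instead of skrub's own joiner, so it only illustrates the evaluation idea. The names `proxy_key`, `fuzzy_join`, `join_precision_recall`, the toy tables and the 0.8 threshold are all hypothetical.

```python
from difflib import SequenceMatcher


def proxy_key(row, columns):
    """Build a "proxy key" by concatenating several normalized columns."""
    return " ".join(str(row[c]).strip().lower() for c in columns)


def fuzzy_join(left, right, columns, threshold=0.8):
    """Map each left row index to its most similar right row index.

    A left row is left unmatched when its best similarity is below
    `threshold` (an arbitrary value chosen for this sketch).
    """
    right_keys = [proxy_key(r, columns) for r in right]
    matches = {}
    for i, l_row in enumerate(left):
        l_key = proxy_key(l_row, columns)
        best_j, best_score = None, 0.0
        for j, r_key in enumerate(right_keys):
            score = SequenceMatcher(None, l_key, r_key).ratio()
            if score > best_score:
                best_j, best_score = j, score
        if best_j is not None and best_score >= threshold:
            matches[i] = best_j
    return matches


def join_precision_recall(predicted, truth):
    """Score a join given {left_index: right_index} mappings.

    False positives are predicted pairs absent from the ground truth
    (incorrect joins); false negatives are ground-truth pairs that were
    not predicted (missed joins). A wrong match counts as both.
    """
    predicted_pairs = set(predicted.items())
    true_pairs = set(truth.items())
    true_pos = len(predicted_pairs & true_pairs)
    false_pos = len(predicted_pairs - true_pairs)
    false_neg = len(true_pairs - predicted_pairs)
    # Convention for this sketch: report 0.0 when a denominator is zero.
    precision = true_pos / (true_pos + false_pos) if predicted_pairs else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pairs else 0.0
    return precision, recall


# Toy tables with no shared foreign key, only similar-looking columns.
left = [
    {"name": "Paris", "country": "France"},
    {"name": "Tokyo", "country": "Japan"},
]
right = [
    {"name": "paris ", "country": "france"},
    {"name": "Berlin", "country": "Germany"},
]
truth = {0: 0}  # hand-labelled correct joins

predicted = fuzzy_join(left, right, columns=["name", "country"])
precision, recall = join_precision_recall(predicted, truth)
```

In a real example the ground-truth mapping would have to come with the dataset or be labelled by hand, which is probably the main constraint when picking a dataset for this.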
Describe the issue linked to the documentation
As @jeromedockes mentioned in #1145, example 07 has several flaws: