Entity Reconciliation #113

Open
byndcivilization opened this issue Jun 8, 2016 · 3 comments

@byndcivilization
Contributor

byndcivilization commented Jun 8, 2016

We need a deduplication process to identify possible matches in incoming data.

Solution #1: a simple fuzzy matching algorithm that attempts to match on one or a few fields (name plus some additional info).
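To make Solution #1 concrete, here is a minimal sketch of fuzzy matching on a name plus one extra field, using only Python's standard-library `difflib`. The record layout (dicts with `name` and `country` keys), the function names, and the 0.85 threshold are all assumptions for illustration, not something taken from the project.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences don't matter."""
    return " ".join(text.lower().split())


def similarity(a, b):
    """Return a ratio in [0, 1]; 1.0 means the normalized strings are identical."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def find_possible_matches(incoming, existing, threshold=0.85):
    """Return (record, score) pairs from `existing` that resemble `incoming`.

    Records are dicts with hypothetical 'name' and 'country' keys.
    """
    matches = []
    for record in existing:
        # The additional field must agree exactly when both sides provide it.
        if incoming.get("country") and record.get("country"):
            if incoming["country"] != record["country"]:
                continue
        score = similarity(incoming["name"], record["name"])
        if score >= threshold:
            matches.append((record, score))
    # Best candidates first, so a reviewer sees the likeliest duplicates on top.
    return sorted(matches, key=lambda pair: pair[1], reverse=True)
```

The threshold would probably need tuning against real incoming data; too low and reviewers drown in false positives, too high and near-duplicates slip through.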

Solution #2: an ML-assisted entity reconciliation process. This would use ML methods to derive a matching score that identifies possible duplicates. There would then need to be a UI either to merge the matched entities or at least to display possible links on an entity page. For merging, the UI would show matches and partial matches (i.e. an exact match on the first word) for both project names and company names, and allow the user to confirm all or some of them.
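The match / partial-match distinction the UI would show can be sketched without any ML at all. The snippet below is only an illustration of that bucketing (exact match versus exact match on the first word), not the scoring model itself; the function names and labels are hypothetical.

```python
def clean(name):
    """Lowercase and collapse whitespace before comparing names."""
    return " ".join(name.lower().split())


def classify(name_a, name_b):
    """Label a pair as 'match', 'partial' (same first word), or None."""
    a, b = clean(name_a), clean(name_b)
    if not a or not b:
        return None
    if a == b:
        return "match"
    if a.split()[0] == b.split()[0]:
        return "partial"
    return None


def candidates_for_review(names):
    """Pair up names and keep the pairs a user should confirm or reject."""
    review = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            label = classify(a, b)
            if label:
                review.append((a, b, label))
    return review
```

The same function could be run over project names and company names separately. Comparing every pair is quadratic in the number of names, so a larger dataset would likely need some blocking step first (for example, grouping by first word).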

Moved from #6

@mattfullerton
Contributor

Leaving this open for now, but once we are done with the initial deduplication and reconciliation processes and have some experience with them, we can decide whether we need to make improvements (ML, etc.). Assigning @davidmihalyi to be the arbiter of whether the process is working well.

@mattfullerton
Contributor

See #147 for a nice use case showing how entities need to be (or should be) merged.

@davidmihalyi

We have a good first workflow for this. Further improvement will require more thinking in a next phase.

3 participants