I performed a simple check on the inventor names clumped under each inventor ID and coded the result as nstr:
0 = a middle-name conflict is detected
1 = a name is matched against a name without a middle name
2 = both names contain a middle name and the middle initials match
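A minimal sketch of how a check like this could work. It assumes names are whitespace-separated FIRST [MIDDLE ...] LAST strings and that a clump takes the lowest code over all of its name pairs; both are assumptions, not necessarily how the original check was written, and the function names are hypothetical.

```python
from itertools import combinations
from typing import List, Optional


def middle_initial(name: str) -> str:
    """First letter of the middle name, or '' when the name has no middle part.

    Assumes 'FIRST [MIDDLE ...] LAST'; the second token is taken as the middle name.
    """
    tokens = name.split()
    return tokens[1][0] if len(tokens) > 2 else ""


def pair_code(a: str, b: str) -> int:
    """nstr code for a single pair of names."""
    ma, mb = middle_initial(a), middle_initial(b)
    if not ma or not mb:
        return 1  # at least one name has no middle name to compare
    return 2 if ma == mb else 0  # 2 = initials match, 0 = conflict


def group_code(names: List[str]) -> Optional[int]:
    """Assumed rule: a clump gets the lowest code over all of its name pairs.

    Returns None for a clump with a single name (nothing to compare).
    """
    return min((pair_code(a, b) for a, b in combinations(names, 2)), default=None)


print(group_code(["JOHN F DYE", "JOHN DYE", "JOHN D DYE"]))     # 0
print(group_code(["HENRY J GYSLING", "HENRY JAMES GYSLING"]))  # 2
```

Taking the minimum means a single conflicting pair (F vs D in the Dye clump) is enough to put the whole clump in nstr=0, even when the other pairs only score 1.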
For example, here are the first 10 entries with nstr=0, one per line as inv_id|#patents|unique names clumped together:
03858572-2|31|JOHN F DYE,JOHN DYE,JOHN D DYE
03858760-1|45|ANTONIN GONCALVES,ANTONIN L GONCALVES,ANTONIN C GONCALVES
03858787-3|19|ROGER M FLOYD,ROGER N FLOYD
03859063-2|8|STEVEN I TAUB,STEVEN L TAUB
03859092-1|42|HENRY J GYSLING,HENRY L GYSLING,HENRY JAMES GYSLING
03859097-1|4|FREDRICK L HAMB,FREDERICK L HAMB,FREDERICK T HAMB,FREDERICK D HAMB
03859113-2|18|WILLIAM C STUMPHAUZER,WILLIAM S STUMPHAUZER
03859119-1|316|JAMES C FLETCHER,JAMES ADMINISTR FLETCHER,JAMES CORVIN FLETCHER,J CLINT FLETCHER,J CLINTON FLETCHER
03859298-1|72|JOHN H SELLSTEDT,JOHN H SELLSTED,JOHN M SELLSTEDT
03859356-1|109|WILLIAM J HOULIHAN,WILLIAM H HOULIHAN
As you can see, the first record clumps JOHN F DYE together with JOHN D DYE, which is clearly incorrect. The remaining records show the same problem. The James Fletcher record (#03859119-1, 316 patents) is particularly concerning: it looks like at least three individuals merged together.
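To spot these clumps programmatically, each record can be parsed from the inv_id|#patents|names layout and flagged when its names carry more than one distinct middle initial. This is a rough sketch: `parse_record`, `distinct_middle_initials`, `looks_merged` and the whitespace-based name split are illustrative assumptions, and names without a middle name are simply ignored.

```python
from typing import List, NamedTuple, Set


class InventorRecord(NamedTuple):
    inv_id: str
    n_patents: int
    names: List[str]


def parse_record(line: str) -> InventorRecord:
    """Parse one 'inv_id|#patents|NAME,NAME,...' line."""
    inv_id, n_patents, names = line.strip().split("|", 2)
    return InventorRecord(inv_id, int(n_patents), names.split(","))


def distinct_middle_initials(record: InventorRecord) -> Set[str]:
    """Middle initials used across a record's names (names without a middle name are skipped)."""
    initials = set()
    for name in record.names:
        tokens = name.split()
        if len(tokens) > 2:
            initials.add(tokens[1][0])
    return initials


def looks_merged(record: InventorRecord) -> bool:
    """Flag a record whose names disagree on the middle initial."""
    return len(distinct_middle_initials(record)) > 1


rec = parse_record(
    "03859119-1|316|JAMES C FLETCHER,JAMES ADMINISTR FLETCHER,"
    "JAMES CORVIN FLETCHER,J CLINT FLETCHER,J CLINTON FLETCHER"
)
print(rec.n_patents, sorted(distinct_middle_initials(rec)), looks_merged(rec))
# 316 ['A', 'C'] True
```

For the Fletcher record the conflict comes from JAMES ADMINISTR FLETCHER (initial A) against the four C variants.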
Although these inventors are a small fraction of the total (8,644 IDs, compared with about 1.6 million for nstr=1 and 1.5 million for nstr=2), their avgpats (20.39) is far higher than that of the other groups (3.89 and 2.67). I've run into these inventors when building networks, and they produce some strange networks! That said, inspecting the data visually also suggests some interesting blocking mechanisms for further disambiguation, which I would love to share. I think the more we expose these results visually via APIs, the more data issues will become obvious.
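As one illustration of blocking (a generic example, not the mechanisms mentioned above), a clump can be split into sub-blocks keyed on (last name, first initial, middle initial). The key and the function names here are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

BlockKey = Tuple[str, str, str]


def blocking_key(name: str) -> BlockKey:
    """Hypothetical key: (last name, first initial, middle initial or '')."""
    tokens = name.split()
    middle = tokens[1][0] if len(tokens) > 2 else ""
    return (tokens[-1], tokens[0][0], middle)


def split_clump(names: List[str]) -> Dict[BlockKey, List[str]]:
    """Group one clump's names into sub-blocks that share a blocking key."""
    blocks: Dict[BlockKey, List[str]] = defaultdict(list)
    for name in names:
        blocks[blocking_key(name)].append(name)
    return dict(blocks)


fletcher = ["JAMES C FLETCHER", "JAMES ADMINISTR FLETCHER", "JAMES CORVIN FLETCHER",
            "J CLINT FLETCHER", "J CLINTON FLETCHER"]
for key, members in split_clump(fletcher).items():
    print(key, members)
```

This splits the Fletcher clump into a C block of four names and an A block holding only JAMES ADMINISTR FLETCHER. Using just the first initial keeps FREDRICK L HAMB and FREDERICK L HAMB together, but the exact last-name match separates SELLSTEDT from SELLSTED, so a normalized or phonetic surname key may be needed. Names with no middle initial land in their own block and would need a separate step to attach them.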
Summary by nstr (avgpats = patents/invs):
nstr       invs    patents  avgpats
0         8,644    176,307    20.39
1     1,611,032  6,279,199     3.89
2     1,480,296  3,955,163     2.67
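The avgpats column can be reproduced from the invs and patents counts. The sketch below also computes each group's share of the inventor IDs counted in the table; `COUNTS` and `summarize` are illustrative names.

```python
# (invs, patents) per nstr code, copied from the summary table
COUNTS = {
    0: (8_644, 176_307),
    1: (1_611_032, 6_279_199),
    2: (1_480_296, 3_955_163),
}


def summarize(counts):
    """Return avgpats (= patents/invs) and each group's share of all invs in the table."""
    total_invs = sum(invs for invs, _ in counts.values())
    return {
        nstr: {"avgpats": patents / invs, "share": invs / total_invs}
        for nstr, (invs, patents) in counts.items()
    }


for nstr, row in summarize(COUNTS).items():
    print(nstr, f"{row['avgpats']:.2f}", f"{row['share']:.2%}")
```

Rounded to two decimals this gives 20.40, 3.90 and 2.67, so the table's values appear to be truncated rather than rounded. The nstr=0 group is about 0.28% of the IDs in the table while averaging roughly five to eight times as many patents per ID as the other two groups.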