Skip to content

Latest commit

 

History

History
31 lines (16 loc) · 2.14 KB

File metadata and controls

31 lines (16 loc) · 2.14 KB

34. How to define human-level performance

->

Suppose you are working on a medical imaging application that automatically makes diagnoses from x-ray images. A typical person with no previous medical background besides some basic training achieves 15% error on this task. A junior doctor achieves 10% error. An experienced doctor achieves 5% error. And a small team of doctors that discuss and debate each image achieves 2% error. Which one of these error rates defines “human-level performance”? ->

In this case, I would use 2% as the human-level performance proxy for our optimal error rate. You can also set 2% as the desired performance level because all three reasons from the previous chapter for comparing to human-level performance apply: ->

  • Ease of obtaining labeled data from human labelers.​ You can get a team of doctors to provide labels to you with a 2% error rate. ->

  • Error analysis can draw on human intuition. ​By discussing images with a team of doctors, you can draw on their intuitions. ->

  • Use human-level performance to estimate the optimal error rate and also set achievable “desired error rate.”​ It is reasonable to use 2% error as our estimate of the optimal error rate. The optimal error rate could be even lower than 2%, but it cannot be higher, since it is possible for a team of doctors to achieve 2% error. In contrast, it is not reasonable to use 5% or 10% as an estimate of the optimal error rate, since we know these estimates are necessarily too high. ->

When it comes to obtaining labeled data, you might not want to discuss every image with an entire team of doctors since their time is expensive. Perhaps you can have a single junior doctor label the vast majority of cases and bring only the harder cases to more experienced doctors or to the team of doctors. ->

If your system is currently at 40% error, then it doesn’t matter much whether you use a junior doctor (10% error) or an experienced doctor (5% error) to label your data and provide intuitions. But if your system is already at 10% error, then defining the human-level reference as 2% gives you better tools to keep improving your system. ->