[AI] Automatic preprocessing of ePIs #34

Open
joofio opened this issue Apr 11, 2024 · 18 comments
Labels
MVP3 Issue to be delivered in MVP3

Comments

@joofio
Contributor

joofio commented Apr 11, 2024

No description provided.

@joofio joofio converted this from a draft issue Apr 11, 2024
@joofio joofio added the MVP3 Issue to be delivered in MVP3 label Apr 11, 2024
@joofio
Contributor Author

joofio commented Apr 11, 2024

  1. "dumb" preprocessor: done. @aalonsolopez can you help here? Link?
  2. "smarter" preprocessor - start to tackle this

@aalonsolopez

So this is the first draft of the "dumb" automatic preprocessor. It uses a tree search algorithm to look for specific texts (terms from a terminology), which makes it much faster than the planned AI version. It has been deployed on the dev server for months, so you can use it now. You can see the available preprocessor here.
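The preprocessor's source isn't shown in this thread, but dictionary lookup with a search tree usually means a trie keyed by words: each path from the root spells a term, and the node where a term ends carries its code. A minimal Python sketch under that assumption; the terms and codes (`code-a`, etc.) are placeholders, not the real terminology list:

```python
import re
from typing import Dict, List, Optional, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase word tokens; punctuation is dropped."""
    return re.findall(r"\w+", text.lower())


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.code: Optional[str] = None  # set only where a complete term ends


def build_trie(terms: Dict[str, str]) -> TrieNode:
    """Insert every term (as a sequence of words) together with its code."""
    root = TrieNode()
    for term, code in terms.items():
        node = root
        for word in tokenize(term):
            node = node.children.setdefault(word, TrieNode())
        node.code = code
    return root


def find_terms(text: str, root: TrieNode) -> List[Tuple[str, str]]:
    """Scan left to right, keeping the longest term that starts at each position."""
    words = tokenize(text)
    matches: List[Tuple[str, str]] = []
    i = 0
    while i < len(words):
        node, end, code = root, None, None
        j = i
        while j < len(words) and words[j] in node.children:
            node = node.children[words[j]]
            j += 1
            if node.code is not None:
                end, code = j, node.code
        if end is None:
            i += 1
        else:
            matches.append((" ".join(words[i:end]), code))
            i = end
    return matches
```

Each lookup walks at most as many nodes as the longest term has words, so the cost grows with the text length rather than with the size of the term list, which is why this kind of matcher is fast compared with a model-based approach.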

@joofio
Contributor Author

joofio commented Apr 15, 2024

So, if I am not mistaken, this should work?

### preprocessing dumb

POST https://gravitate-health.lst.tfo.upm.es/focusing/focus/bundlepackageleaflet-es-da0fc2395ce219262dfd4f0c9a9f72e1?preprocessors=preprocessing-service-mvp2&lenses=lens-selector-mvp2_pregnancy&patientIdentifier=alicia-1

It returns:

HTTP/1.1 503 Service Unavailable
content-length: 145
content-type: text/plain
date: Mon, 15 Apr 2024 09:05:00 GMT
server: istio-envoy
x-envoy-upstream-service-time: 7229
connection: close

upstream connect error or disconnect/reset before headers. reset reason: connection failure, transport failure reason: delayed connect error: 111

@aalonsolopez

This is kinda weird, let me check

@aalonsolopez

Sorry, I'm looking into this now.

@aalonsolopez

aalonsolopez commented Apr 17, 2024

@joofio For testing purposes, it's better to use:

POST https://gravitate-health.lst.tfo.upm.es/focusing/preprocessing/bundlepackageleaflet-es-da0fc2395ce219262dfd4f0c9a9f72e1?preprocessors=preprocessing-service-mvp2
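A small sketch for building that request from Python with only the standard library, assuming the `preprocessors` query parameter takes a single preprocessor name as in the URLs above. `preprocessing_request` is a hypothetical helper; the request is only constructed here, not sent:

```python
from typing import Optional
from urllib.parse import quote, urlencode
from urllib.request import Request

# Base path taken from the URLs in this thread.
BASE_URL = "https://gravitate-health.lst.tfo.upm.es/focusing"


def preprocessing_request(epi_id: str, preprocessor: str,
                          accept: Optional[str] = None) -> Request:
    """Build a POST to the preprocessing-only endpoint for one ePI bundle."""
    query = urlencode({"preprocessors": preprocessor})
    url = f"{BASE_URL}/preprocessing/{quote(epi_id)}?{query}"
    # Only send an Accept header when the caller asks for one.
    headers = {"Accept": accept} if accept else {}
    return Request(url, method="POST", headers=headers)
```

Passing the result to `urllib.request.urlopen` would actually send it; keeping construction separate makes it easy to log or compare the exact URL being tested.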

@joofio
Contributor Author

joofio commented Apr 18, 2024

This works for every raw ePI, right?
It doesn't work with "Accept: application/json", correct?

@aalonsolopez

It's working now as expected, with extensions included.

@aalonsolopez
Copy link

aalonsolopez commented Apr 23, 2024

POST https://gravitate-health.lst.tfo.upm.es/focusing/preprocessing/bundlepackageleaflet-es-da0fc2395ce219262dfd4f0c9a9f72e1?preprocessors=preprocessing-service-mvp2

Same endpoint

@joofio
Contributor Author

joofio commented May 2, 2024

So I still have some questions:

  1. I tried a raw ePI (bundlepackageleafletxyntha) and it returns gh-focusing-warnings: {"preprocessingWarnings":[{"serviceName":"preprocessing-service-mvp2","error":"Preprocessed version of ePI could not be handled by preprocessor."}],"lensesWarnings":[]}
  2. Which categories is the preprocessor applying? I see a lot of codes for pregnancy, but only those (the ones I could test).
  3. When the error occurs, is the "Accept:" header taken into account? I removed it and still received JSON.

@aalonsolopez

Answers:

  1. I will check it and get back to you with an answer ASAP.
  2. A (very) short version of some SNOMED codes (https://github.com/Gravitate-Health/terminology-service/blob/testing-simplified-terminologies/controllers/db/Simplification.csv)
  3. No, the Accept header is only checked if everything goes well.
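Since the Accept header is only consulted on success, a client cannot rely on the body format to detect a preprocessing failure; the `gh-focusing-warnings` response header is the more reliable signal. A sketch that assumes the header carries JSON shaped like the value quoted in question 1; `preprocessing_errors` is a hypothetical helper name:

```python
import json
from typing import List


def preprocessing_errors(header_value: str) -> List[str]:
    """List 'serviceName: error' entries from a gh-focusing-warnings header value."""
    warnings = json.loads(header_value)
    # Only preprocessingWarnings is inspected; lensesWarnings is ignored here.
    return [
        f"{w.get('serviceName', '?')}: {w.get('error', '')}"
        for w in warnings.get("preprocessingWarnings", [])
    ]
```

A non-empty result means the returned ePI was not (fully) preprocessed, whatever the body format turns out to be.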

@aalonsolopez

PS: preprocessing with bundlepackageleafletxyntha works for me

@github-project-automation github-project-automation bot moved this from Todo to Done in MVP Issues May 6, 2024
@aalonsolopez aalonsolopez reopened this May 6, 2024
@github-project-automation github-project-automation bot moved this from Done to In Progress in MVP Issues May 6, 2024
@aalonsolopez

(Sorry, I closed this by mistake.)

@joofio
Contributor Author

joofio commented Jul 4, 2024

New list of requirements for this, based on what we discussed today (4/7/2024):

  • use the base words/codes already built into the preprocessor, but enlarge and improve them (to be done by TS and me)
  • check the codes against the terminology server for translations and synonyms
  • possibly check other places and formats for more synonyms
  • search the text for the resulting words, expressions, and acronyms (regex, text distance, etc.)
  • tag the whole sentence, paragraph, and/or section
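As an illustration of the "regex, text distance" bullet, a standard-library Python sketch: whole-word, case-insensitive regex search for exact terms and acronyms, plus `difflib.get_close_matches` as a simple stand-in for a text-distance measure. The function names and the 0.85 cutoff are assumptions to be tuned, not part of the current preprocessor:

```python
import difflib
import re
from typing import List


def exact_hits(text: str, terms: List[str]) -> List[str]:
    """Terms (words, multi-word expressions, acronyms) found as whole words, ignoring case."""
    hits = []
    for term in terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            hits.append(term)
    return hits


def fuzzy_hits(text: str, terms: List[str], cutoff: float = 0.85) -> List[str]:
    """Terms that closely match some word in the text (e.g. typos), via difflib's ratio."""
    words = re.findall(r"\w+", text.lower())
    hits = []
    for term in terms:
        if difflib.get_close_matches(term.lower(), words, n=1, cutoff=cutoff):
            hits.append(term)
    return hits
```

Plain similarity is not a substitute for listing inflected forms: under `difflib`'s ratio, "pregnancies" scores only 0.8 against "pregnancy", so it is missed at this cutoff, while a typo such as "pregnacy" is caught.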

From this, I can envision the following list of requirements, in order of importance:

  1. check all words/codes in the text
  2. produce a compliant preprocessed ePI

Note: equivalent concepts can be stored inside the same CodeableConcept, so everything related to pregnancy is attached to a single class name with one or more codes.

  3. check performance (use a cache, or whatever is needed to make the preprocessor fast enough, ~<5 s)
  4. check the terminology server for synonyms and translations
  5. use the codes listed in the CSV on demand and live (or mostly live): when I update the CSV, the preprocessor should take the changes into account
  6. use another method for synonyms and acronyms
  7. use a method for deciding whether a sentence, paragraph, or section should be highlighted
  8. logs

@aalonsolopez @amedranogil
Is there anything I might have forgotten? I haven't tried the current preprocessor yet, but will do so ASAP and update this if needed.
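The performance and live-CSV requirements above can pull in opposite directions. One simple compromise, sketched here as an assumption rather than the current design: keep the parsed CSV in memory and re-read it only when the file's modification time changes. The `term`/`code` column names are hypothetical (the real CSV layout may differ), and a CSV fetched over HTTP would need an ETag or timestamp check instead of `os.path.getmtime`:

```python
import csv
import os
from typing import Dict, Optional


class LiveTermTable:
    """In-memory copy of a term -> code CSV, re-read only when the file's mtime changes."""

    def __init__(self, path: str) -> None:
        self.path = path
        self._mtime: Optional[float] = None
        self._terms: Dict[str, str] = {}

    def terms(self) -> Dict[str, str]:
        mtime = os.path.getmtime(self.path)  # one cheap stat() per call
        if mtime != self._mtime:
            with open(self.path, newline="", encoding="utf-8") as f:
                # Hypothetical columns: "term" and "code".
                self._terms = {
                    row["term"].strip().lower(): row["code"].strip()
                    for row in csv.DictReader(f)
                }
            self._mtime = mtime
        return self._terms
```

Each request pays only a `stat()` call; the full parse happens once per edit of the CSV.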

@amedranogil

We can turn this list into issues in the preprocessor repo, so we can track progress and discuss each point.

@joofio
Copy link
Contributor Author

joofio commented Jul 5, 2024

I wanted to test the current preprocessor before that. Give me a day or so.

@joofio
Contributor Author

joofio commented Jul 9, 2024

So I tested the current preprocessor. It misses many terms (and some of the terms it does find don't seem useful, like "Possible"?), and it adds many entries for the same concept.
Example:

            {
              "url": "elementClass",
              "valueString": "Pregnancy"
            },
            {
              "url": "concept",
              "valueCodeableReference": {
                "concept": {
                  "coding": [
                    {
                      "system": "http://snomed.info/sct",
                      "code": "11082009",
                      "display": "Pregnancy"
                    }
                  ]
                }
              }
            }
          ],
          [
            {
              "url": "elementClass",
              "valueString": "Pregnancy"
            },
            {
              "url": "concept",
              "valueCodeableReference": {
                "concept": {
                  "coding": [
                    {
                      "system": "http://snomed.info/sct",
                      "code": "416413003",
                      "display": "Pregnancy"
                    }
                  ]
                }
              }
            }
          ],

Not only does this create many separate extensions for no reason, but the display values also do not match those in the code system.
In FHIR, CodeableConcept.coding has cardinality 0..*, which means a single CodeableConcept can store several codes for the same concept.

So, taking the example above, it should look like this:

            {
              "url": "elementClass",
              "valueString": "Pregnancy"
            },
            {
              "url": "concept",
              "valueCodeableReference": {
                "concept": {
                  "coding": [
                    {
                      "system": "http://snomed.info/sct",
                      "code": "11082009",
                      "display": "Pregnancy"
                    },
                    {
                      "system": "http://snomed.info/sct",
                      "code": "416413003",
                      "display": "Pregnancy"
                    }
                  ]
                }
              }
            }
          ],         

And this assumes they really are the same concept. In the case above, for example, code 11082009 is "Abnormal pregnancy", not "Pregnancy" (which is quite different): https://www.findacode.com/snomed/11082009--abnormal-pregnancy.html
Also, 416413003 is "Advanced maternal age gravida" (closer, but still not right): https://www.findacode.com/snomed/416413003--advanced-maternal-age-gravida.html

So, for starters, we need to correct the codes and adopt the idea that equivalent concepts can be stored inside the same CodeableConcept.
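A sketch of the grouping step in Python, assuming the extension pairs have the shape shown in the excerpts above (an `elementClass` entry plus a `concept` entry). It merges every coding that shares an `elementClass` into a single CodeableConcept and drops exact system/code duplicates. `merge_codings` is a hypothetical name, and it only fixes the duplication; the codes themselves still need correcting:

```python
from typing import Dict, List, Tuple


def merge_codings(pairs: List[List[dict]]) -> List[List[dict]]:
    """Merge [elementClass, concept] pairs that share an elementClass into one pair."""
    merged: Dict[str, List[dict]] = {}  # elementClass -> codings, in first-seen order
    for pair in pairs:
        by_url = {entry["url"]: entry for entry in pair}
        element_class = by_url["elementClass"]["valueString"]
        codings = by_url["concept"]["valueCodeableReference"]["concept"]["coding"]
        bucket = merged.setdefault(element_class, [])
        seen = {(c.get("system"), c.get("code")) for c in bucket}
        for coding in codings:
            key: Tuple = (coding.get("system"), coding.get("code"))
            if key not in seen:  # skip exact system/code duplicates
                bucket.append(coding)
                seen.add(key)
    return [
        [
            {"url": "elementClass", "valueString": element_class},
            {"url": "concept",
             "valueCodeableReference": {"concept": {"coding": codings}}},
        ]
        for element_class, codings in merged.items()
    ]
```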

Other minor stuff:

  • "code": "1,07605810001191E+016",
  • the coding below, with "display": "Not", might not be very useful on its own:
    {
    "system": "http://snomed.info/sct",
    "code": "60001007",
    "display": "Not"
    }
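The first item looks like a SNOMED CT code that was turned into a number in scientific notation somewhere along the way (a spreadsheet is a likely culprit, though that is a guess). A cheap guard is to validate codes before emitting them: a SNOMED CT identifier is 6 to 18 digits and its last digit is a Verhoeff check digit. A Python sketch; passing the check only means the identifier is well-formed, not that it means what the display says (11082009 passes, yet it is "Abnormal pregnancy"), and it does not catch an unhelpful display such as "Not":

```python
import re

# Verhoeff tables: multiplication in the dihedral group D5, and the position permutation.
D = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 2, 3, 4, 0, 6, 7, 8, 9, 5],
    [2, 3, 4, 0, 1, 7, 8, 9, 5, 6],
    [3, 4, 0, 1, 2, 8, 9, 5, 6, 7],
    [4, 0, 1, 2, 3, 9, 5, 6, 7, 8],
    [5, 9, 8, 7, 6, 0, 4, 3, 2, 1],
    [6, 5, 9, 8, 7, 1, 0, 4, 3, 2],
    [7, 6, 5, 9, 8, 2, 1, 0, 4, 3],
    [8, 7, 6, 5, 9, 3, 2, 1, 0, 4],
    [9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
]
P = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 5, 7, 6, 2, 8, 3, 0, 9, 4],
    [5, 8, 0, 3, 7, 9, 6, 1, 4, 2],
    [8, 9, 1, 6, 0, 4, 3, 5, 2, 7],
    [9, 4, 5, 3, 1, 2, 6, 8, 7, 0],
    [4, 2, 8, 6, 5, 7, 3, 9, 0, 1],
    [2, 7, 9, 3, 8, 0, 6, 4, 1, 5],
    [7, 0, 4, 6, 9, 1, 3, 2, 5, 8],
]


def verhoeff_valid(number: str) -> bool:
    """True if the last digit is a correct Verhoeff check digit for the preceding digits."""
    c = 0
    for i, ch in enumerate(reversed(number)):
        c = D[c][P[i % 8][int(ch)]]
    return c == 0


def is_wellformed_sctid(code: str) -> bool:
    """6 to 18 ASCII digits with a valid Verhoeff check digit."""
    return re.fullmatch(r"[0-9]{6,18}", code) is not None and verhoeff_valid(code)
```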

@joofio
Contributor Author

joofio commented Jul 16, 2024

To-dos from my previous comment:

  1. reduce the number of codes related to pregnancy (and other concepts) to just one (two or three at most) and enlarge the number of words the preprocessor looks for (pregnancy, pregnancies, pregnant, etc.); the original description of each code may help with that.
    a) I would prefer to have a small number of codes that tag a lot of useful content
  2. enlarge the code set and make it core: I want to be able to change the data there and have the preprocessor change its behaviour ASAP (how quickly depends on how long it takes to develop)
    a) here I want to add several code systems (like SNOMED, GSRS, ICPC-2, etc.)
  3. have a terminology to expand codes, translations, and relationships between codes.
    a) related to 1., I would like to have each key concept (like pregnancy) with one code, which would then expand into another set of codes and words to look for (feasible? achievable performance-wise?)

Is this helpful, @aalonsolopez? Let me know.
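On the "one key concept, one code, expanded into words to look for" idea, a rough Python sketch combined with sentence-level tagging. The expansion table is hand-written and hypothetical; in practice it would presumably come from the terminology server or the CSV. The code value is a placeholder, and splitting sentences on `.`, `!`, `?` is naive (abbreviations such as "e.g." will break it):

```python
import re
from typing import Dict, List, Tuple

# Hypothetical expansion table: one key concept -> one code plus the surface forms to search for.
CONCEPTS: Dict[str, dict] = {
    "Pregnancy": {
        "coding": {"system": "http://snomed.info/sct", "code": "PLACEHOLDER"},
        "forms": ["pregnancy", "pregnancies", "pregnant"],
    },
}


def compile_concepts(concepts: Dict[str, dict]) -> Dict[str, re.Pattern]:
    """Build one case-insensitive, whole-word alternation regex per concept."""
    return {
        name: re.compile(
            r"\b(?:" + "|".join(map(re.escape, spec["forms"])) + r")\b",
            re.IGNORECASE,
        )
        for name, spec in concepts.items()
    }


def tag_sentences(text: str, patterns: Dict[str, re.Pattern]) -> List[Tuple[str, str]]:
    """Return (sentence, concept name) for every sentence that mentions a concept."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [
        (sentence, name)
        for sentence in sentences
        for name, pattern in patterns.items()
        if pattern.search(sentence)
    ]
```

Compiling each concept once means the per-sentence cost is one regex search per concept; whether that stays under the ~5 s target depends on the real number of concepts and forms, which this sketch does not measure.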

@joofio joofio moved this from In Progress to Testing in MVP Issues Sep 5, 2024
Projects
Status: Testing
Development

No branches or pull requests

3 participants