Using multiple models in NER #1334

linlinloo · 2024-01-22T07:41:19Z

I want to run the following code, but an error occurred.

import stanza
pipe = stanza.Pipeline("en", processors="tokenize,ner", package={"ner": ["ncbi_disease", "ontonotes"]})
doc = pipe("John Bauer works at Stanford and has hip arthritis. He works for Chris Manning")
print(doc.ents)

WARNING: Language en package default expects mwt, which has been added

I have downloaded ncbi_disease.pt and placed it in site-packages\stanza\stanza_resources\en\ner. What's the problem, and why?

AngledLuffa · 2024-01-22T07:46:02Z

That's not an error, though. It should work just fine with that warning.


linlinloo · 2024-01-22T08:12:57Z

However, running it produced no results, only a series of errors: ConnectTimeout, MaxRetryError...
When I run other code, ncbi_disease does not appear under ner. Did I put the wrong package in place?
Loading these models for language: en (English):

| Processor | Package |
| tokenize | combined |
| mwt | combined |
| pos | combined_charlm |
| lemma | combined_nocharlm |
| constituency | ptb3-revised_charlm |
| depparse | combined_charlm |
| sentiment | sstplus |
| ner | ontonotes-ww-multi_charlm |

AngledLuffa · 2024-01-22T08:26:46Z

If it's giving a timeout error, the most likely culprit is that it's trying to download missing resources and can't connect. You can add download_method=None to the Pipeline to stop it from downloading.


AngledLuffa · 2024-01-22T08:29:06Z

Also, I should note that as of version 1.7.0, the default NER model is now "ontonotes-ww-multi_charlm". There's also "ontonotes_charlm". They are named this way so that you can get "nocharlm" models if you want faster processing. If there's some stale documentation, please let me know and I'll update it.
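To check which NER packages are actually on disk, and under which names, one option is to list the .pt files in the resources directory. This is only a sketch: it assumes the <resources dir>/<lang>/<processor>/<package>.pt layout that the path earlier in this thread suggests, and both list_local_packages and split_package_name are hypothetical helpers, not part of Stanza. The charlm/nocharlm suffix handling is inferred from the naming described above.

```python
from pathlib import Path


def list_local_packages(resources_dir, lang="en", processor="ner"):
    """Return the package names (file stems) of the .pt models found locally.

    Assumes models live in <resources_dir>/<lang>/<processor>/<package>.pt.
    """
    model_dir = Path(resources_dir).expanduser() / lang / processor
    if not model_dir.is_dir():
        return []
    return sorted(p.stem for p in model_dir.glob("*.pt"))


def split_package_name(package):
    """Split a name such as 'ontonotes_charlm' into (dataset, variant).

    The variant is 'charlm', 'nocharlm', or None when no such suffix exists.
    """
    dataset, sep, variant = package.rpartition("_")
    if sep and variant in ("charlm", "nocharlm"):
        return dataset, variant
    return package, None
```

For example, list_local_packages("~/stanza_resources") would list whatever is in the usual default location, assuming that is where your models were downloaded. If ncbi_disease is missing from the result, the file is probably not in the directory the Pipeline reads from.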

linlinloo · 2024-01-22T09:56:15Z

I found ontonotes_charlm.pt, and I can download it. Do you mean that I should use it in place of ontonotes-ww-multi_charlm?
And sorry, how do I add download_method=None? Like this? pipe = stanza.Pipeline("en", download_method=None)

AngledLuffa · 2024-01-22T14:29:56Z

> I found ontonotes_charlm.pt, and I can download it. Do you mean that I should use it in place of ontonotes-ww-multi_charlm?

You can do whatever you like, of course. The ww-multi model was trained on both OntoNotes and the dataset described in this paper

> And sorry, how do I add download_method=None? Like this? pipe = stanza.Pipeline("en", download_method=None)

Yes, exactly. I suggest that because it's the most likely reason you're getting timeouts. If the problem is somewhere else, please include the complete stack trace.
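Putting the pieces of this thread together, the original two-model call with downloads disabled might look like the sketch below. It assumes Stanza is installed and that every model the pipeline needs (tokenize, mwt, and both NER packages) is already on disk. The helper names build_pipeline_kwargs and run_offline_ner are hypothetical, and using "ontonotes_charlm" as the second package name is an assumption based on the 1.7.0 naming mentioned above.

```python
def build_pipeline_kwargs(ner_packages, offline=True):
    """Assemble keyword arguments for stanza.Pipeline (hypothetical helper)."""
    kwargs = {
        "lang": "en",
        "processors": "tokenize,ner",
        # A list under "ner" asks for several NER models at once,
        # as in the call at the top of this thread.
        "package": {"ner": list(ner_packages)},
    }
    if offline:
        # download_method=None stops the Pipeline from trying to download.
        kwargs["download_method"] = None
    return kwargs


def run_offline_ner(text, ner_packages=("ncbi_disease", "ontonotes_charlm")):
    """Run NER with local models only and return (text, type) pairs."""
    import stanza  # imported here so the helper above works without Stanza

    pipe = stanza.Pipeline(**build_pipeline_kwargs(ner_packages))
    doc = pipe(text)
    return [(ent.text, ent.type) for ent in doc.ents]
```

Calling run_offline_ner("John Bauer works at Stanford and has hip arthritis.") should then either return entities or fail quickly with a missing-model error rather than a network timeout, which makes the real problem easier to see.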

linlinloo added the question label Jan 22, 2024
