[Bug]: index out of range in self #3134

Open
lennertvandevelde opened this issue Mar 7, 2023 · 3 comments
Labels: bug (Something isn't working), wontfix (This will not be worked on)

Comments

lennertvandevelde commented Mar 7, 2023

Describe the bug

The ner-dutch model raises "IndexError: index out of range in self" on some character combinations.

To Reproduce

Example 1

import string

from flair.data import Sentence
from flair.models import SequenceTagger

tagger = SequenceTagger.load("flair/ner-dutch")

# Try every two-letter lowercase combination and record the ones that fail.
error_chars = []
for ichar in string.ascii_lowercase:
    for jchar in string.ascii_lowercase:
        try:
            tagger.predict(Sentence(jchar + ichar))
        except Exception:
            # The failure reported in this issue is
            # "IndexError: index out of range in self".
            error_chars.append(jchar + ichar)
print(error_chars)

OUTPUT:

['aa', 'ea', 'ia', 'oa', 'qa', 'ra', 'sa', 'ta', 'ua', 'xa', 'ya', 'cb', 'eb', 'fb', 'gb', 'hb', 'ib', 'jb', 'kb', 'lb', 'mb', 'nb', 'pb', 'qb', 'rb', 'sb', 'tb', 'ub', 'xb', 'yb', 'ec', 'fc', 'gc', 'hc', 'ic', 'jc', 'kc', 'lc', 'mc', 'nc', 'oc', 'qc', 'rc', 'tc', 'uc', 'xc', 'yc', 'ad', 'ed', 'fd', 'gd', 'hd', 'id', 'jd', 'kd', 'ld', 'md', 'nd', 'od', 'pd', 'qd', 'rd', 'sd', 'td', 'ud', 'xd', 'yd', 'ae', 'ee', 'fe', 'ie', 'ke', 'ne', 'oe', 'qe', 're', 'se', 'ue', 'xe', 'ye', 'cf', 'ef', 'ff', 'gf', 'hf', 'if', 'jf', 'kf', 'lf', 'mf', 'nf', 'pf', 'qf', 'rf', 'sf', 'tf', 'uf', 'xf', 'yf', 'ag', 'cg', 'eg', 'fg', 'gg', 'hg', 'ig', 'jg', 'kg', 'lg', 'mg', 'ng', 'og', 'pg', 'qg', 'rg', 'sg', 'tg', 'ug', 'xg', 'yg', 'ah', 'ch', 'eh', 'fh', 'gh', 'hh', 'ih', 'jh', 'kh', 'lh', 'mh', 'nh', 'oh', 'ph', 'qh', 'rh', 'sh', 'uh', 'xh', 'yh', 'ai', 'ei', 'ii', 'ji', 'ni', 'oi', 'qi', 'si', 'ti', 'xi', 'yi', 'aj', 'cj', 'ej', 'fj', 'gj', 'hj', 'ij', 'jj', 'kj', 'lj', 'mj', 'nj', 'oj', 'pj', 'qj', 'rj', 'sj', 'tj', 'uj', 'xj', 'yj', 'ak', 'ck', 'ek', 'fk', 'gk', 'hk', 'jk', 'kk', 'lk', 'mk', 'nk', 'ok', 'qk', 'rk', 'sk', 'tk', 'uk', 'xk', 'yk', 'el', 'hl', 'il', 'jl', 'll', 'ml', 'nl', 'ol', 'ql', 'rl', 'sl', 'tl', 'ul', 'xl', 'yl', 'am', 'em', 'fm', 'gm', 'hm', 'im', 'jm', 'lm', 'nm', 'pm', 'qm', 'rm', 'sm', 'tm', 'um', 'xm', 'ym', 'an', 'cn', 'fn', 'gn', 'hn', 'jn', 'kn', 'ln', 'mn', 'nn', 'pn', 'qn', 'rn', 'sn', 'tn', 'un', 'xn', 'yn', 'ao', 'eo', 'io', 'oo', 'qo', 'uo', 'xo', 'yo', 'ap', 'cp', 'ep', 'fp', 'gp', 'hp', 'ip', 'jp', 'kp', 'lp', 'mp', 'np', 'pp', 'qp', 'rp', 'tp', 'xp', 'yp', 'aq', 'cq', 'eq', 'fq', 'gq', 'hq', 'iq', 'jq', 'kq', 'lq', 'mq', 'nq', 'oq', 'pq', 'qq', 'rq', 'sq', 'tq', 'uq', 'xq', 'yq', 'ar', 'hr', 'ir', 'jr', 'lr', 'mr', 'nr', 'or', 'qr', 'rr', 'sr', 'tr', 'ur', 'xr', 'yr', 'as', 'cs', 'es', 'fs', 'gs', 'hs', 'js', 'ks', 'ls', 'ms', 'ns', 'os', 'ps', 'qs', 'rs', 'ss', 'ts', 'us', 'xs', 'ys', 'at', 'ct', 'et', 'ft', 'gt', 'ht', 'it', 'jt', 'kt', 'lt', 
'mt', 'nt', 'ot', 'pt', 'qt', 'rt', 'tt', 'ut', 'xt', 'yt', 'cu', 'eu', 'iu', 'ou', 'tu', 'uu', 'xu', 'yu', 'av', 'ev', 'fv', 'gv', 'hv', 'iv', 'jv', 'kv', 'lv', 'mv', 'nv', 'ov', 'pv', 'qv', 'rv', 'sv', 'uv', 'xv', 'yv', 'aw', 'cw', 'ew', 'fw', 'gw', 'hw', 'iw', 'jw', 'lw', 'mw', 'nw', 'ow', 'pw', 'qw', 'rw', 'sw', 'tw', 'xw', 'yw', 'ax', 'cx', 'fx', 'gx', 'hx', 'ix', 'jx', 'kx', 'lx', 'mx', 'nx', 'ox', 'px', 'qx', 'rx', 'sx', 'tx', 'ux', 'xx', 'yx', 'ay', 'cy', 'ey', 'fy', 'gy', 'iy', 'jy', 'ky', 'ly', 'ny', 'oy', 'qy', 'ry', 'sy', 'ty', 'uy', 'xy', 'yy', 'az', 'cz', 'ez', 'fz', 'gz', 'hz', 'iz', 'jz', 'kz', 'lz', 'mz', 'nz', 'oz', 'pz', 'qz', 'rz', 'sz', 'tz', 'uz', 'xz', 'yz']
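
The sweep above could also be wrapped in a small helper so it can be rerun on other alphabets or string lengths. This is only a sketch: `collect_failures` is a hypothetical name, not part of flair, and it takes any callable so that it can be tried without loading a model. Unlike the loop above, it catches only `IndexError` and varies the last character fastest, so the order of the results differs.

```python
import itertools
import string


def collect_failures(predict, alphabet=string.ascii_lowercase, length=2):
    """Call predict on every string of the given length; return those that raise IndexError."""
    failures = []
    for chars in itertools.product(alphabet, repeat=length):
        text = "".join(chars)
        try:
            predict(text)
        except IndexError:
            failures.append(text)
    return failures


# Stand-in for the real tagger: fails on strings that start with "q".
def fake_predict(text):
    if text.startswith("q"):
        raise IndexError("index out of range in self")


print(collect_failures(fake_predict, alphabet="pq"))  # prints ['qp', 'qq']
```

With the tagger from Example 1, the call would presumably look like `collect_failures(lambda text: tagger.predict(Sentence(text)))`.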

Example 2

tagger.predict(Sentence("eedaflegging"))

OUTPUT

IndexError: index out of range in self

Expected behavior

The pipeline should return entities instead of raising an error.

Logs and Stack traces

No response

Screenshots

No response

Additional Context

Same issue as #2813.

Environment

Versions:

Flair: 0.12
PyTorch: 1.13.1+cu116
Transformers: 4.26.1
GPU: True

lennertvandevelde added the bug label Mar 7, 2023
helpmefindaname (Collaborator) commented

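Hi @lennertvandevelde,
it looks like this happens because the vocabulary of the BERTje embeddings was updated after release (see here), so the tokenizer can now produce token ids that lie outside the embedding matrix stored in the released tagger.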
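
To illustrate the kind of mismatch described here, a minimal sketch in plain Python: an embedding table with `num_embeddings` rows can only look up ids from 0 to `num_embeddings - 1`, and any other id is what surfaces as "index out of range in self". The helper name `out_of_range_ids` and the numbers are made up for illustration.

```python
def out_of_range_ids(token_ids, num_embeddings):
    """Return the ids that fall outside an embedding table with num_embeddings rows."""
    return [token_id for token_id in token_ids if not 0 <= token_id < num_embeddings]


# Illustrative numbers only: a table with 5 rows accepts ids 0..4.
print(out_of_range_ids([0, 3, 4, 5, 7], num_embeddings=5))  # prints [5, 7]
```

For the real tagger, the row count would come from `tagger.embeddings.model.get_input_embeddings().num_embeddings`; where the token ids come from (for example a `tokenizer` attribute on the embeddings object) is an assumption to verify against your flair version.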

@alanakbik I suppose we should just retrain the model.

In the meantime, you can apply a hotfix that adds the embeddings for the tokens that were added afterwards, using the following code:

import torch
from torch import nn
from flair.embeddings import TransformerWordEmbeddings
from flair.models import SequenceTagger

tagger = SequenceTagger.load("flair/ner-dutch")
embeddings = TransformerWordEmbeddings("GroNLP/bert-base-dutch-cased")

# Embedding layers of the released tagger and of the current BERTje checkpoint.
old_layer = tagger.embeddings.model.get_input_embeddings()
new_layer = embeddings.model.get_input_embeddings()

# Keep the tagger's rows and append the rows of the tokens added afterwards.
# The slice ends at -1, so the last row of the new matrix is not copied.
new_embedding_tensor = torch.cat(
    [old_layer.weight, new_layer.weight[old_layer.num_embeddings:-1]]
)
new_input_embeddings = nn.Embedding.from_pretrained(new_embedding_tensor, freeze=False)
tagger.embeddings.model.set_input_embeddings(new_input_embeddings)
tagger.embeddings.base_model_name = "GroNLP/bert-base-dutch-cased"
tagger.save("ner-dutch-fixed.pt")
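
The row bookkeeping in this hotfix can be illustrated with plain lists standing in for the embedding matrices. This is a toy sketch, not flair or PyTorch code: `patch_rows` and its `drop_last` parameter are hypothetical names, with `drop_last=1` mirroring the `:-1` at the end of the slice.

```python
def patch_rows(old_rows, new_rows, drop_last=1):
    """Keep old_rows and append the rows of new_rows that old_rows lacks.

    drop_last mirrors the ":-1" in the hotfix slice: that many trailing
    rows of new_rows are left out.
    """
    stop = len(new_rows) - drop_last
    return list(old_rows) + list(new_rows[len(old_rows):stop])


# Toy tables: 3 rows in the old table, 6 rows in the new one.
old_rows = ["old0", "old1", "old2"]
new_rows = ["new0", "new1", "new2", "new3", "new4", "new5"]
print(patch_rows(old_rows, new_rows))  # prints ['old0', 'old1', 'old2', 'new3', 'new4']
```

The patched table ends up with `len(new_rows) - drop_last` rows: the original rows are kept as they were, and only ids beyond the old table size get rows from the new table.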

Nickyboo1194 commented

#nDD v nevzvvbqlekvl m s das
Z umm Vvelvet

stale bot commented Sep 17, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the wontfix label Sep 17, 2023