I think we need a better representation of fused tokens in Treex. Currently it is only sketched using wild attributes, but a proper representation will probably be needed in the future, since fused tokens are part of the UD guidelines. So we need a less wild solution. Once we have it, we could try to implement, directly in Treex, the heuristics that collapse fused words whenever desirable. And once we have that, we should probably apply it before exporting data for KonText, because the surface form matters there.
I agree we need a better (less wild) API for fused (aka multi-word) tokens in Treex.
I am not sure how it will solve the problem in KonText, which probably can display either only tokens or only words. There are scripts distributed with UD (e.g. conllu-w2t.py) for converting the CoNLL-U word-indexed format to other formats.
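To make the token vs. word distinction concrete, here is a minimal Python sketch of collapsing a CoNLL-U sentence to its surface tokens. It is not the `conllu-w2t.py` script and not a Treex API; the function name `surface_tokens` and the helper `row` are made up for illustration. It relies only on the CoNLL-U convention that a multi-word token has a range ID such as `1-2` and is followed by the words it covers, and that empty nodes have decimal IDs such as `1.1`.

```python
def surface_tokens(conllu_sentence):
    """Return the surface token forms of one CoNLL-U sentence (hypothetical helper)."""
    forms = []
    skip_until = 0  # highest word ID covered by the most recent multi-word token
    for line in conllu_sentence.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        tok_id, form = cols[0], cols[1]
        if "-" in tok_id:
            # Multi-word (fused) token: keep its surface form,
            # and remember which syntactic words it covers.
            _, end = tok_id.split("-")
            forms.append(form)
            skip_until = int(end)
        elif "." in tok_id:
            continue  # empty node, has no surface form
        elif int(tok_id) > skip_until:
            forms.append(form)  # ordinary word that is also a token
    return forms


def row(tok_id, form):
    """Build a 10-column CoNLL-U line with all other fields left as '_'."""
    return "\t".join([tok_id, form] + ["_"] * 8)


EXAMPLE = "\n".join([
    "# text = vámonos al mar",
    row("1-2", "vámonos"),
    row("1", "vamos"),
    row("2", "nos"),
    row("3-4", "al"),
    row("3", "a"),
    row("4", "el"),
    row("5", "mar"),
])

print(surface_tokens(EXAMPLE))
```

The sketch only recovers surface forms. Deciding which annotation (lemma, tag, dependency) a collapsed token should carry is exactly the part that would need the heuristics discussed above.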
The proposal above was originally posted in ufal/lindat-corpora-conversions#3 (comment).