Cmp: the first part of the compound disappears in sent-proc.sh (Bugzilla Bug 2683) #56

Open
albbas opened this issue Sep 17, 2020 · 5 comments
Labels
bug Something isn't working

Comments


albbas commented Sep 17, 2020

This issue was created automatically with bugzilla2github

Bugzilla Bug 2683

Date: 2020-09-17T20:02:45+02:00
From: Lene Antonsen <<lene.antonsen>>
To: Chiara Argese <<chiara.argese>>
CC: lene.antonsen, sjur.n.moshagen, trond.trosterud

Last updated: 2020-09-18T11:01:20+02:00


albbas commented Sep 17, 2020

Comment 14006

Date: 2020-09-17 20:02:45 +0200
From: Lene Antonsen <<lene.antonsen>>

Cmp: the first part of the compound disappears in sent-proc.sh

echo skuvlakássa |hfst-tokenize --giella-cg --weight-classes=1 tools/tokenisers/tokeniser-disamb-gt-desc.pmhfst |vislcg3 -g tools/tokenisers/mwe-dis.cg3 | vislcg3 -g src/cg3/disambiguator.cg3 -t
"<skuvlakássa>"
"kássa" N G3 Sem/Ctain_Furn Sg Gen <W:0.0> ADD:2171:sme
"skuvla" N Sem/Edu_Org Cmp/SgNom Cmp <W:0.0>
"kássa" N G3 Sem/Ctain_Furn Sg Nom <W:0.0> ADD:2171:sme
"skuvla" N Sem/Edu_Org Cmp/SgNom Cmp <W:0.0>
; "kássa" N G3 Sem/Ctain_Furn Sg Acc <W:0.0> ADD:2171:sme REMOVE:13055:KillAcc
; "skuvla" N Sem/Edu_Org Cmp/SgNom Cmp <W:0.0>
; "kássa" N G3 Sem/Ctain_Furn Sg Gen Allegro <W:0.0> ADD:2171:sme REMOVE:2511:allegro
; "skuvla" N Sem/Edu_Org Cmp/SgNom Cmp <W:0.0>
:\n

svhum-hsl-m0283:lang-sme lan000$ echo skuvlakássa|smedist
using hfst-tokenize
... pos disambiguating ...
"<skuvlakássa>"
"kássa" N G3 Sem/Ctain_Furn Sg Gen <W:0.0> ADD:2171:sme
"kássa" N G3 Sem/Ctain_Furn Sg Nom <W:0.0> ADD:2171:sme
; "kássa" N G3 Sem/Ctain_Furn Sg Acc <W:0.0> ADD:2171:sme REMOVE:13055:KillAcc
; "kássa" N G3 Sem/Ctain_Furn Sg Gen Allegro <W:0.0> ADD:2171:sme REMOVE:2511:allegro
:\n

echo skuvlakássa |hfst-tokenize --giella-cg --weight-classes=1 tools/tokenisers/tokeniser-disamb-gt-desc.pmhfst |vislcg3 -g tools/tokenisers/mwe-dis.cg3 | vislcg3 -g src/cg3/disambiguator.cg3
"<skuvlakássa>"
"kássa" N G3 Sem/Ctain_Furn Sg Gen <W:0.0>
"skuvla" N Sem/Edu_Org Cmp/SgNom Cmp <W:0.0>
"kássa" N G3 Sem/Ctain_Furn Sg Nom <W:0.0>
"skuvla" N Sem/Edu_Org Cmp/SgNom Cmp <W:0.0>
:\n

svhum-hsl-m0283:lang-sme lan000$ echo skuvlakássa|smedis
using hfst-tokenize
... pos disambiguating ...
"<skuvlakássa>"
"kássa" N G3 Sem/Ctain_Furn Sg Gen <W:0.0>
"kássa" N G3 Sem/Ctain_Furn Sg Nom <W:0.0>
:\n


albbas commented Sep 18, 2020

Comment 14007

Date: 2020-09-18 09:48:24 +0200
From: Chiara Argese <<chiara.argese>>

It looks to me like sent-proc.sh uses different pipelines depending on the input parameters.
If I run it as described in 'usage':

sh sent-proc.sh -t -l sme 'skuvlakassa'

then I get:

using hfst-tokenize
... pos tagging ...
"<skuvlakassa>"
"kássa" N Err/Orth-a-á Sem/Dummytag Sg Acc <W:0.0>

"kássa" N Err/Orth-a-á Sem/Dummytag Sg Gen <W:0.0>

"kássa" N Err/Orth-a-á Sem/Dummytag Sg Gen Allegro <W:0.0>

"kássa" N Err/Orth-a-á Sem/Dummytag Sg Nom <W:0.0>

:\n

and if I print out the command, I get:

echo skuvlakassa | /usr/local/bin/hfst-tokenize --giella-cg --weight-classes=1 /Users/car010/all-gut/giellalt/lang-sme/tools/tokenisers/tokeniser-disamb-gt-desc.pmhfst |cut -f1,2

And if I run that command directly, I get the same result:
echo skuvlakassa | /usr/local/bin/hfst-tokenize --giella-cg --weight-classes=1 /Users/car010/all-gut/giellalt/lang-sme/tools/tokenisers/tokeniser-disamb-gt-desc.pmhfst |cut -f1,2
"<skuvlakassa>"
"kássa" N Err/Orth-a-á Sem/Dummytag Sg Acc <W:0.0>

"kássa" N Err/Orth-a-á Sem/Dummytag Sg Gen <W:0.0>

"kássa" N Err/Orth-a-á Sem/Dummytag Sg Gen Allegro <W:0.0>

"kássa" N Err/Orth-a-á Sem/Dummytag Sg Nom <W:0.0>

:\n

In other words, the first part of the compound does not disappear in the script but in hfst-tokenize.


albbas commented Sep 18, 2020

Comment 14008

Date: 2020-09-18 10:18:55 +0200
From: Lene Antonsen <<lene.antonsen>>

Why cut -f1,2?
Is cut -f1,2 part of the script?

echo skuvlakássa |hfst-tokenize --giella-cg --weight-classes=1 tools/tokenisers/tokeniser-disamb-gt-desc.pmhfst | vislcg3 -g src/cg3/disambiguator.cg3 | cut -f1,2
"<skuvlakássa>"
"kássa" N G3 Sem/Ctain_Furn Sg Gen <W:0.0>

"kássa" N G3 Sem/Ctain_Furn Sg Nom <W:0.0> <sme>

echo skuvlakássa |hfst-tokenize --giella-cg --weight-classes=1 tools/tokenisers/tokeniser-disamb-gt-desc.pmhfst | vislcg3 -g src/cg3/disambiguator.cg3
"<skuvlakássa>"
"kássa" N G3 Sem/Ctain_Furn Sg Gen <W:0.0>
"skuvla" N Sem/Edu_Org Cmp/SgNom Cmp <W:0.0>
"kássa" N G3 Sem/Ctain_Furn Sg Nom <W:0.0>
"skuvla" N Sem/Edu_Org Cmp/SgNom Cmp <W:0.0>

Another thing: the script should also include this part: |vislcg3 -g tools/tokenisers/mwe-dis.cg3 (even though it makes no difference in this case).


albbas commented Sep 18, 2020

Comment 14009

Date: 2020-09-18 10:38:22 +0200
From: Chiara Argese <<chiara.argese>>

I hadn't noticed the cut -f1. Yes, it is in the script. I have fixed it.
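
For readers following along, here is a minimal sketch of why a trailing `cut -f1,2` would drop the first part of the compound. It assumes the CG stream layout that the outputs in this thread appear to use: the wordform at column 0, each reading indented by one tab, and each sub-reading (here the first part, "skuvla") indented by two tabs. The `sample` function is a hypothetical stand-in for the real `hfst-tokenize` output, so nothing here depends on the tokeniser being installed.

```shell
#!/bin/sh
# Hypothetical, hand-written stand-in for the tokeniser output.
# Assumed layout: wordform at column 0, reading after ONE tab,
# sub-reading (the first part of the compound) after TWO tabs.
sample() {
    printf '"<skuvlakássa>"\n'
    printf '\t"kássa" N G3 Sem/Ctain_Furn Sg Nom <W:0.0>\n'
    printf '\t\t"skuvla" N Sem/Edu_Org Cmp/SgNom Cmp <W:0.0>\n'
}

# cut splits on tabs. For the sub-reading line, fields 1 and 2 are both
# empty and the actual text sits in field 3, so -f1,2 leaves only a tab.
# Lines without any tab (the wordform) are passed through unchanged.
kept_before=$(sample | grep -c '"skuvla"' || true)
kept_after=$(sample | cut -f1,2 | grep -c '"skuvla"' || true)

echo "sub-readings before cut: $kept_before"
echo "sub-readings after cut:  $kept_after"
```

Under these assumptions the script reports one sub-reading before `cut` and none after it, which would also account for the empty-looking lines between the readings in the output above.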


albbas commented Sep 18, 2020

Comment 14010

Date: 2020-09-18 11:01:20 +0200
From: Chiara Argese <<chiara.argese>>

Also added mwe-dis.

albbas transferred this issue from giellalt/bugzilla-dummy Sep 3, 2024