for a Singlish branch of the English Resource Grammar

siewyeng/SinglishERG

SinglishERG

A branch of the English Resource Grammar (ERG) that is used for Singlish. It is published under the same license as the ERG, the MIT license.

The singlish subdirectory contains the files that have been added or, in the case of parse-nodes.tdl, cloned from the trunk and changed. Refer to ace/config-singlish.tdl and singlish.tdl to see which files are in use.

The most convenient way to compile this grammar using the LKB:

  • Under LKB Top's Advanced menu, select 'Evaluate Lisp expression' and type in (push :singlish *features*)
  • Load the ERG as usual, via Load > Complete Grammar on the LKB Top menu, selecting trunk/lkb/script

To compile this grammar using ace:

  • In the ERG directory, run: $ ace -G singlish.dat -g ace/config-singlish.tdl

To check the semantics:

$ echo "sentence" | ace -g [grammar].dat -Tfq

And to generate in another grammar:

$ echo "sentence" | ace -g [grammar1].dat -Tfq | ace -g [grammar2].dat -e

To generate from MRS:

$ cat [mrs] | ace -g [grammar2].dat -e
  • to change the generation root, add -r [root] to the command
  • and add --disable-subsumption-test for easier generation
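The parse and generate commands above can be wrapped in two small shell functions. This is only a sketch: the function names `sentence_to_mrs` and `paraphrase_across` are hypothetical, and the only ace flags used are the ones already shown in this README.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, ace flags are those used above.

# Print the MRS for one sentence.
#   $1 = sentence, $2 = compiled grammar image (.dat)
sentence_to_mrs() {
    printf '%s\n' "$1" | ace -g "$2" -Tfq
}

# Parse with one grammar, then generate from the resulting MRS with another.
#   $1 = sentence, $2 = parsing grammar, $3 = generating grammar
paraphrase_across() {
    sentence_to_mrs "$1" "$2" | ace -g "$3" -e
}
```

For example, `paraphrase_across "sentence" singlish.dat erg.dat` would parse with the Singlish grammar and generate with a standard ERG image, assuming both images have been compiled under those (illustrative) file names.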

When merged with the ERG, there are a few places outside the singlish directory that refer to the files here:

  • ../singlish.tdl is in the trunk top level directory
  • ../lkb/script has the feature :singlish
    • To compile with the LKB, go to Options > Expand menu, followed by Advanced > Evaluate quick Lisp expression > (push :singlish *features*)
  • ../ace/config-singlish.tdl contains the config for ace
  • there are testsuites at ../tsdb/skeletons/singlish
  • there are gold trees at ../tsdb/gold/singlish

In the GitHub repository these are all local (ignore the ../).

Data

Data was extracted from examples on Wiktionary pages for words that were marked as Singlish. It also includes the other, non-Singlish definitions and usages of those words. Some of the example sentences are offensive and racist, but they were not taken out, as they reflect how this variety is used.

To parse the data using ace (parts in brackets are optional):

  • To parse with only the top tree: cat wikiexamples_next300.txt | ace (--max-words=20) -g singlish.dat -Tf1 (> output.txt)
  • To remove lines starting with "#": grep -vP "^#" wikiexamples_next300.txt | ace...
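The two steps above can be combined into one pipeline. The sketch below is an illustration, not the script used for the project: the function names are hypothetical, and it uses plain `grep -v '^#'` instead of `grep -vP` because the pattern needs no Perl syntax and `-P` is not available in every grep.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical.

# Drop lines that start with "#" (same effect as grep -vP "^#").
#   $@ = input files; reads stdin when no file is given
strip_comments() {
    grep -v '^#' "$@"
}

# Parse a file of sentences, keeping only the top reading of each.
#   $1 = input text file, $2 = compiled grammar, $3 = output file
parse_top_reading() {
    strip_comments "$1" | ace --max-words=20 -g "$2" -Tf1 > "$3"
}
```

A call would look like `parse_top_reading wikiexamples_next300.txt singlish.dat output.txt`.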

lexicon_goldtrees.tdl contains the words added to the standard English lexicon when parsing the 30 Singlish sentences from skeletons/treebankset.

Testsuite

A testsuite first has to be made. Go to the folder containing make_item:

$ ./make_item --map translat i-comment [rawtestsuitename] item

Move the item file that was made (and renamed) into the skeletons directory, then make a testsuite in the trees folder:

$ delphin mkprof -s tsdb/skeletons/[testsuitename]/ trees/[name of newfolder]
$ delphin process -g [grammar].dat trees/[name of newfolder]

Note that the delphin command (provided by PyDelphin) has to be available on your PATH.
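The two delphin steps can be chained so that processing only runs if the profile was created. This is a minimal sketch under the directory layout used above; the function name `make_profile` is hypothetical.

```shell
#!/bin/sh
# Sketch only: make_profile is a hypothetical helper name.

# Build a profile from a skeleton and parse it.
#   $1 = testsuite name under tsdb/skeletons
#   $2 = name of the new profile folder under trees
#   $3 = compiled grammar image (.dat)
make_profile() {
    # Stop early with a clear message if delphin cannot be found.
    if ! command -v delphin >/dev/null 2>&1; then
        echo "delphin not found on PATH" >&2
        return 1
    fi
    delphin mkprof -s "tsdb/skeletons/$1/" "trees/$2" &&
        delphin process -g "$3" "trees/$2"
}
```

For example: `make_profile [testsuitename] [name of newfolder] [grammar].dat`, run from the directory that contains both tsdb and trees.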

The testsuite that was used for development is contained in data/constructed_singlish_testsuite and the one that was used for testing (mentioned in the paper) can be found at data/skeletons/treebankset. The parses (with decision trees) of the treebankset are contained in data/trees.

Viewing results

To view selected combinations of results, use this line with different combinations of 'i-wf' and 'readings' values. This line, for example, selects false negatives (sentences that should parse but give no readings):

$ delphin select 'i-id readings i-input where i-wf = 1 and readings = 0' trees/[name of newfolder]
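The other combinations follow the same pattern. The sketch below names four outcome classes and builds the matching query; the helper names and class labels are hypothetical, and it assumes that i-wf is 1 for sentences that should parse and 0 for those that should not, and that the `>` comparison is accepted in the query.

```shell
#!/bin/sh
# Sketch only: helper names and outcome labels are hypothetical.
# Assumes i-wf = 1 means "should parse" and i-wf = 0 means "should not parse".

# Print the query string for one outcome class.
outcome_query() {
    case "$1" in
        true-positive)  cond='i-wf = 1 and readings > 0' ;;
        false-negative) cond='i-wf = 1 and readings = 0' ;;
        false-positive) cond='i-wf = 0 and readings > 0' ;;
        true-negative)  cond='i-wf = 0 and readings = 0' ;;
        *) echo "unknown outcome: $1" >&2; return 1 ;;
    esac
    printf 'i-id readings i-input where %s\n' "$cond"
}

# Run the selection for one outcome class against a profile.
#   $1 = outcome class, $2 = profile directory
select_outcome() {
    query=$(outcome_query "$1") || return 1
    delphin select "$query" "$2"
}
```

For instance, `select_outcome false-positive trees/[name of newfolder]` would list sentences marked as ill-formed that still received at least one reading.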

Treebanking

$ art -f -a 'ace --disable-generalization -g [grammar].dat -O' trees/[name of newfolder]

The next command launches the browser so that the treebanking can be done.

$ fftb -g [grammar].dat --browser --webdir ~/bin/acetools-x86-0.9.30/assets/ trees/[name of newfolder]

To transfer gold trees, for example from testsuite.16 to testsuite.17:

$ fftb -g [grammar].dat --browser --webdir ~/bin/acetools-x86-0.9.30/assets/ --gold trees/testsuite.16 trees/testsuite.17 --auto
