A branch of the English Resource Grammar (ERG) adapted for Singlish. It is published under the same license as the ERG, the MIT license.
The singlish subdirectory contains the files that have been added or, in the case of parse-nodes.tdl, cloned from the trunk and changed. Refer to ace/config-singlish.tdl and singlish.tdl to see which files are in use.
To compile this grammar most conveniently using the LKB:
- Under the Advanced menu of the LKB Top window, select 'Evaluate Lisp expression' and type in
(push :singlish *features*)
- Load the ERG as usual, via Load > Complete Grammar on the LKB Top menu, selecting:
trunk/lkb/script
To compile this grammar using ace:
- In the ERG directory, run:
$ ace -G singlish.dat -g ace/config-singlish.tdl
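If you compile often, the ace command above can be wrapped in a small function that first checks it is being run against the ERG directory. This is a minimal sketch; the helper name build_singlish and its optional directory argument are hypothetical, and only the ace options shown above are used.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compile singlish.dat with ace from the ERG directory.
# usage: build_singlish [path-to-ERG-directory]   (defaults to ".")
build_singlish() {
  local erg_dir=${1:-.}
  # Fail early if the Singlish ace config is not where we expect it.
  if [[ ! -f $erg_dir/ace/config-singlish.tdl ]]; then
    echo "error: $erg_dir/ace/config-singlish.tdl not found" >&2
    return 1
  fi
  # Run in a subshell so the caller's working directory is left unchanged.
  ( cd "$erg_dir" && ace -G singlish.dat -g ace/config-singlish.tdl )
}
```

For example, `build_singlish ~/erg` compiles the grammar into ~/erg/singlish.dat.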
To check the semantics:
$ echo "sentence" | ace -g [grammar].dat -Tfq
To parse with one grammar and generate with another:
$ echo "sentence" | ace -g [grammar1].dat -Tfq | ace -g [grammar2].dat -e
To generate from an MRS:
$ cat [mrs] | ace -g [grammar2].dat -e
- To change the generation root, add -r [root] to the command, and add --disable-subsumption-test to make generation more permissive.
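The parse-then-generate pipeline above can be sketched as a shell function that adds the optional root. This is an illustration only; the function name transfer_generate and its argument order are hypothetical, and it always passes --disable-subsumption-test, which is a choice made here rather than a requirement.

```shell
#!/usr/bin/env bash
# Hypothetical helper: parse a sentence with one grammar and generate
# from the resulting MRS with another grammar.
# usage: transfer_generate "sentence" parse.dat generate.dat [root]
transfer_generate() {
  local sentence=$1 parse_grammar=$2 gen_grammar=$3 root=${4:-}
  # Options for the generating ace process; the subsumption test is
  # disabled so that generation is less strict.
  local gen_opts=(-g "$gen_grammar" -e --disable-subsumption-test)
  if [[ -n $root ]]; then
    gen_opts+=(-r "$root")   # optional: change the generation root
  fi
  # -Tfq as in the README: the first ace outputs the MRS,
  # which the second ace reads on stdin.
  echo "$sentence" | ace -g "$parse_grammar" -Tfq | ace "${gen_opts[@]}"
}
```

For example, `transfer_generate "can lah" singlish.dat erg.dat` generates English realizations of the Singlish parse.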
When merged with the ERG, a few places outside the singlish directory refer to the files here:
- ../singlish.tdl is in the trunk top-level directory.
- ../lkb/script has the feature :singlish. To compile with the LKB, go to Options > Expand menu, then Advanced > Evaluate quick Lisp expression, and enter:
(push :singlish *features*)
- ../ace/config-singlish.tdl contains the config for ace.
- There are testsuites at ../tsdb/skeletons/singlish.
- There are gold trees at ../tsdb/gold/singlish.
In the GitHub repository these are all local (ignore the ../).
Data was extracted from the example sentences on Wiktionary pages for words marked as Singlish. It also includes the other, non-Singlish definitions and usages of those words. Some of the example sentences are offensive and racist, but they were not removed because they reflect how this variety is used.
To parse the data using ace (parts in parentheses are optional):
- To parse with only the top tree:
$ cat wikiexamples_next300.txt | ace (--max-words=20) -g singlish.dat -Tf1 (> output.txt)
- To remove lines starting with "#" first:
$ grep -vP "^#" wikiexamples_next300.txt | ace ...
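The two variants above can be combined into one step that drops comment lines and keeps only the top analysis. A minimal sketch; the helper name parse_examples and its default of 20 words are assumptions, and it expects singlish.dat in the current directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: parse a file of example sentences with singlish.dat,
# skipping lines that start with "#" and keeping only the top tree (-Tf1).
# usage: parse_examples input.txt output.txt [max-words]
parse_examples() {
  local input=$1 output=$2 max_words=${3:-20}
  grep -v '^#' "$input" \
    | ace --max-words="$max_words" -g singlish.dat -Tf1 > "$output"
}
```

For example, `parse_examples wikiexamples_next300.txt output.txt` reproduces the commands above with both optional parts included.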
lexicon_goldtrees.tdl contains the words added to the standard English lexicon when parsing the 30 Singlish sentences from skeletons/treebankset.
First, a testsuite has to be made. Go to the folder containing make_item and run:
$ ./make_item --map translat i-comment [rawtestsuitename] item
Move the item file that was made (and renamed) into the skeletons directory. Then make a testsuite in the trees folder and parse it:
$ delphin mkprof -s tsdb/skeletons/[testsuitename]/ trees/[name of newfolder]
$ delphin process -g [grammar].dat trees/[name of newfolder]
Note that the delphin command (from PyDelphin) has to be installed and on your PATH.
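The mkprof and process steps above can be chained so that parsing only runs if the profile was created. This is a sketch under the assumption that it is run from the directory containing tsdb/ and trees/; the helper name make_testsuite is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create a profile from a skeleton and parse it.
# usage: make_testsuite skeleton-name new-folder-name grammar.dat
make_testsuite() {
  local skeleton=$1 name=$2 grammar=$3
  # The delphin command (PyDelphin) must be available.
  if ! command -v delphin >/dev/null 2>&1; then
    echo "error: delphin not found" >&2
    return 1
  fi
  # Only parse if the profile was created successfully.
  delphin mkprof -s "tsdb/skeletons/$skeleton/" "trees/$name" &&
    delphin process -g "$grammar" "trees/$name"
}
```

For example, `make_testsuite treebankset treebankset.1 singlish.dat` builds and parses a profile from the treebankset skeleton.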
The testsuite that was used for development is contained in data/constructed_singlish_testsuite, and the one that was used for testing (mentioned in the paper) can be found at data/skeletons/treebankset. The parses (with decision trees) of the treebankset are contained in data/trees.
To view selected combinations of results, use this line with different combinations of 'i-wf' and 'readings' values. For example, this line selects false negatives (sentences that should parse but give no readings):
$ delphin select 'i-id readings i-input where i-wf = 1 and readings = 0' trees/[name of newfolder]
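The select query above can be run for all four combinations of 'i-wf' (1 = should parse, 0 = should not) and 'readings' in one go. A sketch only: the helper name report_combinations is hypothetical, and the readings > 0 conditions assume TSQL's > comparison operator, which is not used elsewhere in this README.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run the delphin select query for each combination
# of i-wf and readings on one profile.
# usage: report_combinations trees/[name of newfolder]
report_combinations() {
  local profile=$1 i
  local labels=("false negatives" "true positives" "false positives" "true negatives")
  local conds=(
    "i-wf = 1 and readings = 0"
    "i-wf = 1 and readings > 0"
    "i-wf = 0 and readings > 0"
    "i-wf = 0 and readings = 0"
  )
  for i in "${!labels[@]}"; do
    echo "== ${labels[i]}: ${conds[i]}"
    delphin select "i-id readings i-input where ${conds[i]}" "$profile"
  done
}
```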
To parse the testsuite with ace through art, before treebanking:
$ art -f -a 'ace --disable-generalization -g [grammar].dat -O' trees/[name of newfolder]
The next command launches the browser so that the treebanking can be done:
$ fftb -g [grammar].dat --browser --webdir ~/bin/acetools-x86-0.9.30/assets/ trees/[name of newfolder]
To transfer gold trees, for example from testsuite.16 to testsuite.17:
$ fftb -g [grammar].dat --browser --webdir ~/bin/acetools-x86-0.9.30/assets/ --gold trees/testsuite.16 trees/testsuite.17 --auto
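After a grammar change, the art and fftb commands above can be chained: parse a new profile, then open it in fftb with the gold trees from an older profile. This is a sketch that assumes the new profile already exists (for example, created with delphin mkprof) and that the acetools assets live at the path used above; the helper name update_gold and the WEBDIR variable are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reparse a new profile and transfer gold trees to it.
# usage: update_gold grammar.dat old-profile new-profile
WEBDIR=${WEBDIR:-$HOME/bin/acetools-x86-0.9.30/assets/}

update_gold() {
  local grammar=$1 old=$2 new=$3
  # art runs the quoted ace command over the items of the new profile.
  art -f -a "ace --disable-generalization -g $grammar -O" "trees/$new" || return 1
  # fftb transfers the gold trees from the old profile to the new one.
  fftb -g "$grammar" --browser --webdir "$WEBDIR" \
    --gold "trees/$old" "trees/$new" --auto
}
```

For example, `update_gold singlish.dat testsuite.16 testsuite.17` corresponds to the art command followed by the gold-transfer command above.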