
Tokenize and noun-phrase extraction #119

Merged (46 commits) Nov 12, 2018
Conversation

@amatsuo (Collaborator) commented Jun 8, 2018

This PR implements two new functions discussed in #109 and #117:

  • spacy_tokenize for tokenizing documents either to list or to data.frame
  • spacy_extract_nounphrase for noun-phrase extraction

@kbenoit
We need the following before merging:

  • Tidy up documentation
  • Implement tests for these two functions
  • Increment the version number
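
A minimal usage sketch (assuming spaCy has been initialized; the plural name spacy_extract_nounphrases follows the file added in this PR, and output reflects the later renaming of the type argument):

library(spacyr)
spacy_initialize()

txt <- c(doc1 = "spaCy excels at large-scale natural language processing.")

# tokenize to a named list of character vectors, or to a data.frame
spacy_tokenize(txt)
spacy_tokenize(txt, output = "data.frame")

# extract noun phrases, one row per phrase
spacy_extract_nounphrases(txt)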

@amatsuo requested a review from kbenoit, June 8, 2018 10:33
@codecov-io commented Jun 8, 2018

Codecov Report

Merging #119 into master will decrease coverage by 1%.
The diff coverage is 35.29%.


@@            Coverage Diff             @@
##           master     #119      +/-   ##
==========================================
- Coverage    42.2%   41.19%   -1.01%     
==========================================
  Files           9       11       +2     
  Lines         699      852     +153     
==========================================
+ Hits          295      351      +56     
- Misses        404      501      +97
Impacted Files Coverage Δ
R/parse-extractor-functions.R 65.21% <0%> (-23.02%) ⬇️
R/spacy_extract_nounphrases.R 0% <0%> (ø)
R/spacy_parse.R 76.92% <18.75%> (-15.02%) ⬇️
R/spacy_tokenize.R 77.27% <77.27%> (ø)
R/spacy_initialize.R 44.44% <0%> (+1.16%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 186eb12...586a55f.

kbenoit added 5 commits July 16, 2018 17:53
- Rename `type` to `output`
- Some linting
Many of these are not working - but that's part of test-driven development.
@kbenoit (Collaborator) commented Oct 7, 2018

We have some interesting functionality issues with remove_separators, in particular with repeated separators, tabs, and newlines. The quanteda::tokens() behaviour seems better to me.

> txt <- "One space  two spaces one\ttab\t\ttwo one\nnewline\n\ntwo."
> spacy_tokenize(txt, remove_separators = FALSE)
$t
 [1] "One"     " "       "space"   " "       " "       "two"     " "       "spaces"  " "      
[10] "one"     "\t"      "tab"     "\t\t"    "two"     " "       "one"     "\n"      "newline"
[19] "\n\n"    "two"     "."      

> spacy_tokenize(txt, remove_separators = TRUE)
$t
 [1] "One"     "space"   " "       "two"     "spaces"  "one"     "\t"      "tab"     "\t\t"   
[10] "two"     "one"     "\n"      "newline" "\n\n"    "two"     "."      

> quanteda::tokens(txt, remove_separators = FALSE)
tokens from 1 document.
text1 :
 [1] "One"     " "       "space"   " "       " "       "two"     " "       "spaces"  " "      
[10] "one"     "\t"      "tab"     "\t"      "\t"      "two"     " "       "one"     "\n"     
[19] "newline" "\n"      "\n"      "two"     "."      

> quanteda::tokens(txt, remove_separators = TRUE)
tokens from 1 document.
text1 :
 [1] "One"     "space"   "two"     "spaces"  "one"     "tab"     "two"     "one"     "newline"
[10] "two"     "."  

@kbenoit (Collaborator) left a comment

See the tests I added. They are breaking but we should work on the code until they pass. If we decide the tests are inappropriate, we should discuss that before changing them.

Other changes: I fixed a bug in the padding condition in the Python code, and renamed the separator argument.

What are the merits of adding arguments to match the quanteda behaviour of remove_hyphens, remove_twitter, and remove_symbols? (The last we could easily do on the final R side.)
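
For remove_symbols on the R side, a sketch using the Unicode symbol class via stringi (illustration only, not this PR's implementation):

# drop any token consisting of Unicode symbol characters (\p{S})
toks <- spacy_tokenize("Contains symbols £ ±")
lapply(toks, function(x) x[!stringi::stri_detect_charclass(x, "\\p{S}")])
## drops both "£" and "±", unlike spaCy's SYM tagging (see below)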

remove_twitter in particular behaves very differently for the spacyr version:

> spacy_tokenize("I am @kenbenoit on Twitter #quanteda.")
$t
[1] "I"          "am"         "@kenbenoit" "on"         "Twitter"    "#"         
[7] "quanteda"   "."         

> spacy_tokenize("I am @kenbenoit on Twitter #quanteda.", remove_punct = TRUE)
$t
[1] "I"          "am"         "@kenbenoit" "on"         "Twitter"    "quanteda" 

remove_hyphens comparison:

> txt <- "Jacob Rees-Mogg is a floccinaucinihilipilificator"
> spacy_tokenize(txt)
$t
[1] "Jacob"  "Rees"   "-"      "Mogg"   "is"     "a"      "floccinaucinihilipilificator"

> tokens(txt, remove_hyphens = TRUE)
tokens from 1 document.
text1 :
[1] "Jacob"  "Rees"   "-"      "Mogg"   "is"     "a"      "floccinaucinihilipilificator"

> tokens(txt, remove_hyphens = FALSE)
tokens from 1 document.
text1 :
[1] "Jacob"     "Rees-Mogg" "is"        "a"         "floccinaucinihilipilificator" 

@amatsuo (Collaborator, Author) commented Oct 22, 2018

@kbenoit
Question:

At the moment, the following two lines:

txt <- "This: £ = GBP! 15% not! > 20 percent?"
spacy_tokenize(txt, remove_punct = TRUE, padding = FALSE) %>% dput

return:

list(text1 = c("This", "£", "=", "GBP", "15", "not", ">", "20", "percent"))

That's different from the test expectation. Do you think we should remove characters that may or may not count as punctuation, such as "£"?
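
For reference, the characters that remove_punct keeps here are Unicode symbols rather than punctuation (a quick check with stringi):

stringi::stri_detect_charclass(c("£", "=", ">", "%"), "\\p{P}")
## [1] FALSE FALSE FALSE  TRUE
stringi::stri_detect_charclass(c("£", "=", ">", "%"), "\\p{S}")
## [1]  TRUE  TRUE  TRUE FALSE

So "£", "=", and ">" fall under \p{S} and survive remove_punct, while "%" is punctuation and is removed.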

@kbenoit (Collaborator) commented Nov 2, 2018

There is a discrepancy between what spaCy tags as SYM and the Unicode category classification:

> spacy_tokenize("Contains symbols £ ±", remove_symbols = TRUE)
$text1
[1] "Contains" "symbols"  "±"       

> spacy_parse("Contains symbols £ ±")
  doc_id sentence_id token_id    token   lemma  pos  entity
1  text1           1        1 Contains contain NOUN        
2  text1           1        2  symbols  symbol VERB        
3  text1           1        3        £       £  SYM        
4  text1           1        4        ±       ± NOUN MONEY_B
> stringi::stri_detect_charclass(c("£", "±"), "\\p{S}")
[1] TRUE TRUE

R/spacy_parse.R Outdated
@@ -46,6 +47,7 @@ spacy_parse <- function(x,
lemma = TRUE,
entity = TRUE,
dependency = FALSE,
noun_phrase = FALSE,

Let's remove the underscore, since none of the other arguments have one. So just nounphrase.

@amatsuo (Collaborator, Author) commented Nov 5, 2018

I implemented the first version of the noun_phrase option in spacy_parse(). At the moment it works as below. Since noun phrases do not have categories the way entities do, a different set of fields is returned.

@kbenoit, what do you think?

Code:

library(spacyr)
txt <- c(doc1 = "Natural Language Processing is a branch of computer science that employs various Artificial Intelligence (AI) techniques to process content written in natural language. NLP-enhanced wikis can support users in finding, developing and organizing knowledge contained inside the wiki repository. ", 
  doc2 = "Paul earned a postgraduate degree from MIT.")
spacy_parse(txt, noun_phrase = TRUE)

Output:

   doc_id sentence_id token_id        token        lemma   pos entity
1    doc1           1        1      Natural      natural PROPN  ORG_B
2    doc1           1        2     Language     language PROPN  ORG_I
3    doc1           1        3   Processing   processing PROPN  ORG_I
4    doc1           1        4           is           be  VERB       
5    doc1           1        5            a            a   DET       
6    doc1           1        6       branch       branch  NOUN       
7    doc1           1        7           of           of   ADP       
8    doc1           1        8     computer     computer  NOUN       
9    doc1           1        9      science      science  NOUN       
10   doc1           1       10         that         that   ADJ       
11   doc1           1       11      employs       employ  VERB       
12   doc1           1       12      various      various   ADJ       
13   doc1           1       13   Artificial   artificial PROPN  ORG_B
14   doc1           1       14 Intelligence intelligence PROPN  ORG_I
15   doc1           1       15            (            ( PUNCT  ORG_I
16   doc1           1       16           AI           ai PROPN  ORG_I
17   doc1           1       17            )            ) PUNCT       
18   doc1           1       18   techniques    technique  NOUN       
19   doc1           1       19           to           to  PART       
20   doc1           1       20      process      process  VERB       
21   doc1           1       21      content      content  NOUN       
22   doc1           1       22      written        write  VERB       
23   doc1           1       23           in           in   ADP       
24   doc1           1       24      natural      natural   ADJ       
25   doc1           1       25     language     language  NOUN       
26   doc1           1       26            .            . PUNCT       
27   doc1           2        1          NLP          nlp PROPN  ORG_B
28   doc1           2        2            -            - PUNCT       
29   doc1           2        3     enhanced      enhance  VERB       
30   doc1           2        4        wikis         wiki  NOUN       
31   doc1           2        5          can          can  VERB       
32   doc1           2        6      support      support  VERB       
33   doc1           2        7        users         user  NOUN       
34   doc1           2        8           in           in   ADP       
35   doc1           2        9      finding         find  VERB       
36   doc1           2       10            ,            , PUNCT       
37   doc1           2       11   developing      develop  VERB       
38   doc1           2       12          and          and CCONJ       
39   doc1           2       13   organizing     organize  VERB       
40   doc1           2       14    knowledge    knowledge  NOUN       
41   doc1           2       15    contained      contain  VERB       
42   doc1           2       16       inside       inside   ADP       
43   doc1           2       17          the          the   DET       
44   doc1           2       18         wiki         wiki  NOUN       
45   doc1           2       19   repository   repository  NOUN       
46   doc1           2       20            .            . PUNCT       
47   doc2           1        1         Paul         paul PROPN  ORG_B
48   doc2           1        2       earned         earn  VERB       
49   doc2           1        3            a            a   DET       
50   doc2           1        4 postgraduate postgraduate  NOUN       
51   doc2           1        5       degree       degree  NOUN       
52   doc2           1        6         from         from   ADP       
53   doc2           1        7          MIT          mit PROPN  ORG_B
54   doc2           1        8            .            . PUNCT       
                                       noun_phrase noun_phrase_root_text
1                      Natural Language Processing            Processing
2                      Natural Language Processing            Processing
3                      Natural Language Processing            Processing
4                                             <NA>                  <NA>
5                                         a branch                branch
6                                         a branch                branch
7                                             <NA>                  <NA>
8                                 computer science               science
9                                 computer science               science
10                                            <NA>                  <NA>
11                                            <NA>                  <NA>
12 various Artificial Intelligence (AI) techniques            techniques
13 various Artificial Intelligence (AI) techniques            techniques
14 various Artificial Intelligence (AI) techniques            techniques
15 various Artificial Intelligence (AI) techniques            techniques
16 various Artificial Intelligence (AI) techniques            techniques
17 various Artificial Intelligence (AI) techniques            techniques
18 various Artificial Intelligence (AI) techniques            techniques
19                                            <NA>                  <NA>
20                                            <NA>                  <NA>
21                                         content               content
22                                            <NA>                  <NA>
23                                            <NA>                  <NA>
24                                natural language              language
25                                natural language              language
26                                            <NA>                  <NA>
27                              NLP-enhanced wikis                 wikis
28                              NLP-enhanced wikis                 wikis
29                              NLP-enhanced wikis                 wikis
30                              NLP-enhanced wikis                 wikis
31                                            <NA>                  <NA>
32                                            <NA>                  <NA>
33                                           users                 users
34                                            <NA>                  <NA>
35                                            <NA>                  <NA>
36                                            <NA>                  <NA>
37                                            <NA>                  <NA>
38                                            <NA>                  <NA>
39                                            <NA>                  <NA>
40                                       knowledge             knowledge
41                                            <NA>                  <NA>
42                                            <NA>                  <NA>
43                             the wiki repository            repository
44                             the wiki repository            repository
45                             the wiki repository            repository
46                                            <NA>                  <NA>
47                                            Paul                  Paul
48                                            <NA>                  <NA>
49                           a postgraduate degree                degree
50                           a postgraduate degree                degree
51                           a postgraduate degree                degree
52                                            <NA>                  <NA>
53                                             MIT                   MIT
54                                            <NA>                  <NA>
   noun_phrase_length start_token_id root_token_id
1                   3              1             3
2                   3              1             3
3                   3              1             3
4                  NA             NA            NA
5                   2              5             6
6                   2              5             6
7                  NA             NA            NA
8                   2              8             9
9                   2              8             9
10                 NA             NA            NA
11                 NA             NA            NA
12                  7             12            18
13                  7             12            18
14                  7             12            18
15                  7             12            18
16                  7             12            18
17                  7             12            18
18                  7             12            18
19                 NA             NA            NA
20                 NA             NA            NA
21                  1             21            21
22                 NA             NA            NA
23                 NA             NA            NA
24                  2             24            25
25                  2             24            25
26                 NA             NA            NA
27                  4              1             4
28                  4              1             4
29                  4              1             4
30                  4              1             4
31                 NA             NA            NA
32                 NA             NA            NA
33                  1              7             7
34                 NA             NA            NA
35                 NA             NA            NA
36                 NA             NA            NA
37                 NA             NA            NA
38                 NA             NA            NA
39                 NA             NA            NA
40                  1             14            14
41                 NA             NA            NA
42                 NA             NA            NA
43                  3             17            19
44                  3             17            19
45                  3             17            19
46                 NA             NA            NA
47                  1              1             1
48                 NA             NA            NA
49                  3              3             5
50                  3              3             5
51                  3              3             5
52                 NA             NA            NA
53                  1              7             7
54                 NA             NA            NA

@kbenoit (Collaborator) commented Nov 5, 2018

I think it should operate just as entity does, by marking the start and end of the noun phrase, and then using an extract or consolidate function to extract or combine them. The problem with the format above is that it repeats each noun phrase across its component tokens.

So with entity:

txt3 <- "We analyzed the Supreme Court using natural language processing." 
spacy_parse(txt3, entity = TRUE, nounphrase = FALSE)
#    doc_id sentence_id token_id      token      lemma   pos entity
# 1   text1           1        1         We     -PRON-  PRON       
# 2   text1           1        2   analyzed    analyze  VERB       
# 3   text1           1        3        the        the   DET  ORG_B
# 4   text1           1        4    Supreme    supreme PROPN  ORG_I
# 5   text1           1        5      Court      court PROPN  ORG_I
# 6   text1           1        6      using        use  VERB       
# 7   text1           1        7    natural    natural   ADJ       
# 8   text1           1        8   language   language  NOUN       
# 9   text1           1        9 processing processing  NOUN       
# 10  text1           1       10          .          . PUNCT 

I think nounphrase = TRUE ought to return:

spacy_parse(txt3, entity = FALSE, nounphrase = TRUE)
#    doc_id sentence_id token_id      token      lemma   pos nounphrase
# 1   text1           1        1         We     -PRON-  PRON       
# 2   text1           1        2   analyzed    analyze  VERB       
# 3   text1           1        3        the        the   DET      np_beg
# 4   text1           1        4    Supreme    supreme PROPN      np_mid
# 5   text1           1        5      Court      court PROPN      np_end
# 6   text1           1        6      using        use  VERB       
# 7   text1           1        7    natural    natural   ADJ       
# 8   text1           1        8   language   language  NOUN       
# 9   text1           1        9 processing processing  NOUN       
# 10  text1           1       10          .          . PUNCT 

Then we use code similar to the entity extract and consolidate functions to define nounphrase_extract() and nounphrase_consolidate(). Either of the two consolidate functions would remove the tags of the other, e.g.

spacy_parse(txt3, entity = TRUE, nounphrase = TRUE) %>%
    entity_consolidate()

would remove the nounphrase column altogether (and vice-versa).
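
A rough sketch of nounphrase_extract() along these lines (hypothetical helper, not the PR code; note it naively joins tokens with spaces):

nounphrase_extract_sketch <- function(parsed) {
  # keep only tagged rows; every *beg* tag opens a new phrase
  np <- parsed[!is.na(parsed$nounphrase) & parsed$nounphrase != "", ]
  np$np_id <- cumsum(grepl("beg", np$nounphrase))
  aggregate(token ~ doc_id + np_id, data = np, FUN = paste, collapse = " ")
}

Applied to the tagging illustrated above, this would return one row per phrase, e.g. "the Supreme Court".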

@kbenoit (Collaborator) commented Nov 5, 2018

Need to add some tests as well...

This makes it more consistent with our (non) use of underscores in other parse arguments.
@amatsuo (Collaborator, Author) commented Nov 5, 2018

I understand that we want consistency.

However, in that case we sacrifice the ability to reconstruct the original noun phrases. For instance, in the example above, various Artificial Intelligence (AI) techniques is one of the noun phrases. Concatenating its tokens with spaces will insert spaces before/after the parentheses, and the spacy_parse output carries no trailing-whitespace information.

One possibility is to include a trailing-whitespace flag as a field. That means running a loop over the tokens, but it may be worth it.
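
With such a flag, exact reconstruction is straightforward (sketch; whitespace is the hypothetical trailing-space field):

rebuild_phrase <- function(tokens, whitespace) {
  sep <- ifelse(whitespace, " ", "")
  sep[length(sep)] <- ""  # no separator after the final token
  paste0(tokens, sep, collapse = "")
}
rebuild_phrase(c("various", "Artificial", "Intelligence", "(", "AI", ")", "techniques"),
               c(TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE))
## [1] "various Artificial Intelligence (AI) techniques"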

@amatsuo (Collaborator, Author) commented Nov 5, 2018

Also, the information about the root token is important for some purposes, so the desirable output might be:

spacy_parse(txt3, entity = FALSE, nounphrase = TRUE)
#    doc_id sentence_id token_id      token      lemma   pos nounphrase
# 1   text1           1        1         We     -PRON-  PRON       
# 2   text1           1        2   analyzed    analyze  VERB       
# 3   text1           1        3        the        the   DET      beg
# 4   text1           1        4    Supreme    supreme PROPN      mid
# 5   text1           1        5      Court      court PROPN      end_root
# 6   text1           1        6      using        use  VERB       
# 7   text1           1        7    natural    natural   ADJ       
# 8   text1           1        8   language   language  NOUN       
# 9   text1           1        9 processing processing  NOUN       
# 10  text1           1       10          .          . PUNCT 

The np_ prefix is just taking up memory, so we don't need it. We might also add the trailing-whitespace info.

@amatsuo (Collaborator, Author) commented Nov 5, 2018

I implemented a new version of this option in the noun-phrase-v2 branch. Thoughts?

Output:

txt <- c(doc1 = "Natural Language Processing is a branch of computer science that employs various Artificial Intelligence (AI) techniques to process content written in natural language. NLP-enhanced wikis can support users in finding, developing and organizing knowledge contained inside the wiki repository. ", 
  doc2 = "Paul earned a postgraduate degree from MIT.")
spacy_parse(txt, nounphrase = TRUE)
## Found 'spacy_condaenv'. spacyr will use this environment
## successfully initialized (spaCy Version: 2.0.16, language model: en)
## (python options: type = "condaenv", value = "spacy_condaenv")
##    doc_id sentence_id token_id        token        lemma   pos entity
## 1    doc1           1        1      Natural      natural PROPN  ORG_B
## 2    doc1           1        2     Language     language PROPN  ORG_I
## 3    doc1           1        3   Processing   processing PROPN  ORG_I
## 4    doc1           1        4           is           be  VERB       
## 5    doc1           1        5            a            a   DET       
## 6    doc1           1        6       branch       branch  NOUN       
## 7    doc1           1        7           of           of   ADP       
## 8    doc1           1        8     computer     computer  NOUN       
## 9    doc1           1        9      science      science  NOUN       
## 10   doc1           1       10         that         that   ADJ       
## 11   doc1           1       11      employs       employ  VERB       
## 12   doc1           1       12      various      various   ADJ       
## 13   doc1           1       13   Artificial   artificial PROPN  ORG_B
## 14   doc1           1       14 Intelligence intelligence PROPN  ORG_I
## 15   doc1           1       15            (            ( PUNCT  ORG_I
## 16   doc1           1       16           AI           ai PROPN  ORG_I
## 17   doc1           1       17            )            ) PUNCT       
## 18   doc1           1       18   techniques    technique  NOUN       
## 19   doc1           1       19           to           to  PART       
## 20   doc1           1       20      process      process  VERB       
## 21   doc1           1       21      content      content  NOUN       
## 22   doc1           1       22      written        write  VERB       
## 23   doc1           1       23           in           in   ADP       
## 24   doc1           1       24      natural      natural   ADJ       
## 25   doc1           1       25     language     language  NOUN       
## 26   doc1           1       26            .            . PUNCT       
## 27   doc1           2        1          NLP          nlp PROPN  ORG_B
## 28   doc1           2        2            -            - PUNCT       
## 29   doc1           2        3     enhanced      enhance  VERB       
## 30   doc1           2        4        wikis         wiki  NOUN       
## 31   doc1           2        5          can          can  VERB       
## 32   doc1           2        6      support      support  VERB       
## 33   doc1           2        7        users         user  NOUN       
## 34   doc1           2        8           in           in   ADP       
## 35   doc1           2        9      finding         find  VERB       
## 36   doc1           2       10            ,            , PUNCT       
## 37   doc1           2       11   developing      develop  VERB       
## 38   doc1           2       12          and          and CCONJ       
## 39   doc1           2       13   organizing     organize  VERB       
## 40   doc1           2       14    knowledge    knowledge  NOUN       
## 41   doc1           2       15    contained      contain  VERB       
## 42   doc1           2       16       inside       inside   ADP       
## 43   doc1           2       17          the          the   DET       
## 44   doc1           2       18         wiki         wiki  NOUN       
## 45   doc1           2       19   repository   repository  NOUN       
## 46   doc1           2       20            .            . PUNCT       
## 47   doc2           1        1         Paul         paul PROPN  ORG_B
## 48   doc2           1        2       earned         earn  VERB       
## 49   doc2           1        3            a            a   DET       
## 50   doc2           1        4 postgraduate postgraduate  NOUN       
## 51   doc2           1        5       degree       degree  NOUN       
## 52   doc2           1        6         from         from   ADP       
## 53   doc2           1        7          MIT          mit PROPN  ORG_B
## 54   doc2           1        8            .            . PUNCT       
##    nounphrase whitespace
## 1         beg       TRUE
## 2         mid       TRUE
## 3    end_root       TRUE
## 4        <NA>       TRUE
## 5         beg       TRUE
## 6    end_root       TRUE
## 7        <NA>       TRUE
## 8         beg       TRUE
## 9    end_root       TRUE
## 10       <NA>       TRUE
## 11       <NA>       TRUE
## 12        beg       TRUE
## 13        mid       TRUE
## 14        mid       TRUE
## 15        mid      FALSE
## 16        mid      FALSE
## 17        mid       TRUE
## 18   end_root       TRUE
## 19       <NA>       TRUE
## 20       <NA>       TRUE
## 21   beg_root       TRUE
## 22       <NA>       TRUE
## 23       <NA>       TRUE
## 24        beg       TRUE
## 25   end_root      FALSE
## 26       <NA>       TRUE
## 27        beg      FALSE
## 28        mid      FALSE
## 29        mid       TRUE
## 30   end_root       TRUE
## 31       <NA>       TRUE
## 32       <NA>       TRUE
## 33   beg_root       TRUE
## 34       <NA>       TRUE
## 35       <NA>      FALSE
## 36       <NA>       TRUE
## 37       <NA>       TRUE
## 38       <NA>       TRUE
## 39       <NA>       TRUE
## 40   beg_root       TRUE
## 41       <NA>       TRUE
## 42       <NA>       TRUE
## 43        beg       TRUE
## 44        mid       TRUE
## 45   end_root      FALSE
## 46       <NA>       TRUE
## 47   beg_root       TRUE
## 48       <NA>       TRUE
## 49        beg       TRUE
## 50        mid       TRUE
## 51   end_root       TRUE
## 52       <NA>       TRUE
## 53   beg_root      FALSE
## 54       <NA>      FALSE

@amatsuo (Collaborator, Author) commented Nov 6, 2018

moved to #134 (comment)

@kbenoit (Collaborator) commented Nov 8, 2018

Strangely, I am seeing the following when running a local check:

N  checking R code for possible problems (3.1s)
   get_noun_phrases: no visible global function definition for ‘:=’
   get_noun_phrases: no visible binding for global variable ‘start_id’
   get_noun_phrases: no visible binding for global variable ‘root_id’
   nounphrase_extract.spacyr_parsed: no visible binding for global
     variable ‘nounphrase_id’
   nounphrase_extract.spacyr_parsed: no visible binding for global
     variable ‘token_space’
   nounphrase_extract.spacyr_parsed: no visible binding for global
     variable ‘token’
   spacy_extract_nounphrases.character: no visible binding for global
     variable ‘start_id’
   spacy_extract_nounphrases.character: no visible binding for global
     variable ‘root_id’
   spacy_parse.character: no visible binding for global variable ‘w_id’
   spacy_parse.character: no visible global function definition for ‘.’
   spacy_parse.character: no visible binding for global variable ‘root_id’
   spacy_parse.character: no visible binding for global variable ‘.N’
   spacy_parse.character: no visible binding for global variable
     ‘whitespace’
   Undefined global functions or variables:
     . .N := nounphrase_id root_id start_id token token_space w_id
     whitespace

and also the /inst/doc/ files were getting deleted (I have not committed the deletions):

[screenshot: 2018-11-08 16 22 31]
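
For the NOTEs, the usual remedy for data.table-style non-standard evaluation in a package (a suggestion, not code from this PR) is to declare those names as globals, e.g. in an R/globals.R:

utils::globalVariables(c(
  ".", ".N", ":=", "nounphrase_id", "root_id", "start_id",
  "token", "token_space", "w_id", "whitespace"
))

Alternatively, := can be imported from data.table in the NAMESPACE.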
