Data used by concepts-parser.
- connect_words - words that (may) connect concepts: for, of, etc.;
- invalid_concepts (accentless) - known invalid words/concepts: Brown, all, etc.;
- invalid_prefixes (accentless) - words that are not part of the concept that follows them: in "In London", "In" is an invalid prefix;
- known_concepts - irregular known concepts: Dancing with the stars;
- partial_concepts (accentless) - words/concepts that are invalid alone: Barack, Vladimir, etc.;
- split_words - words that (can) split concepts: and, -, etc.;
- valid_prefixes - valid concept prefixes;
- valid_suffixes - valid concept suffixes: district (as in Mumbai City district), island;
- firstnames (accentless) - popular first names;
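Several of the lists above are marked (accentless), which suggests entries are compared after diacritics are removed. The sketch below illustrates that idea for invalid_concepts and partial_concepts. It is not the package's or concepts-parser's actual code: the helper names (toKey, isValidAlone), the hard-coded lists, and the case-insensitive comparison are all assumptions made for illustration.

```javascript
// Hypothetical helper: strip diacritics so "Vladímir" and "Vladimir" compare equal.
// NFD splits accented letters into base letter + combining mark (U+0300..U+036F),
// and the replace() drops those marks.
function accentless(text) {
  return text.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
}

// Assumption: lookups are case-insensitive as well as accentless.
function toKey(text) {
  return accentless(text).toLowerCase();
}

// Hypothetical in-memory stand-ins for the invalid_concepts and partial_concepts data.
const invalidConcepts = new Set(['Brown', 'all'].map(toKey));
const partialConcepts = new Set(['Barack', 'Vladimir'].map(toKey));

// A concept can stand alone only if it is neither invalid nor merely partial.
function isValidAlone(concept) {
  const key = toKey(concept);
  return !invalidConcepts.has(key) && !partialConcepts.has(key);
}

console.log(isValidAlone('Vladímir')); // false
console.log(isValidAlone('London')); // true
```

Normalizing both the stored entries and the input through the same function keeps the comparison symmetric, so an accented spelling in either place still matches.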
Usage:
const data = require('concepts-data');
// get the split words for English:
const splitWords = data.getSplitWords('en');
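The usage example above loads the English split words. How concepts-parser applies them is not described here, so the following is only a hypothetical illustration of "words that (can) split concepts". It assumes the split words are available as a set of lowercase strings and only match whole whitespace-separated tokens; the real return value of getSplitWords may differ (for example, a RegExp), and splitConcept is an invented name.

```javascript
// Hypothetical stand-in for the English split words (and, -, etc.).
const splitWordSet = new Set(['and', '-']);

// Split a candidate concept wherever a whole-token split word occurs.
function splitConcept(concept, words) {
  const parts = [];
  let current = [];
  for (const token of concept.split(/\s+/).filter(Boolean)) {
    if (words.has(token.toLowerCase())) {
      // A split word closes the current part and is itself discarded.
      if (current.length) parts.push(current.join(' '));
      current = [];
    } else {
      current.push(token);
    }
  }
  if (current.length) parts.push(current.join(' '));
  return parts;
}

console.log(splitConcept('Tom and Jerry', splitWordSet)); // [ 'Tom', 'Jerry' ]
console.log(splitConcept('London - Paris', splitWordSet)); // [ 'London', 'Paris' ]
```

Matching whole tokens means a hyphen inside a name such as "Jean-Paul" is left alone; only a standalone "-" splits. A real parser would also need known_concepts to protect names like "Dancing with the stars" that contain connecting words.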
Changelog:
- new first names by country
- added firstnames
- script build-firstnames
- removed data rename_concepts
- data values can be string[] or RegExp[]
- ava tests
- node v4
- added stopwords to invalid_concepts
- TypeScript code
- fix empty data file issue
- engine >= node4
- es6 syntax
- build a single RegExp from a list of data items, for better performance
- fix small errors
- renamed: concept-data to concepts-data
- fix concept split bug
- keep data files in txt format
- added rename_concepts - set a correct/known name for a concept
- get data by lang and country codes
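The changelog notes that data values can be string[] or RegExp[] and that a single RegExp is built from a list of data items for better performance. A minimal sketch of that technique follows: plain strings are escaped, RegExp items contribute their source, and everything is joined into one anchored alternation, so a word is checked with one test() call instead of a loop over every item. The function names, the whole-word anchoring, and the case-insensitive "i" flag are assumptions, not the package's actual implementation.

```javascript
// Escape regex metacharacters so plain string items match literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical builder: combine string and RegExp items into one alternation,
// anchored so the whole input must equal one of the items.
function buildRegExp(items, flags = 'i') {
  const sources = items.map((item) =>
    item instanceof RegExp ? item.source : escapeRegExp(item)
  );
  return new RegExp('^(?:' + sources.join('|') + ')$', flags);
}

const splitRe = buildRegExp(['and', '-', /vs\.?/]);
console.log(splitRe.source); // ^(?:and|-|vs\.?)$
console.log(splitRe.test('AND')); // true
console.log(splitRe.test('or')); // false
```

This sketch keeps only each RegExp item's source, so any flags set on an individual item are replaced by the combined flags; items that depend on their own flags would need separate handling.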