English Knowledge Resources

Roberto Zanoli edited this page May 20, 2015 · 1 revision

###SQL-Based Resources

@TODO: The download links for the resources still point to the old BIUTEE webpage. They should point to the new Maven repository.

Some knowledge resources are stored as MySQL tables, provided as compressed .sql files. In order to use them:

  • Download the resources from the links in the table below. Each file represents one MySQL schema and may contain several knowledge resources. You don't need to download them all; download only the schema files that contain the resources you wish to use.
  • Install the free SQL server MySQL.
  • Install its administration tool MySQL Workbench.
  • Run the server.
  • Connect to the server via MySQL Workbench, and in it:
      • Create a user named db_readonly, with password BIUTEE: ''Users and Privileges --> Add Account''
      • Import the schema files to the database: ''Data Import/Restore --> Import from Dump Project Folder --> (input folder path containing uncompressed .sql files) --> Load Folder Contents --> (select all required schemas) --> Start Import''
      • Make sure user db_readonly has read (SELECT) privileges to all of the tables in the imported schemas.
  • Define an environment variable named MYSQL with a value referring to the MySQL server address (name or IP address) and port. For example: dbsql.cs.biu.ac.il:3306.
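
The last two steps can be sketched in code. The example below is only an illustration under stated assumptions: it parses a `host:port` value such as the one expected in the MYSQL environment variable, builds a MySQL Connector/J style JDBC URL from it, and prints the kind of `GRANT` statement that gives db_readonly read-only access to one schema. The class name `MysqlAddress`, the schema name `bap`, and the `'%'` host pattern are hypothetical; how the platform itself consumes the variable is not described on this page.

```java
/** Hypothetical helper, not part of EOP: interprets a "host:port" value. */
final class MysqlAddress {
    final String host;
    final int port;

    MysqlAddress(String host, int port) {
        this.host = host;
        this.port = port;
    }

    /** Parses a value such as "dbsql.cs.biu.ac.il:3306". */
    static MysqlAddress parse(String value) {
        if (value == null) {
            throw new IllegalArgumentException("MYSQL is not set");
        }
        String trimmed = value.trim();
        int colon = trimmed.lastIndexOf(':');
        if (colon <= 0 || colon == trimmed.length() - 1) {
            throw new IllegalArgumentException("Expected host:port but got: " + value);
        }
        int port = Integer.parseInt(trimmed.substring(colon + 1));
        return new MysqlAddress(trimmed.substring(0, colon), port);
    }

    /** Builds a Connector/J style URL for one imported schema. */
    String jdbcUrl(String schema) {
        return "jdbc:mysql://" + host + ":" + port + "/" + schema;
    }

    /** SQL that grants read-only access on one schema (host pattern '%' is an assumption). */
    static String grantSelect(String schema) {
        return "GRANT SELECT ON `" + schema + "`.* TO 'db_readonly'@'%';";
    }

    public static void main(String[] args) {
        String raw = System.getenv("MYSQL");
        if (raw == null) {
            raw = "dbsql.cs.biu.ac.il:3306"; // example value from this page
        }
        MysqlAddress address = parse(raw);
        // "bap" is a hypothetical schema name used only for illustration.
        System.out.println(address.jdbcUrl("bap"));
        System.out.println(grantSelect("bap"));
    }
}
```

The printed `GRANT` statement can be run from any MySQL client as an alternative to setting the privileges in MySQL Workbench.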

| Schema Name | Knowledge Resources in Configuration | Schema Download | File Size (Compressed) |
|---|---|---|---|
| BAP (Directional Similarity) | BAP | Download | 111 MB |
| Lin Similarity | LIN_DEPENDENCY_ORIGINAL, LIN_PROXIMITY_ORIGINAL | Download | 236 MB |
| Original DIRT | ORIG_DIRT | Download | 55 MB |
| Wikipedia Knowledge Resource | WIKIPEDIA | Download | 214 MB |
| Binary Lin, Dependency Reuters | BINARY_LIN, LIN_DEPENDENCY_REUTERS | Download | 2.4 GB |
| Framenet | FRAMENET | Download | 228 KB |
| Geo (Geographical Knowledge Resource) | GEO | Download | 1.4 MB |
| ReVerb (Distributional Similarity with Global Constraints) | REVERB | Download | 161 MB |

###Redis-based Resources

####Distributional Similarity

Distribution:

  • Redis database files
  • License: MIT license

#####Lexical

Java interface: SimilarityStorageBasedLexicalResource

######Lin proximity-based

Distributional similarity rules for English nouns, adjectives, adverbs, and verbs (which appear at least 10 times in the corpus). The similarities were calculated by applying Lin's method [Lin 1998] on the Reuters RCV1 and RCV2 corpora, without dependency-based features. Top 1000 similarities were selected for each element.

About 57M rules.

Download
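
As a rough illustration of Lin's measure [Lin 1998], and not of the pipeline that produced this resource, the sketch below scores two words from their feature vectors: the weights of the shared features are summed for both words and divided by the total weight of all features of both words. The class name, feature names, and weights are made up for the example; in Lin's formulation the weights are positive mutual-information values.

```java
import java.util.Map;

/** Hypothetical sketch of Lin's similarity over weighted feature vectors. */
final class LinSimilarity {

    /** Returns a score in [0, 1]; both maps hold positive feature weights. */
    static double lin(Map<String, Double> u, Map<String, Double> v) {
        double shared = 0.0;
        for (Map.Entry<String, Double> e : u.entrySet()) {
            Double other = v.get(e.getKey());
            if (other != null) {
                // A shared feature contributes its weight in both vectors.
                shared += e.getValue() + other;
            }
        }
        double total = sum(u) + sum(v);
        return total == 0.0 ? 0.0 : shared / total;
    }

    private static double sum(Map<String, Double> weights) {
        double s = 0.0;
        for (double w : weights.values()) {
            s += w;
        }
        return s;
    }

    public static void main(String[] args) {
        // Made-up proximity features (neighbouring words) and weights.
        Map<String, Double> shore = Map.of("bank", 2.0, "river", 1.0);
        Map<String, Double> vault = Map.of("bank", 1.0, "money", 2.0);
        System.out.println(lin(shore, vault));
    }
}
```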

######Lin dependency-based

Distributional similarity rules for English nouns, adjectives, adverbs, and verbs (which appear at least 10 times in the corpus). The similarities were calculated by applying Lin's method [Lin 1998] on the Reuters RCV1 and RCV2 corpora, with dependency-based features. Top 1000 similarities were selected for each element.

About 58M rules.

Download
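
Both Lin resources keep only the top 1000 similarities for each element. A minimal sketch of that kind of pruning, assuming the candidate scores for one element are already available in a map, could look like the following. The class and method names are hypothetical, and the order of tied scores is left unspecified.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

/** Hypothetical sketch: keep the k highest-scoring neighbours of one element. */
final class TopSimilarities {

    static List<Map.Entry<String, Double>> topK(Map<String, Double> scores, int k) {
        if (k <= 0) {
            return new ArrayList<>();
        }
        Comparator<Map.Entry<String, Double>> byScore = Map.Entry.comparingByValue();
        // Min-heap of size at most k: the weakest kept candidate sits on top.
        PriorityQueue<Map.Entry<String, Double>> heap = new PriorityQueue<>(byScore);
        for (Map.Entry<String, Double> candidate : scores.entrySet()) {
            heap.offer(candidate);
            if (heap.size() > k) {
                heap.poll(); // drop the current weakest candidate
            }
        }
        List<Map.Entry<String, Double>> result = new ArrayList<>(heap);
        result.sort(byScore.reversed()); // strongest first
        return result;
    }

    public static void main(String[] args) {
        // Made-up similarity scores for one element.
        Map<String, Double> scores = Map.of("car", 0.9, "truck", 0.7, "bike", 0.4);
        for (Map.Entry<String, Double> rule : topK(scores, 2)) {
            System.out.println(rule.getKey() + "\t" + rule.getValue());
        }
    }
}
```

A bounded heap avoids sorting the full candidate list, which matters when each element has many more candidates than the number kept.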

######Directional similarities, Reuters

Directional similarity rules for English nouns, adjectives, adverbs, and verbs (which appear at least 10 times in the corpus). The similarities were calculated by applying the balanced AP (bap) measure [Kotlerman et al. 2009, Kotlerman et al. 2010] on the Reuters RCV1 and RCV2 corpora, with dependency-based features. Top 1000 similarities were selected for each element.

About 53M rules for the left side, and about 43M rules for the right side.

Download
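
To make the directional measure more concrete, here is a small sketch of the balanced AP score as it is commonly stated for [Kotlerman et al. 2010]: an inclusion-based average precision (APinc) of the features of u within the features of v, combined with Lin's symmetric similarity through a geometric mean. This is an illustration under that reading of the definition, not the code that produced the resource. The class name, feature names, and weights are made up.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the balanced AP (balAPinc) directional measure. */
final class BalApInc {

    /** Feature names ordered by descending weight (rank 1 first). */
    static List<String> rankedFeatures(Map<String, Double> weights) {
        List<String> features = new ArrayList<>(weights.keySet());
        features.sort(Comparator.comparingDouble((String f) -> weights.get(f)).reversed());
        return features;
    }

    /** APinc(u -> v): how well the top features of u are covered by v. */
    static double apInc(Map<String, Double> u, Map<String, Double> v) {
        if (u.isEmpty()) {
            return 0.0;
        }
        List<String> fu = rankedFeatures(u);
        List<String> fv = rankedFeatures(v);
        Map<String, Integer> rankInV = new HashMap<>();
        for (int i = 0; i < fv.size(); i++) {
            rankInV.put(fv.get(i), i + 1);
        }
        int included = 0;
        double sum = 0.0;
        for (int r = 1; r <= fu.size(); r++) {
            Integer rankV = rankInV.get(fu.get(r - 1));
            if (rankV != null) {
                included++;
                double precision = (double) included / r;                   // P(r)
                double relevance = 1.0 - (double) rankV / (fv.size() + 1);  // rel'(f_r)
                sum += precision * relevance;
            }
        }
        return sum / fu.size();
    }

    /** Lin's symmetric similarity over the same weighted feature vectors. */
    static double lin(Map<String, Double> u, Map<String, Double> v) {
        double shared = 0.0;
        double total = 0.0;
        for (Map.Entry<String, Double> e : u.entrySet()) {
            total += e.getValue();
            Double other = v.get(e.getKey());
            if (other != null) {
                shared += e.getValue() + other;
            }
        }
        for (double w : v.values()) {
            total += w;
        }
        return total == 0.0 ? 0.0 : shared / total;
    }

    /** Geometric mean of Lin similarity and APinc. */
    static double balApInc(Map<String, Double> u, Map<String, Double> v) {
        return Math.sqrt(lin(u, v) * apInc(u, v));
    }

    public static void main(String[] args) {
        // Made-up vectors: a narrow term and a broader one.
        Map<String, Double> narrow = Map.of("a", 1.0);
        Map<String, Double> broad = Map.of("a", 3.0, "b", 2.0, "c", 1.0);
        System.out.println("narrow -> broad: " + balApInc(narrow, broad));
        System.out.println("broad -> narrow: " + balApInc(broad, narrow));
    }
}
```

The two directions generally produce different scores, which is why the resource reports separate rule counts for the left and right sides.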

######Directional similarities, ukWaC

Directional similarity rules for English nouns, adjectives, adverbs, and verbs (which appear at least 10 times in the corpus). The similarities were calculated by applying the balanced AP (bap) measure [Kotlerman et al. 2009, Kotlerman et al. 2010] on the English ukWaC corpus, with dependency-based features. Top 1000 similarities were selected for each element.

About 21M rules for the left side, and about 33M rules for the right side.

Download

#####Syntactic

Java interface: SimilarityStorageBasedDIRTSyntacticResource

######DIRT, Reuters, Redis-based

Distributional similarity rules for English dependency paths (which appear at least 100 times in the corpus). The similarities were calculated by applying the DIRT method [Lin 1998] on the Reuters RCV1 and RCV2 corpora. Top 1000 similarities were selected for each element.

About 10M rules.

Download

######Distributional Similarity based on ReVerb dataset, Redis-based

Distributional similarity rules for English predicates, based on ReVerb extractions [Fader et al. 2011].

Download

####Wikipedia

Distribution:

  • Redis database file
  • License: MIT license
  • Java interface: eu.excitementproject.eop.lexicalminer.redis.RedisBasedWikipediaLexicalResource
  • Download

####Geo (Geographical Knowledge Resource)

Distribution:

  • Redis database file
  • License: MIT license
  • Java interface: eu.excitementproject.eop.core.component.lexicalknowledge.geo.RedisBasedGeoLexicalResource
  • Download