Relation Extraction

A library that creates vertices and edges from a document annotated with cyber-entity labels. It uses a set of SVMs and feature models to predict relationships between these cyber entities.

Entity Types

Software
- Vendor
- Product
- Version
File
- Name
Function
- Name
Vulnerability
- Name
- Description
- CVE
- MS

Relationship Types

ExploitTargetRelatedObservable Edge

  Exploit Target (e.g. vulnerability) --> Observable (e.g. software)

Sub-Observable Edge

  Observable (e.g. software) --> Observable (e.g. file)

Software, File, Function, Vulnerability Vertices

  Software/file/function/vulnerability properties are part of the same vertex
  
  Example: "... **MS15-035**, which addresses a **remote code execution** bug ..."
  "MS15-035" is extracted as a vulnerability MS property, and "remote code execution" is extracted as a vulnerability description property. This type of relationship indicates that both properties are describing the same vulnerability object.
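
As a rough illustration of how several extracted properties can end up on one vertex, here is a minimal Python sketch. The helper `build_vertex`, the vertex ID `v1`, and the property keys `ms` and `description` are assumptions made for this example; they are not taken from the library's code.

```python
def build_vertex(name, vertex_type, properties, source):
    """Combine the extracted properties of one entity into a single vertex dict.

    Hypothetical helper: the key names mirror the JSON output format,
    but this is not the library's actual API.
    """
    vertex = {"name": name, "vertexType": vertex_type, "source": source}
    # Every property describing the same object is stored on the same vertex.
    vertex.update(properties)
    return vertex


# "MS15-035" and "remote code execution" describe the same vulnerability,
# so both become properties of one vulnerability vertex.
vuln = build_vertex(
    name="v1",  # hypothetical vertex ID
    vertex_type="vulnerability",
    properties={"ms": "MS15-035", "description": "remote code execution"},
    source="CNN",
)
```

The point of the sketch is only that no edge is created between the two properties; they are merged into one object.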

Input

The output of the Entity-Extractor as an Annotation object, which holds the document's sentences and the list of words from the text, along with each word's part-of-speech tag and cyber-domain label.
The name of the document's source, as a String
The document's title, as a String

Current Process

Pre-trained Word2Vec model
Pre-trained SVM models, one for each combination of relationship type and the entities' order of appearance
Pre-generated feature maps, one for each combination of relationship type and the entities' order of appearance
NVD XML files are used to find examples of the relationships
For each annotated document:
1. Use NVD files to find known examples of relationships in the document
2. Use Word2Vec model to encode each token of the document
3. Use feature maps to generate feature vectors for each token of the document
4. Use pre-trained SVM models with the document's feature vectors to predict relationships between cyber entities

Note: Refer to the relation-bootstrap repo for more information on the process.
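
Steps 2 through 4 can be sketched in a few lines of Python. This is a toy under stated assumptions, not the library's implementation: `EMBEDDINGS` stands in for the Word2Vec model, averaging the vectors of the tokens between two entities stands in for the feature maps, and a hand-written linear decision function stands in for a pre-trained SVM. All names and numbers are hypothetical, and the NVD lookup of step 1 is omitted.

```python
from itertools import combinations

# Hypothetical stand-in for a pre-trained Word2Vec model: token -> vector.
EMBEDDINGS = {"affects": [1.0, 0.0], "unrelated": [0.0, -1.0]}
UNKNOWN = [0.0, 0.0]

# Hypothetical linear models (weights, bias), keyed by the entity types
# in their order of appearance, as the README describes.
SVM_MODELS = {("vulnerability", "software"): ([2.0, 1.0], -0.5)}


def encode(tokens):
    """Step 2: map each token to its embedding vector."""
    return [EMBEDDINGS.get(token.lower(), UNKNOWN) for token in tokens]


def feature_vector(vectors):
    """Step 3 (simplified): average the context token vectors."""
    if not vectors:
        return list(UNKNOWN)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def predict(model, features):
    """Step 4: a linear decision function, sign(w . x + b)."""
    weights, bias = model
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return score > 0


def predict_relations(tokens, entities):
    """entities: list of (token_index, entity_type) in order of appearance."""
    relations = []
    for (i, type_a), (j, type_b) in combinations(entities, 2):
        model = SVM_MODELS.get((type_a, type_b))
        if model is None:
            continue  # no model for this pair of types in this order
        context = encode(tokens[i + 1:j])
        if predict(model, feature_vector(context)):
            relations.append((i, j))
    return relations


tokens = ["MS15-035", "affects", "Windows"]
entities = [(0, "vulnerability"), (2, "software")]
relations = predict_relations(tokens, entities)
```

Keying the models by the ordered pair of entity types reflects the "one for each relationship and order of appearance" arrangement listed above; the feature design itself is documented in the relation-bootstrap repo.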

Output

The output is a JSON-formatted subgraph of the vertices and edges, which loosely resembles the STIX data model:

 {
 	"vertices": {
 		"1235": {
 			"name": "1235",
 			"vertexType": "software",
 			"product": "Windows XP",
 			"vendor": "Microsoft",
 			"source": "CNN"
 		},
 		...
 		"1240": {
 			"name": "file.php",
 			"vertexType": "file",
 			"source": "CNN"
 		}
 	},
 	"edges": [
 		{
 			"inVertID": "1237",
 			"outVertID": "1238",
 			"relation": "ExploitTargetRelatedObservable"
 		},
 		{
 			"inVertID": "1240",
 			"outVertID": "1239",
 			"relation": "Sub-Observable"
 		}
 	]
 }
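
A consumer of this output might walk the edges and look up each endpoint in the `vertices` map. The Python sketch below assumes the field names shown above; the sample document is a shortened, hypothetical subgraph written for this example, and `describe_edges` is not part of the library.

```python
import json

# Hypothetical sample in the same shape as the output above.
SAMPLE = """
{
  "vertices": {
    "1235": {"name": "1235", "vertexType": "software",
             "product": "Windows XP", "vendor": "Microsoft", "source": "CNN"},
    "1240": {"name": "file.php", "vertexType": "file", "source": "CNN"}
  },
  "edges": [
    {"inVertID": "1240", "outVertID": "1235", "relation": "Sub-Observable"}
  ]
}
"""


def describe_edges(subgraph):
    """Return one readable line per edge, resolving vertex types by ID."""
    vertices = subgraph["vertices"]
    lines = []
    for edge in subgraph["edges"]:
        # Fall back to an empty dict if an edge references a missing vertex.
        out_v = vertices.get(edge["outVertID"], {})
        in_v = vertices.get(edge["inVertID"], {})
        lines.append(
            f'{out_v.get("vertexType", "?")} {edge["outVertID"]} '
            f'--{edge["relation"]}--> '
            f'{in_v.get("vertexType", "?")} {edge["inVertID"]}'
        )
    return lines


subgraph = json.loads(SAMPLE)
summary = describe_edges(subgraph)
```

Because `vertices` is keyed by ID, each edge endpoint resolves with a single dictionary lookup.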
