relation extraction
testak edited this page Aug 31, 2017
A library that creates vertices and edges from a document annotated with cyber-entity labels. It uses a set of SVMs and feature models to predict relationships between these cyber entities.
- Software
  - Vendor
  - Product
  - Version
- File
  - Name
- Function
  - Name
- Vulnerability
  - Name
  - Description
  - CVE
  - MS
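The entity types and their properties listed above can be pictured as a small lookup table. The following is only a sketch: the dictionary name, the lowercase keys, and the `entity.property` label format are assumptions made for illustration, not the labels the library actually emits.

```python
# Hypothetical schema mirroring the entity/property list above.
ENTITY_PROPERTIES = {
    "software": ["vendor", "product", "version"],
    "file": ["name"],
    "function": ["name"],
    "vulnerability": ["name", "description", "cve", "ms"],
}


def label_for(entity, prop):
    """Build an illustrative label such as 'vulnerability.ms'.

    Raises ValueError when the property is not listed for the entity.
    """
    if prop not in ENTITY_PROPERTIES.get(entity, []):
        raise ValueError(f"unknown label: {entity}.{prop}")
    return f"{entity}.{prop}"


ms_label = label_for("vulnerability", "ms")
```

Keeping the schema in one table makes it easy to reject labels that do not belong to any known entity type.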
- ExploitTargetRelatedObservable Edge
  Exploit Target (e.g. vulnerability) --> Observable (e.g. software)
- Sub-Observable Edge
  Observable (e.g. software) --> Observable (e.g. file)
- Software, File, Function, Vulnerability Vertices
  Software/file/function/vulnerability properties are part of the same vertex.
  Example: "... **MS15-035**, which addresses a **remote code execution** bug ..."
  "MS15-035" is extracted as a vulnerability MS property, and "remote code execution" is extracted as a vulnerability description property. This type of relationship indicates that both properties describe the same vulnerability object.
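To make the three relationship types concrete, here is a minimal sketch. The `Vertex` class, `merge_same_vertex`, and `EDGE_DIRECTIONS` are hypothetical names, and the endpoint types in `EDGE_DIRECTIONS` are just the "e.g." examples given above, not an exhaustive rule.

```python
from dataclasses import dataclass, field

# Illustrative (source type, target type) pairs taken from the examples above.
EDGE_DIRECTIONS = {
    "ExploitTargetRelatedObservable": ("vulnerability", "software"),
    "Sub-Observable": ("software", "file"),
}


@dataclass
class Vertex:
    vertex_type: str
    properties: dict = field(default_factory=dict)


def merge_same_vertex(vertex_type, labeled_phrases):
    """Fold (property, text) pairs that describe one object into a single vertex."""
    vertex = Vertex(vertex_type)
    for prop, text in labeled_phrases:
        vertex.properties[prop] = text
    return vertex


# The MS15-035 example: both phrases end up as properties of one vulnerability.
vuln = merge_same_vertex(
    "vulnerability",
    [("ms", "MS15-035"), ("description", "remote code execution")],
)
```

The first two relationship types produce edges between distinct vertices, while the third produces no edge at all: it only decides which properties share a vertex.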
- Output from the Entity-Extractor as an Annotation object, which represents the sentences and the list of words from the text, along with each word's part-of-speech tag and cyber-domain label.
- The String name of the document's source
- The String name of the document's title
- Pre-trained Word2Vec model
- Pre-trained SVM models, one for each relationship and entities' order of appearance
- Pre-generated feature maps, one for each relationship and entities' order of appearance
- NVD XML files are used to find examples of the relationships
- For each Annotated document:
  - Use NVD files to find known examples of relationships in document
  - Use Word2Vec model to encode each token of the document
  - Use feature maps to generate feature vectors for each token of the document
  - Use pre-trained SVM models with the document's feature vectors to predict relationships between cyber entities
Note: Refer to relation-bootstrap repo for more information on the process.
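The encode, featurize, and predict steps can be sketched with toy values. Everything here is an assumption for illustration: the two-dimensional embeddings, the concatenation-based feature vector, and the hand-picked linear SVM weights stand in for the pre-trained Word2Vec model, feature maps, and SVM models, whose real definitions live in the relation-bootstrap repo.

```python
# Toy stand-in for a pre-trained Word2Vec model (hypothetical values).
EMBEDDINGS = {
    "MS15-035": [1.0, 0.0],
    "Windows": [0.0, 1.0],
}
UNKNOWN = [0.0, 0.0]


def encode(token):
    """Look up a token's vector, falling back to zeros for unseen tokens."""
    return EMBEDDINGS.get(token, UNKNOWN)


def feature_vector(first_entity, second_entity):
    """Hypothetical feature model: concatenate the two entity embeddings."""
    return encode(first_entity) + encode(second_entity)


def linear_svm_predict(weights, bias, features):
    """Linear SVM decision rule: predict a relationship when w.x + b > 0."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return score > 0


# Hand-picked weights for one relationship and one order of appearance.
WEIGHTS = [1.0, -1.0, -1.0, 1.0]
BIAS = -1.0

related = linear_svm_predict(WEIGHTS, BIAS, feature_vector("MS15-035", "Windows"))
reversed_order = linear_svm_predict(WEIGHTS, BIAS, feature_vector("Windows", "MS15-035"))
```

With these toy weights the pair is predicted as related only in one order, which hints at why a separate model is kept for each order of appearance.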
- A JSON-formatted subgraph of the vertices and edges is created, which loosely resembles the STIX data model

      {
        "vertices": {
          "1235": {
            "name": "1235",
            "vertexType": "software",
            "product": "Windows XP",
            "vendor": "Microsoft",
            "source": "CNN"
          },
          ...
          "1240": {
            "name": "file.php",
            "vertexType": "file",
            "source": "CNN"
          }
        },
        "edges": [
          {
            "inVertID": "1237",
            "outVertID": "1238",
            "relation": "ExploitTargetRelatedObservable"
          },
          {
            "inVertID": "1240",
            "outVertID": "1239",
            "relation": "Sub-Observable"
          }
        ]
      }
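A consumer of this output might resolve each edge back to its endpoint vertices. The sketch below uses a small, self-contained sample rather than the elided output above: the edge linking vertices 1235 and 1240 is hypothetical, and the reading of `outVertID` as the edge's source and `inVertID` as its target is an assumption based on the file vertex appearing as `inVertID` of a Sub-Observable edge.

```python
import json

# Hypothetical, complete sample in the same shape as the output above.
SAMPLE = """
{
  "vertices": {
    "1235": {"name": "1235", "vertexType": "software",
             "product": "Windows XP", "vendor": "Microsoft", "source": "CNN"},
    "1240": {"name": "file.php", "vertexType": "file", "source": "CNN"}
  },
  "edges": [
    {"inVertID": "1240", "outVertID": "1235", "relation": "Sub-Observable"}
  ]
}
"""


def describe_edges(subgraph):
    """Render each edge as 'sourceType --relation--> targetType'."""
    vertices = subgraph["vertices"]
    lines = []
    for edge in subgraph["edges"]:
        out_v = vertices.get(edge["outVertID"])
        in_v = vertices.get(edge["inVertID"])
        if out_v is None or in_v is None:
            continue  # skip edges whose endpoints are not in this subgraph
        lines.append(
            f'{out_v["vertexType"]} --{edge["relation"]}--> {in_v["vertexType"]}'
        )
    return lines


edges = describe_edges(json.loads(SAMPLE))
```

Because vertices are keyed by ID, each edge lookup is a plain dictionary access, and edges that point outside the subgraph are skipped rather than raising an error.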