Active Issue: Relating Source

John Wunder edited this page Feb 2, 2016 · 14 revisions

Consensus

Information identifying and characterizing sources of CTI information should be broken out into a separate "top level" Source construct rather than embedded within each "top level" construct.

Open Questions

How should the relationship between a "top level" construct instance and its Source be asserted?

Questions to consider

  1. Should Source follow the "one way to do things" with relationships or should it be an exception to the rule?
  2. Is Source a key CTI object or only metadata?
  3. Should there be a distinction between the producer of the STIX and the source of the content?
  4. If so, how should that distinction be conveyed?
  5. How do we deal with anonymous sources?
  6. Should a separate Source object be created each time an anonymous source is asserted, or should one general anonymous Source object be related to for each anonymous source assertion?
  7. How do we deal with deanonymizing an anonymous source?
  8. How do we deal with third party source assertions?
  9. How do we deal with complex source chains (e.g., Z sends me STIX that is a translation of STIX produced by Y that was a STIX codification of information created by X)?
  10. How do we deal with uncertainty/confidence on source assertions?
  11. How important is bandwidth efficiency?
  12. What are the best approaches for dealing with the issue?

Proposal #1

Follow the "one way of doing things" for relationships and assert source relationships for all "top level" construct instances using the Relationship object with a relationship nature of "Has Source".

Strong assertion that Source is a key CTI object and not simply metadata.

Advantages:

  • Consistency (one way of doing relationships)
  • Treats Source as a key CTI object and allows its characterization and correlation like any other CTI object
  • Inherently graph-based to support analysis
  • Enables assertions for both producer of the STIX itself and the creator of the content itself
    • In the large majority of cases they will be the same, and this approach allows them to be asserted consistently
  • Enables support for anonymous sources and for deanonymizing sources
  • Supports third party source assertions
  • Inherently supports complex source chains in a consistent fashion
  • Allows assertion of confidence for any source assertion
  • When the exact same content is received from multiple sources, allows you to characterize them (with confidence) separately
  • Supports more flexible pivoting on Source

Disadvantages:

  • Could result in more verbose content (a few extra lines for the "Has Source" relationship of each construct).
    • Can be mitigated by a many-to-one relationship for "Has Source" which would offer the most efficient representation available.

Examples

Example #1: simple indicator with attributed source for the information

{
	"id": "example:src-83dc6b53-ac3d-40e0-82ef-eab173c7ee1e",
	"type": "source",
	"timestamp": "2015-12-21T19:59:11Z",
	"name": "US-CERT"
}

{
	"id": "example:ind-b8e37090-5d62-45a1-ac2e-a88601b08432",
	"type": "indicator",
	"timestamp": "2015-12-21T19:59:11Z",
	"title": "Sakurel Malware",
	"indicator_expression": "this would be an observable pattern for a particular file hash using the new CybOX patterning language under consideration",
	"indicator_type": ["File Hash Watchlist"]
}

{
	"id": "example:rel-9d0c539e-a874-42c7-a055-3e900b98724f",
	"type": "relationship",
	"timestamp": "2015-12-21T19:59:12Z",
	"from": "example:ind-b8e37090-5d62-45a1-ac2e-a88601b08432",
	"to": "example:src-83dc6b53-ac3d-40e0-82ef-eab173c7ee1e",
	"relationship_nature": "Has Source"
}
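
A consumer could treat these relationships as a graph and walk it to find every source behind an object, including the chained case raised in question 9. The following is a minimal Python sketch, not part of the proposal. The function name `sources_of` is hypothetical, objects are assumed to be plain dicts shaped like Example #1, the shortened IDs are made up for readability, and it assumes a source chain would be expressed as a "Has Source" relationship from one Source to another.

```python
from collections import deque


def sources_of(object_id, relationships):
    """Return the IDs reachable from object_id via "Has Source"
    relationships, in breadth-first order (direct sources first)."""
    # Index "Has Source" edges by their "from" end.
    edges = {}
    for rel in relationships:
        if (rel.get("type") == "relationship"
                and rel.get("relationship_nature") == "Has Source"):
            edges.setdefault(rel["from"], []).append(rel["to"])

    found, seen, queue = [], {object_id}, deque([object_id])
    while queue:
        current = queue.popleft()
        for target in edges.get(current, []):
            if target not in seen:  # guard against cycles in the chain
                seen.add(target)
                found.append(target)
                queue.append(target)
    return found


# Hypothetical chain: indicator -> source Z -> source Y.
relationships = [
    {"type": "relationship", "from": "example:ind-1",
     "to": "example:src-Z", "relationship_nature": "Has Source"},
    {"type": "relationship", "from": "example:src-Z",
     "to": "example:src-Y", "relationship_nature": "Has Source"},
]
print(sources_of("example:ind-1", relationships))
# prints ['example:src-Z', 'example:src-Y']
```

Because each hop is its own relationship object, a confidence value could be attached per hop. The sketch ignores confidence entirely.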

Proposal #2 (Twigs...Terry and John)

Consider the "producer" or "creator" of the STIX representation of a construct as distinct and tracked separately from the sources of information behind that analysis. We believe that there's value in understanding which STIX producer created a construct in addition to understanding where it came from.

Thus, we propose having a created_by_ref field (the name is certainly debatable; it could be information_source_ref) that defines which STIX producer created the entity. This field serves several purposes:

  1. It's an unambiguous way of saying who is responsible for publishing and updating that STIX construct. With the relationship approach, that immediately becomes ambiguous and complicated. While the real source of information may be ambiguous and complicated, the STIX producer should not be.
  2. If we end up using digital signatures, it provides a field within the object that can be a part of the signature content to ensure that the producer is accurate.
  3. It avoids creating extra relationship objects that must be separately tracked for each construct.
  4. It avoids endless relationship chains to assert who the source is for a relationship. For example, if I issue an indicator and then issue a relationship saying that I created it, how do I indicate that it's me saying that I created it? Do I just add a "source" relationship to that relationship? Or do we assume that "source" relationships have a source of whoever they point to?
  5. It keeps the relationship object from becoming a generic "this data is related to this data" construct and reserves it for specific "there's a cyber threat intel domain relationship here" assertions.
  6. It makes it harder for another party to claim ownership of an existing construct. For example, if I issue an indicator, I would say that I created it by issuing a relationship. What if you issue another relationship saying that you actually created that indicator? How should a consumer evaluate that? Embedding the source directly in the object mitigates this: changing the source requires an object update, which, as we've discussed previously, may only be done by the object owner.
  7. It simplifies the description of who is asserting that they created the content. For example,

This field would align with the Information_Source field in STIX 1.2.
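
Point 6 in the list above implies a simple consumer-side rule: an update is only honored when it comes from the producer recorded in the object's created_by_ref. The Python sketch below is one possible reading of that rule, not a defined STIX behavior. The function name `accept_update` and the `sender_ref` parameter are hypothetical, and the sketch assumes the sender's identity has already been established out of band (for example, by verifying a digital signature).

```python
def accept_update(existing, update, sender_ref):
    """Return True only if `update` targets the same object as `existing`
    and was sent by the producer recorded in existing["created_by_ref"].

    sender_ref is the ID of the producer that actually sent the update;
    how it is authenticated is outside this sketch (assumption).
    """
    if update["id"] != existing["id"]:
        return False
    return sender_ref == existing["created_by_ref"]


# Hypothetical objects with shortened IDs.
existing = {"id": "indicator--1", "created_by_ref": "information-source--A"}
update = {"id": "indicator--1", "created_by_ref": "information-source--B"}

print(accept_update(existing, update, "information-source--A"))  # owner
print(accept_update(existing, update, "information-source--B"))  # third party
```

Under this rule the second call is rejected, so a third party cannot take over the object simply by asserting a different creator.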

There are several potential approaches for tracking the source of the information itself (which, many times, is multiple sources):

  • Using the relationship object, if confidence is desired
  • Using a list of ID references, if the very common use case is "we got it from this PDF"

Example

{
  "type": "package",
  "id": "package--7342007e-2b76-4a08-a5db-33b09089b602",
  "sources": [
    {
      "type": "information-source",
      "id": "information-source--8ae20dde-83d4-4218-88fd-41ef0dabf9d1",
      "name": "mitre.org"
    }
  ],
  "malwares": [
    {
      "type": "malware",
      "id": "malware--26ffb872-1dd9-446e-b6f5-d58527e5b5d2",
      "title": "Some Malware",
      "created_by_ref": "information-source--8ae20dde-83d4-4218-88fd-41ef0dabf9d1"
    }
  ],
  "relationships": [
    {
      "type": "relationship",
      "id": "relationship--6b0e3856-95f3-4c04-a882-116832996da1",
      "source_ref": "indicator--26ffb872-1dd9-446e-b6f5-d58527e5b5d2",
      "target_ref": "malware--26ffb872-1dd9-446e-b6f5-d58527e5b5d2",
      "created_by_ref": "information-source--8ae20dde-83d4-4218-88fd-41ef0dabf9d1",
      "confidence": "high"
    }
  ],
  "indicators": [
    {
      "type": "indicator",
      "id": "indicator--26ffb872-1dd9-446e-b6f5-d58527e5b5d2",
      "title": "Some indicator",
      "created_by_ref": "information-source--8ae20dde-83d4-4218-88fd-41ef0dabf9d1"
    }
  ]
}
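
With the producer embedded in each object, a consumer can resolve it with a single lookup instead of searching for relationship objects. A minimal Python sketch of that lookup follows. The function name `creator_names` is hypothetical, the package is a trimmed copy of the example above, and the sketch assumes every list-valued field other than "sources" holds objects that may carry created_by_ref.

```python
def creator_names(package):
    """Map each object ID in the package to the name of the source
    referenced by its created_by_ref (None if it cannot be resolved)."""
    names = {src["id"]: src["name"] for src in package.get("sources", [])}
    result = {}
    for key, value in package.items():
        # Skip scalar fields and the source list itself.
        if key == "sources" or not isinstance(value, list):
            continue
        for obj in value:
            result[obj["id"]] = names.get(obj.get("created_by_ref"))
    return result


SOURCE_ID = "information-source--8ae20dde-83d4-4218-88fd-41ef0dabf9d1"
package = {
    "type": "package",
    "id": "package--7342007e-2b76-4a08-a5db-33b09089b602",
    "sources": [
        {"type": "information-source", "id": SOURCE_ID, "name": "mitre.org"}
    ],
    "malwares": [
        {"type": "malware",
         "id": "malware--26ffb872-1dd9-446e-b6f5-d58527e5b5d2",
         "title": "Some Malware",
         "created_by_ref": SOURCE_ID}
    ],
    "indicators": [
        {"type": "indicator",
         "id": "indicator--26ffb872-1dd9-446e-b6f5-d58527e5b5d2",
         "title": "Some indicator",
         "created_by_ref": SOURCE_ID}
    ],
}

for object_id, name in creator_names(package).items():
    print(object_id, "created by", name)
```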