
Active Issue: Relating Source

John Wunder edited this page Feb 2, 2016 · 14 revisions

Consensus

Information identifying and characterizing sources of CTI information should be broken out into a separate "top level" Source construct rather than embedded within each "top level" construct.

Open Questions

How should the relationship between a "top level" construct instance and its Source be asserted?

Questions to consider

  1. Should Source follow the "one way of doing things" for relationships, or should it be an exception to the rule?
  2. Is Source a key CTI object or only metadata?
  3. Should there be a distinction between the producer of the STIX and the source of the content?
  4. If so, how should that distinction be conveyed?
  5. How do we deal with anonymous sources?
  6. Should there be a separate Source object each time an anonymous source is asserted, or one general anonymous Source object that every anonymous source assertion relates to?
  7. How do we deal with deanonymizing an anonymous source?
  8. How do we deal with third party source assertions?
  9. How do we deal with complex source chains (e.g., Z sends me STIX that is a translation of STIX produced by Y that was a STIX codification of information created by X)?
  10. How do we deal with uncertainty/confidence on source assertions?
  11. How important is bandwidth efficiency?
  12. What are the best approaches for dealing with the issue?

Proposal #1

Follow the "one way of doing things" for relationships and assert source relationships for all "top level" construct instances using the Relationship object with a relationship nature of "Has Source".

This proposal strongly asserts that Source is a key CTI object and not simply metadata.

Advantages:

  • Consistency (one way of doing relationships)
  • Treats Source as a key CTI object and allows its characterization and correlation like any other CTI object
  • Inherently graph-based to support analysis
  • Enables assertions about both the producer of the STIX itself and the creator of the content
    • In the large majority of cases they will be the same, and this approach allows them to be asserted consistently
  • Enables support for anonymous sources and for deanonymizing sources
  • Supports third party source assertions
  • Inherently supports complex source chains in a consistent fashion
  • Allows assertion of confidence for any source assertion
  • When the exact same content is received from multiple sources, allows each source to be characterized separately (with its own confidence)
  • Supports more flexible pivoting on Source
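As a rough illustration of the source-chain and confidence points above, the Python sketch below walks "Has Source" relationships from an object back through a chain of sources. It assumes, hypothetically, that Source objects can themselves carry "Has Source" relationships (Z has source Y, Y has source X) and that relationships carry a "confidence" field; neither detail is settled by this proposal, and the helper name source_chain is illustrative only.

```python
# Sketch only: assumes Source objects may have their own "Has Source"
# relationships and that relationships carry an optional "confidence" field.

def source_chain(obj_id, relationships):
    """Follow 'Has Source' relationships from obj_id and return the
    (source_id, confidence) hops in order. Stops if a cycle is found."""
    # Index outgoing 'Has Source' edges by the asserting object.
    edges = {}
    for rel in relationships:
        if rel.get("relationship_nature") == "Has Source":
            edges.setdefault(rel["from"], []).append((rel["to"], rel.get("confidence")))

    chain, seen, current = [], {obj_id}, obj_id
    while current in edges:
        # For simplicity, follow only the first asserted source at each hop.
        nxt, conf = edges[current][0]
        if nxt in seen:
            break  # cycle: stop rather than loop forever
        chain.append((nxt, conf))
        seen.add(nxt)
        current = nxt
    return chain

# Z sent me the indicator; Z got it from Y; Y codified information from X.
rels = [
    {"from": "example:ind-1", "to": "example:src-Z", "relationship_nature": "Has Source", "confidence": "High"},
    {"from": "example:src-Z", "to": "example:src-Y", "relationship_nature": "Has Source", "confidence": "High"},
    {"from": "example:src-Y", "to": "example:src-X", "relationship_nature": "Has Source", "confidence": "Medium"},
]
print(source_chain("example:ind-1", rels))
# [('example:src-Z', 'High'), ('example:src-Y', 'High'), ('example:src-X', 'Medium')]
```

Each hop keeps its own confidence, which is what lets a consumer weigh a long chain differently from a direct assertion.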

Disadvantages:

  • Could result in more verbose content (a few extra lines for the "Has Source" relationship of each construct).
    • This can be mitigated by a many-to-one "Has Source" relationship, which would offer the most efficient representation available.
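A minimal sketch of the many-to-one mitigation, assuming (hypothetically) that a "Has Source" relationship's "from" field may hold a list of IDs. That schema is not settled here, and expand_has_source is an illustrative helper that lets consumers still work with simple pairwise edges.

```python
# Sketch only: a list-valued "from" on a relationship is an assumption.

def expand_has_source(rel):
    """Return (object_id, source_id) pairs from a one-to-one or
    many-to-one 'Has Source' relationship."""
    froms = rel["from"] if isinstance(rel["from"], list) else [rel["from"]]
    return [(f, rel["to"]) for f in froms]

many_to_one = {
    "id": "example:rel-1",
    "type": "relationship",
    "from": ["example:ind-1", "example:ind-2", "example:ind-3"],
    "to": "example:src-1",
    "relationship_nature": "Has Source",
}
print(len(expand_has_source(many_to_one)))  # 3 edges carried by one object
```

One relationship object replaces three, at the cost of every consumer having to handle a list-valued endpoint.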

Examples

Example #1: simple indicator with attributed source for the information

{
	"id": "example:src-83dc6b53-ac3d-40e0-82ef-eab173c7ee1e",
	"type": "source",
	"timestamp": "2015-12-21T19:59:11Z",
	"name": "US-CERT"
}

{
	"id": "example:ind-b8e37090-5d62-45a1-ac2e-a88601b08432",
	"type": "indicator",
	"timestamp": "2015-12-21T19:59:11Z",
	"title": "Sakurel Malware",
	"indicator_expression": "this would be an observable pattern for a particular file hash using the new CybOX patterning language under consideration",
	"indicator_type": ["File Hash Watchlist"]
}

{
	"id": "example:rel-9d0c539e-a874-42c7-a055-3e900b98724f",
	"type": "relationship",
	"timestamp": "2015-12-21T19:59:12Z",
	"from": "example:ind-b8e37090-5d62-45a1-ac2e-a88601b08432",
	"to": "example:src-83dc6b53-ac3d-40e0-82ef-eab173c7ee1e",
	"relationship_nature": "Has Source"
}
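With Example #1, a consumer has to scan relationship objects to find an indicator's source. A minimal Python sketch of that lookup follows, using the objects above trimmed to the fields it needs; the helper name sources_of is hypothetical.

```python
SRC = "example:src-83dc6b53-ac3d-40e0-82ef-eab173c7ee1e"
IND = "example:ind-b8e37090-5d62-45a1-ac2e-a88601b08432"

# Objects from Example #1, trimmed to the fields this lookup uses.
bundle = [
    {"id": SRC, "type": "source", "name": "US-CERT"},
    {"id": IND, "type": "indicator", "title": "Sakurel Malware"},
    {"id": "example:rel-9d0c539e-a874-42c7-a055-3e900b98724f", "type": "relationship",
     "from": IND, "to": SRC, "relationship_nature": "Has Source"},
]

def sources_of(obj_id, objects):
    """Return the names of all Sources asserted for obj_id via 'Has Source'."""
    by_id = {o["id"]: o for o in objects}
    names = []
    for o in objects:
        if (o["type"] == "relationship"
                and o.get("relationship_nature") == "Has Source"
                and o["from"] == obj_id
                and o["to"] in by_id):
            names.append(by_id[o["to"]]["name"])
    return names

print(sources_of(IND, bundle))  # ['US-CERT']
```

The lookup scans every relationship; a larger store would likely index relationships by their "from" ID instead.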

Proposal #2 (Twigs...Terry and John)

Consider the "producer" or "creator" of the STIX representation of a construct to be distinct from, and tracked separately from, the sources of information behind that analysis. We believe there is value in knowing which STIX producer created a construct, in addition to knowing where the underlying information came from.

Approach

We propose a created_by_ref field (the name is certainly debatable; it could be information_source_ref) that identifies which STIX producer created the entity.

We do recognize the need to track the source of the information itself. If that source is the same as the STIX producer, created_by_ref already covers it. If it isn't, or if there are derived sources, that could be tracked in a few different ways:

  • Via a list of references, if the common case is simply "we got it from this PDF and we know FireEye made that PDF"
  • Via the relationship object, if sources are commonly more complex and carry varying confidences
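To show how the two options above differ for a consumer, here is a rough sketch that gathers source assertions for an object from either representation. The inline "source_refs" field, the "relationship_type" value "derived-from", and all IDs are hypothetical; source_ref, target_ref, and confidence follow the field names in this proposal's example.

```python
# Sketch only: "source_refs", "relationship_type", and "derived-from"
# are hypothetical names, not part of either proposal.

def source_assertions(obj, relationships):
    """Return (source_id, confidence) pairs for obj from an inline list
    of references and/or relationship objects."""
    # Inline references are cheap but cannot carry per-source confidence.
    pairs = [(ref, None) for ref in obj.get("source_refs", [])]
    # Relationship objects can carry a confidence for each assertion.
    for rel in relationships:
        if (rel.get("source_ref") == obj["id"]
                and rel.get("relationship_type") == "derived-from"):
            pairs.append((rel["target_ref"], rel.get("confidence")))
    return pairs

indicator = {"id": "indicator--1", "source_refs": ["report--pdf"]}
rels = [{"source_ref": "indicator--1", "target_ref": "information-source--fireeye",
         "relationship_type": "derived-from", "confidence": "medium"}]
print(source_assertions(indicator, rels))
```

The inline list is compact for the simple case; the relationship form is what allows confidence on each source.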

The exact approach would still need to be discussed, though we do have a ROUGH proposal there: https://docs.google.com/drawings/d/1IfU0u_5y2ZbyEbmrLIo5nXgcBX-wSP3ssQjAB8-9iks/edit?usp=sharing

Yes, that looks complicated. We tried to represent: who created the STIX content, the original source for that content, and the evidence that the original source used to compile that content. This is a work in progress.

For this proposal, though, we feel that in any sharing ecosystem producers need to take responsibility for and ownership of the STIX representation of the objects they create, in a simple, unambiguous way.

Goals

Assure non-ambiguity

Using an embedded created_by_ref field is an unambiguous way of saying who is responsible for publishing and updating that STIX construct. With the relationship approach, that immediately becomes ambiguous and complicated. While the real source of information may be ambiguous and complicated, the STIX producer should not be. This also lets us enforce rules such as "only the producer of a construct can update it"; if we allow multiple producers, and confidence on producers, that becomes more difficult.
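A minimal sketch of the kind of update rule the embedded field makes easy to enforce. It assumes the consumer already knows which producer published a proposed update (for example, from an authenticated feed or a signature); accept_update and publisher_ref are hypothetical names.

```python
def accept_update(current, proposed, publisher_ref):
    """Accept a new version of an object only if it comes from the producer
    named in the current version's created_by_ref and keeps that field."""
    if current["id"] != proposed["id"]:
        return False  # not a new version of the same object
    if current.get("created_by_ref") != publisher_ref:
        return False  # publisher is not the recorded producer
    # The producer may not silently reassign ownership in an update.
    return proposed.get("created_by_ref") == current.get("created_by_ref")

current = {"id": "indicator--1", "created_by_ref": "information-source--a"}
proposed = {"id": "indicator--1", "created_by_ref": "information-source--a", "title": "v2"}
print(accept_update(current, proposed, "information-source--a"))  # True
print(accept_update(current, proposed, "information-source--b"))  # False
```

With producers asserted through relationships instead, the same check would first have to decide which of possibly several producer relationships is authoritative.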

More compatible with digital signatures

Using the embedded field also ensures that the producer of a construct is a part of the signature block for that top-level content. That way when the content is passed around as a block you can understand who created it and be assured that it's accurate (assuming it was signed by the producer).
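To illustrate the signature point, the sketch below signs a canonical JSON serialization of a construct that includes created_by_ref and shows that changing the producer breaks verification. It uses HMAC-SHA256 from the Python standard library purely as a stand-in; a real deployment would presumably use public-key signatures, and sorted-key JSON is a simplification of proper canonicalization. The key and the "someone-else" reference are made up.

```python
import hashlib
import hmac
import json

def canonical(obj):
    """Serialize deterministically so signer and verifier hash the same bytes."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")

def sign(obj, key):
    # Stand-in for a real digital signature.
    return hmac.new(key, canonical(obj), hashlib.sha256).hexdigest()

def verify(obj, key, signature):
    return hmac.compare_digest(sign(obj, key), signature)

key = b"hypothetical-producer-key"
malware = {
    "type": "malware",
    "id": "malware--26ffb872-1dd9-446e-b6f5-d58527e5b5d2",
    "title": "Some Malware",
    "created_by_ref": "information-source--8ae20dde-83d4-4218-88fd-41ef0dabf9d1",
}
sig = sign(malware, key)
tampered = dict(malware, created_by_ref="information-source--someone-else")
print(verify(malware, key, sig), verify(tampered, key, sig))  # True False
```

Because the producer is a field inside the signed object, reassigning ownership invalidates the signature; a separate "Has Source" relationship would sit outside that signed block.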

Avoids superfluous relationship objects

We feel that the relationship object should be reserved for relationships between objects in the cyber threat domain. Just because you can use relationships to represent everything doesn't mean you should. Using them to represent who created a given STIX construct goes beyond that purpose.

It also avoids either a high volume of extra relationships (one additional relationship for each TLO) or relationships with multiple target nodes. While a relationship with multiple targets is easy to represent in a serialization, handling it in code can become very tricky and should be avoided.

Simplifies information source for relationships

An embedded reference also avoids chains of "source" relationships. For example, if I issue an indicator and then issue a relationship saying that I created it, how do I indicate that it's me saying that I created it? Do I need to have another "source" relationship saying that I'm the source of the first source relationship? Or do we assume that "source" relationships have a source of whoever they point to, which is inconsistent?

Having it embedded in the TLO ensures there's a single, concise way to do that across all TLOs.

Helps prevent false ownership claims

This approach also makes it harder for another party to claim ownership of an existing construct. For example, under the relationship approach, if I issue an indicator I would assert that I created it by issuing a relationship. What if you issue another relationship saying that you actually created that indicator? How should a consumer evaluate that? Embedding the source directly in the object mitigates this, because changing the source requires an update to the object itself, which is easier to detect and evaluate.
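A rough sketch of the two situations a consumer faces. Under a relationship approach, ownership claims are just competing relationships, and the data alone gives no rule for choosing between them; with an embedded created_by_ref, a change of owner shows up as a field difference between two versions of the same object. The function names and IDs are hypothetical, and "Has Source" is reused from Proposal #1 as the claim type.

```python
from collections import defaultdict

def competing_claims(relationships):
    """Relationship approach: map each object to its claimed sources when
    more than one party claims it. The data does not say which claim wins."""
    claims = defaultdict(set)
    for rel in relationships:
        if rel.get("relationship_nature") == "Has Source":
            claims[rel["from"]].add(rel["to"])
    return {obj: sorted(srcs) for obj, srcs in claims.items() if len(srcs) > 1}

def owner_changed(old_version, new_version):
    """Embedded approach: an ownership change is a visible field change
    between two versions of the same object."""
    return (old_version["id"] == new_version["id"]
            and old_version.get("created_by_ref") != new_version.get("created_by_ref"))

rels = [
    {"from": "example:ind-1", "to": "example:src-me", "relationship_nature": "Has Source"},
    {"from": "example:ind-1", "to": "example:src-you", "relationship_nature": "Has Source"},
]
print(competing_claims(rels))
# {'example:ind-1': ['example:src-me', 'example:src-you']}
```

The first function can only report the conflict; the second gives the consumer a concrete event (an update that changed the owner) to evaluate.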

Example

{
  "type": "package",
  "id": "package--7342007e-2b76-4a08-a5db-33b09089b602",
  "sources": [
    {
      "type": "information-source",
      "id": "information-source--8ae20dde-83d4-4218-88fd-41ef0dabf9d1",
      "name": "mitre.org"
    }
  ],
  "malwares": [
    {
      "type": "malware",
      "id": "malware--26ffb872-1dd9-446e-b6f5-d58527e5b5d2",
      "title": "Some Malware",
      "created_by_ref": "information-source--8ae20dde-83d4-4218-88fd-41ef0dabf9d1"
    }
  ],
  "relationships": [
    {
      "type": "relationship",
      "id": "relationship--6b0e3856-95f3-4c04-a882-116832996da1",
      "source_ref": "indicator--26ffb872-1dd9-446e-b6f5-d58527e5b5d2",
      "target_ref": "malware--26ffb872-1dd9-446e-b6f5-d58527e5b5d2",
      "created_by_ref": "information-source--8ae20dde-83d4-4218-88fd-41ef0dabf9d1",
      "confidence": "high"
    }
  ],
  "indicators": [
    {
      "type": "indicator",
      "id": "indicator--26ffb872-1dd9-446e-b6f5-d58527e5b5d2",
      "title": "Some indicator",
      "created_by_ref": "information-source--8ae20dde-83d4-4218-88fd-41ef0dabf9d1"
    }
  ]
}
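A minimal sketch of how a consumer might resolve created_by_ref for every object in a package shaped like the example above. It assumes, as the example suggests, that producers are listed under "sources" and that every other list-valued key holds STIX objects; the helper name creators is hypothetical.

```python
SRC = "information-source--8ae20dde-83d4-4218-88fd-41ef0dabf9d1"

# The package from the example above, trimmed to the fields used here.
package = {
    "type": "package",
    "id": "package--7342007e-2b76-4a08-a5db-33b09089b602",
    "sources": [{"type": "information-source", "id": SRC, "name": "mitre.org"}],
    "malwares": [{"type": "malware", "id": "malware--26ffb872-1dd9-446e-b6f5-d58527e5b5d2",
                  "created_by_ref": SRC}],
    "relationships": [{"type": "relationship", "id": "relationship--6b0e3856-95f3-4c04-a882-116832996da1",
                       "source_ref": "indicator--26ffb872-1dd9-446e-b6f5-d58527e5b5d2",
                       "target_ref": "malware--26ffb872-1dd9-446e-b6f5-d58527e5b5d2",
                       "created_by_ref": SRC, "confidence": "high"}],
    "indicators": [{"type": "indicator", "id": "indicator--26ffb872-1dd9-446e-b6f5-d58527e5b5d2",
                    "created_by_ref": SRC}],
}

def creators(pkg):
    """Map each object ID to its creator's name (None if the ref is unresolved)."""
    names = {s["id"]: s["name"] for s in pkg.get("sources", [])}
    result = {}
    for key, value in pkg.items():
        if key == "sources" or not isinstance(value, list):
            continue  # skip the producer list and scalar fields like "id"
        for obj in value:
            result[obj["id"]] = names.get(obj.get("created_by_ref"))
    return result

for obj_id, creator in creators(package).items():
    print(obj_id, creator)
```

Every object, including the relationship, answers "who said this?" with one field lookup, with no chain of source relationships to follow.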