Python 3: ByteRuns #293

Open
dilyanpalauzov opened this issue May 18, 2017 · 3 comments

@dilyanpalauzov

What is the right way to generate byte_runs with Python 3?

import cybox.common
import cybox.objects.file_object

file_object = cybox.objects.file_object.File()
file_object.byte_runs = cybox.common.ByteRuns()
file_object.byte_runs.byte_run = cybox.common.ByteRun()

with open('/bin/ls', 'rb') as f:
    file_object.byte_runs.byte_run[0].byte_run_data = f.read()

with open('test', 'wb') as f:
    f.write(file_object.to_xml())

creates
<cyboxCommon:Byte_Run_Data>b'\x7fELF\x02\x01\x01\....\x00\x00\x00'</cyboxCommon:Byte_Run_Data>

but I would expect CDATA, no b'...' and no \x00.

@gtback
Contributor

gtback commented May 18, 2017

Hi, @dilyanpalauzov, thanks for the question.

The Byte_Run_Data field is not one of the more frequently used fields in CybOX (in my experience), and it is not restricted (it uses xs:anyType), so I don't know how people are using it in practice. XML doesn't allow native NULL bytes (\x00), even with the numeric reference &#0;. The latter is valid in a CDATA block, but I'm not clear whether it is interpreted as a NULL or as the literal characters &, #, 0, and ;. So you might need to do something like base64-encoding the data anyway.
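
One way to check both points is with Python's standard-library parser. This is a minimal sketch, assuming the expat-based `xml.etree.ElementTree` behaves like other conforming XML 1.0 parsers; the element name `a` is only a placeholder.

```python
import xml.etree.ElementTree as ET

# A numeric character reference to NUL is rejected by the parser,
# because U+0000 is not a legal character in XML 1.0.
try:
    ET.fromstring("<a>&#0;</a>")
    nul_reference_accepted = True
except ET.ParseError:
    nul_reference_accepted = False

# Inside a CDATA section, references are not expanded, so the text
# comes back as the four literal characters '&', '#', '0', ';'.
cdata_text = ET.fromstring("<a><![CDATA[&#0;]]></a>").text

print(nul_reference_accepted)
print(repr(cdata_text))
```

If this holds, CDATA does not make a NUL representable; it only changes how those four characters are read, so arbitrary binary data would still need an encoding such as Base64.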

For types that we explicitly expect to need a CDATA wrapper (like an HTML email message), we add the wrapper in the bindings. We can certainly do that for Byte Run Data, too, as well as adding any metadata to show how the data is encoded (for example, Base64). The b'...' wrapper should definitely not be there (and is an artifact of Python 3). Regardless of what we do to encode the data, we'll also need python-cybox to be able to decode the Byte Run XML back into raw data when parsing.
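
As an illustration of that encode/decode round trip, here is a minimal sketch that uses only the standard library rather than python-cybox. The helper names `encode_byte_run_data` and `decode_byte_run_data`, the bare `Byte_Run_Data` element, and the `encoding` attribute are assumptions made for the example; they are not part of the CybOX schema or the python-cybox API. It also shows where the `b'...'` wrapper comes from: calling `str()` on a `bytes` value in Python 3 returns its repr.

```python
import base64
import xml.etree.ElementTree as ET

raw = b"\x7fELF\x02\x01\x01\x00"  # sample bytes, including a NUL

# str() on bytes in Python 3 yields the repr, which is the b'...' artifact.
artifact = str(raw)


def encode_byte_run_data(data: bytes) -> ET.Element:
    """Wrap raw bytes in a (hypothetical) element as Base64 text."""
    elem = ET.Element("Byte_Run_Data", {"encoding": "Base64"})
    elem.text = base64.b64encode(data).decode("ascii")
    return elem


def decode_byte_run_data(elem: ET.Element) -> bytes:
    """Recover the raw bytes from the element's Base64 text."""
    return base64.b64decode(elem.text)


# Serialize to XML, parse it back, and decode to the original bytes.
xml_bytes = ET.tostring(encode_byte_run_data(raw))
round_tripped = decode_byte_run_data(ET.fromstring(xml_bytes))
```

Because the Base64 alphabet is plain ASCII, the serialized element needs neither a CDATA wrapper nor any escaping, and the NUL byte survives the round trip.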

Have you run across Byte Run Data in CybOX "in the wild"? I would be curious what it looks like.

As a side note, if you are planning to represent the entire contents of a file, standard practice is to use the Artifact object instead. I realize that reading the contents of a file could just be an example, though, and your question is valid regardless.

@dilyanpalauzov
Author

Why is Artifact better than Byte_Run for representing a file, which is already partially described in a FileObjectType?

The FileObjectType already contains many properties for a file, like hashes, bits per pixel (for pictures) and so on, and has Byte_Runs. What is Byte_Runs supposed to be used for?

@gtback
Contributor

gtback commented May 19, 2017

The ByteRun type can be used to represent any subset of the bytes in a larger object. I don't recall the exact history, but the Artifact Object can be used as a standalone object; it does not need to be embedded within another Object. The Artifact object has a lot more options for specifying how the binary data is encoded; this is much more expressive than the open-ended Byte_Run_Data field.
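
To make the "subset of the bytes" idea concrete, here is a small sketch in plain Python. The `ByteRunSketch` class and the `extract_byte_run` helper are hypothetical names used for illustration and do not mirror the python-cybox classes; the sketch assumes a byte run is described by an offset, a length, and the bytes in that range.

```python
from dataclasses import dataclass


@dataclass
class ByteRunSketch:
    offset: int  # start position within the larger object
    length: int  # number of bytes in the run
    data: bytes  # the bytes themselves


def extract_byte_run(blob: bytes, offset: int, length: int) -> ByteRunSketch:
    """Return the run of `length` bytes starting at `offset` in `blob`."""
    if offset < 0 or length < 0 or offset + length > len(blob):
        raise ValueError("byte run falls outside the object")
    return ByteRunSketch(offset, length, blob[offset:offset + length])


blob = b"\x7fELF\x02\x01\x01\x00"
magic = extract_byte_run(blob, 0, 4)  # the first four bytes of the sample
```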

There's certainly some duplicated functionality between the two. I've always tended to use the Artifact object, and haven't seen Byte_Run_Data being used.

I'd be happy to add better support for Byte_Run_Data in python-cybox, but would need to know how it actually gets used, to make sure we correctly serialize to and from XML.
