Is contextual data publicly available? #16

Open
samwagg opened this issue Mar 18, 2024 · 4 comments

Comments

@samwagg

samwagg commented Mar 18, 2024

I noticed that the data submission schema contains a lot of contextual information that is not present in the CSV file, which only contains a list of techniques plus a date. I know that it was noted somewhere that the public dataset is anonymized, but is there any way to access at least some of the contextual data, such as platform and software_name?

I'm also curious as to whether your analysis is available in a more machine friendly format.

Thanks so much for this great resource!

@mticmtic
Contributor

Hi @samwagg, based on the agreement we made with our data contributors, we are not providing the complete dataset, only technique IDs (TIDs) and dates.

And the data is hosted as a CSV, which we have found to be very machine friendly. If CSV doesn't work for you, there are CSV-to-JSON converters online that you can use. This one seems like it would do the trick: https://csvjson.com/.
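If an online converter is not an option, the same CSV-to-JSON conversion can be sketched locally with Python's standard library. This is only a minimal illustration: the column names `technique_id` and `date` and the sample rows are assumptions made up for the example, not the actual header or contents of the published file.

```python
import csv
import io
import json


def csv_to_json(csv_text: str) -> str:
    """Convert CSV text (with a header row) into a JSON array of objects."""
    # DictReader uses the first row as keys for every following row.
    reader = csv.DictReader(io.StringIO(csv_text))
    return json.dumps(list(reader), indent=2)


# Hypothetical sample; the real file's header names may differ.
sample = "technique_id,date\nT1059,2022-01-15\nT1027,2022-02-03\n"
print(csv_to_json(sample))
```

For a file on disk, reading it with `open(path, newline="")` and passing the contents to `csv_to_json` should work the same way.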

@samwagg
Author

samwagg commented Mar 19, 2024

@mticmtic Thanks for the quick reply! CSV is totally fine. But I mean the analytical data presented on the website, such as sightings by industry and sightings by sector.

@mticmtic
Contributor

@samwagg Oh, I see. The more robust detail presented on the website (industry, region, etc.) is not publicly available. That was part of our agreement with our data contributors.

@samwagg
Author

samwagg commented Mar 19, 2024

@mticmtic Gotcha. Thank you for your patience. I just want to make sure it's totally clear that what I'm interested in is the aggregated analytical data that you already present on your website, just in a machine-readable format. Not the raw data. For example, this is a great visualization, and it would be awesome to have the exact percentages it represents too.
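For context, the only aggregate that can be derived from the public CSV is the share of sightings per technique, since the file carries technique IDs and dates only. A rough sketch of that calculation follows; the column name `technique_id` and the sample rows are assumptions for illustration, and this does not reproduce the website's industry or sector breakdowns, which depend on fields that are not published.

```python
import csv
import io
from collections import Counter


def technique_shares(csv_text: str, column: str = "technique_id") -> dict:
    """Return each technique's share of all rows, as a percentage."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(row[column] for row in reader)
    total = sum(counts.values())
    if total == 0:
        return {}
    # most_common() orders techniques from most to least frequent.
    return {tid: 100.0 * n / total for tid, n in counts.most_common()}


# Hypothetical sample rows; real data will differ.
sample = (
    "technique_id,date\n"
    "T1059,2022-01-15\n"
    "T1059,2022-01-20\n"
    "T1027,2022-02-03\n"
    "T1059,2022-02-10\n"
)
for tid, pct in technique_shares(sample).items():
    print(f"{tid}: {pct:.1f}%")
```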
