Convert Proteomics Thermofischer Raw data to mzML open format #23

Open
proccaserra opened this issue Jul 10, 2019 · 3 comments
Labels
Resolute Issue associated with RESOLUTE project Task 3.2.2 FAIR implementation of IMI data types, projects and databases

Comments

@proccaserra
Collaborator

@ulo please see:
https://github.com/compomics/ThermoRawFileParser

The tool is described in the following manuscript (preprint):
https://www.biorxiv.org/content/10.1101/622852v1.full

It would help with releasing the dataset publicly, and with future releases.
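A minimal Python sketch of invoking ThermoRawFileParser on one file. It assumes the `-i`, `-o` and `-f` options and the numeric format codes described in the tool's README (0 = MGF, 1 = mzML, 2 = indexed mzML, 3 = Parquet). The helper name `build_trfp_command` and the default executable name are hypothetical, so check `--help` on the installed version before relying on it.

```python
from pathlib import Path

# Output format codes as listed in the ThermoRawFileParser README (assumption:
# verify against the installed release, options can change between versions).
FORMATS = {"mgf": 0, "mzml": 1, "indexed_mzml": 2, "parquet": 3}


def build_trfp_command(raw_file, output_dir, fmt="mzml",
                       executable="ThermoRawFileParser.exe"):
    """Return the argument list for converting one Thermo .raw file.

    Hypothetical helper. `executable` may be a string or a list, e.g.
    ["mono", "ThermoRawFileParser.exe"] for older releases run under Mono.
    """
    raw_path = Path(raw_file)
    if raw_path.suffix.lower() != ".raw":
        raise ValueError(f"expected a .raw file, got {raw_path.name}")
    if fmt not in FORMATS:
        raise ValueError(f"unknown format {fmt!r}; choose from {sorted(FORMATS)}")
    exe = [executable] if isinstance(executable, str) else list(executable)
    # -i: input file, -o: output directory, -f: output format code
    return exe + [f"-i={raw_path}", f"-o={Path(output_dir)}", f"-f={FORMATS[fmt]}"]
```

The returned list can be passed to `subprocess.run(cmd, check=True)`; building the arguments separately keeps them easy to log and to loop over a directory of RAW files.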

@ulo
Collaborator

ulo commented Jul 10, 2019

Thanks for the input.
Yes, the currently community-accepted open format for proteomics is mzML.
In the past, I have used msconvert (http://proteowizard.sourceforge.net/tools/msconvert.html) for this conversion step. I think we should publish the open-format files in addition to the proprietary raw format.

@sgtp sgtp added the Resolute Issue associated with RESOLUTE project label Jul 10, 2019
@mcourtot mcourtot added this to the F2F meeting October 2019 milestone Jul 10, 2019
@sgtp sgtp changed the title Proteomics Thermofischer Raw data conversion to mzML open format Convert Proteomics Thermofischer Raw data to mzML open format Jul 10, 2019
@mcourtot mcourtot added the Task 3.2.2 FAIR implementation of IMI data types, projects and databases label Sep 4, 2019
@mcourtot
Collaborator

@proccaserra sent info on the tools. @sedlyarov will do the conversion and submit to ProteomeXchange; @ulo is looking into the accompanying metadata for the submission.

@ulo
Collaborator

ulo commented Jan 15, 2020

I submitted the proteomics data to the public ProteomeXchange repository, and made some interesting observations regarding requirements on file formats and metadata:

  1. This repository does not require the raw files in an open format (which would be mzML, as @proccaserra also noted), but rather the result files. As we used the Proteome Discoverer software, our proprietary result files are in the *.pdResult format. Fortunately, the software can also export the required mzIdentML (*.mzid) format.

  2. In addition to generic metadata on the whole dataset (project title & description, keywords, sample & data processing protocol), they also require several more specific annotations on the sample and method, each taken from a given ontology:

    • species: NCBITAXON ontology
    • tissue: BTO / EFO ontology
    • instrument: MS ontology
    • cell type: CL / EFO ontology
    • disease: EFO / DOID ontology
    • quantification method: PRIDE ontology
  3. Interestingly, the actual experimental factors are entered in a free text field, not requiring any structure or ontology.
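Because each of the annotations in point 2 expects a term from a specific ontology, they could be checked locally before filling in the submission form. A minimal sketch, assuming terms are written as CURIEs (for example `NCBITaxon:9606` for Homo sapiens) and that all six fields are mandatory, which is an assumption rather than something confirmed above. `check_annotations` is a hypothetical helper; it checks only the prefix, not whether the term actually exists in the ontology.

```python
# Allowed ontology prefixes per field, taken from the list in point 2.
# Assumption: every field is required; adjust if some are optional.
ALLOWED_PREFIXES = {
    "species": {"NCBITAXON"},
    "tissue": {"BTO", "EFO"},
    "instrument": {"MS"},
    "cell type": {"CL", "EFO"},
    "disease": {"EFO", "DOID"},
    "quantification method": {"PRIDE"},
}


def check_annotations(annotations):
    """Return a list of problems; an empty list means every field passed.

    `annotations` maps field names to CURIEs such as "NCBITaxon:9606".
    Prefixes are compared case-insensitively; term IDs are not looked up.
    """
    problems = []
    for field, allowed in ALLOWED_PREFIXES.items():
        curie = annotations.get(field)
        if curie is None:
            problems.append(f"{field}: missing")
            continue
        prefix, sep, local_id = curie.partition(":")
        if not sep or not local_id:
            problems.append(f"{field}: {curie!r} is not a PREFIX:ID term")
        elif prefix.upper() not in allowed:
            problems.append(f"{field}: prefix {prefix!r} not in {sorted(allowed)}")
    # Flag fields the form (as described above) does not ask for.
    for field in sorted(annotations.keys() - ALLOWED_PREFIXES.keys()):
        problems.append(f"{field}: not one of the expected fields")
    return problems
```

The experimental factors from point 3 are deliberately left out, since the form takes them as free text.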


4 participants