Thank you all for your interest in the node accreditation exercise.
We have five weeks to create a 16S workflow, demonstrate our ability to analyse and interpret the data, test the performance of our analysis platform, and write a comprehensive report. Here are the details of the SOP we are working through, including some of the questions we need to answer; please take some time to familiarise yourself with them. A starting pipeline from the previous intern cohorts is available here, along with a link to their report. We can use those to start us off.
A detailed description of the accreditation task force's expectations is available here.
I have created a private repo for this exercise; please share your GitHub username so I can add you. I have also downloaded the data, which are now available on the HPC. Since the datasets are large, they will be made accessible via a shared folder restricted to this exercise's participants. I'd like us to streamline the process by sharing responsibilities. During optimisation we will use the training data to avoid overloading the HPC, and run the real test data only once each stage has been optimised, with those runs handled by one person. Here are some of the responsibilities I foresee:
- Developing and testing various pipelines (we'll need to form teams)
- A team to convert the pipeline to a workflow language (Nextflow or Snakemake); see the sketch below this list for a sense of what that could look like
- Report writing: we will need to link the report to the analysis; a GitHub Pages site would be great
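
For the workflow-language team, here is a minimal sketch of what one step could look like in Snakemake, assuming that is the language we pick. The sample IDs, the FastQC step, and the Data/ and Results/ paths are placeholders, not the agreed pipeline.

```
# Minimal sketch of one 16S pipeline step as a Snakemake rule.
# Sample IDs, paths and the FastQC step are illustrative placeholders.

SAMPLES = ["sampleA", "sampleB"]              # placeholder sample IDs

rule all:
    input:
        expand("Results/qc/{sample}_fastqc.html", sample=SAMPLES)

rule fastqc:
    input:
        "Data/{sample}.fastq.gz"
    output:
        "Results/qc/{sample}_fastqc.html",
        "Results/qc/{sample}_fastqc.zip"
    shell:
        "fastqc {input} --outdir Results/qc"
```

An equivalent Nextflow process would look very similar, so the choice between the two mostly comes down to what the team is comfortable maintaining.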
Some details:
- The test data is located in the /data/accredetation/16S/test folder.
- It is organised into:
  - Data
  - Results
  - Code: this is under version control, so only place output here that needs to be uploaded to GitHub
  - Report
  - tmp
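
To connect this layout with the training-versus-test point above, here is another hedged Snakemake sketch showing how a single config switch could keep optimisation runs on the training data and reserve the shared test folder for the final runs. The training-data path, the `dataset` key and the cutadapt primer are my assumptions; the real locations and primers need to be confirmed from the HPC and the SOP.

```
# Sketch: switch between training data (for optimisation) and the shared test
# data (final runs only). Everything except the test folder path is a placeholder.

DATASET = config.get("dataset", "training")   # final runs: snakemake --config dataset=test

DATA_DIRS = {
    "training": "Data/training",              # placeholder, actual location to be confirmed
    "test": "/data/accredetation/16S/test",   # shared test data folder listed above
}
DATA_DIR = DATA_DIRS[DATASET]

SAMPLES = ["sampleA", "sampleB"]              # placeholder sample IDs

rule all:
    input:
        expand("Results/trimmed/{sample}_R1.fastq.gz", sample=SAMPLES)

# Outputs go under Results/ and logs under tmp/, so only Code/ needs pushing to GitHub.
rule trim_primers:
    input:
        DATA_DIR + "/{sample}_R1.fastq.gz"
    output:
        "Results/trimmed/{sample}_R1.fastq.gz"
    log:
        "tmp/cutadapt_{sample}.log"
    # -g uses the 515F primer purely as an example; the actual primers must follow the SOP
    shell:
        "cutadapt -g GTGYCAGCMGCCGCGGTAA -o {output} {input} > {log}"
```

With a switch like this, one person can run the final `snakemake --config dataset=test` passes on the HPC without anyone editing the code.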