- Nothing changed yet.
- New process to correct BioSamples imported from NCBI
- Move the data/project directories to the eload directory
- Detect contig naming convention in VCF file
- Fall back to manual VCF file parsing when running the sample check, as pysam sometimes fails
- Fix validation output file renaming in SV and naming convention
- Fix aggregation check
- Fix ENA platform and Imputation attributes
- Actually keep the samples that were created
- Fix import of metadata for analysis only
- Fix Async upload to ENA
- Run new statistic calculation step
- Install VEP version if not available
- Fall back to manual VCF file parsing when running the sample check, as pysam sometimes fails
- Allow deriving a sample from multiple sample accessions
- Check that VEP has been run before QC
- Refactor biosamples communicators into pyutils
- Initial version of an orchestrator for submission processing
- Add label to Nextflow process to better support SLURM
- Remove instance id for clustering and accessioning
- Slight improvement in VEP cache retrieval
- Use NCBI eutils key whenever possible.
- New script to test sample ownership.
- Allow insert_new_assembly.py to insert a new taxonomy only
- Wait for all the accessioning to be complete before ingesting accessioning report in the variant warehouse
- Remove platform restriction and add experiment type value
- Confirm sample brokering using existing and novel sample names
- Skip check for VEP if not run
- Fix remapping ingestion log file name
- qc_submission supports optional statistics calculation
- Improve collection date and geographic location check for pre-submitted samples
- Ensure annotation & statistics calculation are run per analysis
- Update qc_submission to check the new split variant load logs
- Fix missing module
- Create a dummy assembly report when none exists for sequences
- Validate existing BioSamples
- Allow the Link label to be the link if no label is provided
- New script to update BioSamples
- Pass single sequence accession to ENA
- Split variant load
- During variant remapping extract VCF file with taxonomy id
- Normalise VCF before accessioning/variant load
- Merge accessioning and variant load in a single nextflow workflow
- Fix submission to existing project accession (see PR177)
- New script to Insert a publication in EVAPRO (see PR180)
- Change the default ingestion instance to 2 (see PR182)
- Use new version of eva-common-pyutils (v0.6.2).
- Refactor to use new version of pyutils. See here.
- Add check for ENA project and BioSamples accessions. See here.
- Improve novel attributes validation. See here.
- Do not validate biosamples when we update sample info. See here.
- Do not catch errors during normalisation. See here.
- Use Ensembl rapid release and project assembly as fallbacks for target assembly. See here.
- Fix unique analysis usage in Prepare submission and ENA brokering
- Flag for normalisation to warn on reference check issues rather than fail
- Fix validation of date in metadata
- Update the reference assembly in the metadata spreadsheet before brokering
- Fix link retrieval from upload single file to ENA
- Hack to remove the null values in external reference
- Add missing metadata files and process
- Make project and analysis alias unique by prepending the ELOAD number
- Remove spaces between novel attributes
- Improve formatting of archival text
- Brokering results overwrite the previous ones
- Disable check for presence of extra file in the 30_valid dir
- Check that the project exists in EVAPRO before loading from ENA; only load the analysis if it does
- Fix to QC stating that FTP files are missing
- Remove normalisation
- Run normalisation in prepare brokering
- Move log directory for remapping and clustering from 53_clustering to 00_logs
- New script to retrieve archived ELOAD from LTS
- Fix bug in qc_submission.py
- Add docker image to represent eva-submission run environment for testing
- Enforce the presence of sample collection data and geographic location
- Add default geolocation and collection dates if missing
- Accept multiple mongos hosts in connection strings (see EVA-3253)
- Fix accession import log file name (see EVA-3243)
- Use Spring properties generator to generate application properties for the various Java pipelines (see EVA-3147)
- Determine remapping target assembly for a submission (see EVA-3208)
- Fix nextflow process for creation of properties file
- Fix how the species name in Ensembl is resolved within the VEP cache
- Default clustering instance to 6
- Bugfixes for accession load
- Prepare reference genome to allow normalisation to happen without error
- Add population and imputation to BioSamples when available
- Load Submitted variant accession to the variant warehouse during variant load
- Add remapping, clustering and backpropagation steps
- Skip remapping, clustering and backpropagation when target assembly is from a different taxonomy
- Separate structural variation check in the Nextflow workflow
- Generate archival confirmation text after brokering
- Fix to QC report
- Add normalisation step during validation. The file used after that will be the normalised file.
- Fix brokering to existing project (use ELOAD as project alias)
- New validation script that verifies all ingestion steps have been successful and the data appear where they should
- New script to add a taxonomy/assembly to the metadata
- Add new validation that normalises all the input VCF files as they are being validated
- Run validations through Nextflow based on the tasks specified on the command line
- Create CSI index files
- Detect VCFs with structural variants
- Create CSI index files
- Upload to ENA FTP is resumable and retryable
- New option for dryrun for ENA upload
- Fix metadata read:
- Support for Analysis list in Sample Sheet
- Serialise dates when uploading to BioSamples
- Hot fix for update to BioSample during brokering
- Update the Contig alias while downloading a new genome
- Use the curation object to update any BioSamples
- Add EVA Study link URL to all BioSamples created
- Write the configuration file explicitly via the context manager
- Make BioSamples brokering more robust
- Allow submission to ENA via asynchronous endpoint
- CSI index generated without .gz to accommodate ENA validation
- Check assembly report for multiple synonyms for the same contig
- Remove tbi index generation and upload to ENA
- Support for metadata load when adding to an existing project
- Migrate script does not crash when specifying a project that has not been used yet
- Fix VEP cache download and directory extraction
- New option to keep the configuration as is
- update_metadata script now checks assembly_set_id coherence
- New script to retrieve VCF and tabix files from ENA when they do not exist in EVAPRO
- Automated deployment in Codon
- Add support for FTP copies in datamovers nodes
- New script to migrate in-progress submissions
- Set a cipher that will work on Codon
- New option to force validation to pass
- Enable project and analysis accessions to be set on command line
- Ability to resume Nextflow processes within ingestion
- Require latest vcf_merge and retrieve the csi index if it exists
- Fix for running annotation only
- Adding count service credentials to accessioning properties file
- Adding logging to single file across each step
- Update metadata file correctly after merge
- Fix bug in post merge check
- Allow running loading step with annotation only
- Add warning when the browsable files are different from analysis files
- Do not retry brokering preparation when it has already been done
- Fix brokering to existing projects
- Share ELOAD config to ensure prep and validation can communicate
- Fix the annotation metadata collection name
- metadata: update assembly_set_id in analysis after loading from ENA
- VEP cache resolution is based on the assembly
- New option to merge the VCF before ingestion
- New script to run metadata updates if they did not occur during ingestion
- Check the aggregation type of VCF files during the validation
- Use aggregation type detected during validation
- Resolve and Create (if required) the variant warehouse databases
- Resolve and download the correct VEP cache or skip annotation
- Fix error strategy in Nextflow
- Detect whether a merge can be performed
- Check analysis alias collision
- New option to merge file post validation
- Add analysis to existing project
- Resolve symlinks during variant load to avoid confusing eva-pipeline
- Update metadata after variant load
- Fixes to variant load pipeline and ENA queries.
- Additional bug fixes from 1.0
- Bug fixes from 1.0 release.
- Support for multiple analyses and different reference sequence per analysis
- Version number now stored in eload config; backwards-incompatible configs will be upgraded
- Bug fix for getting hold dates from ENA
- Refactor database connection methods
- Skip VEP annotation based on command-line parameter
- Get VEP versions from database when possible
- Fix bug with hold date checking metadata db before loaded from ENA.
- Speedup of excel reader
- Retrieve hold date from ENA rather than config
- Use secondaryPreferred reads in Mongo 4.0 environment
- Add ability to force replacing the config (previous one is backed up)
- Retrieve the reference accession from the analysis in metadata database
- Retrieve files from ENA when not available locally
- Support for multiple files per analysis
- Fix missing Nextflow and etc directory
- Make scripts in bin executable
- Various bugfixes in genome downloader, validation reporting, backlog preparation, and ingestion
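
The manual VCF parsing fallback listed above (used when pysam cannot read a file during the sample check) could look roughly like the following. This is a minimal sketch, not the project's implementation: the function name `read_vcf_sample_names` is hypothetical, and it relies only on the VCF convention that sample names follow the nine fixed columns of the `#CHROM` header line.

```python
import gzip
from pathlib import Path


def read_vcf_sample_names(vcf_path):
    """Return the sample names from the #CHROM header line of a VCF file.

    Hypothetical helper for illustration; returns an empty list when no
    #CHROM header line is found before the first data line.
    """
    path = Path(vcf_path)
    # Gzip and bgzip files both start with the magic bytes 0x1f 0x8b.
    with open(path, "rb") as handle:
        is_gzipped = handle.read(2) == b"\x1f\x8b"
    opener = gzip.open if is_gzipped else open
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                columns = line.rstrip("\r\n").split("\t")
                # Columns 1-9 are fixed (CHROM to INFO, then FORMAT);
                # everything after them is a sample name.
                return columns[9:]
            if not line.startswith("#"):
                break  # reached data lines without seeing the header
    return []
```

Reading only the header keeps the fallback cheap, since the variant records themselves never need to be parsed to list the samples.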