[Microarray] Assertion fails in GENERATE_SOFTWARE_TABLE when data files are not compressed #99

Open
cyouh95 opened this issue Jun 12, 2024 · 1 comment

cyouh95 commented Jun 12, 2024

Description

R.utils is used only to decompress data files that are compressed (e.g., .CEL.gz). With uncompressed data files (e.g., .CEL), R.utils is never used, so it is missing from the software table and the following assertion fails:

assert len(AFFYMETRIX_SOFTWARE_DPPD) == len(df), f"Not all software accounted for! Missing: {set(AFFYMETRIX_SOFTWARE_DPPD) - set(df['name'].str.lower())}"

Solution

Modify AFFYMETRIX_SOFTWARE_DPPD to exclude R.utils when the data files are not compressed. The same can be done for AGILENT_SOFTWARE_DPPD in the Agilent pipeline.
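
A minimal sketch of that idea, not the actual patch. The list contents, the `data_files_compressed` flag, and both helper functions are assumptions made for illustration; the real `AFFYMETRIX_SOFTWARE_DPPD` is not shown in this issue. The check also compares sets rather than lengths, which is a variation on the original assertion.

```python
# Hypothetical contents: the real AFFYMETRIX_SOFTWARE_DPPD is not shown in this issue.
AFFYMETRIX_SOFTWARE_DPPD = ["r", "oligo", "r.utils"]


def expected_software(all_software, data_files_compressed):
    """Return the software names the table is expected to contain.

    R.utils is only expected when the data files had to be decompressed.
    """
    if data_files_compressed:
        return list(all_software)
    return [name for name in all_software if name.lower() != "r.utils"]


def check_software_table(names_in_table, all_software, data_files_compressed):
    """Raise AssertionError if any expected software is missing from the table."""
    expected = expected_software(all_software, data_files_compressed)
    missing = set(expected) - {name.lower() for name in names_in_table}
    assert not missing, f"Not all software accounted for! Missing: {missing}"


# Uncompressed .CEL input: a table without R.utils passes the check.
check_software_table(["R", "oligo"], AFFYMETRIX_SOFTWARE_DPPD, data_files_compressed=False)
```

With `data_files_compressed=True` the same table would fail the check, because R.utils is then expected to be present.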

cyouh95 commented Jul 2, 2024

The Array Data File Name field in the runsheet is used to determine whether the data files are compressed. Quoted commas cause an issue in splitCsv(), as described here, but this can be resolved by specifying the quote parameter.
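
The pipeline itself does this in Nextflow with splitCsv(), and that code is not shown here. As a rough Python analogue only, the sketch below reads a runsheet whose fields contain quoted commas and checks the Array Data File Name column for a `.gz` suffix. The runsheet excerpt, the `Comment` column, and the `data_files_compressed` function are hypothetical.

```python
import csv
import io

# Hypothetical runsheet excerpt; the last column contains quoted commas.
RUNSHEET = '''Sample Name,Array Data File Name,Comment
S1,sample1.CEL.gz,"treated, replicate 1"
S2,sample2.CEL.gz,"control, replicate 1"
'''


def data_files_compressed(runsheet_text, column="Array Data File Name"):
    """Return True if every data file named in the runsheet ends in .gz."""
    # quotechar tells the parser that commas inside double quotes belong to
    # the field, playing the role of the quote parameter mentioned above.
    reader = csv.DictReader(io.StringIO(runsheet_text), quotechar='"')
    names = [row[column] for row in reader]
    return bool(names) and all(name.lower().endswith(".gz") for name in names)


print(data_files_compressed(RUNSHEET))
```

Splitting each line naively on commas would break the quoted comment into two fields, which is the kind of problem the quote setting avoids.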

cyouh95 added a commit to cyouh95/GeneLab_Data_Processing that referenced this issue Aug 30, 2024
cyouh95 added a commit to cyouh95/GeneLab_Data_Processing that referenced this issue Aug 30, 2024
cyouh95 added a commit to cyouh95/GeneLab_Data_Processing that referenced this issue Oct 2, 2024