gdc_rnaseq.R: gdc_rnaseq() on workflows other than "HTSeq - Counts" produces errors #62

Open
lyijin opened this issue Jul 17, 2018 · 2 comments

lyijin commented Jul 17, 2018

sorry for raising two successive issues with this R script.

i previously used the function to cache and return a SummarizedExperiment of HTSeq - Counts from TCGA data, and it worked without a hitch (well, i did get HTTP 429 errors when i tried to cache the entirety of TCGA, but i solved that by caching the individual TCGA projects one at a time).

however, when i wanted to test something at the FPKM level, the function produced the same error on both machines that i tried the command on.

i ran
tcga_se <- gdc_rnaseq("TCGA-CHOL", "HTSeq - FPKM")

the caching ran fine, but it dies with:

Error in names(x) <- value :
  'names' attribute [1] must be the same length as the vector [0]

i'm not really sure what the error means, sorry--if you guys can replicate the error, could you look into why this error was produced? thanks!

(and btw if i could request a nitpicky improvement, could you please suppress the output

Parsed with column specification:
cols(
  X1 = col_character(),
  X2 = col_double()
)

that floods my screen every time i run the function gdc_rnaseq()? thanks!)
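A rough sketch of one way to quiet that output, under some assumptions: the text looks like the column-guessing notice that readr prints as an R message, so wrapping the import call in suppressMessages() should hide it. If the importer really does call readr::read_tsv(), passing an explicit col_types (for example "cd") would avoid the notice at the source. noisy_reader() below is a hypothetical stand-in written in base R, not the package's actual importer.

```r
# Hypothetical stand-in for an importer that announces its guessed column
# types via message(), the way the output quoted above appears to be emitted.
noisy_reader <- function(path) {
  message("Parsed with column specification:\n",
          "cols(\n  X1 = col_character(),\n  X2 = col_double()\n)")
  read.delim(path, header = FALSE, col.names = c("X1", "X2"),
             stringsAsFactors = FALSE)
}

# Tiny two-column file shaped like a per-sample expression table
# (made-up feature ids and values, for illustration only).
tmp <- tempfile(fileext = ".tsv")
writeLines(c("ENSG_A\t1.5", "ENSG_B\t0"), tmp)

# suppressMessages() swallows anything sent through message(),
# but still returns the value of the wrapped call.
quiet <- suppressMessages(noisy_reader(tmp))

# If the real importer uses readr (an assumption), the equivalent fix
# at the source would be along the lines of:
#   readr::read_tsv(path, col_names = FALSE, col_types = "cd")
```

The same wrapper should work from the caller's side too, e.g. suppressMessages(gdc_rnaseq(...)), at the cost of hiding any other messages the function prints.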

lyijin commented Aug 1, 2018

i think i found out what was wrong with the script--it's lines 146--148 that are crashing the script.

    mat_qc = data.frame(t(mat[qc_idx, -1]))
    colnames(mat_qc) = paste0('qc',mat[qc_idx,1])
    coldata = dplyr::bind_cols(coldata,mat_qc)

"HTSeq - Counts" files contain three extra lines at the bottom that start with "__" (double underscores), and from what i can tell, the code moves these lines into the colData of the SummarizedExperiment. FPKM / FPKM-UQ files do not contain any lines starting with double underscores, so qc_idx matches nothing, which causes line 148 to crash.

i've monkey-patched my version to drop these three lines of code completely. the effect is that i've now avoided the crash when i ask for HTSeq - FPKM, but i do lose some information when the same function runs on HTSeq - Counts files. i guess one could wrap these three lines in an if block that only gets executed when workflow_type == 'HTSeq - Counts', but i didn't need the info for now, hence i didn't mind the trade-off.
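A minimal sketch of that kind of guard, for anyone hitting the same thing. It checks whether any "__" rows exist rather than checking workflow_type, which is a swapped-in variation on the if block described above, not the package's actual fix. split_qc_rows() is a hypothetical helper, and it assumes mat is a data.frame whose first column holds the feature ids and whose other columns are samples. With no matching rows, the unguarded code appears to fail because paste0('qc', ...) still returns a single name "qc" for a data frame with zero columns, which would match the reported 'names' error.

```r
# Hypothetical helper: move rows whose id starts with "__" out of the
# expression table and return them as per-sample QC columns.
# Assumes `mat` is a data.frame: column 1 = feature ids, other columns = samples.
split_qc_rows <- function(mat) {
  qc_idx <- grep("^__", mat[[1]])
  if (length(qc_idx) == 0) {
    # FPKM / FPKM-UQ style input: nothing to move, so skip the QC step.
    return(list(mat = mat, qc = NULL))
  }
  # Transpose so that each sample becomes a row and each QC line a column.
  mat_qc <- data.frame(t(mat[qc_idx, -1, drop = FALSE]))
  colnames(mat_qc) <- paste0("qc", mat[[1]][qc_idx])
  list(mat = mat[-qc_idx, , drop = FALSE], qc = mat_qc)
}

# Made-up example tables, for illustration only.
counts <- data.frame(id = c("ENSG_A", "__no_feature", "__ambiguous"),
                     s1 = c(10, 3, 1), s2 = c(20, 4, 2),
                     stringsAsFactors = FALSE)
fpkm <- data.frame(id = c("ENSG_A", "ENSG_B"),
                   s1 = c(1.5, 0), s2 = c(2.5, 0.1),
                   stringsAsFactors = FALSE)

res_counts <- split_qc_rows(counts)  # QC rows split off into $qc
res_fpkm   <- split_qc_rows(fpkm)    # no QC rows, $qc is NULL
```

The caller would then only bind the QC columns onto coldata when $qc is not NULL, so Counts files keep their extra information and FPKM files pass through untouched.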

happy to share my code if others are facing the same problem while waiting for the code to be patched.

@JYLeeBioinfo

Thank you for sharing your troubleshooting!
I experienced the same problem.

In my case, I made a custom function modeled on gdc_rnaseq so that I could fix it without reinstalling the package.

The .htseq_importer function also had to be defined for that to work.
