Question regarding custom gene set #747
Comments
Some remarks: It is not clear to me what you mean with the […]. You mention lists of DEG (differentially expressed genes), but […]. Also note that your issue cannot be reproduced by others, because the input files are missing (i.e. there is no reproducible example). |
Hello. Thank you for responding. I don't see NA values in my data; I'm trying to figure out the underlying reason. Yes, I am using all the genes measured. I have several GSEA runs to perform, and I'm using the fold change from each group comparison as the ranking criterion. That's what I was referring to. Let me see how to make a reproducible example. |
For starters, it would already be helpful if you could show your full code, and the output in which the NAs are present. |
To clarify my question: there is no error or NA in the input or output. The result table only had 5 of the 8 gene sets. The investigator would like to see the GSEA result metrics for all 8 gene sets. Can you give some insight into why the result metrics for 3 of the gene sets are missing? What criteria are used? I will get the full code and data in the meantime. |
Steps
Attached zip file |
Thanks for providing the example files. By using these I could reproduce your 'issue', which IMO is actually the intended/expected behavior. Let me explain: Using the provided files I ran your code. I am only showing the output of the last step:
Indeed, results for only 5 gene sets (out of 8) are reported... How come? Since no significance cutoff is applied (because of the setting […]), […]. Some code to show this:
Thus, only gene set […]. Next, check how many unique genes are actually present in the gene sets:
This shows that e.g. […]. How many of the unique genes in the gene sets are actually also measured in the (filtered) input data?
The table above shows that many genes in the gene sets have not been 'measured' (i.e. are not present in […]). Rerun GSEA, but increase the maximum gene set size to 7000 (from 2000):
Note that now the results of 6 gene sets are reported (not 5); results for set […]. But why are there still no results for all 8 input sets? This has to do with the (2nd) warning that is reported: […]. This warning is thrown by […]. This is why in the end results for only 6 of the 8 sets are reported! Note: to have all sets analyzed, in the […]. Code to show […]:
When setting […]
|
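For readers following along without the attached files, the size check described above can be sketched roughly as follows. This is only an illustration in Python on made-up toy data, not the code that was run in this thread and not the package's implementation. It assumes that the minimum/maximum size limits are compared against the number of genes in a set that are also present in the measured input. The function name `summarize_gene_sets`, the gene identifiers and the size limits are all hypothetical.

```python
def summarize_gene_sets(gene_sets, measured_genes, min_size, max_size):
    """For each gene set, count its unique genes, count how many of those
    were measured, and flag whether the measured count is within the limits."""
    measured = set(measured_genes)
    summary = {}
    for name, genes in gene_sets.items():
        unique = set(genes)            # drop duplicate entries in the set
        overlap = unique & measured    # genes that are in the ranked input
        summary[name] = {
            "unique_genes": len(unique),
            "measured_genes": len(overlap),
            # Assumption: the size filter looks at the measured genes only.
            "kept": min_size <= len(overlap) <= max_size,
        }
    return summary


# Toy data (hypothetical): three small sets and ten measured genes.
gene_sets = {
    "geneset1": ["g1", "g2", "g3", "g4"],
    "geneset2": ["g1", "g2", "g2", "x1", "x2", "x3"],  # mostly unmeasured
    "geneset3": [f"g{i}" for i in range(1, 11)],       # too large at first
}
measured_genes = [f"g{i}" for i in range(1, 11)]

summary = summarize_gene_sets(gene_sets, measured_genes, min_size=3, max_size=8)
kept = [name for name, row in summary.items() if row["kept"]]

# Raising the upper limit lets the large set through; the set with too few
# measured genes is still dropped.
relaxed = summarize_gene_sets(gene_sets, measured_genes, min_size=3, max_size=10)
kept_relaxed = [name for name, row in relaxed.items() if row["kept"]]

for name, row in summary.items():
    print(name, row)
```

In this toy case only geneset1 passes with the tighter limits, and geneset3 is added once the upper limit is raised, which mirrors the kind of change seen when the maximum size was increased above. Sets dropped for other reasons (such as the warning mentioned above) are not covered by this sketch.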
Thank you SO MUCH for looking into this and for the detailed explanation. This has been incredibly helpful. |
That is a good question. I commonly use the (moderated) t-value for ranking genes, and I have never had to deal with unbalanced sets. Yet, I don't want to claim the t-value is 'the best' ranking metric... but it is arguably used often. |
Thank you very much |
Hello, I'm working on GSEA with a custom gene set collection. I created a collection that includes 8 gene sets. Let's call them geneset1, geneset2, ..., geneset8. I saved them as GMT files, and was able to run GSEA using the function shown at the bottom.
For most of my DEG lists, the code worked perfectly. For two of my DEG lists, the result table only had 5 of the 8 gene sets. Note that the code ran fine; it is just that the result table only had 5 of the 8 gene sets.
The investigator would like to see the results for all 8 gene sets. I've tried changing the settings of minGSSize, maxGSSize, pvalueCutoff and eps, but I'm not able to get the p-value and other information to show up for the missing gene sets. Can you give some insight into why the rest are missing? Are they all NA values, and if so, why are they NA values?
Logs when running