Heatmap: location for distance matrix, path for output figure #6

Open
Ascalon98 opened this issue Jun 29, 2024 · 9 comments

Comments

@Ascalon98

Ascalon98 commented Jun 29, 2024

Hi! I tried to use the heatmap function, but it did not work. I am not sure where the distance matrix is supposed to be, but it was in my miniconda3 directory, which I think is weird. Could you also tell me what the path for the output figure should be? It seems it is not just a simple jpg file.

(base) aimre@sisko:~/VarClust_data$ varclust_heatmap /home5/aimre/miniconda3/bin/varclust_distance_matrix /home5/aimre/Varclust_figure/heatmap1.jpg
Traceback (most recent call last):
  File "/home5/aimre/miniconda3/bin/varclust_heatmap", line 127, in <module>
    cluster.cluster_hierarchical(distances=distances,
  File "/home5/aimre/miniconda3/lib/python3.11/site-packages/varclust/cluster.py", line 148, in cluster_hierarchical
    colours['label'] = colours['index'].str.split(': ', 1).str[0]
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home5/aimre/miniconda3/lib/python3.11/site-packages/pandas/core/strings/accessor.py", line 137, in wrapper
    return func(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: StringMethods.split() takes from 1 to 2 positional arguments but 3 were given
(base) aimre@sisko:~/VarClust_data$ varclust_heatmap /home5/aimre/miniconda3/bin/varclust_distance_matrix /home5/aimre/miniconda3/bin/varclust_heatmap
Traceback (most recent call last):
  File "/home5/aimre/miniconda3/bin/varclust_heatmap", line 127, in <module>
    cluster.cluster_hierarchical(distances=distances,
  File "/home5/aimre/miniconda3/lib/python3.11/site-packages/varclust/cluster.py", line 148, in cluster_hierarchical
    colours['label'] = colours['index'].str.split(': ', 1).str[0]
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home5/aimre/miniconda3/lib/python3.11/site-packages/pandas/core/strings/accessor.py", line 137, in wrapper
    return func(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: StringMethods.split() takes from 1 to 2 positional arguments but 3 were given
(base) aimre@sisko:~/VarClust_data$ varclust_heatmap /home5/aimre/miniconda3/bin/varclust_distance_matrix /home5/aimre/Varclust_figure/heatmap1.jpg
Traceback (most recent call last):
  File "/home5/aimre/miniconda3/bin/varclust_heatmap", line 127, in <module>
    cluster.cluster_hierarchical(distances=distances,
  File "/home5/aimre/miniconda3/lib/python3.11/site-packages/varclust/cluster.py", line 148, in cluster_hierarchical
    colours['label'] = colours['index'].str.split(': ', 1).str[0]
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home5/aimre/miniconda3/lib/python3.11/site-packages/pandas/core/strings/accessor.py", line 137, in wrapper
    return func(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: StringMethods.split() takes from 1 to 2 positional arguments but 3 were given
@fasterius
Owner

fasterius commented Jul 1, 2024

Could you please write down all of the commands that you have run, from the start, as well as where your data is stored? I think that if you've gotten results inside your Conda directory you've done something odd.

@Ascalon98
Author

Hi! Yes, indeed! I had somehow overwritten the distance matrix script. I downloaded the source code and copied the distance matrix script into the miniconda folder again, and then it worked! I could now generate the distance matrix, and it is in one of my folders. I used this command for it:

(base) aimre@sisko:~/VarClust-0.2.3/bin$ varclust_distance_matrix /home5/aimre/Varclust_profiles/ /home5/aimre/Varclust_profiles/output_distance_matrix

However, for the heatmap I am getting a similar error message. I am sorry, it is probably something very basic that I am missing, so I really appreciate your help!

(base) aimre@sisko:~/Varclust_profiles$ varclust_heatmap /home5/aimre/Varclust_profiles/output_distance_matrix /home5/aimre/Varclust_profiles/output_heatmap_figure
Traceback (most recent call last):
  File "/home5/aimre/miniconda3/bin/varclust_heatmap", line 123, in <module>
    cluster.cluster_hierarchical(distances=distances,
  File "/home5/aimre/miniconda3/lib/python3.11/site-packages/varclust/cluster.py", line 148, in cluster_hierarchical
    colours['label'] = colours['index'].str.split(': ', 1).str[0]
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home5/aimre/miniconda3/lib/python3.11/site-packages/pandas/core/strings/accessor.py", line 137, in wrapper
    return func(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: StringMethods.split() takes from 1 to 2 positional arguments but 3 were given

@Ascalon98
Author

Here is also the information you asked for previously:
I have followed the instructions exactly so far. I have single-sample vcf.gz files in the folder /home5/aimre/VarClust_data/ and the generated profile files in the folder /home5/aimre/Varclust_profiles/.

Command for creating profiles:
varclust_create_profiles /home5/aimre/VarClust_data/ /home5/aimre/Varclust_profiles/

I think the rest of the steps I have done are in my previous comment. Let me know if you need more information!

@fasterius
Owner

Okay, strange. Do you have a way of sharing the vcf file you have, or at least a portion of it, so that I can test it on my end?

@Ascalon98
Author

Yes! I attached a few of my vcf files. They are all single-sample vcf files. I appreciate your help! The file names contain '.sam' because, for some reason, the GATK pipeline treated it as part of the sample name and kept it in the header of the vcf files.

W1_1_S1_L003_.sam.vcf.gz
W1_2_S8_L003_.sam.vcf.gz
W1_3_S15_L003_.sam.vcf.gz
W1_4_S22_L003_.sam.vcf.gz

@Ascalon98
Author

Ascalon98 commented Jul 9, 2024

Update: In the meantime I fixed the issue. I found the solution here: https://stackoverflow.com/questions/76812405/typeerror-stringmethods-rsplit-takes-from-1-to-2-positional-arguments-but-3-w

So I just changed str.split(': ', 1) to str.split(': ', n=1) on line 148 of varclust/cluster.py, and then it worked.
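For anyone hitting the same TypeError, here is a minimal sketch of why the keyword form works. It assumes a pandas version in which `Series.str.split` accepts only the pattern positionally, as the traceback above indicates. The index values are made up for illustration and are not taken from VarClust.

```python
import pandas as pd

# Hypothetical index values in a "label: description" layout (assumption).
index = pd.Series(["W1_1: group1", "W1_2: group2: extra"])

# Passing the split limit positionally, e.g. index.str.split(": ", 1),
# raises the TypeError shown in the traceback on such pandas versions.
# Passing it as the keyword argument n is accepted.
labels = index.str.split(": ", n=1).str[0]

print(labels.tolist())  # ['W1_1', 'W1_2']
```

The keyword form is also accepted by older pandas releases, so the change should not break existing installations.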

However, I noticed that the generated figure is not informative: according to VarClust, there is no detectable difference between my samples. Also, in the distance matrix all the comparisons received the same value (0.8333), which does not seem realistic to me. So I opened my profile files, and it turned out they were empty; there is only a header line. I will try to figure out the reason for this, but if you have any ideas, I would genuinely appreciate it!

Here is the figure that VarClust generated.
output_heatmap_figure

@fasterius
Owner

Sorry for the late response, I've been on vacation and away from the computer.

Okay, it sounds like your VCFs are malformed somehow. The value 0.8333 is what you get when comparing empty profiles with the similarity score and default parameters: similarity score = 1 - (matches + a) / (total + a + b), where a = 1 and b = 5 by default. With empty profiles, matches and total are both 0, so this yields a score of 1 - 1/6 = 0.8333.
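As a quick check of that arithmetic, here is a small sketch that evaluates the formula exactly as written in this comment. The function name and signature are hypothetical and are not VarClust's actual API; only the formula and the defaults a = 1 and b = 5 come from the comment above.

```python
def similarity_score(matches: int, total: int, a: int = 1, b: int = 5) -> float:
    """Evaluate 1 - (matches + a) / (total + a + b), as quoted in this thread."""
    return 1 - (matches + a) / (total + a + b)


# Empty profiles: nothing matches and nothing is compared.
empty = similarity_score(matches=0, total=0)
print(round(empty, 4))  # 0.8333
```

Any pair of empty profiles gives the same value, which would explain why every cell of the distance matrix reads 0.8333.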

Could you try passing the --method position_only argument? Looking at your VCFs it seems you do not have annotations from SnpEff, so the default full profiles won't work.

I can also see that your chromosomes are named with Roman numerals instead of normal numbers. I'm not sure this is supported by the underlying PyVCF package.

@Ascalon98
Author

Hi! Thank you for your answer! I added snpEff annotations, but when I tried to build the profiles I received error messages. Without snpEff annotations, the command produces empty profiles. Could you send over one of your vcf.gz files that works for you, for comparison?

I tried the --method position_only argument like you see below, but it did not work either. Can you elaborate on the usage of this argument?

varclust_create_profiles /home5/aimre/VarClust_data/ /home5/aimre/VarClust_data/snpEff_annotated/varclust_probe/probe_profiles/ --method position_only

@fasterius
Owner

Could you try this file: https://github.com/fasterius/seqCAT/blob/master/inst/extdata/sample1.vcf.gz?

The --method position_only argument makes comparisons happen based on SNV positions only, rather than based on position and annotations. Since you don't have annotations from snpEff you should be using positions only.
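To make the distinction concrete, here is a toy sketch and not VarClust's actual code. The `Variant` record, the `profile_keys` helper, and the rule that unannotated records contribute nothing in the default mode are all assumptions made for illustration, chosen to be consistent with the empty profiles reported earlier in this thread.

```python
from typing import NamedTuple, Optional


class Variant(NamedTuple):
    chrom: str
    pos: int
    annotation: Optional[str]  # e.g. a snpEff effect; None if the VCF is unannotated


def profile_keys(variants, method="full"):
    """Return the set of keys used to compare samples (hypothetical helper)."""
    if method == "position_only":
        # Compare on chromosome and position alone.
        return {(v.chrom, v.pos) for v in variants}
    # Assumed default behaviour: a record needs an annotation to be usable.
    return {(v.chrom, v.pos, v.annotation)
            for v in variants if v.annotation is not None}


# Made-up, unannotated records (assumption).
unannotated = [Variant("I", 1042, None), Variant("II", 88, None)]

print(profile_keys(unannotated))                               # set()
print(len(profile_keys(unannotated, method="position_only")))  # 2
```

Under these assumptions, unannotated input leaves nothing to compare in the default mode, while the position-only mode still has two usable keys.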
