typo in plot_block() + Usage of HaploBlocker on small populations #2

Open
JennyHTLee opened this issue May 13, 2019 · 6 comments

@JennyHTLee

JennyHTLee commented May 13, 2019

Hi,

I would like to report a typo in the plot_block function: the option "orientation" is only accepted when spelled "oriantation".

plot_block(parent.t.sliced_blocklist, orientation="front")
Error in plot_block(parent.t.sliced_blocklist, orientation = "front") :
unused argument (orientation = "front")
plot_block(parent.t.sliced_blocklist, oriantation="front")

I just started to test out HaploBlocker, so I've been playing around with the parameters. Thanks for your work.

Cheers,
Jenny

@tpook92
Owner

tpook92 commented May 13, 2019

You are indeed right. The parameter was mistakenly named oriantation instead of orientation. I uploaded a new version that fixes this issue (version 1.4.7).
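
For scripts that have to run on installations from both before and after the fix, one option is to check which spelling the installed function accepts. This is only a sketch: it assumes the blocklist object from the report above (`parent.t.sliced_blocklist`) exists in the session, and it relies on base R's `formals()` rather than on anything specific to HaploBlocker.

```r
library(HaploBlocker)

# Names of the arguments that the installed plot_block() declares.
arg_names <- names(formals(plot_block))

if ("orientation" %in% arg_names) {
  # Corrected spelling (version 1.4.7 or later, per the comment above).
  plot_block(parent.t.sliced_blocklist, orientation = "front")
} else {
  # Older versions only know the misspelled argument name.
  plot_block(parent.t.sliced_blocklist, oriantation = "front")
}
```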

@JennyHTLee
Author

Thanks,

I have a more general question:
I am trying to identify haplotype blocks in a very small population (10 genotypes) with the aim of tracing recombination from parents to F2s. The experimental design is therefore quite different from the usual case, where you would prefer a large, diverse population to call blocks confidently.
Which parameters would you recommend modifying? I guess I would need to reduce the confidence and coverage cutoffs? I've tried window_size and node_min so far.

Happy to hear your suggestions.

Cheers,
Jenny

@tpook92
Owner

tpook92 commented May 15, 2019

Boundaries of haplotype blocks in HaploBlocker do not really reflect recent recombinations. We are more or less tracking ancient recombinations and genome segments shared by groups of individuals. If founder genotypes/haplotypes are available, I would recommend using traditional pairwise IBD-based approaches.

As designed, HaploBlocker tracks group-wise IBD, so block boundaries will not maximize block length for pairs of haplotypes.

To put more focus on pairs, you should reduce node_min and edge_min to 2. Additionally, you should reduce min_majorblock heavily or set target_coverage to the share of the dataset you want to cover by blocks. To maximize block length, you could additionally set double_share to something below 1. If computing time is not an issue, I would suggest 0.5-0.6, so that blocks of 3 haplotypes can be split up further.

If both founder and F2 genotypes are available, you could also consider splitting the dataset into two subgroups and requiring each block to contain at least 1 haplotype from both the founders and the F2.
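
Put together, the parameter changes suggested above might look roughly like the call below. This is a sketch, not a recommended configuration: `dhm` is a placeholder for your own haplotype data in whatever form you already pass to `block_calculation()`, and the value 50 for `min_majorblock` is an assumed stand-in for "heavily reduced".

```r
library(HaploBlocker)

# `dhm` is a placeholder name for the user's own haplotype data.
blocklist <- block_calculation(
  dhm,
  node_min = 2,         # put more focus on pairs of haplotypes
  edge_min = 2,
  min_majorblock = 50,  # assumed value; alternatively set target_coverage instead
  double_share = 0.5    # below 1; 0.5-0.6 suggested if computing time allows
)
```

The subgroup idea from the last paragraph is not shown here, because the thread does not name the arguments involved.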

tpook92 changed the title from "typo in plot_block()" to "typo in plot_block() + Usage of HaploBlocker on small populations" on May 15, 2019
@JennyHTLee
Author

Thank you very much for your useful comments!

@JennyHTLee
Author

JennyHTLee commented May 27, 2019

I run into the error

Error in blocklist[[index + first_block - 1]] : subscript out of bounds

sometimes when I adjust the parameters. Is it because the number of nodes is too low? How can I resolve this?

Example:

block_calculation(s1t.t.s.pos[,-1], prefilter="TRUE", maf=0.05, node_min=2, edge_min=2,
target_coverage=0.4, double_share=0.5)
Start_blockinfo_calculation
..........................Start_nodes_calculation
..........................Start_simple_merge: 1303
Start_CrossMerging_full
Iteration 1 : 856 nodes
Iteration 2 : 441 nodes
Iteration 3 : 318 nodes
Iteration 4 : 263 nodes
Iteration 5 : 251 nodes
Start_IgnoreSmall
Iteration 1 : 251 nodes
Iteration 2 : 27 nodes
Iteration 3 : 15 nodes
Start_Blockmerging
Iteration 1 : 15 blocks
Iteration 2 : 13 blocks
Error in blocklist[[index + first_block - 1]] : subscript out of bounds

Thank you again

@tpook92
Owner

tpook92 commented May 27, 2019

This error occurs when the haplotype library contains no blocks at some point in the algorithm.
Because your dataset is relatively small, this can happen, as min_majorblock always starts at 5000. I will add something to account for this in the next version of the package. In the short term, you can solve it by setting min_majorblock to a small value (e.g. 50).
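
Applied to the call from the report above, the short-term workaround would be a single extra argument. This is a sketch that reuses the reporter's object `s1t.t.s.pos` and settings unchanged, apart from passing `prefilter` as a logical rather than the string "TRUE" (an assumption about the intended type); 50 is the example value mentioned in this comment, not a tuned setting.

```r
library(HaploBlocker)

blocklist <- block_calculation(
  s1t.t.s.pos[, -1],
  prefilter = TRUE,      # logical here; the original call passed the string "TRUE"
  maf = 0.05,
  node_min = 2,
  edge_min = 2,
  target_coverage = 0.4,
  double_share = 0.5,
  min_majorblock = 50    # start well below the default of 5000 for a small dataset
)
```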

tpook92 added the bug (Something isn't working) and question (Further information is requested) labels on Jul 25, 2019
tpook92 self-assigned this on Jul 25, 2019