forked from DReichLab/AdmixTools
README.QPGRAPH
qpGraph -p parfile -g graph [-o outgraph] [-d dotfile]
See examples/runit for some simple examples and sample output
Sample parameter file
DIR: ../data
S1: sim1
indivname: DIR/S1.ind
snpname: DIR/S1.snp
genotypename: DIR/S1.geno
outpop: Out
blgsize: 0.05
## block size in Morgans for Jackknife
## see below
lsqmode: YES
diag: .0001
hires: YES
initmix: 1000
precision: .0001
zthresh: 3.0
terse: NO
useallsnps: NO
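The sample above is a list of "name: value" lines. A minimal Python sketch of reading such a file follows. It assumes, without confirmation from this README, that text after ## is a comment and that all-uppercase names such as DIR and S1 act as macros substituted into later values. The function name read_parfile is hypothetical and not part of AdmixTools.

```python
import re

def read_parfile(text):
    """Parse 'name: value' lines; expand all-uppercase names as macros (assumed rule)."""
    macros = {}   # e.g. {"DIR": "../data", "S1": "sim1"}
    params = {}   # ordinary parameters after macro expansion
    for raw in text.splitlines():
        line = raw.split("##", 1)[0].strip()   # drop ## comments (assumption)
        if not line or ":" not in line:
            continue
        name, value = line.split(":", 1)
        name, value = name.strip(), value.strip()
        # Replace previously defined macros where they appear as whole words.
        for macro, repl in macros.items():
            value = re.sub(r"\b%s\b" % re.escape(macro), lambda m, r=repl: r, value)
        if name.isupper():
            macros[name] = value
        else:
            params[name] = value
    return params
```

With the sample file, indivname would expand to ../data/sim1.ind under these assumptions.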
-p parfile specifies input data (any Reich lab format)
outpop: zzz
f-stats are normalized by heterozygosity in population zzz.
outpop: NULL has a special meaning: no normalization
blgsize: block size for Jackknife (Morgans)
lsqmode: Default is NO, in which case an estimate of the error covariance is used. This is unstable for large problems,
and instead (lsqmode: YES) we fit f_3 stats whose "base" population is the first label (see graph spec)
diag: small fudge factor to add on diagonal of f3. Improves stability
Given admixing weights the edge lengths (drifts) are set by minimizing
a degree 2 function, subject to non-negativity of the lengths.
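To illustrate what minimizing a degree 2 function under a non-negativity constraint looks like, here is a small Python sketch. It is not qpGraph's solver, which this README does not describe. It swaps in plain projected gradient descent on an ordinary least-squares objective, and the matrix A and vector b are hypothetical stand-ins for the linear map from edge lengths to f-stats and for the observed values.

```python
def nonneg_least_squares(A, b, steps=2000):
    """Minimize ||A x - b||^2 subject to x >= 0 by projected gradient descent."""
    rows, cols = len(A), len(A[0])
    x = [0.0] * cols
    # Crude Lipschitz bound for the gradient: 2 * (sum of squared entries of A).
    bound = 2.0 * sum(a * a for row in A for a in row)
    if bound == 0.0:
        return x
    step = 1.0 / bound
    for _ in range(steps):
        resid = [sum(A[i][j] * x[j] for j in range(cols)) - b[i] for i in range(rows)]
        grad = [2.0 * sum(A[i][j] * resid[i] for i in range(rows)) for j in range(cols)]
        # Gradient step, then clip negatives to zero (the projection onto x >= 0).
        x = [max(0.0, x[j] - step * grad[j]) for j in range(cols)]
    return x
```

A length that would be negative in the unconstrained fit ends up pinned at zero, which is the behaviour the constraint is there to enforce.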
Initially the program makes V initial tries of admixing weights.
The default value for V is 10 * 2^n, where n is the number of admixtures in the graph.
To override this default, set
initmix: V
where V is the desired number of initial tries. This may be useful
if you are worried the program may not have found the global optimum.
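As a quick check of how the number of initial tries grows, the following Python sketch reads the default above as 10 times 2 to the power n. The function names are hypothetical and only mirror the rule stated in this README.

```python
def default_initial_tries(n_admix):
    """Default V: 10 * 2^n for a graph with n admixtures."""
    return 10 * 2 ** n_admix

def effective_tries(n_admix, initmix=None):
    """An explicit initmix value overrides the default."""
    return initmix if initmix is not None else default_initial_tries(n_admix)
```

For example, a graph with 3 admixtures defaults to 80 tries, so the sample setting initmix: 1000 asks for considerably more.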
hires: controls output precision; sometimes more decimals are desired
precision: controls accuracy of recovered admixture weights. Coefficients will be within "precision" of the MLE.
zthresh: controls outlier output. All f-stats are calculated and those diverging by more than "zthresh" from the fitted values
are printed.
useallsnps: use all SNPs even if some populations have no data for a SNP (default NO)
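The zthresh filter can be sketched in Python as below. This assumes the divergence is measured as a Z-score, (observed - fitted) / standard error, which the parameter name suggests but this README does not spell out. The tuple layout and the function name are hypothetical.

```python
def outliers(fstats, zthresh=3.0):
    """fstats: iterable of (name, observed, fitted, std_err) tuples.

    Returns (name, z) for every statistic with |z| > zthresh.
    """
    flagged = []
    for name, observed, fitted, std_err in fstats:
        z = (observed - fitted) / std_err   # assumed definition of divergence
        if abs(z) > zthresh:
            flagged.append((name, z))
    return flagged
```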
*** new feature ***
admixin: <admixwts file>
contains lines such as
admix T S1 S2 w1 w2
where T is a target node, S1, S2 are source nodes and w1 w2 are weights. qpGraph
will rescale these to sum to 1.
These lines are in the same format as the output graph [-o option below], so such a graph file can be used
as the admixin file (only admix lines are recognized)
admixout: <admixwts file>
This file is of course easy to hand edit.
If admixin is present, the estimation by initial tries is omitted, which makes qpGraph run
much faster.
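The admix line format and the rescaling of weights described above can be sketched in Python as follows. The function name parse_admix_line is hypothetical, and rejecting a non-positive weight sum is an assumption of this sketch, not documented qpGraph behaviour.

```python
def parse_admix_line(line):
    """Parse 'admix T S1 S2 w1 w2'; return None for any other line."""
    fields = line.split()
    if len(fields) != 6 or fields[0] != "admix":
        return None   # only admix lines are recognized
    target, source1, source2 = fields[1:4]
    w1, w2 = float(fields[4]), float(fields[5])
    total = w1 + w2
    if total <= 0:
        raise ValueError("admixture weights must have a positive sum")
    # Rescale so the two weights sum to 1.
    return target, source1, source2, w1 / total, w2 / total
```

A line such as "admix T S1 S2 30 70" therefore yields weights 0.3 and 0.7.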
inbreed: YES
Genotypes are expected to be pseudo-haploid. At least 2 samples are needed per population, or drift lengths on
leaves are not meaningful.
-g graph
See examples.qpGraph/gr1x
Note: comments can be included in the file; begin a comment line with #
[-o outgraph] This can be hand edited and input again. Note that the format differs from the input graph format;
qpGraph will accept either style.
[-d dotfile] Graph output in "dot format"; dot -Tps < dotfile > dotfile.ps gives postscript
"dot" must be installed (part of Graphviz)
*** new feature ***
Input can now be an fstats file -- see README.qpfstats
*** new feature ***
halfscore (won't work in combination with fstats)
halfscore: YES
halfjackname: <output file>
Does a VERY conservative goodness of fit test. See halfscore.pdf for details.
===================================================================================================
Utility program: qpreroot
If the input graph is not connected or has loops, qpGraph will die unpleasantly.
Best way to see what is happening:
qpreroot -g graph -d dotfile
## This also makes postscript
We can also change the root node:
qpreroot -g graph -o outgraph -r node
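Before running qpGraph it can help to test a graph for the two problems named above. The Python sketch below works on a plain list of (parent, child) pairs rather than on the graph file itself, since the file format is not described in this README. It reads "loops" as directed cycles, which is an assumption, and the function name check_graph is hypothetical.

```python
from collections import defaultdict, deque

def check_graph(edges):
    """edges: list of (parent, child) pairs. Returns (connected, acyclic)."""
    nodes = set()
    children = defaultdict(list)
    neighbors = defaultdict(set)
    indegree = defaultdict(int)
    for parent, child in edges:
        nodes.update((parent, child))
        children[parent].append(child)
        neighbors[parent].add(child)
        neighbors[child].add(parent)
        indegree[child] += 1
    if not nodes:
        return True, True
    # Connectivity: breadth-first search ignoring edge direction.
    start = next(iter(nodes))
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for other in neighbors[node]:
            if other not in seen:
                seen.add(other)
                queue.append(other)
    connected = seen == nodes
    # Cycles: Kahn's algorithm; nodes on a directed cycle are never released.
    remaining = {n: indegree[n] for n in nodes}
    queue = deque(n for n in nodes if remaining[n] == 0)
    released = 0
    while queue:
        node = queue.popleft()
        released += 1
        for child in children[node]:
            remaining[child] -= 1
            if remaining[child] == 0:
                queue.append(child)
    return connected, released == len(nodes)
```

An admixed node with two parents passes this check, because two paths meeting at one node do not form a directed cycle.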