Archie #22

xin-huang · 2023-09-05T12:38:15Z

No description provided.

xin-huang · 2023-12-22T19:08:08Z

sstar/train.py

+    #I think these parameters are NOT necessary for ArchIE - just retained for the signature of preprocess.process_data
+    match_bonus = 1
+    max_mismatch = 1
+    mismatch_penalty = 1


These parameters are necessary for calculating the S* scores, which are also input features in the logistic regression.

    match_bonus = 5000
    max_mismatch = 5
    mismatch_penalty = -10000

xin-huang · 2023-12-24T09:24:10Z

sstar/stats.py

+    mut_num, hap_num = tgt_gt.shape
+    iv = np.ones((hap_num, 1))
+    counts = tgt_gt*np.matmul(tgt_gt, iv)
+    spectra = np.array([np.bincount(np.array(counts[:,idx] > 0).astype('int8'), minlength=hap_num+1) for idx in range(hap_num)])


Line 70 is not correct. The spectra here contain only "singletons", because counts[:,idx] > 0 returns a boolean array, and np.bincount treats False and True as 0 and 1.

Therefore, what this line actually finds is the total number of mutations that each haplotype carries.

The negative-number problem was caused by astype('int8'): the range of int8 is -128 to 127, so negative numbers may occur when simulating large sample sizes (e.g. 100 diploid target individuals). To fix it, we can change astype('int8') to astype('int64').
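Both issues can be reproduced on a small, made-up genotype matrix. The sketch below is illustrative only: the `fixed` variant is one plausible reading of the intended computation (binning the per-mutation allele counts themselves) and is not necessarily the change that belongs in stats.py.

```python
import numpy as np

# Toy genotype matrix (made up): rows are mutations, columns are haplotypes.
tgt_gt = np.array([
    [1, 0, 0],
    [1, 1, 0],
    [1, 1, 1],
    [0, 0, 1],
])
mut_num, hap_num = tgt_gt.shape
iv = np.ones((hap_num, 1))
# For each haplotype, the allele count of every mutation it carries (0 otherwise).
counts = tgt_gt * np.matmul(tgt_gt, iv)

# Line under review: the comparison yields booleans, so only bins 0 and 1 are filled.
# Bin 1 is simply the number of mutations each haplotype carries.
buggy = np.array([
    np.bincount((counts[:, idx] > 0).astype('int64'), minlength=hap_num + 1)
    for idx in range(hap_num)
])

# Assumed intent: bin the allele counts themselves, using a wide integer type.
fixed = np.array([
    np.bincount(counts[:, idx].astype('int64'), minlength=hap_num + 1)
    for idx in range(hap_num)
])

# int8 cannot hold values above 127; larger integers wrap around to negatives.
wrapped = np.array([200], dtype=np.int64).astype(np.int8)

print(buggy)
print(fixed)
print(wrapped)
```

In `fixed`, bin 0 counts the mutations a haplotype does not carry, so it may need to be dropped before the spectrum is used as a feature.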

xin-huang · 2023-12-24T10:38:58Z

sstar/train.py

+    #reading of data, preprocessing - i.e., calculating statistics -, and obtaining & labeling of true tracts
+    for replicate1, folder in enumerate(os.listdir(output_dir)):


Using enumerate here makes the order of replicates in the feature table inconsistent with the order of the folders containing the simulated data, since os.listdir returns entries in arbitrary order.

This makes it difficult to check the correctness of the calculation for different features.
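One way to keep the replicate index tied to the folder is to take it from the folder name instead of the loop position. This is a minimal sketch that assumes each folder name ends in its replicate number; the `rep_<n>` naming and the helper `replicate_folders` are hypothetical and not taken from train.py.

```python
import os
import re
import tempfile

def replicate_folders(output_dir):
    """Return (replicate_id, folder) pairs sorted by replicate id.

    Hypothetical helper: assumes every folder name ends in its replicate number.
    """
    pairs = []
    for folder in os.listdir(output_dir):
        match = re.search(r"(\d+)$", folder)
        if match:
            pairs.append((int(match.group(1)), folder))
    return sorted(pairs)

# Demonstration on a temporary directory with folders created out of order.
with tempfile.TemporaryDirectory() as tmp:
    for i in (10, 2, 0, 1):
        os.mkdir(os.path.join(tmp, f"rep_{i}"))
    ordered = replicate_folders(tmp)

print(ordered)
```

Sorting on the parsed integer rather than the raw name keeps `rep_10` after `rep_2`, which a plain `sorted(os.listdir(...))` would not.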

xin-huang · 2023-12-24T13:10:31Z

sstar/stats.py

+    dist_skew = sps.skew(tgt_dist, axis=1)
+    dist_kurtosis = sps.kurtosis(tgt_dist, axis=1)


skew and kurtosis may occasionally be np.nan; we could replace nan with 0.
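A minimal sketch of that replacement, using made-up arrays as stand-ins for the output of sps.skew and sps.kurtosis. The helper `nan_to_zero` is hypothetical; it uses np.where so that only nan entries are changed.

```python
import numpy as np

# Stand-ins for the sps.skew / sps.kurtosis results; the values are made up.
dist_skew = np.array([0.3, np.nan, -1.2])
dist_kurtosis = np.array([np.nan, 2.5, 0.0])

def nan_to_zero(values):
    """Hypothetical helper: replace nan entries with 0, leave everything else as is."""
    return np.where(np.isnan(values), 0.0, values)

dist_skew = nan_to_zero(dist_skew)
dist_kurtosis = nan_to_zero(dist_kurtosis)

print(dist_skew)
print(dist_kurtosis)
```

np.nan_to_num would also work, but it additionally rewrites infinities, which may or may not be wanted here.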

xin-huang added 30 commits October 18, 2022 14:41

Add stats.py

92d199d

Update stats.py

72bdac6

Update stats.py

4c90fd9

Update stats.py

aa8b409

Update stats.py

f1e3a4a

Update stats.py

facb50b

Update __main__.py

ec980d3

Update tests

81a7dec

Update utils.py

2c3acfa

Add preprocess.py and models.py

f6cf8f6

Update preprocess.py and utils.py

1a26c02

Update preprocess.py and stats.py

8a124a3

Update preprocess.py and stats.py

c4a7147

Update stats.py

66d06f8

Update preprocess.py and stats.py

9769d4d

Update preprocess.py and stats.py

b7eda27

Update preprocess.py

8eea572

Update preprocess.py

329f9dd

Add files

5f4ddef

Add docstrings in preprocess.py

ec18d82

Add test_stats.py

b95ada8

Update test_utils.py

7109465

Update cal_match_rate.pyt

37d1bb3

Add test_preprocess.py

e4d0478

Update

be93900

Update tests

ea53b6a

Update train.py

5494712

Update train.py

bca8911

Update train.py

e37f9bc

Update train.py

d880374

xin-huang and others added 5 commits December 16, 2022 03:04

Update train.py

13ad25f

logistic classifier

f6db2dd

parallelization, improved accuracy calculation

ad86e4e

functions removed, comments added,...

d5e6b74

Update conda env

bcfbcea

xin-huang commented Dec 22, 2023


xin-huang commented Dec 24, 2023


xin-huang closed this Feb 11, 2024

xin-huang deleted the archie branch February 15, 2024 22:52
