Merge remote-tracking branch 'origin/main'

collaborativebioinformatics · Aug 30, 2024 · a67619a
2 parents 2aeabee + 1bc170c
commit a67619a

Showing 3 changed files with 1,154 additions and 200 deletions.
diff --git a/English_EDA/MainNotebook.ipynb b/English_EDA/MainNotebook.ipynb
diff --git a/English_EDA/MainNotebook2.ipynb b/English_EDA/MainNotebook2.ipynb
diff --git a/README.md b/README.md
@@ -127,17 +127,15 @@ population in a PCA.
 See [this notebook](https://github.com/collaborativebioinformatics/tandemrepeats/blob/main/English_EDA/MainNotebook.ipynb) for details.
 
 # Old README
-
-
 # Outlines ![alt text](https://github.com/collaborativebioinformatics/tandemrepeats/blob/main/imgs/Slide1.png?raw=true)
 ![alt text](https://github.com/collaborativebioinformatics/tandemrepeats/blob/main/imgs/Slide2.png?raw=true)
 
 Background
 ===========
-
 For a tutorial on using tdb for programmatic access, see Introduction notebook. The motivation for tdb was that TRs can be better represented as ‘replacements’ of reference sequence spans with contracted/expanded alternate allele sequence. This type of representation removes alignment ambiguities, which TRs are highly susceptible to. Furthermore, VCFs are not a normalized data structure. Each ‘row’ in a VCF can hold multiple alleles and multiple samples. This, combined with the mixed data-types, makes parsing VCF files… unpleasant.  The tdb is a normalized database with three tables with information on loci, alleles, and samples. The data can be parsed by standard data science libraries, such as pandas, with ease.
 
-**Methods**
+Methods
+============
 
 Query #1 - Population Structure
 --------------