Commit

Merge remote-tracking branch 'origin/main'
unique379r committed Aug 30, 2024
2 parents 2aeabee + 1bc170c commit a67619a
Showing 3 changed files with 1,154 additions and 200 deletions.
396 changes: 200 additions & 196 deletions English_EDA/MainNotebook.ipynb

Large diffs are not rendered by default.

952 changes: 952 additions & 0 deletions English_EDA/MainNotebook2.ipynb

Large diffs are not rendered by default.

6 changes: 2 additions & 4 deletions README.md
@@ -127,17 +127,15 @@ population in a PCA.
See [this notebook](https://github.com/collaborativebioinformatics/tandemrepeats/blob/main/English_EDA/MainNotebook.ipynb) for details.

# Old README


# Outlines ![alt text](https://github.com/collaborativebioinformatics/tandemrepeats/blob/main/imgs/Slide1.png?raw=true)
![alt text](https://github.com/collaborativebioinformatics/tandemrepeats/blob/main/imgs/Slide2.png?raw=true)

Background
===========

For a tutorial on using tdb for programmatic access, see the Introduction notebook. The motivation for tdb is that TRs are better represented as ‘replacements’ of reference sequence spans with contracted or expanded alternate allele sequences. This representation removes the alignment ambiguities to which TRs are highly susceptible. Furthermore, VCFs are not a normalized data structure: each ‘row’ in a VCF can hold multiple alleles and multiple samples. This, combined with the mixed data types, makes parsing VCF files… unpleasant. The tdb is a normalized database of three tables holding information on loci, alleles, and samples. The data can be parsed with ease by standard data science libraries such as pandas.
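
To make the normalization point concrete, here is a minimal sketch of how a single VCF row that carries two alternate alleles and two samples could be split into locus, allele, and sample records. The VCF line, the sample names, the `normalize` helper, and every field name are hypothetical and chosen for illustration only; they are not tdb's actual schema or API.

```python
# Toy VCF data line (hypothetical values): one row carries two ALT alleles
# and the genotypes of two samples.
SAMPLE_NAMES = ["sampleA", "sampleB"]
VCF_LINE = "chr1\t1000\t.\tCAGCAG\tCAG,CAGCAGCAG\t.\t.\t.\tGT\t0/1\t1/2"


def normalize(line, sample_names):
    """Split one VCF data line into locus, allele, and sample records.

    The field names below are illustrative and are not tdb's actual schema.
    """
    fields = line.split("\t")
    chrom, pos, ref = fields[0], int(fields[1]), fields[3]
    alts = fields[4].split(",")
    locus_id = 0
    # VCF POS is 1-based; the replaced span covers the REF sequence.
    locus = {"locus_id": locus_id, "chrom": chrom,
             "start": pos, "end": pos + len(ref) - 1}
    # Allele 0 is the reference; the ALT alleles follow in order.
    alleles = [
        {"locus_id": locus_id, "allele_number": n, "sequence": seq,
         "length_change": len(seq) - len(ref)}
        for n, seq in enumerate([ref] + alts)
    ]
    # One record per (sample, allele carried) pair; assumes a GT-only FORMAT.
    samples = []
    for name, genotype in zip(sample_names, fields[9:]):
        for allele_number in genotype.replace("|", "/").split("/"):
            if allele_number == ".":
                continue  # skip missing calls
            samples.append({"sample": name, "locus_id": locus_id,
                            "allele_number": int(allele_number)})
    return [locus], alleles, samples


loci, alleles, samples = normalize(VCF_LINE, SAMPLE_NAMES)
```

Each of the three lists holds flat records of a single type, so each can be passed directly to `pandas.DataFrame` and joined on `locus_id` and `allele_number`.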

**Methods**
Methods
============

Query #1 - Population Structure
--------------