
refactor: set up gene-alias pairs dataset #14

Open: wants to merge 26 commits into main

Changes from 16 commits

Commits (26):
c25c665
add function to create collision records with gene symbol and ensg id
anastasiabratulin Jun 18, 2024
9e37997
changed the create yaml function to put new ones in a folder
anastasiabratulin Jun 25, 2024
c63ad9c
updated hgnc and ensg downloads
anastasiabratulin Jun 29, 2024
1be793c
split out dgidb_claims_analysis from total_alias_overlap nb
anastasiabratulin Aug 21, 2024
46d3689
analyze ambiguous symbol usage in pubmed abstracts
anastasiabratulin Aug 21, 2024
038be61
add analysis on dgidb queries
anastasiabratulin Aug 21, 2024
2692bd1
remove dgidb_claim_analysis from this notebook
anastasiabratulin Aug 21, 2024
4ba8926
reorganized files
anastasiabratulin Aug 21, 2024
7878cf6
added TR2 collision record
anastasiabratulin Aug 23, 2024
18c3367
added alias-primary analysis notebook
anastasiabratulin Aug 26, 2024
06b4b75
change file organization
anastasiabratulin Aug 26, 2024
d2514df
split out collision record generation from alias-alias collision anal…
anastasiabratulin Aug 26, 2024
cd1ecc9
concept-alias pair count added to alias-alias collision analysis nb
anastasiabratulin Aug 26, 2024
43d124b
Fixed and ran the collision record generation notebook
anastasiabratulin Aug 28, 2024
e749263
corrected collision record generation nb -records contain associated …
anastasiabratulin Aug 29, 2024
3b0f262
add fields in collision record generator nb for collision gene relati…
anastasiabratulin Aug 29, 2024
74ab24d
reformat alias-alias_collision_analysis nb (previous total_alias_over…
anastasiabratulin Sep 10, 2024
509e374
wip: store progress
anastasiabratulin Sep 11, 2024
8e29915
Merge branch 'issue-7' of https://github.com/cancervariants/gene-harm…
anastasiabratulin Sep 11, 2024
dfa581a
refactor alias-primary notebook
anastasiabratulin Oct 3, 2024
de943ec
adding gene-alias matching
anastasiabratulin Oct 17, 2024
28822f9
not making collision records anymore- capturing gene-alias pair relat…
anastasiabratulin Nov 4, 2024
21fb24c
clean up created files and reorganize dgidb analysis folder
anastasiabratulin Nov 6, 2024
140b2ee
created an archive folder for collision record work and old notebooks
anastasiabratulin Nov 6, 2024
e9f1f67
edited dgidb analysis nbs
anastasiabratulin Nov 7, 2024
3564a29
edit readme and change folder names (alias_alias to alias-alias)
anastasiabratulin Nov 12, 2024
The diff you're trying to view is too large. We only load the first 3000 changed files.
CONTRIBUTING.md: 2 changes (1 addition, 1 deletion)
@@ -15,7 +15,7 @@ the data field in which the collision occurs
*Primary gene symbol of the gene being represented by the collision*

### ENSG ID
-*Unique identifier number assigned by Ensembl to each gene, begining with ENSG*
+*Unique identifier number assigned by Ensembl to each gene, beginning with ENSG*

### Genomic Location (GRCh38/hg38)
*A description of the location of the gene on assembly GRCh38/hg38 including the chromosome, start position, and end position*
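The ENSG ID and genomic-location fields described in CONTRIBUTING.md can be checked mechanically. The following is a minimal Python sketch, not code from this PR. It assumes that unversioned human Ensembl gene IDs are `ENSG` followed by 11 digits, and that locations use the `chrN:start-end` form with thousands separators seen in the TR2 collision record. Treating both endpoints as inclusive reproduces that record's `gene_size` values (for example, 83,346 for TXNRD3). The names `is_valid_ensg`, `parse_location`, and `GenomicLocation` are hypothetical.

```python
import re
from typing import NamedTuple

# Assumption: unversioned human Ensembl gene IDs are "ENSG" followed by 11 digits.
ENSG_PATTERN = re.compile(r"^ENSG\d{11}$")

# Assumption: locations look like "chr3:126,571,779-126,655,124" (commas optional).
LOCATION_PATTERN = re.compile(r"^(chr[0-9XYM]+):([\d,]+)-([\d,]+)$")


class GenomicLocation(NamedTuple):
    chrom: str
    start: int
    end: int

    @property
    def size(self) -> int:
        """Length in bases, counting both endpoints (inclusive coordinates)."""
        return self.end - self.start + 1


def is_valid_ensg(ensg_id: str) -> bool:
    """Return True if ensg_id matches the assumed ENSG + 11 digit format."""
    return ENSG_PATTERN.match(ensg_id.strip()) is not None


def parse_location(text: str) -> GenomicLocation:
    """Parse 'chrN:start-end' into a GenomicLocation, rejecting malformed input."""
    match = LOCATION_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized location: {text!r}")
    chrom, start, end = match.groups()
    location = GenomicLocation(chrom, int(start.replace(",", "")), int(end.replace(",", "")))
    if location.start > location.end:
        raise ValueError(f"start is after end: {text!r}")
    return location


if __name__ == "__main__":
    loc = parse_location("chr3:126,571,779-126,655,124")
    print(is_valid_ensg("ENSG00000197763"), loc.chrom, loc.size)
```

Stripping whitespace before matching also tolerates stray leading or trailing spaces in hand-edited records.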
Binary file added Downloaded_files/.DS_Store
Binary file not shown.
Downloaded_files/Homo_sapiens.gene_info20240627: 193,457 changes (193,457 additions, 0 deletions)

Large diffs are not rendered by default.
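Because this PR builds a gene-alias pairs dataset from downloads such as the NCBI gene_info file above, here is a hedged Python sketch of one way to extract (symbol, alias) pairs and group aliases shared by more than one gene, i.e. alias-alias collisions like TR2. It assumes the standard NCBI gene_info layout: tab-delimited, with Symbol in the third column and Synonyms in the fifth, pipe-separated, and `-` meaning none. The function names are hypothetical and this is not the notebooks' actual code. The example rows are illustrative: the TR2 synonyms mirror the TR2 collision record in this PR, while the GeneIDs and `GENE_X`/`ALIAS_X` are placeholders.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Illustrative rows truncated to the first five gene_info columns.
# GeneIDs 1-3 and GENE_X/ALIAS_X are placeholders, not real NCBI data.
EXAMPLE_ROWS = [
    "#tax_id\tGeneID\tSymbol\tLocusTag\tSynonyms",
    "9606\t1\tTXNRD3\t-\tTR2",
    "9606\t2\tNR2C1\t-\tTR2",
    "9606\t3\tGENE_X\t-\tALIAS_X",
]


def gene_alias_pairs(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (symbol, alias) pairs from gene_info-style tab-delimited lines."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip the header row and blank lines
        fields = line.rstrip("\n").split("\t")
        symbol, synonyms = fields[2], fields[4]  # assumed column positions
        if synonyms == "-":
            continue  # "-" marks a gene with no synonyms
        for alias in synonyms.split("|"):
            yield symbol, alias


def alias_alias_collisions(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each alias shared by two or more genes to its sorted gene symbols."""
    genes_by_alias = defaultdict(set)
    for symbol, alias in pairs:
        genes_by_alias[alias].add(symbol)
    return {
        alias: sorted(genes)
        for alias, genes in genes_by_alias.items()
        if len(genes) > 1
    }


if __name__ == "__main__":
    print(alias_alias_collisions(gene_alias_pairs(EXAMPLE_ROWS)))
```

For the real download, an open file handle (for example `open(path, encoding="utf-8")`) can be passed in place of `EXAMPLE_ROWS`. Using a set per alias means a gene that lists the same synonym twice is counted only once.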

File renamed without changes.
Downloaded_files/ensg_biomart_gene20240626.txt: 117,141 changes (117,141 additions, 0 deletions)

Large diffs are not rendered by default.

Downloaded_files/hgnc_biomart_gene20240626.txt: 67,584 changes (67,584 additions, 0 deletions)

Large diffs are not rendered by default.

Downloaded_files/hgnc_custom_dl_20240828.txt: 49,051 changes (49,051 additions, 0 deletions)

Large diffs are not rendered by default.

File renamed without changes.
alias-alias_collision_records/TR2_collision_record.yaml: 60 changes (60 additions, 0 deletions)
@@ -0,0 +1,60 @@

```yaml
collision_symbol: 'TR2'
collision_class: 'Protein Product'
collision_type: 'alias-alias'
collision_group:
  - gene_symbol: 'TAS1R2'
    ensg_id: 'ENSG00000179002'
    GRCh38_gene_location: 'chr1:18,839,599-18,859,660'
    gene_size: ''
    cytogenetic_location: '1p36.13'
    collision_gene_relationship:
      - collision_acronym_expansion: 'Taste Receptor type 1 member 2'
        collision_association: 'Protein Product'
        collision_source:
          - PMID: ''
          - PMID: ''
  - gene_symbol: 'TXNRD3'
    ensg_id: 'ENSG00000197763'
    GRCh38_gene_location: 'chr3:126,571,779-126,655,124'
    gene_size: '83,346'
    cytogenetic_location: '3q21.3'
    collision_gene_relationship:
      - collision_acronym_expansion: 'Thioredoxin Reductase 2'
        collision_association: 'Protein Product'
        collision_source:
          - PMID: '10455115'
          - PMID: '17346242'
  - gene_symbol: 'NR2C1'
    ensg_id: 'ENSG00000120798'
    GRCh38_gene_location: 'chr12:95,020,229-95,073,628'
    gene_size: '53,400'
    cytogenetic_location: '12q22'
    collision_gene_relationship:
      - collision_acronym_expansion: 'Testicular Receptor 2'
        collision_association: 'Protein Product'
        collision_source:
          - PMID: '17010934'
          - PMID: '12361719'
  - gene_symbol: 'DEPDC7'
    ensg_id: 'ENSG00000121690'
    GRCh38_gene_location: 'chr11:33,015,876-33,033,582'
    gene_size: '17,707'
    cytogenetic_location: '11p13'
    collision_gene_relationship:
      - collision_acronym_expansion: ''
        collision_association: ''
        collision_source:
          - PMID: ''
          - PMID: ''
  - gene_symbol: 'TNFRSF14'
    ensg_id: 'ENSG00000157873'
    GRCh38_gene_location: 'chr1:2,554,234-2,565,382'
    gene_size: '11,149'
    cytogenetic_location: '1p36.32'
    collision_gene_relationship:
      - collision_acronym_expansion: 'TNF Receptor 2'
        collision_association: 'Protein Product'
        collision_source:
          - PMID: '9162061'
          - PMID: '15507617'
```
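Several fields in this record are still empty placeholders, so a small audit can list them before the record is used downstream. The Python sketch below assumes the YAML has already been loaded into plain dicts and lists (for example with PyYAML's `yaml.safe_load`) and that each `collision_source` list sits inside its `collision_gene_relationship` entry. The `TR2_SUBSET` literal copies three of the record's five genes. `find_missing_fields` and `is_blank` are hypothetical helpers, not part of this PR.

```python
from typing import Any, Dict, List, Tuple

GENE_FIELDS = ("ensg_id", "GRCh38_gene_location", "gene_size", "cytogenetic_location")
RELATIONSHIP_FIELDS = ("collision_acronym_expansion", "collision_association")

# Three of the five genes from the TR2 record, written as already-loaded dicts.
TR2_SUBSET: Dict[str, Any] = {
    "collision_symbol": "TR2",
    "collision_type": "alias-alias",
    "collision_group": [
        {
            "gene_symbol": "TAS1R2",
            "ensg_id": "ENSG00000179002",
            "GRCh38_gene_location": "chr1:18,839,599-18,859,660",
            "gene_size": "",
            "cytogenetic_location": "1p36.13",
            "collision_gene_relationship": [{
                "collision_acronym_expansion": "Taste Receptor type 1 member 2",
                "collision_association": "Protein Product",
                "collision_source": [{"PMID": ""}, {"PMID": ""}],
            }],
        },
        {
            "gene_symbol": "TXNRD3",
            "ensg_id": "ENSG00000197763",
            "GRCh38_gene_location": "chr3:126,571,779-126,655,124",
            "gene_size": "83,346",
            "cytogenetic_location": "3q21.3",
            "collision_gene_relationship": [{
                "collision_acronym_expansion": "Thioredoxin Reductase 2",
                "collision_association": "Protein Product",
                "collision_source": [{"PMID": "10455115"}, {"PMID": "17346242"}],
            }],
        },
        {
            "gene_symbol": "DEPDC7",
            "ensg_id": "ENSG00000121690",
            "GRCh38_gene_location": "chr11:33,015,876-33,033,582",
            "gene_size": "17,707",
            "cytogenetic_location": "11p13",
            "collision_gene_relationship": [{
                "collision_acronym_expansion": "",
                "collision_association": "",
                "collision_source": [{"PMID": ""}, {"PMID": ""}],
            }],
        },
    ],
}


def is_blank(value: Any) -> bool:
    """Treat None and whitespace-only strings as missing."""
    return value is None or not str(value).strip()


def find_missing_fields(record: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (gene_symbol, field) pairs for every empty field in collision_group."""
    missing = []
    for gene in record.get("collision_group", []):
        symbol = gene.get("gene_symbol", "?")
        missing += [(symbol, key) for key in GENE_FIELDS if is_blank(gene.get(key))]
        for rel in gene.get("collision_gene_relationship", []):
            missing += [(symbol, key) for key in RELATIONSHIP_FIELDS if is_blank(rel.get(key))]
            # A relationship counts as unsourced if none of its PMIDs is filled in.
            sources = rel.get("collision_source", [])
            if all(is_blank(src.get("PMID")) for src in sources):
                missing.append((symbol, "collision_source"))
    return missing


if __name__ == "__main__":
    for symbol, field in find_missing_fields(TR2_SUBSET):
        print(f"{symbol}: {field} is empty")
```

Reporting a single `collision_source` entry per relationship, rather than one per empty PMID, keeps the output focused on relationships that have no supporting citation at all.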