Concept Whitepaper

SchSascha edited this page May 16, 2018 · 4 revisions

This document contains the conceptual background and realization plans for the GePi publication.

Backend

Use an Elasticsearch index with the new mapping structure
Incorporate daily updates
- Build the GePi pipeline
- Run it daily

Input

A-Search
A-B-Search

Types of Input

Gene IDs
UniProt IDs?
Gene Names?

Processing

Mapping to top homology
Rather do mapping by gene name?
Make sure the algorithms do what they are supposed to! Write tests!

Frontend

Pie Chart

Sankey Chart, Simple

How to deal with enlarging the widget? A larger widget could show more edges, but how many? That depends on the number of nodes and on the size of the nodes, i.e. the abundances.
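
One possible answer to the "how many edges" question, given here only as a minimal sketch: show the strongest edges for as long as the thinnest shown edge would still be drawn with a minimum thickness. The function name `visible_edge_count`, the parameter `min_px_per_edge`, and the simplification that all shown edges share the full widget height in proportion to their weight (node padding ignored) are assumptions for illustration, not part of the current GePi frontend.

```python
def visible_edge_count(weights, height_px, min_px_per_edge=5.0):
    """Return how many of the strongest edges fit into a widget of height_px.

    Assumption: the shown edges are stacked on top of each other and scaled
    so that together they fill the widget height.
    """
    ordered = sorted(weights, reverse=True)
    total = 0.0
    count = 0
    for w in ordered:
        total += w
        # Thickness of the thinnest shown edge if this edge were included.
        if w <= 0 or w / total * height_px < min_px_per_edge:
            break
        count += 1
    return count


edge_weights = [50, 30, 10, 5, 3, 2]  # hypothetical edge abundances
small = visible_edge_count(edge_weights, height_px=100)
large = visible_edge_count(edge_weights, height_px=200)
```

Because the weights are visited in descending order, the thickness of the thinnest shown edge can only shrink as more edges are added, so stopping at the first edge that is too thin is sufficient.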

Sankey Chart, Common Partners

Same widget size question as for the simple Sankey chart
How to rank common partner pairs, i.e. which pairs to show?
Currently: a and b have common partner c -> score(a,b,c) = #(a,c) + #(b,c)
- Downside: unbalanced hits, where one of the two counts is much larger than the other, are not ranked lower. Weighting the score with min(#(a,c) / #(b,c), #(b,c) / #(a,c)) could be an option (this factor is in (0,1] and equals 1 only for perfectly balanced counts)
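
The scoring above can be sketched as follows. This is a minimal illustration, assuming that the balance factor is meant to be the smaller of the two count ratios so that it stays in (0,1]. The function names, the `counts` dictionary layout, and the example numbers are hypothetical.

```python
from itertools import product


def common_partner_score(n_ac, n_bc, balanced=True):
    """Score a common partner c of a and b from the counts #(a,c) and #(b,c)."""
    if n_ac <= 0 or n_bc <= 0:
        return 0.0  # c is not a common partner of a and b
    score = n_ac + n_bc  # current scoring: #(a,c) + #(b,c)
    if not balanced:
        return float(score)
    # Assumed balance factor in (0, 1]; it is 1 only when both counts are equal.
    balance = min(n_ac / n_bc, n_bc / n_ac)
    return score * balance


def rank_common_partners(counts, a_genes, b_genes, top_n=10, balanced=True):
    """Return the top_n (a, b, c, score) tuples, best first.

    counts maps (gene, partner) to the number of interactions found.
    """
    partners = sorted({partner for (_, partner) in counts})
    ranked = []
    for a, b, c in product(a_genes, b_genes, partners):
        s = common_partner_score(counts.get((a, c), 0), counts.get((b, c), 0), balanced)
        if s > 0:
            ranked.append((a, b, c, s))
    ranked.sort(key=lambda t: t[3], reverse=True)
    return ranked[:top_n]


# Hypothetical counts: c1 is a balanced common partner, c2 an unbalanced one.
counts = {("a1", "c1"): 10, ("b1", "c1"): 10, ("a1", "c2"): 19, ("b1", "c2"): 1}
top = rank_common_partners(counts, ["a1"], ["b1"])
```

Without the balance factor both partners in this example get the same score of 20. With it, the balanced partner c1 is ranked clearly above c2.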

Table

Allow filtering sentences by keywords
Results-per-sentence table
- When a hit is in PMC, also provide the PubMed ID
Provide abundance tables
- A-Search: provide abundance of all interaction partners
- A-B-Search:
  - show abundance of A and B members, possibly in separate columns
  - show number of different interaction partners for A and B members (potentially including median and/or mean+std)
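
The abundance tables listed above could be derived from the extracted interactions roughly as follows. This is a sketch under assumptions: each interaction is taken to be available as a plain (A member, B member) pair, one per hit, and all function and key names are hypothetical.

```python
from collections import Counter, defaultdict
from statistics import mean, median, stdev


def abundance_table(events):
    """A-Search: abundance of all interaction partners, most frequent first.

    events is a list of (query gene, partner) pairs, one per hit.
    """
    return Counter(partner for _, partner in events).most_common()


def ab_abundance_tables(events):
    """A-B-Search: abundances and numbers of distinct partners for A and B."""
    abundance_a, abundance_b = Counter(), Counter()
    partners_a, partners_b = defaultdict(set), defaultdict(set)
    for a, b in events:
        abundance_a[a] += 1
        abundance_b[b] += 1
        partners_a[a].add(b)
        partners_b[b].add(a)
    return {
        "abundance_a": dict(abundance_a),
        "abundance_b": dict(abundance_b),
        "partner_count_a": {g: len(p) for g, p in partners_a.items()},
        "partner_count_b": {g: len(p) for g, p in partners_b.items()},
    }


def summarize(values):
    """Mean, median and sample standard deviation of a non-empty sequence."""
    values = list(values)
    return {
        "mean": mean(values),
        "median": median(values),
        # stdev needs at least two values; report 0.0 for a single value.
        "std": stdev(values) if len(values) > 1 else 0.0,
    }


# Hypothetical hits of an A-B-Search.
events = [("A1", "B1"), ("A1", "B1"), ("A1", "B2"), ("A2", "B1")]
tables = ab_abundance_tables(events)
stats_a = summarize(tables["partner_count_a"].values())
```

Keeping the abundances of A and B members in separate dictionaries maps directly onto the option of showing them in separate columns.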

Usage and Advantages of GePi Capabilities

Use Cases

Proteomics
Transcriptomics?
Pathways: potential interaction partners?

Evaluation

NatComm scenario against EvexDB
Evaluate with event corpora