<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="description" content="RNA-GPT: Multimodal Generative System for RNA Sequence Understanding">
<meta name="keywords" content="RNA, GPT, Multimodal, Generative System, RNA Sequence Understanding">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>RNA-GPT: Multimodal Generative System for RNA Sequence Understanding</title>
<script async src="https://www.googletagmanager.com/gtag/js?id=G-PYVRSFMDRL"></script>
<script>
window.dataLayer = window.dataLayer || [];
function gtag() { dataLayer.push(arguments); }
gtag('js', new Date());
gtag('config', 'G-PYVRSFMDRL');
</script>
<link href="https://fonts.googleapis.com/css?family=Google+Sans|Noto+Sans|Castoro" rel="stylesheet">
<link rel="stylesheet" href="./static/css/bulma.min.css">
<link rel="stylesheet" href="./static/css/bulma-carousel.min.css">
<link rel="stylesheet" href="./static/css/bulma-slider.min.css">
<link rel="stylesheet" href="./static/css/fontawesome.all.min.css">
<link rel="stylesheet" href="https://cdn.jsdelivr.net/gh/jpswalsh/academicons@1/css/academicons.min.css">
<link rel="stylesheet" href="./static/css/index.css">
<link rel="icon" href="./static/images/RNA_GPT.png">
<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.5.1/jquery.min.js"></script>
<script defer src="./static/js/fontawesome.all.min.js"></script>
<script src="./static/js/bulma-carousel.min.js"></script>
<script src="./static/js/bulma-slider.min.js"></script>
<script src="./static/js/index.js"></script>
</head>
<body>
<nav class="navbar" role="navigation" aria-label="main navigation">
<div class="navbar-brand">
<a role="button" class="navbar-burger" aria-label="menu" aria-expanded="false">
<span aria-hidden="true"></span>
<span aria-hidden="true"></span>
<span aria-hidden="true"></span>
</a>
</div>
<div class="navbar-menu">
<div class="navbar-start" style="flex-grow: 1; justify-content: center;">
<a class="navbar-item" href="https://yijia-xiao.github.io/">
<span class="icon">
<i class="fas fa-home"></i>
</span>
</a>
<div class="navbar-item has-dropdown is-hoverable">
<a class="navbar-link">More Research</a>
<div class="navbar-dropdown">
<a class="navbar-item" href="https://arxiv.org/abs/2408.11363">ProteinGPT</a>
<a class="navbar-item" href="https://arxiv.org/abs/2310.02469">PrivacyMind</a>
<a class="navbar-item" href="https://arxiv.org/abs/2411.08900">RNA-GPT</a>
</div>
</div>
</div>
</div>
</nav>
<section class="hero">
<div class="hero-body">
<div class="container is-max-desktop">
<div class="columns is-centered">
<div class="column has-text-centered">
<h1 class="title is-1 publication-title">RNA-GPT: Multimodal Generative System for RNA Sequence Understanding</h1>
<div class="is-size-5 publication-authors">
<span class="author-block">Yijia Xiao<sup>1</sup>,</span>
<span class="author-block">Edward Sun<sup>1</sup>,</span>
<span class="author-block">Yiqiao Jin<sup>2</sup>,</span>
<span class="author-block">Wei Wang<sup>1</sup></span>
</div>
<div class="is-size-5 publication-authors">
<span class="author-block"><sup>1</sup>University of California, Los Angeles,</span>
<span class="author-block"><sup>2</sup>Georgia Institute of Technology</span>
</div>
<div class="column has-text-centered">
<div class="publication-links">
<span class="link-block"><a href="https://arxiv.org/abs/2411.08900" class="external-link button is-normal is-rounded is-dark"><span class="icon"><i class="fas fa-file-pdf"></i></span><span>Paper</span></a></span>
<span class="link-block"><a href="https://github.com/Yijia-Xiao/RNA-GPT" class="external-link button is-normal is-rounded is-dark"><span class="icon"><i class="fab fa-github"></i></span><span>Code</span></a></span>
<!-- Add more links if available -->
</div>
</div>
</div>
</div>
</div>
</div>
</section>
<section class="section">
<div class="container is-max-desktop">
<div class="columns is-centered has-text-centered">
<div class="column is-four-fifths">
<h2 class="title is-3">Abstract</h2>
<div class="content has-text-justified">
<p>RNAs are vital molecules that carry genetic information essential for life, with significant implications for drug development and biotechnology. However, RNA research is often slowed by the vast amount of literature. To address this, we introduce <strong>RNA-GPT</strong>, a multi-modal RNA chat model that simplifies RNA discovery by leveraging extensive RNA literature.</p>
<p>RNA-GPT combines RNA sequence encoders with linear projection layers and state-of-the-art large language models (LLMs) for precise representation alignment. This enables it to process user-uploaded RNA sequences and provide concise, accurate responses. Our scalable training pipeline, powered by RNA-QA, automatically gathers RNA annotations from RNACentral using a divide-and-conquer approach with GPT-4o and latent Dirichlet allocation (LDA) to handle large datasets and generate instruction tuning samples.</p>
<p>Experiments show RNA-GPT effectively handles complex RNA queries, streamlining RNA research. We also introduce RNA-QA, a dataset of 407,616 RNA sequences with annotations for modality alignment and instruction tuning.</p>
</div>
</div>
</div>
</div>
</section>
<section class="section">
<div class="container is-max-desktop">
<div class="columns is-centered">
<div class="column is-full-width">
<h2 class="title is-3">Introduction</h2>
<div class="content has-text-justified">
<p>Large language models (LLMs) trained on internet-scale corpora have been shown to perform extraordinarily well on a large array of tasks from Olympiad-level mathematical and scientific reasoning to planning long-term tasks for robotic systems. Recent advances in the biological and medical fields have enabled the adaptation of powerful models to accelerate research, significantly reducing reliance on traditional experiments.</p>
<p>Because proteins, RNAs, and DNAs can be represented as character strings, and because vast amounts of sequenced data are readily available, an ideal environment has emerged for training language models to predict and generate protein, DNA, and RNA structures and sequences. Protein language models like ESM have successfully encoded protein sequence and structure information, inspiring works such as ProteinGPT and ProtST, which adapt protein representations into a language-based format, enabling natural language querying of protein data.</p>
<p>Similar to ESM-2, works like RiNALMo and RNA-FM have utilized the flexible capabilities of language models to learn and predict RNA structure and function. Much as proteins can be represented as strings of characters, RNAs, with their sequences of four nucleotide bases, have also sparked interest in computational RNA and DNA research using large language models (LLMs).</p>
<p>While models like ProteinGPT, ProtST, ProteinChat, and ProteinCLIP have made significant progress in aligning protein sequences and structures with textual descriptions, the DNA and RNA domains lag far behind. Previous efforts, such as RiNALMo and RNA-FM, have mainly focused on specific tasks like promoter or enhancer prediction and structure and function analysis. ChatNT is among the few models striving to bridge the gap between RNA comprehension and natural language. However, its emphasis is on performing biological tasks as a conversational agent rather than providing deep RNA understanding and comprehensive dialogue.</p>
<p>As a result, there is a notable gap in RNA chat models that offer in-depth knowledge. However, applying multimodal LLMs to RNA modeling presents unique challenges, especially in integrating diverse modalities such as textual descriptions, RNA sequences, and structural data.</p>
<p>To overcome these challenges, we propose a two-step approach to RNA-GPT. First, we use the RNA-FM sequence encoder to embed RNA sequences, then align these sequence representations with natural language through a large, automatically curated QA dataset from RNACentral. Second, to ensure our model generates concise and accurate responses, we break down RNA-QA's abstract summaries into individual QA pairs for instruction tuning, enhancing the model's ability to deliver clear and relevant answers. We use Meta AI's flagship Llama-3 8B Instruct as our backbone LLM to provide solid general language understanding.</p>
<p>More specifically, our contributions are as follows:</p>
<ul>
<li><strong>Novel Framework:</strong> RNA-GPT is one of the first multi-modal RNA sequence chat models that enables deep, interactive RNA-focused conversations, significantly enhancing the understanding of RNAs for biological research.</li>
<li><strong>Large-scale Dataset and Collection Pipeline:</strong> We introduce RNA-QA, a QA dataset derived from the RNA Central Database for modality alignment instruction tuning of RNA chat models. We also present our highly scalable collection pipeline that automates the scraping and summarizing of relevant literature on RNA. Using a divide-and-conquer summarization strategy, we ensure that research details are preserved effectively.</li>
</ul>
</div>
</div>
</div>
</div>
</section>
<section class="section">
<div class="container is-max-desktop">
<!-- Center the entire content within the container -->
<div class="columns is-centered">
<div class="column is-full">
<h2 class="title is-3">Methodology</h2>
<div class="content has-text-justified">
<p>RNA-GPT uses the pre-trained RNA-FM sequence encoder to embed RNA sequences, which are then passed through a linear projection layer. This layer learns to map the RNA embeddings to a shared representation space with natural language, enabling alignment with a backbone LLM, for which we chose Meta’s Llama-3 8B model. The training process is divided into two stages:</p>
<ol>
<li><strong>Sequence and Modality Alignment:</strong> RNA and natural language representations are aligned.</li>
<li><strong>Instruction Tuning:</strong> The model is fine-tuned for task-specific QA generation.</li>
</ol>
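<p>Conceptually, the stage-1 setup is a frozen encoder feeding a trainable linear projection. The sketch below is illustrative only: the 640-dimensional RNA-FM embedding size and 4096-dimensional Llama-3 8B hidden size are assumptions, and both the encoder and the LLM are stubbed out.</p>

```python
import torch
import torch.nn as nn

class RnaProjector(nn.Module):
    """Maps frozen RNA encoder embeddings into the LLM's representation space."""
    def __init__(self, rna_dim: int = 640, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Linear(rna_dim, llm_dim)  # the only trainable part in stage 1

    def forward(self, rna_embeddings: torch.Tensor) -> torch.Tensor:
        # rna_embeddings: (batch, seq_len, rna_dim) from the frozen encoder
        return self.proj(rna_embeddings)

# Stand-in for the frozen RNA-FM encoder (the real one is a 12-layer transformer).
encoder = nn.Linear(4, 640)
for p in encoder.parameters():
    p.requires_grad = False  # stage 1 freezes the sequence encoder

projector = RnaProjector()
one_hot = torch.eye(4)[torch.tensor([0, 2, 1, 3])]   # toy one-hot A/C/G/U sequence
tokens = projector(encoder(one_hot.unsqueeze(0)))    # soft tokens for the LLM
print(tuple(tokens.shape))                           # (1, 4, 4096)
```

<p>In stage 2, the same projected tokens are kept while the instruction-tuning objective is applied; only the projection (and optionally the LLM) receives gradients.</p>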
<!-- Figure 1 -->
<div class="columns is-centered">
<div class="column is-12">
<figure class="image">
<img src="./static/images/MA.png" alt="Modality Alignment Stage">
</figure>
<p class="has-text-centered"><strong>Figure 1:</strong> RNA-GPT Modality Fusion &amp; Alignment Stage: we freeze the sequence encoder block and train the linear projection layer to align RNA sequence representations with text. In the alignment stage, the input is only the projected RNA representation; no text prompts are incorporated at this stage.</p>
</div>
</div>
<p><strong>Modality Alignment Stage (Stage 1):</strong> RNA sequences, given as strings, are first fed into the pre-trained sequence encoder, which features 12 transformer layers trained on 23 million RNAs from the RNACentral database via self-supervised learning. We use a specialized token <code>&lt;RNAHere&gt;</code> for RNA-text modality alignment.</p>
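<p>To make the role of the alignment token concrete, here is a minimal sketch of how a <code>&lt;RNAHere&gt;</code> placeholder can be spliced into a prompt; the template wording and function names are our assumptions, not the paper's exact code. Text on either side of the placeholder is embedded normally, and the projected RNA representation is inserted in between.</p>

```python
RNA_TOKEN = "<RNAHere>"

def build_alignment_prompt(description: str) -> str:
    # In stage 1 the model input is essentially just the projected RNA
    # representation; the placeholder marks where those soft tokens go.
    return f"{RNA_TOKEN} A description of this RNA: {description}"

def split_on_rna_token(prompt: str):
    # Each side is tokenized and embedded as text; the RNA soft tokens are
    # concatenated between the two halves before being fed to the LLM.
    before, after = prompt.split(RNA_TOKEN)
    return before, after

prompt = build_alignment_prompt("a small nucleolar RNA")
before, after = split_on_rna_token(prompt)
print(repr(before))  # '' -- the RNA tokens lead the sequence in this template
```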
<p><strong>Instruction Tuning Stage (Stage 2):</strong> In stage 2, we instruction-tune the model using our curated RNA-QA dataset. We break down the full annotations into targeted QA samples with concise answers to specific questions as prediction targets. This allows the chat model to provide more relevant and accurate responses.</p>
<!-- Figure 2 -->
<div class="columns is-centered">
<div class="column is-12">
<figure class="image">
<img src="./static/images/IT.png" alt="Instruction Tuning Stage">
</figure>
<p class="has-text-centered"><strong>Figure 2:</strong> RNA-GPT Instruction Tuning Stage: we use the RNA representation from the alignment stage and combine it with question prompts for instruction tuning. The model generates answers that are concise and relevant to the questions.</p>
</div>
</div>
<h3>RNA-QA Dataset</h3>
<p>To achieve modality alignment, we constructed a large-scale dataset from the RNA Central database, comprising 407,616 RNA sequences paired with abstract descriptions.</p>
<p><strong>Divide and Conquer RNA Literature Summarization:</strong> We begin by filtering RNA sequences from RNA Central, focusing on those indexed with "Lit Scan," yielding around 420,000 RNAs with associated research papers. For the remaining 407,616 RNAs, we scrape and extract abstracts from all relevant literature. We apply LDA topic modeling to group papers by topic, summarizing each group individually. This ensures each summarization focuses on a narrower, cohesive subject area, minimizing information loss.</p>
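<p>The grouping step can be sketched with an off-the-shelf LDA implementation. The snippet below is a toy illustration under assumptions (four hand-written abstracts, two topics, scikit-learn's LDA); in the real pipeline each resulting group would then be summarized separately by GPT-4o-mini, a call omitted here.</p>

```python
from collections import defaultdict

from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

abstracts = [
    "snoRNA guides ribosomal RNA methylation in the nucleolus",
    "methylation of rRNA is directed by box C/D snoRNAs",
    "microRNA expression correlates with tumor progression",
    "miRNA dysregulation is a biomarker in several cancers",
]

# Bag-of-words counts, then LDA to assign each abstract a dominant topic.
counts = CountVectorizer(stop_words="english").fit_transform(abstracts)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)
topics = lda.transform(counts).argmax(axis=1)

groups = defaultdict(list)
for abstract, topic in zip(abstracts, topics):
    groups[int(topic)].append(abstract)  # one summarization call per group

print({t: len(g) for t, g in groups.items()})
```

<p>Summarizing each topic group on its own keeps every summarization prompt focused on a narrow subject, which is what limits information loss when the per-group summaries are later merged.</p>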
<!-- Figure 3 -->
<div class="columns is-centered">
<div class="column is-12">
<figure class="image">
<img src="./static/images/LDA.png" alt="RNA-QA Dataset Pipeline">
</figure>
<p class="has-text-centered"><strong>Figure 3:</strong> RNA-QA uses an automated pipeline to scrape and summarize existing RNA literature. We apply latent Dirichlet allocation (LDA) to group the vast literature on each RNA, and then we summarize each group individually using GPT-4o-mini. These summaries are then combined and refined to produce the final RNA annotation.</p>
</div>
</div>
<p><strong>Data Augmentation:</strong> Using GPT-4o-mini, RNA-GPT decomposes the rich RNA annotations of RNA-QA into more specific QA pairs for instruction tuning, so that user instructions can be answered concisely.</p>
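<p>As a hedged illustration of this augmentation step, the sketch below builds a decomposition prompt for an LLM (GPT-4o-mini in our pipeline) and parses its output into QA pairs. The template wording and the <code>Q: ... | A: ...</code> output format are assumptions made for the example, and the API call itself is mocked.</p>

```python
DECOMPOSE_TEMPLATE = (
    "Break the following RNA annotation into standalone question-answer "
    "pairs, one per line, formatted as 'Q: ... | A: ...'.\n\n"
    "Annotation:\n{annotation}"
)

def build_decomposition_prompt(annotation: str) -> str:
    return DECOMPOSE_TEMPLATE.format(annotation=annotation)

def parse_qa_lines(llm_output: str):
    # Parse 'Q: ... | A: ...' lines into (question, answer) tuples.
    pairs = []
    for line in llm_output.splitlines():
        if line.startswith("Q:") and "| A:" in line:
            q, a = line.split("| A:", 1)
            pairs.append((q[2:].strip(), a.strip()))
    return pairs

# Mocked LLM response standing in for a GPT-4o-mini call:
mock = "Q: What does this snoRNA guide? | A: 2'-O-methylation of rRNA."
print(parse_qa_lines(mock))
```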
</div>
</div>
</div>
</div>
</section>
<section class="section">
<div class="container is-max-desktop">
<div class="columns is-centered">
<div class="column is-full-width">
<h2 class="title is-3">Experiments</h2>
<div class="content has-text-justified">
<p>We trained <strong>RNA-GPT</strong> on the Llama-3 8B architecture, using a smaller subset of 5,000 RNAs and 121,000 QA samples for our initial model. We are in the process of training a larger RNA-GPT that uses all 407,616 RNAs of the RNA-QA dataset, with millions of QA samples.</p>
<!-- Table 1 -->
<div class="columns is-centered">
<div class="column is-10">
<div class="has-text-centered">
<table class="table is-striped is-fullwidth is-centered">
<thead>
<tr>
<th>Metric</th>
<th colspan="3">RNA Sequence</th>
<th colspan="3">Modality Fusion</th>
<th colspan="3">RNA-GPT</th>
</tr>
<tr>
<th></th>
<th>S<sub>BERT</sub></th>
<th>S<sub>Pub</sub></th>
<th>S<sub>GPT</sub></th>
<th>S<sub>BERT</sub></th>
<th>S<sub>Pub</sub></th>
<th>S<sub>GPT</sub></th>
<th>S<sub>BERT</sub></th>
<th>S<sub>Pub</sub></th>
<th>S<sub>GPT</sub></th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Precision</strong></td>
<td>0.7372</td><td>0.5528</td><td>0.5219</td>
<td>0.6929</td><td>0.6507</td><td>0.6655</td>
<td>0.8602</td><td>0.7384</td><td>0.7848</td>
</tr>
<tr>
<td><strong>Recall</strong></td>
<td>0.7496</td><td>0.5270</td><td>0.5474</td>
<td>0.8028</td><td>0.6082</td><td>0.6603</td>
<td>0.8404</td><td>0.7208</td><td>0.7561</td>
</tr>
<tr>
<td><strong>F1 Score</strong></td>
<td>0.7424</td><td>0.5387</td><td>0.5339</td>
<td>0.7403</td><td>0.6283</td><td>0.6627</td>
<td>0.8494</td><td>0.7293</td><td>0.7700</td>
</tr>
</tbody>
</table>
<p><strong>Table 1:</strong> RNA-QA (<strong>AIS</strong>): Comparison of RNA Sequence (left), Modality Fusion (middle), and RNA-GPT (right). Embedding base models are BERT, PubMedBERT, and OpenAI's GPT text-embedding-3-large.</p>
</div>
</div>
</div>
<p>We conducted a series of experiments to assess RNA-GPT's effectiveness both quantitatively and qualitatively, along with ablation studies to gauge the importance of each module at different stages. We compared three variants: the base model (the LLM with the RNA sequence given as plain text), the modality-aligned model, and the final instruction-tuned model.</p>
<!-- Table 2 -->
<div class="columns is-centered">
<div class="column is-10">
<div class="has-text-centered">
<table class="table is-striped is-fullwidth is-centered">
<thead>
<tr>
<th>Metric</th>
<th colspan="3">RNA Sequence</th>
<th colspan="3">Modality Fusion</th>
<th colspan="3">RNA-GPT</th>
</tr>
<tr>
<th></th>
<th>ROUGE-1</th>
<th>ROUGE-2</th>
<th>ROUGE-L</th>
<th>ROUGE-1</th>
<th>ROUGE-2</th>
<th>ROUGE-L</th>
<th>ROUGE-1</th>
<th>ROUGE-2</th>
<th>ROUGE-L</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>ROUGE</strong></td>
<td>0.2364</td><td>0.0935</td><td>0.2037</td>
<td>0.2239</td><td>0.1364</td><td>0.2091</td>
<td>0.5031</td><td>0.3667</td><td>0.4747</td>
</tr>
</tbody>
</table>
<p><strong>Table 2:</strong> RNA-QA (<strong>AIS</strong>): ROUGE Scores for RNA Sequence, Modality Fusion, and RNA-GPT.</p>
</div>
</div>
</div>
<!-- Figures -->
<div class="columns is-centered">
<div class="column is-half has-text-centered">
<figure>
<img src="./static/images/RNAGPT_ROUGE.png" alt="ROUGE Score Comparison" />
<figcaption><strong>Figure 4:</strong> ROUGE Score Comparison</figcaption>
</figure>
</div>
<div class="column is-half has-text-centered">
<figure>
<img src="./static/images/RNAGPT_Semantic.png" alt="Semantic Score Comparison" />
<figcaption><strong>Figure 5:</strong> Semantic Score Comparison</figcaption>
</figure>
</div>
</div>
<!-- Table 3 -->
<div class="columns is-centered">
<div class="column is-10">
<div class="has-text-centered">
<table class="table is-striped is-fullwidth is-centered">
<thead>
<tr>
<th>Metric</th>
<th colspan="3">RNA Sequence</th>
<th colspan="3">Modality Fusion</th>
<th colspan="3">RNA-GPT</th>
</tr>
<tr>
<th></th>
<th>S<sub>BERT</sub></th>
<th>S<sub>Pub</sub></th>
<th>S<sub>GPT</sub></th>
<th>S<sub>BERT</sub></th>
<th>S<sub>Pub</sub></th>
<th>S<sub>GPT</sub></th>
<th>S<sub>BERT</sub></th>
<th>S<sub>Pub</sub></th>
<th>S<sub>GPT</sub></th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Precision</strong></td>
<td>0.7612</td><td>0.5498</td><td>0.5479</td>
<td>0.6884</td><td>0.6201</td><td>0.6676</td>
<td>0.8620</td><td>0.7173</td><td>0.7568</td>
</tr>
<tr>
<td><strong>Recall</strong></td>
<td>0.7654</td><td>0.5512</td><td>0.5649</td>
<td>0.8187</td><td>0.5830</td><td>0.6602</td>
<td>0.8623</td><td>0.7161</td><td>0.7554</td>
</tr>
<tr>
<td><strong>F1 Score</strong></td>
<td>0.7625</td><td>0.5501</td><td>0.5561</td>
<td>0.7466</td><td>0.6005</td><td>0.6637</td>
<td>0.8609</td><td>0.7165</td><td>0.7560</td>
</tr>
</tbody>
</table>
<p><strong>Table 3:</strong> RNA-QA (<strong>D&C</strong>): Comparison of RNA Sequence (left), Modality Fusion (middle), and RNA-GPT (right). Embedding base models are BERT, PubMedBERT, and OpenAI's GPT text-embedding-3-large.</p>
</div>
</div>
</div>
<!-- Table 4 -->
<div class="columns is-centered">
<div class="column is-10">
<div class="has-text-centered">
<table class="table is-striped is-fullwidth is-centered">
<thead>
<tr>
<th>Metric</th>
<th colspan="3">RNA Sequence</th>
<th colspan="3">Modality Fusion</th>
<th colspan="3">RNA-GPT</th>
</tr>
<tr>
<th></th>
<th>ROUGE-1</th>
<th>ROUGE-2</th>
<th>ROUGE-L</th>
<th>ROUGE-1</th>
<th>ROUGE-2</th>
<th>ROUGE-L</th>
<th>ROUGE-1</th>
<th>ROUGE-2</th>
<th>ROUGE-L</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>ROUGE</strong></td>
<td>0.2472</td><td>0.0964</td><td>0.2182</td>
<td>0.0922</td><td>0.0393</td><td>0.0799</td>
<td>0.4791</td><td>0.2690</td><td>0.4405</td>
</tr>
</tbody>
</table>
<p><strong>Table 4:</strong> RNA-QA (<strong>D&C</strong>): ROUGE Scores for RNA Sequence, Modality Fusion, and RNA-GPT.</p>
</div>
</div>
</div>
<p>The results demonstrate that <strong>RNA-GPT</strong> significantly outperforms both the original model and the modality fusion model in terms of precision, recall, F1 score, and ROUGE metrics. This indicates the effectiveness of our two-stage training process and the utility of the RNA-QA dataset.</p>
<p>Figures 4 and 5 illustrate the performance improvements of RNA-GPT over the baseline models. The ROUGE score comparison shows a significant increase in ROUGE-1, ROUGE-2, and ROUGE-L scores, indicating better overlap with the reference answers. The semantic score comparison, evaluated using BERT, PubMedBERT, and GPT embeddings, demonstrates enhanced semantic similarity between the generated and reference answers.</p>
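<p>For readers curious how embedding-based precision, recall, and F1 of this kind can be computed, here is a minimal BERTScore-style sketch with toy unit-normalized vectors standing in for BERT, PubMedBERT, or GPT embeddings; it is an assumption-laden illustration, not our exact evaluation code.</p>

```python
import numpy as np

def semantic_prf(cand: np.ndarray, ref: np.ndarray):
    """Greedy cosine matching between candidate and reference token embeddings."""
    sim = cand @ ref.T                  # pairwise cosine similarities (unit vectors)
    precision = sim.max(axis=1).mean()  # best reference match per candidate token
    recall = sim.max(axis=0).mean()     # best candidate match per reference token
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

rng = np.random.default_rng(0)
cand = rng.normal(size=(5, 8))
cand /= np.linalg.norm(cand, axis=1, keepdims=True)

p, r, f = semantic_prf(cand, cand)  # identical texts should score perfectly
print(round(float(f), 6))           # 1.0
```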
<p>These experiments validate the effectiveness of our approach in aligning RNA sequences with natural language representations, enabling the model to generate accurate and relevant responses to complex RNA queries.</p>
</div>
</div>
</div>
</div>
</section>
<footer class="footer">
<div class="container">
<div class="content has-text-centered">
<a class="icon-link" href="https://arxiv.org/abs/2411.08900"><i class="fas fa-file-pdf"></i></a>
<a class="icon-link" href="https://github.com/Yijia-Xiao/RNA-GPT"><i class="fab fa-github"></i></a>
</div>
<div class="columns is-centered">
<div class="column is-8">
<div class="content">
<p>This website is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">Creative Commons Attribution-ShareAlike 4.0 International License</a>.</p>
</div>
</div>
</div>
</div>
</footer>
</body>
</html>