Generates gene sequences from an input containing the letters of the genetic code.
The input string is typically generated by automatic sequencing of mRNA.
In the industry there is a standard for including metadata and comments in these files by adding lines prefixed with the “>” (greater than) character. All data after that character until the end of the line can be ignored.
The remaining data in the input string may contain whitespace characters (space, tab, linefeed, Unicode whitespace) which have no semantic meaning and should be ignored. Apart from whitespace the string should only contain the four characters A, U, G, C which may appear in upper or lowercase representation.
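The cleaning rules above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the module name CleanSketch and the functions clean/1 and valid?/1 are hypothetical, and the sketch assumes that a “>” anywhere on a line starts a comment that runs to the end of that line.

```elixir
defmodule CleanSketch do
  # Hypothetical helper, not part of MRnaProcessor.
  # Removes comments and whitespace, then normalises to uppercase.
  def clean(input) do
    input
    |> String.split(["\r\n", "\n"])
    # Assumption: everything from ">" to the end of the line is a comment.
    |> Enum.map(fn line -> line |> String.split(">", parts: 2) |> hd() end)
    |> Enum.join()
    # With the /u modifier, \s also matches Unicode whitespace.
    |> String.replace(~r/\s+/u, "")
    |> String.upcase()
  end

  # True when the cleaned string contains only the letters A, U, G and C.
  def valid?(cleaned), do: String.match?(cleaned, ~r/\A[AUGC]*\z/)
end

cleaned = CleanSketch.clean(">header line\naaa ggg\tAUG\n uga > trailing comment\n")
IO.inspect(cleaned)
```

Uppercasing once, up front, keeps the later codon comparisons simple, since stop codons then only need to be matched in one case.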
The function should process the string from the first position.
The string contains zero or more “genes” (sequences of codons terminated by a stop codon) and possibly “noise”, that is, runs of stop codons. The function should return the genes, each consisting of a sequence of codons terminated by a single stop codon. Any additional stop codons should be treated as noise and discarded.
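To make the rule concrete, here is a small eager sketch of the gene extraction. It is not the project's implementation: the module GeneSketch and its genes/1 function are made up for illustration. It assumes the input has already been cleaned and uppercased, uses the three standard mRNA stop codons (UAA, UAG, UGA), and assumes that trailing codons without a closing stop codon are dropped.

```elixir
defmodule GeneSketch do
  # The three standard mRNA stop codons.
  @stop_codons ["UAA", "UAG", "UGA"]

  # Takes a cleaned, uppercase string and returns the genes,
  # each as a list of codons ending in exactly one stop codon.
  def genes(cleaned) do
    cleaned
    |> String.graphemes()
    # Group letters into codons; an incomplete trailing codon is discarded.
    |> Enum.chunk_every(3, 3, :discard)
    |> Enum.map(&Enum.join/1)
    |> Enum.reduce({[], []}, fn codon, {current, done} ->
      cond do
        # A stop codon with no gene in progress is noise.
        codon in @stop_codons and current == [] -> {[], done}
        # A stop codon closes the gene in progress.
        codon in @stop_codons -> {[], [Enum.reverse([codon | current]) | done]}
        # Any other codon extends the gene in progress.
        true -> {[codon | current], done}
      end
    end)
    # Assumption: codons left over without a stop codon are not a gene.
    |> elem(1)
    |> Enum.reverse()
  end
end

IO.inspect(GeneSketch.genes("UAAAUGCCCUAGUGAUAAGGG"))
```

In the example call, the leading UAA and the UGA and UAA that follow the first gene are all discarded as noise.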
The obvious question is why Elixir? Why not Java or Javascript, two of the languages I'm more experienced in?
Well, for starters I've been studying one of the latest rising stars of functional languages and I found that it's very fast (built on top of the Erlang VM; if you've never heard of it, it powers the high availability and performance of WhatsApp and Discord), concise (syntax inspired by Ruby) and amazingly feature-rich, flexible and expressive (Clojure, Python and Lisp also served as inspiration, among others). Now, I'm relatively new to Elixir (I picked it up in autumn 2020), but the more I learn, the more impressed I am and the more I love the language. Immutability is the most obvious benefit. But there are many features that make Elixir great: pattern matching, the pipe operator, lazy operations with Streams, sigils and metaprogramming, to name just a few. Yes, most of these features are staples in other languages, but I find their availability and ease of use quite unique. As you might have already noticed, I'm not holding back from describing my encounter and my journey with Elixir in terms of passion and affection, just like these senior programmers do here.
To end this section, I'll just copy the introductory paragraphs from Elixir's official web page:
Elixir is a dynamic, functional language for building scalable and maintainable applications.
Elixir leverages the Erlang VM, known for running low-latency, distributed, and fault-tolerant systems. Elixir is successfully used in web development, embedded software, data ingestion, and multimedia processing, across a wide range of industries.
The solution to the second exercise is based on the Stream module. Streams are lazy, composable enumerables, so all the functions in the Stream module are also lazy (whereas the functions in Enum are eager).
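A tiny illustration of the difference, using only standard Stream and Enum functions (the variable names are mine, not the project's): the Stream calls merely describe the computation, and work happens only when an Enum function consumes the result.

```elixir
# Build a lazy pipeline that turns letters into codons.
# No chunking or joining happens on these lines.
codons =
  "AAAGGGAUGUGA"
  |> String.graphemes()
  |> Stream.chunk_every(3)
  |> Stream.map(&Enum.join/1)

# Enum.take/2 is eager: it pulls just enough letters to produce two codons.
first_two = Enum.take(codons, 2)

# Enum.to_list/1 runs the whole pipeline.
all = Enum.to_list(codons)

IO.inspect(first_two)
IO.inspect(all)
```

For a large input file this matters, because only the part of the data that is actually consumed needs to be processed.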
In the MRnaProcessor2 module, do_get_gene is the workhorse function. Stream composition makes it possible to build up the computation, which is eventually applied by the Enum reducer function. In the reducer, codons (the building blocks of a gene), or chunks of three letters, are accumulated one by one in a list until a STOP codon is encountered. Additionally, a counter is passed along in the accumulator to compute the position of the currently processed gene. Once a gene has been "captured", it is validated and then a side effect (such as a DB write) may occur. In this implementation, the captured gene, or an error and a trace with the last few processed codons, will be logged to the console.
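The shape of such a reducer might look like the sketch below. I want to stress that this is not the body of do_get_gene: the module StreamReducerSketch, the function run/2 and the on_gene callback are hypothetical names, validation is left out, and the "position" is assumed to be the zero-based codon index at which the gene starts.

```elixir
defmodule StreamReducerSketch do
  @stop_codons ["UAA", "UAG", "UGA"]

  # `on_gene` stands in for the side effect (console logging, a DB write, ...).
  # Returns the number of genes captured.
  def run(cleaned, on_gene \\ &IO.inspect/1) do
    cleaned
    |> String.graphemes()
    |> Stream.chunk_every(3, 3, :discard)
    |> Stream.map(&Enum.join/1)
    # Accumulator: {codons of the gene in progress, codon index, genes captured}.
    |> Enum.reduce({[], 0, 0}, fn codon, {current, index, count} ->
      cond do
        # Stop codon with nothing accumulated: noise, just advance the counter.
        codon in @stop_codons and current == [] ->
          {[], index + 1, count}

        # Stop codon closing a gene: emit it together with its start position.
        codon in @stop_codons ->
          gene = Enum.reverse([codon | current])
          on_gene.(%{start: index - length(current), codons: gene})
          {[], index + 1, count + 1}

        # Ordinary codon: keep accumulating.
        true ->
          {[codon | current], index + 1, count}
      end
    end)
    |> elem(2)
  end
end

# Collect the captured genes as messages instead of printing them.
captured = StreamReducerSketch.run("UAAAUGCCCUAGGGGUGA", fn gene -> send(self(), gene) end)
IO.inspect(captured)
```

Passing the side effect in as a function keeps the reducer itself easy to test, since a test can swap the logger or DB write for something that just records what it was given.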
See: https://elixir-lang.org/install.html
The easiest way is to use a package manager. On Windows, with Chocolatey, just run:
$ cinst elixir
The only prerequisite for Elixir is Erlang, version 21.0 or later. When installing Elixir, Erlang is generally installed automatically for you. If that's not the case, consult the documentation on how to install Erlang manually.
It is highly recommended to add Elixir’s bin path to your PATH environment variable to ease development.
Once you have Elixir installed, you can check its version by running:
$ elixir --version
To compile all the modules in the project and start a BEAM (the Erlang VM) instance:
$ iex -S mix
To launch the main module, from the iex (Interactive Elixir) console:
iex(1)> MRnaProcessor.get_genes("AAAGGGAUG UGA")
or:
iex(2)> MRnaProcessor.get_genes("./dataset/refMrna.fa.txt")
Execute the tests with:
$ mix test
If available in Hex, the package can be installed by adding m_rna_processor to your list of dependencies in mix.exs:
def deps do
  [
    {:rna_processor, "~> 0.1.0"}
  ]
end
Documentation can be generated with ExDoc and published on HexDocs. Once published, the docs can be found at https://hexdocs.pm/rna_processor.