Eel pond mRNAseq tutorial #2

Open
wants to merge 18 commits into
base: old
Changes from 3 commits
21 changes: 20 additions & 1 deletion README.txt
@@ -1,6 +1,25 @@
Documentation for the MSU Next-Gen Sequence Analysis Course, and other things,
too.

As per http://ged.msu.edu/angus/,

License:

This documentation and all textual/graphic site content is licensed
under the Creative Commons - 0 License (CC0). Please feel free to
copy, modify, distribute and perform the work, even for commercial
purposes, all without asking permission.

You can find the source code for this material under git version
control on github at https://github.com/ngs-docs/edda. Please fork at
your own leisure :)

However, presentations (PPT/PDF) and PDFs are the property of their
respective owners and are under the terms indicated within the
presentation. If no terms are indicated, please do not reuse without
specific and explicit permission - i.e. all presentations and PDFs are
Copyright (C) their authors, all rights reserved, unless otherwise
stated.

Titus Brown
[email protected]

11 changes: 10 additions & 1 deletion doc/_templates/layout.html
@@ -43,14 +43,23 @@
{%- endif %}
<div class="body">
{% block body %} {% endblock %}
<hr>
<font size="-1"><b>LICENSE:</b>
This documentation and all textual/graphic site content is licensed
under the <a href='http://creativecommons.org/publicdomain/zero/1.0/'>
Creative Commons - 0 License
(CC0)</a> -- <a href='https://github.com/ngs-docs/edda'>fork @
github</a>. Presentations (PPT/PDF) and PDFs are the property of
their respective owners and are under the terms indicated within the
presentation.</font>

<hr>
{{ comments() }}
</div>
{%- if render_sidebar %}
</div>
{%- endif %}
</div>

{%- endblock %}

<div class="clearer"></div>
78 changes: 78 additions & 0 deletions doc/mrnaseq/0-download-and-save.txt
@@ -0,0 +1,78 @@
===========================================
0. Downloading and Saving Your Initial Data
===========================================

The basics
----------

Amazon is happy to rent disk space to you, in addition to compute time.
They'll rent you disk space in a few different ways, but the way that's
most useful for us is through what's called Elastic Block Store. This
is essentially a hard-disk rental service.

There are two basic concepts -- "volume" and "snapshot". A "volume" can
be thought of as a pluggable-in hard drive: you create an empty volume of
a given size, attach it to a running instance, and voila! You have extra
hard disk space. Volume-based hard disks have two problems, however:
first, they cannot be used outside of the "availability zone" they've
been created in, which means that you need to be careful to put them
in the same zone that your instance is running in; and second, they
can't be shared among people.

Snapshots, the second concept, are the solution to transporting and
sharing the data on volumes. A "snapshot" is essentially a frozen
copy of your volume; you can copy a volume into a snapshot, and a
snapshot into a volume.

Getting started
---------------

Run through :doc:`saving-data-persistently` once, to get the hang of
the mechanics. Essentially you create a disk; attach it; format it;
and then copy things to and from it.

Downloading and saving your data to a volume
--------------------------------------------

There are *many* different ways of getting big sequence files to and
from Amazon. The two that I mostly use are 'curl', which downloads
files from a Web site URL; and 'ncftp', which is a robust FTP client
that lets you get files from an FTP site. Sequencing centers almost
always make their data available in one of these two ways.

.. note::

   To use ncftp on your Amazon instance, you may need to install it::

      apt-get -y install ncftp

For example, to retrieve a file from an FTP site, you would do something
like::

   cd /mnt
   ncftp -u <username> ftp://path/to/FTP/site

Use 'cd' to find the right directory, and then::

   >> mget *

to download the files. Then type 'quit'. You can also use 'curl' to
download files one at a time from Web or FTP sites.
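
As a rough sketch of the curl route, assuming you have direct URLs for
each file: the function name ``fetch_all`` is made up for this example,
and you would substitute the URLs your sequencing center gives you.
It downloads each URL into the current directory, one at a time.

```shell
# Download each URL given as an argument into the current directory.
#   -O  save under the remote file name
#   -L  follow redirects
#   -f  treat HTTP errors as failures instead of saving an error page
# Stops at the first download that fails.
# (fetch_all is a hypothetical helper name, not part of curl.)
fetch_all() {
    for url in "$@"; do
        curl -f -L -O "$url" || return 1
    done
}
```

Run from /mnt, a call would look something like
``fetch_all http://path/to/file1 ftp://path/to/file2``, where both paths
are placeholders.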

Once you have the files, figure out their size using 'du -sk' (e.g. after the
above, 'du -sk /mnt' will tell you how much data you have saved under /mnt),
and go create and attach a volume (see :doc:`saving-data-persistently`).
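
Since 'du -sk' reports kilobytes and volumes are sized in whole gigabytes,
a small helper can do the conversion. This is only a sketch: the function
names are invented for this example, and it rounds up to the next whole
GiB without adding any headroom, so you may want to pick something larger.

```shell
# Convert a size in kilobytes (as printed by 'du -sk') to whole GiB,
# rounding up so the volume is never smaller than the data.
# (kb_to_gib and volume_size_for are hypothetical helper names.)
kb_to_gib() {
    kb=$1
    echo $(( (kb + 1048575) / 1048576 ))
}

# Report how much data is under a directory (default: /mnt) and the
# smallest whole-GiB volume size that would hold it.
volume_size_for() {
    dir=${1:-/mnt}
    kb=$(du -sk "$dir" | cut -f1)
    echo "$dir: ${kb} KB, needs at least $(kb_to_gib "$kb") GiB"
}
```

After downloading, ``volume_size_for /mnt`` prints the number to use when
you create the volume.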

Once you copy it onto the volume, this data will *stick around* when you shut down
your instance. It's a good rule of thumb to do "savepoints" -- whenever
you complete a big chunk of work, think about saving the data at that
point. I've broken the mRNAseq tutorial down into chunks of work where
you can do this -- after each Web page, basically.

Some test data
--------------

To get started with multifile analysis and assembly, I've provided some
test data. It's on snapshot 'snap-f5a9dea7', so go create a volume from
that and mount it as '/data' to get started.
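
If you prefer the command line to the Web console, the steps might look
roughly like the following. This is a sketch under several assumptions:
that you have the AWS command-line tool ('aws') installed and configured,
that you attach the volume as /dev/sdf, and that it then shows up inside
the instance as /dev/xvdf (device names vary between instances). The
function only *prints* the commands so you can check them before running
anything; its name and arguments are made up for this example.

```shell
# Print (do not run) the commands to turn a snapshot into a mounted volume.
# All four arguments are values you must supply yourself.
# (print_snapshot_mount_plan is a hypothetical helper name; /dev/sdf and
# /dev/xvdf are assumed device names and may differ on your instance.)
print_snapshot_mount_plan() {
    snapshot=$1    # e.g. snap-f5a9dea7
    zone=$2        # must be the zone your instance is running in
    instance=$3    # the ID of your running instance
    mountpoint=$4  # e.g. /data
    echo "aws ec2 create-volume --snapshot-id $snapshot --availability-zone $zone"
    echo "aws ec2 attach-volume --volume-id <new-volume-id> --instance-id $instance --device /dev/sdf"
    echo "mkdir -p $mountpoint"
    echo "mount /dev/xvdf $mountpoint"
}
```

The ``<new-volume-id>`` placeholder stands for the volume ID reported by
the first command. Because the volume comes from a snapshot, it already
contains a file system and should not be formatted.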
