Skip to content

Latest commit

 

History

History
475 lines (324 loc) · 18.4 KB

Setup.md

File metadata and controls

475 lines (324 loc) · 18.4 KB

System Setup

This document describes a relatively detailed way to set up Mark2Cure so it could run locally on your computer.

The whole guide was tested on a Ubuntu machine. Mac users may find some difference when following this guide. Remember, whenever you have any trouble, Google is your best friend.

Requirements

This guide assumes you know elementary knowledge on Linux command (including Git). If you are having trouble, feel free to visit this website.

You will need a Linux system to set it up. If you are using a Mac, you may want to skip the following section. If you are using Windows OS, however, it is highly recommended that you use a virtual environment that runs Ubuntu. When setting up the local Mark2Cure, make sure you have a stable network connection.

If you can talk to Max right now in person, you may want to ask him what ENTREZ_EMAIL is. If he doesn't know what you are asking, tell him you are trying to fetch documents from PubMed.

Install Ubuntu (skip if you are using Mac)

What you need

  • VirtualBox
  • Ubuntu .ISO image. You can download the desktop LTS version from Ubuntu website

Using VirtualBox

The following guide assumes you are using VirtualBox, and unless otherwise specified, you can use the default settings prompted. Please take note of the bold part below.

  1. Download and install VirtualBox. Open VirtualBox.
  2. Create a new virtual machine by clicking "New". Choose "Linux" and "Ubuntu" and give it a name.
  3. For a smooth experience, you want to allocate as much memory (RAM) for your virtual machine as possible. Recommended memory size is at least 50% of the memory size of the machine you are working. For example, if you use a machine with 4G of RAM, you may want to allocate between 2500MB and 3000MB for this virtual machine.
  4. Create a virtual hard disk (VHD). And allocate at least 16GB of hard disk memory for it. Failure to do so may result insufficient memory to install all the dependencies required to run Mark2Cure.
  5. For the first time you run it, you need to select the original image file of Ubuntu. When prompted, choose the image file you just downloaded. Click "Install Ubuntu" and follow the instructions to customize.
  6. Keep in mind that this is a virtual machine, so when it mentioned "Erase disk" or something similar, it won't do any hard to you real machine. Get yourself a cup of water while Ubuntu is finishing set-up, or keep reading to know what to do next.

Other recommended program(s)

  • MySQL Workbench.

    It is highly recommended that you download this for the sake of easier database processing you will need later. The guide will use MySQL Workbench when testing database.

Deploy development environment

Before actually working on Mark2Cure, we need to make sure every dependency is installed. If you have your own preference or favorite commands, you can install them now.

Also, make a directory somewhere, and clone this GitHub repo. In this section, root or ./ refers to the directory where .git is located. For example, if your .git is located under ~/repos/mark2cure, ./mark2cure/ below will mean ~/repos/mark2cure/mark2cure/

Some commands below assume that you have them installed already, so if your machine doesn't have these programs installed already (like if you have just installed Ubuntu), you will see many prompts that ask you to install them. Do so when appropriate. For a better experience, a program called f**k can be very useful. There is a detailed instruction on how to install following the link.

Dependencies

Before running the following commands, navigate to the root of the repo. When prompted to put the password when installing MySQL, you need to remember that as it will be used later

$ sudo apt-get update
$ sudo apt-get upgrade
$ sudo apt-get install build-essential python python-dev python-pip python-virtualenv libmysqlclient-dev git-core nginx supervisor rabbitmq-server graphviz libgraphviz-dev pkg-config libncurses5-dev npm ruby-dev
$ sudo pip install -r requirements.txt
$ sudo pip install nltk
$ sudo npm install gulp-cli -g
$ sudo npm install gulp -D
$ sudo npm install gulp-compass gulp-if gulp-livereload gulp-clean-css gulp-csso gulp-sass gulp-rename tiny-lr segfault-handler
$ sudo npm install -g bower
$ sudo bower install
$ sudo gem update --system
$ sudo gem install compass
$ sudo ln -s /usr/bin/nodejs /usr/bin/node

Local setup

  • Create a file at ./mark2cure/local_settings.py and put the following lines into the file. This file is what you want to modify if you want to customize anything locally. Note that previously you were asked to remember the password when installing MySQL, and now you will need to put it into {YOUR PASSWORD HERE} below. Also, if you get ENTREZ_EMAIL from Max, you need to put that here as well.
import os.path

SECRET_KEY = "thiscanactuallybesomerandomstring"
DEBUG = True
ACCOUNT_EMAIL_VERIFICATION = 'none'
ENTREZ_EMAIL = '[email protected]' # Put that email address here

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'mark2cure',
        'USER': 'root',
        'PASSWORD': '{YOUR PASSWORD HERE}',
        'HOST': 'localhost',
        'PORT': ''
    }
}

PROJECT_ROOT = os.path.abspath(os.path.dirname(__file__))
  • We would like css and JavaScript files to be preprocessed. To do so, you should run this command everytime the repo is updated

    $ gulp
    
  • Setup database

    • Open MySQL Workbench, and click something like "Local instance" (there should be only one if you just installed it).
    • In the Query tab, run CREATE DATABASE mark2cure;. You should see a new database is created under "SCHEMAS" under the bottom left panel.
    • Go back to console. Run python manage.py migrate. You will see that many data are being migrated.
  • Import the basic training by running python manage.py loaddata fixtures/tasks.json. You will see a line at the end of the message like below. This imports the training data the user has to complete before they can actually do any tasks on Mark2Cure.

    Installed 8 object(s) from 1 fixture(s)
    

Running server

Now we have finished setting up the database, it is time to run the server locally. However, make sure you have checked the tables under scheme mark2cure in the database. If you are seeing many tables, go ahead.

  • Run the server by typing python manage.py runserver. If nothing goes wrong, you should see something below.

    However, keep in mind that once you run this command, you may want to open a new terminal/console. Also, every time you modify something, make sure you stop and re-run the server.

    System check identified no issues (0 silenced).
    Django version 1.9.6, using settings 'mark2cure.settings
    Starting developement server at http://127.0.0.1:8000/
    Quit the server with CONTROL-C
    
  • Visit 127.0.0.1:8000 in your broswer. You will see the front page of Mark2Cure. From here, you are running a local Mark2Cure. The experience should be exactly the same as you would on mark2cure.org

Create a new account

Sign up a new account as you would on mark2cure.org by going over those training steps. Remember the username and password, as you will use this account most of time later. For the email, it does not actually matter what it is as long as it looks like an email address (since we have disabled it by ACCOUNT_EMAIL_VERIFICATION = 'none' above)

Doing training tasks

You will need to do some training before you actually see the dashboard (as you would on the real Mark2Cure). To create groups and see those quests, you also need to finish all those tasks listed on the left.

... or not

If you really do not want to do them, here is an alternative that you could do to skip them by modifying the database. However, these scripts are created by the author of this guide based on his sole observation of how Mark2Cure works. The author has included his analysis and you can modify the SQL scripts accordingly before applying them.

Mark2Cure stores the progress of each user's tasks in the table called task_level. There are different levels for different tasks, so you can append rows here to skip them, for example, by running the following scripts. If you are not very sure what you are doing, also read the note below before applying it.

Create groups of Quests

If you do not know what a Quest is, Quest is a set of Documents you are going to annotate to identify possible concepts (like gene, drug or treatment). The data will be later used in relation tasks.

Fetch documents from PubMed

We first need to fetch the documents from PubMed, and then we will use these documents to create quests.

Documents in PubMed are identified by a unique ID, which is a string of digits. If you want to import specific documents from PubMed, you may want to write them down.

Now, run python manage.py shell, which will open a console and place the cursor after In [1]: .

First time setup

If this is NOT the first time trying to fetch documents from PubMed, you can skip this part.

You need to download nltk library before fetching any documents. To do so, in the shell opened, type in

import nltk
nltk.download()

This will open up an NLTK Downloader, which prompts you five options. Enter d to download, and put all when asked which package to download. The downloading may take some time.

Batch fetch or Fetch a specific document

Type the following scripts into the shell, or you can modify it to fetch certain documents. Please note that there might be a minimum number of documents required to create a quest

import mark2cure.document.tasks

# Importing some random documents
for x in range (27834000, 27834100):
  mark2cure.document.tasks.get_pubmed_document(x);

# Or, you can import a specific document given an ID:
mark2cure.document.tasks.get_pubmed_document(27834101);

Create groups

After the documents are fetched, type the following into the shell. Note that you can edit the group property (name and description) later through the database.

from mark2cure.common.models import Group, Document

G = Group()
G.save()
G.assign(Document.objects.all()) # Here, you can modify it to only put specific documents
G.save()
G.enabled = True
G.save()

To know more about assign function used above, read the source code here (If the link is broken, refer to assign function in ./common/models.py). .

Credit

The original document was written by Runjie Guan, who was an intern working for this project between Sep 2016 to Mar 2017. I have gone over the same procedure described above and can make sure it is accurate as of Mar 2017. This guide is not comprehensive since I did not have the chance to work on every feature of Mark2Cure, so please contact Max if you need help. For example, this guide doesn't tell people how to create relationship tasks, so if you do, please add it into this guide to complete it.

Since the project is developing quite quickly, if you see any problems in this guide, you will probably want to figure it out by yourself, which is what I did before I know how to write this.

========

Setup

Server Dependencies

  • sudo apt-get update
  • sudo apt-get upgrade
  • sudo apt-get install build-essential python python-dev python-pip python-virtualenv libmysqlclient-dev git-core nginx supervisor rabbitmq-server graphviz libgraphviz-dev pkg-config libncurses5-dev

Project Setup

  • Make the python virtual environment and activate it

  • virtualenv mark2cure-venv

  • . /var/www/virtualenvs/mark2cure-venv/bin/activate

  • Make the project folder and download the repo

  • sudo adduser deploy

  • sudo /home/deploy/webapps

  • cd /home/deploy/webapps/

  • git clone https://[email protected]/sulab/mark2cure.git && cd mark2cure

  • Install all the python related dependencies

  • sudo /opt/mark2cure-venv/bin/pip install -r requirements.txt

Database Migrations

  • python manage.py schemamigration APP --auto CHANGE_MESSAGE
  • python manage.py migrate APP

Control

  • . /opt/mark2cure-venv/bin/activate

  • cd webapps/mark2cure/ && git pull origin HEAD

  • sudo supervisorctl restart mark2cure

  • sudo chmod a+x /bin/gunicorn_start

Utils

  • Flow diagram of the database relationships
  • python manage.py graph_models -a -o myapp_models.png

Notes

Mark2Cure detailed installation instructions for developers

These are instructions, tips, and tricks for how to install mark2cure for developers with more detail than above, assuming you might be unfamiliar with much of this technology. Please feel free to let us know if anything is unclear.

Below is the terminal history for installation of mark2cure in a “clean” brand new Macbook Pro. You will likely have a computer that has already been used for development, so you may run into different issues, but hopefully none.

First thing! You will need pip (you probably already have this). pip is important because it is compatible with virtual environments which we will use to compartmentalize different projects that require different versions of software. Mark2cure has a lot of dependencies.

Install [Brew] using the long command below. It’s a package manager for Apple comps. “installs the stuff you need that Apple didn’t.”

$ ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

Sudo install a few things. You will need to have virtual environments installed. For more information, see [Python Virtual Environments].

$ sudo pip install virtualenv

You will also need the virtual environment wrapper: [Virtual Environment Wrapper]

$ sudo pip install virtualenvwrapper

Open the /.bash_profile.

$ nano ~/.bash_profile

Add the following two lines to your bash profile:

export WORKON_HOME=$HOME/.virtualenvs

source /usr/local/bin/virtualenvwrapper.sh

Then, "source" the profile.

$ source ~/.bash_profile

Make a new directory where you will house your development version of mark2cure (and other development projects you may have). You could call this something like "repos." Go into this directory.

$ mkdir repos
$ cd repos/

Clone the repository so that you have an exact copy of mark2cure. This would be done in your “repos” folder. No need to make a folder called “mark2cure” because “git clone” will do this for you. If you want to develop, then you would make a branch later (see git documentation).

$ git clone https://[email protected]/sulab/mark2cure.git

cd into the newly created folder called mark2cure, and make a virtual environment called “mark2cure."

$ mkvirtualenv mark2cure

You want to activate the virtual environment.

$ workon mark2cure

There are two folders called mark2cure. One is a higher level folder, but you should "cd" into the lower level to install the requirements file. The requirements file is just a list of software programs that mark2cure (or any program with a requirements.txt file) is dependent on.

$ cd mark2cure/
$ pip install -r requirements.txt

Databases are needed in mark2cure to save and recall all of the useful annotations provided by the Mark2Curators.

Make sure you download MySQL [Sequel Pro] and obtain a copy of the MySQL "framework" mark2cure database. See Max or Jennifer for this.

$ brew install mysql
$ ln -sfv /usr/local/opt/mysql/*.plist ~/Library/LaunchAgents
$ launchctl load ~/Library/LaunchAgents/homebrew.mxcl.mysql.plist

If you have issues with graphviz and pygraphviz, you can remove them from the requirements.txt file (for now). Other dependencies probably should not be tampered with.

$ cd repos/
$ cd mark2cure/
$ pip install -r requirements.txt

Make new directory called “env_vars”

$ mk_dir env_vars
$ mkdir env_vars
$ cd env_vars/
$ touch development.sh
$ nano development.sh

Inside the development.sh file will live some special information. Please see Max or Jennifer for the file contents.

$ cd mark2cure/
$ source env_vars/development.sh
$ echo $MARK2CURE_DATABASE_URL

You will also need the NLTK database [Natural Language Tool Kit]. Download NLTK using the following commands and follow download instructions in the popup window; download everything.

$ sudo pip install -U nltk
$ python

Running the Python shell will allow you test that the download was successful:

import nltk

Run the unit tests:

$ python manage.py test

At this point, all tests should be passing. After all the tests work correctly, then you can run the server.

$ python manage.py runserver_plus

You should now have a local development version of Mark2Cure!

Install “tree” so that you can see a data tree of your current working directory to see the Mark2Cure application layout.

$ brew install tree

mark2cure development details

When doing manual testing of the mark2cure application (i.e., using the website, and not using the command python manage.py test), you will go to the development server site([Django development server]), and you will realize that you cannot get past the "log in" page. To fix this, you can run these commands to force the development server to let you get to the "test_user" account.

From terminal, use python manage.py shell_plus and run the following Python commands:

User.objects.create_user('test_user', password='password')


This does not solve the issue of a "simulated database with real content for testing"... more to come.

Migration Commands

------ training requirement migrations -----------

for lvl in Level.objects.filter(task_type='re'): r = Requirement.objects.filter(task_type='re', order=lvl.level).first() if(r): lvl.requirement = r lvl.save()

for u in User.objects.all(): print( Level.objects.filter(requirement__task_type='ner', user_id=u.pk).count() )

for u in User.objects.all(): if Level.objects.filter(requirement__task_type='ner', user_id=u.pk).count() >= 2: for idx in range(10,15): try: Level.objects.get(user_id=u.pk, requirement_id=idx) except Level.DoesNotExist: Level.objects.get_or_create(user_id=u.pk, requirement_id=idx) except: pass