NLP Server: Provisioning and Troubleshooting

Dylan Phelan edited this page Sep 10, 2018 · 2 revisions

Provisioning the NLP Server

Getting onto the VM:

Almost all of this walkthrough assumes you are SSH'd onto the VM you're setting up. To do so:

Assumptions and Requirements

This install guide assumes a machine running Ubuntu 18.04 (run lsb_release -a to determine your version) behind the MITRE firewall and proxy, and addresses many of the issues originally encountered during setup. To run the server, the machine will need:

  • Proxy taken care of

    • If you don't know how to do this on a MITRE machine, reach out to Dylan Phelan. We don't want to post sensitive information online for others.
  • Git

    • Run the following to install:

      sudo apt-get update
      sudo apt-get install git
      
  • Java 8

    • Installing this was an absolute pain because of proxy and firewall issues. Below you'll find a very manual walkthrough, though quicker ways of installing may exist. I used this article in coming up with the walkthrough: https://askubuntu.com/questions/908467/key-server-receive-failed-when-installing-through-ppa

      1. Open and edit sources list

        sudo nano /etc/apt/sources.list
        
      2. Add the webupd8team Java PPA manually by adding these lines, based on the archive found here. The release name, bionic, corresponds to Ubuntu 18.04:

        deb http://ppa.launchpad.net/webupd8team/java/ubuntu bionic main 
        deb-src http://ppa.launchpad.net/webupd8team/java/ubuntu bionic main 
        
      3. Download onto your local machine the PGP key for this java archive from here, or ask Dylan Phelan. Save the file as java-key.gpg, for reference in the following steps

      4. Copy that file from your local machine onto the VM -- run the following from your local machine, leaving open your ssh session:

        scp /path/to/local/java-key.gpg [email protected]:~/
        
      5. From the vm now, add this key to your keychain:

        sudo apt-key add java-key.gpg
        
      6. Update apt and install the oracle-java8-installer:

        sudo apt-get update
        sudo apt install oracle-java8-installer
        
      7. Check the java version

        java -version
        

        Expecting roughly:

        java version "1.8.0_181"
        Java(TM) SE Runtime Environment (build 1.8.0_181-b13)
        Java HotSpot(TM) 64-Bit Server VM (build 25.181-b13, mixed mode)
        
      8. Set Java env vars

        sudo apt install oracle-java8-set-default
        
  • Python 2.7

    • To install Python 2.7 you simply need to do the following in Ubuntu 18.04 in a terminal (they work beautifully side by side out of the box):

      # refreshing the repositories
      sudo apt update
      # it's wise to keep the system up to date!
      # you can skip the following line if you do not
      # want to update all your software
      sudo apt upgrade
      # installing python2.7 and pip for it
      sudo apt install python2.7 python-pip
      
  • Associated python packages

    • The first part of the fluxnotes_nlp_ensemble documentation describes how to do this, so refer to the original documentation or reach out to Dylan Phelan in the case of errors, but it should be enough to be in /path/to/fluxnotes_nlp_ensemble/ directory and run the following:

      pip2 install -r requirements.txt
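
Before moving on to configuration, it can help to confirm that the prerequisites listed above are actually on the PATH. The following is a minimal sketch rather than part of the original setup: the helper names check_tool and is_java8 are made up for illustration, and the tool list (git, java, python2.7, pip2) is an assumption based on the requirements in this section.

```shell
#!/bin/sh
# Sketch: report which prerequisites from the list above are missing.
# check_tool and is_java8 are illustrative names, not part of fluxnotes_nlp_ensemble.

# Succeeds if the named command is on the PATH.
check_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Reads `java -version` output on stdin; succeeds if it reports a 1.8.x version.
is_java8() {
    grep -q 'version "1\.8\.'
}

for tool in git java python2.7 pip2; do
    if check_tool "$tool"; then
        echo "found:   $tool"
    else
        echo "missing: $tool"
    fi
done

# java -version prints to stderr, hence the 2>&1.
if check_tool java && java -version 2>&1 | is_java8; then
    echo "Java 8 detected"
else
    echo "Java 8 not detected"
fi
```

Anything reported as missing points back to the corresponding install step above.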
      

Configuring the Server

Again, this is largely copied over from the original documentation, so refer here in the event of issues or reach out to Dylan Phelan. I've tried to distill those instructions down into concrete steps based on my direct experience on an Ubuntu 18.04 machine, assuming you are in the /path/to/fluxnotes_nlp_ensemble/ directory.

  1. All this assumes that you've SSH'd onto the VM.

  2. Copy fluxnotes_nlp.config.in to fluxnotes_nlp.config.

    cp fluxnotes_nlp.config.in fluxnotes_nlp.config
    
  3. Create a directory for the files you need for configuration. We named our directory config and ran:

    mkdir ../config
    
  4. Get a copy of the required setup files from Dylan Phelan or Sam Bayer. Move those files into the config directory. Those files will include:

    1. jcarafe-embedded-server-assembly-0.9.99-bin.jar - Java Archive file containing the jCarafe server.
    2. cder_nlm_model - Model for the jCarafe engine to use
    3. meddra_20_0_english.zip - Zip containing a variety of MedDRA terms in English, in different formats. Unzip after.
    4. stanford-corenlp-full-2018-02-27.zip - Zip containing Stanford's core NLP library. Unzip after.
  5. Configure the SERVER_RECORD section:

    1. Create a file that the webServices can use to configure the server location:

      touch ../config/server_record.json
      
    2. Using your preferred terminal-based editor (for the uninitiated, nano is easier to use than vi), edit server_record.json so that its content is:

      {}
      
    3. Provide a path to server_record.json file in the fluxnotes_nlp.config, changing the server_record section to be:

      [SERVER_RECORD]
      # ... 
      JSON = /path/to/config/server_record.json
      
  6. Configure the JCARAFE section:

    1. Edit fluxnotes_nlp.config to provide paths to a) the jCarafe jar, and b) the cder_nlm_model to be used by the jCarafe engine, changing the JCARAFE section to be:

      [JCARAFE]
      # This is the absolute path to the jCarafe JAR.
      JAR = /path/to/config/jcarafe-embedded-server-assembly-0.9.99-bin.jar
      # Anything that ends in _MODEL will be treated as a model.
      # These things need to be linked with entries in resources/jcarafe.json
      # These should be absolute paths.
      AR_MODEL = /path/to/config/cder_nlm_model
      
  7. Configure the MEDDRA section:

    1. Edit fluxnotes_nlp.config to provide paths to a) the MEDDRA dir you unzipped, and b) where you want the meddra DB file to live after construction, changing the MEDDRA section to be:

      [MEDDRA]
      # This is the absolute path of the MedDRA MedAscii directory.
      MEDASCII_DIR = /path/to/config/meddra_20_0_english/MedAscii
      # And this is where you want the DB file to live. You must provide
      # a pathname here. The file will get created on initial server launch
      MEDDRA_DB_FILE = /path/to/config/meddra.db
      
  8. Configure the STANFORD_NLP section:

    1. Edit fluxnotes_nlp.config to provide a path to Stanford's core NLP library, which you unzipped earlier. Change the STANFORD_NLP section to be:

      [STANFORD_NLP]
      # This is the absolute path to the root of the Stanford NLP distribution.
      # We require 3.9.1 or later, which you can download here:
      # https://stanfordnlp.github.io/CoreNLP/download.html
      STANFORD_NLPROOT = /path/to/config/stanford-corenlp-full-2018-02-27
      
  9. Configure the SPELLING_CORRECTION section:

  10. Create a dir for the spelling correction model. Interestingly, this will not be nested in the config directory. Maybe it can be, but all documentation suggests nesting it in /path/to/fluxnotes_nlp_ensemble. So from that dir, run:

    mkdir -p resources/symspell
    
  11. Build the spelling correction model by running:

    python utils/build_symspell_dictionary.py --use_shelve --model_directory resources/symspell/model third_party/python/pysymspell/frequency_dictionary_en_82_765.txt third_party/data/SPECIALIST/specialist_extra_lexicon.txt 
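
Since most of the steps above come down to pointing fluxnotes_nlp.config at files under the config directory, a quick existence check on those paths can catch typos before the first launch. This is a rough sketch under a few assumptions: the helper names config_value and check_config_paths are invented here, the key names are the ones shown in the sections above, each key is assumed to appear only once in the file (section headers are ignored), and MEDDRA_DB_FILE is skipped because it is only created on the initial server launch.

```shell
#!/bin/sh
# Sketch: check that the paths set in fluxnotes_nlp.config exist on disk.
# config_value and check_config_paths are illustrative names.

# config_value KEY FILE
# Prints the value of the first uncommented "KEY = value" line, trimmed of trailing whitespace.
config_value() {
    sed -n "/^[[:space:]]*$1[[:space:]]*=/{s/^[^=]*=[[:space:]]*//;s/[[:space:]]*\$//;p;}" "$2" | head -n 1
}

# check_config_paths FILE
# Returns non-zero if any of the expected keys is unset or points at a missing path.
check_config_paths() {
    status=0
    for key in JSON JAR AR_MODEL MEDASCII_DIR STANFORD_NLPROOT; do
        value=$(config_value "$key" "$1")
        if [ -z "$value" ]; then
            echo "unset:   $key"
            status=1
        elif [ -e "$value" ]; then
            echo "ok:      $key = $value"
        else
            echo "missing: $key = $value"
            status=1
        fi
    done
    return "$status"
}

# Only run when the config file is present in the current directory.
if [ -f fluxnotes_nlp.config ]; then
    check_config_paths fluxnotes_nlp.config
fi
```

Run it from the /path/to/fluxnotes_nlp_ensemble/ directory; any line starting with "missing" or "unset" points at the section that still needs editing.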
    

Start-up Script:

We've defined a single start-up script for running the python processes that will continue running even after an SSH session is closed. Don't create this file in the fluxnotes_nlp_ensemble dir, otherwise Git will try and track it. Follow these steps to create an equivalent file:

  1. Create a .sh file:

    touch nlp-start-up.sh
    
  2. Change the file so that it is executable:

    chmod +x nlp-start-up.sh
    
  3. Edit the file to contain the following script:

    #!/bin/sh
    # Ensures that any nohup.out files exist in the same dir as your config and your scripts
    cd /path/to/parent/of/fluxnotes_nlp_ensemble-and-config
    # Run the jCarafe server first
    nohup python2.7 /path/to/fluxnotes_nlp_ensemble/bin/run_jcarafe_server.py AR_MODEL &
    # jCarafe needs to be running before the web server; sleep for a few seconds and then run
    sleep 15 && nohup python2.7 /path/to/fluxnotes_nlp_ensemble/bin/run_server.py --globally_visible --no_processor_threading &
    

And voila! Running that script will now trigger the launch of both the jCarafe server and the web server. To run the script from the command line:

./nlp-start-up.sh
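
The fixed 15-second sleep in the script above is a guess at how long jCarafe takes to come up. One alternative is to poll for a readiness condition with a timeout. The sketch below is only an illustration: wait_until is a made-up helper, and this guide does not say what a reliable readiness signal for jCarafe is, so the condition in the commented usage line (nohup.out being non-empty) is an assumption that only shows something has been logged.

```shell
#!/bin/sh
# Sketch: poll for a readiness condition instead of sleeping a fixed 15 seconds.
# wait_until is an illustrative helper, not part of fluxnotes_nlp_ensemble.

# wait_until TIMEOUT_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND about once per second; succeeds as soon as COMMAND succeeds,
# and fails once TIMEOUT_SECONDS retries have been used up.
wait_until() {
    remaining=$1
    shift
    while ! "$@"; do
        if [ "$remaining" -le 0 ]; then
            return 1
        fi
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}

# Hypothetical use in nlp-start-up.sh, in place of "sleep 15":
#   wait_until 60 test -s nohup.out
```

If no better signal is available, keeping the plain sleep is a perfectly reasonable choice.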

Setting Up Start-On-Boot

You can add a task to your crontab file so that this start-up script runs on every reboot.

  1. Open the VM's crontab file:

    crontab -e
    
  2. Add the following to your crontab file:

    @reboot /path/to/nlp-start-up.sh
    

To reboot the system, you can run

sudo reboot
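
Before rebooting, it may be worth confirming that the @reboot line really made it into the crontab. The following is a small sketch for that check: has_reboot_entry is an invented helper name, and /path/to/nlp-start-up.sh is the same placeholder used above and needs to be replaced with the real location.

```shell
#!/bin/sh
# Sketch: confirm that a crontab listing contains an @reboot line for a given script.
# has_reboot_entry is an illustrative helper name.

# Reads a crontab listing on stdin; succeeds if an uncommented @reboot line mentions $1.
has_reboot_entry() {
    grep -v '^[[:space:]]*#' | grep -F "$1" | grep -q '^[[:space:]]*@reboot[[:space:]]'
}

# Placeholder path: replace with the real location of the start-up script.
script=/path/to/nlp-start-up.sh

if crontab -l 2>/dev/null | has_reboot_entry "$script"; then
    echo "@reboot entry found for $script"
else
    echo "no @reboot entry found for $script"
fi
```

Note that crontab -l lists the crontab of the current user, so run the check as the same user that ran crontab -e.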

Kill Script:

We've defined a single kill script for prematurely ending the python and java processes. Don't create this file in the fluxnotes_nlp_ensemble dir, otherwise Git will try and track it. Follow these steps to create an equivalent file:

  1. Create a .sh file:

    touch kill-nlp.sh
    
  2. Change the file so that it is executable:

    chmod +x kill-nlp.sh
    
  3. Edit the file to contain the following script:

    #!/bin/sh
    # Two steps: kill python2.7 processes, and kill jCarafe Java server
    # -r (GNU xargs) skips the kill when no matching process is found
    ps -ef | grep python2.7 | grep -v grep | awk '{print $2}' | xargs -r kill
    ps -ef | grep org.mitre.jcarafe.server | grep -v grep | awk '{print $2}' | xargs -r kill
    # Make a backup of the current nohup output file
    mkdir -p backup; mv nohup.out backup/nohup-`date +%Y-%m-%d_%H-%M-%S`.out
    

And voila! Running that script will now end both the jCarafe server and the web server. To run the script from the command line:

./kill-nlp.sh
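
One rough edge in the script above is that the mv step prints an error when there is no nohup.out to back up, for example when the kill script is run twice in a row. A guarded version of the backup step could look like the sketch below; backup_nohup is a made-up helper name, and the timestamp format is copied from the original script.

```shell
#!/bin/sh
# Sketch: back up nohup.out only when it exists.
# backup_nohup is an illustrative helper name.

# backup_nohup [DIR]
# Moves DIR/nohup.out into DIR/backup/ under a timestamped name (DIR defaults to ".").
backup_nohup() {
    dir=${1:-.}
    if [ -f "$dir/nohup.out" ]; then
        mkdir -p "$dir/backup"
        mv "$dir/nohup.out" "$dir/backup/nohup-$(date +%Y-%m-%d_%H-%M-%S).out"
    else
        echo "no nohup.out in $dir, nothing to back up"
    fi
}

backup_nohup .
```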

Troubleshooting: Manual Service Startup

If for some reason things go wrong with the automatic start-up on boot, you can run nlp-start-up.sh manually by navigating to its parent directory and running:

# navigate to the `dir` that contains nlp-start-up.sh
cd /path/to/dir
./nlp-start-up.sh

Troubleshooting: Manual Service Shutdown

Run the kill script mentioned above to manually kill the servers by doing the following:

# navigate to the `dir` that contains kill-nlp.sh
cd /path/to/dir
./kill-nlp.sh