NLP Server: Provisioning and Troubleshooting
Almost all of this walkthrough assumes you are SSH'd onto the VM you're setting up. To do so:
This install guide assumes a machine running Ubuntu 18.04 (run lsb_release -a to determine your version) behind the MITRE firewall and proxy, and addresses many of the issues originally encountered during setup. To run the server, the machine will need:
-
Proxy taken care of
- If you don't know how to do this on a MITRE machine, reach out to Dylan Phelan. We don't want to post sensitive information online for others.
-
Git
-
Run the following to install:
sudo apt-get update
sudo apt-get install git
-
-
Java 8
-
Installing this was an absolute pain, because of proxy and firewall issues. Below you'll find a very manual walkthrough, though quicker ways of installing may exist. I used these articles in coming up with this walkthrough: https://askubuntu.com/questions/908467/key-server-receive-failed-when-installing-through-ppa
-
Open and edit sources list
sudo nano /etc/apt/sources.list
-
Add the webupd8team Java PPA manually by adding these lines, based on the archive found here -- the version, bionic, is based on Ubuntu 18.04:
deb http://ppa.launchpad.net/webupd8team/java/ubuntu bionic main
deb-src http://ppa.launchpad.net/webupd8team/java/ubuntu bionic main
-
Download the PGP key for this Java archive onto your local machine from here, or ask Dylan Phelan. Save the file as java-key.gpg, for reference in the following steps.
Copy that file from your local machine onto the VM -- run the following from your local machine, leaving your SSH session open (replace <user>@<vm-hostname> with your own login and VM address):
scp /path/to/local/java-key.gpg <user>@<vm-hostname>:~/
-
Now, from the VM, add this key to your keychain:
sudo apt-key add java-key.gpg
-
Update apt and install the oracle-java8-installer:
sudo apt-get update
sudo apt install oracle-java8-installer
-
Check the Java version:
java -version
Expecting roughly:
java version "1.8.0_181"
Java(TM) SE Runtime Environment (build 1.8.0_181-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.181-b13, mixed mode)
-
Set Java env vars
sudo apt install oracle-java8-set-default
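As a quick sanity check after the Java steps, the version banner can be tested from a script rather than by eye. This is a minimal sketch, not part of the original setup: the function name is_java8 is hypothetical, and it assumes the first line of the banner contains the text version "1.8., as in the expected output above.

```shell
#!/bin/sh
# Succeeds when the given version banner line reports a 1.8.x runtime.
# (is_java8 is a hypothetical helper name, not part of any tool.)
is_java8() {
    case "$1" in
        *'version "1.8.'*) return 0 ;;
        *) return 1 ;;
    esac
}

# Only inspect the real JVM when one is installed.
if command -v java >/dev/null 2>&1; then
    # java prints its version banner on stderr, so redirect it to stdout.
    banner=$(java -version 2>&1 | head -n 1)
    if is_java8 "$banner"; then
        echo "OK: $banner"
    else
        echo "WARNING: expected Java 8, found: $banner"
    fi
else
    echo "WARNING: java is not on the PATH"
fi
```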
-
-
-
Python 2.7
-
To install Python 2.7 you simply need to run the following in a terminal on Ubuntu 18.04 (Python 2.7 and the preinstalled Python 3 work side by side out of the box):
# refreshing the repositories
sudo apt update
# it's wise to keep the system up to date!
# you can skip the following line if you do not
# want to update all your software
sudo apt upgrade
# installing python2.7 and pip for it
sudo apt install python2.7 python-pip
-
-
Associated python packages
-
The first part of the fluxnotes_nlp_ensemble documentation describes how to do this, so refer to the original documentation or reach out to Dylan Phelan in the case of errors, but it should be enough to be in the /path/to/fluxnotes_nlp_ensemble/ directory and run the following:
pip2 install -r requirements.txt
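Before moving on to configuration, it can help to confirm that every prerequisite listed so far is actually on the PATH. The following is a rough sketch under that assumption; missing_commands is a hypothetical helper name, and the command list (git, java, python2.7, pip2) is taken from the steps above.

```shell
#!/bin/sh
# Prints each command from the argument list that is not found on the PATH.
# (missing_commands is a hypothetical helper name.)
missing_commands() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}

missing=$(missing_commands git java python2.7 pip2)
if [ -z "$missing" ]; then
    echo "All prerequisite commands found."
else
    echo "Missing prerequisite commands:"
    echo "$missing"
fi
```

This only checks that the commands exist; it does not verify versions or that the packages in requirements.txt installed cleanly.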
-
Again, this is largely copied over from the original documentation, so refer to it in the event of issues or reach out to Dylan Phelan. I've tried to distill those instructions down into concrete steps based on my direct experience on an Ubuntu 18.04 machine, assuming you are in the /path/to/fluxnotes_nlp_ensemble/ directory.
-
All this assumes that you've ssh'd on to the VM:
-
Copy fluxnotes_nlp.config.in to fluxnotes_nlp.config:
cp fluxnotes_nlp.config.in fluxnotes_nlp.config
-
Create a directory for the files you need for configuration. We named our directory config and ran:
mkdir ../config
-
Get a copy of the required setup files from Dylan Phelan or Sam Bayer. Move those files into the config directory. Those files will include:
- jcarafe-embedded-server-assembly-0.9.99-bin.jar - Java Archive file containing the jCarafe server.
- cder_nlm_model - Model for the jCarafe engine to use.
- meddra_20_0_english.zip - Zip containing a variety of MedDRA terms in English in different formats. Unzip after.
- stanford-corenlp-full-2018-02-27.zip - Zip containing Stanford's CoreNLP library. Unzip after.
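Since the later steps fail in confusing ways when one of these files is absent, a small check along the following lines may be useful. It is only a sketch: missing_setup_files and the CONFIG_DIR variable are hypothetical names, and the default of ../config assumes the directory was created as described above.

```shell
#!/bin/sh
# Prints the name of every expected setup file that is absent from directory $1.
# (missing_setup_files is a hypothetical helper name.)
missing_setup_files() {
    dir=$1
    for name in \
        jcarafe-embedded-server-assembly-0.9.99-bin.jar \
        cder_nlm_model \
        meddra_20_0_english.zip \
        stanford-corenlp-full-2018-02-27.zip
    do
        [ -e "$dir/$name" ] || echo "$name"
    done
}

# CONFIG_DIR is a hypothetical variable; it falls back to ../config.
missing_setup_files "${CONFIG_DIR:-../config}"
```

No output means all four files were found. The check looks for the zip files themselves, so it does not confirm that they have been unzipped.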
-
-
Configure the SERVER_RECORD section:
-
Create a file that the webServices can use to configure the server location:
touch ../config/server_record.json
-
Edit, using your preferred terminal-based editor (for the uninitiated, use nano over vi for ease of use), the content of server_record.json to be:
{}
-
Provide a path to the server_record.json file in fluxnotes_nlp.config, changing the SERVER_RECORD section to be:
[SERVER_RECORD]
# ...
JSON = /path/to/config/server_record.json
-
-
Configure the JCARAFE section:
-
Edit fluxnotes_nlp.config to provide paths to a) the jCarafe jar, and b) the cder_nlm_model to be used by the jCarafe engine, changing the JCARAFE section to be:
[JCARAFE]
# This is the absolute path to the jCarafe JAR.
JAR = /path/to/config/jcarafe-embedded-server-assembly-0.9.99-bin.jar
# Anything that ends in _MODEL will be treated as a model.
# These things need to be linked with entries in resources/jcarafe.json
# These should be absolute paths.
AR_MODEL = /path/to/config/cder_nlm_model
-
-
Configure the MEDDRA section:
-
Edit fluxnotes_nlp.config to provide paths to a) the MedDRA dir you unzipped, and b) where you want the MedDRA DB file to live after construction, changing the MEDDRA section to be:
[MEDDRA]
# This is the absolute path of the MedDRA MedAscii directory.
MEDASCII_DIR = /path/to/config/meddra_20_0_english/MedAscii
# And this is where you want the DB file to live. You must provide
# a pathname here. The file will get created on initial server launch
MEDDRA_DB_FILE = /path/to/config/meddra.db
-
-
Configure the STANFORD_NLP section:
-
Edit fluxnotes_nlp.config to provide a path to Stanford's CoreNLP library, which you unzipped earlier. Change the STANFORD_NLP section to be:
[STANFORD_NLP]
# This is the absolute path to the root of the Stanford NLP distribution.
# We require 3.9.1 or later, which you can download here:
# https://stanfordnlp.github.io/CoreNLP/download.html
STANFORD_NLPROOT = /path/to/config/stanford-corenlp-full-2018-02-27
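A mistyped path in any of the sections above tends to surface only at server launch, so it may be worth checking the paths up front. The sketch below assumes the KEY = value layout shown in the sections above; missing_config_paths and the NLP_CONFIG variable are hypothetical names. MEDDRA_DB_FILE is deliberately left out of the check because that file is only created on initial server launch.

```shell
#!/bin/sh
# Prints "KEY = value" for every listed key in config file $1 whose value is
# empty or is not an existing path. Keys are matched at the start of a line.
# (missing_config_paths is a hypothetical helper name.)
missing_config_paths() {
    config=$1
    shift
    for key in "$@"; do
        # Keep the text after "KEY =" and trim trailing whitespace.
        value=$(sed -n "s/^$key[[:space:]]*=[[:space:]]*//p" "$config" \
            | sed 's/[[:space:]]*$//' | head -n 1)
        [ -n "$value" ] && [ -e "$value" ] || echo "$key = $value"
    done
}

# NLP_CONFIG is a hypothetical variable; it falls back to the file in the
# current directory, assumed to be /path/to/fluxnotes_nlp_ensemble/.
cfg=${NLP_CONFIG:-fluxnotes_nlp.config}
if [ -f "$cfg" ]; then
    missing_config_paths "$cfg" JSON JAR AR_MODEL MEDASCII_DIR STANFORD_NLPROOT
else
    echo "Config file not found: $cfg"
fi
```

No output means every checked path exists. The check says nothing about whether the files at those paths are the right ones.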
-
-
Configure the SPELLING_CORRECTION section:
-
Create a dir for the spelling correction model. Interestingly, this will not be nested in the config directory. Maybe it can be, but all documentation suggests nesting it in /path/to/fluxnotes_nlp_ensemble. So from that dir, run:
mkdir -p resources/symspell
-
Build the spelling correction model by running:
python utils/build_symspell_dictionary.py --use_shelve --model_directory resources/symspell/model third_party/python/pysymspell/frequency_dictionary_en_82_765.txt third_party/data/SPECIALIST/specialist_extra_lexicon.txt
We've defined a single start-up script for running the python processes that will continue running even after an SSH session is closed. Don't create this file in the fluxnotes_nlp_ensemble dir, otherwise Git will try to track it. Follow these steps to create an equivalent file:
-
Create a .sh file:
touch nlp-start-up.sh
-
Change the file so that it is executable:
chmod +x nlp-start-up.sh
-
Edit the file to contain the following script:
#!/bin/sh
# Ensures that any nohup.out files exist in the same dir as your config and your scripts
cd /home/dphelan/parent/of/fluxnotes_nlp_ensemble-and-config
# Run the jCarafe server first
nohup python2.7 /home/dphelan/nlp/fluxnotes_nlp_ensemble/bin/run_jcarafe_server.py AR_MODEL &
# jCarafe needs to be running before the web server; sleep for a few seconds and then run
sleep 15 && nohup python2.7 /path/to/fluxnotes_nlp_ensemble/bin/run_server.py --globally_visible --no_processor_threading &
And voila! Running that script will now trigger the launch of both the jCarafe server and the web server. To run the script from the command line:
./nlp-start-up.sh
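The fixed 15-second sleep in the start-up script is a guess at how long jCarafe takes to come up. One possible alternative, sketched here under assumptions rather than taken from the original setup, is to poll until the jCarafe process appears. The names wait_until and jcarafe_running are hypothetical; the process pattern org.mitre.jcarafe.server is the one used by the kill script in this guide.

```shell
#!/bin/sh
# Polls once per second until the command in "$@" succeeds.
# Gives up and returns 1 after roughly $1 seconds.
# (wait_until is a hypothetical helper name.)
wait_until() {
    remaining=$1
    shift
    until "$@"; do
        [ "$remaining" -gt 0 ] || return 1
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}

# Succeeds when ps lists a jCarafe server process.
# (jcarafe_running is a hypothetical helper name.)
jcarafe_running() {
    ps -ef | grep org.mitre.jcarafe.server | grep -v grep >/dev/null
}
```

In nlp-start-up.sh, the sleep 15 step could then become wait_until 30 jcarafe_running, where the 30-second limit is an arbitrary choice. A listed process is not proof that jCarafe is ready to accept requests, so a short extra pause may still be needed.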
You can set a task in your crontab file to specify that, on reboot, you should run this start-up script.
-
Open the VM's crontab file:
crontab -e
-
Add the following to your crontab file:
@reboot /path/to/nlp-start-up.sh
To reboot the system, you can run
sudo reboot
We've defined a single kill script for prematurely ending the python and java processes. Don't create this file in the fluxnotes_nlp_ensemble dir, otherwise Git will try to track it. Follow these steps to create an equivalent file:
-
Create a .sh file:
touch kill-nlp.sh
-
Change the file so that it is executable:
chmod +x kill-nlp.sh
-
Edit the file to contain the following script:
#!/bin/sh
# Two steps: kill python2.7 processes, and kill jCarafe Java server
ps -ef | grep python2.7 | grep -v grep | awk '{print $2}' | xargs kill
ps -ef | grep org.mitre.jcarafe.server | grep -v grep | awk '{print $2}' | xargs kill
# Make a backup of the current nohup output file
mkdir -p backup; mv nohup.out backup/nohup-`date +%Y-%m-%d_%H-%M-%S`.out
And voila! Running that script will now end both the jCarafe server and the web server. To run the script from the command line:
./kill-nlp.sh
If for some reason things go wrong with the automatic start-up, you can run nlp-start-up.sh manually by navigating to its parent directory and running:
# navigate to the `dir` that contains nlp-start-up.sh
cd /path/to/dir
./nlp-start-up.sh
Run the kill script mentioned above to manually kill the servers by doing the following:
# navigate to the `dir` that contains kill-nlp.sh
cd /path/to/dir
./kill-nlp.sh
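When troubleshooting, it is often useful to see whether the servers are running before starting or killing anything. The following sketch reuses the same ps, grep and awk pipeline as the kill script, but only reports PIDs instead of killing them. The function name pids_matching is hypothetical.

```shell
#!/bin/sh
# Reads `ps -ef` output on stdin and prints the PID (second column) of every
# line that matches pattern $1, skipping grep processes themselves.
# (pids_matching is a hypothetical helper name.)
pids_matching() {
    grep -- "$1" | grep -v grep | awk '{print $2}'
}

# Report on the python2.7 processes and the jCarafe Java server.
for pattern in python2.7 org.mitre.jcarafe.server; do
    pids=$(ps -ef | pids_matching "$pattern" | tr '\n' ' ')
    if [ -n "$pids" ]; then
        echo "$pattern: running (PID $pids)"
    else
        echo "$pattern: not running"
    fi
done
```

As with the kill script, the python2.7 pattern matches any python2.7 process on the machine, not only the NLP servers.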
Copyright © 2017 The MITRE Corporation | Approved for Public Release; Distribution Unlimited. Case Number 16‑1988