ReCiter is a highly accurate system for guessing which publications in PubMed a given person has authored. ReCiter includes a Java application, a DynamoDB-hosted database, and a set of RESTful microservices which collectively allow institutions to maintain accurate and up-to-date author publication lists for thousands of people. This software is optimized for disambiguating authorship in PubMed and, optionally, Scopus.
ReCiter can be installed on a locally controlled server or using services provided by Amazon Web Services (AWS). For those looking to install ReCiter on AWS, this cloud repository provides a CloudFormation template.
CloudFormation allows you to use a simple text file to model and provision, in an automated and secure manner, all the resources needed for your applications across all regions and accounts. This file serves as the single source of truth for your cloud environment.
As the installation instructions below explain, you can install a template with or without a Scopus integration.
- Create an AWS user and account through AWS console
- Create an AWS user here.
- Navigate to the IAM (Identity and Access Management) managed service.
- Go to `Users` and click on `Add User`.
- The user name can be anything, but let's choose `svc-reciter`.
- For `Access Type`, select `Programmatic Access`.
- Click on `Next Permissions`.
- Click on `Attach existing policies directly`.
- Use the filter to find and select the policy `AdministratorAccess`.
- Click on `Next Review`.
- Create the user.
- Click on `Download credentials`. (You will only be able to view the credentials once. Store them in a secure location.)
- Install the AWS CLI (command-line interface) on your local machine.
- Verify that you have Python installed, preferably Python 3.4 or greater.
- To check your version of Python, enter the following in Terminal:
python --version
- If Python is missing or is not the proper version, install it (for example, with Homebrew on macOS):
brew install python
- Use pip to install the AWS CLI:
pip3 install awscli --upgrade --user
- Check version using
aws --version
- Set up an AWS profile
- In Terminal, enter
aws configure --profile reciter
- Enter the access key and secret key you received when creating the AWS user, and the region in which you want to run ReCiter.
- Your profile should now be set up. To test it, we will use the CLI to retrieve our AWS account number.
- Enter
aws sts get-caller-identity --output text --query 'Account' --profile reciter
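The profile test above can be wrapped in a small check. This is a minimal sketch, not part of ReCiter: `is_account_id` and `check_profile` are hypothetical helper names, and the only assumption is that a working profile returns a 12-digit AWS account ID.

```shell
# Succeeds only if the argument is exactly 12 digits, the format of an AWS account ID.
is_account_id() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;
  esac
  [ "${#1}" -eq 12 ]
}

# Hypothetical helper: runs the same sts call as above and validates the reply.
check_profile() {
  cp_profile="${1:-reciter}"
  cp_account="$(aws sts get-caller-identity --output text --query 'Account' --profile "$cp_profile")" || return 1
  if is_account_id "$cp_account"; then
    echo "Profile $cp_profile is configured for account $cp_account"
  else
    echo "Unexpected response: $cp_account" >&2
    return 1
  fi
}
```

Running `check_profile reciter` should print the account number if the keys and region were entered correctly; any other result suggests re-running `aws configure --profile reciter`.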
- Import the ReCiter-PubMed CloudFormation template in the AWS console
- Go to the CloudFormation service in AWS console
- Click on
Create stack
- For the template source, select Amazon S3 URL.
- You may choose to enter either:
- "No Scopus" template - https://reciter-workshop.s3.amazonaws.com/aws-elasticbeanstalk-master-stack-noscopus.json
- "Scopus" template (use this if you have a Scopus subscription along with its API key and insttoken) - https://reciter-workshop.s3.amazonaws.com/aws-elasticbeanstalk-master-stack.json
- Click on `Next`.
- Enter a name for the stack, e.g., `reciter-workshop-master-stack`.
- For `ApplicationPubmedEnvPubmedApikey`, you can use `9ab81e95f12df169b4e40c02719f76db8308`, although we recommend getting your own API key from the PubMed website.
- For `ApplicationReciterEnvAMAZONAWSACCESSKEY` and `ApplicationReciterEnvAMAZONAWSSECRETKEY`, enter the keys you created for the IAM user.
- Enter DNS names for the `ApplicationCNAMEPubmed` and `ApplicationCNAMEReciter` fields.
- DNS names must be regionally unique.
- To verify that a DNS name is available, enter the following in Terminal, where `<profile-name>` is the profile you set up in the previous step and `<your preferred dns prefix>` is the prefix you would like to use:
aws elasticbeanstalk check-dns-availability --cname-prefix <your preferred dns prefix> --profile <profile-name>
- Suggestion: to avoid conflicts, include your personal institutional ID.
- Click `Next` and add tags. These tags will be attached to all the resources that are created with this stack.
- Check the two acknowledgment boxes and click `Create stack`.
- This creates your application hosting stack, with a load balancer and one instance to host the application.
- This generally takes ~10 minutes to create. In the meantime, we can import the second stack.
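The DNS availability check from this step can be scripted so that obviously malformed prefixes are rejected before calling AWS. This is a sketch under stated assumptions: the helper names are hypothetical, and the local format rules (4 to 63 characters, letters, digits, and hyphens, no leading or trailing hyphen) are an assumption about what Elastic Beanstalk accepts, so the `check-dns-availability` reply remains the authority.

```shell
# Local sanity check. The rules here are an assumption, not an official validator.
valid_cname_prefix() {
  vp_prefix="$1"
  vp_len="${#vp_prefix}"
  { [ "$vp_len" -ge 4 ] && [ "$vp_len" -le 63 ]; } || return 1
  case "$vp_prefix" in
    -*|*-) return 1 ;;             # no leading or trailing hyphen
    *[!A-Za-z0-9-]*) return 1 ;;   # only letters, digits, hyphens
  esac
  return 0
}

# Hypothetical helper: asks Elastic Beanstalk whether the prefix is free.
cname_available() {
  ca_prefix="$1"
  ca_profile="${2:-reciter}"
  valid_cname_prefix "$ca_prefix" || { echo "Invalid prefix: $ca_prefix" >&2; return 1; }
  ca_reply="$(aws elasticbeanstalk check-dns-availability \
    --cname-prefix "$ca_prefix" --query 'Available' --output text \
    --profile "$ca_profile")" || return 1
  [ "$ca_reply" = "True" ]
}
```

For example, `cname_available reciter-abc1234 reciter && echo "free"` would confirm a prefix that includes an institutional ID, as suggested above.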
- Configure your GitHub account.
- If you haven't done so already, create a GitHub account.
- Visit the settings associated with your personal GitHub account.
- Click on `Developer Settings`.
- Go to Personal Access Tokens.
- Click on `Generate new token`.
- In the `Note` field, enter `reciter-workshop` (or whatever alternative you wish).
- Check `public_repo` and click the `Generate token` button below.
- Store this token in a secure place.
- Fork the ReCiter repository to your personal GitHub account.
- Go to the ReCiter repository.
- Click on the `Fork` button.
- Go to the application.properties file in your fork, located here:
https://github.com/<your-github-username>/ReCiter/blob/master/src/main/resources/application.properties
- Edit the file, find `aws.s3.use.dynamic.bucketName`, and set that flag to true: `aws.s3.use.dynamic.bucketName=true`
- Under commit message, enter `Dynamic bucket generation`.
- Click `Commit`.
- Edit the file, find `aws.dynamodb.settings.region`, and set it to the region where you want all your DynamoDB tables to live. For example, if you are in us-east-1 (N. Virginia): `aws.dynamodb.settings.region=us-east-1`
- If you don't want to use Scopus, edit the file, find `use.scopus.articles`, and set it to false: `use.scopus.articles=false`
- Under commit message, enter `set dynamodb region and scopus flag`.
- Click `Commit`.
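If you prefer to make these edits in a local clone of your fork rather than in the GitHub web editor, the three property changes can be applied with a small helper. This is only a sketch: `set_property` is a hypothetical function, it assumes simple `key=value` lines, and it assumes values that contain no `|` or `&` characters.

```shell
# Sets key=value in a properties file, replacing an existing line or appending a new one.
# Note: dots in the key act as regex wildcards, which is loose but harmless for these keys.
set_property() {
  sp_file="$1"; sp_key="$2"; sp_value="$3"
  sp_tmp="$(mktemp)" || return 1
  if grep -q "^${sp_key}=" "$sp_file"; then
    sed "s|^${sp_key}=.*|${sp_key}=${sp_value}|" "$sp_file" > "$sp_tmp"
  else
    cat "$sp_file" > "$sp_tmp"
    printf '%s=%s\n' "$sp_key" "$sp_value" >> "$sp_tmp"
  fi
  # Write back in place so the original file permissions are kept.
  cat "$sp_tmp" > "$sp_file" && rm -f "$sp_tmp"
}

# Example usage from the root of a local clone:
#   props=src/main/resources/application.properties
#   set_property "$props" aws.s3.use.dynamic.bucketName true
#   set_property "$props" aws.dynamodb.settings.region us-east-1
#   set_property "$props" use.scopus.articles false
```

After running it, commit and push the change to your fork so that the pipeline created later picks it up.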
- Fork the ReCiter-Pubmed-Retrieval tool repository to your personal GitHub account.
- Go to the ReCiter PubMed Retrieval Tool.
- Fork the ReCiter-Scopus-Retrieval-Tool repository to your personal GitHub account. Do this only if you used the CloudFormation template with Scopus included and have a valid Scopus subscription.
- Go to the Scopus repository.
- Fork the ReCiter-Publication-Manager repository to your personal GitHub account.
- Go to the ReCiter publication manager repository
- Click on the `Fork` button.
- Go to the local.js file in your fork, located here:
https://github.com/<your-github-username>/ReCiter-Publication-Manager/blob/master/config/local.js
- Edit the file and enter your ReCiter endpoint and ReCiter-PubMed endpoint, which correspond to the CNAME prefixes you specified in the earlier CloudFormation template. The application uses these endpoints to manage the publication data. Also make sure that adminApikey matches the value specified in the earlier master CloudFormation template.
- Import the CI/CD (continuous integration/delivery) CloudFormation template in AWS console
- Before we proceed, we need to verify that the ReCiter-PubMed CloudFormation template has been completely installed. In CloudFormation home, ensure that you see the `UPDATE_COMPLETE` status message for the ReCiter-PubMed CloudFormation template. If the installation is not complete, wait until it is.
- In the AWS CloudFormation console, click on `Create stack`.
- Enter the S3 URL we will be using: https://reciter-workshop.s3.amazonaws.com/aws-ci-cd-master-stack-noscopus.yml, or https://reciter-workshop.s3.amazonaws.com/aws-ci-cd-master-stack.yml if you have a Scopus subscription.
- Click on `Next`.
- Enter a stack name: `reciter-ci-cd`
- For GitHubToken, enter the token we generated for your personal GitHub account.
- In the GitHubUser field, enter your GitHub username.
- The remaining fields can be left at their defaults.
- Click on `Next` and acknowledge the checkboxes.
- After review, click `Create stack`.
- Wait for the stack to finish by looking for the `UPDATE_COMPLETE` status message.
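Instead of refreshing the console, the stack status can be polled from Terminal. The sketch below is an illustration, not part of the ReCiter templates: `stack_state` and `wait_for_stack` are hypothetical names, and the 30-second polling interval is arbitrary. Because a newly created stack normally reports `CREATE_COMPLETE`, the sketch treats both `CREATE_COMPLETE` and `UPDATE_COMPLETE` as finished.

```shell
# Maps a CloudFormation stack status to ready, failed, pending, or unknown.
stack_state() {
  case "$1" in
    CREATE_COMPLETE|UPDATE_COMPLETE) echo ready ;;
    *ROLLBACK*|*FAILED) echo failed ;;
    *IN_PROGRESS) echo pending ;;
    *) echo unknown ;;
  esac
}

# Hypothetical helper: polls the stack until it is ready or has failed.
wait_for_stack() {
  ws_stack="$1"
  ws_profile="${2:-reciter}"
  while :; do
    ws_status="$(aws cloudformation describe-stacks --stack-name "$ws_stack" \
      --query 'Stacks[0].StackStatus' --output text --profile "$ws_profile")" || return 1
    case "$(stack_state "$ws_status")" in
      ready) echo "$ws_stack: $ws_status"; return 0 ;;
      pending) sleep 30 ;;
      *) echo "$ws_stack: $ws_status" >&2; return 1 ;;
    esac
  done
}
```

For example, `wait_for_stack reciter-workshop-master-stack reciter && wait_for_stack reciter-ci-cd reciter` would block until both stacks from this guide have finished.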
- Use ReCiter in production.
- Visit the CodePipeline service.
- If the stack has finished installing, you should see three pipelines: ReCiter, ReCiterPubmed, and ReCiterPublicationManager. If you used the template with Scopus, you will see a fourth pipeline, ReCiterScopus.
- You can check the status of each pipeline as it goes through the process.
- Click on ReCiter. As you can see, it is pulling the changes from our source repository and then building the application. You can click on `Details` in the Build section to see live logs of the build process.
- When the build is complete, go to the URL for ReCiter.
- Use the CNAME you entered above for ReCiter, and go to a URL that has this general form:
http://<cname>.<region>.elasticbeanstalk.com/swagger-ui/index.html
- If you have trouble finding this URL, go here and click on ReCiterService
- You can do the same for the ReCiter Pubmed Service and other services that we created.
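The URL pattern above is easy to mistype, so it can help to build it from the CNAME and region. This is a small sketch with hypothetical helper names; the `curl` probe only reports the HTTP status code and assumes the service is publicly reachable over plain HTTP.

```shell
# Builds the Swagger UI URL from a CNAME prefix and an AWS region.
swagger_url() {
  printf 'http://%s.%s.elasticbeanstalk.com/swagger-ui/index.html\n' "$1" "$2"
}

# Hypothetical helper: prints the HTTP status code returned by the Swagger page.
swagger_status() {
  curl -s -o /dev/null -w '%{http_code}\n' "$(swagger_url "$1" "$2")"
}
```

For example, `swagger_url my-reciter us-east-1` prints `http://my-reciter.us-east-1.elasticbeanstalk.com/swagger-ui/index.html`, where `my-reciter` stands in for the CNAME you chose.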
- Teardown of resources
- When you are finished experimenting with or using your AWS account, you should clean up the resources associated with it to avoid incurring charges for resources that you are not using.
- Optional: go to CloudWatch and see which services are being used.
- Empty any S3 buckets that have been created here. One will have been created for CodePipeline, with a bucket name of the form codepipeline-- (e.g. codepipeline-us-east-1-). Use the console to empty it, or use the terminal:
aws s3 rm s3://<bucket-name> --recursive --profile <profile-name>
- Select the “master” version of any CloudFormation stacks and delete them in CloudFormation. Deletion cannot proceed if your setup is still using services. This may take several minutes.
- Wait for the deletion to complete. When it has, the console should say “No stacks.”
- Go to the DynamoDB console. Delete the tables one by one by clicking the delete table button.
- You have now deleted all your resources and should have an empty account.
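The teardown steps can also be carried out from Terminal. Because these commands are destructive, the sketch below only prints them for review rather than running them. `teardown_plan` is a hypothetical helper; the bucket and stack names are whatever you used in your own setup, and `<table-name>` is a placeholder to fill in from the `list-tables` output.

```shell
# Prints (does not run) the AWS CLI commands for the teardown steps above.
# Arguments: profile, bucket name, then one or more master stack names.
teardown_plan() {
  tp_profile="$1"; tp_bucket="$2"; shift 2
  echo "aws s3 rm s3://$tp_bucket --recursive --profile $tp_profile"
  for tp_stack in "$@"; do
    echo "aws cloudformation delete-stack --stack-name $tp_stack --profile $tp_profile"
    echo "aws cloudformation wait stack-delete-complete --stack-name $tp_stack --profile $tp_profile"
  done
  echo "aws dynamodb list-tables --query 'TableNames[]' --output text --profile $tp_profile"
  echo "aws dynamodb delete-table --table-name <table-name> --profile $tp_profile"
}
```

Calling `teardown_plan reciter <bucket-name> reciter-ci-cd reciter-workshop-master-stack` lists the commands in order; run them one at a time once you have confirmed that each name is correct.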