Gather information about the requests to and from RC API #1157

Closed · 5 tasks done
adinuca opened this issue Nov 19, 2018 · 10 comments

adinuca commented Nov 19, 2018

Why

To reduce the number of calls to Elasticsearch, we need to understand who triggers them.

What

  • Data about requests made to RC is gathered from the access logs
  • The top 5 requests are determined for a time interval
  • Data about requests made to RC-API is gathered from the access logs
  • The top 5 requests are determined for the same time interval
  • Data about the number of requests made to Elasticsearch is retrieved from Elastic Cloud for the same time interval

Notes

adinuca self-assigned this Nov 19, 2018
charlesyoung added this to the Unscheduled milestone Dec 7, 2018
adinuca commented Dec 13, 2018

According to the Lumen documentation, there are no built-in statistics about cache usage, but we could generate them ourselves using the EventServiceProvider.
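
For illustration, here is a minimal Python sketch of the kind of hit/miss counters such event listeners could feed (Lumen itself is PHP and fires cache events such as CacheHit and CacheMissed; this sketch only models the idea, not the actual API):

from collections import Counter

class CountingCache:
    # Wraps a dict-like cache backend and counts hits and misses,
    # mimicking the statistics that listeners on the cache events would record.
    def __init__(self, backend=None):
        self.backend = {} if backend is None else backend
        self.stats = Counter()

    def get(self, key):
        if key in self.backend:
            self.stats['hits'] += 1
            return self.backend[key]
        self.stats['misses'] += 1
        return None

    def put(self, key, value):
        self.backend[key] = value

cache = CountingCache()
cache.get('contracts:page:1')           # miss
cache.put('contracts:page:1', ['...'])
cache.get('contracts:page:1')           # hit
print(cache.stats)                      # Counter({'misses': 1, 'hits': 1})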

adinuca commented Dec 13, 2018

More monitoring data from Elastic Cloud can be found here: https://3b83e4a3efda4f5e8c2f8f2ec07c0fc2.us-east-1.aws.found.io:9243/app/monitoring#/elasticsearch/nodes

adinuca commented Dec 13, 2018

Today, from 1am to 2am (GMT), there was a spike of requests to Elasticsearch (the table shows hours in the GMT+2 timezone):
[screenshot 2018-12-13 at 14 27 50: hourly Elasticsearch request counts]

Requests from AWS for that time frame have been centralised here: https://docs.google.com/spreadsheets/d/1mhHmy6n9m3PMY-BLzLhDXz3jFL6FECRJ8sW5oE09S18/edit#gid=153453992

adinuca commented Dec 14, 2018

Yesterday from 3pm to 4pm there was another spike of requests.
Requests have been centralised here: https://docs.google.com/spreadsheets/d/1mhHmy6n9m3PMY-BLzLhDXz3jFL6FECRJ8sW5oE09S18/edit#gid=1773113440&fvid=1689222157

Elasticsearch request rate for one node:
[screenshot 2018-12-14 at 15 40 37: request-rate graph]

adinuca commented Dec 14, 2018

To obtain the above data, I downloaded the AWS ALB logs from S3 (aws s3 cp s3://nrgi-lb-logs/AWSLogs/877912432675/elasticloadbalancing/us-east-1/2018/12/13 . --recursive) and ran the following script:

import os
import gzip

cwd = os.getcwd()
logs_dir = os.path.join(cwd, 'rc_logs')
filenames = os.listdir(logs_dir)

full_filepaths = [os.path.join(logs_dir, f) for f in filenames]
only_files = [f for f in full_filepaths
              if os.path.isfile(f) and ('resource-contracts-lb' in f or 'rc-subsite-master-lb' in f)]

# Concatenate the gunzipped ALB logs into a single file.
logs = os.path.join(cwd, 'final_log')
with open(logs, 'a') as target:
    for f in only_files:
        print(f)
        with gzip.open(f, 'rt') as zip_ref:
            target.write(zip_ref.read())

# Keep only production requests within the time window and reduce each log
# line to date, load balancer name and requested path.
curated_file_path = os.path.join(cwd, 'curated_sorted_lines_log.csv')
with open(logs, 'r') as f:
    with open(curated_file_path, 'w') as cf:
        for line in f:
            if 'staging' not in line:
                # Glue the HTTP method to the quoted request URL so that the line
                # splits into a predictable number of space-separated tokens.
                line = (line.replace('"GET ', 'GET_')
                            .replace('"POST ', 'POST_')
                            .replace('"HEAD ', 'HEAD_')
                            .replace('"OPTIONS ', 'OPTIONS_'))
                tokens = line.split(' ')
                date = tokens[1]
                if '2018-12-13T14:59:59.140792Z' < date < '2018-12-13T16:00:00.140792Z':
                    system = tokens[2]
                    path = tokens[12]
                    cf.write('{},{},{}\n'.format(date, system, path))

The resulting file was then imported into Google Drive.
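
As a side note, the Top 5 can also be computed straight from the curated CSV without the spreadsheet. A minimal sketch (column layout as produced by the script above; collapsing query-string variants into a single endpoint count is an assumption):

import csv
from collections import Counter
from urllib.parse import urlsplit

counts = Counter()
with open('curated_sorted_lines_log.csv') as cf:
    for row in csv.reader(cf):
        # The third column looks like 'GET_https://host/path?query'; strip the
        # method prefix and the query string so variants of one endpoint are
        # counted together.
        url = row[2].split('_', 1)[-1]
        counts[urlsplit(url).path] += 1

for path, count in counts.most_common(5):
    print(count, path)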

adinuca commented Dec 14, 2018

On the 13th, between 1 and 2 am, the following requests were made:
[screenshot 2018-12-14 at 16 38 45]

adinuca commented Dec 14, 2018

On the 13th of December, between 3pm and 4pm, the following requests were made:
[screenshot 2018-12-14 at 16 47 30]

adinuca commented Dec 24, 2018

On the 23rd of December, between 5pm and 6pm, the following requests were made:
[screenshot 2018-12-24 at 09 39 38]
[screenshot 2018-12-24 at 09 05 16]

Data has been gathered here: https://docs.google.com/spreadsheets/d/1mhHmy6n9m3PMY-BLzLhDXz3jFL6FECRJ8sW5oE09S18/edit#gid=296562686&fvid=1368567753

adinuca commented Jan 7, 2019

  • For every page viewed on the RC subsites, one request is made to retrieve the list of contracts corresponding to the viewed page, and another 25 requests are made to RC API to retrieve the details of each contract. When the documents on the page are re-sorted, the process repeats. (A rough fan-out estimate follows after this list.)
  • For each contract viewed, 3 requests are made to RC API to retrieve data about the contract.
    The requests are made using the open-contract-id, not the id of the contract as is done for the search page. The requests fetch the annotations, the metadata and the text.
  • For each /contract/contractID/metadata request, there are 3-4 requests made to ES.
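
Putting these numbers together, a back-of-envelope sketch of the fan-out per subsite page view (the assumption that each of the 25 detail requests triggers one metadata lookup is mine, not measured):

# Rough request fan-out per RC subsite page view (sketch, not a measurement).
LISTING_REQUESTS = 1     # one request for the page's contract list
DETAIL_REQUESTS = 25     # one RC API request per contract shown on the page
ES_PER_METADATA = 3.5    # 3-4 Elasticsearch requests per metadata call

rc_api_requests = LISTING_REQUESTS + DETAIL_REQUESTS
# Assumption: every detail request results in one metadata lookup.
es_requests = DETAIL_REQUESTS * ES_PER_METADATA

print(rc_api_requests)   # 26 RC API requests per page view
print(es_requests)       # 87.5, i.e. roughly 88 Elasticsearch requests per page view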

adinuca commented Jan 7, 2019

#1172, #1173 and #1174 were raised to address these findings.

adinuca closed this as completed Jan 7, 2019