Performance degradation investigation #3631

Closed
djbrooke opened this issue Feb 16, 2017 · 8 comments

Comments

@djbrooke
Contributor

djbrooke commented Feb 16, 2017

In our retrospective today, @raprasad mentioned that there have been pages throwing performance alerts in New Relic. At first glance, there does not appear to be a common thread with these alerts.

We should investigate what's causing these alerts and determine small steps we can take to keep our performance where we want it.

@raprasad
Contributor

raprasad commented Feb 21, 2017

Some numbers for Jan. 21 to Feb. 21 from Google Analytics:

All pages: 5.77* seconds

  • (*all pages, overall average)
  • 6+ seconds: 23.49% of pageviews
  • 7+ seconds: 17.21% of pageviews

Dataset page: 8.53** seconds

  • (**all dataset pages, overall average)

It is not yet clear whether the page slowness is caused by slow-loading individual pages, overall system load, an unknown bug, or a combination of these.

However, there are known issues previously identified via tools such as Google PageSpeed.

As a first cut for examining issues:

  • Narrow Google Analytics (GA) to a single day
  • Choose a slow-loading page. Example: On 2/21, GA reports 118 seconds to load this page: https://dataverse.harvard.edu/dataset.xhtml?persistentId=hdl:1902.1/10766
  • Analyze the actual server logs for the selected slow-loading page, looking at all requests prior to and including that page request (e.g., start with 1 hour of previous requests)
    • As per @pameyer, include Postgres logs, Glassfish logs, and Solr logs
  • Examine the logs for unusual activity
  • Replay requests via a script to examine system load and possible issues.
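The log-window step above could be sketched roughly as follows. This is only an illustration: it assumes Apache "combined"-style access log lines, and the helper names (`parse_line`, `requests_before`) and the sample lines are hypothetical, not taken from the actual servers.

```python
import re
from datetime import datetime, timedelta

# Assumption: Apache "combined"-style access log lines. Adjust for other formats.
LINE_RE = re.compile(r'\[(?P<ts>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+)')
TS_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def parse_line(line):
    """Return (timestamp, method, path), or None if the line does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return datetime.strptime(m["ts"], TS_FORMAT), m["method"], m["path"]


def requests_before(lines, slow_path_fragment, window=timedelta(hours=1)):
    """Return requests in the window up to and including the first request
    whose path contains slow_path_fragment, ordered by timestamp."""
    parsed = [p for p in (parse_line(l) for l in lines) if p is not None]
    hits = [p for p in parsed if slow_path_fragment in p[2]]
    if not hits:
        return []
    end = hits[0][0]
    start = end - window
    return sorted((p for p in parsed if start <= p[0] <= end), key=lambda p: p[0])


# Made-up sample lines for illustration only.
sample = [
    '10.0.0.1 - - [21/Feb/2017:12:30:00 -0500] "GET /dataverse/root HTTP/1.1" 200 512',
    '10.0.0.2 - - [21/Feb/2017:13:45:10 -0500] "GET /api/search?q=test HTTP/1.1" 200 2048',
    '10.0.0.3 - - [21/Feb/2017:14:03:12 -0500] "GET /dataset.xhtml?persistentId=hdl:1902.1/10766 HTTP/1.1" 200 90210',
]

for ts, method, path in requests_before(sample, "persistentId=hdl:1902.1/10766"):
    print(ts.isoformat(), method, path)
```

The returned list of (method, path) pairs could then feed a replay script. The same time window would need to be applied separately to the Postgres, Glassfish, and Solr logs, which use different formats.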

Barring a "silver bullet" or a bad bug being found, site performance enhancements may be made via:


[Screenshot: Site Speed Page Timings, Google Analytics]

@pameyer
Contributor

pameyer commented Feb 21, 2017

It might be helpful to have some system-level information from the various servers involved (glassfish application servers, postgres servers, solr servers, apache servers) for the same time window. Assuming sysstat is installed, the contents of /var/log/sa would be a good starting point (possibly with more specific postgres level info if that looks to be the source of the problem).
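As a rough sketch of pulling that window out of the sysstat data: the snippet below builds a `sar` invocation for one day's file. It assumes the common `saDD` file naming under /var/log/sa (DD being the day of the month), and the helper names `sar_command` and `run_sar` are hypothetical.

```python
import subprocess
from datetime import datetime


def sar_command(start, end, sa_dir="/var/log/sa", report_flag="-u"):
    """Build a sar command for a time window within a single day.

    report_flag selects the report, e.g. -u (CPU), -r (memory), -q (load/queue).
    Assumption: daily data files are named saDD, DD being the day of the month.
    """
    if start.date() != end.date():
        raise ValueError("window must fall within one day; run once per day otherwise")
    sa_file = f"{sa_dir}/sa{start:%d}"
    return ["sar", report_flag, "-f", sa_file,
            "-s", f"{start:%H:%M:%S}", "-e", f"{end:%H:%M:%S}"]


def run_sar(start, end, **kwargs):
    """Run sar and return its text output. Requires sysstat on the host."""
    result = subprocess.run(sar_command(start, end, **kwargs),
                            capture_output=True, text=True, check=True)
    return result.stdout


# Example: the hour leading up to a slow request seen around 14:05 on Feb 21.
cmd = sar_command(datetime(2017, 2, 21, 13, 0, 0), datetime(2017, 2, 21, 14, 5, 0))
print(" ".join(cmd))
```

Running this per server (application, database, search, web) for the same window would make it easier to line up system load against the slow page requests.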

@raprasad
Contributor

raprasad commented Feb 22, 2017

Important to note: the items below are likely to come into play, so we should be prepared to optimize:

  • Finding bugs. Query optimization. Caching. Storing metadata in JSON format

Guidelines for server response time:

  • Target number established by Google: 200 ms
  • Examples
    • CKAN page: 270 ms
    • OpenScholar gking page: under 200 ms
  • Sample DV page with 19 small files: 5700 ms yesterday, 6100 ms today
  • Other sample DV pages are closer to the 2000 ms range (10x the guideline)
    • This is only server response time--not time for js/css/rendering/etc
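One way to spot-check numbers like these from a script is sketched below. It is only an approximation: the timing includes DNS lookup and connection setup, so it will read somewhat higher than pure server processing time. The function names are hypothetical.

```python
import time
import urllib.request

GUIDELINE_MS = 200  # the target cited above


def server_response_ms(url, timeout=30):
    """Milliseconds from issuing the request until the first body byte arrives.

    Approximate: includes DNS and connection setup, excludes js/css/rendering.
    """
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        resp.read(1)
    return (time.perf_counter() - start) * 1000.0


def times_over_guideline(ms, guideline_ms=GUIDELINE_MS):
    """How many times the guideline a measured response time is."""
    return ms / guideline_ms


# Using the figures quoted above rather than a live measurement:
print(times_over_guideline(2000))  # other sample DV pages
print(times_over_guideline(5700))  # sample DV page with 19 small files
```

Taking several measurements per page and comparing medians would be less noisy than a single request.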

@raprasad
Contributor

raprasad commented Feb 22, 2017

Note: the "prototype" above is not full-featured by any means, but the thought is:

  • Most of our traffic is "reads" vs. "writes"
  • Dataset info, especially published dataset info, is not going to change, e.g. use JSON docs instead of re-querying each time
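To make the read-heavy idea concrete, here is a minimal sketch (not the prototype mentioned above) of serving published dataset info from a stored JSON document instead of re-querying. The class name, the `load_from_db` callable, and the sample record are all hypothetical.

```python
import json


class PublishedDatasetCache:
    """Keep one serialized JSON document per published dataset."""

    def __init__(self, load_from_db):
        # Hypothetical callable: persistent_id -> dict built from database queries.
        self._load_from_db = load_from_db
        self._docs = {}  # persistent_id -> JSON string

    def get(self, persistent_id):
        doc = self._docs.get(persistent_id)
        if doc is None:
            # First read: run the expensive queries once and store the result.
            doc = json.dumps(self._load_from_db(persistent_id))
            self._docs[persistent_id] = doc
        return json.loads(doc)

    def invalidate(self, persistent_id):
        """Call on a "write", e.g. when a new version is published."""
        self._docs.pop(persistent_id, None)


# Illustration with a stand-in loader that counts how often it is called.
calls = []

def fake_loader(pid):
    calls.append(pid)
    return {"persistentId": pid, "title": "Example dataset", "files": 19}

cache = PublishedDatasetCache(fake_loader)
cache.get("hdl:1902.1/10766")
cache.get("hdl:1902.1/10766")
print(len(calls))  # the loader ran only once for two reads
```

In practice the documents would likely live in a shared store rather than process memory, and invalidation on publish is the part that needs the most care.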

@djbrooke
Contributor Author

Great - it looks like we now have enough information to work on this when we pull it into a future sprint.

@djbrooke
Contributor Author

djbrooke commented May 9, 2019

Closing in favor of #5824

@pdurbin
Member

pdurbin commented May 10, 2019

@djbrooke I'm catching up on email and hope you don't mind that I'm clicking "close" for you. 😄

@pdurbin closed this as completed May 10, 2019