[Project Proposal]: ERDDAP web logs analysis #34
Comments
Another possible take on this is to build something that can be run as a sidecar to a deployment and use the logs for metrics that can be consumed by a tool like Prometheus. Axiom has done some work in https://github.com/axiom-data-science/erddap-metrics which uses the outward facing status page, but I think it could be extended (or more likely re-written with a framework that is still being developed) to work with logs as a sidecar. |
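As a rough illustration of the sidecar idea, the sketch below turns request counts (however they were extracted from the logs) into the plain-text exposition format that Prometheus scrapes. It is only a sketch under stated assumptions: the metric name `erddap_requests_total`, the `status` label, and the helper names are hypothetical and are not taken from erddap-metrics or any other existing tool.

```python
def escape_label(value):
    """Escape a label value for the Prometheus text exposition format."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_counter(name, help_text, counts):
    """Render a {status: count} mapping as one Prometheus counter family.

    `name` and the `status` label are hypothetical choices for illustration.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for status, count in sorted(counts.items()):
        label = escape_label(str(status))
        lines.append(f'{name}{{status="{label}"}} {count}')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up counts; a real sidecar would derive these from the web logs.
    print(render_counter(
        "erddap_requests_total",
        "Requests seen in the web log.",
        {"200": 3, "404": 1},
    ))
```

A sidecar could serve this text over HTTP for Prometheus to scrape; an established client library would normally be preferable to hand-rolled formatting, which is used here only to keep the example dependency-free.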
Looks like a very promising docker sidecar like project here https://github.com/dfo-meds/erddaputil Will check it out properly next week |
@callumrollo We can also reach into the erddap code to improve/optimize some of the logging messages. I have been exercising the maven docker erddap development container as of late (a bit of a kerfuffle over netcdfAll.jar artifacts no longer being published, yada yada...). So, there is an opportunity to examine what could be added/optimized and passed back to erddap as a PR (if desired). Part of the build process exercises the current test suite, so we can ensure we don't break things too badly. |
Interested! |
I'd love for the ERDDAP logs to be more useful for both admins and ERDDAP developers (me and others). One thought is that there are a number of logging/analytics services that will likely have support for many of the requested features above (and more). Would it make sense to use one of those existing services? From the ERDDAP developer point of view, there's a lot of data that could be useful for me, in particular error reports (so I can fix errors without relying on users/admins reporting them to me) and feature usage (to inform prioritization of work). Getting that data from a running ERDDAP to a central point where I can access it is the biggest hurdle. |
@ChrisJohnNOAA are you familiar with the log reporting/analysis that Roy and Dale (@rmendels and @dhr-sc) were testing with the SWFSC ERDDAP? Dale showed a demo where he had the output of the logs as an ERDDAP dataset. It was pretty brilliant. This sort of system may offer a solution for providing logs back to you via ERDDAP. |
@ChrisJohnNOAA @jenseva @dhr-sc All credit to Dale. It is entirely his work. |
I haven't seen that yet. I'll take a look and see if it can solve my needs. That said I still want to support improvements to logs for admins. |
As much as I love python, I agree that digging into the ERDDAP source (Java) is the cleaner approach here. The logs could be improved, but it may also be possible to add this information to the status.html page (or similar). |
Please look at the output in the daily emails. A lot of the information may already be there. It summarizes in great detail all the requests. |
I don't have that configured on my ERDDAP; I'll look into it. If the project is about parsing that output we should start by collecting together some example outputs from ERDDAP that this project will be using as input. |
Even if it is not mailed, I believe (but could be wrong) that the file is created in the log directory. Either way, I can provide an example of what we get, but would prefer not to do so publicly, so if you can contact me by email I can send you a sample. It is really quite extensive, breaking access down into all sorts of categories. |
As @rmendels posted, there are daily emails that get logged to erddapData/logs/emailLogyyyy-mm-dd.txt. They look like this: I have tried parsing information from this, but it lacks some of the details that my manager has requested. I have been asked to crunch the numbers on a monthly basis to answer questions like:
So far, I've found it easier to analyse the nginx/apache logs of incoming http requests rather than getting it from ERDDAP's status page, daily emails or logs. Looking at requests is nice as you have very granular, raw data. Not the summarised/binned data that goes into e.g. the daily email report. |
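To make the request-log approach concrete, here is a minimal sketch that parses lines in the Combined Log Format (the format written by Apache's "combined" log and nginx's default "combined" format) and counts requests per month. It is not the script referred to above: the function names are hypothetical, and the sample lines are synthetic, using a documentation-range IP address and a made-up dataset ID rather than real ERDDAP traffic.

```python
import re
from collections import Counter
from datetime import datetime

# Combined Log Format: ip, identity, user, [time], "request", status, size,
# "referer", "user agent".
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # Month abbreviations are parsed with the current locale (English assumed).
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


def requests_per_month(lines):
    """Count parseable requests per calendar month, keyed as 'YYYY-MM'."""
    counts = Counter()
    for line in lines:
        record = parse_line(line)
        if record is not None:
            counts[record["time"].strftime("%Y-%m")] += 1
    return counts


# Synthetic sample lines (not real traffic; the dataset ID is hypothetical).
SAMPLE = [
    '192.0.2.1 - - [03/Mar/2024:10:15:32 +0000] "GET /erddap/tabledap/hypothetical_dataset.csv?time,temperature HTTP/1.1" 200 5120 "-" "python-requests/2.31.0"',
    '192.0.2.7 - - [17/Mar/2024:08:01:05 +0000] "GET /erddap/info/index.html HTTP/1.1" 200 20480 "-" "Mozilla/5.0"',
    '192.0.2.1 - - [02/Apr/2024:23:59:59 +0000] "GET /erddap/griddap/hypothetical_grid.nc HTTP/1.1" 404 - "-" "curl/8.4.0"',
]

if __name__ == "__main__":
    for month, count in sorted(requests_per_month(SAMPLE).items()):
        print(month, count)
```

Because each parsed record keeps the raw path, status, and user agent, the same records can be regrouped by dataset, file type, or client to answer other monthly questions without re-reading the logs.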
Thank you for taking the time to propose this topic! From the Code Sprint topic survey, this has garnered a lot of interest. Following the contributing guidelines on selecting a code sprint topic I have assigned this topic to @callumrollo. Unless indicated otherwise, the assignee will be responsible for identifying a plan for the code sprint topic, establishing a team, and taking the lead on executing said plan. The first action for the lead is to:
|
In response to a merged PR question: happy to be a participant @callumrollo ! |
Noting here that I volunteer to be on-the-ground scrum master as a last resort, unless someone else takes the lead. |
Also able to help co-lead on this topic as needed. |
I would also like to contribute to this project, if possible. It would definitely be nice to get some example outputs from ERDDAP that this project will be using as input. I'll email @rmendels about it. |
@apkrelling That sounds like a good place to start: take an existing ERDDAP log and parse it with the above-mentioned tools to see if we can come up with answers to these questions. If the log does not currently provide those details, we can work out how to add them to the log so that this can be done. Three further things to examine: do no harm to the test suite; determine whether these two tools can be used to conjure up the requested information; and decide whether to add to them or start a different tool altogether. @callumrollo What might be useful is a good example of an nginx/Apache log and a script that you use to answer some of those questions. I will also look for the logs on our server. We do not have a lot of traffic, but I have an Apache/ERDDAP installation from which log information might be available. I can at least set up a Docker text-based ERDDAP development container for hacking. At a minimum, participants will want to install Docker Desktop to begin climbing through the Java code. Interactive containers for ERDDAP and Ubuntu can be found here. Build scripts. The VIM editor is included. Use the Items to review: |
Just a note that for running a local ERDDAP, you can use Jetty, which doesn't require installing anything manually (after Maven). https://github.com/ERDDAP/erddap/tree/main/development/jetty |
@ChrisJohnNOAA I am not sure how Jetty does web server logging, but I'd be interested to know if it can produce compatible logs w/Apache or nginx. I will try to look around tomorrow. |
It looks like Jetty request logging needs to be turned on. They look like (from the Jetty documentation, not a real example): This looks compatible with nginx, but I haven't tried parsing one yet. |
I tested with that Jetty example log line you provided, and it does seem to work with the nginx log parser. |
Project Description
Develop a tool that reads in the web logs of an ERDDAP server to analyse how the server is being used. This would include:
Expected Outcomes
A Python-based tool that ERDDAP admins can use to quickly and easily establish how data from their server are being used.
Skills required
Expertise
Novice
Topic Lead(s)
@callumrollo
Relevant links
Work in progress here https://github.com/callumrollo/erddaplogs