show the WHOLE webserver traffic in froxlors statistics #884

Open
rseffner opened this issue Oct 6, 2020 · 9 comments

@rseffner

rseffner commented Oct 6, 2020

Summary

At the moment, only outgoing webserver traffic with known HTTP status codes is counted in Froxlor's traffic statistics.
By changing Apache's LogFormat and the way the AWStats data file is parsed, we should be able to count ALL the traffic going through the webserver.

System information

  • Froxlor version: 0.10.20
  • Web server: apache2
  • OS/Version: Debian 10

Steps to reproduce

  1. look into apache.conf for the LogFormat containing %O
  2. look into the awstatsMMYYYY-DOMAIN.TLD.txt data file between BEGIN_DOMAIN and END_DOMAIN and compare with the values between BEGIN_TIME and END_TIME
  3. look into Froxlor's source to see which part of the AWStats data file is interpreted for traffic statistics

Expected behavior

  1. log the sum of outgoing AND incoming web traffic (use %S instead of %O in Apache's LogFormat)
  2. sum "bandwidth" AND "not viewed bandwidth" from the _TIME part (instead of "bandwidth" from the _DOMAIN part) of the AWStats data file

Actual behavior

  1. only outgoing traffic is logged by Apache
  2. only traffic with known/defined HTTP status codes and not from robots ("bandwidth" instead of "not viewed bandwidth") is counted by Froxlor from the AWStats data file

AWStats splits the traffic data into viewed and not viewed traffic. AWStats's explanation of not viewed traffic is: "Not viewed traffic includes traffic generated by robots, worms, or replies with special HTTP status codes."
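
The second point under "Expected behavior" could be sketched roughly as follows. This is only an illustration in Python, not Froxlor's actual code. It assumes that each row between BEGIN_TIME and END_TIME carries seven whitespace-separated fields (hour, pages, hits, bandwidth, not viewed pages, not viewed hits, not viewed bandwidth); that layout is an assumption and should be checked against the "#" header comment in your own data file. The function name is hypothetical.

```python
def total_bandwidth_from_time_section(lines):
    """Return (viewed, not_viewed, total) bandwidth in bytes from the TIME section.

    ASSUMED row layout (verify against your AWStats data file):
    hour pages hits bandwidth nv_pages nv_hits nv_bandwidth
    """
    viewed = 0
    not_viewed = 0
    in_section = False
    for raw in lines:
        line = raw.strip()
        if line.startswith("BEGIN_TIME"):
            in_section = True
            continue
        if line.startswith("END_TIME"):
            break
        if not in_section or not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 7:
            continue  # row does not match the assumed layout, skip it
        viewed += int(fields[3])      # "bandwidth"
        not_viewed += int(fields[6])  # "not viewed bandwidth"
    return viewed, not_viewed, viewed + not_viewed
```

The function accepts any iterable of lines, so an open file handle on the awstatsMMYYYY-DOMAIN.TLD.txt data file can be passed in directly.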

@rseffner
Author

rseffner commented Oct 6, 2020

There are also _FILETYPES and _DOWNLOADS sections in the AWStats data file. For example, the PDF row in _FILETYPES has a value of 12.536.762, while the sum of all PDF files listed in the _DOWNLOADS section is 53.230.834.

The sum of _FILETYPES equals the sum of _DOMAIN and equals the sum of the _TIME column "bandwidth". As we learned from AWStats, we have to add the _TIME column "bandwidth not viewed" to catch traffic from robots, malware and replies with special HTTP return codes.

Another point seems to be that we also need to add the sum of the _DOWNLOADS section to get the WHOLE traffic (because it differs from _FILETYPES, which equals the _DOMAIN/_TIME bandwidth).

Why is there no TOTAL line in AWStats?

@tobyX

tobyX commented Feb 5, 2021

I also stumbled over this because I wondered why my Nextcloud domain shows very little traffic. That is because if you use WebDAV, this traffic won't show up in _DOMAIN, but in _LOGIN.
I was not able to find any documentation for the AWStats data file. How do I read it correctly?
At the moment I'm thinking about reading the Apache log directly to calculate traffic. I found a little Perl snippet which does it quite well and fast:
cat access.log | perl -nE '$sum += $1 if /\[.+\] ".+" \d+ (\d+)/; END {$sum = $sum / 1024 / 1024; printf("%.3f MB\n", $sum)}'
Something like that in PHP for Froxlor should also work, I think.

At the moment Froxlor's traffic calculation is totally unreliable and has many issues (Systemd normally rotates the logs before Froxlor can process them, plus the two problems mentioned here).
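
The idea of counting bytes directly from the access log can also be sketched outside of a shell one-liner. The following is a minimal Python illustration, not a proposed Froxlor implementation. It assumes a common/combined-style log line in which the byte count follows the three-digit status code; which byte count that is (for example %b, %O or %S) depends on the configured LogFormat. The names LINE_RE and sum_logged_bytes are made up for this example.

```python
import re

# ASSUMPTION: lines look like
#   host - user [date] "request" status bytes ...
# where bytes is a number or "-" when nothing was sent.
LINE_RE = re.compile(r'\[.+?\] ".*?" \d{3} (\d+|-)')


def sum_logged_bytes(lines):
    """Sum the byte field of every line that matches the assumed format."""
    total = 0
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # unparsable line: skip instead of guessing
        if match.group(1) != "-":
            total += int(match.group(1))
    return total


def bytes_to_mib(num_bytes):
    """Convert a byte count to MiB (1024 * 1024 bytes)."""
    return num_bytes / 1024 / 1024
```

Lines that do not match the pattern are skipped, so a malformed entry neither aborts the run nor contributes to the sum.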

@tobyX

tobyX commented Feb 11, 2021

I added a crude implementation that counts traffic manually, directly from the log files, and logged both this and what was found in AWStats. The two differ wildly; most of the time AWStats reports only half of what direct counting gives.
But I suspect that the "BEGIN_TIME" part counts all traffic. I will try to confirm this, and if so I will open a pull request to change the counting in Froxlor.

But I think a better idea would be to change the system entirely. I made some major changes to my Apache logs; for example, I rotate every day and the rotated logs are suffixed with the date, which makes it very easy for everybody to find the correct logs.
So my suggestion would be to do something like this in Froxlor as well: let Systemd rotate the logs at midnight, and then we could do the HTTP traffic calculation at a later time (to reduce load) by just looking at yesterday's file (and also assign the traffic to yesterday; currently the traffic from yesterday is written to the DB with today's date, which is confusing).
Is there interest in such a big change?

@d00p
Member

d00p commented Feb 11, 2021

Well, we decided years ago to let projects that are made for it (Webalizer, AWStats) handle the traffic calculation. So the "main" problem here would be a wrong/incorrect transfer of Webalizer/AWStats values into the Froxlor database for display to admins/customers.

@tobyX

tobyX commented Feb 11, 2021

Ok, then I will check whether my assumption about the TIME section is correct, and if so I will send a pull request with a fix.

@d00p
Member

d00p commented Jun 16, 2021

any news on that @tobyX ?

@tobyX

tobyX commented Jun 16, 2021

Sorry, I let my code run for some months, but the numbers never added up at all and sometimes were even negative, and I didn't find where the error is. And then other pressing issues came up... I will try again and find out what went wrong.

@d00p
Member

d00p commented Jul 22, 2021

So, I've just checked on this a little deeper:

As we learned from AWStats, we have to add the _TIME column "bandwidth not viewed" to catch traffic from robots, malware and replies with special HTTP return codes.

Wrong, the _TIME column only shows the viewed traffic; when adding up the values, it's exactly the same as _DOMAIN.

From what I've read, we need to add the viewed traffic and the not viewed traffic. So I checked the data file: the not viewed parts are _ROBOT, _WORMS and _ERRORS, but when adding these up, I get more than AWStats shows for "not viewed traffic". Also, when adding up the _DOMAIN entries and dividing by 1024, I'm still getting more KB than AWStats shows for "viewed traffic"... no idea where AWStats gets these numbers from in its own data file... I might be missing something.

Any ideas?
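
One way to make this kind of cross-check repeatable is to sum the bandwidth column of each section with a small script and compare the results side by side. The sketch below is an illustration in Python only. The column positions in BANDWIDTH_COLUMN are assumptions about the data file layout (they are not confirmed anywhere in this thread) and should be verified against the "#" header comments in your own data file; all names are hypothetical.

```python
def sum_section_column(lines, section, column):
    """Sum an integer column over the rows between BEGIN_<section> and END_<section>."""
    begin, end = "BEGIN_" + section, "END_" + section
    total = 0
    inside = False
    for raw in lines:
        fields = raw.split()
        if not fields or fields[0].startswith("#"):
            continue
        if fields[0] == begin:
            inside = True
        elif fields[0] == end:
            inside = False
        elif inside and len(fields) > column:
            total += int(fields[column])
    return total


# ASSUMED zero-based index of the bandwidth field per section.
BANDWIDTH_COLUMN = {"DOMAIN": 3, "ROBOT": 2, "WORMS": 2, "ERRORS": 2}


def bandwidth_by_section(lines):
    """Return {section name: summed bandwidth in bytes} for the sections above."""
    lines = list(lines)  # allow several passes over a file handle
    return {
        name: sum_section_column(lines, name, col)
        for name, col in BANDWIDTH_COLUMN.items()
    }
```

Printing the resulting dictionary next to the totals shown in the AWStats report makes it easier to see which section, if any, accounts for the difference.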

@d00p d00p added the feedback label Jul 22, 2021
@d00p
Member

d00p commented Nov 5, 2022

We've integrated 'goaccess' into the next major version of froxlor, where it will also be the new default.
