diff --git a/CHANGES.md b/CHANGES.md
new file mode 100644
index 0000000..b07fe4c
--- /dev/null
+++ b/CHANGES.md
@@ -0,0 +1,43 @@
+!NADA Changes
+
+0.9 - Thu Apr 4 14:51:06 BRT 2013
+----------------------------------
+
+* Added support for SQLite database.
+
+0.6 - Tue Jan 29 16:28:14 BRST 2013
+----------------------------------
+* Included a script to clean up unnecessary old data from the
+  database
+
+0.5 - Wed Jan 23 17:54:23 BRST 2013
+----------------------------------
+* Fixed parsing of metrics that don't have minimum and maximum
+  specified in perfdata
+
+0.4 - Thu Jan 3 20:27:33 BRST 2013
+----------------------------------
+* Added some memory allocation protections
+* Removed ugly homebrewed string copy function
+* Migrating from realloc() to calloc()
+* Using strncpy() instead of strcpy()
+* Added a "How to help?" section to the README file
+
+0.3 - Tue Aug 21 18:52:48 BRT 2012
+----------------------------------
+* Fixed float cast in tolerance calculation
+* Added option to avoid boundary limit below zero
+* Added exponential smoothing as a valid algorithm to calculate
+  baselines
+* Now using NAGIOS_HOSTNAME and NAGIOS_SERVICEDESC at database
+  level
+
+0.2 - Mon Jul 9 21:07:55 BRT 2012
+----------------------------------
+* Split history table into two different entities to avoid
+  unneeded info replication
+* Added `sazonality` (seasonality) as a configurable option
+
+0.1-alpha
+---------
+* Initial release
diff --git a/Changes b/Changes
deleted file mode 100644
index d5c06d6..0000000
--- a/Changes
+++ /dev/null
@@ -1,35 +0,0 @@
-This file contains history of changes for !NADA
-
-0.9 - Thu Apr 4 14:51:06 BRT 2013
-  - Added support for SQLite database.
-
-0.6 - Tue Jan 29 16:28:14 BRST 2013
-  - Included script to clean-up database unnecessary old
-    data
-
-0.5 - Wed Jan 23 17:54:23 BRST 2013
-  - Fixed parsing of metrics that doesn't contain minimum
-    and maximum especified on perfdata
-
-0.4 - Thu Jan 3 20:27:33 BRST 2013
-  - Added some memory allocation protections
-  - Removed ugly homebrewed string copy function
-  - Migrating from realloc() to calloc()
-  - Using strncpy() instead of strcpy()
-  - Added section `How to help?' into README file
-
-0.3 - Tue Aug 21 18:52:48 BRT 2012
-  - Fixed float cast at tolerance calc
-  - Added option to avoid boundary limit below zero
-  - Added exponential smoothing as a valid algorithm to calcule
-    baselines
-  - Now using NAGIOS_HOSTNAME and NAGIOS_SERVICEDESC at database
-    level
-
-0.2 - Mon Jul 9 21:07:55 BRT 2012
-  - Splitted history table into two different entities to avoid
-    unneeded info replication
-  - Added sazonality as a variable option
-
-0.1-alpha
-  - Initial release
diff --git a/README b/README
deleted file mode 100644
index 2f041f1..0000000
--- a/README
+++ /dev/null
@@ -1,246 +0,0 @@
-!NADA - !Not ADAptive Thresholds
-
-Version: 0.09
-
-************************ WARNING **************************
-         THIS PROJECT IS COMPLETELY EXPERIMENTAL!!
-                  FEEDBACKS ARE WELCOME.
-***********************************************************
-
-
-0 - What is !NADA?
-    ^^^^^^^^^^^^^^
-
-!NADA is a brand new project which intents to insert baseline adaptive thresholds
-to Nagios(R) or Icinga monitoring frameworks. By "adaptive" I mean a threshold which
-may change through time, accordingly to a given resource behaviour.
-
-This project is intended those ones, who have already did a question like:
-"How can I avoid false positives when monitoring a given server that every
-Monday has a higher load average than during the other days?"
-
-
-1 - How does it work?
-    ^^^^^^^^^^^^^^^^^
-
-!NADA requires a MySQL database running or an SQLite installation together with
-Nagios(R)/Icinga. It encapsulates your check plugin, parses and stores performance
-data into DB, calculates the standard deviation and creates two new metrics, pointing
-to the top and bottom of your baseline. If collected value overflow (up or down) the
-baseline, !NADA change the plugin return code to CRITICAL thus causing Nagios(R)/Icinga
-to alert.
-
-!NADA' standard behaviour assumes that you are using a week sazonality, if it's
-not appropriate, please may the source be with you.
-
-Let's explain how it works by a simple example:
-
-If a given check occurs just now (let's say: Monday at 11:07 PM), !NADA will
-retrieve the last one hundred Monday =~ 11:07 PM check results from DB. It
-will then calculate the stardard deviation using these one hundred check
-results and make a good baseline to current check.
-
-
-2. - OK, how can I configure this "thing" ?
-     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-Let's suppose you have a Nagios(R)/Icinga command configuration like this:
-
-define command{
-    command_name check_disk
-    command_line $USER1$/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$
-}
-
-You just need to change command to the following and you are ready:
-
-define command{
-    command_name check_disk_baseline
-    command_line /path/to/nada $USER1$/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$
-}
-
-
-3 - Configure is easy, how about compile?
-    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-At these early stages I have not prepared any how-to on compilation, however
-it's pretty direct and simple.
-
-You'll need just a C compiler (of course) and mysql-devel package installed
-into your system(if you gonna use MySQL, otherwise you need sqlite-devel).
-I left a simple Makefile together with the project, so ifyou want to adapt
-it to your system, feel free to do it, but don't forget to send me a diff. ;)
-
-3.1 - MySQL compilation:
-
-$ make mysql
-...
-
-Then, as root:
-
-$ make install
-
-3.2 - SQLite compilation:
-
-$ make sqlite
-...
-
-Then, as root:
-
-$ make install
-
-
-I have sucessfully built it with:
-
-- gcc version 4.6.3 20120306 (Red Hat 4.6.3-2) (GCC)
-- Linux 3.4.2-1.fc16.x86_64
-- mysql-devel-5.5.24-1.fc16.x86_64
-- sqlite-devel-3.7.7.1-1.fc16.x86_64
-
-and also
-
-- gcc version 4.1.2 20080704 (Red Hat 4.1.2-52)
-- Linux 2.6.18-308.1.1.el5PAE
-- mysql-devel-5.0.95-1.el5_7.1
-- sqlite-devel-3.3.6-6
-
-
-To uninstall, run as root:
-
-$ make uninstall
-
-
-4 - How about configuring MySQL ?
-    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-You will find a .sql file together with the package. You basically need to run:
-
-$ mysql -u root -p -A < database-creation.sql
-
-
-5 - How can I configure MySQL database user/password?
-    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-You can change the user, password and host to your MySQL server by editing
-baseline.ini at root source directory before installing or, after running
-"make install", edit it under /opt/nada/.
-
-6 - Important - baseline.ini explained
-    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-minentries=X
-    If !NADA couldn't find at least X entries on its database, it will
-    not apply baseline calculation to determine check state. In this
-    specific case, !NADA will return the same code returned to it by
-    plugin execution. As we need a reasonable amount of historic data
-    to be able to calculate the standard deviation, I chose to let it
-    at user choise what's the minimun number of entries before start to
-    actually change plugin's output. X must be an integer.
-
-    Example:
-    If you set this value to 10, using !NADA in a service which checks
-    at every 5 minutes, it will take =~ 10 weeks(considering a sazonality
-    value of 7 - see bellow) to actually start applying baseline
-    calculation.
-
-    Otherwise if you have a service wich checks at every minute, !NADA
-    probably will start to calculate the baseline within =~ 2 weeks(again
-    considering a sazonality value equal to 7). This is an algorithm side
-    effect: internally, on query execution, it applies by default a
-    tolerance of 5 minutes when retrieveng data of last weeks, so, for
-    example, a execution today Fri 14:30:00 will fetch data from all last
-    Fridays between 14:28:30 and 14:32:30.
-
-maxentries=X
-    The opposite of the option above. This determines the maximum
-    number of historic data retrieved to calculate baseline. Pay
-    attention to the performance issues implied with this specific
-    option. X must be an integer.
-
-sazonality=X
-    With this option, you specify in days how long your service sazonality
-    is. So, for example, if your monitored service has a behaviour which
-    repeats itself every day(a server which load has a considerable
-    increase every midday) you may define X to 1. In the other hand, if
-    your service has an increase, let's say, every monday, you may define
-    sazonality to 7(in other words, one week). X must be an integer.
-
-tolerance=X
-    After standard deviation calc, a tolerance index is applied before
-    define resource's top and bottom limits. X must be an integer,
-    and it should represent an acceptable percentage tolerance on
-    monitored resource usage.
-
-allownegatives=[yes|no]
-    This options defines if after the calc for deviation has been made,
-    the value for bottom boundary will be able to remain below 0(zero).
-    If this option is defined to 'no', and the bottom boundary reamains
-    below zero, bottom line will become 0.
-
-baselinealgorithm=[exponential_smoothing|standard_deviation]
-    Define algorithm to baseline calculation. Avaible algorithms by now
-    are "Standard Deviation" or "Simple Exponential Smoothing", try
-    both to see which fits better to your monitored resource.
-
-[DATABASE]
-host=localhost
-    MySQL server's IP address.
-    This parameter is ignored in case of SQLite.
-
-user=root
-    MySQL user with write/read permissions to `nada` database.
-    This parameter is ignored in case of SQLite.
-
-password=mypass
-    MySQL users's password.
-    This parameter is ignored in case of SQLite.
-
-dbname=nada
-    For MySQL, specify database name(default nada).
-    For SQLite, it points to valid path where SQLite gonna be
-    created(i.e. dbname=/tmp/nada)
-
-
-7 - Database Management
-    ^^^^^^^^^^^^^^^^^^^
-
-Included with this package there's an executable that should be scheduled to
-run once a day in your crontab. This executable clean all data that NADA
-doesn't need anymore, avoiding dabase uncontrolled grow.
-
-To correctly schedule database clean up process, just use line below on your
-crontab configuration:
-
-5 0 * * * root /opt/nada/purge-db-data >/dev/null 2>/dev/null
-
-
-8 - Why in the hell C?
-    ^^^^^^^^^^^^^^^^^^
-
-The right answer is: because I like it.
-You may find a lot of resources pointing that is
-far better than C, but really, really, common! I was just trying to experiment
-my C in the real world and try to help community :-D
-
-If you want to implement a brand new shining !NADA version on a 'hype' language,
-please remember to advise me, so i can point a link here to your project.
-
-
-9 - How to help?
-    ^^^^^^^^^^^^
-
-To see what's going on with this small project, please look on Changes file,
-and if you're willing to help in any way, please contact-me and i'll certainly
-have something for you to do.
-
-10 - Caveats
-     ^^^^^^^
-
-Some well known issues.
-
-10.1 - Expression tree is too large (maximum depth 1000)
-
-NADA uses extensively database and, in case of SQLite, there's a limitation
-on query depth. Try to decrease baseline.ini's `maxentries' parameter to a
-value below 500 and see if the problem persists.
-
-For further details on this `issue': http://www.sqlite.org/limits.html
-
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..a37b310
--- /dev/null
+++ b/README.md
@@ -0,0 +1,247 @@
+!NADA
+=====
+
+## !Not ADAptive Thresholds
+
+Version: 0.09
+
+## WARNING!
+### THIS PROJECT IS COMPLETELY EXPERIMENTAL!
+#### Feedback is welcome
+---
+
+What is !NADA?
+--------------
+
+!NADA is a brand new project which intends to bring baseline adaptive thresholds
+to the Nagios(R) and Icinga monitoring frameworks. By "adaptive" I mean a threshold
+which may change over time, according to the behaviour of a given resource.
+
+This project is intended for those who have already asked a question like:
+"How can I avoid false positives when monitoring a given server that every
+Monday has a higher load average than during the other days?"
+
+
+How does it work?
+-----------------
+
+!NADA requires a running MySQL database or an SQLite installation alongside
+Nagios(R)/Icinga. It wraps your check plugin, parses its performance data and
+stores it in the DB, calculates the standard deviation and creates two new metrics
+pointing to the top and bottom of your baseline. If the collected value crosses the
+baseline (above or below), !NADA changes the plugin return code to CRITICAL, thus
+causing Nagios(R)/Icinga to alert.
+
+!NADA's standard behaviour assumes a weekly seasonality (the `sazonality` option);
+if that is not appropriate, may the source be with you.
+
+Let's explain how it works with a simple example:
+
+If a given check runs right now (let's say Monday at 11:07 PM), !NADA will
+retrieve the last one hundred Monday =~ 11:07 PM check results from the DB. It
+will then calculate the standard deviation of these one hundred check
+results and build a baseline for the current check.
+
+
+OK, how can I configure this thing?
+-------------------------------------
+
+Let's suppose you have a Nagios(R)/Icinga command configuration like this:
+
+    define command{
+        command_name check_disk
+        command_line $USER1$/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$
+    }
+
+You just need to change the command to the following and you are ready:
+
+    define command{
+        command_name check_disk_baseline
+        command_line /path/to/nada $USER1$/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$
+    }
+
+
+Configure is easy, how about compile?
+-------------------------------------
+
+At this early stage I have not prepared any how-to on compilation; however,
+it's pretty direct and simple.
+
+You'll need just a C compiler (of course) and the mysql-devel package installed
+on your system (if you're going to use MySQL; otherwise you need sqlite-devel).
+I left a simple `Makefile` with the project, so if you want to adapt
+it to your system, feel free to do so, but don't forget to send me a diff. ;)
+
+MySQL compilation (as root)
+
+    make mysql
+    make install
+
+SQLite compilation (as root)
+
+    make sqlite
+    make install
+
+I have successfully built it with:
+
+* gcc version 4.6.3 20120306 (Red Hat 4.6.3-2) (GCC)
+* Linux 3.4.2-1.fc16.x86_64
+* mysql-devel-5.5.24-1.fc16.x86_64
+* sqlite-devel-3.7.7.1-1.fc16.x86_64
+
+and also
+
+* gcc version 4.1.2 20080704 (Red Hat 4.1.2-52)
+* Linux 2.6.18-308.1.1.el5PAE
+* mysql-devel-5.0.95-1.el5_7.1
+* sqlite-devel-3.3.6-6
+
+Uninstallation (as root)
+
+    make uninstall
+
+
+How about configuring MySQL?
+----------------------------
+
+You will find a .sql file in the package. You basically need to run:
+
+    mysql -u root -p -A < database-creation.sql
+
+
+How can I configure MySQL database user/password?
+-------------------------------------------------
+
+You can change the user, password and host of your MySQL server by editing
+baseline.ini in the root source directory before installing or, after running
+"make install", by editing it under /opt/nada/.
+
+Important - baseline.ini explained
+----------------------------------
+
+* `minentries=X`
+
+    If !NADA can't find at least X entries in its database, it will
+    not apply the baseline calculation to determine the check state. In this
+    specific case, !NADA will return the same code returned to it by the
+    plugin execution. As we need a reasonable amount of historic data
+    to be able to calculate the standard deviation, I chose to leave the
+    minimum number of entries required before !NADA actually starts changing
+    the plugin's output up to the user. X must be an integer.
+
+    Example:
+    If you set this value to 10, using !NADA with a service which is checked
+    every 5 minutes, it will take =~ 10 weeks (considering a sazonality
+    value of 7 - see below) to actually start applying the baseline
+    calculation.
+
+    On the other hand, if you have a service which is checked every minute, !NADA
+    will probably start to calculate the baseline within =~ 2 weeks (again
+    considering a sazonality value equal to 7). This is a side effect of the
+    algorithm: internally, on query execution, it applies by default a
+    tolerance of 5 minutes when retrieving data from previous weeks, so, for
+    example, an execution today, Fri 14:30:00, will fetch data from all previous
+    Fridays between 14:28:30 and 14:32:30.
+
+* `maxentries=X`
+
+    The opposite of the option above. This determines the maximum
+    number of historic entries retrieved to calculate the baseline. Pay
+    attention to the performance implications of this specific
+    option. X must be an integer.
+
+* `sazonality=X`
+
+    With this option, you specify in days how long your service's seasonality
+    is. So, for example, if your monitored service has a behaviour which
+    repeats itself every day (a server whose load has a considerable
+    increase every midday) you may set X to 1. On the other hand, if
+    your service has an increase, let's say, every Monday, you may set
+    sazonality to 7 (in other words, one week). X must be an integer.
+
+* `tolerance=X`
+
+    After the standard deviation is calculated, a tolerance index is applied before
+    defining the resource's top and bottom limits. X must be an integer,
+    and it should represent an acceptable percentage tolerance on
+    the monitored resource's usage.
+
+* `allownegatives=[yes|no]`
+
+    This option defines whether, after the deviation has been calculated,
+    the bottom boundary is allowed to remain below 0 (zero).
+    If this option is set to 'no' and the bottom boundary ends up
+    below zero, the bottom line will become 0.
+
+* `baselinealgorithm=[exponential_smoothing|standard_deviation]`
+
+    Defines the algorithm used for baseline calculation. The algorithms available for now
+    are "Standard Deviation" and "Simple Exponential Smoothing"; try
+    both to see which better fits your monitored resource.
+
+`[DATABASE]`
+
+* `host=localhost`
+  MySQL server's IP address.
+  This parameter is ignored in case of SQLite.
+
+* `user=root`
+  MySQL user with write/read permissions on the `nada` database.
+  This parameter is ignored in case of SQLite.
+
+* `password=mypass`
+  MySQL user's password.
+  This parameter is ignored in case of SQLite.
+
+* `dbname=nada`
+  For MySQL, specifies the database name (default nada).
+  For SQLite, it points to a valid path where the SQLite database will be
+  created (e.g. dbname=/tmp/nada)
+
+
+Database Management
+-------------------
+
+Included with this package there's an executable that should be scheduled to
+run once a day in your crontab. This executable cleans out all data that !NADA
+doesn't need anymore, preventing uncontrolled database growth.
+
+To schedule the database clean-up process correctly, just use the line below in your
+crontab configuration:
+
+    5 0 * * * root /opt/nada/purge-db-data >/dev/null 2>/dev/null
+
+
+Why in the hell C?
+------------------
+
+The right answer is: ***because I like it***.
+You may find a lot of resources claiming that some other language is
+far better than C, but really, really, come on! I was just trying to exercise
+my C in the real world and to help the community :-D
+
+If you want to implement a brand new shiny !NADA version in a 'hype' language,
+please remember to let me know, so I can link to your project here.
+
+
+How to help?
+------------
+
+To see what's going on with this small project, please look at the
+[CHANGES](CHANGES.md) and [TODO](TODO.md) files, and if you're willing to help in any way,
+please contact me and I'll certainly have something for you to do.
+
+If you help out, you'll end up on the [THANKS](THANKS.md) page!
+
+Caveats
+-------
+
+Some well-known issues.
+
+* Expression tree is too large (maximum depth 1000)
+
+    !NADA uses the database extensively and, in the case of SQLite, there's a limit
+    on query depth. Try decreasing the `maxentries` parameter in baseline.ini to a
+    value below 500 and see if the problem persists.
+
+    For further details on this issue: http://www.sqlite.org/limits.html
diff --git a/THANKS b/THANKS
deleted file mode 100644
index 4a8ebd0..0000000
--- a/THANKS
+++ /dev/null
@@ -1,4 +0,0 @@
-My special thanks to:
-
-- Leonardo Vaz
-- Josu Gil Arriortua
diff --git a/THANKS.md b/THANKS.md
new file mode 100644
index 0000000..9716a40
--- /dev/null
+++ b/THANKS.md
@@ -0,0 +1,4 @@
+## My special thanks to:
+
+* Leonardo Vaz
+* Josu Gil Arriortua
diff --git a/TODO b/TODO
deleted file mode 100644
index 25b4da1..0000000
--- a/TODO
+++ /dev/null
@@ -1,2 +0,0 @@
-- Add support for multiple databases and file based storage
-- Get rid of `command_line` references
diff --git a/TODO.md b/TODO.md
new file mode 100644
index 0000000..a039acd
--- /dev/null
+++ b/TODO.md
@@ -0,0 +1,5 @@
+!NADA Todo List
+===============
+
+- [ ] Add support for multiple databases and file based storage
+- [ ] Get rid of `command_line` references