Configuring the MaDDash Server

Note

The primary way most users configure MaDDash is with the :doc:`pSConfig MaDDash Agent <psconfig_maddash_agent>`. The agent automatically fills in the values detailed below, so most users never need to touch any of the options listed. If you find yourself in a situation where you do need to make manual configuration changes, this document provides guidance.

The main configuration file for the maddash-server component is located at /etc/maddash/maddash-server/maddash.yaml.

The file is in YAML format. In this file you define the checks you want to run and how they are organized. You can also tweak settings for the embedded web server and various other aspects of the software. The configuration file is broken into the following primary sections:

  • groups - In general, this is the section where you define lists of hosts that you want to check. You can define any number of custom groups in this section. When you define grids, these groups determine what constitutes the rows and columns. Group members do not have to be hosts, but for perfSONAR checks they usually are, because the dimensions of a grid are the endpoints of a test.
  • checks - This is where you define how to run each check. For Nagios checks, this means defining the command-line script to run and the arguments to pass to it.
  • grids - This is where you define how the groups will be applied to the checks. You define the groups that will compose the rows and columns, then define the type of check that will be run.
  • dashboards - This is where you group grids together.

In addition to the sections above, there are also the following optional sections:

  • reports - A set of patterns that can match across one or more cells in individual grids and generate notifications.
  • notifications - Here you can define the types of notifications sent (e.g. email) when a pattern defined in the reports is matched.

There are also a number of miscellaneous parameters related to the general operation of the service. Each of these property categories is described in the remainder of this section.
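To orient yourself before the details, the sketch below shows how the sections listed above might sit side by side in maddash.yaml. It is a skeleton only: the section names come from this document, every value is an empty placeholder, and placing groupMembers and the miscellaneous properties at the top level is an assumption based on the formats shown later.

```yaml
# Skeleton of maddash.yaml; every value is an empty placeholder
groups: {}          # map: group name -> list of members
groupMembers: []    # list describing individual group members
checks: {}          # map: check name -> check definition
grids: []           # list of grid objects
dashboards: []      # list of dashboard objects
reports: []         # optional list of report patterns
notifications: []   # optional list of notification definitions
# General and web server properties (database, serverHost, http, https, ...)
# are assumed to sit at this top level as well.
```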

groups

groups are used to define the resources that will compose the rows and columns of a grid. For the perfSONAR checks, this is often used to group the endpoints of performance tests together (e.g. create a group for all the hosts running OWAMP tests and another for all the hosts running BWCTL tests). The names, members and number of groups are all customizable. groups take the following format:

groups:
    groupName:
        - "member"
        - "member"

All groups go under the groups block. In the example above, groupName can be replaced with any alphanumeric string, for example "myOwampHosts", "campusHosts", or any other string that meets this requirement. This name will be used later in the configuration, so it's important to make note of it. Likewise, change each member to the name of a host or other resource you want in the list.
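As an illustration, a groups block holding one group of OWAMP hosts and one group of BWCTL hosts might look like the following. The group names reuse the examples above; the hostnames are hypothetical placeholders.

```yaml
groups:
    # Hypothetical group of hosts running OWAMP tests
    myOwampHosts:
        - "owamp1.example.org"
        - "owamp2.example.org"
        - "owamp3.example.org"
    # Hypothetical group of hosts running BWCTL tests
    campusHosts:
        - "bwctl1.example.org"
        - "bwctl2.example.org"
```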

groupMembers

groupMembers are used to further describe members of a group. Any custom property may be added, but only the properties listed below are understood by the default MaDDash interface. There is also a special map property that can be used to dynamically set values based on the value of the opposing row/column. groupMembers take the following format, where "member" is the name of the member referenced:

groupMembers:
    - id: "member"
      ...member properties...

Below is the list of well-known properties:

Name Type Required Description
id String Yes The group member this is referencing. For example, the hostname or address of a single item in a group list
label String No The value to show when displaying information about the referenced group member.
pstoolkiturl String No The URL of the perfSONAR Toolkit web page.
map YAML Object No Defines properties that change depending on the opposing row/column. The keys of this object are the names of other group members, and under each key is a custom set of parameters that can be accessed in template strings using %row.map.<property> and %col.map.<property> respectively. The reserved key default can be used to define a set of properties that apply to members not explicitly listed.
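A sketch of a groupMembers entry using each of the well-known properties is shown below. The hostnames, label, toolkit URL and the custom property testPort inside map are all hypothetical, chosen only to show the structure.

```yaml
groupMembers:
    - id: "owamp1.example.org"
      label: "Campus Core"
      pstoolkiturl: "https://owamp1.example.org/toolkit/"
      map:
          # Used when the opposing row/column is owamp2.example.org
          "owamp2.example.org":
              testPort: "8765"
          # Used for every other opposing member
          default:
              testPort: "861"
```

With this sketch, a template string containing %row.map.testPort would presumably resolve to 8765 in the cell whose row is owamp1.example.org and whose column is owamp2.example.org, and to 861 for any other column in that row, since default covers members that are not listed explicitly.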

checks

General Parameters

checks are where you provide instructions as to how results should be obtained. Checks take the following format in the configuration file:

checks:
    checkName:
       ...check-parameters...

The checkName can be any alphanumeric string with no whitespace. It is used to identify the check later in the file, so you should note it. For example you could name it something like "owampLossCheck" or "bwctl100Gbps". The following parameters are available for the check:

Name Type Required Description
name String Yes A human-readable name used for display purposes when describing this check
description String Yes Human-readable text describing the purpose of this check. The description field accepts the template variables %row and %col, which are populated with the current row and column values respectively when the check is applied to a grid.
type String Yes The type of check. Currently the software supports net.es.maddash.checks.NagiosCheck, net.es.maddash.checks.PSNagiosCheck, and net.es.maddash.checks.RandomCheck
params YAML Object Type dependent A YAML object containing parameters specific to the check. See the :ref:`type-specific-params` section.
checkInterval int Yes How frequently to run the check in seconds
retryInterval int Yes How frequently to run the check in seconds if it encounters a state different from the current state
retryAttempts int Yes The number of consecutive times a new state must be seen before it changes the state of the check. For example, if a check has been OK for many days, but suddenly a critical is seen, then a critical state must be seen 2 more times before the status will change
timeout int Yes The number of seconds to wait for the check to return. If it does not return in this timeframe, the check is set to the UNKNOWN status.
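Putting the general parameters together, a single check definition could be sketched as follows. The check name reuses an example name from above, while the plugin path and its flags are hypothetical placeholders rather than a real Nagios plugin, and the interval values are arbitrary examples.

```yaml
checks:
    owampLossCheck:
        name: "Loss"
        description: "Packet loss from %row to %col"
        type: "net.es.maddash.checks.NagiosCheck"
        params:
            # Hypothetical plugin path and flags; substitute a real Nagios command
            command: "/path/to/check_loss_plugin --source %row --dest %col"
        checkInterval: 1800   # run every 30 minutes
        retryInterval: 300    # re-run every 5 minutes while a different state is being seen
        retryAttempts: 3      # consecutive sightings of a new state before the status changes
        timeout: 60           # set to UNKNOWN if no result within 60 seconds
```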

Type-specific Parameters

Currently the software supports the following types of checks:

  • net.es.maddash.checks.NagiosCheck - This check is performed using a Nagios command. The parameters provided describe how to run that command.
  • net.es.maddash.checks.PSNagiosCheck - This check is a perfSONAR Nagios command. It is an extension of net.es.maddash.checks.NagiosCheck, but includes additional fields to collect information necessary to display graphs from the perfSONAR toolkit.
  • net.es.maddash.checks.RandomCheck - This should only be used for testing. This check returns a random result every time it runs.

NagiosCheck

Name Type Required Description
command String or YAML Object Yes The full Nagios command to run on the local system. It accepts the template variables and can also be a map where the outer key is the row and the inner key is the column (both can take the special value "default"), allowing the command to be set deterministically for each cell.
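When command is given as a map instead of a string, its structure can be sketched as below. The hostnames, the plugin path and the --slow-link flag are hypothetical, and the sketch assumes that, as with maUrl, a specific row or column key is preferred over default.

```yaml
params:
    command:
        # Row owamp1.example.org: special-case one column, fall back for the rest
        "owamp1.example.org":
            "owamp2.example.org": "/path/to/check_plugin --slow-link %row %col"
            default: "/path/to/check_plugin %row %col"
        # Every other row uses the same command for every column
        default:
            default: "/path/to/check_plugin %row %col"
```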

PSNagiosCheck

Name Type Required Description
command String or YAML Object Yes See :ref:`nagios-check`
maUrl YAML Object Yes The URL of the measurement archive where performance data related to this check may be retrieved. This accepts the template variables listed in the :ref:`psnagios-check-template-vars` section. The object has one key that is called default which will be the default URL used for any cell in a grid. The remaining keys are members of groups assigned to the row. If default and a row key are specified, the row key is preferred for that row. The value of each key is a map where the key is a member of a group assigned to the column or you can use the default key to apply the URL to every column in the row. If default is specified and a specific value for a column, the specific value for the column is preferred. See the default configuration file for a full example.
graphUrl String Yes A URL where a graph of data related to the check can be retrieved. This accepts the template variables listed in the :ref:`psnagios-check-template-vars` section.
metaDataKeyLookup String Yes DEPRECATED A URL where metaDataKeys can be looked up for the data. These are often needed to generate the graph URL. This accepts some of the template variables listed in the :ref:`psnagios-check-template-vars` section. Note: Some variables it cannot accept because it is responsible for generating them.
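A sketch of a PSNagiosCheck definition follows, showing the nested default keys of maUrl and a graphUrl built from template variables. The archive and graph URLs, hostnames and plugin command are hypothetical placeholders. The deprecated metaDataKeyLookup parameter is left out for brevity; add it if your version still requires it.

```yaml
checks:
    owampLossCheck:
        name: "Loss"
        description: "Packet loss from %row to %col"
        type: "net.es.maddash.checks.PSNagiosCheck"
        params:
            # Hypothetical command; %maUrl is filled in from maUrl below
            command: "/path/to/check_loss_plugin -u %maUrl -s %row -d %col"
            maUrl:
                # Default for every row and column: the archive on the row host
                default:
                    default: "http://%row/esmond/perfsonar/archive"
                # Row-specific override: every cell in this row uses a central archive
                "owamp1.example.org":
                    default: "http://archive.example.org/esmond/perfsonar/archive"
            # Hypothetical graph URL built from template variables
            graphUrl: "http://graphs.example.org/graph?url=%maUrl&src=%row&dst=%col"
        checkInterval: 1800
        retryInterval: 300
        retryAttempts: 3
        timeout: 60
```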

Nagios Check Template Variables

Name Description
%row The row in the grid associated with this check at the time it is run
%col The column in the grid associated with this check at the time it is run
%row.<prop> Custom properties defined in the groupMembers section.
%col.<prop> Custom properties defined in the groupMembers section.
%row.map.<prop> Custom properties defined in the groupMembers section that change depending on opposing row or column.
%col.map.<prop> Custom properties defined in the groupMembers section that change depending on opposing row or column.
%maUrl The URL of the measurement archive. You can't use this in the maUrl parameter, as it is generated from that template.
%maKeyF DEPRECATED A comma-separated list of the metaDataKeys for the forward direction of a test. This cannot be used in metaDataKeyLookup as it is generated after the URL that is called.
%maKeyR DEPRECATED A comma-separated list of the metaDataKeys for the reverse direction of a test. This cannot be used in metaDataKeyLookup as it is generated after the URL that is called.
%srcName DEPRECATED The hostname of the source endpoint of a point-to-point test. This cannot be used in metaDataKeyLookup as it is generated after the URL is called.
%srcIP DEPRECATED The IP address of the source endpoint of a point-to-point test. This cannot be used in metaDataKeyLookup as it is generated after the URL is called.
%dstName DEPRECATED The hostname of the destination endpoint of a point-to-point test. This cannot be used in metaDataKeyLookup as it is generated after the URL is called.
%dstIP DEPRECATED The IP of the destination endpoint of a point-to-point test. This cannot be used in metaDataKeyLookup as it is generated after the URL is called.
%eventType DEPRECATED The eventType returned by metaDataKeyLookup of the destination endpoint of a point-to-point test. This cannot be used in metaDataKeyLookup as it is generated after the URL is called.
%event.delayBuckets The string http://ggf.org/ns/nmwg/characteristic/delay/summary/20110317
%event.delay The string http://ggf.org/ns/nmwg/characteristic/delay/summary/20070921
%event.bandwidth The string http://ggf.org/ns/nmwg/characteristics/bandwidth/achievable/2.0
%event.iperf The string http://ggf.org/ns/nmwg/tools/iperf/2.0
%event.utilization The string http://ggf.org/ns/nmwg/characteristic/utilization/2.0

grids

grids associate groups with checks and arrange them in a two-dimensional structure. Grids are arranged as a list of objects with the following parameters:

Name Type Required Description
name String Yes A human-readable name of the grid
rows String Yes The name of the group that will compose the rows of the grid. This must match a group name defined in the groups section of the configuration file or an error will be returned.
columns String Yes The name of the group that will compose the columns of the grid. This must match a group name defined in the groups section of the configuration file or an error will be returned.
checks List of Strings Yes The name of the check elements that need to be run for each row and column. Each element must match a check name defined under the checks section of the configuration or an error will be returned.
rowOrder String Yes Specifies how the rows should be ordered. Valid values are alphabetical, which will sort them alphabetically, or group which will present them exactly in the order they are defined in the group section.
colOrder String Yes Specifies how the columns should be ordered. Valid values are alphabetical, which will sort them alphabetically, or group which will present them exactly in the order they are defined in the group section.
excludeSelf boolean Yes If set to 1, then a check will not be run where the value of the current row is equal to the value of the current column. If set to 0, then a check will be run in this case.
excludeChecks YAML Object No This excludes individual checks based on the row and column. The structure is a map where the key is the name of the row where you want to exclude a check. It should match a member of the group assigned to the "rows" property of this grid or it can be the special key 'default' that matches every row. The value is a list of columns that should not appear in the grid. An item in the list must be a member of the group assigned to the "columns" property of this grid or the special value "all" which removes all columns for a row. A full example is provided in the default configuration file.
columnAlgorithm String Yes Determines which checks will be run. Valid values are as follows: all - Run a check between every row and column; afterSelf - Run a check to every host that's defined after the current row in the 'rows' group; beforeSelf - Run a check to every host that's defined before the current row in the 'rows' group
reports String No References the id field of a report in the reports section to match against this grid.
statusLabels YAML object Yes Describes what each status means. It is structured as a set of key/value pairs where the key is the status and the value is the description of the status. Valid status keys are ok, warning, critical, unknown, notrun, and extra. You do not need to define every status if not all are applicable to your check.
statusLabels.extra YAML object Yes Object where you can define custom status labels. Valid keys are value, an integer identifying the custom state; shortName, a name to label the state; and description, text that will appear in the GUI legend.
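The grid parameters can be sketched with a single grid that runs the owampLossCheck check between the hosts of the myOwampHosts group. The grid name, hostnames, report id and status label texts are hypothetical, and statusLabels.extra is omitted because its exact nesting is not shown in this document.

```yaml
grids:
    - name: "Campus Loss"
      rows: "myOwampHosts"
      columns: "myOwampHosts"
      checks:
          - "owampLossCheck"
      rowOrder: "alphabetical"   # sort rows alphabetically
      colOrder: "group"          # keep columns in group order
      excludeSelf: 1             # skip cells where the row equals the column
      columnAlgorithm: "all"     # check every row against every column
      excludeChecks:
          # Hypothetical: no row is checked against owamp3.example.org
          default:
              - "owamp3.example.org"
          # Hypothetical: remove every column for this row
          "owamp2.example.org":
              - "all"
      reports: "campusLossReport"   # hypothetical report id
      statusLabels:
          ok: "Loss is 0%"
          warning: "Loss is above 0% but below 1%"
          critical: "Loss is 1% or higher"
          unknown: "Unable to retrieve data"
          notrun: "Check has not yet run"
```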

dashboards

dashboards group grids together. You define them as a list of YAML objects with the following properties:

Name Type Required Description
name String Yes The name you want displayed as the title of the dashboard
grids List of YAML objects Yes The list of grids you want included in this dashboard. Each item in the list has one property, name, where you specify the name of the grid. This must match the name property of one of the grids defined in the configuration file.
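A dashboard that shows a single grid could be written as in the sketch below. Both names are hypothetical; the name under grids has to match the name of a grid defined in the grids section.

```yaml
dashboards:
    - name: "Campus Dashboard"
      grids:
          # Must match the name property of a grid in the grids section
          - name: "Campus Loss"
```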

reports

reports define patterns that match across one or more cells in a single grid. You define them as a list of YAML objects with the following properties:

Name Type Required Description
id String Yes The identifier used in grids to reference this rule
rule YAML Object Yes The parent rule object that defines the patterns to match and the "problem" it identifies
rule.type String Yes The type of rule. See :ref:`maddash_config_server-reports-rule_types`.
rule.selector YAML object Type dependent An object describing what cells to look at when evaluating the rule. Valid when rule.type is rule or siteRule.
rule.match YAML object Type dependent An object describing how to determine if this pattern should generate a notification. Valid when rule.type is rule or siteRule.
rule.problem YAML object Type dependent An object describing what to do if the rule matches. Valid when rule.type is rule or siteRule.
rule.rules List of YAML objects Type dependent List of rule objects to evaluate. Valid when rule.type is forEachSite, matchAll or matchFirst.
rule.site String Type dependent Only valid when rule.type is siteRule. This is the name of the row or column to evaluate.
rule.selector.type String Yes The type of selector. See :ref:`maddash_config_server-reports-selector_types`.
rule.selector.rowSite String Yes The name of the row to select when using a selector of type cell.
rule.selector.colSite String Yes The name of the column to select when using a selector of type cell.
rule.selector.rowIndex String Yes The index of the check to select when selecting a row in a selector of type cell or check.
rule.selector.colIndex String Yes The index of the check to select when selecting a column in a selector of type cell or check.
rule.match.type String Yes The type of match. See :ref:`maddash_config_server-reports-match_types`.
rule.match.status Integer Type dependent For match types status and statusThreshold, the integer value of the status to match.
rule.match.statuses List of Numbers Type dependent For match type statusWeightedThreshold, the list of statuses where the index in the list corresponds to the integer value of the status (starting at 0)
rule.match.threshold Number Type dependent For match type statusThreshold the percentage of checks that must have status and for statusWeightedThreshold the average weight of each selected check
rule.problem.severity Integer Yes The severity of the problem. 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN
rule.problem.category String Yes A string to define the category of the problem, for example CONFIGURATION or PERFORMANCE.
rule.problem.message String Yes A message describing the problem (i.e. what it means when the rules match)
rule.problem.solutions List of Strings No A list of potential solutions to the problem
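As a minimal sketch, the report below uses a single rule of type rule that flags a grid in which every check is UNKNOWN. The report id, message and solutions are hypothetical. Two assumptions are labeled in the comments: that check statuses use the same integer codes as rule.problem.severity (3 for UNKNOWN), and that threshold is expressed as a fraction between 0 and 1.

```yaml
reports:
    - id: "campusLossReport"          # referenced from a grid's reports property
      rule:
          type: "rule"
          selector:
              type: "grid"            # look at every cell in the grid
          match:
              type: "statusThreshold"
              status: 3               # assumed: 3 = UNKNOWN, as for severity
              threshold: 1            # assumed: fraction of selected checks (1 = all)
          problem:
              severity: 3             # UNKNOWN
              category: "CONFIGURATION"
              message: "No checks in this grid are returning data"
              solutions:
                  - "Verify that the measurement archives are reachable"
                  - "Verify that tests are configured between these hosts"
```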

Rule Types

Name Description
forEachSite A site in this context is a group member that is in a row and/or column. This type of rule loops through these unique group members and applies the rule objects in the rules property.
matchAll All the sub-rules defined in rules must match for the pattern to match
matchFirst The first sub-rule defined in rules that matches causes this rule to match
siteRule A rule that only looks at the site specified by a site property
rule The fundamental building block of the other rule types: it selects a set of cells, matches them against criteria, and defines the problem a match identifies.

Selector Types

Name Description
cell Selects a specific cell for a site given by rowSite and/or colSite
check Selects an individual check specified by rowIndex and colIndex
column Selects a column
grid Selects the entire grid
row Selects an entire row
site Selects both the row and the column belonging to a site

Match Types

Name Description
status Match if all the checks selected have the given status
statusThreshold Match if the percentage of cells specified by threshold have the given status
statusWeightedThreshold Assign a weight to each status using the statuses array and if the average weight of the selected cells is above threshold generate an alarm
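The rule, selector and match types can be combined into nested rules. The sketch below, with hypothetical ids, messages and weights, uses matchFirst so that a whole-grid UNKNOWN problem is reported on its own, and otherwise falls through to a forEachSite rule that scores each site with statusWeightedThreshold. It assumes that a selector of type site inside forEachSite selects the site currently being visited, that status indexes follow the 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN order used for severity, and that threshold values are fractions.

```yaml
reports:
    - id: "campusSiteReport"
      rule:
          type: "matchFirst"                  # the first sub-rule that matches wins
          rules:
              # 1. Whole-grid problem: every selected check is UNKNOWN
              - type: "rule"
                selector:
                    type: "grid"
                match:
                    type: "statusThreshold"
                    status: 3                 # assumed: 3 = UNKNOWN
                    threshold: 1              # assumed: fraction, 1 = all checks
                problem:
                    severity: 3
                    category: "CONFIGURATION"
                    message: "Entire grid is reporting UNKNOWN"
              # 2. Otherwise, score each site (row and/or column member) in turn
              - type: "forEachSite"
                rules:
                    - type: "rule"
                      selector:
                          type: "site"        # assumed: the site being visited
                      match:
                          type: "statusWeightedThreshold"
                          # list index = status (assumed 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN)
                          # list value = weight given to that status
                          statuses: [0, 0.5, 1, 0.25]
                          threshold: 0.5      # match if the average weight is above 0.5
                      problem:
                          severity: 2
                          category: "PERFORMANCE"
                          message: "Many checks involving this site are WARNING or CRITICAL"
```

To see how the weights work: if the checks selected for a site are two CRITICAL (weight 1) and two OK (weight 0), the average weight is (1 + 1 + 0 + 0) / 4 = 0.5, which is not above the threshold of 0.5, so the rule does not match. With three CRITICAL and one OK the average becomes 0.75 and the rule matches.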

notifications

notifications define when and where to send notifications of problems found in reports. You define them as a list of YAML objects with the following properties:

Name Type Required Description
name String Yes An identifier that can be any string. Mainly used as a description and may be used by the notification plug-in.
type String Yes The type of notification. Currently valid values are email or servicenow.
schedule String Yes A cron schedule in MIN HOUR DAY-OF-MONTH MONTH DAY-OF-WEEK format.
problemReportFrequency Integer Yes Frequency in seconds with which to report the same problem.
problemResolveAfter Integer Yes If supported, the length of time to wait after a problem has gone away to mark it as resolved.
minimumSeverity Integer Yes The minimum severity a problem must be to generate a notification where 0=OK, 1=Warning, 2=Critical
filters List of YAML Object Yes A list of filter objects that define which types of problems generate notifications.
filters.type String Yes The type of filter. Valid values are dashboard, grid, site, and category.
filters.value String Yes The value to match. For example, if this is of type dashboard then this is the name of the dashboard for which notifications should be generated.
parameters YAML Object Yes Type specific parameters. See :ref:`maddash_config_server-notifications-email` and :ref:`maddash_config_server-notifications-servicenow`.

Email Parameters

Name Type Required Description
mailServer YAML Object No An object describing how to contact your mail server. If not specified, defaults are used.
mailServer.address String No The IP or hostname of your mail server. Default is 127.0.0.1.
mailServer.port String No The port of your mail server. Default is port 25.
mailServer.username String No The username for authenticating to your mail server. Default is none.
mailServer.password String No The password for authenticating to your mail server. Default is none.
mailServer.useSSL boolean No Indicates whether or not to use SSL when communicating with mail server. Default is false.
from String No The from address of emails sent.
replyTo List of Strings No A list of replyTo addresses for emails sent.
to List of Strings No A list of addresses where the emails should be sent.
cc List of Strings No A list of CC addresses to send copies of the email notifications.
bcc List of Strings No A list of BCC addresses to send blind copies of the email notifications.
subjectPrefix String No A string to prepend to the email subject. Note that the subject is the name property of the notification definition.
format String No The format of the email sent. Valid values are html or text. Default is html.
dashboardUrl String No The URL of your dashboard used to add links in emails. If not specified then links will not be included.
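A sketch of an email notification is shown below. It sends notifications about WARNING and CRITICAL problems found on one dashboard on a daily schedule at 08:00. The addresses, mail server hostname, dashboard name and dashboard URL are hypothetical placeholders, and the mail server port is left at its default.

```yaml
notifications:
    - name: "Campus Problems"
      type: "email"
      schedule: "0 8 * * *"             # minute 0, hour 8, every day
      problemReportFrequency: 86400     # report the same problem again after one day
      problemResolveAfter: 3600
      minimumSeverity: 1                # WARNING and above
      filters:
          - type: "dashboard"
            value: "Campus Dashboard"
      parameters:
          mailServer:
              address: "smtp.example.org"
          from: "maddash@example.org"
          to:
              - "noc@example.org"
          subjectPrefix: "[MaDDash] "
          format: "text"
          dashboardUrl: "https://maddash.example.org/maddash-webui/"
```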

ServiceNow Parameters

Name Type Required Description
instance String Yes Name of the ServiceNow instance.
oauthFile String No A YAML file containing a combination of the username, password, clientID, and clientSecret.
clientID String No The client ID to authenticate to ServiceNow
clientSecret String No The client secret to authenticate to ServiceNow.
username String No The username to authenticate to ServiceNow.
password String No The password to authenticate to ServiceNow.
recordTable String No The name of the table to create the record.
recordFields YAML Object Yes A YAML object with the fields to set on creation. This is highly dependent on your ServiceNow setup. The keys are the names of the fields. The values support the template variables listed in :ref:`maddash_config_server-notifications-servicenow-template`.
resolveFields YAML Object No A YAML object with the fields to set on resolve. This is highly dependent on your ServiceNow setup. The keys are the names of the fields. The values support the template variables listed in :ref:`maddash_config_server-notifications-servicenow-template`.
dashboardUrl String No The URL of the dashboard. This is used to generate links to the dashboard in created records.
duplicateRules YAML Object No Rules used to determine if a record that already exists in ServiceNow matches a problem and what action to take if it does.
duplicateRules.identityFields List of Strings No A list of fields that MUST match in both records for them to be considered equal.
duplicateRules.rules List of YAML Object No A list of rules evaluated in order until one matches that determine the action to take when a record is found that matches the given identity fields.
duplicateRules.rules[N].equalsFields YAML Object No An object where the key is the field to match and the value is what that field must equal for the given updateFields to be applied to the record.
duplicateRules.rules[N].gtFields YAML Object No An object where the key is the field to match and the value is what that field must be greater than for the given updateFields to be applied to the record.
duplicateRules.rules[N].ltFields YAML Object No An object where the key is the field to match and the value is what that field must be less than for the given updateFields to be applied to the record.
duplicateRules.rules[N].updateFields YAML Object No The fields to update in the record and the values they should be given if this rule matches. The values support the template variables listed in :ref:`maddash_config_server-notifications-servicenow-template`.

ServiceNow Template Variables

Name Description
%br A line break
%problemEntity The source of the problem - either a grid or a site (i.e. a row and/or column)
%gridUrl The URL of the grid
%gridLink An HTML link to the grid
%siteName The name of the site (i.e. row and/or column) that is the source of the alarm
%gridName Name of the grid that is source of the alarm
%isGlobal Boolean indicating whether this affects the entire grid.
%severity The severity of the problem.
%category The problem category.
%solutions A bulleted list of potential solutions.
%name The name of the problem.
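A ServiceNow notification can be sketched in the same way. Everything specific to ServiceNow here is hypothetical and depends on your instance: the instance name, the oauthFile path, the incident table, field names such as short_description, close_notes and work_notes, and the assumption that state "1" marks a record that is still open. The values show how the template variables above can be combined.

```yaml
notifications:
    - name: "Campus ServiceNow Tickets"
      type: "servicenow"
      schedule: "0 * * * *"             # minute 0 of every hour
      problemReportFrequency: 86400
      problemResolveAfter: 3600         # wait one hour after the problem clears before resolving
      minimumSeverity: 2                # CRITICAL only
      filters:
          - type: "grid"
            value: "Campus Loss"
      parameters:
          instance: "example"                                       # hypothetical instance name
          oauthFile: "/etc/maddash/maddash-server/servicenow.yaml"  # hypothetical path
          recordTable: "incident"
          dashboardUrl: "https://maddash.example.org/maddash-webui/"
          # Field names below are illustrative; use the ones your instance defines
          recordFields:
              short_description: "MaDDash: %name (%gridName)"
              description: "%problemEntity: %category problem, severity %severity%br%solutions%br%gridUrl"
          resolveFields:
              close_notes: "MaDDash no longer detects this problem"
          duplicateRules:
              identityFields:
                  - "short_description"
              rules:
                  # If the existing record has state "1" (assumed to mean still open),
                  # update it with a note
                  - equalsFields:
                        state: "1"
                    updateFields:
                        work_notes: "Problem still present: %gridLink"
```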

General Properties

Name Type Required Description
database String Yes The path to the directory where the database is stored
jobThreadPoolSize Integer No The maximum number of checks that can run in parallel. Defaults to 20
jobBatchSize Integer No The maximum number of checks that can be running or waiting to run in memory. Defaults to 250.
disableScheduler Boolean No If set to 1 then the server will only run as a REST server and not execute any new checks. Default is 0.
skipTableBuild Boolean No If set to 1, the database tables will not be built and indexes will not be built or rebuilt. The first time you run the server it must be set to 0. After that, you may find that setting it to 1 significantly speeds up boot time. Leaving it at 0, though, has the advantage of rebuilding indexes on startup, which can improve query performance.
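The general properties can be sketched as follows, using the documented defaults where the table gives them. The database path is a hypothetical example, and the placement of these keys at the top level of maddash.yaml is an assumption.

```yaml
database: "/var/lib/maddash/"   # hypothetical database directory
jobThreadPoolSize: 20           # at most 20 checks run in parallel (the default)
jobBatchSize: 250               # at most 250 checks running or waiting in memory (the default)
disableScheduler: 0             # 0 = run checks; 1 = act only as a REST server
skipTableBuild: 0               # must be 0 on first start; also rebuilds indexes on startup
```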

Web Server Properties

Name Type Required Description
serverHost String No The hostname of the interface where the web server should listen. Defaults to localhost.
http YAML Object Yes (unless https specified) Parameters related to http. See :ref:`http-props` section.
https YAML Object Yes (unless http specified) Parameters related to https. See :ref:`https-props` section.

http properties

Name Type Required Description
port Integer Yes The port on which the web server should listen for HTTP connections
proxyMode String Yes Reserved for future use. Currently lets the server know whether it is behind a proxy. This may be used in a later implementation to extract headers that forward information related to authentication.

https properties

Name Type Required Description
port Integer Yes The port on which the web server should listen for HTTPS connections
keystore String Yes The keystore containing the key 'mykey' to use as the SSL server certificate. It should also contain any trusted certificates if doing client authentication.
keystorePassword String Yes The password to access the keystore.
clientAuth String Yes Indicates whether a client to the rest server must have a trusted SSL certificate. Valid values are require, want and off. require means the user MUST have a trusted certificate or the request will be rejected. want means the server will check the certificate if one is presented, but will not reject requests that do not provide one. off means no certificate is required.
proxyMode String Yes Reserved for future use. Currently lets the server know whether it is behind a proxy. This may be used in a later implementation to extract headers that forward information related to authentication.
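A sketch of the web server properties with both HTTP and HTTPS enabled is shown below. The ports, keystore path and password are hypothetical, and proxyMode is set to a placeholder value because the option is reserved for future use and its accepted values are not listed here.

```yaml
serverHost: "localhost"
http:
    port: 8881                  # hypothetical port
    proxyMode: "none"           # placeholder; option reserved for future use
https:
    port: 8882                  # hypothetical port
    keystore: "/etc/maddash/maddash-server/maddash.jks"   # hypothetical; must contain the key 'mykey'
    keystorePassword: "changeit"                          # placeholder password
    clientAuth: "want"          # check a certificate if presented, but do not require one
    proxyMode: "none"           # placeholder; option reserved for future use
```

If you set clientAuth to off, consider writing it quoted as "off": YAML 1.1 parsers read a bare off as the boolean false rather than as a string.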