diff --git a/doc/outdated/ESMON_User_Manual_Raw_cn.md b/doc/outdated/ESMON_User_Manual_Raw_cn.md
index 0097db5..61de5c0 100644
--- a/doc/outdated/ESMON_User_Manual_Raw_cn.md
+++ b/doc/outdated/ESMON_User_Manual_Raw_cn.md
@@ -2,7 +2,7 @@
-##简介
+## 简介
 *ESMON* 是一款基于多种开源软件的监控系统,它通过采集DDN Exascaler的系统状态信息以达到对其进行性能监控及分析的目的。 DDN同时还开发了一些外部插件以作功能扩展。
@@ -20,7 +20,7 @@
 - **DDN IME**: DDN 的无限内存引擎(*Infinite Memory Engine*) 是一款本地化闪存,由软件定义的存储缓存,它简化了应用程序读取路径,消除了系统瓶颈。
 - **Lustre**: Lustre文件系统是一种开源的并行文件系统,它满足了许多高性能计算仿真环境的需求。
-###DDN Collectd 插件
+### DDN Collectd 插件
 为支持更多不同功能,DDN添加了一些附加的Collectd插件。
@@ -32,16 +32,16 @@
 - **Stress 插件:** Stress插件可以从collectd向服务器推送大量指标数据,以便在高压下对收集系统的性能进行基准测试。
 - **Zabbix 插件:** Zabbix插件将指标数据从collectd发送至Zabbix系统。
-##安装要求
+## 安装要求
 
-###部署服务器
+### 部署服务器
 - 操作系统版本: CenOS7/RHEL7
 - 硬盘空闲空间: > 500 MB。所有安装日志将被保存于部署服务器的 */var/log/esmon_install* 目录下,须占用一定空间。
 - 网络: 部署服务器须能对监控服务器和被监控客户端发起无密码提示的SSH连接。
 - *ESMON* ISO 镜像 : ESMON ISO 镜像须在部署服务器上可用。
-###监控服务器
+### 监控服务器
 - 操作系统版本: CenOS7/RHEL7
 - 硬盘空闲空间: \> 5GB。监控服务器运行有Influxdb,须预留更大空间以容纳更多数据写入Influxdb。
@@ -53,9 +53,9 @@
 - 硬盘空闲空间: > 200MB。必要的RPMs将被保存于 /var/log/esmon_install 目录下,须占用一定空间。
 - 网络: 被监控客户端须运行SSHD,以便通过无密码提示的SSH与监控服务器连接。
-##安装过程
+## 安装过程
 
-###1. 在部署服务器上安装ESMON RPM
+### 1. 在部署服务器上安装ESMON RPM
 1. 将 *ESMON* ISO 镜像文件拷贝至部署服务器上,如 /ISOs/esmon.iso.
@@ -71,7 +71,7 @@
 # rpm -ivh /media/RPMS/rhel7/esmon*.rpm
 ```
-###2. 在部署服务器上更新配置文件
+### 2. 在部署服务器上更新配置文件
 配置文件 */etc/esmon_install.conf* 包含了所有安装的必要信息。例如:
@@ -112,7 +112,7 @@ server_host:
 **server_hosts**, 包含了所有主机中ESMON server 包安装路径和配置详情。当**erase_influxdb** 为真时,所有 Influxdb 中的数据和原数据都将被完全擦除**。**通过启用**erase_influxdb**可解决 Influxdb 的数据损坏问题。当**drop_database**为真时**, Influxdb **中的 ESMON database将被丢弃,反之将被保留。注意,只有不再需要Influxdb中数据和原数据时才可启用 **erase_influxdb **和 **drop_database。**
-###3. **在集群上运行安装程序**
+### 3. **在集群上运行安装程序**
 在*/etc/esmon_install.conf*部署服务器上正确更新后, 运行以下命令在集群启动安装程序。
@@ -122,17 +122,17 @@ server_host:
 所有可用于调试的相关日志将被保存在*/var/log/esmon_install*目录下。
-###4. 访问监控网络页面
+### 4. 访问监控网络页面
 Grafana 服务将自动在监控服务器启动 。默认HTTP 端口为 3000。通过访问该端口可跳转至登录页面,默认用户名密码皆为 “admin”。![Login Dashboard](pic/login.jpg)
-##页面概览
+## 页面概览
 在主页上 (Home dashboard),可通过选择不同的模块页面浏览由 ESMON 收集的不同数据指标。
 ![Home Dashboard](pic/home.jpg)
-###Cluster Status
+### Cluster Status
 *Cluster Status* 页面显示了集群中服务器的状态信息概要。 其中,面板的背景颜色与服务器的运行状态相关 :
@@ -154,7 +154,7 @@ Grafana 服务将自动在监控服务器启动 。默认HTTP 端口为 3000。
 | ![Cluster Status Dashboard](pic/cluster_status.jpg)
-###Lustre Status
+### Lustre Status
 *Lustre Statistics* 页面显示了 Lustre 文件系统指标数据。
 | ![Lustre Statistics Dashboard](pic/lustre_statistics.jpg)
@@ -186,7 +186,7 @@ Grafana 服务将自动在监控服务器启动 。默认HTTP 端口为 3000。
-###Server Statistics
+### Server Statistics
 *Server Statistics* 页面显示了服务器详细信息。
 ![Server Statistics Dashboard](pic/server_statistics/server_statistics.jpg)
diff --git a/doc/outdated/ESMON_User_Manual_Raw_en.md b/doc/outdated/ESMON_User_Manual_Raw_en.md
index f163873..db43c12 100644
--- a/doc/outdated/ESMON_User_Manual_Raw_en.md
+++ b/doc/outdated/ESMON_User_Manual_Raw_en.md
@@ -15,13 +15,13 @@
 - **Lustre**: The *Lustre* file system is an open-source, parallel file system that supports many requirements of leadership class HPC simulation environments.
-##Introduction
+## Introduction
 *ESMON* is a monitoring system which can collect system statistics of DDN Exascaler for performance monitoring and analyzing. It is based on multiple widely used open-source software. Some extra plugins and are developed by DDN for enhancemen. One of the main components of *ESMON* is *Collectd*. *Collectd* is a daemon which collects system performance statistics periodically and provides mechanisms to store the values in a variety of ways. *ESMON* is based on the open-source *Collectd*, yet includes more plugins, such as Lustre, GPFS, Ganglia, Nagios, Stress, Zabbix and so on.
-###Collectd plugins of DDN
+### Collectd plugins of DDN
 Several additional plugins are added to *Collectd* in *ESMON* to support various functions.
@@ -33,30 +33,30 @@ Several additional plugins are added to *Collectd* in *ESMON* to support various
 - **Stress plugin:** The *Stress* plugin can push a large amount of metrics to server from *Collectd* client in order to benchmark the performance of the collecting system under high pressure.
 - **Zabbix plugin:** The *Zabbix* plugin is able to send metrics from *Collectd* to *Zabbix* system.
-##Installation Requirements
+## Installation Requirements
 
-###Installation Server
+### Installation Server
 - OS distribution: CenOS7/RHEL7
 - Free disk space: > 500 MB. The *installation server* will save all installation logs to */var/log/esmon_install* directory, which requires some free disk space.
 - Network: The *installation server* be able to start SSH connections to the *monitoring server* and *monitoring clients* without password prompt
 - *ESMON* ISO image : The *installation server* should posses the *ESMON* ISO image.
-###Monitoring Server
+### Monitoring Server
 - OS distribution: CenOS7/RHEL7
 - Free disk space: > 5G. *Influxdb* will be running on this server. More disk space is required to keep more data into *Influxdb*
 - Network: SSHD should be running on the *monitoring server* and it should be able to be connected by *installation server* without prompting for password.
-###Monitoring Client
+### Monitoring Client
 - OS distribution: CenOS7/RHEL7 or CentOS6/RHEL6
 - Free disk space: > 200M. The *installation server* will save necessary RPMs in directory */var/log/esmon_install*, which requires some free disk space.
 - Network: SSHD should be running on the *monitoring client* and it should be able to be connected by *installation server* without prompting for password.
-##Installation Process
+## Installation Process
 
-###1. Install the *ESMON* RPM on *Installation Server*
+### 1. Install the *ESMON* RPM on *Installation Server*
 1. Grab the *ESMON* ISO image file to the *installation server*, for example: /ISOs/esmon.iso.
@@ -72,7 +72,7 @@ Several additional plugins are added to *Collectd* in *ESMON* to support various
 # rpm -ivh /media/RPMS/rhel7/esmon*.rpm
 ```
-###2. Update the Configuration File on the *Installation Server*
+### 2. Update the Configuration File on the *Installation Server*
 The configuration file */etc/esmon_install.conf* includes all the necessary information for installation. Following is an example:
@@ -112,7 +112,7 @@ server_host:
 **host_id** in **server_host** is the host ID that *ESMON* server packages should be installed and configured. If **erase_influxdb** is true, all of the data and metadata of *Influxdb* will be erased completely. And if **drop_database** is true, the database of ESMON in *Influxdb* will be dropped. **erase_influxdb** and **drop_database** should only be when the data in *Influxdb* is not needed any more. By enabling **erage_influxdb**, some corruption problems of *Influxdb* could be fixed.
-###3. Start the Installation on the Cluster
+### 3. Start the Installation on the Cluster
 After the */etc/esmon_install.conf* file has been updated correctly on the *installation server*, following command could be run to start the installation on the cluster:
@@ -122,13 +122,13 @@ After the */etc/esmon_install.conf* file has been updated correctly on the *inst
 All the logs which are useful for debugging are saved under */var/log/esmon_install* directory of the *installation server*.
-###4. Access the Monitoring Web Page
+### 4. Access the Monitoring Web Page
 The *Grafana* service is started on the *monitoring server* automatically. The default HTTP port is 3000. A login web page will been shown through that port. The default user and password are both "admin".
 | ![Login Dashboard](pic/login.jpg)
-##Dashboards
+## Dashboards
 By selection dashboards, different metrics collectd by *ESMON* can be shown.
@@ -136,7 +136,7 @@ Different dashboards can be chosen to view different metrics collectd by *ESMON*
 | ![Home Dashboard](pic/home.jpg)
-###Cluster Status Dashboard
+### Cluster Status Dashboard
 The *Cluster Status* dashboard shows a summarized status of the servers in the cluster.
 The back ground color of panels show the servers' working status.
@@ -157,7 +157,7 @@ The back ground color of panels show the servers' working status.
 | ![Cluster Status Dashboard](pic/cluster_status.jpg)
-###Lustre Status Dashboard
+### Lustre Status Dashboard
 The *Lustre Statistics* dashboard show metrics of *Lustre* file systems. TODO: copy the markdown introduction from Grafana dashboard discription.
@@ -181,7 +181,7 @@ Following pictures are some of the panels in *Lustre Statistics* Dashboard
 | ![Quota Accounting(Inode) Panel of Lustre Statistics Dashboard](pic/lustre_statistics_quota2.jpg)
-###Server Statistics
+### Server Statistics
 The *Server Statistics* dashboard shows detailed information about a server.