Göran Sander
2018-02-05 21:48:24 +01:00
parent cc6408bbfe
commit fd7d765697
13 changed files with 3035 additions and 545 deletions

.vscode/launch.json

@@ -8,7 +8,7 @@
"type": "node",
"request": "launch",
"name": "Launch Program",
"program": "${workspaceFolder}\\butler-sos.js"
"program": "${workspaceFolder}/butler-sos.js"
}
]
}


@@ -7,30 +7,33 @@
# Butler SOS
# Butler SOS v2
Butler SenseOps Stats ("Butler SOS") is a Node.js service publishing operational Qlik Sense Enterprise metrics to MQTT and Influxdb.
It uses the [Sense healthcheck API](http://help.qlik.com/en-US/sense-developer/November2017/Subsystems/EngineAPI/Content/GettingSystemInformation/HealthCheckStatus.htm) to gather operational metrics for the Sense servers specified in the JSON config file.
It also pulls warnings and errors from Sense's Postgres logging database, and forward these to Influx.
It also pulls warnings and errors from [Sense's Postgres logging database](http://help.qlik.com/en-US/sense/November2017/Subsystems/PlanningQlikSenseDeployments/Content/Deployment/Qlik-Logging-Service.htm), and forwards these to Influx and MQTT.
The most interesting use of Butler SOS is probably to create real-time dashboards based on the data in the Influx database, showing operational metrics for a Qlik Sense Enterprise environment:
![Grafana dashboard](img/senseops-1.png "SenseOps dashboard using Grafana")
![Grafana dashboard](img/SenseOps_dashboard_3.png "SenseOps dashboard showing errors and warnings, using Grafana")
![Grafana dashboard](img/SenseOps_dashboard_4.png "SenseOps dashboard showing Qlik Sense metrics, using Grafana")
Butler SOS can however also send the data to [MQTT](https://en.wikipedia.org/wiki/MQTT), for use in any MQTT capable tool or system.
Butler SOS can however also send the data to [MQTT](https://en.wikipedia.org/wiki/MQTT), for use in any MQTT enabled tool or system.
## Install and setup
* Butler SOS has been developed with Qlik Sense Enterprise November 2017 in mind. In order to use Butler SOS with other Sense versions, some adaptations may be needed.
* Butler SOS v2 has been developed with Qlik Sense Enterprise November 2017 in mind. In order to use Butler SOS with other Sense versions, some adaptations may be needed.
* Clone [the repository](https://github.com/ptarmiganlabs/butler-sos) from GitHub to desired location.
* Make sure [Node.js](https://nodejs.org) is installed. Butler-SOS has been tested with Node.js 8.9.4.
* Run "npm install" from within the main butler-sos directory to download and install all Node.js dependencies.
* Make a copy of the [config/default_template.json](https://github.com/ptarmiganlabs/butler-sos/blob/master/config/default_template.json) configuration file. Edit the file as needed, save it as "default.json" in the ./config directory.
Butler SOS will read its config settings from the default.json file.
* Make a copy of the [config/default_template.yaml](https://github.com/ptarmiganlabs/butler-sos/blob/master/config/default_template.yaml) configuration file. Edit the file as needed, save it as "default.yaml" in the ./config directory.
Butler SOS will read its config settings from this file.
* Install [Influxdb](https://docs.influxdata.com/influxdb/v1.4/introduction) (only needed if data is to be stored in Influxdb, of course).
* Install [Mosquitto](https://mosquitto.org) or another MQTT broker (only needed if data is to be forwarded to MQTT).
* Install [Mosquitto](https://mosquitto.org) or another MQTT broker (only needed if data is to be forwarded to MQTT). If you already have an MQTT broker you do not need to install a new one, Butler SOS can use the existing broker.
@@ -38,28 +41,31 @@ Butler SOS will read its config settings from the default.json file.
The latest version of Butler SOS introduces several breaking changes to its configuration file:
* The configuration file format is now YAML rather than JSON. YAML is a more human readable and compact file format compared to JSON. It also allows comments to be used.
* Virtual proxies are no longer used to get the Sense healthcheck data. Instead of virtual proxies the main Qlik Sense Engine Service (QES) of 4747 is queried directly for the health data of each Sense server.
* Virtual proxies are no longer used to get the Sense healthcheck data.
Instead of virtual proxies the main Qlik Sense Engine Service (QES) is called on TCP port 4747 to get the health data of each Sense server that should be monitored.
A consequence of this is that certificates are now used to authenticate with Qlik Sense, rather than the security-by-obscurity approach that was most commonly used with Butler SOS in the past.
Please note that the path to these certificates must be properly configured int he config file's Butler-SOS.qrs section.
Please note that the path to these certificates must be properly configured in the config file's Butler-SOS.cert section.
Please refer to config/default.yaml for further configuration instructions.
Butler SOS will then query https://server1.my.domain:4747/engine/healthcheck to get operational metrics for the Qlik Sense engine on server1.my.domain.
The certificates used are [created from the Sense QMC](http://help.qlik.com/en-US/sense/November2017/Subsystems/ManagementConsole/Content/export-certificates.htm). Export certificates based on instructions at that link, then place them in Butler SOS' ./ssl directory.
### Certificates
The certificates used when connecting to the Sense engine are [created from the Sense QMC](http://help.qlik.com/en-US/sense/November2017/Subsystems/ManagementConsole/Content/export-certificates.htm). Export certificates as described there, then place them in Butler SOS' ./cert directory.
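As a rough illustration of the certificate-based healthcheck call described above, the sketch below builds the options object that Node's `https.request` would take. The function name `buildHealthcheckOptions` and the injectable `readFile` parameter are hypothetical and not part of Butler SOS. The file names `client.pem`, `client_key.pem` and `root.pem` are taken from the commented-out paths in butler-sos.js and may differ in your export. A real deployment may need extra headers or TLS settings.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical helper: build https.request options for the engine healthcheck.
// certDir is assumed to hold the certificates exported from the QMC.
function buildHealthcheckOptions(host, certDir, readFile = fs.readFileSync) {
  return {
    hostname: host,
    port: 4747, // Qlik Sense Engine Service port mentioned in the README
    path: "/engine/healthcheck",
    method: "GET",
    // Assumed file names; adjust to match the exported certificates
    cert: readFile(path.resolve(certDir, "client.pem")),
    key: readFile(path.resolve(certDir, "client_key.pem")),
    ca: readFile(path.resolve(certDir, "root.pem")),
  };
}
```

The result could then be passed to `https.request(options, callback)`. Injecting `readFile` only makes the helper easy to exercise without real certificate files.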
### Postgres log database
The config file allows you to set how often Butler SOS should query the Sense log database for warnings and errors. To get near real-time notifications of warnings and errors, set the polling interval to a reasonably low value. On the other hand, polling consumes server resources and puts some load on the Sense logging database, so the interval should not be set too low.
Experience shows that polling every 15-30 seconds works well and doesn't put too much load on the database.
Finally, there is one caveat to be aware of when it comes to the Butler-SOS.logdb.pollingInterval setting:
By default Butler SOS will query the log database for any warnings and errors that have occurred during the last 2 minutes. The reason for having such a limit is simply to limit the query load on the Postgres server.
This however also means that you should **not** configure a polling interval of 2 minutes or more, as Butler SOS would then risk missing warnings and errors logged between one query window and the next.
If you need a log database polling interval longer than 2 minutes, you also need to change the SQL query in the butler-sos.js file to use a longer time window.
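The relationship between the polling interval and the 2-minute query window can be expressed as a small sanity check. This is only a sketch: `isPollingIntervalSafe` and `DEFAULT_QUERY_WINDOW_MS` are hypothetical names, not something Butler SOS provides. Values are in milliseconds, matching the pollingInterval setting.

```javascript
// Assumed default: the SQL query looks back 2 minutes, as described in the README.
const DEFAULT_QUERY_WINDOW_MS = 2 * 60 * 1000;

// Returns true when every log entry falls inside at least one query window,
// i.e. when the polling interval is positive and shorter than the window.
function isPollingIntervalSafe(
  pollingIntervalMs,
  queryWindowMs = DEFAULT_QUERY_WINDOW_MS
) {
  return (
    Number.isFinite(pollingIntervalMs) &&
    pollingIntervalMs > 0 &&
    pollingIntervalMs < queryWindowMs
  );
}
```

For example, `isPollingIntervalSafe(15000)` is true, while `isPollingIntervalSafe(120000)` is false unless a longer window is passed as the second argument.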
## Usage
## Usage
Start Influxdb and Mosquitto (or other MQTT broker).
Both Influxdb and Mosquitto should work right after installation - for production use their respective config files should be edited as needed, with respect to use of https etc.
Both Influxdb and Mosquitto should work right after installation - for production use their respective config files should be reviewed and edited as needed, with respect to use of https etc.
Starting Influxdb on OSX will look something like this (for Influx v1.2.3):
@@ -68,11 +74,11 @@ Starting Influxdb on OSX will look something like this (for Influx v1.2.3):
Then start Butler SOS itself from the main butler-sos directory:
"node butler-sos.js".
If the Influxdb database specified in the config file does not exist, it will be created:
If the Influxdb database specified in the config file does not exist, it will be created.
![Starting Butler SOS](img/butler-sos-1.png "Starting Butler SOS")
![Starting Butler SOS](img/butler-sos-cli-1.png "Starting Butler SOS")
Here we see how three servers are queried for data.
Here we see how two servers are queried for data.
The responses are retrieved asynchronously as they arrive from the different servers.
Finally, the data is stored to Influxdb and sent as MQTT messages.
@@ -81,7 +87,7 @@ Finally, the data is stored to Influxdb and sent as MQTT messages.
By popular request, here are the commands needed to install Influx and Grafana.
The commands below assume you are using a Mac and have the [Homebrew](https://brew.sh/) package manager installed.
You can also install the software on a Linux server (apt-get install ... on Debian etc). Windows might be possible, but it is usually easier to spin up a small Linux server in a Docker container on your Windows PC, compared to installing the actual software on Windows...
Using Docker containers is actually a great way to play around with software, without clogging down your own computer.
Using Docker containers is actually a great way to play around with software, without clogging down your own computer. Butler SOS is in fact developed using Influx, Grafana and MQTT running in Docker containers.
Install and start Influx:
@@ -100,12 +106,12 @@ Default username/pwd is admin/admin.
## Real-time dashboards using Grafana
Once the data exists in Influxdb it can be visualised using [Grafana](https://grafana.com).
A sample dashboard is included in the Grafana directory - it should work out of the box when imported into your Grafana environment.
A sample dashboard is included in the Grafana directory. Import it into your Grafana environment, then modify it to reflect your server host names, after which it should show real-time metrics for your Sense servers.
Grafana is extremely powerful. Creating automatically updating dashboards for any number of servers is a matter of a few minutes work. Tutorials and docs can be found on their site.
## References
Please see [https://ptarmiganlabs.com](https://ptarmiganlabs.com/blog/2017/04/24/butler-sos-real-time-server-stats-qlik-sense/) and [https://github.com/mountaindude/butler](https://github.com/mountaindude/butler) for more in-depth info on the Butler family of micro services.
Please see [https://ptarmiganlabs.com](https://ptarmiganlabs.com/blog/2017/04/24/butler-sos-real-time-server-stats-qlik-sense/) and [https://github.com/ptarmiganlabs/butler](https://github.com/ptarmiganlabs/butler) for more in-depth info on the Butler family of micro services for Qlik Sense.
At [https://senseops.rocks](https://senseops.rocks) you also find thoughts on using DevOps best practices in the Qlik Sense ecosystem.
At [https://senseops.rocks](https://senseops.rocks) you also find thoughts on using DevOps best practices in the Qlik Sense ecosystem.


@@ -7,12 +7,21 @@ var globals = require("./globals");
// Load certificates to use when connecting to healthcheck API
var fs = require("fs"),
path = require("path"),
certFile = path.resolve(__dirname, globals.config.get("Butler-SOS.cert.clientCert")),
keyFile = path.resolve(__dirname, globals.config.get("Butler-SOS.cert.clientCertKey")),
caFile = path.resolve(__dirname, globals.config.get("Butler-SOS.cert.clientCertCA"));
// certFile = path.resolve(__dirname, "ssl/client.pem"),
// keyFile = path.resolve(__dirname, "ssl/client_key.pem"),
// caFile = path.resolve(__dirname, "ssl/root.pem");
certFile = path.resolve(
__dirname,
globals.config.get("Butler-SOS.cert.clientCert")
),
keyFile = path.resolve(
__dirname,
globals.config.get("Butler-SOS.cert.clientCertKey")
),
caFile = path.resolve(
__dirname,
globals.config.get("Butler-SOS.cert.clientCertCA")
);
// certFile = path.resolve(__dirname, "ssl/client.pem"),
// keyFile = path.resolve(__dirname, "ssl/client_key.pem"),
// caFile = path.resolve(__dirname, "ssl/root.pem");
// Set specific log level (if/when needed)
// Possible values are { error: 0, warn: 1, info: 2, verbose: 3, debug: 4, silly: 5 }
@@ -161,7 +170,24 @@ function postToInfluxdb(host, serverName, body) {
});
}
function postToMQTT(host, serverName, body) {
function postLogDbToMQTT(
process_host,
process_name,
entry_level,
message,
timestamp
) {
// Get base MQTT topic
var baseTopic = globals.config.get("Butler-SOS.mqttConfig.baseTopic");
// Send to MQTT
globals.mqttClient.publish(
baseTopic + process_host + "/" + process_name + "/" + entry_level,
message
);
}
function postHealthToMQTT(host, serverName, body) {
// Get base MQTT topic
var baseTopic = globals.config.get("Butler-SOS.mqttConfig.baseTopic");
@@ -170,7 +196,7 @@ function postToMQTT(host, serverName, body) {
globals.mqttClient.publish(baseTopic + serverName + "/started", body.started);
globals.mqttClient.publish(
baseTopic + serverName + "/mem/comitted",
body.mem.comitted.toString()
body.mem.committed.toString()
);
globals.mqttClient.publish(
baseTopic + serverName + "/mem/allocated",
@@ -280,7 +306,7 @@ function getStatsFromSense(host, serverName) {
// Post to MQTT (if enabled)
if (globals.config.get("Butler-SOS.mqttConfig.enableMQTT")) {
globals.logger.debug("Calling MQTT posting method");
postToMQTT(host, serverName, body);
postHealthToMQTT(host, serverName, body);
}
// Post to Influxdb (if enabled)
@@ -293,9 +319,9 @@ function getStatsFromSense(host, serverName) {
);
}
// Set up timer for getting log data
// Configure timer for getting log data from Postgres
setInterval(function() {
globals.logger.verbose("Event started: Log db query");
globals.logger.verbose("Event started: Query log db");
// checkout a Postgres client from connection pool
globals.pgPool.connect().then(pgClient => {
@@ -318,40 +344,53 @@ setInterval(function() {
)
.then(res => {
pgClient.release();
globals.logger.debug(
"Log db query got a response. Sending to Influxdb "
);
globals.logger.debug("Log db query got a response.");
var rows = res.rows;
rows.forEach(function(row) {
globals.logger.silly("Log db row: " + JSON.stringify(row));
// Write the whole reading to Influxdb
globals.influx
.writePoints([
{
measurement: "log_entry",
tags: {
host: row.process_host,
source_process: row.process_name,
log_level: row.entry_level
},
fields: {
message: row.payload.Message
},
timestamp: row.timestamp
}
])
.then(err => {
globals.logger.silly("Sent log event to Influxdb. ");
})
// Post to Influxdb (if enabled)
if (globals.config.get("Butler-SOS.influxdbConfig.enableInfluxdb")) {
globals.logger.debug("Posting log db data to Influxdb...");
.catch(err => {
console.error(`Error saving log event to InfluxDB! ${err.stack}`);
});
// Write the whole reading to Influxdb
globals.influx
.writePoints([
{
measurement: "log_entry",
tags: {
host: row.process_host,
source_process: row.process_name,
log_level: row.entry_level
},
fields: {
message: row.payload.Message
},
timestamp: row.timestamp
}
])
.then(err => {
globals.logger.silly("Sent log db event to Influxdb. ");
})
.catch(err => {
console.error(
`Error saving log event to InfluxDB! ${err.stack}`
);
});
}
// Publish MQTT message
// globals.mqttClient.publish(globals.config.get('Butler.mqttConfig.taskFailureTopic'), msg[1]);
// Post to MQTT (if enabled)
if (globals.config.get("Butler-SOS.mqttConfig.enableMQTT")) {
globals.logger.debug("Posting log db data to MQTT...");
postLogDbToMQTT(
row.process_host,
row.process_name,
row.entry_level,
row.payload.Message,
row.timestamp
);
}
});
})
.then(res => {
@@ -364,7 +403,7 @@ setInterval(function() {
});
}, globals.config.get("Butler-SOS.logdb.pollingInterval"));
// Set up timer for getting healthcheck data
// Configure timer for getting healthcheck data
setInterval(function() {
globals.logger.verbose("Event started: Statistics collection");
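Review note on `postLogDbToMQTT` above: the topic is built by plain string concatenation, so it relies on `baseTopic` ending with `/` (as the config template comment asks for). A minimal sketch of a more defensive variant follows; `buildLogTopic` is a hypothetical helper name and is not part of this commit.

```javascript
// Hypothetical helper: build the MQTT topic for a log db entry.
// Adds the trailing slash to baseTopic if the config value lacks one.
function buildLogTopic(baseTopic, processHost, processName, entryLevel) {
  const base = baseTopic.endsWith("/") ? baseTopic : baseTopic + "/";
  return base + [processHost, processName, entryLevel].join("/");
}
```

With `baseTopic` set to `butler-sos/`, this produces the same `<baseTopic><process_host>/<process_name>/<entry_level>` topic as the concatenation in the diff.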


@@ -1,29 +0,0 @@
{
"Butler-SOS": {
"logLevel": "verbose",
"mqttConfig": {
"enableMQTT": true,
"brokerIP": "<IP of MQTT server>",
"baseTopic": "butler-sos/"
},
"influxdbConfig": {
"enableInfluxdb": true,
"hostIP": "<IP or FQDN of Influxdb server>",
"dbName": "SenseOps"
},
"pollingInterval": 5000,
"serversToMonitor": {
"servers": [{
"host": "<server1.my.domain>",
"serverName": "<server1>",
"availableRAM": 32000
},
{
"host": "<server2.my.domain>",
"serverName": "<server2>",
"availableRAM": 24000
}
]
}
}
}


@@ -4,6 +4,7 @@ Butler-SOS:
# Qlik Sense logging db config parameters
logdb:
# How often (milliseconds) should Postgres log db be queried for warnings and errors?
pollingInterval: 15000
host: <IP or FQDN of Qlik Sense logging db>
port: 4432
@@ -20,6 +21,8 @@ Butler-SOS:
mqttConfig:
enableMQTT: true
brokerHost: <IP or FQDN of MQTT server>
brokerPort: 1883
# Topic should end with /
baseTopic: butler-sos/
# Influx db config parameters


@@ -132,9 +132,14 @@ influx
// ------------------------------------
// Create MQTT client object and connect to MQTT broker
var mqttClient = mqtt.connect(
"mqtt://" + config.get("Butler-SOS.mqttConfig.brokerIP")
);
var mqttClient = mqtt.connect({
port: config.get("Butler-SOS.mqttConfig.brokerPort"),
host: config.get("Butler-SOS.mqttConfig.brokerHost")
});
mqttClient.publish("butler-sos/hej", "1234");
/*
Following might be needed for connecting to older Mosquitto versions
var mqttClient = mqtt.connect('mqtt://<IP of MQTT server>', {


@@ -1,444 +0,0 @@
{
"__inputs": [
{
"name": "DS_SENSEOPS",
"label": "senseops",
"description": "",
"type": "datasource",
"pluginId": "influxdb",
"pluginName": "InfluxDB"
}
],
"__requires": [
{
"type": "grafana",
"id": "grafana",
"name": "Grafana",
"version": "4.1.1"
},
{
"type": "panel",
"id": "graph",
"name": "Graph",
"version": ""
},
{
"type": "datasource",
"id": "influxdb",
"name": "InfluxDB",
"version": "1.0.0"
},
{
"type": "panel",
"id": "singlestat",
"name": "Singlestat",
"version": ""
}
],
"annotations": {
"list": []
},
"editable": true,
"gnetId": null,
"graphTooltip": 0,
"hideControls": false,
"id": null,
"links": [],
"refresh": "30s",
"rows": [
{
"collapse": false,
"height": 257,
"panels": [
{
"aliasColors": {},
"bars": false,
"datasource": "${DS_SENSEOPS}",
"fill": 1,
"id": 1,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"span": 5,
"stack": false,
"steppedLine": false,
"targets": [
{
"dsType": "influxdb",
"groupBy": [
{
"params": [
"$interval"
],
"type": "time"
},
{
"params": [
"null"
],
"type": "fill"
}
],
"measurement": "cpu",
"policy": "default",
"refId": "A",
"resultFormat": "time_series",
"select": [
[
{
"params": [
"total"
],
"type": "field"
},
{
"params": [],
"type": "mean"
}
]
],
"tags": [
{
"key": "host",
"operator": "=~",
"value": "/^$Hosts$/"
}
]
}
],
"thresholds": [],
"timeFrom": null,
"timeShift": null,
"title": "CPU",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"format": "short",
"label": "",
"logBase": 1,
"max": "100",
"min": "0",
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
},
{
"aliasColors": {},
"bars": false,
"datasource": "${DS_SENSEOPS}",
"fill": 1,
"id": 2,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"span": 4,
"stack": false,
"steppedLine": false,
"targets": [
{
"dsType": "influxdb",
"groupBy": [
{
"params": [
"$interval"
],
"type": "time"
},
{
"params": [
"null"
],
"type": "fill"
}
],
"measurement": "session",
"policy": "default",
"refId": "A",
"resultFormat": "time_series",
"select": [
[
{
"params": [
"total"
],
"type": "field"
},
{
"params": [],
"type": "max"
}
]
],
"tags": [
{
"key": "host",
"operator": "=~",
"value": "/^$Hosts$/"
}
]
}
],
"thresholds": [],
"timeFrom": null,
"timeShift": null,
"title": "Sessions",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
]
},
{
"cacheTimeout": null,
"colorBackground": false,
"colorValue": false,
"colors": [
"rgba(245, 54, 54, 0.9)",
"rgba(237, 129, 40, 0.89)",
"rgba(50, 172, 45, 0.97)"
],
"datasource": "${DS_SENSEOPS}",
"decimals": 0,
"format": "none",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"id": 7,
"interval": null,
"links": [],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"nullText": null,
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"span": 3,
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": true,
"lineColor": "rgb(31, 120, 193)",
"show": true
},
"targets": [
{
"dsType": "influxdb",
"groupBy": [
{
"params": [
"$interval"
],
"type": "time"
},
{
"params": [
"null"
],
"type": "fill"
}
],
"measurement": "apps",
"policy": "default",
"refId": "A",
"resultFormat": "time_series",
"select": [
[
{
"params": [
"loaded_docs_count"
],
"type": "field"
},
{
"params": [],
"type": "last"
}
]
],
"tags": [
{
"key": "host",
"operator": "=~",
"value": "/^$Hosts$/"
}
]
}
],
"thresholds": "",
"title": "Loaded apps",
"type": "singlestat",
"valueFontSize": "50%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
}
],
"valueName": "avg"
}
],
"repeat": "Hosts",
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "$Hosts",
"titleSize": "h6"
}
],
"schemaVersion": 14,
"style": "dark",
"tags": [],
"templating": {
"list": [
{
"allValue": null,
"current": {},
"datasource": "${DS_SENSEOPS}",
"hide": 0,
"includeAll": true,
"label": null,
"multi": true,
"name": "Hosts",
"options": [],
"query": "SHOW TAG VALUES WITH KEY = \"host\"",
"refresh": 1,
"regex": "",
"sort": 0,
"tagValuesQuery": "",
"tags": [],
"tagsQuery": "",
"type": "query",
"useTags": false
}
]
},
"time": {
"from": "now-3h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "browser",
"title": "SenseOps dashboard",
"version": 4
}

BIN
img/butler-sos-cli-1.png Normal file
