-
Notifications
You must be signed in to change notification settings - Fork 23
/
nozzle.html.md.erb
370 lines (285 loc) · 16.3 KB
/
nozzle.html.md.erb
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
---
title: Logs, Metrics, and Nozzles
owner: Services
---
<strong><%= modified_date %></strong>
This topic explains how to integrate PCF services with Cloud Foundry's logging system, the _Loggregator_, by writing to and reading from its _Firehose_ endpoint.
## <a id="overview"></a> Overview
Cloud Foundry's _Loggregator_ logging system collects logs and metrics from PCF apps and platform components and streams them to a single endpoint, the _Firehose_. Your tile can integrate its service with the Loggregator system in two ways:
* By sending your service component logs and metrics to the Firehose, to be streamed along with PCF core platform component logs and metrics.
* By installing a _nozzle_ on the Firehose that directs Firehose data to be consumed by external services or apps. A built-in nozzle can enable a service to:
- Drain metrics to an external dashboard product, for system operators
- Send HTTP request details to search or analysis tools
- Drain app logs to an external system
- [Auto-scale itself](https://youtu.be/skJKvQfpKD4?t=1021) based on Firehose metrics
[Firehose-to-syslog](https://github.com/cloudfoundry-community/firehose-to-syslog) is a real world, production example of a nozzle.
## <a id="firehose"></a> Firehose Communication
PCF components publish logs and metrics to the Firehose through Metron agent processes that run locally on the component VMs. Metron agents input the data to the Loggregator system by writing it to Loggregator's [etcd](https://coreos.com/etcd/) key-value store via a [gRPC](https://coreos.com/etcd/docs/latest/op-guide/grpc_proxy.html) proxy. The topic [Overview of the Loggregator System](https://docs.cloudfoundry.org/loggregator/architecture.html) shows how logs and metrics travel from PCF system components to the Firehose.
Component VMs running PCF services can publish logs and metrics the same way, by including a Metron agent that writes to etcd. In PCF v1.10 and higher, components only communicate with `etcd` via secure, encrypted `https` protocol. Earlier versions of PCF allow both encrypted `https` and unencrypted `http` communications with etcd.
### <a id="https"></a> Secure HTTPS Protocol: PCF 1.10+
To enable a service component to supply logs and metrics to the Firehose through encrypted communications, you need to include a Metron agent and a Consul agent in its template definitions.
The Metron definition includes double-paren properties defining a keypair for accessing etcd. The Consul definition includes double-paren properties for securely looking up the internal IP addresses of the etcd nodes at `cf-etcd.service.cf.internal`. This avoids hard-coding any etcd server addresses.
For example:
```
name: service
label: Service
templates:
- name: consul
release: consul
- name: metron_agent
release: loggregator
- name: service
release: service
manifest: |
metron_agent:
deployment: cf-my-service
etcd:
client_cert: (( ..cf.properties.cf_etcd_client_cert.cert_pem ))
client_key: (( ..cf.properties.cf_etcd_client_cert.private_key_pem ))
metron_endpoint:
shared_secret: (( ..cf.doppler.shared_secret_credentials.password ))
loggregator:
etcd:
require_ssl: true
machines: ['cf-etcd.service.cf.internal']
ca_cert: (( $ops_manager.ca_certificate ))
consul:
encrypt_keys:
- (( ..cf.properties.consul_encrypt_key.value ))
ca_cert: (( $ops_manager.ca_certificate ))
agent_cert: (( ..cf.properties.consul_agent_cert.cert_pem ))
agent_key: (( ..cf.properties.consul_agent_cert.private_key_pem ))
agent:
domain: cf.internal
servers:
lan: (( ..cf.consul_server.ips ))
```
Metron versions v72 and later do not use etcd to communicate with Loggregator, but the configuration above works with any version of Metron. If the Metron agent does not need values for etcd, it safely ignores them.
### <a id="http"></a> HTTP Protocol: PCF 1.9 and Earlier
In PCF v1.9, service components can send logs and metrics to the Firehose encrypted or unencrypted. In v1.8 and earlier releases, components only communicate their log and metrics data unencrypted.
To enable unencrypted communications with etcd, define a Metron agent and list the addresses of the etcd servers in the template definitions as follows:
```
name: service
label: Service
templates:
- name: metron_agent
release: loggregator
- name: service
release: service
manifest: |
metron_agent:
deployment: cf-my-service
metron_endpoint:
shared_secret: (( ..cf.doppler.shared_secret_credentials.password ))
loggregator:
etcd:
machines: (( ..cf.etcd_server.ips ))
```
## <a id="nozzle"></a> Nozzles
A nozzle is a component dedicated to reading and processing data that streams from the Firehose. A service tile can install a nozzle as either a managed service, with package type `bosh-release`; or as an app pushed to Elastic Runtime, with the package type `app`.
### <a id="develop"></a> Develop a Nozzle
Pivotal recommends developing a nozzle in Go, to leverage the
[NOAA library](https://github.com/cloudfoundry/noaa).
NOAA does the heavy lifting of establishing
an authenticated websocket connection to the logging system
as well as de-serializing the protocol buffers.
Draining the logs consists of:
1. Authenticating
1. Establishing a connection to the logging system
1. Forwarding events on to their ultimate destination
Authenticate against the API (https://github.com/cloudfoundry-community/go-cfclient)
with a user in the `doppler.firehose` group:
```go
import "github.com/cloudfoundry-community/go-cfclient"
...
config := &cfclient.Config{
ApiAddress: apiUrl,
Username: username,
Password: password,
SkipSslValidation: sslSkipVerify,
}
client, err := cfclient.NewClient(config)
```
Using the client's token, create a consumer and connect to the Firehose with a subscription id.
The id is important, since the Firehose looks for connections having the same id and only
sends an event to one of those connections. This is how a nozzle developer can prevent
message loss during upgrades an other deployments: run at least two instances.
```go
token, err := client.GetToken()
consumer := consumer.New(config.TrafficControllerURL, &tls.Config{
InsecureSkipVerify: config.SkipSSL,
}, nil)
events, errors := consumer.Firehose(firehoseSubscriptionID, token)
```
`Firehose` will give back two channels: one for events and a second for errors.
The events channel receives six different types of events.
* ValueMetric: Some platform metric at a point in time, emitted by platform components. For example, how many `2xx` responses the router has sent out.
* CounterEvent: An incrementing counter, emitted by platform components. For example, a Diego cell's remaining memory capacity.
* Error: An error.
* HttpStartStop: HTTP request details, including both app and platform requests.
* LogMessage: A log message for an individual app.
* ContainerMetric: Application container information. For example, memory used.
For the full details on events, see the
[dropsonde protocol](https://github.com/cloudfoundry/dropsonde-protocol/tree/master/events).
The above events show how this data targets two different personae:
platform operators and app developers. Keep this in mind when designing an integration.
Having `doppler.firehose` scope gets a nozzle data for *every* app as well as the platform.
Any filtering based on the event payload is the nozzle implementor's responsibility.
An advanced integration could do something like combine a
[service broker](service-brokers.html) with a nozzle to:
* Let app developers opt-in to logging (implementing filtering in the nozzle)
* Establish [SSO](https://docs.cloudfoundry.org/services/dashboard-sso.html) exchange for authentication such that developers only can access logs for their space's apps
For a full working example (suitable as an integration starting point),
see [firehose-nozzle](https://github.com/cf-platform-eng/firehose-nozzle).
### <a id="deploy"></a> Deploy a Nozzle
Once you've build a nozzle, you can deploy it as either a managed service or as an app.
#### <a id="nozzle-service"></a> As a Managed Service
Visit [managed service](managed.html)
for more details on what it means to be a managed service.
See also this
[example nozzle BOSH release](https://github.com/cloudfoundry-incubator/example-nozzle-release).
#### <a id="nozzle-app"></a> As an App
You can also deploy the nozzle as an app on Elastic Runtime.
Visit the Tile Generator's
[section on pushed apps](tile-generator.html#pushed-applications)
for more details.
### <a id="examples"></a> Example Nozzles
There are several open source examples you could use
as a reference for building your nozzle
[firehose-nozzle](https://github.com/cf-platform-eng/firehose-nozzle)
* Example that simply writes to standard out
* Useful starting point: scaffolding, tests, etc are in place
[example-nozzle](https://github.com/cloudfoundry-incubator/example-nozzle)
* A single file implementation with no tests: as minimal as things can get
[gcp-tools-release](https://github.com/cloudfoundry-community/gcp-tools-release)
* In addition to Nozzle data, it drains component syslogs and health data
* Shows how to do a bosh-addon (for additional data outside a nozzle)
* Nozzle is managed through BOSH
* Raw logs and metrics data take different paths in the source
[firehose-to-syslog](https://github.com/cloudfoundry-community/firehose-to-syslog)
* Includes implementation code that adds additional metadata,
which might be needed for an access control list (ACL)
- App name
- Space UUID and name
- Org UUID and name
* [logsearch-for-cloudfoundry](https://github.com/cloudfoundry-community/logsearch-for-cloudfoundry) packages this nozzle as a BOSH release
[splunk-firehose-nozzle](https://github.com/cf-platform-eng/splunk-firehose-nozzle)
* Source code based on `firehose-to-syslog`
* Packaged to run an app on PCF
[datadog-firehose-nozzle](https://github.com/DataDog/datadog-firehose-nozzle)
* Another real world implementation
## <a id="syslog-format-pcf"></a> Log Format for PCF Components
Pivotal's standard log format adheres to the [RFC-5424 syslog protocol](https://tools.ietf.org/html/rfc5424), with log messages formatted as follows:
`<${PRI}>${VERSION} ${TIMESTAMP} ${HOST_IP} ${APP_NAME} ${PROD_ID} ${MSG_ID} ${SD-ELEMENT-instance} ${MESSAGE}`
The [Syslog Message Elements table](#syslog-elements) immediately below describes each element of the log, and the [Structured Instance Data Format](#sd-element) table describes the contents of the structured data element that carries Cloud Foundry VM instance information.
### <a id="syslog-elements"></a> Syslog Message Elements
This table describes each element of a standard PCF syslog message.
<table id='syslog-elements-table' border="1" class="nice" >
<tr>
<th>Syslog Message Element</th>
<th>Meaning or Value</th>
</tr><tr>
<td><code>${PRI}</code></td>
<td><p><a href="https://tools.ietf.org/html/rfc5424#section-6.2.1">Priority value (PRI)</a>, calculated as <code>8 × Facility Code + Severity Code</code></p>
<p>Pivotal uses a Facility Code value of <code>1</code>, indicating a user-level facility. This adds <code>8</code> to the RFC-5424 Severity Codes, resulting in the numbers listed in the <a href="#severity-codes">table below</a>.</p>
<p>If in doubt, default to <code>13</code>, to indicate Notice-level severity.</p>
</td>
</tr><tr>
<td><code>${VERSION}</code></td>
<td><a href="https://tools.ietf.org/html/rfc5424#section-9.1"><code>1</code></a></td>
</tr>
</tr><tr>
<td><code>${TIMESTAMP}</code></td>
<td><p>The <a href="https://tools.ietf.org/html/rfc5424#section-9.1">timestamp</a> of when the log message is forwarded; typically slightly after it was generated. Example: <code>2017-07-24T05:14:15.000003Z</code></p></td>
</tr><tr>
<td><code>${HOST_IP}</code></td>
<td><a href="https://tools.ietf.org/html/rfc5424#section-6.2.4">Internal IP address</a> of origin server</td>
</tr><tr>
<td><code>${APP_NAME}</code></td>
<td><p><a href="https://tools.ietf.org/html/rfc5424#section-6.2.5">Process name</a> of the program the generated the message. Prefixed with <code>vcap</code>. For example:</p>
<ul><li><code>vcap.rep</code></li>
<li><code>vcap.garden</code></li>
<li><code>vcap.cloud_controller_ng</code></li></ul>
<p>
You can derive this process name from either the program name configured for the local Metron agent or the <code>:progname</code>that blackbox derives from the folder that syslog-release forwards logs into.</p></td>
</tr><tr>
<td><code>${PROD_ID}</code></td>
<td>The <a href="https://tools.ietf.org/html/rfc5424#section-6.2.6">Process ID</a> of the syslog process doing the forwarding. If this is not easily available, default to <code>-</code> (hyphen) to indicate unknown.</td>
</td>
</tr><tr>
<td><code>${MSG_ID}</code></td>
<td>The <a href="https://tools.ietf.org/html/rfc5424#section-6.2.7">type</a> of log message. If this is not easily available, default to <code>-</code> (hyphen) to indicate unknown.</td>
</tr><tr>
<td><code>${SD-ELEMENT-instance}</code></td>
<td>Structured data (SD) relevant to PCF about the <a href="https://tools.ietf.org/html/rfc5424#section-6.3.1">source instance (VM)</a> that originates the log message. See the <a href="#sd-element">Structured Instance Data Format table</a> below for content and format.</td>
</tr><tr>
<td><code>${MESSAGE}</code></td>
<td>The log message itself, ideally in JSON</td>
</tr>
</table>
### <a id="severity"></a> RFC-5424 Severity Codes
PCF components generate log messages with the following severity levels. The most common severity level is `13`.
<table id='severity-table' border="1" class="nice" >
<tr>
<th>Severity Code</th>
<th>Meaning</th>
</tr><tr>
<td><code>8</code></td>
<td>Emergency: system is unusable</td>
</tr><tr>
<td><code>9</code></td>
<td>Alert: action must be taken immediately</td>
</tr><tr>
<td><code>10</code></td>
<td>Critical: critical conditions</td>
</tr><tr>
<td><code>11</code></td>
<td>Error: error conditions</td>
</tr><tr>
<td><code>12</code></td>
<td>Warning: warning conditions</td>
</tr><tr>
<td><code>13</code></td>
<td>Notice: normal but significant condition</td>
</tr><tr>
<td><code>14</code></td>
<td>Informational: informational messages</td>
</tr><tr>
<td><code>15</code></td>
<td>Debug: debug-level messages</td>
</tr>
</table>
### <a id="sd-element"></a> Structured Instance Data Format
The RFC-5424 syslog protocol includes a [structured data element](https://tools.ietf.org/html/rfc5424#section-6.3.1) that people can use as they see fit. Pivotal uses this element to carry VM instance information as follows:
<table id='sd-element-table' border="1" class="nice" >
<tr>
<th><code>SD-ELEMENT-instance</code> element</th>
<th>Meaning</th>
</tr><tr>
<td><code>${ENTERPRISE_ID}</code></td>
<td>Your Enterprise Number, as <a href="https://www.iana.org/assignments/enterprise-numbers/enterprise-numbers">listed</a> by the Internet Assigned Numbers Authority (IANA) </td>
</tr><tr>
<td><code>${DIRECTOR}</code></td>
<td>The BOSH director managing the deployment. </td>
</tr><tr>
<td><code>${DEPLOYMENT}</code></td>
<td>BOSH <code>spec.deployment</code> value</td>
</tr><tr>
<td><code>${INSTANCE_GROUP}</code></td>
<td>BOSH <code>instance_group</code>, currently <code>spec.job.name</code></td>
</tr><tr>
<td><code>${AVAILABILITY_ZONE}</code></td>
<td>BOSH <code>spec.az</code> value</td>
</tr><tr>
<td><code>${ID}</code></td>
<td>BOSH <code>spec.id</code> value. This is a GUID, not an index. Necessary because BOSH Availability Zone index values are not always unique or sequential.</td>
</tr>
</table>
## <a id="metrics"></a> Making Sense of Metrics
[Monitoring Pivotal Cloud Foundry](https://docs.pivotal.io/pivotalcf/monitoring/index.html) has a great rundown of the
various metrics and how to make them useful.
## <a id="resources"></a> Other Resources
* CF Summit Video [Monitoring Cloud Foundry: Learning about the Firehose](https://youtu.be/skJKvQfpKD4)
* [Loggregator GitHub repository](https://github.com/cloudfoundry/loggregator/)
* [Overview of the Loggregator System](https://docs.cloudfoundry.org/loggregator/architecture.html)
* [Loggregator's Slack Channel](https://cloudfoundry.slack.com/messages/loggregator/)