forked from pivotal-cf/docs-concourse-olm
-
Notifications
You must be signed in to change notification settings - Fork 0
/
troubleshooting.html.md.erb
126 lines (103 loc) · 5.06 KB
/
troubleshooting.html.md.erb
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
---
title: Monitoring and Troubleshooting
owner: Concourse
---
<strong><%= modified_date %></strong>
This topic broadly outlines techniques for troubleshooting your Concourse for PCF installation.
##<a id="fly"></a> Troubleshoot with Fly Commands
To get help on all `fly` commands, run: `fly --help`.
The following selected `fly` commands give you useful information to help you troubleshoot Concourse for PCF environments and pipelines.
###<a id="environments"></a> Troubleshoot Concourse for PCF Environments
You can use the following `fly` commands to troubleshoot possible environment problems.
<table class="nice">
<th width=20%>Fly Command</th><th>Description</th>
<tr>
<td><code>containers</code></td><td>Lists active containers.
This confirms which container or task is placed on which worker.</td></tr>
<tr>
<td><code>workers</code></td><td>Lists registered workers.
This helps you verify that the number of containers does not exceed the maximum number allowable.</td></tr>
<tr>
<td><code>prune-worker</code></td><td>Removes a non-running worker.
Stops Concourse for PCF from tracking an out-of-commission worker.</td></tr>
<tr>
<td><code>volumes</code></td><td>Lists active volumes.
Checks disk usage across workers.</td></tr>
</table>
###<a id="pipelines"></a> Troubleshoot Pipelines
You can use the following `fly` commands to troubleshoot possible pipeline problems.
<table class="nice">
<th width=20%>Fly Command</th>
<th>Description</th>
<tr>
<td><code>pipelines</code></td><td>Lists configured pipelines.</td></tr>
<tr>
<td><code>builds</code></td><td>Shows build history.
This is useful for listing build IDs of one-off tasks ran previously using <code>execute</code>.</td></tr>
<tr>
<td><code>validate-pipeline</code></td><td>Validates a pipeline's configuration without calling <code>set-pipeline</code>.</td></tr>
<tr>
<td><code>check-resource</code></td><td>Checks for new versions.
This is useful when developing a new resource.</td></tr>
<tr>
<td><code>watch</code></td><td>Shows logs of in-progress builds.</td></tr>
<tr>
<td><code>intercept</code></td><td>Displays build steps for a running or recent build.</td></tr>
<tr>
<td><code>execute</code></td><td>Submits local tasks.
This is useful for spinning up a task quickly to test it before putting it in a job.</td></tr>
</table>
##<a id="common-issues"></a> Common Concourse for PCF Issues
The following shows some common problems and solutions.
<table>
<th>Problem</th>
<th>Error Description</th>
<th>Solution</th>
<tr>
<td>The worker is out of disk space</td>
<td>An error states there is an inability to create a storage volume and might state that permissions are denied.</td>
<td>Increase persistent disk size for the worker or increase the number of worker VMs.</td>
</tr>
<tr>
<td>Container limit is reached</td>
<td>Cannot create container: limit of 250 containers reached<br><br>This error state is unlikely to appear.</td>
<td><ol><li>Check fly containers.</li>
<li>Increase number of worker VMs.</li>
<li>Decrease <code>gc_interval</code> if set to custom value. A large interval could mean that there are too many expired containers.</li></td>
</tr>
<tr>
<td>Job does not start</td>
<td>This error might present as the build getting stuck in Pending state.</td>
<td>Restart the ATC job:<br/>
<ol><li>Log in as a root user on Concourse for PCF web VMs where the ATC job is located.</li>
<li>Run the <code>monit restart atc</code> command.</td>
</tr>
<tr>
<td>Build fails when updating Concourse for PCF from a job on the same instance</td>
<td>When a build fails after BOSH deploys a Concourse for PCF update from a job running on that
Concourse for PCF instance, typically the job fails with a "worker for container not found" error.
This is expected behavior; the BOSH Director re-creates the worker VM.</td>
<td>Run the job again.</td>
</tr>
<tr>
<td>BOSH cannot finish worker upgrade while tasks are running</td>
<td>BOSH is not able to restart the worker job to finalize the upgrade until all work is completed. </td>
<td>If you have a long-running task, wait for the task to be completed.<br/>
If you need to upgrade quickly, cancel running tasks and jobs.</td>
</tr>
</table>
##<a id="logs"></a> Access Concourse Logs
You might need to contact Pivotal Support for help identifying a problem.
In that case, support might ask you to send job log files.
For general information about accessing log files,
see [Location and use of logs](https://bosh.io/docs/job-logs.html) in the BOSH documentation and
[Advanced Troubleshooting with the BOSH CLI](http://docs.pivotal.io/pivotalcf/2-1/customizing/trouble-advanced.html).
##<a id="other-resources"></a> Other Troubleshooting Resources
The following links provide other troubleshooting resources.
<table>
<th>Topic</th><th>Link</th>
<tr>
<tr><td>Enabling syslog forwarding and getting logs for other Concourse components</td><td><a href="https://bosh.io/docs/job-logs.html">VM Logs</a></td></tr>
<tr>
<td>Common BOSH issues</td><td><a href="https://bosh.io/docs/tips.html">BOSH tips</a></td></tr>
</table>