From 8b03c5fd1cc36abbba23ec2c75d5f5cb00321b51 Mon Sep 17 00:00:00 2001
From: Colin Francis <131073567+colifran@users.noreply.github.com>
Date: Mon, 25 Sep 2023 10:52:50 -0700
Subject: [PATCH] chore: add dynamic require failure to runbook (#1315)

This PR moves the "Errors Encountered in the Past" section from the `operator-runbook` to a new document named `errors-encountered`. An entry for `Transliterator` task failures caused by missing files has been added to the `errors-encountered` document which explains that the cause may be from a dynamic require being used in a dependency. A link to the `errors-encountered` doc has been added to the appendix section in the `operator-runbook`.

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*

---------

Signed-off-by: Francis
---
 docs/errors-encountered.md | 65 ++++++++++++++++++++++++++++++++++++++
 docs/operator-runbook.md   | 41 ++----------------------
 2 files changed, 67 insertions(+), 39 deletions(-)
 create mode 100644 docs/errors-encountered.md

diff --git a/docs/errors-encountered.md b/docs/errors-encountered.md
new file mode 100644
index 000000000..c78cb0b16
--- /dev/null
+++ b/docs/errors-encountered.md
@@ -0,0 +1,65 @@
+# Errors Encountered in the Past
+
+## General Errors
+
+### `Forbidden: null`
+
+Usually, "Forbidden" with no additional details comes when you attempt to read
+S3 objects that are SSE-encrypted, but you don't have permissions to decrypt
+using the KMS key that encrypted the object; or when you attempt to read an
+object from S3 that does not exist; or when you simply don't have the
+appropriate IAM permissions for it.
+
+If you see this error, try checking that IAM permissions are configured
+correctly for the respective backend component (including policies on VPC
+resources if Construct Hub is running in a VPC, etc.).
+
+## Transliteration Task Errors
+
+### Running Out of File Descriptors (`ENOFILE`)
+
+The Transliterator task in particular has been susceptible to running out of
+file descriptors in the past, making the task extremely slow, or causing it to
+fail or time out (sending the Step Functions heartbeat requires opening a
+network connection, which requires at least 1 available file descriptor).
+
+In order to determine where file descriptors are going, tasks can be configured
+to have `lsof` run on each heartbeat tick, which prints the list of all open
+files to `STDOUT`, making it visible in the task's log.
+
+To enable this feature, the task input must contain an
+`env.RUN_LSOF_ON_HEARTBEAT` key with a string value (the value is arbitrary, but
+must be truthy for JavaScript - so non-empty - for the logging to be enabled).
+
+In the case of the Transliterator task, the command includes the entire state
+machine's input object, so one can simply re-run the state machine after having
+merged the following into the state machine input object:
+
+```json
+{
+  "env": {
+    "RUN_LSOF_ON_HEARTBEAT": "YES"
+  }
+}
+```
+
+### Missing Files
+
+Esbuild cannot bundle dependencies that are required dynamically. As an example,
+the following code snippet is incompatible with esbuild's bundling:
+
+```ts
+require('./commands').forEach(function (command) {
+  require('./src/' + command);
+});
+```
+
+In one instance, a dependency upgrade introduced a new dependency that was performing
+a dynamic require. By default, the dynamic require error in esbuild is suppressed.
+As a result, the bundle used in the Transliterator task was missing files and was
+failing on start-up.
+
+If you see Transliterator task failures where the stack trace points to missing files,
+this may be the result of a dynamic require being used. It is recommended that you
+review recent dependency upgrades and check whether they introduced a new dependency
+that might be using a dynamic require.
diff --git a/docs/operator-runbook.md b/docs/operator-runbook.md
index 56d8b310d..8008bc0ef 100644
--- a/docs/operator-runbook.md
+++ b/docs/operator-runbook.md
@@ -736,43 +736,6 @@ ECS tasks emit logs into CloudWatch under a log group called
 in its name and the log stream `transliterator/Resource/$TASKID` (e.g.
 `transliterator/Resource/6b5c48f0a7624396899c6a3c8474d5c7`).
 
-## Errors encountered in the past
+## Appendix
 
-### `Forbidden: null`
-
-Usually, "Forbidden" with no additional details comes when you attempt to read
-S3 objects that are SSE-encrypted, but you don't have permissions to decrypt
-using the KMS key that encrypted the object; or when you attempt to read an
-object from S3 that does not exist, or when you simply don't have the
-appropriate IAM permissions for.
-
-If you see this error, try checking that IAM permissions are configured
-correctly for the respective backend component (including policies on VPC
-resources if Construct Hub is running in a VPC, etc.).
-
-### Running out of file descriptors (`ENOFILE`)
-
-The Transliterator task in particular has been susceptible to running out of
-file descriptors in the past, making the task extremely slow, or causing it to
-fail or time out (sending the StepFunctions heart beat requires opening a
-network connection, which requires at least 1 available file descriptor).
-
-In order to determine where file descriptors are going, tasks can be configured
-to have `lsof` run on each heartbeat tick, which will display the list of all
-open files to `STDOUT`, which will be visible in the task's log.
-
-To enable this feature, the task input must contain an
-`env.RUN_LSOF_ON_HEARTBEAT` key with a string value (the value is arbitrary, but
-must be truthy for Javascript - so non-empty - for the logging to be enabled).
-
-In the case of the Transliterator task, the command includes the entire state
-machine's input object, so one can simply re-run the state machine after having
-merged the following into the state machine input object:
-
-```json
-{
-  "env": {
-    "RUN_LSOF_ON_HEARTBEAT": "YES"
-  }
-}
-```
+1. [Errors encountered in the past](./errors-encountered.md)