# 2023-01-03 Error handling in Bash

Undesired behaviour, i.e. an error, can manifest in various ways. In most high-level programming languages errors are represented as dedicated objects, which allows handling them without interrupting program execution. They usually require the developer to explicitly mark bad program flows by throwing or raising a problem description. Bash doesn't provide any special syntax for this. Instead it allows hooking custom actions whenever a command returns a non-zero exit code. A non-zero exit code is not universally treated as an error, since some applications use more than one exit code to signal success. Still, Bash contains some builtin commands and options named after _error_ for handling such cases.

## Unofficial strict mode

The so-called _unofficial strict mode_ consists of the following options:

```
set -o errexit
set -o nounset
set -o pipefail
```

and is usually shortened to `set -euo pipefail`. It's probably the easiest way to start, but it doesn't provide much information about an error when it actually happens.

> **NOTE**: Even with the above flags Bash will ignore some problems, like a
> non-zero exit code from a subshell invoked inside a builtin command
> invocation. It's best to additionally check scripts with tools like
> [Shellcheck](https://www.shellcheck.net/) and review the [Bash Pitfalls
> wiki](https://mywiki.wooledge.org/BashPitfalls).

## Using traps

When everything else fails, a script can always be launched with `bash -x` or `set -o xtrace`, but that can be overkill for bigger solutions and might reveal confidential data from variables. Another approach is to define trap functions. They act as hooks and run only on POSIX signals or when some specific condition is met. See the example with the `ERR` trap condition below:

```shell
#!/usr/bin/env bash
set -euo pipefail

on_error() {
  declare exit_code=$?
  echo "Something went wrong!" >&2
  exit "$exit_code"
}
trap on_error ERR

ls /nonexistent
```

Upon execution it prints:

```
ls: cannot access '/nonexistent': No such file or directory
Something went wrong!
```

`ls` might not be the best example here, since it describes what went wrong by itself, but imagine checking for TCP port availability with OpenBSD `netcat`:

```
# ...
nc -zw 3 127.0.0.1 22
```

We wouldn't know that there was an error unless we printed the last exit code with `echo $?`. With the `ERR` trap a useful message appears without any further interaction and, if we also use `errexit`, the program stops.

There's an edge case in the above solution when we use functions. They don't inherit the `ERR` trap unless we use the `errtrace` option. Strict mode now looks like this: `set -eEuo pipefail`. It's usually enough for small scripts, but we have to guess why the program stopped if an error can appear in multiple places. `BASH_COMMAND` can be used to show exactly which command caused the script to fail:

```
#!/usr/bin/env bash
set -eEuo pipefail

on_error() {
  declare \
    cmd=$BASH_COMMAND \
    exit_code=$?
  echo "Failing with exit code ${exit_code} at ${1} in command: ${cmd}" >&2
  exit "$exit_code"
}
trap 'on_error ${BASH_SOURCE[0]}:${BASH_LINENO[0]}' ERR

function_a() {
  nc -zw 3 127.0.0.1 22
}

function_b() {
  function_a
}

function_b
```

It saves us some time when an error occurs without `xtrace`:

```
Failing with exit code 1 at /home/jakub/d/kb/other/t.sh:20 in command: nc -zw 3 127.0.0.1 22
```
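
Note that the `ERR` trap shares the blind spots of `errexit` mentioned in the note above. Below is a minimal sketch (my own illustration, not from the examples above) of one such case: a failing command substitution used inside a `declare` invocation, whose status is masked by `declare` itself returning zero.

```shell
#!/usr/bin/env bash
set -eEuo pipefail

trap 'echo "ERR trap fired for: ${BASH_COMMAND}" >&2' ERR

# The command substitution fails, but the exit status of `declare` itself
# is 0, so neither errexit nor the ERR trap notices the problem.
declare masked="$(false)"
echo "Still running, masked='${masked}'"

# Splitting declaration and assignment keeps the failing status visible:
# this line fires the trap and stops the script.
declare unmasked
unmasked="$(false)"
echo "Never reached"
```

Shellcheck reports the masking pattern as SC2155, which is one more reason to keep it in the toolbox.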
## Printing call traces

Additionally Bash provides some diagnostic variables defined as arrays, which can be used to locate where an error happened:

- `BASH_SOURCE` - Source file of the currently executed function.
  It's not always available, since scripts can be sourced from arbitrary
  file descriptors.
- `FUNCNAME` - Currently executed function name.
- `BASH_LINENO` - Line number in the source file.

They can be iterated to generate a call trace:

```shell
#!/usr/bin/env bash
set -eEuo pipefail

on_error() {
  declare \
    cmd=$BASH_COMMAND \
    exit_code=$? \
    i=0 \
    end="${#FUNCNAME[@]}" \
    next
  end=$((end - 1))
  echo "Failing with exit code ${exit_code} in command: ${cmd}" >&2
  while [ "$i" != "$end" ]; do
    next=$((i + 1))
    echo "  ${BASH_SOURCE["$next"]:-}:${BASH_LINENO["$i"]}:${FUNCNAME["$next"]}" >&2
    i=$next
  done
  exit "$exit_code"
}
trap on_error ERR

# Source wrapped_ls function
source ./t2.sh

function_a() {
  nc -zw 3 127.0.0.1 22
}

function_b() {
  wrapped_ls /asdf
}

function_b
```

Running prints:

```
$ ./t.sh
ls: cannot access '/asdf': No such file or directory
Failing with exit code 2 in command: ls "$1"
  ./t2.sh:2:wrapped_ls
  ./t.sh:31:function_b
  ./t.sh:34:main
```

There's one problem with this approach. Bash allows throwing a custom error message when a variable is null or not set, with the syntax `${varname:?"message"}`. It does end the program, but it doesn't trigger the `ERR` trap. To handle this properly we need to replace the `ERR` trap with `EXIT`. It runs unconditionally when the script ends, so the exit code needs to be checked additionally. We no longer need the `errtrace` (`-E`) option in this case.

```shell
#!/usr/bin/env bash
set -euo pipefail

on_exit() {
  declare \
    cmd=$BASH_COMMAND \
    exit_code=$? \
    i=0 \
    end="${#FUNCNAME[@]}" \
    next
  if [ "$exit_code" = 0 ]; then
    return 0
  fi
  end=$((end - 1))
  echo "Failing with exit code ${exit_code} in command: ${cmd}" >&2
  while [ "$i" != "$end" ]; do
    next=$((i + 1))
    echo "  ${BASH_SOURCE["$next"]:-}:${BASH_LINENO["$i"]}:${FUNCNAME["$next"]}" >&2
    i=$next
  done
  exit "$exit_code"
}
trap on_exit EXIT

# Source wrapped_ls function
source ./t2.sh

function_a() {
  nc -zw 3 127.0.0.1 22
}

function_b() {
  # Force exit with undefined variable.
  : "${dgfdfg:?}"
  wrapped_ls /asdf
}

function_b
```

## Handling non-zero exit codes

It's usually not an issue, since commands used in conditionals are excluded from `errexit` and the `ERR` trap. The difficult cases are when we want to use functions or pipe commands.

### In functions

It's tempting to use a function directly in a conditional. Both the full `if` syntax and the short `&&`/`||` form turn off `errexit` and the `ERR` trap, and they turn them off for the *whole call tree*! It means that even nested function calls won't properly terminate on error. This is almost never the desired behaviour, so instead we have to wrap the function call in a subshell. It's a highly suboptimal solution, since a new process gets spawned, but it's still better than not handling errors at all.

```shell
#!/usr/bin/env bash
set -euo pipefail

on_exit() {
  # ... same as in the previous example
  :
}
trap on_exit EXIT

get_exit_code() {
  set +e
  (
    set -e
    "$@"
  )
  echo "$?"
  set -e
}

function_a() {
  nc -zw 1 127.0.0.1 80
  return 0
}

function_b() {
  function_a
  return 0
}

if function_b; then
  echo "Expected non-zero!"
fi

declare i
i=$(get_exit_code function_b)
if [ "$i" = 0 ]; then
  echo "Expected non-zero, but got: ${i}"
fi
```

Running gives:

```
$ bash t.sh
Expected non-zero!
```

assuming port 80 is not listening.

### In pipelines

One way to handle non-standard error codes in pipelines is to wrap invocations in functions and use the builtin `return` to signal an actual error. Defining a new function can be skipped if a `{ }` command list or a short conditional is used instead.

```shell
# ...
nc -zw 1 127.0.0.1 80 || [ "$?" = 1 ] \
  | cat

{
  declare i=0
  nc -zw 1 127.0.0.1 80 || i=$?
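  # "i" now holds nc's exit status; the "|| i=$?" assignment captures it
  # without letting errexit abort the command group.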
echo "Returned ${i}" if [ "$i" != 1 ]; then exit 1 fi } \ | cat ``` Running above gives: ``` # bash t.sh Returned 1 ```