Jakski's blog

Error handling in Bash

2023-01-03

Undesired behaviour - an error, can manifest in various ways. In most high-level programming languages errors are represented as dedicated objects, allowing to handle them without interrupting program execution. They usually require developer to explicitly mark bad program flows by throwing or raising problem description. Bash don't provide any special syntax for this. Instead it allows to hook custom actions when non-zero exit code from command occurs. It's not universally treated as an error, since some applications use more than one exit code to signal success. Yet Bash contains some builtin commands and options named after error for handling such cases.

Unofficial strict mode

So called unofficial strict mode consist of the following options:

set -o errexit
set -o nounset
set -o pipefail

and is usually shortened to set -euo pipefail. It's probably the easiest for starter, but it doesn't provide much information about error when it actually happens.

NOTE: Even with above flags Bash will ignore some problems like non-zero exit code from subshell invoked in builtin command invocation. It's the best to always check scripts additionally with tools like Shellcheck and review Bash Pitfalls wiki.

Using traps

When everything else fails, script can be always launched with bash -x or set -o xtrace, but it can be overkill for bigger solutions and might reveal confidential data from variables. Another approach requires to define trap functions. They act as hooks and run only on POSIX signals or when some specific condition is met. See example with ERR trap signal below:

#!/usr/bin/env bash

set -euo pipefail

on_error() {
    declare exit_code=$?
    echo "Something went wrong!" >&2
    exit "$exit_code"
}

trap on_error ERR

ls /nonexistent

Upon execution it prints:

ls: cannot access '/nonexistent': No such file or directory
Something went wrong!

ls might not be the best example here, since it describes what went wrong by itself, but imagine checking for TCP port availability with OpenBSD netcat:

# ...

nc -zw 3 127.0.0.1 22

We wouldn't know that there was error, unless we printed last exit code echo $?. With ERR trap useful message appears without any further interaction and program stops, if we also use errexit.

There's edge case in above solution when we use functions. They don't inherit ERR trap, unless we use errtrace option. Now strict mode looks like this set -eEuo pipefail.

It's usually enough for small scripts, but we have to guess why program stopped, if error can appear in multiple places. BASH_COMMAND can be used to show exactly which command caused script to fail:

#!/usr/bin/env bash

set -eEuo pipefail

on_error() {
    declare \
        cmd=$BASH_COMMAND \
        exit_code=$?
    echo "Failing with exit code ${exit_code} at ${*} in command: ${cmd}" >&2
    exit "$exit_code"
}

trap on_error ERR

function_a() {
    nc -zw 3 127.0.0.1 22
}

function_b() {
    function_a
}

function_b

It save us some time when error occurs without xtrace:

Failing with exit code 1 at  in command: nc -zw 3 127.0.0.1 22

Printing call traces

Additionally Bash provides some diagnostic variables defined as arrays, which can be used to locate where error happened:

BASH_SOURCE - Source file of currently executed function. It's not always available, since scripts can be sourced from arbitrary file descriptors.
FUNCNAME - Currently executed function name.
BASH_LINENO - Line number from source file.

They can be iterated to generate a call trace:

#!/usr/bin/env bash

set -eEuo pipefail

on_error() {
    declare \
        cmd=$BASH_COMMAND \
        exit_code=$? \
        i=0 \
        end="${#FUNCNAME[@]}" \
        next
    end=$((end - 1))
    echo "Failing with exit code ${exit_code} in command: ${cmd}" >&2
    while [ "$i" != "$end" ]; do
        next=$((i + 1))
        echo "	${BASH_SOURCE["$next"]:-}:${BASH_LINENO["$i"]}:${FUNCNAME["$next"]}" >&2
        i=$next
    done
    exit "$exit_code"
}

trap on_error ERR
# Source wrapped_ls function
source ./t2.sh

function_a() {
    nc -zw 3 127.0.0.1 22
}

function_b() {
    wrapped_ls /asdf
}

function_b

Running prints:

$ ./t.sh
ls: cannot access '/asdf': No such file or directory
Failing with exit code 2 in command: ls "$1"
  ./t2.sh:2:wrapped_ls
  ./t.sh:31:function_b
  ./t.sh:34:main

There's one problem with this approach. Bash allows to throw custom error message, when variable is null or not set with syntax ${varname:?"message"}. It does end program, but it doesn't trigger ERR trap. To handle this properly we need to replace ERR trap with EXIT. It runs unconditionally when script ends, so exit code needs to be additionally checked. We no longer need errtrace(-E) option in this case.

#!/usr/bin/env bash

set -euo pipefail

on_exit() {
    declare \
        cmd=$BASH_COMMAND \
        exit_code=$? \
        i=0 \
        end="${#FUNCNAME[@]}" \
        next
    if [ "$exit_code" = 0 ]; then
        return 0
    fi
    end=$((end - 1))
    echo "Failing with exit code ${exit_code} in command: ${cmd}" >&2
    while [ "$i" != "$end" ]; do
        next=$((i + 1))
        echo "	${BASH_SOURCE["$next"]:-}:${BASH_LINENO["$i"]}:${FUNCNAME["$next"]}" >&2
        i=$next
    done
    exit "$exit_code"
}

trap on_exit EXIT
# Source wrapped_ls function
source ./t2.sh

function_a() {
    nc -zw 3 127.0.0.1 22
}

function_b() {
    # Force exit with undefined variable.
    : "${dgfdfg:?}"
    wrapped_ls /asdf
}

function_b

Handling non-zero exit codes

It's usually not the issue, since commands used in conditionals are excluded from errexit and ERR trap. Difficult cases are when we want to use functions or pipe commands.

In functions

It's tempting to use function directly in conditional. Both full syntax and &&/|| turn off errexit and ERR trap. They turn them off globally! It means that even nested function calls won't properly terminate on error. This is almost never a desired behaviour, so instead we have to wrap function call in subshell. It's highly suboptimal solution, since a new process gets spawned, but it's still better than not handling errors at all.

#!/usr/bin/env bash

set -euo pipefail

on_exit() {
    # ...
}

trap on_exit EXIT

get_exit_code() {
    set +e
    (
        set -e
        "$@"
    )
    echo "$?"
    set -e
}

function_a() {
    nc -zw 1 127.0.0.1 80
    return 0
}

function_b() {
    function_a
    return 0
}

if function_b; then
    echo "Expected non-zero!"
fi

declare i
i=$(get_exit_code function_b)
if [ "$i" = 0 ]; then
    echo "Expected non-zero, but got: ${i}"
fi

Running gives:

$ bash t.sh
Expected non-zero!

assuming port 80 is not listening.

In pipelines

One way to handle non-standard error codes in pipelines is to wrap invocations in functions and use builtin return to signal actual error. Defining a new function can be skipped, if command list is used with syntax { } or if short conditional is used.

# ...

nc -zw 1 127.0.0.1 80 || [ "$?" = 1 ] \
    | cat

{
    declare i=0
    nc -zw 1 127.0.0.1 80 || i=$?
    echo "Returned ${i}"
    if [ "$i" != 1 ]; then
        exit 1
    fi
} \
    | cat

Running above gives:

# bash t.sh
Returned 1