2021-11-18 Low effort website monitoring¶
While the laziest website monitoring implementation would probably be to check it in cron with curl every few minutes and send a message on outage, I'm going to present a slightly more sophisticated solution to make it a little more usable. Note that the idea below probably isn't the best for the long term, but often any solution is better than no solution at all.
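For reference, that lazy baseline can be as small as a single crontab entry, roughly like the sketch below (the URL, the operator address and the mail command are placeholder assumptions; mail also needs a working local MTA):
# Check every 5 minutes; e-mail the operator only when curl reports a failure.
*/5 * * * * curl --fail --silent --show-error --max-time 5 https://app.local/ >/dev/null 2>/tmp/check-app.err || mail -s "app.local check failed" operator@example.com < /tmp/check-app.err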
Requirements¶
By operator I mean the person responsible for service reliability. Ideally that would be someone involved in website development or the hosting maintainer.
Protip: Notifications are the most effective when the receiver knows how to react to them.
The operator must be notified about an outage.
Monitoring must not rely on the website hosting's availability. I'm talking about single-node deployments here. The operator must be informed not only when a check failed, but also when a check wasn't performed.
Monitoring must provide a reason for the notification. Messages like "It's not working" aren't very insightful.
The implementation must be simple, lightweight and cheap - preferably open source and allowing self-hosting.
Incremental development¶
Let's assume that we're dealing with a website hosted on a cheap GNU/Linux VPS instance, consisting of an application exposed via HTTPS. It makes sense to check liveness at a minimum. For now we don't care whether our application performs well; availability is our only concern.
Basic HTTP service check with notification¶
To keep things simple and increase reliability we will keep the service checking and reporting/notifying logic separate.
Service checking - Since a shell is usually available out of the box, we will build service checks using widely available command-line tools.
Reporting/notifying - This is a bit harder. Keep in mind that we also need notifications when a service check hasn't finished on time, so hosting anything on the same instance as the application will not be sufficient. Using an external service might introduce vendor lock-in, but we want it cheap. Fortunately there's some middle ground like Healthchecks - an open source monitoring platform, also hosted as a service with a free subscription tier.
Using Healthchecks means that we will need to notify an external service about service check results. curl will do the job here. Let's start with a minimal script:
#!/usr/bin/env bash
CURL_OPTIONS="--max-time 5 --retry 3 --fail --silent --show-error"
SERVICE_URL=https://app.local/
HEALTCHECK_URL=https://hc-ping.com/82583301-bd04-4028-9695-fc5473abc15d
curl $CURL_OPTIONS "$SERVICE_URL"
curl $CURL_OPTIONS "${HEALTCHECK_URL}/${?}"
The above script will send its status code in a ping to Healthchecks, where we should configure notifications (the Telegram integration works well) and a schedule - it's crucial to receive an alert when our service is not sending any pings at all, as that may mean the whole VPS is down. Yet we still don't have any context about what exactly went wrong. Let's use Bash strict mode and change the script to send the curl error text in the ping:
#!/usr/bin/env bash
set -euo pipefail

CURL_OPTIONS="--max-time 5 --retry 3 --fail --silent --show-error"
SERVICE_URL=https://app.local/
HEALTCHECK_URL=https://hc-ping.com/82583301-bd04-4028-9695-fc5473abc15d
RESULT=""

# Runs on every exit; "$?" here is the script's exit status and RESULT carries
# the captured curl error text (if any).
on_exit() {
    curl $CURL_OPTIONS --data-raw "$RESULT" "${HEALTCHECK_URL}/${?}" >/dev/null
}

main () {
    trap on_exit EXIT
    # Capture stderr (the error message) and discard the response body.
    RESULT=$(curl $CURL_OPTIONS "$SERVICE_URL" 2>&1 1>/dev/null)
}

main "$@"
Now the operator will be notified about possible outages, and the alert will include some context about what exactly went wrong, for example:
curl: (7) Failed to connect to app.local port 443: Connection refused
Service is available, but does it perform well?¶
While performance can be measured in many ways, we will focus on one simple metric: response time. curl can measure it out of the box, but we will need to invoke it differently and format the output message:
# on_exit version supporting a custom EXIT_CODE
on_exit() {
    local exit_code=$?
    # A check may force a non-zero status even when the script itself succeeded.
    if [ -n "${EXIT_CODE:-}" ]; then
        exit_code=$EXIT_CODE
    fi
    curl $CURL_OPTIONS --data-raw "$RESULT" "${HEALTCHECK_URL}/${exit_code}" >/dev/null
    exit "$exit_code"
}

# Alert if curl failed or the response takes longer than 2 seconds
CHECK_HTTP_TIMEOUT=2

main () {
    trap on_exit EXIT
    # %{time_total} appends the measured response time; stderr carries any curl error.
    RESULT="Response time: $(curl $CURL_OPTIONS "$SERVICE_URL" -o /dev/null --write-out '%{time_total}' 2>&1)"
    local time_total
    time_total=$(echo "$RESULT" | rev | cut -d ' ' -f 1 | rev)
    echo "$time_total"
    if [ "$(echo "${time_total} > ${CHECK_HTTP_TIMEOUT}" | bc)" -eq 1 ]; then
        RESULT="${RESULT}"$'\n'"Too high response time!"
        EXIT_CODE=1
    fi
}
What about other metrics?¶
Healthchecks limits us to a 10-kilobyte payload for each ping, but remember that we want to keep it simple, so it's not much of a problem. Rearranging our script into self-descriptive functions will make it easy to include more checks:
# Trigger alert when used memory exceeds 95%
CHECK_MEMORY_THRESHOLD=95

check_memory() {
    # "exec 2>&1" makes any error from the pipeline land in RESULT as well.
    RESULT="${RESULT}"$'\n'"Memory usage: $(exec 2>&1; free | grep Mem | awk '{print $3/$2 * 100.0}')"
    local used
    used=$(echo "$RESULT" | tail -n 1 | rev | cut -d ' ' -f 1 | rev)
    if [ "$(echo "${used} > ${CHECK_MEMORY_THRESHOLD}" | bc)" -eq 1 ]; then
        RESULT="${RESULT}"$'\n'"Too high memory usage!"
        EXIT_CODE=1
    fi
}

main () {
    trap on_exit EXIT
    check_http
    check_memory
}
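The check_http referenced above is just the HTTP and response-time logic from the previous section moved into its own function; a minimal sketch, assuming the inlined code is extracted unchanged:
# Alert if curl failed or the response takes longer than CHECK_HTTP_TIMEOUT seconds
check_http() {
    RESULT="Response time: $(curl $CURL_OPTIONS "$SERVICE_URL" -o /dev/null --write-out '%{time_total}' 2>&1)"
    local time_total
    time_total=$(echo "$RESULT" | rev | cut -d ' ' -f 1 | rev)
    if [ "$(echo "${time_total} > ${CHECK_HTTP_TIMEOUT}" | bc)" -eq 1 ]; then
        RESULT="${RESULT}"$'\n'"Too high response time!"
        EXIT_CODE=1
    fi
}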
Observing the SSL certificate expiration date can be enclosed in a single Bash function as well:
CHECK_SSL_DAYS=90

check_ssl() {
    local host
    host=$(echo "$SERVICE_URL" | cut -d '/' -f 3)
    RESULT="${RESULT}"$'\n'"SSL certificate expiration date: $(
        exec 2>&1; echo |
        openssl s_client -servername "$host" -connect "${host}:443" 2>/dev/null |
        openssl x509 -noout -enddate |
        cut -d '=' -f 2
    )"
    local end_seconds
    end_seconds=$((CHECK_SSL_DAYS * 24 * 60 * 60))
    # "openssl x509 -checkend" fails when the certificate expires within the given window.
    if ! echo |
        openssl s_client -servername "$host" -connect "${host}:443" 2>/dev/null |
        openssl x509 -noout -checkend "$end_seconds" >/dev/null; then
        RESULT="${RESULT}"$'\n'"Certificate will expire soon!"
        EXIT_CODE=1
    fi
}
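With all three checks written as functions, main presumably only grows by one more call; a sketch:
main () {
    trap on_exit EXIT
    check_http
    check_memory
    check_ssl
}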
Conclusion¶
It took us ~100 lines of Bash to check:
Service availability via HTTP
Service response time
Service SSL certificate expiration date
Instance memory usage
We can run this script every minute from cron:
* * * * * /opt/low-effort-monitoring.sh
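If a run can occasionally take longer than a minute, a slightly more defensive entry can prevent overlapping executions and keep a local log; a sketch assuming flock(1) from util-linux is installed and the log path is writable by the cron user:
# Skip a run instead of piling up when the previous check is still going.
* * * * * flock -n /tmp/low-effort-monitoring.lock /opt/low-effort-monitoring.sh >> /var/log/low-effort-monitoring.log 2>&1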
Of course it's not the most beautiful or performance-oriented way of monitoring a website, but it serves the purpose. No special software or planning is required here, so it may be a great ad-hoc solution while you plan a long-term monitoring system like Icinga2, Prometheus or Zabbix.