Best practices for bash scripting: a quick guide to reliable and performant scripts






Debugging bash scripts is like looking for a needle in a haystack, especially when new additions appear in an existing codebase without timely consideration of structure, logging and reliability issues. You can find yourself in such situations both because of your own mistakes and when managing complex jumbles of scripts.



The Mail.ru Cloud Solutions team has translated an article with guidelines that will help you write, debug, and maintain your scripts better. Believe it or not, nothing beats the satisfaction of writing clean, ready-to-use bash code that works every time.



In this article, the author shares what he has learned over the past few years, as well as some common mistakes that have caught him off guard. This matters because every software developer, at some point in their career, works with scripts to automate routine tasks.



Trap handlers



Most bash scripts I've come across never use an effective cleanup mechanism for when something unexpected happens during script execution.



Unexpected things can arise from the outside, for example, receiving a signal from the kernel. Handling such cases is extremely important to ensure that scripts are robust enough to run on production systems. I often use exit handlers to respond to scenarios like this:



function handle_exit() {
  # Add cleanup code here,
  # e.g. remove a lock file, then exit with an appropriate status code
  rm -f "/tmp/${lock_file}.lock"
}

# trap <HANDLER_FXN> <LIST OF SIGNALS TO TRAP>
trap handle_exit 0 SIGHUP SIGINT SIGQUIT SIGABRT SIGTERM


trap is a built-in shell command that lets you register a cleanup function to be called when any of the listed signals are received. However, special care should be taken with handlers for signals such as SIGINT, which interrupt the script.



Also, in most cases it is enough to catch only EXIT, but the idea is that you can actually customize the script's behavior for each individual signal.
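For example, here is a minimal sketch, with illustrative handler names and cleanup steps, of how the reaction to SIGINT can differ from the generic exit cleanup:


function cleanup_on_exit() {
  rm -f "/tmp/${lock_file:-}.lock"
}

function handle_sigint() {
  echo "Interrupted (SIGINT), cleaning up..." >&2
  exit 130  # conventional exit code for termination by SIGINT
}

trap cleanup_on_exit EXIT    # generic cleanup on any exit
trap handle_sigint SIGINT    # custom behavior just for Ctrl+C


Since the SIGINT handler calls exit, the EXIT handler still runs afterwards, so the cleanup logic lives in one place.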



The set builtin: quick exit on error



It is very important to react to errors as soon as they occur and to stop execution quickly. Nothing could be worse than continuing with a command like this:



rm -rf ${directory_name}/*


Please note that the variable directory_name is undefined; when it expands to an empty string, the command becomes rm -rf /*, which is catastrophic.



To handle such scenarios, it is important to use the set builtin options, such as set -o errexit, set -o pipefail, or set -o nounset, at the beginning of the script. These options ensure that your script exits as soon as it encounters a non-zero exit code, an undefined variable, a failed command within a pipeline, and so on:



#!/usr/bin/env bash

set -o errexit
set -o nounset
set -o pipefail

function print_var() {
  echo "${var_value}"
}

print_var

$ ./sample.sh
./sample.sh: line 8: var_value: unbound variable


Note: options such as set -o errexit will exit the script as soon as an unhandled non-zero return code appears. Therefore, it is better to introduce custom error handling, like:



#!/bin/bash
error_exit() {
  line=$1
  shift 1
  echo "ERROR: non zero return code from line: $line -- $@"
  exit 1
}
a=0
let a++ || error_exit "$LINENO" "let operation returned non 0 code"
echo "you will never see me"
# run it, now we have useful debugging output
$ bash foo.sh
ERROR: non zero return code from line: 9 -- let operation returned non 0 code


Scripting like this forces you to be more careful about the behavior of every command in the script and to anticipate the possibility of an error before it catches you by surprise. (In the example above, let a++ returns a non-zero status because the post-increment expression evaluates to 0, which is exactly what the handler reports.)
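The flip side is that under set -o errexit you have to explicitly mark the commands that are allowed to fail. A small sketch of the two patterns I reach for, with placeholder commands:


#!/usr/bin/env bash
set -o errexit -o nounset -o pipefail

# A command that may legitimately fail: neutralize errexit with `|| true`
grep -q "some pattern" /etc/hosts || true

# Or capture the status explicitly and branch on it
if ! listing="$(ls /nonexistent 2>&1)"; then
  echo "ls failed: ${listing}" >&2
fi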



ShellCheck for catching errors during development



It's worth integrating something like ShellCheck into your development and test pipelines to validate your bash code for best practices.



I use it in my local development environments to get reports on syntax, semantics, and some code errors that I might have missed during development. It is a static analysis tool for your bash scripts and I highly recommend using it.
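A couple of typical invocations (the file names are just examples):


# Check a single script locally
shellcheck sample.sh

# Also follow files pulled in via source, handy for the modular layout described below
shellcheck -x framework/daily_database_operation.sh


ShellCheck exits with a non-zero status when it finds problems, so the very same command works as a gate in a CI pipeline.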



Using your exit codes



POSIX return codes are not just zero or one: they are zero or non-zero. Take advantage of this and return custom error codes (for example, in the 201-254 range) for different error cases.



This information can then be used by other scripts that wrap yours to understand exactly what type of error occurred and react accordingly:



#!/usr/bin/env bash

SUCCESS=0
FILE_NOT_FOUND=240
DOWNLOAD_FAILED=241

function read_file() {
  # ${file_not_found} is expected to hold "true" or "false"
  if "${file_not_found:-false}"; then
    return "${FILE_NOT_FOUND}"
  fi
  return "${SUCCESS}"
}


Note: Please be especially careful with the variable names you define to avoid accidentally overriding environment variables.
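For example, a wrapper script could react to the codes defined above like this (read-data.sh is a hypothetical script that returns them):


#!/usr/bin/env bash

FILE_NOT_FOUND=240
DOWNLOAD_FAILED=241

./read-data.sh
case "$?" in
  0)                    echo "data read successfully" ;;
  "${FILE_NOT_FOUND}")  echo "input file missing, regenerating it" >&2 ;;
  "${DOWNLOAD_FAILED}") echo "download failed, will retry later" >&2 ;;
  *)                    echo "unknown error" >&2; exit 1 ;;
esac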



Logger functions



Nice, structured logging is important to easily understand the results of your script execution. As with other high-level programming languages, I always use my own logging functions in my bash scripts, such as __msg_info, __msg_error, and so on.



This helps to provide a standardized logging structure by making changes in only one place:



#!/usr/bin/env bash

function __msg_error() {
    [[ "${ERROR}" == "1" ]] && echo -e "[ERROR]: $*"
}

function __msg_debug() {
    [[ "${DEBUG}" == "1" ]] && echo -e "[DEBUG]: $*"
}

function __msg_info() {
    [[ "${INFO}" == "1" ]] && echo -e "[INFO]: $*"
}

__msg_error "File could not be found. Cannot proceed"

__msg_debug "Starting script execution with 276MB of available RAM"


I usually try to have some kind of __init mechanism in my scripts, where such logger variables and other system-wide variables are initialized or set to default values. These variables can also be set from command-line parameters during script invocation.



For example, something like:



$ ./run-script.sh --debug


When such a script is executed, it is guaranteed that the system-wide settings are either set to their defaults or at least initialized to something appropriate.



I usually base the choice of what to initialize (and what not to) on a trade-off between keeping the user interface simple and the configuration details the user can or should delve into.
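A minimal sketch of such an __init function, assuming the logger variables from the previous section and an illustrative pair of flags:


#!/usr/bin/env bash

function __init() {
  # Defaults; environment variables of the same name win if already set
  DEBUG="${DEBUG:-0}"
  INFO="${INFO:-1}"
  ERROR="${ERROR:-1}"

  # Command-line flags override the defaults
  while [[ $# -gt 0 ]]; do
    case "$1" in
      --debug) DEBUG=1 ;;
      --quiet) INFO=0 ;;
      *) echo "unknown option: $1" >&2; exit 2 ;;
    esac
    shift
  done
}

__init "$@"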



Architecture for reuse and clean system state



Modular / reusable code



├── framework
│   ├── common
│   │   ├── loggers.sh
│   │   ├── mail_reports.sh
│   │   └── slack_reports.sh
│   └── daily_database_operation.sh


I keep a separate repository that I can use to initialize a new bash project / script I want to develop. Anything that can be reused can be stored in the repository and retrieved in other projects that want to use this functionality. This organization of projects significantly reduces the size of other scripts and also ensures that the codebase is small and easily testable.



As in the example above, all the logging functions, such as __msg_info and __msg_error, along with other helpers such as Slack reports, are kept separately in common/* and are dynamically pulled into other scripts, such as daily_database_operation.sh.
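A sketch of how daily_database_operation.sh might pull the shared helpers in (the exact file contents are, of course, project-specific):


#!/usr/bin/env bash
# daily_database_operation.sh

script_dir="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"

source "${script_dir}/common/loggers.sh"
source "${script_dir}/common/slack_reports.sh"

__msg_info "Starting daily database operation"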



Leave a clean system behind



If you download any resources while the script is running, it is recommended to keep all such data in a shared directory with a random name, for example /tmp/AlRhYbD97/*. You can generate a random directory name like this:



rand_dir_name="$(tr -dc 'a-zA-Z0-9' < /dev/urandom | fold -w 16 | head -n 1)"


Upon completion of the work, the cleanup of such directories can be performed in the trap handlers discussed above. If you don't take care of deleting temporary directories, they accumulate and at some point cause unexpected problems on the host, such as a full disk.
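An alternative sketch that I find slightly safer: mktemp creates the random directory atomically, and the trap handler from the first section removes it on exit (the myscript prefix is arbitrary):


#!/usr/bin/env bash
set -o errexit -o nounset -o pipefail

# mktemp -d creates a unique directory and prints its name
work_dir="$(mktemp -d "/tmp/myscript.XXXXXX")"

function handle_exit() {
  rm -rf "${work_dir}"
}
trap handle_exit EXIT

echo "downloading artifacts into ${work_dir}..."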



Using lock files



Often, you need to ensure that only one instance of a script is running on a host at any given time. This can be done using lock files.



I usually create lock files in /tmp/project_name/*.lock and check for their presence at the beginning of the script. This helps the script terminate correctly and avoids unexpected changes to the system state by another copy running in parallel. Of course, lock files are not needed if you want the same script to run in parallel on a given host.
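A minimal sketch of that check, using noclobber so that testing for and creating the lock file happen in one atomic step (the project and lock file names are illustrative):


#!/usr/bin/env bash

lock_file="/tmp/project_name/daily_job.lock"
mkdir -p "$(dirname "${lock_file}")"

# noclobber makes `>` fail if the file already exists
if ! (set -o noclobber; echo "$$" > "${lock_file}") 2>/dev/null; then
  echo "Another instance is already running (lock: ${lock_file})" >&2
  exit 1
fi
trap 'rm -f "${lock_file}"' EXIT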



Measure and improve



We often have to work with scripts that run over a long period of time, such as daily database operations. Such operations usually include a sequence of steps: loading data, checking for anomalies, importing data, sending status reports, and so on.



In such cases, I always try to break the script into separate small scripts and report the status and execution time of each step. Wrapping the timed command in braces ensures that the output of the time keyword also ends up in the log:



{ time source "${filepath}" "${args}"; } >> "${LOG_DIR}/RUN_LOG" 2>&1


Later I can see the runtime with:



tac "${LOG_DIR}/RUN_LOG" | grep -m1 "real"


This helps me identify problematic or slow areas in scripts that need optimization.
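Putting it all together, the parent script can loop over the steps and time each one into the same log (the step script names and the LOG_DIR location here are hypothetical):


#!/usr/bin/env bash
set -o errexit -o nounset -o pipefail

LOG_DIR="/var/log/daily_db"
mkdir -p "${LOG_DIR}"

for step in load_data.sh check_anomalies.sh import_data.sh send_report.sh; do
  echo "=== ${step} ===" >> "${LOG_DIR}/RUN_LOG"
  { time source "./steps/${step}"; } >> "${LOG_DIR}/RUN_LOG" 2>&1
done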



Good luck!


