Automating Linux Server Health Checks with a Bash Script

Automating Linux server health and performance checks, for example with a Bash script, is a crucial task for system administrators and operations engineers. Regular checks help ensure that systems are running optimally, hardware issues are identified early, and potential problems are addressed before they escalate. However, performing these checks manually is time-consuming and error-prone, especially when managing multiple servers.

Automation provides an efficient solution. In this article, we’ll show you how to use a simple yet powerful Bash script to automate essential server health checks. This script gathers critical information such as system details, CPU and memory usage, disk space, network activity, and more, providing a comprehensive snapshot of your server’s status in one execution.

Whether you’re troubleshooting issues, preparing for audits, or simply want peace of mind knowing your Linux server is in good shape, this script can save you time and effort. Read on to learn how it works and how you can customize it for your specific needs.

Purpose of the Script

Having quick and reliable access to system information is crucial for maintaining server health and performance. This Bash script is designed to automate the collection of essential system metrics, such as CPU usage, memory usage, disk space, and network information.

What Information Is Collected

CPU Information:
- Model, cores, and speed.

lscpu
cat /proc/cpuinfo

RAM Memory Information:
- Total and available RAM.

free -h

CPU Usage (1, 5, 15 Minutes):
- Load averages.

uptime
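
A load average is easier to interpret relative to the number of CPU cores. The following is a minimal sketch and not part of the original script; the helper name load_per_core is hypothetical. It assumes a Linux system where /proc/loadavg and nproc are available, reads the 1-minute average, and divides it by the core count.

```shell
#!/bin/bash
# Sketch: express the 1-minute load average per CPU core.
# load_per_core is a hypothetical helper name used only for illustration.
load_per_core() {
    local load="$1" cores="$2"
    # awk performs the floating-point division that Bash cannot do natively
    awk -v l="$load" -v c="$cores" 'BEGIN { printf "%.2f\n", l / c }'
}

# The first three fields of /proc/loadavg are the 1, 5 and 15 minute averages
read -r load1 _ < /proc/loadavg
echo "Load per core (1 min): $(load_per_core "$load1" "$(nproc)")"
```

As a rough rule of thumb, a value that stays above 1.00 suggests that processes are queuing for CPU time or waiting on I/O.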

RAM Memory Usage:
- Detailed RAM usage.

free -h
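
When a single number is more useful than the full table, the same data can be read from /proc/meminfo. This is a sketch under the assumption that the kernel exposes the MemAvailable field (present on current Linux kernels); mem_used_percent is a hypothetical helper, not part of the original script.

```shell
#!/bin/bash
# mem_used_percent is a hypothetical helper: it expects /proc/meminfo-style
# text on stdin and prints the share of RAM that is not available, in percent.
mem_used_percent() {
    awk '/^MemTotal:/     { total = $2 }
         /^MemAvailable:/ { avail = $2 }
         END { if (total > 0) printf "%d\n", (total - avail) * 100 / total }'
}

echo "RAM in use: $(mem_used_percent < /proc/meminfo)%"
```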

HDD Usage (Capacity, Usage, Free):
- Disk space usage.

df -h
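
For an automated check it can help to report only the filesystems that are nearly full. The sketch below is one possible approach, not the article's script: disk_over_threshold is a hypothetical helper, the 90 percent limit is an arbitrary example, and the parsing assumes mount points without spaces. It uses df -P, which prints one filesystem per line in the POSIX output format.

```shell
#!/bin/bash
# disk_over_threshold is a hypothetical helper: it reads `df -P` style output
# on stdin and prints every mount point whose usage is at or above the limit.
disk_over_threshold() {
    local limit="$1"
    awk -v limit="$limit" 'NR > 1 {
        use = $5
        sub(/%/, "", use)              # strip the percent sign from "95%"
        if (use + 0 >= limit) print $6 " is at " $5
    }'
}

# Example: list filesystems that are at least 90% full (threshold is an assumption)
df -P | disk_over_threshold 90
```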

Network Information:
- IP addresses and interface status.

ip a
# or, on systems with the legacy net-tools package installed
ifconfig

Additional Information to Consider

Process Information:
- List top memory and CPU consuming processes.

top -b -n 1 | head -n 20

Swap Usage:
- Swap space details.

swapon --show
free -h

Temperature Monitoring (not available on all servers):
- Requires the lm-sensors package.
- CPU and system temperatures (if sensors are available).

# Install package
sudo apt install lm-sensors

# Check the output
sensors
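
Because the sensors command is optional, a report script may want to skip it cleanly when it is missing. The following is a minimal sketch; run_if_available is a hypothetical helper name and the message text is only an example.

```shell
#!/bin/bash
# run_if_available is a hypothetical helper: it runs the given command only
# if it is found in PATH and prints a short notice otherwise.
run_if_available() {
    local cmd="$1"
    if command -v "$cmd" >/dev/null 2>&1; then
        "$@"
    else
        echo "$cmd: not installed, skipping"
    fi
}

run_if_available sensors
```

The same wrapper could be applied to other optional tools in the report.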

Service Status:
- Status of critical services (e.g., web server, database).

systemctl status <service_name>
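
The full status output is verbose when several services are checked at once. As a sketch, the loop below prints one line per service using systemctl is-active; check_services is a hypothetical helper, and the unit names ssh and cron are examples that may differ on your distribution.

```shell
#!/bin/bash
# check_services is a hypothetical helper: it prints one status line for
# each systemd unit name passed as an argument.
check_services() {
    local svc
    for svc in "$@"; do
        # is-active --quiet returns exit status 0 only when the unit is active
        if systemctl is-active --quiet "$svc" 2>/dev/null; then
            echo "$svc: active"
        else
            echo "$svc: NOT active"
        fi
    done
}

# Example unit names (assumptions); replace them with your critical services
check_services ssh cron
```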

Network Connections:
- Current network connections and listening ports.

netstat -tuln # or ss -tuln

Uptime and System Load:
- Detailed uptime and load averages.

uptime

Filesystem Inode Usage:
- Inode usage for each mounted filesystem.

df -i

Dmesg Logs:
- Recent kernel logs for hardware and driver messages.

dmesg | tail -n 20

User Logins:
- Current user sessions and recent logins.

who    # users currently logged in
last   # recent logins

Available Updates:
- Packages with pending updates (Debian/Ubuntu).

apt list --upgradable
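
If only the number of pending updates is needed, the output can be counted. This sketch assumes the usual apt output, in which each upgradable package line contains the text "upgradable from:"; count_upgradable is a hypothetical helper. apt prints a warning on stderr about its unstable CLI when used in scripts, so stderr is discarded here.

```shell
#!/bin/bash
# count_upgradable is a hypothetical helper: it reads `apt list --upgradable`
# output on stdin and prints the number of upgradable packages.
count_upgradable() {
    grep -c 'upgradable from:'
}

# Only run on systems that actually provide apt
if command -v apt >/dev/null 2>&1; then
    echo "Pending updates: $(apt list --upgradable 2>/dev/null | count_upgradable)"
fi
```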

Script for Health Check

Save the script:

nano sysreport.sh

#!/bin/bash

echo "==== System Information ===="
echo "Hostname: $(hostname)"
echo "Date and Time: $(date)"
echo

echo "==== CPU Information ===="
lscpu | grep -E '^Model name|^CPU\(s\):|^Thread|^Core|^Socket|^NUMA|^CPU MHz|^Architecture'
echo

echo "==== RAM Information ===="
free -h
echo

echo "==== CPU Load Averages ===="
uptime
echo

echo "==== Disk Usage ===="
df -h
echo

echo "==== Swap Usage ===="
swapon --show
echo

echo "==== Network Interfaces ===="
ip a
echo

echo "==== Top Processes ===="
top -b -n 1 | head -n 20
echo

echo "==== Current Network Connections ===="
ss -tuln
echo

echo "==== Temperature ===="
sensors
echo

echo "==== Service Status (e.g., ssh) ===="
systemctl status ssh --no-pager
echo

echo "==== Recent Kernel Messages ===="
dmesg | tail -n 20
echo

echo "==== Recent Logins ===="
last -n 5
echo

echo "==== Available Updates ===="
apt list --upgradable
echo

Make it executable:

chmod +x sysreport.sh

Execute the script:

./sysreport.sh

Running the System Report Daily

If you want a daily system report, use the following version of the script together with a crontab entry. It writes the output of each run to a dated log file:

nano sysreport.sh

#!/bin/bash

LOG_DIR="/opt/scripts/log"
LOG_FILE="$LOG_DIR/sysreport_$(date +%F).log"

# Ensure the log directory exists
mkdir -p "$LOG_DIR"

exec > "$LOG_FILE" 2>&1

echo "==== System Information ===="
echo "Hostname: $(hostname)"
echo "Date and Time: $(date)"
echo

echo "==== CPU Information ===="
lscpu | grep -E '^Model name|^CPU\(s\):|^Thread|^Core|^Socket|^NUMA|^CPU MHz|^Architecture'
echo

echo "==== RAM Information ===="
free -h
echo

echo "==== CPU Load Averages ===="
uptime
echo

echo "==== Disk Usage ===="
df -h
echo

echo "==== Swap Usage ===="
swapon --show
echo

echo "==== Network Interfaces ===="
ip a
echo

echo "==== Top Processes ===="
top -b -n 1 | head -n 20
echo

echo "==== Current Network Connections ===="
ss -tuln
echo

echo "==== Temperature ===="
sensors
echo

echo "==== Service Status (e.g., ssh) ===="
systemctl status ssh --no-pager
echo

echo "==== Recent Kernel Messages ===="
dmesg | tail -n 20
echo

echo "==== Recent Logins ===="
last -n 5
echo

echo "==== Available Updates ===="
apt list --upgradable
echo

Make it executable: