Automating Linux Server Health Checks with a Bash Script
Automating Linux server health and performance checks, for example with a Bash script, is an important task for system administrators and operations engineers. Regular checks help ensure that systems are running well, that hardware issues are identified early, and that potential problems are addressed before they escalate. Performing these checks manually, however, is time-consuming and error-prone, especially when managing multiple servers.
Automation provides an efficient solution. In this article, we’ll show you how to use a simple yet powerful Bash script to automate essential server health checks. This script gathers critical information such as system details, CPU and memory usage, disk space, network activity, and more, providing a comprehensive snapshot of your server’s status in one execution.
Whether you’re troubleshooting issues, preparing for audits, or simply want peace of mind knowing your Linux server is in good shape, this script can save you time and effort. Read on to learn how it works and how you can customize it for your specific needs.
Purpose of the Script
Having quick and reliable access to system information is crucial for maintaining server health and performance. This Bash script is designed to automate the collection of essential system metrics, such as CPU usage, memory usage, disk space, and network information.
What Information Is Collected
- CPU Information:
- Model, cores, and speed.
lscpu
cat /proc/cpuinfo
- RAM Memory Information:
- Total and available RAM.
free -h
- CPU Usage (1, 5, 15 Minutes):
- Load averages.
uptime
- RAM Memory Usage:
- Detailed RAM usage.
free -h
- HDD Usage (Capacity, Usage, Free):
- Disk space usage.
df -h
- Network Information:
- IP addresses and interface status.
ifconfig
# or
ip a
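The commands above print raw numbers; a health check usually also compares them with a limit. The following is a minimal sketch of that idea for disk usage, not part of the original script. The function name `disks_over_threshold` and the 90% limit are assumptions chosen for illustration. It reads `df -P` output (the POSIX format, which keeps each filesystem on one line) and assumes mount points contain no spaces.

```shell
#!/bin/bash
# Hypothetical helper: print mount points whose usage is at or above a limit.
# Reads `df -P`-style output on stdin, so it can also be fed canned data.
disks_over_threshold() {
    local threshold="$1"
    awk -v limit="$threshold" 'NR > 1 {
        use = $5
        sub(/%/, "", use)              # strip the percent sign from the Capacity column
        if (use + 0 >= limit) print $6 " " use "%"
    }'
}

# Example usage; the 90% limit is an arbitrary assumption
df -P | disks_over_threshold 90
```

If nothing is printed, no filesystem has reached the limit.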
Additional Information to Consider
- Process Information:
- List top memory and CPU consuming processes.
top -b -n 1 | head -n 20
- Swap Usage:
- Swap space details.
swapon --show
free -h
- Temperature Monitoring (will not work on all servers):
- CPU and system temperatures (if sensors are available).
- Requires the lm-sensors package.
# Install the package
sudo apt install lm-sensors
# Check the output
sensors
- Service Status:
- Status of critical services (e.g., web server, database).
systemctl status <service_name>
- Network Connections:
- Current network connections and listening ports.
netstat -tuln # or ss -tuln
- Uptime and System Load:
- Detailed uptime and load averages.
uptime
- Filesystem Inode Usage:
- Inode usage for each mounted filesystem.
df -i
- Dmesg Logs:
- Recent kernel logs for hardware and driver messages.
dmesg | tail -n 20
- User Logins:
- Current user sessions and recent login attempts.
who
# or
last
- Available Updates:
- List of packages with pending updates (Debian/Ubuntu).
apt list --upgradable
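Several of the tools listed above, such as `sensors`, are not installed on every server. One way to keep a report readable in that case is to check for a command before running it. The sketch below illustrates this; the helper name `run_if_available` is hypothetical and not part of the original script.

```shell
#!/bin/bash
# Hypothetical helper: run a command if it is installed, otherwise print a note.
run_if_available() {
    if command -v "$1" >/dev/null 2>&1; then
        "$@"
    else
        echo "Skipped: '$1' is not installed"
    fi
}

# Example usage with two of the commands from the list above
run_if_available sensors
run_if_available ss -tuln
```

`command -v` is a shell builtin, so the check itself needs no extra packages.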
Script for Health Check
Save the script:
nano sysreport.sh
#!/bin/bash
echo "==== System Information ===="
echo "Hostname: $(hostname)"
echo "Date and Time: $(date)"
echo
echo "==== CPU Information ===="
lscpu | grep -E '^Model name|^CPU\(s\):|^Thread|^Core|^Socket|^NUMA|^CPU MHz|^Architecture'
echo
echo "==== RAM Information ===="
free -h
echo
echo "==== CPU Load Averages ===="
uptime
echo
echo "==== Disk Usage ===="
df -h
echo
echo "==== Swap Usage ===="
swapon --show
echo
echo "==== Network Interfaces ===="
ip a
echo
echo "==== Top Processes ===="
top -b -n 1 | head -n 20
echo
echo "==== Current Network Connections ===="
ss -tuln
echo
echo "==== Temperature ===="
sensors
echo
echo "==== Service Status (e.g., ssh) ===="
systemctl status ssh --no-pager
echo
echo "==== Recent Kernel Messages ===="
dmesg | tail -n 20
echo
echo "==== Recent Logins ===="
last -n 5
echo
echo "==== Available Updates ===="
apt list --upgradable
echo
Make it executable:
chmod +x sysreport.sh
Execute the script:
./sysreport.sh
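The script repeats the same three steps for every section: print a header, run a command, print a blank line. If you extend it, a small helper can reduce that repetition. This is an optional sketch, and the function name `section` is an assumption rather than part of the script above.

```shell
#!/bin/bash
# Hypothetical helper: print a section header, run the given command,
# then print a blank line, as each block of the report script does.
section() {
    local title="$1"
    shift
    echo "==== $title ===="
    "$@"
    echo
}

# Example usage with two commands from the report
section "Date and Time" date
section "Disk Usage" df -h
```

Commands that use a pipe, such as `dmesg | tail -n 20`, do not fit this form directly and would need to be wrapped in their own function first.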
Running the System Report Daily
To run the system report every day, use the following version of the script, which writes its output to a dated log file, together with a cron job:
nano sysreport.sh
#!/bin/bash
LOG_DIR="/opt/scripts/log"
LOG_FILE="$LOG_DIR/sysreport_$(date +%F).log"
# Ensure the log directory exists
mkdir -p "$LOG_DIR"
exec > "$LOG_FILE" 2>&1
echo "==== System Information ===="
echo "Hostname: $(hostname)"
echo "Date and Time: $(date)"
echo
echo "==== CPU Information ===="
lscpu | grep -E '^Model name|^CPU\(s\):|^Thread|^Core|^Socket|^NUMA|^CPU MHz|^Architecture'
echo
echo "==== RAM Information ===="
free -h
echo
echo "==== CPU Load Averages ===="
uptime
echo
echo "==== Disk Usage ===="
df -h
echo
echo "==== Swap Usage ===="
swapon --show
echo
echo "==== Network Interfaces ===="
ip a
echo
echo "==== Top Processes ===="
top -b -n 1 | head -n 20
echo
echo "==== Current Network Connections ===="
ss -tuln
echo
echo "==== Temperature ===="
sensors
echo
echo "==== Service Status (e.g., ssh) ===="
systemctl status ssh --no-pager
echo
echo "==== Recent Kernel Messages ===="
dmesg | tail -n 20
echo
echo "==== Recent Logins ===="
last -n 5
echo
echo "==== Available Updates ===="
apt list --upgradable
echo
Make it executable:
chmod +x sysreport.sh
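Add the cron job (for example with crontab -e). Because the script writes its own log file, the cron entry only needs to call the script, without any output redirection. The entry below runs it every day at 06:00 and assumes the script is saved as /opt/scripts/sysreport.sh; adjust the path to wherever you saved it:
0 6 * * * /opt/scripts/sysreport.sh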
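With one log file per day, the log directory grows without bound. A possible cleanup step is sketched below, assuming the `sysreport_<date>.log` naming used by the script. The function name `prune_old_reports` and the 30-day retention period are assumptions, not part of the original setup.

```shell
#!/bin/bash
# Hypothetical helper: delete sysreport logs older than a given number of days.
# -maxdepth and -delete are extensions supported by GNU and BSD find.
prune_old_reports() {
    local dir="$1" days="$2"
    find "$dir" -maxdepth 1 -type f -name 'sysreport_*.log' -mtime +"$days" -delete
}

LOG_DIR="/opt/scripts/log"
# Only prune when the directory exists; 30 days is an arbitrary assumption
if [ -d "$LOG_DIR" ]; then
    prune_old_reports "$LOG_DIR" 30
fi
```

The call could be placed at the end of the daily script so that cleanup happens in the same cron run.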