Monitoring Linux server metrics in Home Assistant via MQTT

I needed to set up another server at home and wanted to monitor its performance from my smart home, which runs on Home Assistant. A quick search, and then a more thorough one, did not turn up a universal solution, so I reinvented the wheel and built my own.





The starting point: we will monitor CPU load and temperature, RAM and swap usage, free disk space, uptime, overall system load, the temperature and SMART state of each disk, and the state of the RAID arrays. The server runs Ubuntu Server 20 with a simple software RAID 1, WD Green drives, and a GA-525 motherboard with a built-in Atom 525.



The Mosquitto broker was already set up on the smart home server, so MQTT was chosen as the transport.



The first sections explain how each metric is collected; the scripts that send the data and the HA settings come at the end.



All commands in the examples are executed as root.





Table of contents

Collecting system sensors

Collecting system load data

Collecting hard drive health data

Collecting RAID state data

Sending collected data

Configuring Home Assistant





System sensor readings

To read the built-in sensors, we will use the sensors utility.





If it is not installed, install it: apt-get install lm-sensors







First, find all the available sensors. Run sensors-detect and answer y to every question. After that, check what was found by running sensors.







Note that on my machine sensors only started showing all the detected sensors after a reboot. I don't know whether that is a bug.





The sensors utility can print its readings as JSON: sensors -A -u -j

To pick individual values out of that JSON we will use jq. On Ubuntu it is installed with:



apt-get install jq







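jq addresses a value by its path in the document, much like xpath does for XML.

On this board the readings of interest are the CPU core temperature, the fan speed, and temp3: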



sensors -A -u -j | jq '.["coretemp-isa-0000"]["Core 0"].temp2_input'

sensors -A -u -j | jq '.["it8720-isa-0290"].fan1.fan1_input'

sensors -A -u -j | jq '.["it8720-isa-0290"].temp3.temp3_input'
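Each of the three commands above runs sensors again. As a sketch, the same readings can be taken from a single sensors call with one jq filter. The function name read_sensors is made up for illustration, and the chip and feature names are the ones from this board, so they will differ on other hardware.

```shell
#!/bin/bash
# Hypothetical helper: reads `sensors -A -u -j` JSON on stdin and prints
# CPU temperature, fan speed and temp3 as one tab-separated line.
read_sensors() {
  jq -r '[
    .["coretemp-isa-0000"]["Core 0"].temp2_input,
    .["it8720-isa-0290"].fan1.fan1_input,
    .["it8720-isa-0290"].temp3.temp3_input
  ] | @tsv'
}

# Usage on the server (needs lm-sensors and jq):
#   read -r tempcpu fan temp3 < <(sensors -A -u -j | read_sensors)
```

This assumes all three readings are present; a missing one would come out as an empty field and shift the values that follow it.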








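RAM and swap usage are read with the free utility. With the -m flag the values are reported in mebibytes.

For example, the total amount of RAM: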



free -m | grep "Mem" | awk '{print $2}'







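Here grep selects the Mem line and awk prints its second column, the total. Used memory and the swap figures are read the same way.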
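A minimal sketch of the percentage calculation, assuming the `free -m` layout shown above. The helper names free_row and percent_used are hypothetical. Two details are worth covering: the row labels of free can be translated in non-English locales, and a machine without swap reports a total of 0, which would otherwise cause a division by zero.

```shell
#!/bin/bash
# Hypothetical helper: integer percentage of used/total, 0 when total is 0
# (for example on a machine without swap).
percent_used() {
  local total=$1 used=$2
  if [ "$total" -gt 0 ]; then
    echo $(( used * 100 / total ))
  else
    echo 0
  fi
}

# Hypothetical helper: reads `free -m` output on stdin and prints
# "total used" for the row whose label is $1 (Mem or Swap).
free_row() {
  awk -v row="$1:" '$1 == row { print $2, $3 }'
}

# Usage on the server (LC_ALL=C keeps the row labels in English):
#   read -r totalram usedram < <(LC_ALL=C free -m | free_row Mem)
#   usedrampercent=$(percent_used "$totalram" "$usedram")
```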





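Disk usage is read with df. Find the partitions of interest in its output and take the fifth column, Use%, without the percent sign. The partitions on this server are visible in the output of df: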









df | grep "/dev/md127p1" | awk '{print $5}' | sed 's/%$//'

df | grep "/dev/md126p1" | awk '{print $5}' | sed 's/%$//'
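Two things about the commands above are easy to miss: grep matches the device name as a substring, and the fifth column of df is the used percentage, not the free one. A small sketch that asks df about one path instead, with a hypothetical function name disk_used_percent:

```shell
#!/bin/bash
# Hypothetical helper: prints the Use% of the filesystem that holds the
# given path, without the percent sign. -P forces the POSIX one-line format.
disk_used_percent() {
  df -P "$1" | awk 'NR == 2 { sub(/%$/, "", $5); print $5 }'
}

# Usage: disk_used_percent /var
```

Passing a mount point rather than a device name also keeps the script working if the md device numbers change.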








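The system load average is read from /proc/loadavg. The first three fields are the load averaged over the last 1, 5 and 15 minutes. We take the 15-minute value: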



cat /proc/loadavg | awk '{print $3}'
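The same value can be read without awk, using the shell's own read. This is only a sketch; the function name load15 and its optional file argument are assumptions made so it can be tried on a sample file.

```shell
#!/bin/bash
# Hypothetical helper: prints the 15-minute load average.
# The optional argument replaces /proc/loadavg, which is handy for testing.
load15() {
  local one five fifteen rest
  read -r one five fifteen rest < "${1:-/proc/loadavg}"
  echo "$fifteen"
}

# Usage: averageload=$(load15)
```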







Uptime is taken from the output of uptime:



uptime | awk '{print $3}' | sed 's/,$//'
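The third field of uptime changes shape over time: it is a number of days after "up 5 days,", a clock value such as 3:42 during the first day, and a number of minutes during the first hour. As an alternative sketch, the first field of /proc/uptime is always seconds since boot. The function name format_uptime and the output format are my own choices.

```shell
#!/bin/bash
# Hypothetical helper: turns a number of seconds into "Nd Nh Nm".
format_uptime() {
  local s=${1%%.*}   # drop the fractional part
  echo "$(( s / 86400 ))d $(( s % 86400 / 3600 ))h $(( s % 3600 / 60 ))m"
}

# Usage on the server:
#   read -r seconds _ < /proc/uptime
#   uptimedata=$(format_uptime "$seconds")
```

Publishing the raw number of seconds instead would let Home Assistant do the formatting.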







CPU load is read with mpstat. It is part of the sysstat package, installed with apt install sysstat. The idle percentage is printed by:



mpstat | grep all | awk '{print $13}'







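This is the share of time the CPU is idle, so the load is 100 minus that value.

The value is fractional, and bash arithmetic only works with integers, so the subtraction is done with bc: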



cpuidle=$(mpstat | grep all | awk '{print $13}')

cpuload=$(echo "100-$cpuidle" | bc -l)

echo "CPU load: $cpuload"
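Field 13 is %idle only when mpstat prints the time with AM/PM; with a 24-hour clock the same value is in field 12. A sketch that takes the last column instead and does the subtraction in awk, so bc is not needed. The function name cpu_load_from_mpstat is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: reads mpstat output on stdin and prints CPU load
# as 100 minus %idle. %idle is the last column, so $NF is used instead of
# a fixed field number.
cpu_load_from_mpstat() {
  awk '$0 ~ / all / { printf "%.2f\n", 100 - $NF }'
}

# Usage on the server (LC_ALL=C keeps the decimal separator a dot):
#   cpuload=$(LC_ALL=C mpstat | cpu_load_from_mpstat)
```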








Drive temperature is read with hddtemp. Install it:



apt-get install hddtemp







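Usage: pass it the device name; with the -n flag only the number is printed, for example hddtemp /dev/sda -n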





SMART data is read with smartctl from the smartmontools package:



apt-get install smartmontools







With the -a flag it prints all the SMART information for a drive:



smartctl -a /dev/sda







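The output is long, and not all of it is equally useful. The attributes worth watching are:

  • Raw_Read_Error_Rate: the rate of hardware errors while reading data from the disk surface; the raw value is vendor-specific, so it is the trend that matters;
  • Reallocated_Sector_Ct: the number of sectors that have been remapped to the spare area;
  • Seek_Error_Rate: the rate of head positioning errors;
  • Spin_Retry_Count: the number of repeated attempts to spin the platters up to working speed;
  • Reallocated_Event_Count: the number of remap operations that were attempted;
  • Offline_Uncorrectable: the number of uncorrectable errors found during offline scans.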





smartctl can also print the same data as JSON. Add the -j flag:



smartctl -a -j /dev/sda







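The JSON is large, and the attributes sit in an array, so the position of each one has to be looked up in the output first.

With jq paths the values are then read like this (the indexes are the ones for my disks):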





smartctl -a /dev/sda -j | jq '.ata_smart_attributes.table[0].raw.value' #Raw_Read_Error_Rate

smartctl -a /dev/sda -j | jq '.ata_smart_attributes.table[3].raw.value' #Reallocated_Sector_Ct

smartctl -a /dev/sda -j | jq '.ata_smart_attributes.table[4].raw.value' #Seek_Error_Rate

smartctl -a /dev/sda -j | jq '.ata_smart_attributes.table[6].raw.value' #Spin_Retry_Count

smartctl -a /dev/sda -j | jq '.ata_smart_attributes.table[12].raw.value' #Reallocated_Event_Count

smartctl -a /dev/sda -j | jq '.ata_smart_attributes.table[14].raw.value' #Offline_Uncorrectable
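The array indexes above depend on which attributes a particular drive model reports, so they may point at different attributes on other disks. As a sketch, jq can select the entry by its name field instead. The function name smart_raw is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: reads `smartctl -a -j` JSON on stdin and prints the
# raw value of the attribute with the given name.
smart_raw() {
  jq -r --arg name "$1" \
    '.ata_smart_attributes.table[] | select(.name == $name) | .raw.value'
}

# Usage on the server:
#   smartctl -a -j /dev/sda | smart_raw Reallocated_Sector_Ct
```

If the drive does not report the attribute at all, the function prints nothing, which is easier to detect than a value taken from the wrong row.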








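smartctl also gives an overall pass/fail verdict for the drive with the -H flag. With -j the verdict is included in the JSON as well.

It is read from the JSON like this: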



smartctl -a /dev/sda -j | jq '.smart_status.passed' #smart_status
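The smartctl JSON usually carries the drive temperature as well, under temperature.current. Assuming that field is present for your drives (worth checking in the output first), one call can return both the verdict and the temperature. The function name smart_health is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: reads `smartctl -a -j` JSON on stdin and prints
# "<passed> <temperature>" on one line.
smart_health() {
  jq -r '"\(.smart_status.passed) \(.temperature.current)"'
}

# Usage on the server:
#   read -r smart_status1 tempdrive1 < <(smartctl -a -j /dev/sda | smart_health)
```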







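Self-tests (optional)

The drive can also run its own self-tests. They can be started by hand or scheduled with cron. A short test is started like this: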



smartctl -t short /dev/sda







The long test scans the whole surface and takes much longer:



smartctl -t long /dev/sda







smartctl prints the estimated completion time when a test is started.



smartmontools also includes the smartd daemon, which can monitor the drives and run self-tests on a schedule by itself.





RAID

The RAID arrays are managed with mdadm. There are two of them, one for the system and one for /var.

Their state is exposed by the kernel under /sys. [1] [2]





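The current state of the arrays can be viewed with cat /proc/mdstat

A consistency check of the arrays is started like this: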









echo 'check' >/sys/block/md126/md/sync_action

echo 'check' >/sys/block/md127/md/sync_action












cat /sys/block/md126/md/mismatch_cnt

cat /sys/block/md127/md/mismatch_cnt








mismatch_cnt holds the number of mismatches found by the check. If it is 0, everything is fine.
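mismatch_cnt only means something once a check has finished, and it says nothing about a failed member disk. The kernel exposes sync_action and degraded in the same directory, so a sketch that reads all three could look like this. The function name raid_state and its second argument are assumptions made for illustration and testing.

```shell
#!/bin/bash
# Hypothetical helper: prints "<sync_action> <mismatch_cnt> <degraded>" for
# one md array. The second argument overrides the sysfs root for testing.
raid_state() {
  local dir="${2:-/sys}/block/$1/md"
  echo "$(cat "$dir/sync_action") $(cat "$dir/mismatch_cnt") $(cat "$dir/degraded")"
}

# Usage on the server: raid_state md126
```

A result of "idle 0 0" would mean no check is running, no mismatches were found, and no member is missing.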





Now that every value can be read, it remains to send them.

Publishing to Mosquitto requires the client tools:



apt-get install mosquitto-clients







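The work is split into three scripts: system.sh (sensors and system load), drives.sh (RAID state and disk space), and smart.sh (SMART data):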



touch system.sh && touch drives.sh && touch smart.sh

chmod u+x system.sh && chmod u+x drives.sh && chmod u+x smart.sh








Their contents:





system.sh
#!/bin/bash
# MQTT broker address and credentials
ip=xx.xx.xx.xx
usr="xx"
pass="xx"

# Drive temperatures
tempdrive1=$(hddtemp "/dev/sda" -n)
echo "Drive 1 temperature: $tempdrive1"
tempdrive2=$(hddtemp "/dev/sdb" -n)
echo "Drive 2 temperature: $tempdrive2"

# Board sensors
tempcpu=$(sensors -A -u -j | jq '.["coretemp-isa-0000"]["Core 0"].temp2_input')
echo "CPU temperature: $tempcpu"
fan=$(sensors -A -u -j | jq '.["it8720-isa-0290"].fan1.fan1_input')
echo "Fan speed: $fan"
temp3=$(sensors -A -u -j | jq '.["it8720-isa-0290"].temp3.temp3_input')
echo "temp3: $temp3"

# RAM usage
totalram=$(free -m | grep "Mem" | awk '{print $2}')
echo "Total RAM: $totalram"
usedram=$(free -m | grep "Mem" | awk '{print $3}')
echo "Used RAM: $usedram"
usedrampercent=$((usedram * 100 / totalram))
echo "Used RAM, percent: $usedrampercent"

# Swap usage
totalswap=$(free -m | grep "Swap" | awk '{print $2}')
echo "Total swap: $totalswap"
usedswap=$(free -m | grep "Swap" | awk '{print $3}')
echo "Used swap: $usedswap"
if [ "$totalswap" -gt 0 ]; then
  usedswappercent=$((usedswap * 100 / totalswap))
else
  usedswappercent=0 # no swap configured, avoid dividing by zero
fi
echo "Used swap, percent: $usedswappercent"

# 15-minute load average
averageload=$(cat /proc/loadavg | awk '{print $3}')
echo "Load average: $averageload"

uptimedata=$(uptime | awk '{print $3}' | sed 's/,$//')
echo "Uptime: $uptimedata"

cpuidle=$(mpstat | grep all | awk '{print $13}')
cpuload=$(echo "100-$cpuidle" | bc -l) # bc, because bash arithmetic cannot handle fractions
echo "CPU load: $cpuload"


echo " "
echo " "

# Publish everything; the quotes keep an empty value from breaking the call
mosquitto_pub -h "$ip" -t "srv/tempdrive1" -m "$tempdrive1" -u "$usr" -P "$pass"
mosquitto_pub -h "$ip" -t "srv/tempdrive2" -m "$tempdrive2" -u "$usr" -P "$pass"

mosquitto_pub -h "$ip" -t "srv/tempcpu" -m "$tempcpu" -u "$usr" -P "$pass"
mosquitto_pub -h "$ip" -t "srv/fan" -m "$fan" -u "$usr" -P "$pass"
mosquitto_pub -h "$ip" -t "srv/temp3" -m "$temp3" -u "$usr" -P "$pass"

mosquitto_pub -h "$ip" -t "srv/usedrampercent" -m "$usedrampercent" -u "$usr" -P "$pass"
mosquitto_pub -h "$ip" -t "srv/usedswappercent" -m "$usedswappercent" -u "$usr" -P "$pass"

mosquitto_pub -h "$ip" -t "srv/averageload" -m "$averageload" -u "$usr" -P "$pass"

mosquitto_pub -h "$ip" -t "srv/uptimedata" -m "$uptimedata" -u "$usr" -P "$pass"

mosquitto_pub -h "$ip" -t "srv/cpuload" -m "$cpuload" -u "$usr" -P "$pass"

      
      



drives.sh
#!/bin/bash
# MQTT broker address and credentials
ip=xx.xx.xx.xx
usr="xx"
pass="xx"

# Mismatch counters of the two RAID arrays
raid_system_status=$(cat /sys/block/md126/md/mismatch_cnt)
echo "RAID system mismatch count: $raid_system_status"
raid_var_status=$(cat /sys/block/md127/md/mismatch_cnt)
echo "RAID /var mismatch count: $raid_var_status"

# Column 5 of df is the used percentage
freesystemdisk=$(df | grep "/dev/md127p1" | awk '{print $5}' | sed 's/%$//')
echo "System disk used, percent: $freesystemdisk"
freedatadisk=$(df | grep "/dev/md126p1" | awk '{print $5}' | sed 's/%$//')
echo "Data disk used, percent: $freedatadisk"

echo " "
echo " "

mosquitto_pub -h "$ip" -t "srv/raid_system_status" -m "$raid_system_status" -u "$usr" -P "$pass"
mosquitto_pub -h "$ip" -t "srv/raid_var_status" -m "$raid_var_status" -u "$usr" -P "$pass"

mosquitto_pub -h "$ip" -t "srv/freesystemdisk" -m "$freesystemdisk" -u "$usr" -P "$pass"
mosquitto_pub -h "$ip" -t "srv/freedatadisk" -m "$freedatadisk" -u "$usr" -P "$pass"

      
      



smart.sh
#!/bin/bash
# MQTT broker address and credentials
ip=xx.xx.xx.xx
usr="xx"
pass="xx"

# Drive 1 attributes (the table indexes are specific to these drives)
Raw_Read_Error_Rate1=$(smartctl -a /dev/sda -j | jq '.ata_smart_attributes.table[0].raw.value')
echo "SMART Raw_Read_Error_Rate drive 1: $Raw_Read_Error_Rate1"
Reallocated_Sector_Ct1=$(smartctl -a /dev/sda -j | jq '.ata_smart_attributes.table[3].raw.value')
echo "SMART Reallocated_Sector_Ct drive 1: $Reallocated_Sector_Ct1"
Seek_Error_Rate1=$(smartctl -a /dev/sda -j | jq '.ata_smart_attributes.table[4].raw.value')
echo "SMART Seek_Error_Rate drive 1: $Seek_Error_Rate1"
Spin_Retry_Count1=$(smartctl -a /dev/sda -j | jq '.ata_smart_attributes.table[6].raw.value')
echo "SMART Spin_Retry_Count drive 1: $Spin_Retry_Count1"
Reallocated_Event_Count1=$(smartctl -a /dev/sda -j | jq '.ata_smart_attributes.table[12].raw.value')
echo "SMART Reallocated_Event_Count drive 1: $Reallocated_Event_Count1"
Offline_Uncorrectable1=$(smartctl -a /dev/sda -j | jq '.ata_smart_attributes.table[14].raw.value')
echo "SMART Offline_Uncorrectable drive 1: $Offline_Uncorrectable1"

smart_status1=$(smartctl -a /dev/sda -j | jq '.smart_status.passed')
echo "SMART status drive 1: $smart_status1"

# Drive 2 attributes
Raw_Read_Error_Rate2=$(smartctl -a /dev/sdb -j | jq '.ata_smart_attributes.table[0].raw.value')
echo "SMART Raw_Read_Error_Rate drive 2: $Raw_Read_Error_Rate2"
Reallocated_Sector_Ct2=$(smartctl -a /dev/sdb -j | jq '.ata_smart_attributes.table[3].raw.value')
echo "SMART Reallocated_Sector_Ct drive 2: $Reallocated_Sector_Ct2"
Seek_Error_Rate2=$(smartctl -a /dev/sdb -j | jq '.ata_smart_attributes.table[4].raw.value')
echo "SMART Seek_Error_Rate drive 2: $Seek_Error_Rate2"
Spin_Retry_Count2=$(smartctl -a /dev/sdb -j | jq '.ata_smart_attributes.table[6].raw.value')
echo "SMART Spin_Retry_Count drive 2: $Spin_Retry_Count2"
Reallocated_Event_Count2=$(smartctl -a /dev/sdb -j | jq '.ata_smart_attributes.table[12].raw.value')
echo "SMART Reallocated_Event_Count drive 2: $Reallocated_Event_Count2"
Offline_Uncorrectable2=$(smartctl -a /dev/sdb -j | jq '.ata_smart_attributes.table[14].raw.value')
echo "SMART Offline_Uncorrectable drive 2: $Offline_Uncorrectable2"

smart_status2=$(smartctl -a /dev/sdb -j | jq '.smart_status.passed')
echo "SMART status drive 2: $smart_status2"

echo " "
echo " "

mosquitto_pub -h "$ip" -t "srv/Raw_Read_Error_Rate1" -m "$Raw_Read_Error_Rate1" -u "$usr" -P "$pass"
mosquitto_pub -h "$ip" -t "srv/Reallocated_Sector_Ct1" -m "$Reallocated_Sector_Ct1" -u "$usr" -P "$pass"
mosquitto_pub -h "$ip" -t "srv/Seek_Error_Rate1" -m "$Seek_Error_Rate1" -u "$usr" -P "$pass"
mosquitto_pub -h "$ip" -t "srv/Spin_Retry_Count1" -m "$Spin_Retry_Count1" -u "$usr" -P "$pass"
mosquitto_pub -h "$ip" -t "srv/Reallocated_Event_Count1" -m "$Reallocated_Event_Count1" -u "$usr" -P "$pass"
mosquitto_pub -h "$ip" -t "srv/Offline_Uncorrectable1" -m "$Offline_Uncorrectable1" -u "$usr" -P "$pass"

mosquitto_pub -h "$ip" -t "srv/Raw_Read_Error_Rate2" -m "$Raw_Read_Error_Rate2" -u "$usr" -P "$pass"
mosquitto_pub -h "$ip" -t "srv/Reallocated_Sector_Ct2" -m "$Reallocated_Sector_Ct2" -u "$usr" -P "$pass"
mosquitto_pub -h "$ip" -t "srv/Seek_Error_Rate2" -m "$Seek_Error_Rate2" -u "$usr" -P "$pass"
mosquitto_pub -h "$ip" -t "srv/Spin_Retry_Count2" -m "$Spin_Retry_Count2" -u "$usr" -P "$pass"
mosquitto_pub -h "$ip" -t "srv/Reallocated_Event_Count2" -m "$Reallocated_Event_Count2" -u "$usr" -P "$pass"
mosquitto_pub -h "$ip" -t "srv/Offline_Uncorrectable2" -m "$Offline_Uncorrectable2" -u "$usr" -P "$pass"

mosquitto_pub -h "$ip" -t "srv/smart_status1" -m "$smart_status1" -u "$usr" -P "$pass"
mosquitto_pub -h "$ip" -t "srv/smart_status2" -m "$smart_status2" -u "$usr" -P "$pass"
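All three scripts repeat the same mosquitto_pub call with only the topic and the value changing. As a sketch, the call can be wrapped in one function. The name publish is hypothetical, and skipping empty values is my own choice: when a reading fails, a missing value would otherwise shift the remaining arguments of the call.

```shell
#!/bin/bash
# Broker settings, same placeholders as in the scripts.
ip="xx.xx.xx.xx"
usr="xx"
pass="xx"

# Hypothetical helper: publishes one value under the srv/ prefix.
# Empty values are skipped so a failed reading does not break the call.
publish() {
  local topic=$1 value=$2
  [ -n "$value" ] || return 0
  mosquitto_pub -h "$ip" -u "$usr" -P "$pass" -t "srv/$topic" -m "$value"
}

# Usage: publish tempcpu "$tempcpu"
```

Keeping the srv/ prefix inside the function means the topic names in the Home Assistant configuration stay the same.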

      
      



To check that the data arrives, look at the Mosquitto broker in Home Assistant.





Home Assistant

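The sensors are then declared in the Home Assistant configuration. The listing uses the platform: mqtt form; newer Home Assistant releases expect MQTT sensors under a top-level mqtt: key instead.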





sensor:
  - platform: mqtt
    state_topic: "srv/tempdrive1"
    name: "nextcloud drive 1 temperature"
    unit_of_measurement: "°C"
  - platform: mqtt
    state_topic: "srv/tempdrive2"
    name: "nextcloud drive 2 temperature"
    unit_of_measurement: "°C"
  - platform: mqtt
    state_topic: "srv/tempcpu"
    name: "nextcloud CPU temperature"
    unit_of_measurement: "°C"
  - platform: mqtt
    state_topic: "srv/fan"
    name: "nextcloud fan speed"
    unit_of_measurement: "rpm"
  - platform: mqtt
    state_topic: "srv/temp3"
    name: "nextcloud temp3"
    unit_of_measurement: "°C"
  - platform: mqtt
    state_topic: "srv/usedrampercent"
    name: "nextcloud used RAM"
    unit_of_measurement: "%"
  - platform: mqtt
    state_topic: "srv/usedswappercent"
    name: "nextcloud used SWAP"
    unit_of_measurement: "%"
  - platform: mqtt
    state_topic: "srv/freesystemdisk"
    name: "nextcloud system disk usage"
    unit_of_measurement: "%"
  - platform: mqtt
    state_topic: "srv/freedatadisk"
    name: "nextcloud data disk usage"
    unit_of_measurement: "%"
  - platform: mqtt
    state_topic: "srv/averageload"
    name: "nextcloud load average"
  - platform: mqtt
    state_topic: "srv/uptimedata"
    name: "nextcloud uptime"
  - platform: mqtt
    state_topic: "srv/cpuload"
    name: "nextcloud CPU load"
    unit_of_measurement: "%"
  - platform: mqtt
    state_topic: "srv/Raw_Read_Error_Rate1"
    name: "nextcloud drive 1 SMART Raw_Read_Error_Rate"
  - platform: mqtt
    state_topic: "srv/Reallocated_Sector_Ct1"
    name: "nextcloud drive 1 SMART Reallocated_Sector_Ct"
  - platform: mqtt
    state_topic: "srv/Seek_Error_Rate1"
    name: "nextcloud drive 1 SMART Seek_Error_Rate"
  - platform: mqtt
    state_topic: "srv/Spin_Retry_Count1"
    name: "nextcloud drive 1 SMART Spin_Retry_Count"
  - platform: mqtt
    state_topic: "srv/Reallocated_Event_Count1"
    name: "nextcloud drive 1 SMART Reallocated_Event_Count"
  - platform: mqtt
    state_topic: "srv/Offline_Uncorrectable1"
    name: "nextcloud drive 1 SMART Offline_Uncorrectable"
  - platform: mqtt
    state_topic: "srv/smart_status1"
    name: "nextcloud drive 1 SMART status"
  - platform: mqtt
    state_topic: "srv/Raw_Read_Error_Rate2"
    name: "nextcloud drive 2 SMART Raw_Read_Error_Rate"
  - platform: mqtt
    state_topic: "srv/Reallocated_Sector_Ct2"
    name: "nextcloud drive 2 SMART Reallocated_Sector_Ct"
  - platform: mqtt
    state_topic: "srv/Seek_Error_Rate2"
    name: "nextcloud drive 2 SMART Seek_Error_Rate"
  - platform: mqtt
    state_topic: "srv/Spin_Retry_Count2"
    name: "nextcloud drive 2 SMART Spin_Retry_Count"
  - platform: mqtt
    state_topic: "srv/Reallocated_Event_Count2"
    name: "nextcloud drive 2 SMART Reallocated_Event_Count"
  - platform: mqtt
    state_topic: "srv/Offline_Uncorrectable2"
    name: "nextcloud drive 2 SMART Offline_Uncorrectable"
  - platform: mqtt
    state_topic: "srv/smart_status2"
    name: "nextcloud drive 2 SMART status"
  - platform: mqtt
    state_topic: "srv/raid_system_status"
    name: "nextcloud RAID system mismatch count"
  - platform: mqtt
    state_topic: "srv/raid_var_status"
    name: "nextcloud RAID /var mismatch count"
      
      



Once the configuration is reloaded, the new sensors show up in Home Assistant.










The screenshot shows that this server is intended for Nextcloud. Its internal metrics can also be added to HA: Nextcloud exposes an API for this, and HA has a built-in integration.







