FM-KED-001 — VM Disk Space Exhaustion

Severity: S2 — High
Recovery Class: A — Quick Fix
Covered by Monthly Support: Yes

Description

Disk space on a virtual machine reaches critical levels, leading to degraded system behavior, application instability, or failed background operations.

This issue is operational, recurrent, and typically caused by uncontrolled growth of logs, containers, temporary files, or Kubernetes artifacts.

Typical Symptoms

Services failing to write logs or temporary files
Background jobs failing without explicit errors
Kubernetes pods entering Evicted or Terminating state
System warnings related to low disk space

Diagnostic Checklist

Identify Top Disk Consumers

sudo du -ahx / | sort -rh | head -n 20
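
Before searching for large files, it can help to confirm which filesystem is actually full. The following is a minimal sketch, assuming the POSIX output layout of df -P and mount points without spaces. The function name full_filesystems and the 85% threshold are illustrative and not part of this procedure.

```shell
# Print "<mount point> <usage>" for filesystems at or above a threshold.
# Reads `df -P` formatted text on stdin; the threshold defaults to 85 (percent).
# full_filesystems is a hypothetical helper name used only for this example.
full_filesystems() {
  threshold="${1:-85}"
  awk -v t="$threshold" '
    NR > 1 {                      # skip the header line
      use = $5                    # Capacity column, e.g. "90%"
      sub(/%/, "", use)
      if (use + 0 >= t) print $6, $5
    }'
}

df -P | full_filesystems 85
```

Running the same check with df -Pi in place of df -P reports inode usage, which can be exhausted even when free space remains.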

Recovery Procedure

Apply only the steps that are relevant to the affected host; not all of them are required.

1. Clean Package Manager Artifacts

sudo apt-get autoremove
sudo du -sh /var/cache/apt
sudo apt-get autoclean
sudo apt-get clean

2. Clean System Journals

sudo journalctl --vacuum-time=3d

3. Truncate Docker Logs

# The glob must expand as root: /var/lib/docker is normally unreadable to other users.
sudo sh -c 'truncate -s 0 /var/lib/docker/containers/*/*-json.log'
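
Truncation discards log content, so it may be worth checking which container logs are actually large before emptying them. The sketch below assumes GNU find (as shipped on Debian and Ubuntu hosts) and the default Docker data directory. The function name largest_json_logs is illustrative.

```shell
# List *-json.log files under a containers directory, largest first,
# as "<size in bytes><TAB><path>". Relies on GNU find's -printf.
# largest_json_logs is a hypothetical helper name used only for this example.
largest_json_logs() {
  dir="${1:-/var/lib/docker/containers}"   # assumed default Docker data path
  limit="${2:-10}"
  find "$dir" -type f -name '*-json.log' -printf '%s\t%p\n' \
    | sort -rn | head -n "$limit"
}
```

Because the Docker directory is usually readable only by root, the function has to be run from a root shell (for example after sudo -i), not through a plain sudo prefix.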

4. Prune Docker Resources

sudo docker system prune

5. Remove Obsolete Kubernetes ReplicaSets

kubectl get rs -A -o wide | tail -n +2 | \
awk '{if ($3 + $4 + $5 == 0) print "kubectl delete rs -n "$1, $2 }' | sh
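
Since the command above pipes generated delete commands straight into sh, a dry run that only prints them may be useful on shared clusters. This is a sketch under the assumption that the input has the default column order of kubectl get rs -A (NAMESPACE, NAME, DESIRED, CURRENT, READY, AGE). The function name empty_rs_commands is illustrative.

```shell
# Read `kubectl get rs -A --no-headers` output on stdin and print one
# delete command per ReplicaSet whose DESIRED, CURRENT and READY are all 0.
# Nothing is executed; the output is meant to be reviewed first.
# empty_rs_commands is a hypothetical helper name used only for this example.
empty_rs_commands() {
  awk '$3 + $4 + $5 == 0 { print "kubectl delete rs -n", $1, $2 }'
}
```

Usage: run kubectl get rs -A --no-headers | empty_rs_commands, inspect the printed list, and append | sh only once it looks right.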

6. Clear Evicted Kubernetes Pods

kubectl get pods | grep Evicted | awk '{print $1}' | xargs -r kubectl delete pod

With explicit kubeconfig:

kubectl --kubeconfig bank.yaml get pods | grep Evicted | \
awk '{print $1}' | xargs -r kubectl --kubeconfig bank.yaml delete pod

The -r flag (GNU xargs) skips the delete call when no Evicted pods are found.
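
The commands above only look at the current namespace. A possible variant for all namespaces is sketched below. It assumes the default column order of kubectl get pods -A (NAMESPACE, NAME, READY, STATUS, ...) and matches the STATUS column exactly. The function name pods_with_status is illustrative.

```shell
# Read `kubectl get pods -A --no-headers` output on stdin and print
# "<namespace> <pod name>" for pods whose STATUS column equals the argument.
# pods_with_status is a hypothetical helper name used only for this example.
pods_with_status() {
  awk -v s="$1" '$4 == s { print $1, $2 }'
}
```

Usage: kubectl get pods -A --no-headers | pods_with_status Evicted | while read -r ns name; do kubectl delete pod -n "$ns" "$name"; done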

7. Force Remove Stuck Terminating Pods

for p in $(kubectl --kubeconfig bank.yaml get pods | grep Terminating | awk '{print $1}'); do
  kubectl --kubeconfig bank.yaml delete pod "$p" --grace-period=0 --force
done

Optional Diagnostics

Inspect Memory Usage (for runaway processes)

# Prints SIZE (reported by ps in KiB) converted to MB, followed by the command line.
ps -eo size,pid,user,command --sort -size | \
awk 'NR > 1 { printf("%13.2f MB ", $1/1024)
              for (x = 4; x <= NF; x++) printf("%s ", $x)
              print "" }'

Preventive Notes

Disk usage monitoring is strongly recommended
Log rotation must be verified after updates
Kubernetes cleanup should be part of routine maintenance