FM-KED-001 — VM Disk Space Exhaustion
Severity: S2 — High
Recovery Class: A — Quick Fix
Covered by Monthly Support: Yes
Description
Disk space on a virtual machine reaches critical levels, leading to degraded system behavior, application instability, or failed background operations.
This issue is operational, recurrent, and typically caused by uncontrolled growth of logs, containers, temporary files, or Kubernetes artifacts.
Typical Symptoms
- Services failing to write logs or temporary files
- Background jobs failing without explicit errors
- Kubernetes pods entering Evicted or Terminating state
- System warnings related to low disk space
Diagnostic Checklist
Identify Top Disk Consumers
sudo du -ahx / | sort -rh | head -n 20
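Before hunting for large files, it can help to confirm which filesystem is actually full. The following is a minimal sketch, not part of the original procedure: the function name filesystems_over and the 85% default threshold are assumptions chosen for illustration.

```shell
# Print "<mount point> <usage>" for filesystems at or above a usage threshold.
# Reads `df -P` output on stdin, so it can also be run against saved output.
# The name filesystems_over and the 85% default are illustrative assumptions.
filesystems_over() {
  awk -v limit="${1:-85}" '
    NR > 1 {                        # skip the df header line
      use = $5
      sub(/%/, "", use)             # strip the percent sign from the Capacity column
      if (use + 0 >= limit) print $6, $5
    }'
}

# Example:
#   df -P -x tmpfs -x devtmpfs | filesystems_over 85
```

The filter assumes mount points without spaces, which holds for typical VM layouts.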
Recovery Procedure
Apply only the steps that are relevant to the incident; not every step is required each time.
1. Clean Package Manager Artifacts
sudo apt-get autoremove
sudo du -sh /var/cache/apt
sudo apt-get autoclean
sudo apt-get clean
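To see how much space a cleanup step actually reclaimed, the available space can be compared before and after the command. This is a rough sketch under stated assumptions: the helper names avail_kb and report_freed are hypothetical, and the measurement is taken on the root filesystem only.

```shell
# Available space, in KiB, on the filesystem holding the given path (default /).
avail_kb() {
  df -P "${1:-/}" | awk 'NR == 2 { print $4 }'
}

# Run a command and report the change in available space on / afterwards.
# Helper names are illustrative; other activity on the disk will skew the number.
report_freed() {
  before=$(avail_kb /)
  "$@"
  after=$(avail_kb /)
  echo "freed $(( after - before )) KiB"
}

# Example:
#   report_freed sudo apt-get clean
```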
2. Clean System Journals
sudo journalctl --vacuum-time=3d
3. Truncate Docker Logs
sudo sh -c 'truncate -s 0 /var/lib/docker/containers/*/*-json.log'
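Truncation discards log history for every container, so it may be worth checking first which logs are actually large. The sketch below assumes GNU find and the default Docker data root; the function name largest_container_logs is hypothetical.

```shell
# List the largest Docker JSON log files, biggest first, as "<bytes><TAB><path>".
# Arguments (both optional): directory to scan, number of entries to show.
# Assumes GNU find (-printf) and the default data root; must run as root,
# because /var/lib/docker is normally not readable by other users.
largest_container_logs() {
  find "${1:-/var/lib/docker/containers}" -type f -name '*-json.log' \
    -printf '%s\t%p\n' | sort -rn | head -n "${2:-5}"
}

# Example, from a root shell:
#   largest_container_logs
```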
4. Prune Docker Resources
sudo docker system prune
5. Remove Obsolete Kubernetes ReplicaSets
kubectl get rs -A -o wide | tail -n +2 | \
awk '{if ($3 + $4 + $5 == 0) print "kubectl delete rs -n "$1, $2 }' | sh
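Because the command above pipes generated delete commands straight into sh, a review step can reduce the risk of removing the wrong objects. The following sketch separates generation from execution; the function name stale_rs_commands is an assumption, and the column positions rely on the default kubectl get rs -A layout.

```shell
# Turn `kubectl get rs -A --no-headers` output (on stdin) into delete commands
# for ReplicaSets whose DESIRED, CURRENT and READY counts are all zero.
# Columns with -A: NAMESPACE NAME DESIRED CURRENT READY AGE
stale_rs_commands() {
  awk '$3 + $4 + $5 == 0 { print "kubectl delete rs -n", $1, $2 }'
}

# Review the generated commands first:
#   kubectl get rs -A --no-headers | stale_rs_commands
# Execute only once the list looks right:
#   kubectl get rs -A --no-headers | stale_rs_commands | sh
```

Note that deleting old ReplicaSets also removes the rollback history of the owning Deployment.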
6. Clear Evicted Kubernetes Pods
kubectl get pods | grep Evicted | awk '{print $1}' | xargs kubectl delete pod
With explicit kubeconfig:
kubectl --kubeconfig bank.yaml get pods | grep Evicted | \
awk '{print $1}' | xargs kubectl --kubeconfig bank.yaml delete pod
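The commands above only cover the current namespace, and grep matches the word anywhere on the line. A namespace-aware variant that compares the STATUS column exactly might look like the sketch below; the function name pods_with_status is hypothetical, and the column positions assume the default kubectl get pods -A output.

```shell
# Print "NAMESPACE NAME" for pods whose STATUS column equals the given value.
# Expects `kubectl get pods -A --no-headers` output on stdin
# (columns: NAMESPACE NAME READY STATUS RESTARTS AGE).
pods_with_status() {
  awk -v want="$1" '$4 == want { print $1, $2 }'
}

# Example (requires cluster access):
#   kubectl get pods -A --no-headers | pods_with_status Evicted |
#     while read -r ns name; do
#       kubectl delete pod -n "$ns" "$name"
#     done
```

The same filter can be used with other status values by changing the argument.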
7. Force Remove Stuck Terminating Pods
for p in $(kubectl --kubeconfig bank.yaml get pods | grep Terminating | awk '{print $1}'); do
  kubectl --kubeconfig bank.yaml delete pod "$p" --grace-period=0 --force
done
Optional Diagnostics
Inspect Memory Usage (for runaway processes)
ps -eo size,pid,user,command --sort -size | \
awk '{ hr=$1/1024 ; printf("%13.2f MB ",hr) } \
{ for ( x=4 ; x<=NF ; x++ ) { printf("%s ",$x) } print "" }'
Preventive Notes
- Disk usage monitoring is strongly recommended
- Log rotation must be verified after updates
- Kubernetes cleanup should be part of routine maintenance