infra: weekly docker + gitea-archive prune cron on the CI droplet
- 2026-06-10 incident: the 48G disk hit 100%. Pipelines died with ENOSPC in the UT/IT dependency install, and the Woodpecker repo UI 500'd (SQLite couldn't write; the repos LIST still rendered, being a pure read). Culprits: ~15G of stale CI images/build cache + 19G of gitea repo-archive cache (crawlers on the public instance generate zip/tar snapshots faster than gitea's @midnight archive_cleanup reclaims them)
- cicd/docker-prune.sh → /etc/cron.weekly/docker-prune (also installed live via ssh, because running the full playbook clobbers the droplet's nginx config, a known template bug): image/builder/container prune keeping the last 168h (7 days) + a repo-archive cache wipe
- gitea container restarted to unwedge the post-receive queues that the disk-full period jammed (pushes landed in the bare repo but the UI + Woodpecker webhook never heard about them)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
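For anyone re-checking the figures in the message above, here is a minimal sketch of how disk usage could be broken down on the droplet. `usage_pct` and `report_disk` are hypothetical helper names, and the default archive path is the one the prune script wipes; this is not necessarily how the incident was measured.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers, not part of this commit.

# Print the used-space percentage (digits only) of the filesystem holding $1.
usage_pct() {
    df -P "${1:-/}" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Print a short breakdown: filesystem usage, docker usage, archive cache size.
report_disk() {
    archive_dir=${1:-/opt/cicd/data/gitea/gitea/repo-archive}
    echo "root filesystem: $(usage_pct /)% used"
    if command -v docker >/dev/null 2>&1; then
        docker system df        # images, containers, volumes, build cache
    fi
    if [ -d "$archive_dir" ]; then
        du -sk "$archive_dir"   # cache size in KiB
    fi
}
```

Calling `report_disk` with no argument checks the default archive path; pass another directory to inspect a different cache location.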
@@ -101,6 +101,16 @@
     creates: /etc/letsencrypt/live/gitea.earthmanrpg.me/fullchain.pem
   become: true
 
+# Weekly docker + gitea-archive-cache prune — the 2026-06-10 disk-full
+# incident (48G/48G: pipelines ENOSPC'd, Woodpecker repo UI 500'd) was
+# ~15G of stale CI images + 19G of crawler-generated repo archives.
+- name: Install weekly docker/archive prune cron
+  ansible.builtin.copy:
+    src: cicd/docker-prune.sh
+    dest: /etc/cron.weekly/docker-prune
+    mode: "0755"
+  become: true
+
 - name: Run docker compose -f /opt/cicd/docker-compose.yaml up -d
   ansible.builtin.command:
     cmd: docker compose -f /opt/cicd/docker-compose.yaml up -d
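The destination drops the `.sh` suffix. Assuming the droplet runs a Debian-style `run-parts` (an assumption; the commit does not name the distribution), that matters: by default it only executes files whose names consist of ASCII letters, digits, underscores, and hyphens, so a file named `docker-prune.sh` would be silently skipped. A small sketch of that rule, with `cron_name_ok` as a hypothetical helper:

```shell
#!/bin/sh
# Sketch only: mirrors the default Debian run-parts naming rule.
# cron_name_ok is a hypothetical helper, not part of the playbook.
# Returns 0 if NAME would be picked up, 1 otherwise.
cron_name_ok() {
    case $1 in
        # Reject empty names and names with any character outside the set.
        ''|*[!A-Za-z0-9_-]*) return 1 ;;
        *) return 0 ;;
    esac
}
```

On such a system, `run-parts --test /etc/cron.weekly` lists the scripts that would actually run.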
21
infra/cicd/docker-prune.sh
Normal file
@@ -0,0 +1,21 @@
+#!/bin/sh
+# Weekly CI-droplet hygiene — installed to /etc/cron.weekly/docker-prune by
+# cicd-playbook.yaml.
+#
+# Every Woodpecker pipeline leaves freshly-built/pulled image layers behind
+# (~1GB+/month); unchecked they filled the 48G disk on 2026-06-10 — pipelines
+# died with "no space left on device" and the Woodpecker repo UI 500'd
+# (SQLite could no longer write). Keep anything used within the last week;
+# the python-tdd-ci image re-pulls from the LOCAL Gitea registry in seconds
+# if it gets pruned between pipelines.
+docker image prune -af --filter "until=168h"
+docker builder prune -af --filter "until=168h"
+docker container prune -f --filter "until=168h"
+
+# Gitea's repo-archive cache (zip/tar snapshots, regenerated on demand) grew
+# to 19G in the same incident — crawlers hitting the public instance's
+# download links generate archives faster than gitea's own @midnight
+# archive_cleanup cron reclaims them. It is pure cache; wipe it weekly.
+rm -rf /opt/cicd/data/gitea/gitea/repo-archive/* 2>/dev/null
+
+exit 0
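The wipe above removes every cached archive, including ones generated minutes earlier. If that ever proves too blunt, an age-based variant is one option. A minimal sketch, assuming file modification time is a fair proxy for archive age; `prune_old_archives` is a hypothetical helper and is not part of docker-prune.sh:

```shell
#!/bin/sh
# Sketch only: age-based alternative to a blanket `rm -rf` of the cache.
# prune_old_archives is a hypothetical helper, not part of this commit.
# Usage: prune_old_archives DIR [DAYS]   (DAYS defaults to 7)
prune_old_archives() {
    dir=$1
    days=${2:-7}
    # Nothing to do if the cache directory does not exist.
    [ -d "$dir" ] || return 0
    # Delete regular files last modified more than $days full days ago.
    # Directories are left in place, even if they end up empty.
    find "$dir" -type f -mtime +"$days" -exec rm -f {} +
}
```

For example, `prune_old_archives /opt/cicd/data/gitea/gitea/repo-archive 7` would keep archives written within roughly the last week, matching the 168h window used for the docker prunes.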