# Changelog

A running log of significant changes made to the yttrx/admin.yttrx.com server infrastructure.

---

## 2026-05-18 — changelog.yttrx.com live on admin

Stood up changelog.yttrx.com as a tiny static site on admin that publishes this file. Two URLs only: `/changelog.md` (verbatim) and `/changelog.txt` (plaintext conversion via pandoc or a regex fallback — see misc-sites.md). `/` 302s to `/changelog.txt`; everything else 404s. Sync is manual (scp from a repo checkout); no nginx reload needed for content updates.

Site file `scripts/changelog.nginx` → `/etc/nginx/sites-available/changelog`. Web root `/var/www/changelog/` owned by www-data. Cert issued via `certbot certonly --standalone -d changelog.yttrx.com` (brief stop of nginx — the documented bootstrap pattern from misc-sites.md), so a few-second outage of the other admin-hosted sites at deploy time. Renewals piggyback on the existing `0 2 * * * certbot renew --nginx` cron.

### Verified

| Test | Result |
|---|---|
| `curl -I https://changelog.yttrx.com/` | 302 to `/changelog.txt` |
| `curl -I https://changelog.yttrx.com/changelog.txt` | 200, Content-Type: text/plain; charset=utf-8 |
| `curl -I https://changelog.yttrx.com/changelog.md` | 200, Content-Type: text/plain; charset=utf-8 |
| `curl -I https://changelog.yttrx.com/notathing` | 404 |
| `curl -I http://changelog.yttrx.com/` | 301 to https |

### Rollback

    rm /etc/nginx/sites-enabled/changelog
    nginx -t && systemctl reload nginx

Cert and `/var/www/changelog/` can stay; the cron renewal is harmless until the cert is pruned.

---

## 2026-05-18 — Scraper UA + CIDR block at nginx (replaces Anubis as first-line defense)

After the Anubis rollback earlier today, audited today's `/var/log/nginx/access.log` (269k lines on the Mastodon vhosts) to understand what we'd actually been trying to defend against. The scrape target is `/tags/`, no contest: 64,234 hits today, 63,520 of them 200.
Two cohorts account for the bulk of it:

- Named AI/SEO scrapers identifying themselves in the UA: Applebot (19,523 `/tags/` hits), meta-externalagent (14,050), GoogleOther (6,630), plus the long tail (MJ12bot, AhrefsBot, etc.).
- A disguised botnet from Alibaba Cloud Singapore (43.172.0.0/15): ~22,000 `/tags/` hits today rotating across dozens of "Chrome on Windows" UAs (versions 103, 104, 105, … 148) in suspiciously flat 1,200–2,500-hit buckets. Real users don't distribute themselves uniformly across 30 Chrome versions; this is one operator running a scraper farm with a UA-rotation library.

The genuinely surprising find that reframed the Anubis approach: 31,114 requests today to `/.within.website/x/cmd/anubis/static/locales/en.json` — that's the Anubis challenge interstitial's i18n file. Every one had `Referer: https://yttrx.com/tags/`, all from the same Alibaba IPs. During the hours Anubis was live, the locale-fetch curve tracked `/tags/` traffic almost 1:1, meaning the scraper was completing the Proof-of-Work challenge and walking right through. Anubis-as-deployed wasn't blocking the actual aggressors. Real users got a broken composer; the headless-browser scraper farm just paid a few extra CPU-seconds.

That changes the cost/benefit: cheaper and more effective to drop the obvious offenders at nginx than to challenge everyone and hope the offenders fail.

### What got deployed

New snippet `/etc/nginx/snippets/scraper-block.conf` (source: `scripts/scraper-block.conf`) defining two maps:

- `$bad_ua` — set by `map $http_user_agent`, matches a list of self-identifying scraper UAs (Applebot, meta-externalagent, GoogleOther, GPTBot, ChatGPT-User, ClaudeBot, anthropic-ai, PerplexityBot, CCBot, Bytespider, Amazonbot, DuckAssistBot, FacebookBot, Diffbot, Omgilibot, MJ12bot, AhrefsBot, SemrushBot, DotBot, ImagesiftBot, TimpiBot, plus a few siblings).
- `$bad_cidr` — set by `geo $remote_addr`, matches the Alibaba ranges where today's UA-rotation farm lives: 43.172.0.0/15, 8.208.0.0/12, 47.235.0.0/16, 47.236.0.0/16, 47.243.0.0/16, 47.245.0.0/16, 47.77.0.0/16, 43.106.0.0/16, 43.110.0.0/16.

Googlebot and Bingbot are intentionally absent — search discovery is wanted. The CIDR check depends on `cloudflare-real-ip.conf` rewriting `$remote_addr` from `CF-Connecting-IP` (in place since 2026-05-03); without it the geo match silently does nothing.

Wired into `/etc/nginx/sites-available/mastodon` (the currently-enabled site) at three points:

    # top of file, alongside the existing map $http_upgrade
    include /etc/nginx/snippets/scraper-block.conf;

    # inside each :443 server { } (yttrx.com and masto.yttrx.com),
    # after add_header Strict-Transport-Security
    if ($bad_ua)   { return 403; }
    if ($bad_cidr) { return 403; }

`if` at server scope with `return` is the one safe form per the ["if is evil"](https://www.nginx.com/resources/wiki/start/topics/depth/ifisevil/) wiki page. The dormant mastodon-anubis site file (`scripts/mastodon-anubis.nginx`) got the same edits so that if Anubis ever gets revisited, the scraper block goes with it.

### Why CIDR scope is server-wide (not just /tags/)

Considered narrowing the `$bad_cidr` 403 to scrape-target paths only, so a legitimate fediverse server in those ranges could still federate via `/inbox`. Audit data ruled it out: of 55,111 requests from 43.172/15 today, exactly 6 had Mastodon/Pleroma/Akkoma/Misskey/Lemmy UAs — collateral on a blanket CIDR block is essentially nil, and the bookkeeping is much simpler.
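The collateral audit above amounts to filtering the access log by source network and then by User-Agent. A minimal Python sketch of that check follows. It assumes nginx's `combined` log format with the client IP as the first field (which only holds once the real-IP rewrite is in place), and the helper name `count_fediverse_hits` is hypothetical rather than anything that exists in the repo.

```python
import ipaddress
import re

# Server software named in the audit, matched at the start of the User-Agent.
FEDIVERSE_UA = re.compile(r"^(Mastodon|Pleroma|Akkoma|Misskey|Lemmy)\b", re.IGNORECASE)

# Assumed "combined" log format: IP first, User-Agent is the last quoted field.
LINE = re.compile(r'^(?P<ip>\S+) .* "(?P<ua>[^"]*)"$')


def count_fediverse_hits(lines, cidr):
    """Return (total, fediverse) request counts for clients inside `cidr`."""
    network = ipaddress.ip_network(cidr)
    total = fediverse = 0
    for line in lines:
        match = LINE.match(line.strip())
        if not match:
            continue  # skip lines that do not look like combined-format entries
        try:
            addr = ipaddress.ip_address(match.group("ip"))
        except ValueError:
            continue  # first field was not an IP address
        if addr not in network:
            continue
        total += 1
        if FEDIVERSE_UA.search(match.group("ua")):
            fediverse += 1
    return total, fediverse
```

Calling it with an open log file and `"43.172.0.0/15"` would reproduce the two numbers quoted above, provided the log really is in the assumed format.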
### Deployment

Executed on mammut:

    cp /etc/nginx/sites-available/mastodon \
       /etc/nginx/sites-available/mastodon.pre-scraper-block.20260518-112054
    scp scraper-block.conf mammut:/etc/nginx/snippets/scraper-block.conf
    scp mastodon-live-edited mammut:/etc/nginx/sites-available/mastodon
    nginx -t && systemctl reload nginx

### Verified post-deploy

| Test | Result |
|---|---|
| `curl -H 'User-Agent: …Applebot/0.1…' https://yttrx.com/tags/test` | 403 from Server: nginx/1.22.1 (not Mastodon) |
| `curl -H 'User-Agent: meta-externalagent/1.1' https://yttrx.com/tags/test` | 403 from nginx |
| `curl -H 'User-Agent: Mozilla/5.0 … Chrome/130 …' https://yttrx.com/` | 200 from Mastodon |
| `curl https://masto.yttrx.com/api/v1/instance` | 200 from Mastodon (federation API intact) |
| `curl -H 'User-Agent: Mastodon/4.5.9 (http.rb/…)' https://yttrx.com/inbox` | 404 from Mastodon (HEAD on POST-only route — *not* a 403 from nginx, which is what matters) |

### Expected ongoing impact

Applied retroactively against today's 269k-line access.log, the rules would have rejected ~106,000 requests (~40% of traffic) — primarily on `/tags/` — without touching `/inbox`, `/api/`, `/oauth/`, or `/.well-known/`. Real federation (48k `/inbox` POSTs today from Mastodon/x.y UAs) is unaffected.

### Rollback

    cp /etc/nginx/sites-available/mastodon.pre-scraper-block.20260518-112054 \
       /etc/nginx/sites-available/mastodon
    nginx -t && systemctl reload nginx

The `scraper-block.conf` snippet can stay in `/etc/nginx/snippets/` — it does nothing on its own without the `if` lines and the `include`.

### What this doesn't do

The block matches declared UA strings and specific source CIDRs. A scraper that rotates both — picks a non-listed CIDR *and* sends a generic Chrome UA — slips through. Today's data doesn't show much of that, but if the Alibaba farm migrates to a residential-proxy mesh, this won't catch them. At that point the realistic next step is Cloudflare's bot management at the edge, not more rules on mammut.
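A scraper that moves to an unlisted network would still tend to show the signature seen in today's audit: one network spread across many Chrome major versions. The Python sketch below is one hedged way to look for that in parsed log data. The /16 grouping, the threshold of 10 versions, and the function name `rotation_suspects` are all assumptions chosen for illustration, not values taken from the audit.

```python
import ipaddress
import re
from collections import defaultdict

CHROME_MAJOR = re.compile(r"Chrome/(\d+)\.")


def rotation_suspects(requests, min_versions=10, prefix_len=16):
    """Flag IPv4 networks whose clients present many distinct Chrome majors.

    `requests` is an iterable of (client_ip, user_agent) pairs.
    """
    versions = defaultdict(set)
    for ip, ua in requests:
        match = CHROME_MAJOR.search(ua)
        if not match:
            continue  # not a Chrome-style UA
        try:
            addr = ipaddress.IPv4Address(ip)
        except ipaddress.AddressValueError:
            continue  # IPv6 or malformed; out of scope for this sketch
        # strict=False masks the host bits, giving the enclosing network.
        network = ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
        versions[network].add(int(match.group(1)))
    return {
        str(net): sorted(seen)
        for net, seen in versions.items()
        if len(seen) >= min_versions
    }
```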

---

## 2026-05-18 — Anubis rolled back from yttrx.com and masto.yttrx.com

The Anubis deployment from 2026-05-17 broke parts of the Mastodon web client (login, post composer, federated timeline streaming) for real users, so it's been removed from the request path. nginx now serves both vhosts via the original mastodon site again; the `anubis@yttrx.service` unit is stopped and disabled.

### Rollback steps executed on mammut

    ln -sfn ../sites-available/mastodon /etc/nginx/sites-enabled/mastodon
    nginx -t && systemctl reload nginx
    systemctl stop anubis@yttrx.service
    systemctl disable anubis@yttrx.service

### Verified post-rollback

| Test | Result |
|---|---|
| `curl -I -H 'User-Agent: Mozilla/5.0' https://yttrx.com/` | 200 from Server: Mastodon, no `techaro.lol-anubis-*` cookies |
| `curl -I -H 'User-Agent: Mozilla/5.0' https://masto.yttrx.com/` | 302 redirect to yttrx.com/ (expected; Server: nginx) |
| `curl -I https://masto.yttrx.com/api/v1/instance` | 200 from Server: Mastodon |
| `curl -I 'https://yttrx.com/.well-known/webfinger?resource=…'` | 200 from Server: Mastodon |

No challenge interstitial, no Anubis cookies. Web client is reachable directly again.

### What's left in place (intentionally)

- `/etc/nginx/sites-available/mastodon-anubis` — the dual-vhost site file with the Anubis bypass blocks. Kept on disk so re-enabling is a single `ln -sfn` away if/when we revisit.
- `/etc/anubis/yttrx.env` and `/etc/anubis/yttrx.botPolicies.yaml` — config + key material, untouched.
- `/etc/nginx/snippets/masto-proxy.conf` — shared proxy headers, also referenced by the original mastodon site; safe to leave.
- The anubis package itself — not uninstalled. The systemd template instance is disabled, so the service won't come back on reboot.

### Why this regressed the web client

Not investigated to root cause — rolling back was cheaper than tuning the policy.
Anubis only challenges browser HTML at `location /`; ActivityPub, OAuth, API, streaming, WebFinger, and nodeinfo were already bypassed in `scripts/mastodon-anubis.nginx`, yet enough things in the SPA broke for real users that the deployment wasn't viable as configured. If we revisit, the experiment should start on a less critical vhost, capture specific failing requests/flows before changing anything, and test the full first-login flow in a clean browser profile before enabling on yttrx.com.

### Re-enable path (for reference)

    ln -sfn ../sites-available/mastodon-anubis /etc/nginx/sites-enabled/mastodon
    nginx -t && systemctl reload nginx
    systemctl enable --now anubis@yttrx.service

---

## 2026-05-18 — R2 cutover complete: Mastodon writes to Cloudflare R2

The CDN migration is done. As of 07:37 CEST, `~mastodon/live/.env.production` points the Mastodon S3 client at Cloudflare R2 (`yttrx-media`), and all three writer containers (web, sidekiq, streaming) restarted cleanly under the new config. DigitalOcean Spaces is now a frozen historical bucket — reads still flow through the dual-backend nginx (cdn-migration) which tries R2 first and falls back to DO for any objects R2 doesn't have.

### Pre-flight state

`rclone sync spaces-old:yttrx → r2:yttrx-media` had been running since ~2026-05-17 11:19 (with `--transfers 64 --checkers 128`). By cutover time, R2 was at 811.9 GiB / 1.991M objects — slightly larger than the 797.5 GiB / 1.978M-object DO baseline taken at the start, because rclone caught new uploads written to DO during the run. The remaining delta as of cutover was a few hundred KiB of `cache/` (federated-media cache) stragglers, which Mastodon will re-fetch on demand and we explicitly chose not to chase.

### Cutover sequence (executed in this order on mammut)

1. Disabled the in-session monitoring cron, killed the long-running rclone sync (tmux session migration) and its sibling rclone-watcher. Marker line written to `/var/log/rclone-migration.log`.
2. Stopped Mastodon writers: `docker compose stop web sidekiq streaming`. Kept nginx, dragonfly, es, and PostgreSQL running so files.yttrx.com continued serving reads via cdn-migration throughout.
3. Final delta rclone sync started with the same `--transfers 64 --checkers 128`. Wall clock: a few minutes of bucket-comparison walk for negligible new transfer (a couple of preview-card PNGs). Killed once it was clear the remaining tail was all `cache/`.
4. Skipped the full `rclone check`. The full bucket-walk would have taken many minutes for content (federated cache) we don't need to verify. Trade-off: a small risk that a non-cache upload near the alphabetical end of the bucket didn't make it to R2; the dual-backend nginx fallback covers that case anyway (R2 404 → DO).
5. Backed up `.env.production` to `/home/mastodon/live/.env.production.pre-r2-cutover.20260518-073707` (md5-verified equal to the live file before edit).
6. Swapped S3 vars in `.env.production` (sed, transactional via .tmp + mv):

       S3_BUCKET=yttrx → S3_BUCKET=yttrx-media
       S3_REGION=us-east-1 → S3_REGION=auto
       S3_HOSTNAME=https://yttrx.sfo3.digitaloceanspaces.com → S3_HOSTNAME=https://yttrx-media.b10d4c19446fc73dcd3af1145490c01b.r2.cloudflarestorage.com
       S3_ENDPOINT=https://sfo3.digitaloceanspaces.com → S3_ENDPOINT=https://b10d4c19446fc73dcd3af1145490c01b.r2.cloudflarestorage.com
       AWS_ACCESS_KEY_ID=DO00XZDDD38HM4ML7TTM → AWS_ACCESS_KEY_ID=9301014f2722d65b7c7bd1372648e1a0
       AWS_SECRET_ACCESS_KEY= → AWS_SECRET_ACCESS_KEY=

   Real R2 credentials live in the private secondbrain note yttrx-r2-credentials.md. `S3_ENABLED`, `S3_PROTOCOL=https`, `S3_ALIAS_HOST=files.yttrx.com` unchanged.
7. Brought writers back up: `docker compose up -d web sidekiq streaming`. All five containers reported healthy within 30 seconds. Smoke test: `curl https://masto.yttrx.com/api/v1/instance` returned 200 from Server: Mastodon.

Total downtime for masto.yttrx.com: ~28 minutes (from `docker compose stop` to all-healthy).
files.yttrx.com stayed up the whole time — the dual-backend cdn-migration config served reads from R2/DO without interruption.

### Maintenance page

Added during the cutover so the 502 from nginx wouldn't be the user-facing error while the containers were down:

- `scripts/maintenance.html` — friendly maintenance page deployed at `/var/www/html/maintenance.html` on mammut.
- `scripts/mastodon-anubis.nginx` — added in each :443 server block (both yttrx.com and masto.yttrx.com):

      error_page 502 503 504 /maintenance.html;
      location = /maintenance.html {
          root /var/www/html;
          internal;
      }

- Confirmed working during the cutover: `/api/v1/instance`, `/.well-known/webfinger`, `/users/.../outbox` all returned the maintenance HTML with HTTP 502.

The directives remain in place post-cutover — they're harmless when containers are healthy and useful for any future planned-maintenance window.

### Rollback path (still available)

If we discover R2 writes are broken or media reads regress, revert:

    sudo -u mastodon cp -p /home/mastodon/live/.env.production.pre-r2-cutover.20260518-073707 \
        /home/mastodon/live/.env.production
    sudo -u mastodon bash -c 'cd ~mastodon/live && docker compose restart web sidekiq streaming'

Any media uploaded to R2 during the brief post-cutover window will still be reachable via files.yttrx.com — the dual-backend nginx tries R2 first, so users keep seeing their new uploads even after the env revert. No data is lost on rollback.

If nginx itself misbehaves, swap the enabled site back:

    rm /etc/nginx/sites-enabled/cdn-migration && \
        ln -s ../sites-available/cdn-digitalocean /etc/nginx/sites-enabled/cdn-digitalocean && \
        nginx -t && systemctl reload nginx

That sends files.yttrx.com straight to DO via the legacy single-backend nginx config.

### Remaining work

- Smoke-test: upload an attachment via the web UI or a mobile client. Verify with `rclone ls r2:yttrx-media/media_attachments/files//` that it landed in R2 not DO.
- Open task #117 ("Execute or retire the DO Spaces → R2 migration") closes after smoke-test passes.
- Future cleanup (separate change, at least a week out — see cdn-s3-migration.md Step 7): drop the `@s3_fallback` block from cdn-migration once R2 has been authoritative without fallback-served requests in the access log.
- DO Spaces decommission once we're confident: cost savings start, and the legacy cdn-digitalocean site file can be removed alongside.

---

## 2026-05-17 — Anubis live in front of yttrx.com and masto.yttrx.com

Browser HTML on both vhosts now passes through [Anubis](https://anubis.techaro.lol) (v1.25.0) before reaching Mastodon. Server-to-server traffic (ActivityPub federation, REST API, OAuth, the streaming WebSocket, WebFinger, nodeinfo) is bypassed at the nginx layer so peer instances and mobile clients never see a JS challenge.

### Architecture

    client ──HTTPS──> nginx :443 (TLS terminator, both vhosts)
                        ├─ /.well-known, /inbox, /api, /oauth, … ──> webbackend :3000 (direct, no anubis)
                        ├─ /api/v1/streaming ──> streamingbackend :4000 (direct, no anubis)
                        ├─ /system ──> 301 files.yttrx.com
                        └─ location / ──> anubis 127.0.0.1:8923
                                            └──> nginx 127.0.0.1:8081 (backend)
                                                   └──> Mastodon redirects + webbackend

### Files

| Path | Source of truth | Purpose |
|---|---|---|
| `/etc/nginx/sites-available/mastodon-anubis` | `scripts/mastodon-anubis.nginx` | Drop-in replacement for the mastodon site; pivot by re-pointing the symlink. |
| `/etc/nginx/snippets/masto-proxy.conf` | `scripts/masto-proxy.conf` | Shared proxy header set used by every bypass location. |
| `/etc/anubis/yttrx.env` | (server-local) | BIND/TARGET/policy path, inlined ED25519 key (mode 600). |
| `/etc/anubis/yttrx.botPolicies.yaml` | copy of `/usr/share/doc/anubis/botPolicies.yaml` | Anubis default policy unchanged. |

Service: `systemctl status anubis@yttrx.service` (uses `DynamicUser=yes`, listens on 127.0.0.1:8923, metrics on 127.0.0.1:9090).
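The routing in the Architecture diagram can be read as a prefix table that is consulted before the catch-all. The Python sketch below models that reading only; the real behavior lives in nginx `location` blocks. The table is deliberately partial because the diagram elides the remaining bypass paths, the segment-boundary matching is a simplification, and the names `BYPASS_ROUTES` and `route` are hypothetical.

```python
# Bypass prefixes named in the diagram. The real list is longer (the diagram
# elides the rest), so this table is illustrative only.
BYPASS_ROUTES = [
    ("/api/v1/streaming", "streamingbackend :4000"),
    ("/.well-known", "webbackend :3000"),
    ("/inbox", "webbackend :3000"),
    ("/api", "webbackend :3000"),
    ("/oauth", "webbackend :3000"),
    ("/system", "301 files.yttrx.com"),
]


def route(path):
    """Return the upstream a request path would reach, per the diagram."""
    # Check longer prefixes first so /api/v1/streaming wins over /api.
    for prefix, upstream in sorted(BYPASS_ROUTES, key=lambda r: -len(r[0])):
        if path == prefix or path.startswith(prefix + "/"):
            return upstream
    return "anubis 127.0.0.1:8923"  # everything else falls to location /
```

Under these assumptions, `route("/api/v1/instance")` lands on the web backend directly, while `route("/tags/test")` goes through Anubis.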
### Pivot

    # enable
    ln -sfn ../sites-available/mastodon-anubis /etc/nginx/sites-enabled/mastodon && nginx -t && systemctl reload nginx

    # revert
    ln -sfn ../sites-available/mastodon /etc/nginx/sites-enabled/mastodon && nginx -t && systemctl reload nginx

### Gotchas hit during setup

1. `COOKIE_DOMAIN` and `COOKIE_DYNAMIC_DOMAIN` are mutually exclusive. Anubis exits with "you can't set COOKIE_DOMAIN and COOKIE_DYNAMIC_DOMAIN at the same time". We want dynamic mode since the same instance covers both yttrx.com and masto.yttrx.com, so `COOKIE_DOMAIN` is omitted.
2. The Debian package uses `DynamicUser=yes` (transient uid/gid per instance). That makes unix-socket permission sharing with nginx's www-data a hassle. Solution: use TCP loopback both ways (`BIND=127.0.0.1:8923`, `TARGET=http://127.0.0.1:8081`, backend nginx `listen 127.0.0.1:8081`). Slight per-request overhead, zero permission plumbing.
3. `ED25519_PRIVATE_KEY_HEX_FILE` vs inlining. With `DynamicUser=yes` the anubis process can't read a root-owned key file. Inlining the key as `ED25519_PRIVATE_KEY_HEX=` in `yttrx.env` (mode 600, root-owned) works because systemd reads the EnvironmentFile as root and passes the var to the child after privilege drop.
4. Default Anubis policy doesn't challenge bare curl. It WEIGHs `User-Agent: Mozilla|Opera` only. A bare `curl -I` will go straight through to the Mastodon backend — that's by design, not misconfiguration. Verify with a browser UA: `curl -H 'User-Agent: Mozilla/5.0' https://yttrx.com/`.
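Gotcha 4 makes verification easy to misread: a test that never sends a browser-like UA will never see a challenge. The Python sketch below captures the two checks a verification script would need. It assumes the UA pattern is exactly the `Mozilla|Opera` match described above and that a challenge response can be recognized by a `techaro.lol-anubis-` cookie prefix; the function names are hypothetical.

```python
import re

# Gotcha 4: the default policy only weighs UAs containing Mozilla or Opera.
BROWSER_UA = re.compile(r"Mozilla|Opera")
ANUBIS_COOKIE_PREFIX = "techaro.lol-anubis-"


def expects_challenge(user_agent):
    """True if the default policy would consider challenging this UA."""
    return bool(BROWSER_UA.search(user_agent or ""))


def saw_challenge(set_cookie_headers):
    """True if any Set-Cookie header value carries an Anubis cookie."""
    return any(
        header.strip().startswith(ANUBIS_COOKIE_PREFIX)
        for header in set_cookie_headers
    )
```

A check is only meaningful when `expects_challenge(ua)` is true for the UA being sent; otherwise the absence of Anubis cookies says nothing about whether Anubis is in the path.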
### Verified

| Test | Result |
|---|---|
| `curl /.well-known/webfinger?resource=…` | 200 from Server: Mastodon (federation bypass works) |
| `curl /api/v1/instance` | 200 from Server: Mastodon |
| `curl /api/v1/streaming` | 400 from streaming backend (HEAD; expected) |
| `curl /nodeinfo/2.0` | 200 from Server: Mastodon |
| `curl -H 'User-Agent: Mozilla/5.0' /` | 200 with Anubis challenge HTML (`techaro.lol-anubis-*` cookies set) |
| Browser refresh on yttrx.com | Anubis challenge interstitial then site (user-confirmed) |
| Metrics after ~3 min | 252 DENY action="bot/ai-catchall", 346 challenges issued, 54 validated |

### Rollback signal

If federation backlog grows (sidekiq inbox queue), check the bypass blocks first. To take Anubis fully out of the path: flip the symlink back (see "Pivot" above) and `systemctl stop anubis@yttrx.service`.

---

## 2026-05-17 — files.yttrx.com flipped to mammut: dual-backend (R2 + DO fallback) live

The CDN migration's first production-facing step is done. files.yttrx.com now resolves through Cloudflare to mammut nginx, which proxies to R2 first and falls through to DO Spaces on 404. Mastodon continues to write to DO during the rclone backfill; reads serve from whichever bucket has the object.

### Cutover sequence

1. Created Cloudflare R2 custom domain files-r2.yttrx.com pointed at the `yttrx-media` bucket — gives anonymous, signed-request-free reads from R2.
2. Provisioned the cdn-migration site on mammut:
   - `/etc/nginx/sites-available/cdn-migration` — copy in `scripts/cdn-migration.nginx`
   - `/data/nginx/cache/` (created, then emptied — see below)
   - `/etc/nginx/dmca` (empty, just needed for the include)
   - Reuses the existing `/etc/certs/{fullchain,privkey}.pem` `*.yttrx.com` wildcard
3. Symlinked cdn-migration into `sites-enabled/`, `nginx -t`, reload.
4. Flipped the Cloudflare proxy origin for files.yttrx.com from DO Spaces to 144.76.4.67 (mammut) in the CF dashboard. CF still fronts and terminates TLS; mammut nginx is now the origin behind CF.
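The read path that this cutover put in place (R2 first, DO Spaces only on 404) can be sketched in a few lines of Python. This is a model of the decision logic only, not of the nginx config: the backends are caller-supplied functions, and the name `fetch_with_fallback` is hypothetical.

```python
def fetch_with_fallback(key, backends):
    """Try each (label, fetch) backend in order; fall through only on 404.

    `fetch(key)` is a caller-supplied function returning (status, body).
    Returns (status, body, label), where label plays the role of X-Upstream.
    """
    status, body, label = 404, b"", None
    for label, fetch in backends:
        status, body = fetch(key)
        if status != 404:
            break  # any non-404 answer (200, 403, 5xx) is final
    return status, body, label
```

Note the design choice this makes visible: only a 404 triggers the fallback, so an error from the first backend is returned as-is rather than masked by the second.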
### Three gotchas to remember

1. nginx needs `proxy_ssl_server_name on` + `proxy_ssl_name` for HTTPS upstreams. Without these, the upstream TLS handshake never sets SNI, and Cloudflare-fronted hostnames (both files-r2.yttrx.com and the R2 S3 endpoint) reject with SSL alert 40 (handshake failure). Every `proxy_pass https://...` location now has these two directives explicitly.
2. `add_header` only fires for 2xx/3xx by default — use `always`. Almost burned us during debugging because failing 404 responses were missing the X-Upstream / X-Cache-Status headers we'd added for diagnosis, making it look like nginx had skipped the proxy location entirely when it actually hadn't. All `add_header` directives in the config now have `always`.
3. Local `proxy_cache` was a phantom-404 trap and is now disabled. With `proxy_intercept_errors on` and `error_page 404 = @s3_fallback;` at `@s3`, R2's 404 response was getting cached at `@s3`'s cache key before the error_page redirect ran, then served as a phantom HIT 404 to every subsequent request — even though the fallback to DO Spaces would have returned 200.

   Symptoms: `rm -rf /data/nginx/cache/*` and full `systemctl restart nginx` didn't fix it (the shared-memory key zone retained ghost entries that mapped to absent disk files). Brand-new URLs that had never been requested returned `X-Cache-Status: HIT` on first request, which doesn't make any sense unless the cache layer is broken.

   Workaround: remove the `proxy_cache mycache;` server-level directive and the `proxy_cache_*` directives in each location. CF's edge cache sits in front of mammut now, so caching at the mammut layer is marginal — most user requests serve from CF edge and never reach the origin.
   If we ever need origin-side caching back, the safe approach is probably:

   - Move 404-fallback out of `proxy_intercept_errors` + `error_page` (try a Lua-based or two-layered design that doesn't share a cache key between locations), or
   - Add `proxy_no_cache 1` to the `@s3` block so it never caches its own responses, leaving caching only at `@s3_fallback`'s level.

### Verified

| Test | Result |
|---|---|
| Object in R2 | 200, X-Upstream: r2 |
| Object in DO only (e.g. /cache/... not yet migrated) | 200, X-Upstream: do-spaces |
| Object in neither | 404/403 (browsers render as broken image — acceptable) |
| Real DNS through CF | 200 end-to-end |

Mastodon writes still go to DO Spaces (`.env.production` unchanged; the timestamped backup at `~mastodon/live/.env.production.pre-r2-migration.20260517-110003` was the source for the revert after the brief midday outage).

rclone sync continues in tmux session migration; ~28% through 797.5 GiB at time of cutover. When sync completes:

- Run a final `rclone sync` to catch the delta of new uploads written to DO during the run
- `rclone check spaces-old:yttrx r2:yttrx-media` — verify zero mismatches
- Edit `.env.production` to point Mastodon writes at R2 (template is the timestamped backup's *opposite*; new backup before this change)
- `docker compose restart web sidekiq streaming`
- Decommission cdn-digitalocean (it's still in `sites-available/` as the documented rollback target)
- Eventually drop the `@s3_fallback` block from cdn-migration once R2 is 100% authoritative

---

## 2026-05-17 — New mastodon-cleanup.sh replacing dead purge-media.sh

Audited `~mastodon/bin/purge-media.sh` (in place since 2025-02-20). It was dead code: not scheduled anywhere, hasn't run, and wouldn't have worked if it had — two bugs:

1. `export $PATH=~mastodon/bin:$PATH` — the literal `$` made bash expand `$PATH` on the LHS, so `export` received the expanded path string as a variable name and failed with a "not a valid identifier" error instead of setting PATH.
2. Called `tootctl` directly. `tootctl` doesn't exist on the host; it lives inside the live-web-1 container.
   The PATH hack referenced `~mastodon/bin/tootctl` (a `docker exec -it` wrapper), but `-it` requires a TTY and doesn't work under cron anyway.

Wrote `~mastodon/bin/mastodon-cleanup.sh` (copy in `scripts/mastodon-cleanup.sh`) as a from-scratch replacement:

- Uses `docker compose exec -T web bin/tootctl` directly. The `-T` disables TTY allocation — works in cron.
- `set -euo pipefail`. Verifies live-web-1 is running before touching anything.
- `flock` lock at `/tmp/mastodon-cleanup.lock` to block concurrent runs.
- Each step is timed and logged with start/end markers.
- Owner-runnable by mastodon (already in the docker group), so it can be scheduled in the mastodon user crontab without sudo.

Scope intentionally narrower than the old script — two steps, no time-based pruning:

    tootctl accounts prune
    tootctl media remove-orphans

Both are content-preserving: they never touch anything a local user posted or follows. `accounts prune` removes dormant federated accounts (never followed locally, not seen for a long time); `media remove-orphans` then sweeps any media records left without an owning account. Order matters — pruning can produce orphans, so the sweep comes second.

The deliberately-omitted retention-based commands (`statuses remove --days N`, `preview_cards remove --days N`, `media remove --days N`) are more aggressive and case-by-case; they can be run by hand when needed rather than on every cron tick.

Not scheduled yet. The old purge-media.sh was never on cron so there's no migration; deliberately holding off on adding a crontab entry until the R2 migration is done — running `remove-orphans` during the rclone sync would churn deletions across both buckets unnecessarily. Sibling `trim-storage.sh` (which runs `tootctl media remove --days 14` and logs to `/tmp/media_remove.log`) is left alone for now; will fold it in or retire it after the migration.

Verified `docker compose exec -T web bin/tootctl version` returns 4.5.9 from the mastodon user, confirming the invocation pattern works.
The old purge-media.sh is left in place but vestigial; safe to delete once the new script has been exercised at least once.

---

## 2026-05-17 — CDN migration to Cloudflare R2 (in progress)

Started the long-pending DO Spaces → Cloudflare R2 CDN migration documented in cdn-s3-migration.md. Migration is still running at time of writing; this entry captures the prep work that's done and the operational findings that should outlive the migration itself.

### What got done

- R2 bucket created: `yttrx-media` (Standard storage, region automatic). Account ID b10d4c19446fc73dcd3af1145490c01b.
- R2 API token created: yttrx-media R2 Read/Write, Object R/W, scoped to `yttrx-media`. Token is a Cloudflare User API token (raw value prefixed `cfut_`). Real credentials live in the private secondbrain note yttrx-r2-credentials.md and in `/root/.config/rclone/rclone.conf` on mammut — never in this repo.
- rclone installed and configured on mammut at `/root/.config/rclone/rclone.conf` (mode 0600), with `[spaces-old]` and `[r2]` remotes. Both verified with `rclone lsd`.
- Three nginx site files staged at `/etc/nginx/sites-available/`:
  - cdn — original single-backend, untouched
  - cdn-digitalocean — copy of cdn with header comment; rollback target during migration
  - cdn-migration — dual-backend (R2 primary, DO Spaces fallback on 404)

None enabled yet. See cdn-site.md for the architecture.

### Real bucket size

`rclone size spaces-old:yttrx` against the actual bucket (took 9m41s to list):

- 1,978,158 objects · 797.5 GiB

This is significantly larger than two other measurements taken earlier in the day:

- `tootctl media usage`: ~737 GB total (sum of categories)
- DO Spaces billing: 465.16 GiB billable for the 16-day partial month

The DO bill is computed on average storage across the period, not peak — and a 332 GiB gap between billed-average and actual-now suggests rapid growth this month (~20 GiB/day) or accumulated orphans/versioning that the billing amortizes differently.
For migration planning, trust `rclone size`, not tootctl or the bill.

### Throughput findings

Initial sync with `--transfers 8 --checkers 16` averaged ~120 MiB/min over the first ~25 minutes (mix of small avatars/emoji and some real attachments). Extrapolating: ~113 hours / ~4.7 days for full sync. Unacceptable for a "maybe-take-the-site-down" migration.

Killed and restarted with `--transfers 64 --checkers 128`. rclone is idempotent — already-copied objects are skipped — so the restart only costs a few minutes of relisting.

The bottleneck is small-file overhead, not bandwidth. mammut's 1 Gbps link is barely tickled even at high parallelism. The ~2M objects (avg ~400 KB each) mean rclone is spending most of its time on per-object HTTP roundtrips. Higher parallelism is the right knob.

### Brief outage and the "DO writes during sync" pattern

Initially planned Path B (cold cutover): stop nginx + Mastodon, let sync complete with zero new writes, then flip everything to R2 in one shot. Took down `systemctl stop nginx` + `docker compose down` accordingly, and pre-emptively rewrote `.env.production` to point Mastodon at R2.

Realized once the real bucket size landed (797 GiB, multi-day sync at any reasonable parallelism) that the site can't stay down for the duration. Switched to Path B-with-deferred-cutover:

1. Reverted `.env.production` from the timestamped backup (md5 verified) so Mastodon would continue writing to DO Spaces on restart.
2. `docker compose up -d` — containers healthy in seconds.
3. `systemctl start nginx` — site live again.
4. rclone keeps running in tmux on mammut, syncing DO → R2 in the background.

Site total downtime: ~30 minutes during the prep work.

Once rclone exits and `rclone check` passes, the real cutover is a short maintenance window: brief nginx stop, final `rclone sync` to catch the delta of anything written to DO since the main sync, swap `.env.production` to R2, enable R2 anonymous reads, flip files.yttrx.com DNS, restart containers, restart nginx.
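The extrapolation under "Throughput findings" above is plain arithmetic on the two measured numbers (797.5 GiB total, ~120 MiB/min). A small Python check, assuming the rate stays constant for the whole run, which the small-file mix makes only a rough approximation:

```python
def sync_eta_hours(total_gib, rate_mib_per_min):
    """Hours needed to copy total_gib at a steady rate_mib_per_min."""
    total_mib = total_gib * 1024
    return total_mib / rate_mib_per_min / 60


hours = sync_eta_hours(797.5, 120)
print(f"{hours:.0f} hours, {hours / 24:.1f} days")  # prints "113 hours, 4.7 days"
```

The same function can be used to sanity-check a restart at higher parallelism by plugging in the newly observed rate.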
### Backups taken

- `~mastodon/live/.env.production.pre-r2-migration.20260517-110003` — pre-edit copy. Restore: sudo -u mastodon cp ~mastodon/live/.env.production.

### Operational gotchas to remember

1. rclone on Debian bookworm (1.60.1) wants `provider = DigitalOcean`, not `DigitalOceanSpaces` (the latter raises "provider not known"). Older docs use the wrong name.
2. R2 `cfut_` tokens must be SHA-256 hashed before use as an S3 Secret Access Key. The dashboard's "use the token value above" wording is misleading. Recipe: `printf '%s' 'cfut_...' | sha256sum`.
3. Bucket-scoped R2 tokens need `no_check_bucket = true` in rclone — they're authorized for object R/W but not for HeadBucket. Without this rclone fails at startup.
4. Mastodon `.env.production` does NOT use the same credentials path as rclone. rclone reads `/root/.config/rclone/rclone.conf`. Changing one doesn't affect the other.
5. `S3_ALIAS_HOST=files.yttrx.com` decouples Mastodon's URL generation from the bucket location — that's what makes the eventual DNS cutover transparent to users.

### Remaining work (post-migration)

- Final delta `rclone sync` after Mastodon writes are cut over
- `rclone check` zero-mismatch verification
- Enable R2 anonymous reads (likely "Connect Custom Domain → files.yttrx.com", or `pub-*.r2.dev` if keeping mammut on the path)
- Apply R2 values to `.env.production`, restart containers
- Flip files.yttrx.com DNS to R2 (if going custom-domain route)
- Update Step 7 of cdn-s3-migration.md to reflect actual decision on architecture

---

## 2026-05-05 — Firewall: block external access to PostgreSQL port 5432

PostgreSQL was listening on all interfaces (`listen_addresses = '*'`) with no firewall, leaving port 5432 reachable from the internet. pg_hba.conf was rejecting the connection attempts, but bots were still making TCP connections several times per hour (visible in the logs: `FATAL: no pg_hba.conf entry for host "179.43.186.223"` etc.).
The Mastodon containers connect to PostgreSQL using `DB_HOST=144.76.4.67` (the host's public IP), so the fix couldn't simply be binding to localhost — it needed to allow the Docker bridge subnets through.

Installed iptables-persistent and added three INPUT rules:

    -A INPUT -s 127.0.0.0/8 -p tcp --dport 5432 -j ACCEPT     # localhost
    -A INPUT -s 172.16.0.0/12 -p tcp --dport 5432 -j ACCEPT   # all Docker subnets (172.16–172.31)
    -A INPUT -p tcp --dport 5432 -j DROP                      # everything else

Docker networks in use at time of change:

| Network | Subnet |
|---|---|
| bridge (default) | 172.17.0.0/16 |
| live_external_network | 172.18.0.0/16 |
| live_internal_network | 172.19.0.0/16 |
| finger_default | 172.20.0.0/16 |

Rules saved to `/etc/iptables/rules.v4` and load at boot via netfilter-persistent.service. Mastodon health check confirmed OK after rules were applied.

Note: Docker was already blocking external access to ports 3000, 4000, and 9200 via raw PREROUTING rules it manages itself — 5432 was the only gap.

### Rollback

    iptables -D INPUT -s 127.0.0.0/8 -p tcp --dport 5432 -j ACCEPT
    iptables -D INPUT -s 172.16.0.0/12 -p tcp --dport 5432 -j ACCEPT
    iptables -D INPUT -p tcp --dport 5432 -j DROP
    netfilter-persistent save

---

## 2026-05-05 — PostgreSQL tuning

Reviewed PostgreSQL logs and found six misconfigurations on mammut, all stemming from the default postgresql.conf being unchanged since installation despite the server having 62 GB RAM and NVMe RAID storage.

File: /etc/postgresql/15/main/postgresql.conf

| Parameter | Before | After | Reason |
|---|---|---|---|
| shared_buffers | 128MB | 8GB | Default is for tiny servers; 8GB gives PostgreSQL its own buffer pool |
| effective_cache_size | 4GB | 40GB | Planner hint; now reflects actual available RAM (~57 GB free) |
| random_page_cost | 4.0 | 1.1 | 4.0 assumes spinning disks; server has NVMe RAID 1 |
| work_mem | 4MB | 32MB | More memory per sort/hash operation |
| maintenance_work_mem | 64MB | 512MB | Faster autovacuum and index builds |
| wal_buffers | -1 (auto ~4MB) | 32MB | Explicit sizing for the WAL write buffer |

shared_buffers and wal_buffers require a full restart; the others only need a reload. PostgreSQL was restarted with systemctl restart postgresql. Mastodon reconnected automatically (health check confirmed OK immediately after).

### Rollback

    # Revert postgresql.conf
    sed -i \
      -e 's/^shared_buffers = 8GB/shared_buffers = 128MB/' \
      -e 's/^work_mem = 32MB/#work_mem = 4MB/' \
      -e 's/^maintenance_work_mem = 512MB/#maintenance_work_mem = 64MB/' \
      -e 's/^wal_buffers = 32MB/#wal_buffers = -1/' \
      -e 's/^random_page_cost = 1.1/#random_page_cost = 4.0/' \
      -e 's/^effective_cache_size = 40GB/#effective_cache_size = 4GB/' \
      /etc/postgresql/15/main/postgresql.conf
    systemctl restart postgresql

---

## 2026-05-03 — nginx caching & real IP passthrough

### nginx config backup

Before making any changes, a full backup of all nginx configs was created at:

    /etc/nginx/backup-20260503-164014/

Contains: nginx.conf, `sites-available/*`, `sites-enabled/*`.

To roll back any site config:

    cp /etc/nginx/backup-20260503-164014/sites-available/ /etc/nginx/sites-available/
    nginx -t && systemctl reload nginx

### Access pattern analysis

Reviewed `/var/log/nginx/*.log*` across all vhosts.
Top findings:

| URL | 200 hits | Notes |
|-----|----------|-------|
| / | 190k | coefficiencies.com homepage |
| /finger/waffles@yttrx.com | 115k | 98.2% from 6,652 Fediverse instances |
| /posts/index.xml | 101k | RSS feed, static file |
| /Bingham.json | 73k | json.tommertron.com data file |
| /help/2023/03/12/welcome-to-yttrx.html | 56k | static HTML |

Traffic is flat ~50k req/hr with no peak window. Server has 960MB RAM, 362MB available, no swap.

### nginx.conf — gzip, proxy cache, open file cache

File: /etc/nginx/nginx.conf

Replaced the sparse commented-out gzip block with full settings, and added two new cache sections:

    gzip on;
    gzip_vary on;
    gzip_proxied any;
    gzip_comp_level 5;
    gzip_buffers 16 8k;
    gzip_http_version 1.1;
    gzip_types text/plain text/css application/json application/javascript text/xml application/xml application/xml+rss text/javascript image/svg+xml application/manifest+json;

    proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=yttrx_cache:5m max_size=50m inactive=5m use_temp_path=off;

    open_file_cache max=2000 inactive=60s;
    open_file_cache_valid 120s;
    open_file_cache_min_uses 2;
    open_file_cache_errors on;

- gzip: previously only text/html was compressed (the default); JSON, XML, SVG, webmanifest were not.
- proxy_cache_path: 50MB on-disk cache at /var/cache/nginx, 5MB in-memory key zone (yttrx_cache). Used by the finger proxy (see below). Cache directory created and owned by www-data.
- open_file_cache: keeps file descriptors and stat() results in worker memory for static files, avoiding a filesystem call per request.

### finger.yttrx.com — proxy response cache

File: /etc/nginx/sites-available/finger

Added proxy caching to the location / block inside the HTTPS server. The finger service (:5000, Python/Werkzeug) was receiving 115k hits/day, almost entirely from Mastodon instances polling on staggered schedules.

    proxy_cache yttrx_cache;
    proxy_cache_valid 200 30s;
    proxy_cache_use_stale error timeout updating http_500 http_502 http_503 http_504;
    proxy_cache_lock on;
    add_header X-Cache-Status $upstream_cache_status;

- 30-second TTL keeps data near-realtime while cutting upstream hits significantly.
- X-Cache-Status header added so cache behaviour is observable (MISS/HIT/EXPIRED).

### Static asset cache headers — tommertron, help, coefficiencies

Files: sites-available/tommertron, sites-available/help, sites-available/coefficiencies

Added location blocks before the catch-all location / in each HTTPS server block:

    # Content-hashed JS/CSS — safe to cache forever
    location ~* \.(css|js)$ {
        try_files $uri =404;
        expires 1y;
        add_header Cache-Control "public, immutable";
        access_log off;
    }

    # Images, fonts, icons — 30 days
    location ~* \.(png|jpg|jpeg|gif|ico|svg|webp|woff|woff2|ttf|webmanifest)$ {
        try_files $uri =404;
        expires 30d;
        add_header Cache-Control "public";
        access_log off;
    }

    # XML feeds (RSS, sitemap) — 1 hour [tommertron/help only]
    location ~* \.xml$ {
        try_files $uri =404;
        expires 1h;
        add_header Cache-Control "public";
    }

JS/CSS filenames on these sites are content-hashed (e.g. appearance.min.8a082f81...js), so immutable + 1-year expiry is safe — browsers and Cloudflare will never revalidate them unnecessarily.

Note: coefficiencies.com app routes (/packing, /mortgage, /charades, /api) proxy to :8081 and are intentionally not cached.

### Real IP passthrough (Cloudflare)

File: /etc/nginx/conf.d/cloudflare-real-ip.conf

The file already existed but had two issues:

1. Missing the 173.245.48.0/20 Cloudflare range
2. Missing real_ip_recursive on — without this, nginx only strips one proxy hop; Cloudflare can add multiple, leaving a CF IP in $remote_addr

Updated file now includes all 15 IPv4 ranges and 7 IPv6 ranges from https://www.cloudflare.com/ips-v4 / ips-v6, plus:

    real_ip_header CF-Connecting-IP;
    real_ip_recursive on;

All access logs going forward show actual client IPs.
Historical logs (pre-reload) still contain Cloudflare proxy IPs.
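
The effect of real_ip_recursive can be illustrated with a simplified model of how nginx's realip module picks the client address from a header that lists several hops (as X-Forwarded-For does). This is a sketch under stated assumptions, not the module's actual code: the function name, the sample addresses and the one-entry trusted list are hypothetical, and only 173.245.48.0/20 is taken from the entry above.

```python
import ipaddress


def resolve_client(remote_addr, header_value, trusted_cidrs, recursive):
    """Simplified model of realip address selection (hypothetical helper)."""
    nets = [ipaddress.ip_network(c) for c in trusted_cidrs]

    def is_trusted(addr):
        ip = ipaddress.ip_address(addr)
        return any(ip in net for net in nets)

    # The header is only honoured when the connecting peer is trusted.
    if not is_trusted(remote_addr):
        return remote_addr

    hops = [h.strip() for h in header_value.split(",") if h.strip()]
    if not hops:
        return remote_addr

    # Non-recursive: take the last address in the header, trusted or not.
    if not recursive:
        return hops[-1]

    # Recursive: walk right to left and stop at the first untrusted address.
    for hop in reversed(hops):
        if not is_trusted(hop):
            return hop
    return hops[0]  # every hop was trusted; fall back to the leftmost


# Assumption: a single trusted range for illustration; the real file lists all
# of Cloudflare's published ranges.
TRUSTED = ["173.245.48.0/20"]
peer = "173.245.48.10"                   # hypothetical Cloudflare edge address
header = "203.0.113.7, 173.245.49.5"     # hypothetical client, then a second proxy hop

print(resolve_client(peer, header, TRUSTED, recursive=False))
print(resolve_client(peer, header, TRUSTED, recursive=True))
```

In this model the non-recursive call returns the second proxy hop, while the recursive call skips past it to the first address outside the trusted ranges.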