Changelog

A running log of significant changes made to the yttrx/admin.yttrx.com
server infrastructure.

------------------------------------------------------------------------

2026-07-06 — fail2ban jail for sustained /tags 429s → firewall ban

Companion to the limit_req throttle added earlier today (below): a new
fail2ban jail escalates clients that keep hammering /tags/* or
/api/v1/timelines/tag/* past the nginx throttle to a firewall ban,
instead of leaving nginx to keep re-evaluating (and TLS-handshaking)
every request from them forever.

Files:

-   /etc/fail2ban/filter.d/nginx-tags-ratelimit.conf (new) — matches 429
    responses for those two path prefixes specifically (not the path
    alone, since real users/apps use them too):
    ^<HOST> -[^"]*"(?:GET|POST) /(?:tags|api/v1/timelines/tag)/[^"]*" 429
-   /etc/fail2ban/jail.d/nginx-tags-ratelimit.local (new) — maxretry=20,
    findtime=10m, bantime=1h escalating to 1w (bantime.increment), same
    nftables-multiport banaction (ports http,https only — never touches
    SSH) as the existing nginx-badprobe/nginx-cdn-badprobe jails,
    sharing the same f2b-table. Only 429s count toward maxretry, and
    those start only after the throttle's own burst=50 is spent, so a
    handful of 429s (e.g. a double-tapped link) never trips it; only
    sustained hammering past the throttle does.
-   ignoreip on this jail explicitly excludes admin.yttrx.com's egress
    IP 147.182.255.203 (must never be firewall-banned — see
    anti-abuse.md layer 5 / FINGER_BAN_ALLOWLIST) plus the management IP
    used to deploy and test this change.
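
The maxretry/findtime interaction can be sketched as a sliding-window
counter. This is a simplified illustration in Python, not fail2ban's
actual implementation; the function name and the timestamps are
hypothetical.

```python
from collections import deque


def first_ban_time(failure_times, maxretry=20, findtime=600):
    """Return the timestamp (seconds) at which a ban would trigger, or None.

    Simplified model: ban once `maxretry` matched lines (429s) fall
    inside any `findtime`-second window.
    """
    window = deque()
    for t in sorted(failure_times):
        window.append(t)
        # Drop failures older than findtime relative to the newest one.
        while window and t - window[0] > findtime:
            window.popleft()
        if len(window) >= maxretry:
            return t
    return None


# A double-tapped link: three 429s in a second never reach the threshold.
double_tap = first_ban_time([0, 0.5, 1])
# Sustained hammering: one 429 per second trips on the 20th failure.
hammering = first_ban_time(range(60))
# Slow trickle: one 429 per minute never puts 20 inside 10 minutes.
trickle = first_ban_time([60 * i for i in range(60)])
print(double_tap, hammering, trickle)
```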

Validated with fail2ban-regex against the live access log (matched only
the 429s generated by earlier manual testing, zero false positives on
normal /tags traffic), then an end-to-end
fail2ban-client set nginx-tags-ratelimit banip/unbanip cycle against a
TEST-NET-3 address (203.0.113.77) confirmed the nftables rule is added
and removed correctly in the shared f2b-table, without ever generating
live 429 traffic against a real IP (to avoid banning our own management
access mid-test).
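
A rough offline equivalent of the fail2ban-regex check, as a Python
sketch. fail2ban expands <HOST> itself, so a plain non-space token
stands in for it here, and the sample log lines are hypothetical (a
combined-style access log is assumed).

```python
import re

# <HOST> replaced by a simple non-space group for illustration only.
FAILREGEX = re.compile(
    r'^(?P<host>\S+) -[^"]*"(?:GET|POST) /(?:tags|api/v1/timelines/tag)/[^"]*" 429'
)

# (hypothetical log line, should it count as a failure?)
SAMPLES = [
    ('203.0.113.77 - - [06/Jul/2026:12:00:00 +0000] '
     '"GET /tags/love HTTP/1.1" 429 162', True),
    ('203.0.113.77 - - [06/Jul/2026:12:00:01 +0000] '
     '"GET /api/v1/timelines/tag/cats HTTP/1.1" 429 162', True),
    # Same path, normal response: must not count.
    ('203.0.113.77 - - [06/Jul/2026:12:00:02 +0000] '
     '"GET /tags/love HTTP/1.1" 200 5120', False),
    # A 429 on some other path: must not count either.
    ('203.0.113.77 - - [06/Jul/2026:12:00:03 +0000] '
     '"GET /about HTTP/1.1" 429 162', False),
]


def is_failure(line):
    """True if the line would be counted by the filter sketch above."""
    return FAILREGEX.search(line) is not None


for line, expected in SAMPLES:
    print(is_failure(line) == expected, line[:60])
```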

Rollback:
rm /etc/fail2ban/filter.d/nginx-tags-ratelimit.conf /etc/fail2ban/jail.d/nginx-tags-ratelimit.local && fail2ban-client reload.

------------------------------------------------------------------------

2026-07-06 — Per-IP rate limit for /tags scraping (nginx limit_req)

Added a per-IP request-rate limit for /tags/* and
/api/v1/timelines/tag/* on both vhosts (yttrx.com, masto.yttrx.com) —
the naive-scraper gap flagged as unaddressed in anti-abuse.md roadmap
item G.

Files:

-   /etc/nginx/snippets/tags-limit.conf (new) — defines
    limit_req_zone $binary_remote_addr zone=tags_limit:10m rate=5r/m;
-   /etc/nginx/sites-available/mastodon — added an include for the new
    snippet at http scope, and a
    location ~ ^/(tags|api/v1/timelines/tag)/ { limit_req zone=tags_limit burst=50 nodelay; limit_req_status 429; try_files $uri @proxy; }
    block ahead of location / in both TLS server blocks (the yttrx.com
    copy also keeps the existing deny lines).

Sustained rate is 5 requests/min (~50 requests/10min) per IP, with a
burst allowance of 50 on top before a client starts getting 429s; the
zone is shared across both vhosts since they reference the same
limit_req_zone name. This doesn't touch the distributed residential-proxy
population documented in the 2026-07-03 Huawei entry below (each IP
self-throttles under this threshold) — hCaptcha in front of /tags/*
remains the identified fix for that population.
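
To make the rate=5r/m burst=50 nodelay numbers concrete, here is a
simplified Python model of leaky-bucket accounting in the style of
limit_req. It is a sketch under stated assumptions (continuous time, one
client, rejected requests leave the bucket state unchanged), not nginx's
actual code; the function name is hypothetical.

```python
def simulate_limit_req(arrival_times, rate_per_min=5.0, burst=50):
    """Return one bool per request: True = passed, False = 429."""
    rate_per_sec = rate_per_min / 60.0
    excess = None   # None until the first request creates the state
    last = None
    results = []
    for t in arrival_times:
        if excess is None:
            excess, last = 0.0, t
            results.append(True)
            continue
        # Drain at the configured rate, then charge this request.
        candidate = max(excess - rate_per_sec * (t - last) + 1.0, 0.0)
        if candidate > burst:
            results.append(False)      # over burst: rejected, state kept
        else:
            excess, last = candidate, t
            results.append(True)
    return results


# 60 requests fired at the same instant: the first plus the burst pass.
instant = simulate_limit_req([0.0] * 60)
# One request every 12 s (5/min) for 10 minutes: nothing is rejected.
steady = simulate_limit_req([12.0 * i for i in range(50)])
print(sum(instant), all(steady))
```

In this model a client can spend the burst at once and then continue at
the sustained rate, which is why only clients well past both limits
accumulate 429s.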

Backup: /etc/nginx/sites-available/mastodon.bak-tagslimit.20260706.
Rollback: restore that file, rm /etc/nginx/snippets/tags-limit.conf,
nginx -t && systemctl reload nginx.

------------------------------------------------------------------------

2026-07-05 — Suspicious-signup sweep: auto-suspend flagged signups with no activity

New scheduled layer on yttrx-welcomebot: any signup flagged by IP- or
email-domain scrutiny now gets a 24h grace period once live, and is
auto-suspended if it posts nothing and follows no one in that window
(cleared otherwise). Ships live, not dry-run — a stronger action than
the rest of this stack's reversible-first defaults, since zero organic
activity in a day is a strong bulk/bot signal. Runs hourly via cron on
admin.yttrx.com; ABUSE_BOT_TOKEN re-minted with admin:read:accounts.
Full detail in yttrx-welcomebot's README.
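
The sweep's decision rule, as a minimal Python sketch. The function and
field names (live_since, statuses_count, following_count) are
hypothetical stand-ins; the real implementation lives in
yttrx-welcomebot and may differ.

```python
from datetime import datetime, timedelta, timezone

GRACE = timedelta(hours=24)


def sweep_decision(live_since, statuses_count, following_count, now):
    """Return 'clear', 'wait', or 'suspend' for one flagged signup."""
    if statuses_count > 0 or following_count > 0:
        return "clear"      # any organic activity clears the flag
    if now - live_since < GRACE:
        return "wait"       # still inside the grace period
    return "suspend"        # silent for the whole window


now = datetime(2026, 7, 5, 12, 0, tzinfo=timezone.utc)
print(sweep_decision(now - timedelta(hours=3), 0, 0, now))
print(sweep_decision(now - timedelta(hours=30), 0, 2, now))
print(sweep_decision(now - timedelta(hours=30), 0, 0, now))
```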

------------------------------------------------------------------------

2026-07-04 — waffles.yttrx.com CI removed

The .ci/run.sh push hook added the day before (see entry below) turned
out to spawn too many nested git instances on admin.yttrx.com and
deadlock. Rather than debug the delegation chain, removed .ci/run.sh
from waffles-yttrx.git outright — the git-server CI runner no-ops
cleanly when .ci/run.sh is absent, no other toggle needed. Deploys are
manual again, same as welcome.yttrx.com: see misc-sites.md.

------------------------------------------------------------------------

2026-07-03 — waffles.yttrx.com CI deploy wired up

Added .ci/run.sh to waffles-yttrx.git (git.blairhaus.net) so pushes
auto-deploy, same mechanism changelog.yttrx.com already uses: the
git-server container delegates through host.docker.internal
(admin.blairhaus.net) to SSH into admin.yttrx.com and run
git pull && ./build.sh there.

Gotcha found and fixed during setup: public/ is committed to the repo
(for backup) but is pure build output, regenerated by build.sh every
deploy — so the admin.yttrx.com checkout always has local diffs in
public/ immediately after a build. A plain git pull there fails with a
merge conflict on those files. .ci/run.sh now runs
git checkout -- public/ && git clean -fd public/ before pulling, which
is always safe since public/ never contains hand-authored content.
Hit and resolved this same conflict once by hand while iterating on the
script, before the fix was scripted into the CI itself.

Verified the full delegation chain (docker exec git-server → ssh
host.docker.internal → ssh admin.yttrx.com) manually first, then pushed
the CI script itself as the first live test. That first real CI run
failed — git pull on admin.yttrx.com hit a transient Cloudflare 524
(origin timeout) reaching git.blairhaus.net, confirmed via
curl .../admin/jobs/waffles-yttrx.git/<sha>. Immediately retrying the
same git fetch by hand from admin.yttrx.com succeeded, so it looks like
a one-off blip rather than a routing problem — but it means CI
reliability from admin.yttrx.com to git.blairhaus.net hasn't been proven
over multiple runs yet. Worth revisiting if it recurs.

Full details: misc-sites.md ("Publishing an update" section).

------------------------------------------------------------------------

2026-07-03 — waffles.yttrx.com migrated from Jekyll to Hugo (Compost + Tokyo Night)

waffles.yttrx.com (Pete's personal ops journal, 6 posts from Nov–Dec
2022) had been dormant — nginx vhost disabled, no valid cert, domain
526ing at Cloudflare. Migrated to the same Hugo setup as
welcome.yttrx.com.

-   Backed up the original Jekyll build verbatim:
    /var/www/html/waffles-jekyll-backup-20260703/ on admin.yttrx.com,
    and ~/backups/waffles-yttrx-jekyll-raw/ on admin.blairhaus.net.
    Nothing was deleted.
-   New site source: git.blairhaus.net/waffles-yttrx.git, deployed to
    /var/www/html/waffles/ on admin.yttrx.com (root-owned, same pattern
    as welcome). All 6 posts converted to Markdown with aliases: front
    matter so the original /mastodon/update/YYYY/MM/DD/<slug>.html
    permalinks still resolve. The original .well-known/webfinger
    (fediverse identity claim for acct:waffles@masto.yttrx.com) was
    preserved verbatim.
-   Added <link rel="me" href="https://yttrx.com/@waffles"> via
    params.author.links in hugo.toml (same mechanism welcome.yttrx.com
    already uses) — Compost renders this automatically, no template work
    needed.
-   TLS: rather than provisioning a new per-domain Let's Encrypt cert,
    extended /root/certs on admin.blairhaus.net to also deploy the
    existing *.yttrx.com wildcard cert to
    admin.yttrx.com:/etc/ssl/yttrx-wildcard/ (alongside the pre-existing
    mammut deploy target), and bootstrapped the current cert there
    immediately rather than waiting for the next cert-renew.timer run.
    See ssl-certs.md.
-   New nginx vhost /etc/nginx/sites-available/waffles on
    admin.yttrx.com (previous one backed up as
    waffles.bak-jekyll-20260703), symlinked into sites-enabled/ for the
    first time (it was never enabled before). Same static-asset caching
    pattern as welcome/help; preserves the original
    .well-known/webfinger location block.
-   DNS: waffles.yttrx.com is now a CNAME to admin.yttrx.com (updated by
    the operator during this migration).

Full details: misc-sites.md.

Verified: homepage, a post, /about/, and an old-style alias URL
(/mastodon/update/2022/11/23/welcome-to-my-blog.html) all return 200
over the live domain; /.well-known/webfinger still returns the original
JRD; served cert SAN confirmed as *.yttrx.com / yttrx.com.

Rollback: restore waffles.bak-jekyll-20260703 nginx config and reload;
the untouched Jekyll build is still on disk at
waffles-jekyll-backup-20260703/. The new /root/certs deploy block is
additive and doesn't affect the existing mammut deploy.

------------------------------------------------------------------------

2026-07-03 — check-mail.org scrutiny promoted to live; ABUSE_BOT_TOKEN re-minted

Follow-up to the check-mail.org entry below. Re-minted the moderator
bot's ABUSE_BOT_TOKEN (mammut Rails console) to add the
admin:write:email_domain_blocks scope, verified the new token against
all four required admin scopes before touching anything live, swapped it
into admin.yttrx.com's .env (old .env backed up first), and revoked the
old token once the new one was confirmed working in the running
container.

With the token in place, flipped CHECK_MAIL_DRY_RUN=false. The feature
is now fully live: a flagged signup gets its welcome held and its email
domain actually registered in Mastodon's Admin::EmailDomainBlock; a
report against an account with a high-risk email domain now actually
auto-suspends it on the first report and blocks the domain, rather than
only logging/DMing.

------------------------------------------------------------------------

2026-07-03 — Disposable/high-risk email signup scrutiny (check-mail.org)

Repo: yttrx-welcomebot (deployed to admin.yttrx.com, container
yttrx-welcomebot).

Roadmap item B (anti-abuse.md, "Signup-quality signals at the gate")
gained its disposable-email slice, alongside the existing IP-reputation
slice (layer 2c). Every account.created delivery already carries the
signup email for free (Admin::Account.email); the bot now classifies the
domain only (never the full address — GDPR-friendly) via check-mail.org,
same shape as the RDAP IP check:

-   Flagged if check-mail.org marks the domain is_disposable or its risk
    score (0-100) is ≥ CHECK_MAIL_RISK_THRESHOLD (default 80).
-   A flagged signup gets a moderator DM, an optional auto-registered
    Admin::EmailDomainBlock, and (unless dry-run) has its welcome held
    until account.approved.
-   New, stronger case: a report against an account whose email domain
    is flagged auto-suspends it on the very first report — no
    distinct-reporter threshold, unlike the IP signal which only lowers
    the threshold — and (re-)blocks the domain. Classified live from the
    report payload's target_account.email (a full Admin::Account, always
    present) rather than any record made at signup time, so it also
    covers accounts that signed up before this feature existed.
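
The flagging rule can be sketched as follows. This is an illustration
only: it takes the two values named above (is_disposable and a 0-100
risk score) as plain arguments and does not model the check-mail.org
API itself; the function names are hypothetical.

```python
def signup_domain(email):
    """Return only the lowercased domain; the local part is discarded."""
    return email.rsplit("@", 1)[1].lower()


def is_flagged(is_disposable, risk, threshold=80):
    """Flag on the disposable marker or a risk score at/above threshold.

    `threshold` mirrors CHECK_MAIL_RISK_THRESHOLD (default 80).
    """
    return bool(is_disposable) or risk >= threshold


print(signup_domain("Someone@Example.ORG"))   # only the domain is classified
print(is_flagged(False, 79), is_flagged(False, 80), is_flagged(True, 0))
```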

Rollout safety: shipped with CHECK_MAIL_DRY_RUN=true, and the
report-triggered suspend path is gated by ABUSE_DRY_RUN or
CHECK_MAIL_DRY_RUN (either one holds it). It is deliberately not gated
on ABUSE_DRY_RUN alone, since production's ABUSE_DRY_RUN is already
false (live) and this is a brand-new, unverified action. Nothing acts
yet; the bot only classifies, logs, and DMs the moderator.
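
The double gate reduces to a single boolean expression. A minimal
sketch, with a hypothetical function name:

```python
def report_suspend_is_live(abuse_dry_run, check_mail_dry_run):
    """The suspend path acts only when neither dry-run flag is set."""
    return not (abuse_dry_run or check_mail_dry_run)


# Truth table: only the last row (both flags false) actually suspends.
for abuse in (True, False):
    for check_mail in (True, False):
        print(abuse, check_mail, report_suspend_is_live(abuse, check_mail))
```

With production's ABUSE_DRY_RUN already false, CHECK_MAIL_DRY_RUN=true
is what holds the new action back.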

Deploy: git pull on admin.yttrx.com, added CHECK_MAIL_* vars to the
box's .env (API key from check-mail.org's free tier, 1,000 req/month),
docker-compose up -d --build.

Known gap: CHECK_MAIL_AUTO_DOMAIN_BLOCK will 403 (logged, harmless)
until ABUSE_BOT_TOKEN is re-minted with the
admin:write:email_domain_blocks scope — same gotcha pattern as the
earlier admin:write:ip_blocks addition for IP scrutiny. Moot while
CHECK_MAIL_DRY_RUN=true.

------------------------------------------------------------------------

2026-07-03 — Block Huawei Cloud (AS136907) from /tags scraping

File: /etc/nginx/snippets/scraper-cidrs-huawei.conf (new), wired into
the existing geo $bad_cidr_match block in scraper-block.conf.

Investigated /tags/* and /api/v1/timelines/tag/* traffic on mammut after
noticing renewed scraping despite the existing UA/ASN rules. Findings
from access.log / access.log.1 / rotated logs:

-   Current-day volume was low (max 9 hits/IP), but the prior ~3 days of
    rotated logs showed ~7,347 requests to tag endpoints.
-   Traffic split into two distinct populations:
    -   A large, highly distributed set (~50+ IPs, 100–130 hits each)
        spread across residential/mobile ISPs in ~45 countries (Reliance
        Jio, Etisalat, Turk Telekom, Vodafone TR, Orange Romania, VTR
        Chile, Telefônica Brasil, etc.) — consistent with a commercial
        residential-proxy network. Each IP self-throttles below any
        reasonable per-IP rate limit, so IP-based rate-limiting doesn't
        touch this population.
    -   A smaller cluster of 6 IPs on Huawei Cloud (AS136907) — actual
        datacenter/cloud infra, not residential. UA distribution here
        mixed a spoofed legacy Nexus 5 / Android 6 / Chrome 65 string
        with a rotating spread of current desktop Chrome versions
        (145–150) — same UA-rotation fingerprint previously seen from
        the Tencent/Alibaba/ByteDance actors documented in
        scraper-block.conf's history.

Decision: block AS136907 wholesale, same treatment as the existing
Tencent/Alibaba/ByteDance blocks. It's cloud infra with no residential
collateral, so there's no cost to blocking it outright.

-   Pulled all 636 announced prefixes for AS136907 from RIPEstat
    (https://stat.ripe.net/data/announced-prefixes/data.json?resource=AS136907).
-   Collapsed to 202 CIDRs (169 IPv4 + 33 IPv6) with Python's
    ipaddress.collapse_addresses.
-   Wrote /etc/nginx/snippets/scraper-cidrs-huawei.conf in the same
    <cidr>  1; format as scraper-cidrs.conf /
    scraper-cidrs-tencent-auto.conf, with a regeneration command in the
    header comment.
-   Added an include line for the new file inside the
    geo $bad_cidr_match {} block in scraper-block.conf, alongside the
    existing includes.
-   nginx -t passed; reloaded.
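
The collapse step can be sketched like this. It is not the regeneration
command from the snippet's header comment; the helper name is
hypothetical and the sample prefixes are documentation ranges, not
AS136907 data. collapse_addresses only accepts one address family at a
time, so IPv4 and IPv6 are collapsed separately.

```python
import ipaddress


def collapse_to_geo_lines(prefixes):
    """Collapse prefixes and render them as '<cidr>  1;' geo entries."""
    nets = [ipaddress.ip_network(p, strict=False) for p in prefixes]
    lines = []
    for version in (4, 6):
        same_family = [n for n in nets if n.version == version]
        for net in ipaddress.collapse_addresses(same_family):
            lines.append(f"{net}  1;")
    return lines


sample = [
    "192.0.2.0/25", "192.0.2.128/25",        # adjacent halves merge
    "198.51.100.0/24", "198.51.100.0/26",    # subnet absorbed by its parent
    "2001:db8::/33", "2001:db8:8000::/33",   # adjacent IPv6 halves merge
]
for line in collapse_to_geo_lines(sample):
    print(line)
```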

The residential-proxy population is the harder problem and is not
addressed by this change — UA/ASN/IP-based rules can't distinguish it
from real browsers. hCaptcha in front of /tags/* is the next planned
mitigation for that traffic.

------------------------------------------------------------------------

2026-07-03 — Mastodon upgrade v4.6.2 → v4.6.3

Routine patch release: no new env vars, no DB migrations, no breaking
changes (bug fixes, minor UI polish, dependency bumps — see upstream
release notes). Followed the standard procedure in mastodon-upgrade.md:

-   Snapshotted ~mastodon/live → ~mastodon/live-v4.6.2, backed up the
    bespoke docker-compose.yml.
-   git stash → git fetch → git checkout v4.6.3 in ~mastodon/live (as
    mastodon user), restored the bespoke compose file, bumped the three
    image tags.
-   Rebuilt the themed image
    (/root/birdui-build/build-birdui-image.sh v4.6.3) — required on
    every upgrade since web/sidekiq run yttrx-mastodon-birdui, not the
    stock image. Verified config/themes.yml has all 4 entries before
    starting the stack.
-   docker compose up -d; web/sidekiq picked up the local themed image
    (no pull), streaming pulled the stock v4.6.3 tag.

Verified:

-   docker logs live-web-1 / live-sidekiq-1 / live-streaming-1 — clean
    startup, no DB errors, no crash loops, no pending migrations.
-   All containers reached healthy status within ~40s of
    docker compose up -d.
-   https://yttrx.com/ → 200; GET /api/v2/instance reports
    "version":"4.6.3".

Rollback: stop the stack, restore ~mastodon/live-v4.6.2 over
~mastodon/live, point web/sidekiq back at yttrx-mastodon-birdui:v4.6.2,
docker compose up -d.

2026-07-02 — IP-based signup scrutiny live

Added an IP-based signup check to help fight fraudulent/spam signups:
new accounts registering from datacenter/hosting/VPN IPs now get extra
scrutiny (held welcome pending approval, the IP registered as requiring
approval for future signups, and a lower bar to auto-silence if
reported). See anti-abuse.md layer 2c for details.

2026-06-26 — R2 → DO storage migration COMPLETE (serving DO-only)

Closing marker for the R2→DO reversal (cutover 2026-06-18, drain +
fallback retirement 2026-06-26; see the dated entries below for each
step). End state:

-   files.yttrx.com serves from DigitalOcean Spaces only; nginx R2
    fallback disabled.
-   Custom-emoji backlog bulk-copied R2→DO (437,099 obj / 9.8 GiB);
    avatars/headers/status media self-heal on view
    (cdn-refresh-stream.service + cdn-404-refresh.sh backstop);
    refresh-account-media for fully-vacuumed media.
-   R2 bucket + [r2] remote kept a few weeks as the recovery net —
    phase-2 teardown (delete bucket, strip remote, retire the now-inert
    cdn-r2-refresh.sh) is the only remaining item.
-   Realised saving ~$13–15/mo (R2 per-op pricing → DO flat rate).
    Runbook cdn-r2-to-do-migration.md marked complete.

2026-06-26 — CDN self-heal: extend to status media + manual refresh-account-media

Extended cdn-refresh-stream.service to also heal remote status
attachments, not just avatars/headers. A
/cache/media_attachments/files/<6x3>/… 404 (or upstream=r2) is mapped to
its MediaAttachment id and RedownloadMediaWorker.perform_async(id) is
enqueued (async, Sidekiq). Same debounce + one-rails-boot-per-window
batching; accounts still refresh inline. Both paths bump updated_at → a
fresh 14-day retention window. So viewing a remote timeline whose images
dropped re-pulls them onto DO automatically.
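
A sketch of the path-to-id mapping. It assumes <6x3> means six
three-digit path segments that concatenate to the numeric
MediaAttachment id; the function name and the sample path are
hypothetical.

```python
import re

# Six groups of three digits directly under .../files/
MEDIA_PATH = re.compile(r"^/cache/media_attachments/files/((?:\d{3}/){6})")


def media_attachment_id(path):
    """Return the numeric id encoded in the path, or None if not media."""
    m = MEDIA_PATH.match(path)
    if m is None:
        return None
    return int(m.group(1).replace("/", ""))


print(media_attachment_id(
    "/cache/media_attachments/files/123/456/789/012/345/678/original/abc.webp"
))
print(media_attachment_id(
    "/cache/preview_cards/images/123/456/789/012/345/678/original/abc.jpg"
))
```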

Also added scripts/refresh-account-media.sh (deployed
/root/refresh-account-media) — a thin wrapper over
tootctl media refresh --account|--status|--domain for the case the
auto-path can't see: media Mastodon already fully vacuumed (file columns
nulled) emits no CDN URL, so there's no 404 to catch; re-pull the whole
account/post manually.

Verified

-   Extended parser dry-run: caught an avatar 404 and a
    media_attachments 404, ignored a preview_cards 404.
-   Live, on a real post (@ricci@discuss.systems/116767159501753086,
    image expired): page load 404'd it → service flush
    batch=8 (acct=2 media=6) rc=0 → Sidekiq re-downloaded
    media_attachments/…/04b580e3a08a2ac4.webp to DO; CDN now serves
    original+small → 200 X-Upstream: do, DB updated_at bumped to today.
-   refresh-account-media deployed, bash -n clean, usage output correct.

2026-06-26 — CDN phase-1 decommission: nginx R2 fallback DISABLED (DO-only)

With the real-time self-heal live (entry below), retired the nginx R2
fallback so files.yttrx.com now serves from DigitalOcean Spaces only.
The R2 bucket + [r2] rclone remote are intentionally kept for a few
weeks as a recovery net (phase 2 deletes them).

Change (/etc/nginx/sites-available/cdn-migration, backup
*.bak-preR2off.20260626): in @s3_primary, the DO-miss error_page 403 404
now routes to a new terminal @s3_gone location (return 404,
X-Upstream: do) instead of @s3_fallback. The @s3_fallback (R2 proxy)
block is left in place but unreferenced — rollback is a one-line revert
(error_page 403 404 = @s3_fallback;) + reload.

Why a 404, not the R2 fetch: a DO miss now logs as 404 upstream=do,
which is exactly what cdn-refresh-stream.service (+ cdn-404-refresh.sh)
key on — so a not-yet-migrated remote avatar/header 404s on first view
and self-heals onto DO within ~60s. Caveat: remote status media /
preview cards that were R2-only now 404 and do not self-heal (account
refresh only re-fetches avatars/headers) — left to age out; volume is
small (~tens/day, decaying).

Verified

-   nginx -t OK; reloaded. Present avatar → 200 X-Upstream: do; missing
    avatar → 404 X-Upstream: do, R2 never contacted. Miss logs as
    404 … upstream=do cache=MISS.
-   No upstream=r2 after the reload — last-ever r2 line 18:21:13, none
    since.
-   Streaming service observed healing the resulting 404s (batch=… rc=0,
    Paperclip re-fetch to DO).

2026-06-26 — CDN: real-time avatar self-heal service (enables early R2-fallback retirement)

To allow retiring the nginx R2 fallback soon (DO-only) while keeping the
R2 bucket a few more weeks as a recovery net, added a streaming
self-heal that closes the up-to-24h gap the daily refresh crons leave.
New scripts/cdn-refresh-stream.py (deployed /root/, systemd unit
cdn-refresh-stream.service, enabled + running).

What it does. Runs tail -F on /var/log/nginx/cdn.access.log; for any remote
account avatar/header hit that is upstream=r2 or status 404, it extracts
the account id from the path and refreshes the account
(reset_avatar!/reset_header!/save) so the media re-fetches onto DO. Same
logic as the cdn-r2-refresh.sh + cdn-404-refresh.sh crons, but live — so
once the fallback is removed, a not-yet-migrated avatar 404s on first
view and heals within ~60s instead of staying broken for up to 24h.
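
The candidate test can be sketched as below. The log layout is an
assumption pieced together from fragments elsewhere in this log (quoted
request line, status, then upstream=...); the real files_cache format
and the regex in cdn-refresh-stream.py may differ. Names and sample
lines are hypothetical.

```python
import re

# Avatar/header request, six 3-digit id segments, status, upstream tag.
LINE = re.compile(
    r'"[A-Z]+ /cache/accounts/(?:avatars|headers)/((?:\d{3}/){6})\S* [^"]*" '
    r'(\d{3}) .*upstream=(\S+)'
)


def heal_candidate(line):
    """Return the account id to refresh, or None if the line is fine."""
    m = LINE.search(line)
    if m is None:
        return None          # not a remote avatar/header request
    digits, status, upstream = m.groups()
    if status == "404" or upstream == "r2":
        return int(digits.replace("/", ""))
    return None


miss = ('203.0.113.77 [26/Jun/2026:18:21:13 +0000] "GET /cache/accounts/avatars/'
        '123/456/789/012/345/678/original/a.png HTTP/1.1" 404 0 upstream=do cache=MISS')
served_by_r2 = miss.replace('" 404 0 upstream=do', '" 200 512 upstream=r2')
healthy = miss.replace('" 404 0', '" 200 512')
print(heal_candidate(miss), heal_candidate(served_by_r2), heal_candidate(healthy))
```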

Won't hammer Rails. Candidates are debounced (per-account TTL, default
6h) and micro-batched — at most one rails runner per FLUSH_INTERVAL
(60s) covering all accounts seen that window; low event volume means
most windows do nothing. ~8 MB RSS. Restart=always; exits on tail death
so systemd restarts it. DRY_RUN=1 logs candidates without acting.
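
A minimal sketch of the debounce plus micro-batch idea. The class and
method names are hypothetical, and time is passed in explicitly so the
behaviour is easy to follow; the deployed script may be structured
quite differently.

```python
class HealQueue:
    def __init__(self, ttl=6 * 3600, flush_interval=60):
        self.ttl = ttl                        # per-account debounce window
        self.flush_interval = flush_interval  # at most one batch per interval
        self.last_queued = {}                 # account id -> time last queued
        self.pending = set()
        self.last_flush = 0.0

    def offer(self, account_id, now):
        """Queue an account unless it was already queued within the TTL."""
        seen = self.last_queued.get(account_id)
        if seen is not None and now - seen < self.ttl:
            return False
        self.last_queued[account_id] = now
        self.pending.add(account_id)
        return True

    def flush(self, now):
        """Return one batch if the interval elapsed, else an empty list."""
        if not self.pending or now - self.last_flush < self.flush_interval:
            return []
        batch = sorted(self.pending)
        self.pending.clear()
        self.last_flush = now
        return batch


q = HealQueue()
q.offer(1, now=0)        # queued
q.offer(1, now=10)       # debounced: same account inside the TTL
q.offer(2, now=20)       # queued
print(q.flush(now=30))   # interval not elapsed yet
print(q.flush(now=61))   # one batch covering both accounts
```

Each non-empty batch would correspond to a single rails runner
invocation, so idle windows cost nothing.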

Topology. The service is primary (live healing of r2 and 404). The two
daily crons stay as downtime backstops: cdn-r2-refresh.sh goes inert
once the fallback is removed; cdn-404-refresh.sh remains the daily
backstop to the streaming 404 healer.

Phased decommission (now planned). (1) Retire the nginx R2 fallback to a
DO-only config once the service is soaked and upstream=r2 is low; keep
the R2 bucket + [r2] rclone remote a few weeks as the recovery net. (2)
Delete the bucket weeks later once DO is confirmed authoritative. See
cdn-r2-to-do-migration.md (“Real-time self-heal” + “Phased
decommission”).

Verified

-   python3 -m ast parse OK. Dry-run with injected synthetic lines:
    detected an avatar 404 and a header upstream=r2 (2 candidate
    accounts), correctly ignored a media_attachments 404.
-   systemctl: active (enabled); python main + tail -F child up; log
    shows dry_run=False flush=60s ttl=21600s. Tails from EOF so it does
    not reprocess history on start.
-   Context: distinct accounts/day still served by R2 (avatars/headers)
    are trending down — 307/305/346 (Jun 22–24) → 160/183 (Jun 25–26).

2026-06-26 — yttrx mail migration off DigitalOcean COMPLETE: original droplet deleted

The standalone DigitalOcean mail droplet (mail.yttrx.com,
146.190.41.218, Debian 11) has been deleted. yttrx mail is now fully
self-hosted on bsd.peteftw.com, so the long-running migration off
DigitalOcean is finished.

-   As-built path (authoritative ref:
    secondbrain/mail-account-management.md): public IMAP via the imapgw
    jail (.126, 993, Dovecot proxy by domain) → permanent yttrximap
    backend jail (.128), Maildir on host ZFS /vmail; inbound via the mx
    jail (.125) FILTER → yttrximap LMTP; outbound submission via
    submitgw (.127) → smtp jail DKIM-signs (selector 2022). This went
    LIVE 2026-06-15; the DO box was kept frozen/out-of-path as a
    fallback for ~10 days.
-   2026-06-26: with the as-built stack proven stable, the DO droplet
    was destroyed. yttrx no longer has any DigitalOcean mail footprint.
-   Docs updated: mail-migration-to-bsd.md marked COMPLETE (was
    SUPERSEDED — the original front-gateway/yttrxmail design was never
    built, the proxy-jail design replaced it); README.md pointer
    updated.
-   Follow-up: mail-server.md still describes the now-deleted DO box and
    should be retired/rewritten to point at the bsd jails as a separate
    cleanup.
-   Reminder: remove the DO IP 146.190.41.218 from yttrx.com SPF if
    still present.

Verified: on bsd,
doveadm mailbox status -u waffles@yttrx.com messages INBOX reads the
live Maildir from /vmail (1717 msgs) through the yttrximap jail; all
mail jails Up.

2026-06-26 — CDN R2 drain: emoji backlog copy + post-R2 self-heal cron

Follow-on to the R2 fallback gate (below): actively draining the
remaining upstream=r2 fallthroughs toward zero so R2 can be
decommissioned (step 6 of cdn-r2-to-do-migration.md). Classification of
live R2-served objects showed three categories with different drain
mechanisms:

-   cache/media_attachments / preview_cards → already drained by the
    14-day retention vacuum (nightly VacuumScheduler).
-   cache/accounts (avatars/headers) → already drained by
    cdn-r2-refresh.sh (daily cron, refreshes accounts whose
    avatar/header hit R2).
-   cache/custom_emojis → drained by nothing (emoji are cached once and
    never refreshed; retention doesn't touch them). R2 held 434,679
    objects / 9.07 GiB; DO had only 2,200. This was the structural gap.

1. One-shot emoji copy (R2 → DO). No Mastodon-native bulk emoji re-fetch
exists, so copied the backlog with rclone (mammut, [r2]→[spaces]
remotes):

    rclone copy r2:yttrx-media/cache/custom_emojis spaces:yttrx/cache/custom_emojis \
      --size-only --transfers=16 --checkers=32 -P --log-file=/root/rclone-emoji-copy.log

Additive (copy, never deletes on DO); --size-only avoids cross-provider
modtime churn; same key paths, so DO then serves each emoji directly
(X-Upstream: do) instead of falling through. ~9 GiB delta.
Status-attachment bulk (~241 GB) is not copied — too large for DO's 250
GB base; it ages out.

2. New cron cdn-404-refresh.sh (post-R2 safety net), cron 05:15 daily.
cdn-r2-refresh.sh keys on upstream=r2, which disappears when the R2
fallback is removed — after that a missing remote avatar is a plain 404
with nothing to self-heal it. The new companion keys on the response
status instead: avatar/header request lines returning 404 (any upstream)
→ resolve the account id from the path →
reset_avatar!/reset_header!/save. Works during dual-backend (404 = miss
on both backends = already broken) and after R2 retirement. At
decommission, retire cdn-r2-refresh.sh and keep this one. Repo:
scripts/cdn-404-refresh.sh. First dry run (2026-06-25 log): 1 candidate.

Verified

-   Targeted/category sizing via rclone size: R2 cache/custom_emojis =
    434,679 obj / 9.07 GiB; DO = 2,200 obj. Copy COMPLETE — DO now holds
    437,099 obj / 9.8 GiB (≥ R2's 434,679; the surplus is DO's own
    pre-existing objects), so every R2 emoji is on DO. Log:
    /root/rclone-emoji-copy.log.
-   cdn-404-refresh.sh: bash -n clean; DRY_RUN=1 extracted a valid
    18-digit account id from the 404 avatar path; only 6 such 404s
    across current+rotated logs (load near-zero until R2 fallback is
    removed). Cron installed at 05:15.

2026-06-26 — CDN: gate R2 fallback to media paths + media-only histogram

A vuln scanner from a single GCP IP (34.125.27.233, 427 requests in one
hour) probing /actuator/*, /config/*, /.env, … on files.yttrx.com was
404ing on DO, falling through to the R2 fallback (404 again), and
showing up as a phantom R2 spike — cdn-backend-histogram.pl read 500 R2
hits at 2026-06-25 19:00 when only ~70 were real media. Each
fall-through is a billable R2 Class B GET for nothing.

1. nginx — gate @s3_fallback to genuine media paths
(/etc/nginx/sites-available/cdn-migration). An http-scope map classifies
the request path; non-media gets a terminal 404 before proxy_pass, so R2
is never contacted, and the logged upstream tag is - (not r2) so it
doesn't pollute the histogram:

    # http scope, alongside log_format / proxy_cache_path
    map $uri $is_media {
        default                0;
        ~^/accounts/           1;
        ~^/cache/              1;
        ~^/media_attachments/  1;
        ~^/site_uploads/       1;
        ~^/custom_emojis/      1;
    }

    # top of location @s3_fallback
    set $up 'r2';
    if ($is_media = 0) { set $up '-'; return 404; }
    ...
    add_header X-Upstream $up always;   # was the literal "r2"

DO still serves every request as primary — the gate only touches the
fallback — and real media not yet on DO still falls through to R2
unchanged. The five prefixes are the Mastodon S3 key roots; /system/* is
rewritten to them by the app and robots.txt is served locally via
try_files. Full detail in cdn-site.md.
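
For reference, the gate's classification expressed as a small Python
sketch. It mirrors the map's five prefixes; the function name and the
returned tuples are hypothetical and only illustrate which branch a
DO miss would take in @s3_fallback.

```python
import re

MEDIA_PREFIXES = (
    "accounts", "cache", "media_attachments", "site_uploads", "custom_emojis",
)
IS_MEDIA = re.compile(r"^/(?:%s)/" % "|".join(MEDIA_PREFIXES))


def fallback_decision(uri):
    """Return (action, logged upstream tag) for a request that missed on DO."""
    if IS_MEDIA.match(uri):
        return ("proxy to R2", "r2")
    return ("return 404", "-")


for uri in ("/cache/accounts/avatars/x.png", "/actuator/env", "/.env",
            "/custom_emojis/blob.png", "/cachebuster/x"):
    print(uri, fallback_decision(uri))
```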

2. cdn-backend-histogram.pl — count only Mastodon media by default
(deployed /root/, source scripts/cdn-backend-histogram.pl). Filters on
the request path (same five prefixes) rather than the upstream tag, so
old rotated logs — where the junk is still tagged upstream=r2 — are
filtered too. --all opts back into the unfiltered view; a footnote
reports the excluded count.

Backups/rollback:

-   nginx: restore
    /etc/nginx/sites-available/cdn-migration.bak.2026-06-26 →
    nginx -t && systemctl reload nginx.
-   perl: /root/cdn-backend-histogram.pl.bak.2026-06-26.

Verified

-   nginx -t OK; reloaded. A temporary X-Upstream-Addr $upstream_addr
    debug header confirmed routing:
    -   /actuator/env, /config/secret → 404, upstream-addr = DO only
        (138.68.x) → R2 untouched.
    -   /cache/<missing>.jpg → 404, upstream-addr = DO + Cloudflare
        (104.21.x) → fell through as designed. Debug header then
        removed.
-   cdn-backend-histogram.pl (media-only) drops 2026-06-25 19:00 R2 from
    500 → 70 and the window's R2 total from 1860 → 911 (1165 non-media
    excluded). --all reproduces the original 500 exactly.

3. fail2ban — new nginx-cdn-badprobe jail for the CDN vhost. The
existing nginx-badprobe jail (2026-06-23) only watches the app log
(/var/log/nginx/access.log) and keys on WordPress/PHP tokens — so it
never saw these CDN scanners (separate files_cache-format log,
/actuator-style paths). The $is_media gate above stops the junk reaching
R2 but the probes still hit DO as primary; this jail bans them at the
firewall.

-   Filter /etc/fail2ban/filter.d/nginx-cdn-badprobe.conf — allow-list
    failregex (the five media prefixes plus /robots.txt, /favicon.ico,
    /.well-known/ and bare /); anything else = probe.
    Anchored ^<HOST> \[ for the files_cache format (no - - after the
    IP):

        failregex = ^<HOST> \[[^]]*\]\s+"[A-Z]+ (?!/(?:accounts|cache|media_attachments|site_uploads|custom_emojis)/)(?!/robots\.txt)(?!/favicon\.ico)(?!/\.well-known/)(?!/ )\S

-   Jail /etc/fail2ban/jail.d/nginx-cdn-badprobe.local —
    logpath = /var/log/nginx/cdn.access.log, same policy as the sibling
    (nftables-multiport, maxretry=2, findtime=10m, bantime=24h → 1w).

-   Rollback: rm both files + fail2ban-client reload.
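
The allow-list failregex can be exercised offline with a Python sketch
like the one below. <HOST> is replaced by a plain non-space group
(fail2ban expands it itself), and the sample lines are hypothetical
lines in the assumed files_cache layout.

```python
import re

# Negative lookaheads list what is allowed; any other path is a probe.
FAILREGEX = re.compile(
    r'^(?P<host>\S+) \[[^]]*\]\s+"[A-Z]+ '
    r'(?!/(?:accounts|cache|media_attachments|site_uploads|custom_emojis)/)'
    r'(?!/robots\.txt)(?!/favicon\.ico)(?!/\.well-known/)(?!/ )\S'
)

PREFIX = '203.0.113.77 [26/Jun/2026:19:00:00 +0000] '


def is_probe(request_line):
    """True if a log line with this quoted request would count as a probe."""
    return FAILREGEX.search(PREFIX + request_line) is not None


CASES = {
    '"GET /actuator/env HTTP/1.1" 404': True,
    '"GET /.env HTTP/1.1" 404': True,
    '"GET /cache/custom_emojis/x.png HTTP/1.1" 200': False,
    '"GET /robots.txt HTTP/1.1" 200': False,
    '"GET /.well-known/security.txt HTTP/1.1" 404': False,
    '"GET / HTTP/1.1" 404': False,
}
for request, expected in CASES.items():
    print(is_probe(request) == expected, request)
```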

Verified (jail)

-   fail2ban-regex against the live log: 966 scanner lines matched, 0
    false positives — media, robots.txt, favicon.ico, bare / and
    /.well-known/ all correctly excluded.
-   End-to-end: injected two probe lines from TEST-NET-3 203.0.113.77 →
    fail2ban-client status showed it banned (Total failed: 2), nftables
    f2b-table set contained the IP (enforced at the firewall), then
    unbanned cleanly. fail2ban-client status lists all three jails:
    nginx-badprobe, nginx-cdn-badprobe, sshd.

2026-06-24 — Mastodon upgrade v4.6.0 → v4.6.1

Patch upgrade following mastodon-upgrade.md. v4.6.1 is a security
release (dependency updates) plus UI/bug fixes; no new env vars, no
breaking changes, no manual migrations (asset recompile is covered by
the from-source Bird UI build). PostgreSQL 15 (host systemd, not Docker)
stayed up throughout.

Sequence (build-first to minimize downtime):

1.  Rebuilt the themed image while the site stayed live (~25 min from
    source):

        /root/birdui-build/build-birdui-image.sh v4.6.1
        docker run --rm yttrx-mastodon-birdui:v4.6.1 cat config/themes.yml   # 4 entries ✓

2.  Brief downtime window — drain, snapshot, checkout, restore compose,
    restart:

        systemctl stop nginx
        cd ~mastodon/live && docker compose down
        cp -a ~mastodon/live ~mastodon/live-v4.6.0          # rollback snapshot
        cp ~mastodon/live/docker-compose.yml ~mastodon/docker-compose.yml.bak
        sudo -u mastodon git stash
        sudo -u mastodon git fetch origin --tags
        sudo -u mastodon git checkout v4.6.1
        cp ~mastodon/docker-compose.yml.bak ~mastodon/live/docker-compose.yml
        sed -i 's|:v4.6.0|:v4.6.1|g' ~mastodon/live/docker-compose.yml   # web/sidekiq/streaming tags
        docker compose up -d
        systemctl start nginx

    web + sidekiq run the local yttrx-mastodon-birdui:v4.6.1 (no pull);
    streaming pulled stock ghcr.io/mastodon/mastodon-streaming:v4.6.1.

Rollback: snapshot at ~mastodon/live-v4.6.0;
yttrx-mastodon-birdui:v4.6.0 image retained. Restore per the runbook's
Rollback section.

Verified

-   docker ps — all five live-* containers healthy; restart count 0 on
    web/sidekiq/streaming.
-   Web Puma booted clean (Ruby 4.0.5, Puma 8.0.2); streaming listening
    on :4000; no DB/migration errors in logs (sidekiq
    MentionResolveWorker timeouts are ordinary remote-federation noise).
-   https://yttrx.com/ → HTTP 200; /api/v1/instance reports
    version: 4.6.1; /api/v1/streaming/health → HTTP 200.
-   Bird UI themes intact: docker exec live-web-1 cat config/themes.yml
    shows all 4 entries (default + 3 bird-ui variants).

2026-06-24 — exempt /robots.txt from the scraper hard-block (403)

A UA/CIDR caught by scraper-block.conf was 403'd on every path,
including /robots.txt — so a hard-blocked scraper (e.g. Bytespider)
could never read the Disallow: / meant to send it away. (Spotted as
47.128.25.204 getting 403 on GET /tags/love with a Bytespider UA; the
same client also 403'd on /robots.txt.)

Root cause: the if ($bad_ua) { return 403; } /
if ($bad_cidr) { return 403; } guards live at server scope in
/etc/nginx/sites-available/mastodon, so they fire in the rewrite phase
before the location = /robots.txt block is ever selected.

Fix — entirely inside the shared snippet
/etc/nginx/snippets/scraper-block.conf, server blocks untouched:

-   Renamed the source maps $bad_ua → $bad_ua_match (UA map) and
    $bad_cidr → $bad_cidr_match (geo).

-   Added an exact-path exemption and re-derived the original guard
    variable names so they yield 0 for /robots.txt:

        map $uri $scraper_exempt {            # $uri = normalized path, no query string
            default      0;
            /robots.txt  1;
        }
        map $scraper_exempt$bad_ua_match   $bad_ua   { "01" 1; default 0; }
        map $scraper_exempt$bad_cidr_match $bad_cidr { "01" 1; default 0; }

    Block only when "not-exempt + bad" ("01"). The four if guards in the
    two TLS server blocks still reference $bad_ua / $bad_cidr unchanged,
    so the fix applies to every vhost that includes the snippet
    (yttrx.com, masto.yttrx.com, and the cdn-migration site).
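
The concatenated-key trick can be reasoned about as a small truth table. The Python sketch below mimics the three maps; the function names are hypothetical, and the exact string comparison is a simplification of how nginx matches map keys.

```python
def scraper_exempt(uri: str) -> str:
    """Mimics: map $uri $scraper_exempt { default 0; /robots.txt 1; }"""
    return "1" if uri == "/robots.txt" else "0"


def guard(exempt: str, match: str) -> int:
    """Mimics: map $scraper_exempt$bad_ua_match $bad_ua { "01" 1; default 0; }"""
    return 1 if exempt + match == "01" else 0


def is_blocked(uri: str, bad_match: str) -> bool:
    """bad_match is "1" when the UA map or the geo block flagged the client."""
    return guard(scraper_exempt(uri), bad_match) == 1


if __name__ == "__main__":
    for uri in ("/tags/love", "/robots.txt"):
        for bad in ("0", "1"):
            print(f"{uri:12} bad={bad} -> blocked={is_blocked(uri, bad)}")
```

Only the "01" combination (not exempt, flagged) blocks; a flagged client asking for /robots.txt produces "11" and falls through to the default.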

Backup: /etc/nginx/snippets/scraper-block.conf.bak-robots.20260624.

Verified (curl via --resolve …:127.0.0.1, both vhosts):

-   Bytespider UA → /robots.txt = 200 (real Disallow body served), any
    other path (/tags/love) = 403.
-   Normal UA → /robots.txt = 200 (unchanged).

NB the all-China nftables :443 drop still sits in front of nginx, so
CN-range clients never reach this exemption. Clients blocked by UA alone
(the common case for AI scrapers on non-CN IPs) can now fetch robots.txt.

------------------------------------------------------------------------

2026-06-23 — fail2ban auto-ban for path scanners (PHP/.env/etc.) + nginx 444 drop

Added a fail2ban-based mitigation against the constant background noise
of WordPress/PHP/secrets scanners (/xs.php, /wp-login.php, /.env,
/.git/…, /vendor/…). yttrx serves no PHP, .env, .git, or wp-* files, so
any request for those paths is unambiguously a scanner. The mitigation
keys on the request path, never on rate, so ActivityPub federation
(/inbox, /users/…, /.well-known/…) can never be caught. This is the
first fail2ban deployment on mammut; until now fail2ban ran only on bsd,
for its mail jails.

Two layers, both on mammut:

1. nginx 444 drop (cheap neutralizer). Both TLS server blocks in
/etc/nginx/sites-available/mastodon (yttrx.com, masto.yttrx.com) gained,
immediately after the if ($bad_cidr) guard:

    # PHP/.env/.git/wp-* probes for paths we never serve -> drop cheaply.
    # Still logged (status 444) so fail2ban (nginx-badprobe) bans repeat offenders.
    location ~* "(?:\.php|\.env|/\.git|/wp-|/phpmyadmin|/adminer|/xmlrpc|/\.aws/|/\.ssh/|/vendor/|\.sql)" { return 444; }

Probes are dropped at nginx (connection closed, no Puma 404 churn) but
still written to access.log as status 444 so fail2ban can act on the
line. Backup: /etc/nginx/sites-available/mastodon.bak-badprobe.20260623.
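
A quick way to sanity-check the path set is to run the same alternation against known-good and known-bad paths. This Python sketch copies the pattern from the location block and uses re.IGNORECASE to stand in for nginx's ~* modifier; the sample paths are illustrative.

```python
import re

# Same alternation as the nginx location; IGNORECASE approximates "~*".
PROBE = re.compile(
    r"(?:\.php|\.env|/\.git|/wp-|/phpmyadmin|/adminer|/xmlrpc"
    r"|/\.aws/|/\.ssh/|/vendor/|\.sql)",
    re.IGNORECASE,
)


def would_drop(path: str) -> bool:
    """True when the path would hit the 444 location."""
    return PROBE.search(path) is not None


if __name__ == "__main__":
    for path in ("/xs.php", "/.env", "/.git/config", "/inbox", "/users/alice", "/about"):
        print(f"{path:16} {'444' if would_drop(path) else 'pass'}")
```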

2. fail2ban nginx-badprobe jail. Matches the same path set in
/var/log/nginx/access.log and bans at the firewall via nftables:

-   Filter /etc/fail2ban/filter.d/nginx-badprobe.conf — failregex
    anchored on <HOST> + request line containing any never-served path
    token.
-   Jail /etc/fail2ban/jail.d/nginx-badprobe.local —
    banaction = nftables-multiport, port = http,https, maxretry = 2,
    findtime = 10m, bantime = 24h, bantime.increment = true,
    bantime.maxtime = 1w.
-   Bans live in their own table inet f2b-table — isolated from the
    inet cnblock table and Docker's ip/ip6 tables.
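
To see how bantime.increment interacts with bantime.maxtime, here is a rough Python sketch of the escalation. It assumes fail2ban's default growth (the ban time doubles for each repeat offence, factor 1, no randomisation); the jail's real behaviour depends on the installed fail2ban version and any overrides.

```python
from datetime import timedelta


def ban_durations(base: timedelta, maxtime: timedelta, bans: int, factor: int = 1):
    """Ban length for each successive ban of the same IP.

    Assumed formula: base * 2**min(count, 20) * factor, capped at maxtime,
    where count is the number of previous bans.
    """
    durations = []
    for count in range(bans):
        grown = base * (2 ** min(count, 20)) * factor
        durations.append(min(grown, maxtime))
    return durations


if __name__ == "__main__":
    for n, d in enumerate(ban_durations(timedelta(hours=24), timedelta(weeks=1), 5), 1):
        print(f"ban #{n}: {d}")
```

Under these assumptions a repeat offender reaches the one-week cap on the fourth ban.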

Incidental: sshd jail fixed + activated. Installing fail2ban
auto-enabled the Debian [sshd] jail, which expects /var/log/auth.log
(absent — mammut logs sshd to journald), causing the whole config to
abort with "Have not found any log file for sshd jail". Fixed by adding
[sshd] backend = systemd to the jail file and installing python3-systemd
(the systemd backend's missing dependency — without it the server logs
"Server ready" then exits 255). SSH brute-force protection is now also
active.

Verified.

-   fail2ban-regex over the live access.log: 112 matches from 4 source
    IPs, all scanners (20.195.187.84, 78.142.18.172, 31.220.84.188,
    194.164.192.228); 0 of the 3,439 legit lines matched (no /inbox,
    /users, /.well-known).
-   curl -H 'Host: yttrx.com' https://127.0.0.1/wp-login.php and /.env →
    HTTP 000 (444 drop); /about → 200 (legit traffic untouched). Both
    444s appear in access.log, so the jail can read them.
-   fail2ban-client status shows both nginx-badprobe and sshd active.
    The sshd jail banned 8 real brute-forcers within seconds of first
    start — confirming the nftables ban action works end-to-end
    (nft list table inet f2b-table shows the populated set + reject
    rule).
-   systemctl is-enabled fail2ban → enabled (survives reboot).

  Gotcha: 127.0.0.1 is in fail2ban's default ignoreip, so local curl
  tests drop (444) but never self-ban — test the ban path from an
  off-box IP.

------------------------------------------------------------------------

2026-06-22 — UA block: mastodon-scraper public-timeline poller

Blocked a self-identified scraper hammering the public timeline.
37.28.232.221 made 3349 hits in one day, all with the single UA
mastodon-scraper/5.0-go (+mailto:stir-clamor-ruined@duck.com), polling
/api/v2/instance, /api/v1/streaming?stream=public, and
/api/v1/timelines/public?limit=40 in a tight since_id loop. App only: 0
hits on the CDN (cdn.access.log).

Why a UA block, not a CIDR. The IP is Vodafone Portugal DSL (AS12353,
inetnum 37.28.232.0/22, PT-VDF-2003), a residential/dynamic line.
Blocking the /22 would hit innocent Vodafone DSL customers, while the
scraper could simply pick up a new address by renewing its DHCP lease.
The UA is self-declared and stable, so it's the durable handle. This
mirrors why named bots live in $bad_ua rather than $bad_cidr.

What changed.

-   Added "~*mastodon-scraper"  1; to the $bad_ua map in
    /etc/nginx/snippets/scraper-block.conf (after the AIWebIndex entry),
    with a comment noting the IP/ASN and date. nginx -t + reload
    succeeded.
-   Backup: scraper-block.conf.bak-mastodon-scraper.20260622-170*.

Verified.

-   Exact UA mastodon-scraper/5.0-go (+mailto:…) → 403 on both
    /api/v2/instance and /api/v1/timelines/public; baseline GPTBot →
    403.
-   Legit Mastodon federation UA
    (http.rb/… (Mastodon/4.3.0; +https://…)) → 200 — ~*mastodon-scraper
    doesn't catch federation traffic.
-   Watched the actor's own real requests flip 200→403 mid-stream at
    17:05:46 in access.log (not a synthetic test).
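
The UA match can be reproduced in a few lines of Python. The scraper UA is the one quoted above; the federation UA is a hypothetical stand-in (the real one is elided in this entry), and re.IGNORECASE approximates the ~* prefix of the map key.

```python
import re

BAD_UA = re.compile(r"mastodon-scraper", re.IGNORECASE)


def ua_blocked(user_agent: str) -> bool:
    """True when the UA would set the bad-UA flag via the new map entry."""
    return BAD_UA.search(user_agent) is not None


SCRAPER_UA = "mastodon-scraper/5.0-go (+mailto:stir-clamor-ruined@duck.com)"
# Hypothetical federation UA in the usual http.rb (Mastodon/x.y.z; +url) shape.
FEDERATION_UA = "http.rb/5.2.0 (Mastodon/4.3.0; +https://example.social/)"

if __name__ == "__main__":
    print("scraper blocked:", ua_blocked(SCRAPER_UA))
    print("federation blocked:", ua_blocked(FEDERATION_UA))
```

The hyphenated token is what keeps the entry narrow: a UA that merely contains "Mastodon" does not match.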

  Gotcha logged: systemctl reload nginx returns before workers finish
  swapping — the first verify curl raced the reload and saw a stale 200.
  Re-test a few seconds later to confirm.

This won't auto-catch a UA rotation. If the same /timelines/public
since_id polling pattern reappears under a new UA, that's the signal it
adapted.

------------------------------------------------------------------------

2026-06-22 — All-China block added at the firewall (nftables, :443 only)

Pushed the all-China block down a layer: CN IPv4+IPv6 is now dropped at
the firewall on :443 in addition to the existing nginx 403. Cheaper (no
TLS handshake / nginx worker for blocked clients) and not bypassable by
a forgotten per-vhost if guard. The nginx 403 is kept as defence in
depth — it also catches non-CN scraper UAs and the off-ASN Tencent
ranges that aren't "all China."

Firewall reality on mammut (informed the design): nftables backend,
Docker owns the ip/ip6 tables, host nginx listens directly on :443 (so
inbound 443 hits INPUT, not Docker's FORWARD), netfilter-persistent
persists the manual Postgres-5432 INPUT rules, no
ipset/ufw/CF/cloud-firewall layer.

What changed.

-   New self-contained table inet cnblock: interval sets cn4/cn6
    (auto-merge) + an input hook at priority -10, policy accept, with
    tcp dport 443 ip saddr @cn4 counter drop and the ip6 saddr @cn6
    twin. A separate table, so it never touches Docker's tables or the
    5432 rules. Only :443 is filtered — SSH/:22 stays open, no lockout
    risk even on a bad CIDR.
-   cn-block-refresh.py now drives both layers from one APNIC fetch —
    writes the nginx include and /etc/nftables-cnblock.conf, reloading
    each (nginx -t+reload; nft -f). Same weekly cron keeps the firewall
    in sync.
-   cnblock-nft.service (oneshot nft -f /etc/nftables-cnblock.conf,
    RemainAfterExit, enabled) restores the table on boot. Persistence is
    independent of netfilter-persistent.
-   Set sizes after auto-merge: 4108 cn4 + 2008 cn6 intervals (collapsed
    from 5488 v4 / 2012 v6 CIDRs — same coverage, adjacent ranges
    merged).
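
One plausible reason the interval count is lower than the CIDR count: an interval set can join any two adjacent ranges, while CIDR aggregation can only join ranges that form an aligned block. The Python sketch below illustrates the difference with made-up 10.0.x.0/24 networks; it is not nft's implementation.

```python
import ipaddress


def merge_intervals(cidrs):
    """Merge same-family CIDRs into (first, last) address intervals."""
    nets = [ipaddress.ip_network(c) for c in cidrs]
    if not nets:
        return []
    addr_cls = type(nets[0].network_address)  # IPv4Address or IPv6Address
    spans = sorted((int(n.network_address), int(n.broadcast_address)) for n in nets)
    merged = []
    for start, end in spans:
        if merged and start <= merged[-1][1] + 1:  # overlapping or touching
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(addr_cls(s), addr_cls(e)) for s, e in merged]


if __name__ == "__main__":
    cidrs = ["10.0.1.0/24", "10.0.2.0/24", "10.0.9.0/24"]
    nets = [ipaddress.ip_network(c) for c in cidrs]
    print("CIDR-collapsed:", len(list(ipaddress.collapse_addresses(nets))))
    print("interval-merged:", len(merge_intervals(cidrs)))
```

Here 10.0.1.0/24 and 10.0.2.0/24 touch but cannot form a single /23, so CIDR collapsing keeps three entries while interval merging keeps two.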

Verified.

-   Membership (nft get element): CN log IP 2408:8722:1040:80::54 and
    1.0.1.5 → in set; Cloudflare 2606:4700:4700::1111 and Google 8.8.8.8
    → not.
-   Drop rule live with climbing counters from real traffic (v4 83→129,
    v6 42→44 packets across two reads) — actual CN clients being dropped
    at :443, not a synthetic test.
-   Boot path: systemctl start cnblock-nft reloads the table cleanly
    (status=0, active (exited)), multi-user.target.wants symlink
    present.
-   SSH and the site for non-CN clients unaffected; nginx still active.

To lift the firewall layer: systemctl disable --now cnblock-nft then
nft delete table inet cnblock. The nginx all-China 403 remains until
separately removed (drop the scraper-cidrs-cn.conf include from
scraper-block.conf).

------------------------------------------------------------------------

2026-06-22 — All-China block extended to IPv6 (closed the v6 gap)

The all-China stopgap block was IPv4-only — CN IPv6 clients (e.g. China
Unicom 2408::/…) sailed straight through to Mastodon. Spotted a China
Unicom v6 client (2408:8722:1040:80::54) loading /tags/ web assets in
the access log; confirmed nothing in $bad_cidr matched it.

Root cause. /root/cn-block-refresh.py hard-filtered f[2] != "ipv4" when
parsing the APNIC delegated-stats file, silently dropping every CN IPv6
allocation. The geo $bad_cidr block matches $remote_addr (v4 or v6
alike), so the only missing piece was generating the v6 CIDRs.

What changed.

-   cn-block-refresh.py now parses both families: ipv4 as before, ipv6
    as start/prefixlen (the APNIC value field is a prefix length for v6,
    a host count for v4). Each family is collapsed separately
    (collapse_addresses can't mix families) and written to the same
    include, with a # --- IPv6 --- divider. Added a v6 sanity floor
    (< 500 → refuse to write) mirroring the v4 one. Backup of the prior
    script: /root/cn-block-refresh.py.bak-prev6.20260622.
-   Ran it live: scraper-cidrs-cn.conf now holds 5488 v4 + 2012 v6
    aggregated CN CIDRs (was v4-only). nginx -t + reload succeeded. The
    weekly cron (50 4 * * 1) now keeps v6 fresh automatically.
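
The parsing difference between the two families can be sketched as follows. This is not the deployed cn-block-refresh.py, only a minimal Python illustration of the delegated-stats layout (registry|cc|type|start|value|date|status); the function names and the sample rows are illustrative assumptions.

```python
import ipaddress


def parse_cn(lines):
    """Return (v4, v6) lists of collapsed CN networks from delegated-stats lines."""
    v4, v6 = [], []
    for line in lines:
        f = line.strip().split("|")
        if len(f) < 7 or f[1] != "CN":
            continue  # header, summary, comment, or another economy
        if f[2] == "ipv4":
            # v4: the value field is a host count, not always a power of two
            first = ipaddress.IPv4Address(f[3])
            last = first + (int(f[4]) - 1)
            v4.extend(ipaddress.summarize_address_range(first, last))
        elif f[2] == "ipv6":
            # v6: the value field is already a prefix length
            v6.append(ipaddress.IPv6Network(f"{f[3]}/{f[4]}"))
    # collapse_addresses cannot mix families, so collapse each separately
    return list(ipaddress.collapse_addresses(v4)), list(ipaddress.collapse_addresses(v6))


def check_floor(cidrs, floor, label):
    """Refuse to continue when a family looks implausibly small."""
    if len(cidrs) < floor:
        raise ValueError(f"{label}: only {len(cidrs)} CIDRs (< {floor}), refusing to write")


SAMPLE = [
    "apnic|CN|ipv4|1.0.1.0|256|20110414|allocated",
    "apnic|CN|ipv4|1.0.2.0|512|20110414|allocated",
    "apnic|JP|ipv4|1.0.16.0|4096|20110412|allocated",
    "apnic|CN|ipv6|2408:8000::|20|20080620|allocated",
]

if __name__ == "__main__":
    v4, v6 = parse_cn(SAMPLE)
    print([str(n) for n in v4], [str(n) for n in v6])
```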

The pre-existing [warn] duplicate network lines on reload are unchanged
and harmless — v4 ranges (e.g. Alibaba 39.96.0.0/13) that appear in both
the bulk scraper-cidrs.conf and the all-CN list; the second definition
collapses to the same 1.

Verified.

-   DRY_RUN=1 preview: 5488 v4 + 2012 v6 CN CIDRs.
-   2408:8000::/20 (the China Unicom /20 covering the observed client)
    present in the loaded conf; grep -c confirms 2012 v6 entries.
-   nginx -t syntax OK; systemctl reload nginx clean.

------------------------------------------------------------------------

2026-06-22 — CDN: nginx-layer cache on files.yttrx.com + cache hit/miss in the access log

Added an nginx proxy_cache to the live files.yttrx.com config and
surfaced its result in the access log, so we can see cache hit/miss (not
just which backend served a request).

Context / why. The enabled site is cdn-migration (DO Spaces primary → R2
fallback), not cdn-r2 as cdn-site.md claimed — that doc was stale and
has been corrected. Cloudflare does not front files.yttrx.com; clients
hit mammut nginx directly, and there was no cache at this layer, so
every read re-fetched from the origin and the log had no hit/miss signal
at all.

What changed (/etc/nginx/sites-available/cdn-migration; backup
cdn-migration.bak-cachelog.20260622-044226):

-   New
    proxy_cache_path /var/cache/nginx/cdn levels=1:2 keys_zone=cdn_cache:10m max_size=5g inactive=7d use_temp_path=off;
    at http scope (dir created 0700 www-data).
-   Both @s3_primary and @s3_fallback gained:
        proxy_ignore_headers Set-Cookie Cache-Control Expires X-Accel-Expires;
        proxy_cache cdn_cache;
        proxy_cache_valid 200 30d;
        proxy_cache_revalidate on;
        proxy_cache_lock on;
        proxy_cache_use_stale error timeout updating http_500 http_502 http_503 http_504;
        add_header X-Cache-Status $upstream_cache_status always;
-   files_cache log_format now ends
    upstream=$sent_http_x_upstream cache=$upstream_cache_status (was
    just upstream=…).

Phantom-404 landmine, defused. A prior cache attempt here cached
upstream 404s and returned fake 404 HITs that never fell through DO→R2
(the reason the cache was removed before). Cause:
proxy_intercept_errors + error_page 404 = @s3_fallback. Fix:
proxy_ignore_headers … Cache-Control Expires X-Accel-Expires so nginx
ignores upstream caching hints and caches only what proxy_cache_valid
whitelists (200). 404/403 are never cached, so the fallback and real
404s are preserved. The default cache key includes $proxy_host, so DO
and R2 also can't collide on a shared key.
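
The reasoning can be condensed into a toy model of how a cache lifetime is chosen. This Python sketch is a deliberate simplification, not nginx's actual algorithm: it assumes upstream header hints take precedence unless ignored, and that proxy_cache_valid is consulted only otherwise. Header values are passed in as already-parsed TTLs.

```python
THIRTY_DAYS = 30 * 24 * 3600


def cache_ttl(status, upstream_ttl_hints, ignored_headers, cache_valid):
    """Seconds to cache a response for, or None for "do not cache".

    upstream_ttl_hints: header name -> TTL in seconds parsed from that header.
    cache_valid: status code -> TTL, standing in for proxy_cache_valid.
    """
    for header in ("X-Accel-Expires", "Expires", "Cache-Control"):
        if header in upstream_ttl_hints and header not in ignored_headers:
            return upstream_ttl_hints[header]  # upstream hint wins
    return cache_valid.get(status)


VALID = {200: THIRTY_DAYS}                    # proxy_cache_valid 200 30d
IGNORED = {"Set-Cookie", "Cache-Control", "Expires", "X-Accel-Expires"}

if __name__ == "__main__":
    hint = {"Cache-Control": 3600}
    print("404, hints honoured:", cache_ttl(404, hint, set(), VALID))
    print("404, hints ignored: ", cache_ttl(404, hint, IGNORED, VALID))
    print("200, hints ignored: ", cache_ttl(200, hint, IGNORED, VALID))
```

In this model a 404 carrying a caching hint becomes cacheable when hints are honoured, and is never stored once they are ignored.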

Tooling. cdn-backend-histogram.pl (repo scripts/, deployed /root/) now
parses the new cache= field and prints a second hourly histogram — cache
HIT vs MISS — below the existing DO-vs-R2 backend histogram. Rendering
was refactored into a reusable render_hist() sub. Old (pre-cache) log
lines have no cache= token and are simply uncharted in the cache
histogram.
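
For readers who do not have the Perl script to hand, the per-hour bucketing can be sketched in Python. This is not cdn-backend-histogram.pl; the regexes and sample lines are assumptions based only on the log format ending in upstream=… cache=… as described above.

```python
import re
from collections import Counter

HOUR = re.compile(r"\[\d{2}/\w{3}/\d{4}:(\d{2}):")
CACHE = re.compile(r"\bcache=(\S+)\s*$")  # the field sits at the end of the line


def cache_histogram(lines):
    """Map hour ("00".."23") -> Counter of cache statuses."""
    hist = {}
    for line in lines:
        hour, cache = HOUR.search(line), CACHE.search(line)
        if not hour or not cache:
            continue  # pre-cache log lines carry no cache= token
        hist.setdefault(hour.group(1), Counter())[cache.group(1)] += 1
    return hist


SAMPLE = [
    '203.0.113.5 [22/Jun/2026:04:50:01 +0000] "GET /cache/a.png HTTP/1.1" 200 123 upstream=do cache=MISS',
    '203.0.113.5 [22/Jun/2026:04:50:02 +0000] "GET /cache/a.png HTTP/1.1" 200 123 upstream=do cache=HIT',
    '203.0.113.6 [22/Jun/2026:05:01:10 +0000] "GET /cache/b.png HTTP/1.1" 200 456 upstream=do cache=HIT',
    '203.0.113.7 [21/Jun/2026:23:59:59 +0000] "GET /cache/c.png HTTP/1.1" 200 789 upstream=r2',
]

if __name__ == "__main__":
    for hour, counts in sorted(cache_histogram(SAMPLE).items()):
        print(hour, dict(counts))
```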

Operational note. Cached 200s persist up to 30d. After a backend
cutover, force a re-fetch with
rm -rf /var/cache/nginx/cdn/* && systemctl reload nginx.

Verified.

-   nginx -t OK; reloaded.
-   Real object: MISS → HIT → HIT; object present under
    /var/cache/nginx/cdn/; log shows ... upstream=do cache=HIT.
-   Missing object: 404 on both requests, served via upstream=r2 (DO→R2
    fallback intact, no phantom HIT).
-   cdn-backend-histogram.pl runs clean against the mixed-format live
    log and renders both histograms (perl -c OK).

2026-06-19 — fix: cdn-r2-refresh.sh real run was silently failing (LoadError)

The daily R2→DO accelerator cdn-r2-refresh.sh (see
cdn-r2-to-do-migration.md §6) had never actually refreshed anything. Its
real-run path mktemps a temp Ruby file (mode 0600 root:root), docker cps
it into live-web-1, and runs bin/rails runner /tmp/cdn-r2-refresh.rb —
but Rails runs as the container's mastodon uid (991), which can't read a
root-owned 0600 file, so every real run died with
LoadError: cannot load such file -- /tmp/cdn-r2-refresh.rb. The
DRY_RUN=1 path is pure host-side log parsing and was unaffected, which
masked the bug; the 04:45 cron failed silently to /dev/null.

Fix. chmod 0644 the temp ids + Ruby files before docker cp so the
in-container mastodon user can read them (repo
scripts/cdn-r2-refresh.sh, deployed to /root/cdn-r2-refresh.sh).
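
The permission trap is easy to reproduce outside the script. The real fix lives in a shell script (mktemp plus chmod); the Python sketch below only demonstrates the same behaviour on a POSIX host, with hypothetical helper names.

```python
import os
import stat
import tempfile


def mode_of(path: str) -> int:
    """Permission bits of a file, e.g. 0o600."""
    return stat.S_IMODE(os.stat(path).st_mode)


def make_readable_temp(content: str) -> str:
    """Create a temp file and widen it so another uid can read it."""
    fd, path = tempfile.mkstemp(suffix=".rb")
    with os.fdopen(fd, "w") as fh:
        fh.write(content)
    # mkstemp creates the file owner-only (0600); a different uid, such as
    # the in-container mastodon user, needs the read bits added.
    os.chmod(path, 0o644)
    return path


if __name__ == "__main__":
    path = make_readable_temp("puts 'hello'\n")
    print(oct(mode_of(path)))
    os.remove(path)
```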

Verified. A 0644 temp file now loads under bin/rails runner; the first
real run (against yesterday's 355 candidates) is actively re-caching
avatars/headers to DO (deleting cache/accounts/avatars/… + re-fetch
through the DO writer) — the accelerator works for the first time. Net
effect: the upstream=r2 decay signal will now actually move, instead of
relying solely on passive 14-day aging.

2026-06-19 — Mastodon Bird UI as user-selectable site themes (custom image)

Added Mastodon Bird UI (the Twitter-like skin) as opt-in Site themes
users pick under Preferences → Appearance. Stock Mastodon stays the
default; nobody is forced onto it.

Why it needed a custom image. A selectable theme is a compiled asset
(config/themes.yml entry + Vite-built CSS), not something you can paste
in. Our running image is the stock slim runtime
ghcr.io/mastodon/mastodon:v4.6.0, which has no node/Vite and an empty
node_modules — it physically can't precompile assets. (Site-wide Admin →
Custom CSS can be pasted, but that forces the look on everyone and isn't
selectable — not what we wanted.) So web/sidekiq now run a from-source
build of v4.6.0 with the themes baked in.

What was built.

-   /root/birdui-build/build-birdui-image.sh (repo:
    scripts/build-birdui-image.sh) clones Mastodon @ the version tag +
    bird-ui nightly (nightly targets 4.6+; main is 4.5), runs bird-ui's
    install-to-mastodon.sh --variations (feeding n to its "set as
    default?" prompt so bird-ui stays opt-in), and docker builds
    yttrx-mastodon-birdui:v4.6.0 (~20–30 min, Vite precompiles the theme
    CSS).
-   Resulting config/themes.yml: default (stock) + mastodon-bird-ui-auto
    (light/dark auto), mastodon-bird-ui-accessible,
    mastodon-bird-ui-accessible-plus.
-   Bespoke ~mastodon/live/docker-compose.yml: web + sidekiq image
    swapped from ghcr.io/mastodon/mastodon:v4.6.0 →
    yttrx-mastodon-birdui:v4.6.0; streaming left on the stock streaming
    image. docker compose up -d web sidekiq.

Upgrade impact (important). This must be rebuilt from source on every
Mastodon upgrade — if web/sidekiq get pointed back at the stock image,
the themes silently vanish for everyone who selected one. The upgrade
runbook now has a required step 7b covering this (see
mastodon-upgrade.md).

Backup / rollback. Compose backup:
~mastodon/live/docker-compose.yml.bak-birdui-20260619. To remove the
themes entirely: set web/sidekiq back to
ghcr.io/mastodon/mastodon:v4.6.0 and docker compose up -d web sidekiq
(the site runs fine; selected themes fall back to default). Build
context lives in /root/birdui-build/.

Verified.

-   Image: themes.yml has the 4 entries;
    public/packs/assets/themes/mastodon-bird-ui-*.css compiled in (769
    MB image).
-   Live: web+sidekiq recreated on the new image, both healthy;
    streaming untouched; no boot errors. yttrx.com → 200; themed CSS
    served through nginx→web
    (/packs/assets/themes/mastodon-bird-ui-auto-*.css → 200).

Post-deploy cleanup. The from-source build left ~25 GB of dangling
layers + Docker build cache. Pruned with docker image prune -f (dangling
only — keeps tagged images) + docker builder prune -f: reclaimed ~25 GB
(build cache 32.6 → 7.5 GB, all remaining in-use), Docker disk 168 → 142
GB. Kept both yttrx-mastodon-birdui:v4.6.0 and the stock
ghcr.io/mastodon/mastodon:v4.6.0 (the rollback fallback); stack stayed
healthy throughout.

2026-06-19 — anti-abuse: block Tencent scrapers + all-China stopgap + auto-sweep

Tencent Cloud was crawling the instance hard — distinct IPs rotating
across the 114.117.128.0/17 and 43.140.0.0/15 (TENCENT-CN, AS45090)
ranges, all hammering /media_proxy/* on the app and /cache/* on the CDN
with a flat rotating-Chrome-UA fingerprint (versions 103–133) and
spoofed yttrx.com referers — the same operation already documented for
the ACEVILLE/BytePlus ranges, now on a new mainland-Beijing ASN. ~40
distinct 114.117 IPs + dozens of 43.140 IPs in 24h. Extended the
existing nginx $bad_cidr hard-block on four fronts. See anti-abuse.md →
"Crawler / scraper control".

1. Closed the CDN gap. The scraper-block.conf if ($bad_ua)/$bad_cidr
guards were only wired into the mastodon site (app: yttrx.com /
masto.yttrx.com). The cdn-migration site (files.yttrx.com) had neither
guard, so scrapers pulled media off the CDN unblocked (207 114.117 hits
in cdn.access.log). Added both guards to the files.yttrx.com 443 server
block. The map/geo are http-scope (defined once via the mastodon site's
include), so the CDN block references $bad_ua/$bad_cidr without
re-including. Confirmed both vhosts log real client IPs (no CF real-ip
layer active), so the geo match on $remote_addr works.

    # in cdn-migration, files.yttrx.com 443 server { } after keepalive_timeout:
    if ($bad_ua)   { return 403; }
    if ($bad_cidr) { return 403; }

2. Blocked the two Tencent ranges (a /17 and a /15) inline in the geo
$bad_cidr block of scraper-block.conf (alongside the existing off-ASN
150.5/16 / 14.191/16 entries):

    114.117.128.0/17  1;   # TENCENT-CN AS45090
    43.140.0.0/15     1;   # TENCENT-CN AS45090 (inetnum 43.140.0.0-43.141.255.255)

3. All-China stopgap (broad). New auto-generated include
/etc/nginx/snippets/scraper-cidrs-cn.conf → 5488 aggregated CN IPv4
CIDRs from the authoritative APNIC delegated-stats file, all
$bad_cidr=1. Generated + refreshed by /root/cn-block-refresh.py (repo:
scripts/cn-block-refresh.py), cron 50 4 * * 1 (weekly; allocations
drift). IPv4 only — v6 clients are unaffected (a known gap; the abuse is
all v4). Reload emits ~30 benign duplicate network warnings where CN
CIDRs (Alibaba ranges) already exist in the bulk scraper-cidrs.conf —
nginx keeps the first, harmless.

4. Targeted Tencent auto-sweep (narrow, self-maintaining). New
/root/tencent-sweep.py (repo: scripts/tencent-sweep.py), cron 20 4 * * *
(daily). Scans app+CDN logs, groups client IPs by /16, rate-limited
whois on a representative IP per busy /16, and for any /16 that whois
identifies as Tencent or ACEVILLE appends the registered route:/inetnum:
CIDR — deduped against everything already blocked — to
/etc/nginx/snippets/scraper-cidrs-tencent-auto.conf, then nginx -t +
reload. Per-/16 whois cache (/var/lib/tencent-sweep/whois-cache.json),
MAX_WHOIS cap, DRY_RUN switch, logs to /var/log/tencent-sweep.log.
Catches ACEVILLE (SG, AS132203) ranges that rotate outside the CN GeoIP
block. Both new includes wired into the geo $bad_cidr block after the
bulk scraper-cidrs.conf include.
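
The grouping and dedupe steps of the sweep can be sketched in Python. This is not tencent-sweep.py: the whois lookup is left out entirely, the function names and threshold are hypothetical, and the blocked list holds just the two inline Tencent ranges from item 2.

```python
import ipaddress
from collections import Counter

BLOCKED = [
    ipaddress.ip_network("114.117.128.0/17"),  # TENCENT-CN AS45090
    ipaddress.ip_network("43.140.0.0/15"),     # 43.140.0.0-43.141.255.255
]


def busy_slash16s(ips, min_hits):
    """Group IPv4 client addresses by /16 and keep the busy ones."""
    counts = Counter(ipaddress.ip_network(f"{ip}/16", strict=False) for ip in ips)
    return [net for net, hits in counts.most_common() if hits >= min_hits]


def already_blocked(candidate, blocked=BLOCKED):
    """True when a candidate CIDR is fully covered by an existing entry."""
    return any(candidate.subnet_of(b) for b in blocked)


if __name__ == "__main__":
    seen = ["43.140.1.1", "43.140.2.2", "43.141.9.9", "8.8.8.8"]
    for net in busy_slash16s(seen, min_hits=2):
        print(net, "covered" if already_blocked(net) else "needs whois")
```

Deduping before the whois step keeps the lookup count low, which matters given the rate limit and the MAX_WHOIS cap.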

Backups / rollback.

-   Config backups on mammut:
    scraper-block.conf.bak-tencent45090.20260619,
    cdn-migration.bak-scraperblock.20260619, crontab.bak-20260619.
-   Undo all-CN: remove the scraper-cidrs-cn.conf include line from
    scraper-block.conf (or empty the file) + reload. Undo the sweep:
    remove its include line + cron. Remove the two cron lines with
    crontab -e.

Verified.

-   Live traffic: 114.117.213.234 (Tencent), 121.237.36.31 (China
    Telecom), 111.37.49.70 (China Mobile) all flipped to 403 on app +
    CDN post-reload.
-   Positive control from a non-CN egress: yttrx.com 200,
    masto.yttrx.com 200, a real files.yttrx.com/cache/... object 200 —
    legit visitors unaffected. (files.yttrx.com/ itself 403s on the
    pre-existing no-directory-listing rule, not the new block.)
-   nginx -t clean after each change; tencent-sweep.py --DRY_RUN scanned
    3927 /16s and correctly found nothing new to add (ranges already
    covered).
-   CN generator dry-run parsed 5488 CIDRs from APNIC (sanity floor
    1000).

2026-06-19 — CDN R2→DO: re-cache test + force-refresh + daily accelerator

Accelerated the R2→DO media migration (cutover was earlier on
2026-06-18, see cdn-r2-to-do-migration.md) by forcing cached remote
avatars/headers to re-cache onto DigitalOcean Spaces, and automated the
same as a daily job. No production config changed; this only moves
re-fetchable cache media.

1. Validated how re-caching works (test). Deleted a live cached remote
avatar from R2 (bigzaphod@mastodon.social) and observed:

-   A CDN miss does not trigger re-caching — Mastodon doesn't watch
    files.yttrx.com; the origin just 404s (DO miss → R2 miss) and the
    account keeps pointing at the dead key.
-   tootctl accounts refresh <acct> (reset_avatar! + reset_header! +
    save) re-downloads from the account's stored remote URL through the
    current S3 writer → lands on DO (x-upstream: do) under a new
    filename. The old R2 object is orphaned (writer points at DO), which
    is moot since the bucket gets deleted at decommission.

2. One-off force-refresh of followed-remote accounts. Refreshed every
remote account followed by ≥1 local user — 2,674 accounts — via a
detached single bin/rails runner (/root/followed-refresh.rb, log
/root/followed-refresh.log). Result: refreshed 2,471, failed 203
(failures = dead/unreachable instances or masto.host CDNs returning
401/404; those keep their existing avatar). This pulls the most
user-visible media onto DO immediately.

3. Daily log-driven accelerator (new cron). Added
/root/cdn-r2-refresh.sh (repo: scripts/cdn-r2-refresh.sh), cron
45 4 * * *, log /var/log/cdn-r2-refresh.log. It scans yesterday's
cdn.access.log for upstream=r2 hits on
/cache/accounts/{avatars,headers}/, extracts each account id from the
object path's id-partition, and refreshes those accounts so they
re-cache to DO. This converges upstream=r2 → zero (the R2-decommission
signal) without re-fetching media nobody requests. DRY_RUN=1 reports
candidates only. First dry run (2026-06-18 log): 355 candidates from 398
R2 avatar/header hits.
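
The id extraction can be illustrated with a short Python sketch. It is not the deployed shell script: the regex assumes the usual three-digit id-partition directories under /cache/accounts/, and the log lines and account ids are invented.

```python
import re

# /cache/accounts/<avatars|headers>/109/876/.../<style>/<file>
R2_OBJECT = re.compile(r"/cache/accounts/(?:avatars|headers)/((?:\d{3}/)+)")


def account_ids(lines):
    """Distinct account ids behind upstream=r2 avatar/header hits."""
    ids = set()
    for line in lines:
        if "upstream=r2" not in line:
            continue
        m = R2_OBJECT.search(line)
        if m:
            # join the partition directories back into one integer id
            ids.add(int(m.group(1).replace("/", "")))
    return ids


SAMPLE = [
    '198.51.100.9 [18/Jun/2026:10:00:00 +0000] "GET /cache/accounts/avatars/109/876/543/210/123/456/original/a.png HTTP/1.1" 200 512 upstream=r2',
    '198.51.100.9 [18/Jun/2026:10:00:01 +0000] "GET /cache/accounts/headers/109/876/543/210/123/456/original/b.jpg HTTP/1.1" 200 900 upstream=r2',
    '198.51.100.9 [18/Jun/2026:10:00:02 +0000] "GET /cache/accounts/avatars/110/000/000/000/000/001/original/c.png HTTP/1.1" 200 300 upstream=do',
]

if __name__ == "__main__":
    print(sorted(account_ids(SAMPLE)))
```

An avatar hit and a header hit for the same account collapse into one candidate, which is one way the candidate count can come out lower than the hit count.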

Scope/limits. Only avatars + headers are re-cached (re-fetchable);
status-attachment media (the bulk of the 241 GB cache) is intentionally
left to age out within the 14-day retention window. The decommission
trigger is unchanged: upstream=r2 hits trending to zero in
cdn.access.log.

Rollback. Remove the cron line (crontab -e) and /root/cdn-r2-refresh.sh
to stop the daily job; the one-off refresh is not reversible but is
benign (media simply re-downloaded to DO). No nginx/.env/bucket changes
were made here.

Verified.

-   Post-delete origin returned 404; post-refresh the new avatar served
    HTTP 200, x-upstream: do (tested via
    curl --resolve files.yttrx.com:443:127.0.0.1 to bypass Cloudflare's
    30d edge cache).
-   Spot-checked 5 bulk-refreshed accounts (gnomon, Gargron, stux,
    feditips, paul_briley) — each current avatar key present on DO
    Spaces and served x-upstream: do.
-   Bulk run completed 2,674/2,674; daily script dry-run produced 355
    candidates.

2026-06-19 — anti-abuse: dormant-account sweep (live, DM-only)

Added a scheduled sweep that reversibly limits (silences) long-dormant
local accounts and DMs each one an appeal/reactivation link. This is the
proactive counterpart to the report-driven abuse-bot — roadmap item A in
anti-abuse.md.

Why it runs where it does. The dormancy signal is web login
(users.current_sign_in_at), which the admin REST API does not expose —
it's DB-only. So unlike the welcome/abuse bot (which runs on admin over
HTTP), this sweep runs on mammut, inside the live-web-1 container via
bin/rails runner, acting as the bot account directly
(Admin::AccountAction(type: 'silence') + PostStatusService). No API
token or secret lives on the host.

What counts as "dormant" — ALL three older than the cutoff, so it never
catches app-only or lurk-and-post users:

-   web login users.current_sign_in_at (NULL counts if created before
    cutoff),
-   app/API token use Doorkeeper::AccessToken.last_used_at,
-   last post account_stats.last_status_at.

Login alone is a trap: current_sign_in_at only updates on web login, so
at a 6-month cutoff it flagged 1004 / 1548 accounts (~65%). The
three-signal definition stays accurate as app usage grows.
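
The three-signal rule can be written down as a small predicate. This Python sketch is not dormant_sweep.rb; the Signals type is hypothetical, and treating a missing token or post timestamp the same way as a missing sign-in (dormant only if the account predates the cutoff) is an assumption, since the entry specifies NULL handling for sign-in only.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Signals:
    created_at: datetime
    current_sign_in_at: Optional[datetime]   # users.current_sign_in_at
    token_last_used_at: Optional[datetime]   # Doorkeeper::AccessToken.last_used_at
    last_status_at: Optional[datetime]       # account_stats.last_status_at


def _older(ts: Optional[datetime], cutoff: datetime, created_at: datetime) -> bool:
    # A missing timestamp counts as old only when the account itself is old.
    return created_at < cutoff if ts is None else ts < cutoff


def is_dormant(s: Signals, cutoff: datetime) -> bool:
    """Dormant only when ALL three signals are older than the cutoff."""
    signals = (s.current_sign_in_at, s.token_last_used_at, s.last_status_at)
    return all(_older(ts, cutoff, s.created_at) for ts in signals)


if __name__ == "__main__":
    cutoff = datetime(2024, 12, 19)
    app_only = Signals(datetime(2022, 1, 1), datetime(2022, 1, 2), datetime(2026, 6, 1), None)
    print("app-only user dormant:", is_dormant(app_only, cutoff))
```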

Policy at launch:

-   cutoff 18 months → 906 candidates;
-   DM only, email OFF. A native strike-email blast to ~900 years-dead
    addresses would bounce heavily and damage yttrx's own sending
    reputation (admin@yttrx.com via submitgw; Mastodon has no bounce
    suppression). The strike is still created and appealable in-app
    without the email;
-   reversible silence, applied without a report_id;
-   staff never touched (any assigned role_id) + allowlist
    waffles,tommertron,davis,bot;
-   batch cap 50/run, daily cron at 04:30.

Where:
/root/dormant-sweep/{dormant_sweep.rb,dormant-sweep.sh,dormant-sweep.env}
on mammut; cron 30 4 * * * → /var/log/dormant-sweep.log. Source of
truth: workstation ~/yttrx-welcomebot/dormant-sweep/. Appeals page:
https://welcome.yttrx.com/posts/account-inactive/ (Hugo content
account-inactive.md on admin).

Rollback (un-limit the batch) via the [dormant-sweep]-tagged strike
note:

    AccountWarning.where("text LIKE ?", "%[dormant-sweep]%").find_each do |w|
      w.target_account.unsilence! if w.target_account&.silenced?
    end

…or from the limited @handle lines in /var/log/dormant-sweep.log. Pause
without rollback: set DORMANT_DRY_RUN=true in the env.

Verified

First live batch ran 2026-06-19 04:32 (cap 50), confirmed afterward with
read-only queries:

    acted=50 (dry_run=false), skipped allowlist=1 (davis)
    silenced in last 10 min   = 50
    [dormant-sweep] strikes   = 50
    bot direct DMs            = 51   (50 users + 1 waffles summary)

~856 of the 906 backlog remain → ~17 more days at 50/day. Verified
against Mastodon 4.6.0 (Admin::AccountAction, PostStatusService,
User.confirmed, and the three signal columns all present).

2026-06-18 — welcome.yttrx.com: toplevel-link rewrite for posts

Hugo builds the help posts under /posts/<slug>/index.html, but the short
toplevel URLs (e.g. https://welcome.yttrx.com/account-limited/) are
nicer to share. Added an internal rewrite (not a redirect — the address
bar stays on the toplevel URL) so a toplevel path serves the matching
post when no real toplevel page exists. This is the same try_files
mechanism the location = / root block already uses.

Edited the main location / block in /etc/nginx/sites-available/welcome
on admin.yttrx.com:

    location / {
        # Serve toplevel links from the matching post when no toplevel page exists,
        # e.g. /account-limited/ -> /posts/account-limited/index.html (internal, no redirect).
        try_files $uri $uri/ /posts$uri/index.html /posts$uri.html =404;
    }

/posts/<slug>/ keeps working unchanged; real toplevel pages (/about/)
are still served directly; unknown paths still 404.
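
The lookup order can be traced with a toy resolver. This Python sketch is a simplified model of try_files, not nginx itself: the file and directory sets are invented, and serving index.html for a matched directory is an assumption about the site's index setting.

```python
import posixpath

FILES = {
    "/about/index.html",
    "/posts/account-limited/index.html",
    "/posts/account-inactive/index.html",
}
DIRS = {"/about", "/posts", "/posts/account-limited", "/posts/account-inactive"}


def try_files(uri, files=FILES, dirs=DIRS):
    """Return the file that would be served for uri, or 404."""
    candidates = [
        uri,                             # $uri
        uri + "/",                       # $uri/
        "/posts" + uri + "/index.html",  # /posts$uri/index.html
        "/posts" + uri + ".html",        # /posts$uri.html
    ]
    for cand in candidates:
        norm = posixpath.normpath(cand)  # collapses "//" and a trailing "/"
        if cand.endswith("/"):
            if norm in dirs:
                return norm + "/index.html"
        elif norm in files:
            return norm
    return 404


if __name__ == "__main__":
    for uri in ("/account-limited/", "/account-limited", "/about/", "/does-not-exist/"):
        print(f"{uri:20} -> {try_files(uri)}")
```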

Backup before edit: /etc/nginx/sites-available/welcome.bak-20260618.
Rollback:

    sudo cp /etc/nginx/sites-available/welcome.bak-20260618 /etc/nginx/sites-available/welcome
    sudo nginx -t && sudo systemctl reload nginx

Verified

    /account-limited/        -> 200 (byte-identical to /posts/account-limited/, md5 MATCH)
    /account-limited         -> 200
    /account-inactive/       -> 200
    /posts/account-limited/  -> 200 (unchanged)
    /about/                  -> 200 (real toplevel page, served directly)
    /does-not-exist/         -> 404

sudo nginx -t passed; reloaded with sudo systemctl reload nginx.

2026-06-18 — help.yttrx.com: 301 old welcome page → welcome.yttrx.com

The original welcome/landing post lived at
https://help.yttrx.com/help/2023/03/12/welcome-to-yttrx.html (waffles'
Jekyll build). Its content moved to the Hugo site welcome.yttrx.com back
on 2026-06-16, but the old help URL still served the stale page (200).
Added a permanent redirect so the old URL forwards to the new landing
site.

What changed

-   Added an exact-match location to the HTTPS server block in
    /etc/nginx/sites-available/help on admin.yttrx.com, just before the
    catch-all location /:

        location = /help/2023/03/12/welcome-to-yttrx.html {
          return 301 https://welcome.yttrx.com;
        }

-   Applied with nginx -t && systemctl reload nginx. waffles' static
    files under /var/www/html/help are untouched; this is purely an
    nginx-level redirect.

Backup / rollback

-   Config backup:
    /etc/nginx/sites-available/help.bak-welcome-redirect.20260618.
-   Rollback:
    cp help.bak-welcome-redirect.20260618 help && nginx -t && systemctl reload nginx.

Verified

    $ curl -sI https://help.yttrx.com/help/2023/03/12/welcome-to-yttrx.html
    HTTP/2 301
    location: https://welcome.yttrx.com
    $ curl -sL -o /dev/null -w '%{http_code} %{url_effective}\n' \
        https://help.yttrx.com/help/2023/03/12/welcome-to-yttrx.html
    200 https://welcome.yttrx.com/
    $ curl -s -o /dev/null -w '%{http_code}\n' https://help.yttrx.com/   # 200 — rest of help site unaffected

------------------------------------------------------------------------

2026-06-18 — robots.txt: block all crawlers except the Internet Archive

Replaced Mastodon's stock robots.txt on yttrx.com and masto.yttrx.com
with a block-all-but-Internet-Archive policy, matching peteftw.com. All
well-behaved crawlers (Google/Bing included) are told to stay out; only
the Internet Archive's crawler (ia_archiver + archive.org_bot) is
allowed, so the Wayback Machine can still snapshot the site. This is a
deliberate blanket opt-out of search indexing for the whole instance,
not just AI-scraper defence. See anti-abuse.md → "Crawler / scraper
control" for the standing policy.

What changed

-   New static file /var/www/html/robots.txt on mammut:

        User-agent: *
        Disallow: /
        User-agent: ia_archiver
        Disallow:
        User-agent: archive.org_bot
        Disallow:

-   yttrx.com — no config change needed. Its server block already has
    root /var/www/html; and its location / does try_files $uri @proxy,
    so placing the file in that root makes nginx serve it ahead of the
    Puma proxy.
    (Previously /robots.txt fell through to Mastodon, which served its
    stock file.)

-   masto.yttrx.com — that server block has no root, so added an
    explicit block before its location = / block in
    /etc/nginx/sites-available/mastodon:

        location = /robots.txt {
          root /var/www/html;
          default_type text/plain;
        }

    Applied with nginx -t && systemctl reload nginx.

-   Confirmed the Internet Archive UAs are not in scraper-block.conf, so
    the $bad_ua hard-block (403) won't override the robots.txt allow.
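
A quick way to sanity-check the policy is to feed the same file body to
Python's standard urllib.robotparser. This is a sketch of how one
parser reads the file; real crawlers may interpret it differently. The
allowed() helper and the /about test path are arbitrary choices for
illustration.

```python
from urllib.robotparser import RobotFileParser

# Same body as the static robots.txt above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
User-agent: ia_archiver
Disallow:
User-agent: archive.org_bot
Disallow:
"""

def allowed(user_agent, url="https://yttrx.com/about"):
    """True if this parser thinks `user_agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)

for agent in ("Googlebot", "bingbot", "ia_archiver", "archive.org_bot"):
    print(agent, allowed(agent))
```

An empty Disallow: line means "nothing is disallowed", which is why the
two Internet Archive groups override the catch-all block.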

Backup / rollback

-   Config backup:
    /etc/nginx/sites-available/mastodon.bak-robots.20260618.
-   Rollback: rm /var/www/html/robots.txt (reverts both hosts to
    Mastodon's stock app-served robots.txt — the masto location = block
    then 404s and falls back to the proxy; or restore the
    .bak-robots.20260618 config and reload to remove it entirely).

Verified

    $ curl -sI https://yttrx.com/robots.txt        # 200, content-type text/plain, NO x-runtime/x-request-id
    $ curl -sI https://masto.yttrx.com/robots.txt   # 200, content-type text/plain, NO x-runtime/x-request-id
    $ curl -s  https://yttrx.com/robots.txt          # block-all + ia_archiver + archive.org_bot
    $ curl -s  https://masto.yttrx.com/robots.txt    # identical body

Absence of x-runtime/x-request-id confirms nginx (not Puma) now serves
it.

------------------------------------------------------------------------

2026-06-18 — CDN: migrate media back to DigitalOcean Spaces (R2 → DO)

Reversed the May-2026 DO→R2 CDN migration. A cost analysis showed R2's
per-operation (Class A) pricing is the wrong fit for a
Cloudflare-fronted Mastodon media cache (~$15/mo in write-ops alone,
$20/mo total vs ~$5–7/mo on DO Spaces where operations are free). Full
rationale + runbook in cdn-r2-to-do-migration.md. files.yttrx.com stayed
up throughout; only the Mastodon writer restarted (a couple of minutes).

What changed

-   New DO Space yttrx (sfo3). rclone [spaces] remote added on mammut
    (provider = DigitalOcean, endpoint = sfo3.digitaloceanspaces.com).
    Credentials recorded in the private secondbrain note (not this
    repo).
-   Copied LOCAL media only, R2 → DO via a new DB-driven script
    /root/cdn-db-media-sync.sh (repo: scripts/cdn-db-media-sync.sh). It
    asks Mastodon for the exact S3 key of every local-origin attachment
    (attachment.path(style), so no id_partition guesswork) and feeds the
    key list to rclone copy --files-from-raw --s3-acl public-read. Scope: local
    uploads, local avatars/headers, local custom emoji, and SiteUpload
    branding — 4,187 objects / 2.5 GiB. Cached remote federation media
    (~210k live objects) is deliberately left on R2: it is re-fetchable,
    ages out within the 14-day retention window, and serves via the
    nginx fallback meanwhile.
-   Dual-backend nginx (scripts/cdn-migration.nginx →
    /etc/nginx/sites-available/cdn-migration, enabled): DO primary, R2
    fallback on 403/404, X-Upstream: do|r2, no nginx-layer cache (the
    phantom-404-HIT footgun). Pivot was rm sites-enabled/cdn-r2 +
    ln -s cdn-migration + reload — both files define
    log_format files_cache, so only one may be enabled at a time.
-   Writer cutover: swapped six S3 vars in
    ~mastodon/live/.env.production (S3_BUCKET yttrx-media→yttrx,
    S3_REGION auto→sfo3, S3_HOSTNAME, S3_ENDPOINT, AWS_ACCESS_KEY_ID,
    AWS_SECRET_ACCESS_KEY); S3_ALIAS_HOST stays files.yttrx.com,
    S3_PERMISSION stays unset (= public-read). Recreated
    web/sidekiq/streaming.
-   New ops tool /root/cdn-backend-histogram.pl
    (scripts/cdn-backend-histogram.pl) — hourly side-by-side DO-vs-R2
    histogram of cdn.access.log, for watching R2 traffic decay before
    decommission.
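
The dual-backend behaviour described above (DO primary, R2 fallback on
403/404, X-Upstream naming the backend that answered) can be sketched
as a small Python function. This models the decision only and is not
the nginx config; serve(), backend() and the object keys are
hypothetical, and what nginx sends as X-Upstream on a double miss is
not stated here, so the sketch simply reports the last backend tried.

```python
def serve(key, fetch_do, fetch_r2):
    """Try DO first; fall back to R2 only when DO answers 403 or 404."""
    status, body = fetch_do(key)
    if status not in (403, 404):
        return status, body, {"X-Upstream": "do"}
    status, body = fetch_r2(key)
    return status, body, {"X-Upstream": "r2"}

def backend(objects):
    """Build a fake object store: 200 for known keys, 404 otherwise."""
    def fetch(key):
        if key in objects:
            return 200, objects[key]
        return 404, b""
    return fetch

# Hypothetical keys: local uploads were copied to DO, cached remote media stayed on R2.
do = backend({"media_attachments/local.png": b"local"})
r2 = backend({"cache/remote.png": b"remote"})

print(serve("media_attachments/local.png", do, r2))
print(serve("cache/remote.png", do, r2))
print(serve("missing.png", do, r2))
```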

Backup / rollback

-   ~mastodon/live/.env.production.bak-r2-202606182312 and
    docker-compose.yml.bak-precutover-202606182312.
-   Rollback: restore the .bak-r2-* env →
    docker compose up -d web sidekiq streaming; pivot nginx back
    (rm sites-enabled/cdn-migration, ln -s ../sites-available/cdn-r2,
    reload). The R2 bucket and [r2] remote are untouched.

Gotcha hit during cutover

cdn-db-media-sync.sh enumerates keys through the live-web-1 container,
so the final delta must run before stopping web (and again after web is
healthy) — running it while web is stopped aborts with "container not
found". Runbook ordering corrected accordingly.

Verified

-   Write-test through the app's S3 env: put_object + readback
    (do-cutover-ok) succeeded → DO creds / region=sfo3 / SigV4 /
    public-read ACL all good; test object reachable over public HTTPS,
    then deleted.
-   nginx serving (direct to mammut, bypassing CF): local upload →
    200 x-upstream=do; cached remote → 200 x-upstream=r2; missing key →
    404.
-   DO objects anonymously readable over
    https://yttrx.sfo3.digitaloceanspaces.com.
-   Re-run delta after cutover: 4,187/4,187 checks, 0 new transfers; no
    S3/AWS errors in the web log; all containers healthy.

Still open

-   R2 stays as fallback while its traffic decays. Decommission once
    upstream=r2 hits trend to zero (~retention window): swap to a
    DO-only cdn-do config, strip the [r2] rclone remote, delete the R2
    bucket, and update cdn-site.md.
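
Watching upstream=r2 decay amounts to counting hits per hour per
backend. A minimal Python sketch of that counting follows. It is not
the Perl tool mentioned above, and the log line layout is an
assumption: the real files_cache log_format is not shown in this
changelog, so the sample lines (a bracketed nginx-style timestamp plus
an upstream=do|r2 token) are hypothetical.

```python
import re
from collections import Counter

# Assumed layout: "[18/Jun/2026:23:12:01 +0000] ... upstream=do"
LINE = re.compile(
    r"\[(\d{2}/\w{3}/\d{4}):(\d{2}):\d{2}:\d{2} [^\]]+\].*\bupstream=(do|r2)\b"
)

def hourly_histogram(lines):
    """Count requests per (hour, backend); lines that do not match are skipped."""
    counts = Counter()
    for line in lines:
        match = LINE.search(line)
        if match:
            day, hour, upstream = match.groups()
            counts[(f"{day} {hour}:00", upstream)] += 1
    return counts

# Hypothetical sample lines, not copied from cdn.access.log.
SAMPLE = [
    '203.0.113.5 [18/Jun/2026:23:12:01 +0000] "GET /a.png" 200 upstream=do',
    '203.0.113.5 [18/Jun/2026:23:40:10 +0000] "GET /b.png" 200 upstream=r2',
    '203.0.113.9 [19/Jun/2026:00:05:33 +0000] "GET /c.png" 200 upstream=do',
]

for (hour, upstream), hits in sorted(hourly_histogram(SAMPLE).items()):
    print(hour, upstream, hits)
```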

2026-06-18 — docs: add anti-abuse / anti-spam reference (anti-abuse.md)

Added a single reference doc consolidating the yttrx anti-spam stack and
a forward-looking roadmap. Documentation only — no change to the running
system.

What changed

-   New anti-abuse.md documenting the layered defense: approval-required
    registration gate, the welcome+abuse hooks server (yttrx-welcomebot
    on admin, hooks.yttrx.com), finger abuse controls
    (FINGER_BAN_ALLOWLIST + finger-web limiter), and the
    appeals/transparency path (account-limited page + this changelog).
    Grounded in the welcomebot source/README/CLAUDE.md — no new
    behaviour described.
-   Roadmap section for unbuilt ideas, headlined by a
    dormant-never-posted sweep: a scheduled (not webhook) job to
    reversibly silence local accounts that never authored a post after a
    grace window, reusing the abuse-bot token, staff/allowlist guards,
    appeals DM, and a DRY_RUN first phase. Tracked as a +yttrx @roadmap
    todo.
-   Linked the new doc from README.md's documentation index.

Verified

-   anti-abuse.md + README.md committed (2a69b6c) and pushed to origin
    and github. CI republishes changelog.md only; the doc itself isn't a
    published site.

2026-06-17 — Mastodon upgrade v4.5.11 → v4.6.0 (major release)

Upgraded the production Mastodon stack from v4.5.11 to v4.6.0 (released
the same day). This is a major release with breaking changes; the
upgrade followed mastodon-upgrade.md with three deviations noted below.

What changed

-   All three Mastodon containers now run v4.6.0: live-web-1,
    live-sidekiq-1 (ghcr.io/mastodon/mastodon:v4.6.0) and
    live-streaming-1 (ghcr.io/mastodon/mastodon-streaming:v4.6.0). ES
    7.17.4 and DragonflyDB unchanged. Postgres 15.18 satisfies the new
    ≥14 requirement.
-   Dropped the bespoke local image build. web had an active build: .
    that compiled the image locally solely to carry the TangerineUI
    custom themes (app/javascript/styles/tangerineui*). Those themes
    were already inert (never registered in config/themes.yml, which
    lists only stock themes) and v4.6.0's theme-system overhaul would
    have required reworking them. Per request, the TangerineUI SCSS
    files/dirs were removed and build: . on web was commented out, so
    all three services now run the stock prebuilt official images —
    matching upstream and sidestepping the theme breaking change
    entirely.
-   Two-phase migrations (new in this release): pre-deployment
    migrations were run with SKIP_POST_DEPLOYMENT_MIGRATIONS=true before
    starting the new containers, then the post-deployment pass after.
    (No post-deploy migrations were actually pending; all v4.6.0
    migrations applied in the pre-deploy phase.)
-   v4.6.0 also removes ImageMagick (libvips now mandatory — baked into
    the official image) and adds an opt-in email-subscriptions feature
    (dormant unless an admin enables it;
    DISABLE_EMAIL_SUBSCRIPTIONS=true would hard-disable it, but is not set).

Exact procedure run (on mammut, as root unless noted)

    systemctl stop nginx
    cd ~mastodon/live && docker compose down            # Postgres (systemd) left running
    cp -a ~mastodon/live ~mastodon/live-v4.5.11         # rollback snapshot
    cp ~mastodon/live/docker-compose.yml ~mastodon/docker-compose.yml.bak

    sudo -u mastodon bash -c 'cd ~mastodon/live && git stash && git fetch origin --tags && git checkout v4.6.0'

    cp ~mastodon/docker-compose.yml.bak ~mastodon/live/docker-compose.yml   # restore bespoke compose
    sed -i 's/v4\.5\.11/v4.6.0/g' ~mastodon/live/docker-compose.yml         # bump 3 image tags
    # comment out the active `build: .` on the web service
    # rm -rf app/javascript/styles/tangerineui*   (as mastodon user)

    cd ~mastodon/live
    docker compose pull
    docker compose up -d es dragonfly
    docker compose run --rm -e SKIP_POST_DEPLOYMENT_MIGRATIONS=true web bundle exec rails db:migrate
    docker compose up -d
    docker compose run --rm web bundle exec rails db:migrate   # post-deploy phase
    systemctl start nginx

Rollback

Snapshot ~mastodon/live-v4.5.11 is the rollback path:
docker compose down,
mv live live-broken-v4.6.0 && cp -a live-v4.5.11 live,
docker compose up -d, systemctl start nginx. (Schema migrations are not
auto-reverted, but v4.6.0's migrations are additive.)

Verified

    # all three containers healthy on v4.6.0:
    live-web-1   Up (healthy)  ghcr.io/mastodon/mastodon:v4.6.0
    live-sidekiq-1 / live-streaming-1  healthy
    # Puma 8.0.2 / Ruby 4.0.5 in the image; streaming health 200; sidekiq jobs draining
    curl localhost:3000/health                  -> OK
    curl https://yttrx.com/                      -> 200
    curl https://yttrx.com/api/v1/instance       -> "version":"4.6.0"
    curl https://yttrx.com/api/v2/instance       -> 200
    curl https://yttrx.com/api/v1/timelines/public?limit=1  -> 200
    # media via files.yttrx.com (R2): a real avatar asset -> 200
    # no errors/exceptions in live-web-1 or live-sidekiq-1 logs under live traffic

------------------------------------------------------------------------

2026-06-17 — welcome-bot: welcomebot-logs CLI on admin.yttrx.com

Installed a small log helper at /usr/local/bin/welcomebot-logs on
admin.yttrx.com (source: bin/welcomebot-logs in the yttrx-welcomebot
repo). Read-only wrapper around docker logs yttrx-welcomebot with
-f/--follow, -n/--tail, --since, --abuse/--welcome filters, and
-g/--grep.

    welcomebot-logs                 # last 200 lines
    welcomebot-logs -f --abuse      # follow abuse activity only
    welcomebot-logs -n 1000 --since 24h

Run as root (the user that ssh admin.yttrx.com logs in as); needs Docker
access. It doesn't alter the running stack; it is purely an ops
convenience.

------------------------------------------------------------------------

2026-06-17 — welcome-bot: never auto-silence staff (role guard + allowlist)

Added two safeguards so the abuse handler can never auto-silence staff:

1.  Role guard (ABUSE_SKIP_PRIVILEGED=true, default on). The
    report.created payload's target_account.role is inspected; any
    account holding an assigned staff role (Admin/Owner/Moderator — role
    id > 0, vs the default "everyone" role id -99) is skipped. New staff
    are protected automatically with no list to maintain.
2.  Explicit allowlist backstop (ABUSE_ALLOWLIST). Set on admin to
    waffles,tommertron,davis,bot (current Owners/Admins + the modbot
    itself), covering any case where a role isn't in the payload and
    non-staff special accounts.
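
The two guards can be sketched in Python as follows. This is an
illustration of the rule as described, not the welcomebot's actual
code: is_protected(), parse_allowlist() and the payload shape (a
target_account dict with username and an optional role.id) are
assumptions.

```python
EVERYONE_ROLE_ID = -99  # the default "everyone" role id named above

def parse_allowlist(raw):
    """Turn a comma-separated ABUSE_ALLOWLIST value into a set of usernames."""
    return {name.strip().lower() for name in raw.split(",") if name.strip()}

def is_protected(target_account, allowlist, skip_privileged=True):
    """Return (protected, reason) for a reported account."""
    role = target_account.get("role") or {}
    try:
        # Accept the id as int or string; a missing role counts as "everyone".
        role_id = int(role.get("id", EVERYONE_ROLE_ID))
    except (TypeError, ValueError):
        role_id = EVERYONE_ROLE_ID
    if skip_privileged and role_id > 0:
        return True, "holds a staff role"
    # Backstop: covers payloads with no role and non-staff special accounts.
    if target_account.get("username", "").lower() in allowlist:
        return True, "allowlisted"
    return False, ""

allow = parse_allowlist("waffles,tommertron,davis,bot")
print(is_protected({"username": "mod1", "role": {"id": "3"}}, allow))
print(is_protected({"username": "bot"}, allow))
print(is_protected({"username": "newuser", "role": {"id": -99}}, allow))
```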

Deployed to the yttrx-welcomebot container on admin.yttrx.com (rsync
code + .env edit + docker-compose up -d --build); offline tests cover
the staff skip. Still live (ABUSE_DRY_RUN=false).

Verified

    # running container: ABUSE_SKIP_PRIVILEGED=True DRY_RUN=False
    #                     ABUSE_ALLOWLIST={bot,davis,tommertron,waffles}
    # handle_report() now takes a `privileged` arg; a report against an Owner/Admin
    # logs "holds a staff role, skipping" and takes no action.

------------------------------------------------------------------------

2026-06-17 — welcome-bot: abuse auto-silence handler deployed (live) + welcome on approval

Deployed the extended yttrx-welcomebot (container yttrx-welcomebot on
admin.yttrx.com, 127.0.0.1:8087, nginx hooks.yttrx.com). Live and acting
as of 2026-06-17 (ABUSE_DRY_RUN=false).

What changed

-   report.created handler (new). On a report against a local,
    not-already-limited account, it classifies the target by its posts:
    young (oldest post less than 30 days old), dormant (newest post more
    than 30 days old), no-posts, or active. It then counts distinct
    reporters with open reports (self-reports
    excluded), and silences the account once distinct reporters reach
    the tier threshold (2 for young/dormant/no-posts, 3 for active). The
    silence is applied without a report_id, so the report stays open in
    the mod queue for human review. It then DMs the moderator
    (MOD_ALERT_ACCT=waffles) a summary + report link, and DMs the
    silenced user a link to the appeals page
    (https://welcome.yttrx.com/posts/account-limited/).
-   Welcome now fires on account.created AND account.approved (was
    created-only), so it works whether or not registration approval is
    on; the sqlite dedup store still welcomes each account exactly once.
-   Uses a separate moderator bot token (ABUSE_BOT_TOKEN) with scopes
    admin:write:accounts, admin:read:reports, read:statuses,
    write:statuses; the welcome bot token is unchanged and gains no
    powers.
-   Mastodon endpoints used: POST /api/v1/admin/accounts/:id/action
    (type=silence),
    GET /api/v1/admin/reports?target_account_id&resolved=false,
    GET /api/v1/accounts/:id/statuses (backward-paged with early-exit).
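
The tiering and threshold rule can be sketched in Python. This is an
illustration under stated assumptions, not the bot's implementation:
classify(), should_silence() and the reporter_id field on each report
are hypothetical names, and post times are assumed to arrive as
timezone-aware datetimes.

```python
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(days=30)
THRESHOLDS = {"young": 2, "dormant": 2, "no-posts": 2, "active": 3}

def classify(post_times, now):
    """Tier an account by the age of its oldest and newest posts."""
    if not post_times:
        return "no-posts"
    if now - min(post_times) < WINDOW:
        return "young"      # oldest post is under 30 days old
    if now - max(post_times) > WINDOW:
        return "dormant"    # newest post is over 30 days old
    return "active"

def should_silence(tier, open_reports, target_id):
    """Silence once distinct reporters (self-reports excluded) reach the tier threshold."""
    reporters = {
        r["reporter_id"] for r in open_reports if r["reporter_id"] != target_id
    }
    return len(reporters) >= THRESHOLDS[tier]

now = datetime(2026, 6, 17, tzinfo=timezone.utc)
tier = classify([now - timedelta(days=5)], now)
reports = [{"reporter_id": "a"}, {"reporter_id": "a"}, {"reporter_id": "b"}]
print(tier, should_silence(tier, reports, target_id="t"))
```

Counting distinct reporters rather than reports means one member
filing several reports cannot trip the threshold alone.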

How it was deployed

rsync of code only (.env excluded) from the workstation to
/root/yttrx-welcomebot/, appended the ABUSE_* block to the existing .env
(welcome secrets untouched), then docker-compose up -d --build (compose
v1.29.2). The dedup volume welcomebot-data persists across the rebuild.
Backups on the box: app/main.py.bak-20260617, .env.bak-20260617.

Token scope fix (the one snag)

The bot account is already an Admin (so it has Manage Users + Manage
Reports), but the original "Abuse handler" app token was created without
the admin:write:accounts scope — it had admin:read:reports +
admin:write:reports only. So reports read fine but the silence endpoint
(POST /api/v1/admin/accounts/:id/action) returned 403. Fixed by minting
a new token (app "Abuse handler v2") for bot with the minimal scopes the
bot uses —
read:statuses write:statuses admin:read:reports admin:write:accounts —
swapping it into ABUSE_BOT_TOKEN, and revoking the old token (Doorkeeper
token id 11112).

Verified

    $ curl -s localhost:8087/healthz                       # {"ok":true}
    # write probe (no-op type=none on a non-existent id):
    $ curl ... POST /api/v1/admin/accounts/0/action -d type=none   # 403 (old) -> 404 (now authorized)
    $ curl ... GET  /api/v1/admin/reports?limit=1                  # 200
    # container env: ABUSE_ENABLED=true ABUSE_DRY_RUN=false ABUSE_ACTION=silence
    #                MOD_ALERT_ACCT=waffles ABUSE_BOT_TOKEN=SET (sources 2/3)
    # running image exposes handle_report / dm_silenced_user / classify_account

Startup clean; existing welcome deliveries continue.

Rollback

    cd /root/yttrx-welcomebot
    cp app/main.py.bak-20260617 app/main.py
    cp .env.bak-20260617 .env
    docker-compose up -d --build

------------------------------------------------------------------------

2026-06-17 — welcome.yttrx.com: "Why was my account limited?" appeals page

Added a public help/appeals page to the Hugo (Compost) site at
welcome.yttrx.com, explaining the in-progress automated abuse handling
and how affected users can appeal. Live as of 2026-06-17.

Why

The welcome-bot is being extended with an abuse handler (report.created
webhook) that auto-silences (limits) young/dormant accounts once
multiple distinct members report them, leaving the report open for human
review. A silenced user needs a plain-language explanation and an
appeals path; the bot will DM them a link to this page when it limits an
account.

What changed

-   New content file content/posts/account-limited.md →
    https://welcome.yttrx.com/posts/account-limited/.
-   Explains what "limited/silenced" means (reversible, not a ban), that
    it takes multiple independent reporters + a human review, and how to
    appeal by emailing admin@yttrx.com. Exact thresholds deliberately
    omitted so spammers can't tune around them.
-   Built with ./build.sh (pinned hugomods/hugo:0.157.0 container) and
    committed in the site's local git repo on admin.yttrx.com. nginx
    serves public/ off disk — no reload needed.

Not yet live

The abuse-bot code itself (in ~/yttrx-welcomebot/) is built +
unit-tested but not yet deployed to the yttrx-welcomebot container on
admin.yttrx.com; ships with ABUSE_DRY_RUN=true. A separate changelog
entry will cover that deploy.

Verified

    $ curl -s -o /dev/null -w "%{http_code}" https://welcome.yttrx.com/posts/account-limited/
    200

Page title and admin@yttrx.com mailto present in the served HTML.

------------------------------------------------------------------------

2026-06-17 — finger-web: per-IP rate limiting + cacheable search; finger daemon allowlist

Added abuse protection to the finger-web Flask app (finger.yttrx.com,
container finger-web-app on admin, proxied to 127.0.0.1:5000), plus a
companion FINGER_BAN_ALLOWLIST knob on the mammut finger daemon. Live
and verified on admin as of 2026-06-17.

Why

nginx already caches finger.yttrx.com (proxy_cache_valid 200 30s), so
repeated federated previews of the same user are cheap — confirmed from
the mammut daemon log, which sees only low-rate cache refills for valid
users (waffles) via admin's IP. The failed daemon lookups
(infragrab-finger-probe, MGLNDD_…_79, SSH-2.0-Go, raw garbage) all
originate from direct port-79 scanners, not via finger-web, and are
already handled by the daemon's own per-IP banning (2026-06-15 entry).

The gap nginx caching does not cover is username enumeration: each
distinct /finger/<user> is a unique cache key → guaranteed miss → a
fresh finger call to the mammut daemon, all attributed to admin's single
IP (so the daemon can't ban the real source). The app only ever receives
cache misses, so a per-IP limit at the app layer throttles exactly that
uncached path without touching the cached hot path.

What changed (waffle2k/finger-web, two commits)

1.  Per-IP rate limiting (3115ac6) — added Flask-Limiter, keyed on the
    real client IP:
    -   30/min;600/hr on the finger lookup endpoints (/finger,
        /finger/<user>, /api/finger)
    -   10/min;60/hr on /api/upload (auth brute-force / plan-spam)
    -   120/min;2000/hr global default; index (and the container
        healthcheck) exempt
    -   All limits env-tunable via RATELIMIT_*; storage is in-memory
        (memory://) — counters reset on container restart.
    -   ProxyFix(x_for=1) so the app trusts nginx's X-Forwarded-For.
        Without it the app only saw the Docker bridge gateway
        (172.20.0.1) and every client on the internet shared a single
        rate-limit bucket. nginx was already forwarding
        X-Real-IP/X-Forwarded-For — the fix was app-side.
    -   429 handler (JSON for /api/*, styled 429.html otherwise) +
        WARNING logging of failed/invalid lookups and limit hits, so
        enumeration is observable.
2.  Cacheable search (c8786a2) — the search form used to POST /finger
    (an uncacheable URL with no search term in it), so every search
    re-ran the finger command. It now redirects any query/POST search to
    the canonical /finger/<username> path (GET form + JS fast-path
    straight to that URL), so repeated searches ride the nginx response
    cache. A bare /finger still lists local system users.
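
Why the forwarded-IP fix matters for the limiter can be shown with a
stdlib-only Python sketch. It is neither Flask-Limiter nor Werkzeug's
ProxyFix: client_ip() and FixedWindowLimiter are hypothetical stand-ins
that use a simple fixed-window counter, and the client address is a
documentation IP.

```python
from collections import defaultdict

def client_ip(remote_addr, x_forwarded_for, trusted_hops=1):
    """Trust only the last `trusted_hops` X-Forwarded-For entries (the ones
    our own proxy appended); otherwise fall back to the socket peer."""
    hops = [h.strip() for h in (x_forwarded_for or "").split(",") if h.strip()]
    if trusted_hops > 0 and len(hops) >= trusted_hops:
        return hops[-trusted_hops]
    return remote_addr

class FixedWindowLimiter:
    """Allow at most `limit` hits per key in each `window_seconds` window."""
    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.counts = defaultdict(int)  # (key, window index) -> hits; never purged in this sketch

    def allow(self, key, now):
        bucket = (key, int(now // self.window))
        if self.counts[bucket] >= self.limit:
            return False
        self.counts[bucket] += 1
        return True

limiter = FixedWindowLimiter(limit=30, window_seconds=60)
ip = client_ip("172.20.0.1", "198.51.100.7")
results = [limiter.allow(ip, now=1000.0 + i * 0.1) for i in range(35)]
print(ip, results.count(True), results.count(False))
```

Without the forwarded header every request would be keyed on
172.20.0.1, so all clients would share the one bucket.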

Deploy

CI (.github/workflows/docker-build-push.yml) rebuilt
ghcr.io/waffle2k/finger-web:latest on push to main. Redeployed on admin
(docker-compose v1.29.2):

    cd ~/finger-web && docker-compose pull && docker-compose up -d

No env changes needed on admin — the new RATELIMIT_* settings default to
enabled with the limits above.

Verified (on admin, via finger.yttrx.com)

-   Container healthy after recreate.
-   /finger/waffles@yttrx.com → 200, x-cache-status: HIT (cached path
    intact).
-   GET /finger?username=waffles@yttrx.com → 302 →
    /finger/waffles@yttrx.com.
-   Cache MISS response carries x-ratelimit-limit: 30,
    x-ratelimit-remaining: 29.
-   35 distinct cache-miss lookups from one IP → 30×200 then 429s; the
    429 renders the styled "Too Many Requests" page.
-   App logs now show the real client IP (e.g. 147.182.255.203), no
    longer 172.20.0.1 — confirms ProxyFix is working.

Rollback

    # revert to the prior image / commits, then redeploy
    cd ~/finger-web && docker-compose pull && docker-compose up -d
    # or set RATELIMIT_ENABLED=false in the compose env to disable limiting only

Incident during rollout — daemon banned admin's IP

During verification of the new limit, a 35-request burst of distinct
bogus usernames was fired at finger.yttrx.com from admin's egress IP
(147.182.255.203).
The mammut finger daemon's per-IP abuse banning (threshold >3 failed
lookups in a 24h window, see the 2026-06-15 entry) counted all 35 as
failures and blocked admin's IP — and because finger-web funnels every
federated lookup through that single IP, the block took out
finger.yttrx.com site-wide (every lookup returned "No information
available."). The daemon log showed a steady
finger drop from 147.182.255.203: blocked.

Fix: restarted the daemon to clear its in-memory ban state:

    ssh mammut 'docker restart finger-finger-1'

(docker-compose is not installed on mammut; use the docker compose
plugin there, or restart the container directly as above.) Confirmed
waffles' plan served again immediately after.

This exposed a design conflict: per-IP banning on the daemon and an
aggregating proxy are incompatible — the daemon can't see past the proxy
IP, so any burst (attacker or load test) bans the proxy for everyone.
Per-client abuse protection for the proxied path now lives in
finger-web's rate limiter, so the daemon should trust the proxy IP.

Daemon change — FINGER_BAN_ALLOWLIST (waffle2k/finger)

Added an env-configurable allowlist to the finger daemon:
comma-separated client IPs that are never tracked or banned
(parse_ip_allowlist() in ban.cpp + unit tests; the listener marks
allowlisted IPs non-trackable). Set admin's IP on mammut so finger-web
is never banned again:

    # /root/finger/docker-compose.yml (mammut)
    services:
      finger:
        environment:
          - FINGER_BAN_ALLOWLIST=147.182.255.203

Redeploy on mammut (note: docker compose, not docker-compose):

    ssh mammut 'cd /root/finger && docker compose pull && docker compose up -d'

Direct port-79 scanners (the infragrab-finger-probe / MGLNDD_…_79 junk
in the daemon log) are unaffected — they hit the daemon directly, are
not allowlisted, and still get banned after >3 failures.
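
The interaction between per-IP banning and the allowlist can be
sketched in Python. The real daemon is C++ (ban.cpp); this is not a
port of it. BanTracker is hypothetical, parse_ip_allowlist only borrows
the daemon function's name, and letting failures age out of the 24h
window is an assumption about behaviour this changelog does not spell
out.

```python
from collections import defaultdict, deque

BAN_THRESHOLD = 3             # banned once failures exceed this
WINDOW_SECONDS = 24 * 3600    # failures older than this are forgotten (assumed)

def parse_ip_allowlist(raw):
    """Split a comma-separated FINGER_BAN_ALLOWLIST value into a set of IPs."""
    return {ip.strip() for ip in raw.split(",") if ip.strip()}

class BanTracker:
    def __init__(self, allowlist=()):
        self.allowlist = set(allowlist)
        self.failures = defaultdict(deque)  # ip -> timestamps of failed lookups

    def record_failure(self, ip, now):
        if ip in self.allowlist:
            return  # allowlisted IPs are never tracked
        self.failures[ip].append(now)

    def is_banned(self, ip, now):
        if ip in self.allowlist:
            return False
        times = self.failures.get(ip)
        if not times:
            return False
        while times and now - times[0] > WINDOW_SECONDS:
            times.popleft()
        return len(times) > BAN_THRESHOLD

proxy = "147.182.255.203"
for tracker in (BanTracker(), BanTracker(parse_ip_allowlist(proxy))):
    for i in range(35):
        tracker.record_failure(proxy, now=float(i))
    print(tracker.is_banned(proxy, now=35.0))
```

The first tracker reproduces the incident (the proxy IP ends up
banned); the second shows the allowlisted proxy surviving the same
burst.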

Verified

-   finger daemon CI (waffle2k/finger): build + gtest suite green
    (includes the new IpAllowlist tests).
-   After setting the env + redeploy on mammut: a burst from admin's IP
    no longer appears as blocked;
    finger.yttrx.com/finger/waffles@yttrx.com serves the plan.

------------------------------------------------------------------------

2026-06-17 — welcome-bot: webhook server deployed on admin (hooks.yttrx.com)

New service to DM a welcome message to every new local signup, driven by
a Mastodon admin webhook. Live and verified end-to-end as of 2026-06-17
with a real Mastodon/4.5.11 delivery.

What changed (admin.yttrx.com)

-   New FastAPI app yttrx-welcomebot (source synced to
    /root/yttrx-welcomebot/, canonical copy at ~/yttrx-welcomebot/ on
    the workstation). On the account.created event it verifies the
    X-Hub-Signature HMAC, deduplicates against a sqlite store, and posts
    a visibility: direct status mentioning the new user via a bot
    account token.
-   Runs as a Docker container (docker-compose v1.29.2, image
    yttrx-welcomebot_welcomebot), restart: unless-stopped, bound to
    127.0.0.1:8087 only. Dedup DB persisted in the welcomebot-data
    volume at /data/welcomed.db.
-   nginx site /etc/nginx/sites-available/hooks (symlinked into
    sites-enabled/) proxies https://hooks.yttrx.com/webhook and /healthz
    to 127.0.0.1:8087; everything else returns 404.
-   TLS cert for hooks.yttrx.com issued via
    certbot certonly --standalone (nginx stopped ~5s during issuance),
    expires 2026-09-15. hooks.yttrx.com already resolved (wildcard
    *.yttrx.com → admin's 147.182.255.203), direct to origin (not
    Cloudflare-proxied), so the HTTP-01 challenge worked.
-   .env written with chmod 600 and blank WEBHOOK_SECRET /
    BOT_ACCESS_TOKEN placeholders — deliveries safely 401 until they're
    set.
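
The dedup step can be sketched with Python's sqlite3 module. The table
name, column and helper names here are hypothetical (the real schema of
welcomed.db is not shown in this entry); the point is only that a
primary key plus INSERT OR IGNORE makes "first time seen" an atomic
check.

```python
import sqlite3

def open_store(path=":memory:"):
    """Open (or create) a dedup store; the real one lives at /data/welcomed.db."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS welcomed (account_id TEXT PRIMARY KEY)")
    return db

def claim_welcome(db, account_id):
    """Return True only the first time an account id is seen."""
    with db:  # commit on success, roll back on error
        cur = db.execute(
            "INSERT OR IGNORE INTO welcomed (account_id) VALUES (?)",
            (str(account_id),),
        )
    return cur.rowcount == 1  # 0 means the row already existed

db = open_store()
print(claim_welcome(db, "109295727713545515"))
print(claim_welcome(db, "109295727713545515"))
```

One design consequence of claiming before sending: if the DM then
fails, the account is not retried unless the row is removed.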

Verified

-   https://hooks.yttrx.com/healthz → {"ok":true}
-   unsigned POST /webhook → 401; GET / → 404
-   container Up, local curl localhost:8087/healthz → 200
-   nginx -t passed before reload; offline test suite (test_local.py)
    passes

Mastodon side (done + verified)

-   Bot account token (write:statuses) and webhook secret filled into
    /root/yttrx-welcomebot/.env; container picked them up.
-   Mastodon webhook id=2 (mammut DB): enabled=true, secret fingerprint
    matches our .env, events
    ["account.approved","account.created","report.created","report.updated"].
-   Bug found + fixed: the webhook URL was set to
    https://hooks.yttrx.com (no path), so real deliveries POSTed to /
    and got 404 (nginx only serves =/webhook and =/healthz). Corrected
    to https://hooks.yttrx.com/webhook via
    Webhook.find(2).update!(url: …) on live-web-1.
-   Signing confirmed against actual v4.5.11 source
    (app/workers/webhooks/delivery_worker.rb): header is
    X-Hub-Signature: sha256=<OpenSSL::HMAC.hexdigest(sha256, webhook.secret, body)>
    — exactly what the server verifies.
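
The receiving side of that check can be sketched with Python's hmac
module. This is a minimal illustration of verifying a header of the
form sha256=<hex HMAC-SHA256 of the raw body>, not the welcomebot's
actual code; sign(), verify() and the sample secret and body are
hypothetical.

```python
import hashlib
import hmac

def sign(secret: str, body: bytes) -> str:
    """Build the header value: sha256= followed by the hex HMAC-SHA256 of the body."""
    digest = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    return f"sha256={digest}"

def verify(secret: str, body: bytes, header_value) -> bool:
    """Constant-time comparison of the expected and received signatures."""
    expected = sign(secret, body).encode()
    received = (header_value or "").encode()
    return hmac.compare_digest(expected, received)

secret = "example-secret"                      # placeholder, not a real secret
body = b'{"event":"account.created"}'          # must be the raw request bytes
header = sign(secret, body)
print(verify(secret, body, header))
print(verify(secret, body + b" ", header))
print(verify(secret, body, None))
```

The signature has to be computed over the raw request bytes;
re-serialising parsed JSON before hashing can change the digest.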

Verified live (real delivery)

-   Triggered
    TriggerWebhookWorker.new.perform('account.created','Account',<waffles id>)
    on live-web-1; nginx logged
    "POST /webhook HTTP/1.1" 200 … "Mastodon/4.5.11 (http.rb/5.3.1; …)".
-   welcomebot verified the real signature, parsed the real payload
    (account id 109295727713545515), and posted the welcome DM (Mastodon
    API 200). Manual-test dedup rows then deleted so the store reflects
    only real signups.

Backfill (2026-06-17)

One-time catch-up for signups that joined while the bot was down. Of 20
local accounts created in the prior 7 days, welcomed the 13 that were
confirmed && approved && !suspended && !bot by replaying each through
the real path
(TriggerWebhookWorker.new.perform('account.created','Account',id) on
live-web-1). Skipped 5 unconfirmed + 2 confirmed-but-unapproved.
Verified: 13 dedup rows, 0 errors, 13 Mastodon POST /webhook 200s.

Follow-ups (not yet done)

-   Stale webhook id=1 still enabled, pointing at dead
    https://yttrx-welcome.herokuapp.com/ (old welcome bot, event
    account.approved) — candidate to disable/delete.
-   Webhook id=2 carries extra events (account.approved, report.created,
    report.updated) the bot just 200-ignores; consider trimming to
    account.created only so moderation-report payloads aren't sent
    off-box.

Renewal note

certbot.timer is enabled on admin but the hooks/changelog/welcome certs
use the standalone authenticator, so the timer's certbot renew can't
bind :80 while nginx holds it (harmless failure). Real renewal is the
monthly root cron
0 2 1 * * systemctl stop nginx && certbot renew && systemctl start nginx,
which renews all standalone certs including this one. (The misc-sites.md
note claiming a 0 2 * * * certbot renew --nginx cron is stale — actual
cron is the monthly stop/renew/start above.)

Rollback

    cd /root/yttrx-welcomebot && docker-compose down
    sudo rm /etc/nginx/sites-enabled/hooks && sudo nginx -t && sudo systemctl reload nginx
    # delete/disable the account.created webhook in the Mastodon admin UI

2026-06-16 (later) — welcome.yttrx.com rebuilt on Compost theme (Tokyo Night)

Reskinned the welcome site (same day as the redirect→Hugo migration
below) onto the Compost theme with a custom Tokyo Night palette, dark by
default, and rewrote the welcome post with current (2026) new-user
guidance (app picks, finding people/hashtags, the federation/email
analogy, and the Mastodon 4.5 quote-post / remote-reply features). The
old hand-written layouts/_default/ and static/css/style.css were
removed.

What changed (admin.yttrx.com, /var/www/html/welcome/)

-   Compost needs Hugo-extended + Go + Node, none installed on the host.
    The build runs entirely in the hugomods/hugo container via a new
    build.sh wrapper, so the shared /usr/local/bin/hugo 0.142.0 (used by
    the other admin sites) is untouched. ./build.sh setup fetches the
    theme module + npm deps; ./build.sh is the normal rebuild into
    public/.
-   Pinned to hugomods/hugo:0.157.0. Compost still uses .Site.Author,
    removed in Hugo 0.158; 0.157.0 is the newest Hugo that supports it
    and bundles Go + Node.
-   Vendored + patched partials under
    layouts/partials/{head,schema,footer,home/profile}.html:
    .Site.Author → .Site.Params.author, plus JSON-LD quoting fixes in
    schema.html (description/inLanguage/keywords were invalid JSON).
-   Tokyo Night palette in project-root tailwind.config.js;
    params.colorScheme = "dark".
-   Committed go.mod/go.sum/package*.json; gitignored node_modules/,
    .cache/, resources/, hugo_stats.json, packages/.

Verified

-   ./build.sh builds clean (11 pages, 3 aliases); single
    public/css/main.min.*.css, no stale style.css.
-   Tokyo Night values present in compiled CSS (rgb(26 27 38) bg,
    rgb(192 202 245) fg, rgb(122 162 247) blue).
-   Live over HTTPS: /, /posts/welcome-to-yttrx/, /about/, and the old
    alias /help/2023/03/12/welcome-to-yttrx.html all 200; dark-mode
    bootstrap defaults to dark.

Rollback

Git tag pre-compost-20b0ded marks the pre-Compost state in
/var/www/html/welcome.
git checkout pre-compost-20b0ded -- . && ./build.sh (or revert the
commit) restores the hand-written-layout site.

See misc-sites.md → welcome.yttrx.com for full detail.

2026-06-16 — welcome.yttrx.com: redirect replaced with standalone Hugo site

welcome.yttrx.com used to just 302-redirect to a specific Jekyll post on
help.yttrx.com (/help/2023/03/12/welcome-to-yttrx.html, content owned by
waffles). It now serves its own static site directly, built with Hugo,
so the domain has an actual landing page instead of being a pure
redirect.

What changed

-   New Hugo site built at /var/www/html/welcome/ on admin.yttrx.com,
    using the Hugo 0.142.0 already installed there (same binary as
    coefficiencies, somewhat, sta-tommertron). Hand-written minimal
    layouts under layouts/_default/ — no theme, since current PaperMod
    needs Hugo ≥0.146. Directory git-initialized locally (not yet pushed
    to a remote).
-   Content ported 1:1 from the original Jekyll welcome post + about
    page, with a Hugo aliases: entry so the old Jekyll-style URL
    (/help/2023/03/12/welcome-to-yttrx.html) still resolves on this
    domain.
-   /etc/nginx/sites-available/welcome rewritten: root changed from
    /var/www/html/help to /var/www/html/welcome/public; the
    location / { return 302 ...; } redirect replaced with the same
    try_files/static-asset-caching pattern used by help and
    coefficiencies. Old config backed up to welcome.bak-20260616. No new
    TLS cert needed (reused the existing
    letsencrypt/live/welcome.yttrx.com/).
-   help.yttrx.com was not touched — still serves waffles' original
    Jekyll build from /var/www/html/help, fully independent of this
    change.

Verified

-   curl https://welcome.yttrx.com/ → 200, serves the new Hugo homepage
-   /about/ → 200
-   /help/2023/03/12/welcome-to-yttrx.html (legacy alias) → 200
-   nginx -t passed before systemctl reload nginx

Rollback

    sudo cp /etc/nginx/sites-available/welcome.bak-20260616 /etc/nginx/sites-available/welcome
    sudo nginx -t && sudo systemctl reload nginx

2026-06-15 — finger.yttrx.com 500 fixed; finger-daemon spam stopped

The finger-finger-1 daemon on mammut was being hammered by
admin.yttrx.com (147.182.255.203) — ~30+ lookups/min, all successful
waffles reads. Source was the finger-web Flask app (finger.yttrx.com),
which was 500ing site-wide.

Root cause

-   finger-web (ghcr.io/waffle2k/finger-web, container finger-web-app on
    admin, proxied by nginx finger.yttrx.com → 127.0.0.1:5000) returned
    HTTP 500 on every page: templates/base.html linked
    url_for('api_hello'), but that endpoint had been renamed api_info
    (/api/info) → werkzeug BuildError. Every template extends base.html,
    so the whole site 500'd (incl. the healthcheck → container
    unhealthy).
-   The /finger/<user> route shells out to finger (hitting the mammut
    daemon) before rendering, so each request reached the daemon and
    only then 500'd. nginx caches 200 only
    (proxy_cache_valid 200 30s), so the 500s were never cached → every
    federated Mastodon preview-fetch of /finger/waffles@yttrx.com re-ran
    the finger call. Remote instances also kept retrying since they
    never got a usable preview card.
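
The caching effect described above can be illustrated with a toy model.
This is a minimal sketch, not nginx's implementation: it assumes one URL,
evenly spaced requests, and a cache that stores only the listed status
codes for a fixed TTL (mirroring proxy_cache_valid 200 30s). The function
name and parameters are hypothetical.

```python
def upstream_calls(status, requests=60, spacing=1.0, ttl=30.0, cacheable=(200,)):
    """Count how many of `requests` fetches reach the backend.

    status    -- HTTP status the backend returns for every request
    spacing   -- seconds between consecutive requests
    ttl       -- how long a cacheable response stays valid
    cacheable -- statuses the cache is allowed to store
    """
    expires_at = None
    calls = 0
    for i in range(requests):
        now = i * spacing
        if expires_at is not None and now < expires_at:
            continue  # cache HIT: backend (and the finger daemon) untouched
        calls += 1    # cache MISS or EXPIRED: backend runs the finger lookup
        if status in cacheable:
            expires_at = now + ttl
    return calls


print(upstream_calls(500))  # every request reaches the backend
print(upstream_calls(200))  # only the periodic cache refills do
```

With one request per second for a minute, the 500 case reaches the
backend 60 times and the 200 case twice, which matches the shape of the
drop reported under Verified.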

What changed

-   waffle2k/finger-web base.html: url_for('api_hello') →
    url_for('api_info') (commit b99636d). CI rebuilt
    ghcr.io/waffle2k/finger-web:latest.
-   Redeployed on admin:
    cd ~/finger-web && docker-compose pull && docker-compose up -d
    (admin uses docker-compose v1.29.2, not docker compose).

Verified

-   App returns 200 on / and /finger/waffles@yttrx.com; container
    healthy.
-   nginx X-Cache-Status: EXPIRED → HIT → HIT.
-   mammut finger hits dropped from ~50/90s to ~2/30s (just the 30s
    cache refill).

Rollback

-   docker-compose pull the prior image / revert b99636d,
    docker-compose up -d.

Note

-   finger-web holds basic-auth creds and an SSH key
    (/root/finger-web/finger.key) it uses to scp edited plan files to
    mammut:/root/finger/users/ — it is the web plan-editor as well as
    the viewer.

2026-06-15 — mammut finger daemon: abuse banning + host networking

Updated the finger-finger-1 container (ghcr.io/waffle2k/finger) and
switched it to host networking so its new per-IP abuse protection can
actually work.

What changed

-   Image updated to a build with in-daemon abuse banning (an "offense"
    = any request that fails to read a plan; >3 from one IP in a rolling
    24h window → that IP's connections are dropped; only
    globally-routable IPs are tracked) and quieter logging (normal
    client disconnects no longer logged as asio.misc exceptions).
-   ~root/finger/docker-compose.yml switched from bridge port-mapping to
    network_mode: host (backup docker-compose.yml.bak-20260615). Under
    bridge, Docker SNAT'd every client to the bridge gateway
    (172.20.0.1), so the daemon saw one IP for the whole internet and
    per-IP banning was impossible; host networking exposes the real
    client IP.
-   user: "0:0" added: under host networking the image's non-root user
    (UID 1000) cannot bind privileged port 79 (host netns uses the
    host's ip_unprivileged_port_start = 1024), so the daemon silently
    failed to listen. Running as root binds it. (Cleaner fix later:
    setcap cap_net_bind_service on the binary to stay non-root.)
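
The banning rule described in the first bullet can be modelled as follows.
This is a sketch in Python, not the daemon's own code (which is not shown
here); the class and method names are hypothetical, and the exact boundary
handling of the 24h window is an assumption.

```python
import ipaddress
from collections import defaultdict, deque

WINDOW_SECONDS = 24 * 60 * 60  # rolling 24h window
MAX_OFFENSES = 3               # more than 3 offenses in the window -> dropped


class OffenseTracker:
    """Per-IP rolling-window offense counter (illustrative model only)."""

    def __init__(self):
        self._offenses = defaultdict(deque)  # ip -> timestamps of failed plan reads

    def _prune(self, ip, now):
        # Discard offenses that have aged out of the rolling window.
        q = self._offenses[ip]
        while q and now - q[0] >= WINDOW_SECONDS:
            q.popleft()
        return q

    def record_offense(self, ip, now):
        # Only globally routable addresses are tracked, so a Docker bridge
        # gateway such as 172.20.0.1 never accumulates offenses.
        if ipaddress.ip_address(ip).is_global:
            self._prune(ip, now).append(now)

    def is_banned(self, ip, now):
        if ip not in self._offenses:
            return False
        return len(self._prune(ip, now)) > MAX_OFFENSES


tracker = OffenseTracker()
for t in range(4):
    tracker.record_offense("8.8.8.8", t)     # stand-in public address
    tracker.record_offense("172.20.0.1", t)  # bridge gateway, never tracked
print(tracker.is_banned("8.8.8.8", 4), tracker.is_banned("172.20.0.1", 4))
```

The second address also shows why bridge networking defeated the feature:
with every client appearing as the private gateway IP, there was nothing
per-client to count.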

Verified

-   Listening on 0.0.0.0:79; logs now show real public source IPs (not
    172.20.0.1); waffles lookups serve; container running, restarts=0.

Rollback

-   Restore docker-compose.yml.bak-20260615 (bridge, ports: 79:79) and
    docker-compose up -d.

2026-06-15 — mammut notification mail moved off DO → bsd submitgw

Mastodon (mammut) no longer relays notification mail through the
DigitalOcean box; it now authenticates to the bsd submitgw like any
other client.

What changed

-   submitgw sender_login += admin@yttrx.com  waffles@yttrx.com so
    waffles' login may send as admin@yttrx.com (Mastodon's
    SMTP_FROM_ADDRESS).
-   mammut ~mastodon/live/.env.production (backed up first):
    SMTP_SERVER=bsd.peteftw.com, SMTP_PORT=587, SMTP_AUTH_METHOD=plain,
    SMTP_LOGIN=waffles@yttrx.com, SMTP_PASSWORD=<waffles>,
    SMTP_OPENSSL_VERIFY_MODE=peer, SMTP_ENABLE_STARTTLS=auto;
    SMTP_FROM_ADDRESS=admin@yttrx.com left unchanged. Applied with
    docker compose up -d sidekiq web (compose file untouched).
-   mammut's IP 144.76.4.67 can be removed from the DO box mynetworks
    once DO is destroyed.

Verified

-   bin/rails runner ... deliver_now from the web container → submitgw
    logged
    client=mammut.masto.yttrx.com[144.76.4.67] sasl_username=waffles@yttrx.com
    → relayed to the smtp jail (DKIM 2022). Mail to external recipients
    delivers.

Self-domain loopback — FIXED (same day)

-   Mail to yttrx.com recipients (incl. waffles' own Mastodon
    notifications) was deferring — the smtp jail tried the yttrx MX (=
    bsd's public IP :25) and a bsd jail can't hairpin to it. Fixed: smtp
    jail now has transport_maps yttrx.com → lmtp:[192.168.1.128]:24 +
    virtual_alias_maps @yttrx.com → waffles@yttrx.com, delivering
    yttrx.com locally to yttrximap. Verified
    admin@yttrx.com → admin@yttrx.com lands in the mailbox (orig_to
    rewritten to waffles@yttrx.com); external recipients still go
    direct-to-MX.

2026-06-15 (later) — yttrx mail fully cut over to bsd; DO box retired

Completed the migration: yttrx.com mail now lives entirely on
bsd.peteftw.com. The DigitalOcean droplet (mail.yttrx.com,
146.190.41.218) is frozen / out of the live path — leave it
off-but-intact ~1–2 weeks, then destroy.

What changed

-   MX flip: yttrx.com MX → 10 bsd.peteftw.com (Cloudflare). Inbound now
    hits the bsd mx jail, which accepts yttrx.com (relay_domains), maps
    every address → waffles@yttrx.com (virtual catch-all), and routes it
    via a per-recipient FILTER lmtp:[192.168.1.128]:24 straight to the
    yttrximap Dovecot backend (bypassing the peteftw spam-VM
    content_filter; RBL/HELO checks still apply, no rspamd).
-   IMAP cut over: imapgw's yttrx route repointed DO →
    yttrximap 192.168.1.128:143 (waffles@yttrx.com, dropped the destuser
    shim). Public IMAP unchanged for clients: bsd.peteftw.com:993.
-   Final delta sync DO→yttrximap before repoint (caught 3 gap messages;
    INBOX 1624).
-   Client (mutt): ~/.config/mutt/yttrx.conf now reads via imapgw
    (bsd.peteftw.com, login waffles@yttrx.com) and sends via submitgw
    (bsd.peteftw.com:587), From waffles@yttrx.com.

Verified

-   waffles@yttrx.com login on the public imapgw (bsd.peteftw.com:993) →
    1624 messages.
-   Direct LMTP to yttrximap delivers to the Maildir; mx-jail
    yttrx_access map returns the FILTER. peteftw inbound path unchanged.

Still to do

-   Move mammut notification mail off DO → submitgw (separate entry).
-   Confirm yttrx.com SPF includes 49.12.80.106; after DO destroy,
    remove DO IP from SPF + drop DO as backup MX.

Rollback

-   IMAP: repoint imapgw yttrx route back to
    mail.yttrx.com:993 ssl=any-cert destuser=waffles.
-   Inbound: revert yttrx.com MX → mail.yttrx.com (DO still has the
    mailbox until destroyed).

2026-06-15 — yttrx outbound moved to bsd; permanent IMAP backend built (standby)

Continued the yttrx-off-DigitalOcean migration. Two pieces landed;
inbound + the final IMAP cutover are still pending (DO remains the live
inbox).

yttrximap — permanent IMAP backend (STANDBY)

New bsd jail yttrximap (192.168.1.128): Dovecot serving
waffles@yttrx.com from a Maildir on the bsd host (ZFS rpool/vmail →
/vmail, owner vmail 2000) nullfs-mounted into the jail, so the jail is
disposable. Auth = passwd-file (BLF-CRYPT minted from waffles' existing
password; DO's $y$ yescrypt isn't FreeBSD-portable). Migrated all mail
via doveadm sync over imapc (mbox→Maildir), verified counts match DO
exactly (INBOX 1621, Sent Items 29, Sent Messages 10, Trash 40) and
persist across jail restart. Not yet live — imapgw still proxies yttrx
IMAP to the DO box. Full detail in
secondbrain/mail-account-management.md.

yttrx outbound → bsd smtp jail (DKIM 2022)

waffles@yttrx.com submission now sends entirely on bsd: submitgw
(587/465) authenticates (via imapgw→DO, destuser=waffles) and relays
@yttrx.com → the smtp jail (192.168.1.100), which now signs multi-domain
(peteftw hellyeah + yttrx 2022 via OpenDKIM KeyTable/SigningTable) and
delivers direct-to-MX from 49.12.80.106. The 2022.private key was copied
DO→smtp jail (pubkey already published, no DNS change for DKIM).
Verified waffles@yttrx.com → Gmail delivered. mammut's notification
relay still goes via the DO box until its .env is repointed (tracked).

DNS

-   Required: add 49.12.80.106 to yttrx.com SPF
    (v=spf1 ip4:146.190.41.218/32 ip4:49.12.80.106/32 ~all) — yttrx now
    also sends from bsd's IP. DKIM-aligned DMARC passes regardless, but
    SPF should list it.
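
A quick offline check that a sending IP is covered by the record above
might look like this. It is a deliberately narrow sketch: only ip4: terms
are evaluated, while include:, a, mx, redirect and qualifier handling are
ignored, so it is not a full SPF evaluator. The function name is
hypothetical.

```python
import ipaddress


def spf_ip4_allows(record, ip):
    """Return True if `ip` falls inside any ip4: mechanism of `record`."""
    addr = ipaddress.ip_address(ip)
    for term in record.split():
        if term.lower().startswith("ip4:"):
            # strict=False tolerates host bits being set in the published CIDR
            net = ipaddress.ip_network(term[4:], strict=False)
            if addr in net:
                return True
    return False


RECORD = "v=spf1 ip4:146.190.41.218/32 ip4:49.12.80.106/32 ~all"

for sender in ("146.190.41.218", "49.12.80.106", "144.76.4.67"):
    print(sender, spf_ip4_allows(RECORD, sender))
```

The DO box and bsd's egress IP are both listed; mammut's own address is
not, which is fine because mammut submits through submitgw rather than
sending directly.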

Rollback

-   Outbound: point submitgw:/usr/local/etc/postfix/relay_by_sender
    @yttrx.com back to [146.190.41.218]:25 (DO), postmap+reload.
    yttrximap is standby-only (no live traffic), so it can simply be
    left or destroyed.

2026-06-14 — yttrx mail fronted by bsd mail gateways (imapgw/submitgw)

bsd.peteftw.com now publicly fronts yttrx.com mail alongside
peteftw.com, without migrating the mailbox — the DigitalOcean droplet
(mail.yttrx.com, 146.190.41.218) stays the authoritative yttrx backend.
This supersedes the heavier separate-backend plan in
mail-migration-to-bsd.md (now marked SUPERSEDED).

What changed (yttrx-relevant)

-   New bsd jails imapgw (192.168.1.126, public IMAP 993, Dovecot proxy)
    and submitgw (192.168.1.127, public submission 587/465, Postfix).
    imapgw routes waffles@yttrx.com → mail.yttrx.com:993; submitgw
    authenticates via imapgw and relays @yttrx.com senders →
    mail.yttrx.com:25. Full design in
    secondbrain/mail-account-management.md.
-   On the DO droplet (the only change to the yttrx box): added bsd's
    public egress 49.12.80.106/32 to Postfix mynetworks so submitgw can
    relay yttrx submission out through it (DKIM 2022 + send unchanged).
    main.cf backed up to /etc/postfix/main.cf.bak.pre-submitgw;
    postfix reload.

Why

To give waffles@yttrx.com public IMAP + authenticated submission from a
single self-contained host (bsd.peteftw.com, *.peteftw.com wildcard
cert) for mobile/off-net clients, reusing the DO box's existing
storage + DKIM rather than rebuilding it on bsd.

Status / Verified

-   peteftw path fully verified (IMAP login + outbound send; Gmail
    SPF/DKIM/DMARC pass).
-   yttrx path is config-identical but UNTESTED — needs a waffles
    credential to confirm IMAP-993 proxy login and 587/465 authenticated
    send (→ DKIM 2022 → MX).
-   Rollback (yttrx side): remove 49.12.80.106/32 from the DO box
    mynetworks (main.cf.bak.pre-submitgw), postfix reload; clients
    revert to mail.yttrx.com direct.

2026-06-07 — Mastodon upgrade v4.5.10 → v4.5.11 (security)

Routine security-patch upgrade following the procedure in
mastodon-upgrade.md.

Why

v4.5.11 is a security release fixing two advisories —
GHSA-rwcw-vq68-g34p (allowed-attribution-domains spoofing) and
GHSA-qrgq-9fx2-vf2r (uncaught exception in message sanitization → DoS) —
plus dependency bumps. No database migrations, no new/changed env vars,
no breaking changes, so this was a straight container-image bump.

Steps on mammut

1.  systemctl stop nginx (drain connections; PostgreSQL left running
    throughout).
2.  cd ~mastodon/live && docker compose down.
3.  Snapshot: cp -a ~mastodon/live ~mastodon/live-v4.5.10; compose
    backup:
    cp ~mastodon/live/docker-compose.yml ~mastodon/docker-compose.yml.bak.
4.  As the mastodon user:
    git stash && git fetch origin --tags && git checkout v4.5.11 (HEAD
    0748a5ff81 "Bump version to v4.5.11"). The bespoke compose file was
    stashed, not overwritten.
5.  Restored bespoke docker-compose.yml from backup and bumped the three
    image tags to v4.5.11 (mastodon ×2 web+sidekiq, mastodon-streaming
    ×1).
6.  docker compose up -d — pulled new images, recreated the stack.
7.  Confirmed live-web-1, live-sidekiq-1, live-streaming-1 healthy with
    clean logs (web serving /health 200, streaming listening on :4000,
    sidekiq processing jobs).
8.  systemctl start nginx.

Verified

  Test                                                   Result
  ------------------------------------------------------ ---------------------------------------------
  docker ps image tags                                   all :v4.5.11, all (healthy)
  curl https://yttrx.com/health                          200
  curl https://yttrx.com/                                200
  curl https://yttrx.com/api/v1/instance version field   4.5.11
  nginx access log                                       federation /inbox 202s flowing post-restart

Rollback

Snapshot ~mastodon/live-v4.5.10 retained. If needed:
docker compose -f live/docker-compose.yml down,
mv live live-broken-v4.5.11, cp -a live-v4.5.10 live,
cd live && docker compose up -d, systemctl start nginx. No DB migrations
ran, so the database is unchanged and rollback is image-only.

------------------------------------------------------------------------

2026-05-24 — DigitalOcean Spaces bucket deleted; nginx CDN config collapsed to R2-only

A week after the R2 cutover (2026-05-18), the DO Spaces bucket yttrx was
deleted. The dual-backend nginx config (cdn-migration) that fell back to
DO on R2 404s is no longer functional — the fallback upstream is gone —
so it was replaced with an R2-only config (cdn-r2).

Changes on mammut

-   New site file: /etc/nginx/sites-available/cdn-r2. Identical to
    cdn-migration except the @s3_fallback location, the set $s3_do line,
    and the error_page 404 = @s3_fallback; directive are gone. @s3 now
    ends in error_page 403/404 https://masto.yttrx.com/404.html;
    (matching the legacy single-backend behavior).
-   Symlink swap: atomic mv -T from cdn-migration to cdn-r2 symlink in
    /etc/nginx/sites-enabled/. nginx -t passed; systemctl reload nginx
    succeeded.
-   Removed legacy site files:
    /etc/nginx/sites-available/{cdn,cdn-digitalocean,cdn-migration}.
    None were symlinked at the time of removal.
-   rclone: [spaces-old] block stripped from
    /root/.config/rclone/rclone.conf (backed up to
    rclone.conf.bak.20260524); [r2] retained.

Why now

The R2 cutover on 2026-05-18 left the dual-backend in place as a safety
net for any objects that didn't make it across. A week of clean
operation with no fallback-served hits in the access log was enough
confidence to drop DO. Source-of-truth check:
awk '{print $NF}' /var/log/nginx/cdn.access.log* | grep do-spaces | wc -l
was effectively zero for legitimate traffic in the days before deletion
(a few scanner 403s tagged do-spaces from the fallback chain, no real
media).

Verified

  Test                                                                           Result
  ------------------------------------------------------------------------------ ----------------------------------------------------------------------
  nginx -t after symlink swap                                                    successful
  curl -sI https://files.yttrx.com/accounts/avatars/.../834ed05faffef5c1.jpeg    200 (CF cache, but origin path verified via cache-bust query string)
  sudo rclone size r2:yttrx-media (post-edit, sanity-check rclone still works)   returns bucket stats
  ls /etc/nginx/sites-available/cdn*                                             only cdn-r2 remains

Rollback

Not realistically possible — the DO Spaces bucket is deleted. If
something breaks in R2 itself, the recovery path is to fix R2 (or
restore objects from PostgreSQL DB references + re-fetch federated
media), not to fall back to DO.

The nginx config change itself is reversible while the legacy site files
exist in git; restoring cdn-migration would just produce 502s from the
dead DO upstream until R2 is fixed, so there's no scenario where
reverting the config helps.

Followups

-   scripts/cdn-migration.nginx in this repo replaced with
    scripts/cdn-r2.nginx.
-   cdn-site.md rewritten to describe the live R2-only setup.
-   cdn-s3-migration.md retained as a historical record (marked Status:
    complete at the top).
-   Private secondbrain note yttrx-r2-credentials.md had its
    DigitalOcean Spaces section removed.

------------------------------------------------------------------------

2026-05-22 — Scraper block extended: HeadlessChrome + AIWebIndex UA rules added

Followup to the 2026-05-19 CIDR expansion. Log audit showed the scraper
operator had rotated again — this time from the
Tencent/ByteDance/Alibaba cloud ranges (now 403ing) to Hetzner Cloud
(AS24940, Falkenstein DC). Four IPs on 178.104-105.x were responsible
for the bulk of the remaining successful /tags/ hits, all sending
Mozilla/5.0 (X11; Linux x86_64) … HeadlessChrome/148.0.0.0 Safari/537.36.

Blocking Hetzner at the CIDR level was ruled out — AS24940 hosts a large
number of legitimate fediverse instances, and the collateral would be
significant. Instead, blocked on the unambiguous UA string.

Also added AIWebIndex/2.0 (lyrenth.com) which had 10 successful /tags/
hits in the same window and wasn't in the original named-bot list.

The flat Chrome-version rotation (103–133, Windows NT UA template)
continues to slip through — no reliable fingerprint to distinguish those
from real users without Cloudflare bot management.

What changed

Two entries appended to the $bad_ua map in
/etc/nginx/snippets/scraper-block.conf:

    "~*HeadlessChrome"               1;
    "~*AIWebIndex"                   1;

Deployment

    # backed up live config first
    ssh mammut 'cp /etc/nginx/snippets/scraper-block.conf \
                /etc/nginx/snippets/scraper-block.conf.pre-headlesschrome.20260522'

    # edited via sed to append entries to $bad_ua map
    ssh mammut 'nginx -t && systemctl reload nginx'

Verified

  Test                                                                                 Result
  ------------------------------------------------------------------------------------ --------
  curl -H 'User-Agent: … HeadlessChrome/148.0.0.0 …' https://yttrx.com/tags/test       403
  curl -H 'User-Agent: AIWebIndex/2.0 …' https://yttrx.com/tags/test                   403
  curl -H 'User-Agent: Mozilla/5.0 … Chrome/142.0.0.0 …' https://yttrx.com/tags/test   200
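
The three checks in the table can also be modelled offline. This sketch
covers only the two entries added in this change (the rest of the $bad_ua
map is not reproduced here) and approximates nginx's "~*" case-insensitive
regex match with Python's re.IGNORECASE; for plain literal patterns like
these the two should behave the same. The UA strings keep the elisions
from the table as-is.

```python
import re

# The two patterns appended to the $bad_ua map in this change.
BAD_UA_PATTERNS = [
    re.compile(pattern, re.IGNORECASE)
    for pattern in ("HeadlessChrome", "AIWebIndex")
]


def bad_ua(user_agent):
    """Return 1 if the UA matches a blocked pattern, else 0 (like the map)."""
    return 1 if any(p.search(user_agent) for p in BAD_UA_PATTERNS) else 0


SAMPLES = [
    "Mozilla/5.0 (X11; Linux x86_64) … HeadlessChrome/148.0.0.0 Safari/537.36",
    "AIWebIndex/2.0 (lyrenth.com)",
    "Mozilla/5.0 … Chrome/142.0.0.0 …",
]
for ua in SAMPLES:
    print(bad_ua(ua), ua)
```

An ordinary Chrome UA does not match because the pattern requires the
"Headless" prefix, which is why the third table row still returns 200.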

Rollback

    ssh mammut 'cp /etc/nginx/snippets/scraper-block.conf.pre-headlesschrome.20260522 \
                /etc/nginx/snippets/scraper-block.conf
    nginx -t && systemctl reload nginx'

------------------------------------------------------------------------

2026-05-20 — nginx patched: CVE-2026-42945 "NGINX Rift" (CVSS 9.2)

Critical heap buffer overflow in ngx_http_rewrite_module
(CVE-2026-42945, CVSS v4 9.2), actively exploited in the wild within
days of disclosure. Allows unauthenticated crash of nginx worker
processes via crafted HTTP request; RCE possible on systems with ASLR
disabled. Affected versions: 0.6.27–1.30.0 (open source), R32–R36
(Plus).

Mammut was running 1.22.1-9+deb12u4 (vulnerable range). Config was
audited first — the rewrite directives in mastodon-anubis and mastodon
use unnamed captures ($1) but none have a ? in the replacement string
immediately before another rewrite, if, or set directive, so the
specific trigger pattern was not present. Upgrade proceeded regardless.

Steps executed

    apt-get update
    apt-get install -y nginx   # upgraded nginx + nginx-common to 1.22.1-9+deb12u7

Debian security advisory: DSA-6278-1 (covers CVE-2026-40701,
CVE-2026-42934, CVE-2026-42945, CVE-2026-42946, CVE-2026-40460). The
upgrade also covers the nginx JavaScript (njs) module CVE-2026-8711
(CVSS 9.2, heap overflow in ngx_http_js_module) disclosed the same week.

Verified

  Test                          Result
  ----------------------------- -----------------------------------------
  nginx -v                      nginx/1.22.1 (binary replaced, deb12u7)
  systemctl is-active nginx     active
  curl -sk https://yttrx.com/   200 HTML response

Other nginx instances (not affected)

-   bsd.peteftw.com website jail — nginx 1.28.3 via FreeBSD pkg (patched
    stable branch, not vulnerable). No rewrite directives in config.
-   homelab Docker — nginx 1.31.0 (patched mainline, pulled 2026-05-13).
    Simple static file server, no rewrite directives.

------------------------------------------------------------------------

2026-05-20 — Mastodon upgraded v4.5.9 → v4.5.10 (security)

Routine patch upgrade following the documented procedure in
mastodon-upgrade.md. Upstream release addresses two security advisories:
an SSRF protection bypass and a Linked-Data Signature bypass. No new env
vars, no schema migrations.

Steps executed

1.  systemctl stop nginx to drain connections
2.  cd ~mastodon/live && docker compose down (PostgreSQL left running as
    documented)
3.  cp -a ~mastodon/live ~mastodon/live-v4.5.9.pre-v4.5.10-20260520 —
    pre-upgrade snapshot (an earlier live-v4.5.9 from the 2026-05-18 R2
    cutover already existed)
4.  cp ~mastodon/live/docker-compose.yml ~mastodon/docker-compose.yml.bak
    — back up bespoke compose
5.  As mastodon:
    git fetch origin --tags && git stash push -m "pre-v4.5.10-upgrade compose changes" && git checkout v4.5.10
6.  Restored docker-compose.yml from backup; bumped 3 image tags
    (mastodon:v4.5.10 ×2, mastodon-streaming:v4.5.10 ×1)
7.  docker compose up -d — pulled new images, started stack
8.  Verified all five containers reached healthy; web/sidekiq/streaming
    logs clean
9.  systemctl start nginx; curl https://yttrx.com/api/v1/instance
    confirmed version: 4.5.10

Rollback (if needed)

Snapshot path: ~mastodon/live-v4.5.9.pre-v4.5.10-20260520. Restore per
mastodon-upgrade.md §Rollback.

------------------------------------------------------------------------

2026-05-19 — Scraper block expanded to full Alibaba + Tencent + ByteDance ASNs

Followup to yesterday's UA + CIDR block. Audit of today's access.log
showed the operator rotated to neighboring cloud ranges within hours.
Total /tags/ traffic is down ~97% (yesterday: 63,520 successful hits;
today: 1,858), but 1,824 of those 1,858 hits carry the same flat
Chrome-version-rotation fingerprint (uniform distribution across Chrome
103–133 from a single Windows UA template). The operator just moved to
adjacent CIDRs.
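
The "flat distribution" fingerprint can be expressed as a rough heuristic
over User-Agent strings. This is a sketch of one way to quantify it, not
the method used in the audit: the function names and both thresholds
(min_versions, min_ratio) are hypothetical, and the sample UAs are
synthetic.

```python
import re
from collections import Counter

CHROME_MAJOR = re.compile(r"Chrome/(\d+)\.")


def chrome_version_counts(user_agents):
    """Count requests per Chrome major version found in the UA strings."""
    counts = Counter()
    for ua in user_agents:
        match = CHROME_MAJOR.search(ua)
        if match:
            counts[int(match.group(1))] += 1
    return counts


def looks_flat(counts, min_versions=20, min_ratio=0.4):
    """True if many versions appear and the rarest is close to the commonest.

    Organic traffic clusters on a few recent versions, so the ratio of the
    smallest bucket to the largest is tiny; a rotation library spreads hits
    almost evenly, so the ratio stays high.
    """
    if len(counts) < min_versions:
        return False
    return min(counts.values()) / max(counts.values()) >= min_ratio


# Synthetic example: one Windows template rotated evenly across 103-133.
TEMPLATE = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
            "(KHTML, like Gecko) Chrome/{v}.0.0.0 Safari/537.36")
rotated = [TEMPLATE.format(v=v) for v in range(103, 134) for _ in range(50)]
organic = ([TEMPLATE.format(v=142)] * 900 + [TEMPLATE.format(v=141)] * 80
           + [TEMPLATE.format(v=140)] * 15)

print(looks_flat(chrome_version_counts(rotated)))
print(looks_flat(chrome_version_counts(organic)))
```

A check like this only describes a cohort of requests in aggregate; it
cannot label an individual request, which is consistent with the lack of a
reliable per-request fingerprint noted elsewhere in this log.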

What changed in attribution

The 2026-05-18 entry called 43.172.0.0/15 "Alibaba SG". It's actually
Tencent, registered to ACEVILLE PTE LTD (AS132203) — Tencent Cloud's
Singapore shell entity. Today's new rotation /16s are also ACEVILLE
(43.130/16, 43.134/16, 43.153/16, 43.157/16) plus ByteDance/BytePlus
(101.47/16, 163.7/16). Alibaba is actually a side actor in this
operation, not the lead.

New blocklist scope

Expanded from "a handful of /16s with direct evidence" to "all announced
prefixes of the three implicated cloud providers":

  Provider                    ASNs                                  Prefixes (announced → merged)
  --------------------------- ------------------------------------- -------------------------------
  Alibaba Cloud               AS45102, AS37963, AS134963, AS24429   2,638 → 498
  Tencent ACEVILLE            AS132203                              1,068 → 165
  ByteDance BytePlus          AS150436                              171 → 30
  Total (after cross-merge)                                         673 CIDRs

Plus two /16s that don't have a clean ASN home but show the same
fingerprint:

-   150.5.0.0/16 — transferred to RIPE 2024-05-22, ambiguous
    registration
-   14.191.0.0/16 — VNPT Vietnam residential. Single /16 only, not all
    VNPT (AS45899 announces ~3,700 prefixes covering most of Vietnam —
    blocking that whole ASN would partition Vietnamese fediverse users
    from yttrx).

File layout

Bulk CIDRs went into a new snippet, kept separate so scraper-block.conf
stays readable:

  File                                     Purpose
  ---------------------------------------- --------------------------------------------------------------------
  /etc/nginx/snippets/scraper-block.conf   Main: $bad_ua map, $bad_cidr geo with two inline /16s + an include
  /etc/nginx/snippets/scraper-cidrs.conf   The 673-CIDR bulk list, generated from RIPEstat

Source-of-truth copies live in scripts/ in this repo.

Generation recipe

Documented in the header of scraper-cidrs.conf. Short version:

    for asn in 45102 37963 134963 24429 132203 150436; do
      curl -s "https://stat.ripe.net/data/announced-prefixes/data.json?resource=AS$asn" \
        | python3 -c 'import json,sys;d=json.load(sys.stdin);[print(p["prefix"]) for p in d["data"]["prefixes"]]'
    done | sort -u | grep -v : > all.txt
    # then aggregate with python netaddr.cidr_merge()

The merge dropped 2,579 input prefixes to 498 (Alibaba alone). Final
cross-provider merge: 673 CIDRs.
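
The aggregation step can be sketched with only the standard library. Note
the substitution: this uses ipaddress.collapse_addresses in place of
netaddr.cidr_merge(), which should give an equivalent minimal covering set
for IPv4 input but is not the tool named in the recipe. The "CIDR 1;"
output format is an assumption about what the nginx geo include expects,
and the function names are hypothetical.

```python
import ipaddress


def merge_prefixes(lines):
    """Collapse announced IPv4 prefixes into the smallest covering CIDR set."""
    nets = [
        ipaddress.ip_network(line.strip(), strict=False)
        for line in lines
        if line.strip() and ":" not in line  # mirrors `grep -v :` (drops IPv6)
    ]
    # collapse_addresses merges adjacent and overlapping networks and
    # yields the result in sorted order.
    return list(ipaddress.collapse_addresses(nets))


def as_geo_lines(nets):
    # Assumed include format: one "CIDR value;" pair per line.
    return [f"{net} 1;" for net in nets]


announced = [
    "43.172.0.0/16",
    "43.173.0.0/16",   # adjacent to the line above, merges into a /15
    "43.130.0.0/16",
    "43.130.0.0/16",   # duplicate, as `sort -u` would already remove
    "2001:db8::/32",   # IPv6, filtered out
]
for line in as_geo_lines(merge_prefixes(announced)):
    print(line)
```

For the sample input this prints two entries, 43.130.0.0/16 and
43.172.0.0/15, showing how adjacent announcements shrink the list.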

Deployment

    # backed up live config first
    ssh mammut 'cp /etc/nginx/snippets/scraper-block.conf \
                /etc/nginx/snippets/scraper-block.conf.pre-alibaba-expand.20260519-071953'

    scp scraper-cidrs.conf mammut:/etc/nginx/snippets/
    scp scraper-block.conf mammut:/etc/nginx/snippets/
    ssh mammut 'nginx -t && systemctl reload nginx'

Bug caught mid-deploy

First deploy silently weakened yesterday's block. The original
scraper-block.conf listed 43.172.0.0/15 inline. The first revision of
this change replaced it with an include of a file built from Alibaba ASN
data — under the assumption the /15 was Alibaba. It isn't (see
attribution correction above), so the Alibaba ASN data doesn't cover it,
and the new include dropped the line. Result: a ~7-minute window
(16:19:53 → 16:26:25) where 43.172/15 went from 403'ing to 200'ing about
140 /tags/ requests.

Caught it by tailing the post-reload access log, found 200s where 403s
should have been, ran WHOIS on a sample IP, confirmed Tencent ownership,
regenerated scraper-cidrs.conf to include AS132203 (ACEVILLE) and
AS150436 (BytePlus) prefixes, and pushed a corrected version that
re-blocked 43.172/15 and added everything else.

Lesson: when migrating an inline list to an ASN-sourced bulk list, diff
coverage explicitly before deploying. Verification step that would have
caught it:

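    from netaddr import IPNetwork
    # inline_list / bulk_list: CIDR strings from the old inline rules and the new bulk file
    for original_cidr in inline_list:
        assert any(IPNetwork(original_cidr) in IPNetwork(new) for new in bulk_list)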

That check is now part of the regeneration recipe.
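
The same coverage check can be written without netaddr. This is a
minimal sketch using the standard-library ipaddress module; the function
name is hypothetical, and the bulk entries in the example are placeholders
rather than real provider prefixes.

```python
import ipaddress


def uncovered(inline_list, bulk_list):
    """Return the inline CIDRs not contained in any single bulk CIDR."""
    bulk = [ipaddress.ip_network(cidr) for cidr in bulk_list]
    missing = []
    for cidr in inline_list:
        net = ipaddress.ip_network(cidr)
        # subnet_of raises TypeError across IP versions, so compare versions first
        if not any(net.version == b.version and net.subnet_of(b) for b in bulk):
            missing.append(cidr)
    return missing


# Placeholder bulk list that does not cover the old inline /15.
print(uncovered(["43.172.0.0/15"], ["198.51.100.0/24"]))
# Once a covering prefix is present, nothing is reported.
print(uncovered(["43.172.0.0/15"], ["198.51.100.0/24", "43.172.0.0/14"]))
```

An inline CIDR covered only by the union of several smaller bulk entries
would be reported as missing, so the check is best run against the
already-merged bulk list. Returning the missing entries instead of
asserting makes the gap visible in one pass before anything is deployed.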

Verified post-deploy

  Test                                                         Result
  ------------------------------------------------------------ ------------------------------
  curl https://yttrx.com/ (real-user UA from non-blocked IP)   200
  curl https://yttrx.com/tags/test (same)                      200
  curl https://yttrx.com/api/v1/instance                       200
  curl -H 'User-Agent: Applebot/0.1' .../tags/test             403
  Live tail of 43.172/15 traffic post-reload                   All 403 from 16:26:25 onward
  Federation: Mastodon/4 UA traffic in last hour               4,830 202 + 435 200, no 403

Expected ongoing impact

Yesterday's block dropped /tags/ from 64k successful hits → 1.8k.
Today's expansion catches the rotation set responsible for the remaining
~1.8k. The trailing edge of today's audit still showed residual /16s
(e.g. 150.5/16 had 119 successful hits before this deploy); those are
now blocked.

What this still doesn't do

Same caveat as yesterday: this matches declared UAs and announced
cloud-provider prefixes. A residential-proxy mesh would bypass both. If
the operator moves to bot-net'd home connections, none of these rules
touch them — Cloudflare's bot management at the edge remains the
realistic escalation.

Rollback

    ssh mammut 'cp /etc/nginx/snippets/scraper-block.conf.pre-alibaba-expand.20260519-071953 \
                /etc/nginx/snippets/scraper-block.conf
    rm /etc/nginx/snippets/scraper-cidrs.conf
    nginx -t && systemctl reload nginx'

That restores yesterday's narrower block.

------------------------------------------------------------------------

2026-05-18 — changelog.yttrx.com live on admin

Stood up changelog.yttrx.com as a tiny static site on admin that
publishes this file. Two URLs only: /changelog.md (verbatim) and
/changelog.txt (plaintext conversion via pandoc or a regex fallback —
see misc-sites.md). / 302s to /changelog.txt; everything else 404s. Sync
is manual (scp from a repo checkout); no nginx reload needed for content
updates.

Site file scripts/changelog.nginx →
/etc/nginx/sites-available/changelog. Web root /var/www/changelog/ owned
by www-data. Cert issued via
certbot certonly --standalone -d changelog.yttrx.com, which required
briefly stopping nginx (the documented bootstrap pattern from
misc-sites.md) and so caused a few-second outage of the other
admin-hosted sites at deploy time.
Renewals piggyback on the existing 0 2 * * * certbot renew --nginx cron.

Verified

  Test                                                Result
  --------------------------------------------------- ----------------------------------------------
  curl -I https://changelog.yttrx.com/                302 to /changelog.txt
  curl -I https://changelog.yttrx.com/changelog.txt   200, Content-Type: text/plain; charset=utf-8
  curl -I https://changelog.yttrx.com/changelog.md    200, Content-Type: text/plain; charset=utf-8
  curl -I https://changelog.yttrx.com/notathing       404
  curl -I http://changelog.yttrx.com/                 301 to https

Rollback

    rm /etc/nginx/sites-enabled/changelog
    nginx -t && systemctl reload nginx

Cert and /var/www/changelog/ can stay; the cron renewal is harmless
until the cert is pruned.

------------------------------------------------------------------------

2026-05-18 — Scraper UA + CIDR block at nginx (replaces Anubis as first-line defense)

After the Anubis rollback earlier today, audited today's
/var/log/nginx/access.log (269k lines on the Mastodon vhosts) to
understand what we'd actually been trying to defend against. The scrape
target is /tags/, no contest: 64,234 hits today, 63,520 of them 200. Two
cohorts account for the bulk of it:

-   Named AI/SEO scrapers identifying themselves in the UA: Applebot
    (19,523 /tags/ hits), meta-externalagent (14,050), GoogleOther
    (6,630), plus the long tail (MJ12bot, AhrefsBot, etc.).
-   A disguised botnet from Alibaba Cloud Singapore (43.172.0.0/15):
    ~22,000 /tags/ hits today rotating across dozens of "Chrome on
    Windows" UAs (versions 103, 104, 105, … 148) in suspiciously flat
    1,200–2,500-hit buckets. Real users don't distribute themselves
    uniformly across 30 Chrome versions; this is one operator running a
    scraper farm with a UA-rotation library.
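
The flat-bucket pattern can be checked with a per-version tally. A
minimal sketch, assuming nginx's default combined log format (request
path in field 7); chrome_version_buckets is a hypothetical helper name,
not the command used for the audit:

```shell
# Tally /tags/ hits per Chrome major version from an access log on stdin.
# Assumption: combined log format, so field 7 is the request path.
chrome_version_buckets() {
  awk '$7 ~ /^\/tags\// && match($0, /Chrome\/[0-9]+/) {
         # RSTART/RLENGTH cover "Chrome/NNN"; skip the 7-character "Chrome/" prefix
         hits[substr($0, RSTART + 7, RLENGTH - 7)]++
       }
       END { for (v in hits) print hits[v], v }' | sort -rn
}
```

Usage: chrome_version_buckets < /var/log/nginx/access.log. Each output
line is "count version". Many versions with near-identical counts point
to a rotation library; organic traffic tends to cluster on the few most
recent releases.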

The genuinely surprising find that reframed the Anubis approach: 31,114
requests today to /.within.website/x/cmd/anubis/static/locales/en.json —
that's the Anubis challenge interstitial's i18n file. Every one had
Referer: https://yttrx.com/tags/<something>, all from the same Alibaba
IPs. During the hours Anubis was live, the locale-fetch curve tracked
/tags/ traffic almost 1:1, meaning the scraper was completing the
Proof-of-Work challenge and walking right through. Anubis-as-deployed
wasn't blocking the actual aggressors. Real users got a broken composer;
the headless-browser scraper farm just paid a few extra CPU-seconds.

That changes the cost/benefit: cheaper and more effective to drop the
obvious offenders at nginx than to challenge everyone and hope the
offenders fail.

What got deployed

New snippet /etc/nginx/snippets/scraper-block.conf (source:
scripts/scraper-block.conf) defining two maps:

-   $bad_ua — set by map $http_user_agent, matches a list of
    self-identifying scraper UAs (Applebot, meta-externalagent,
    GoogleOther, GPTBot, ChatGPT-User, ClaudeBot, anthropic-ai,
    PerplexityBot, CCBot, Bytespider, Amazonbot, DuckAssistBot,
    FacebookBot, Diffbot, Omgilibot, MJ12bot, AhrefsBot, SemrushBot,
    DotBot, ImagesiftBot, TimpiBot, plus a few siblings).
-   $bad_cidr — set by geo $remote_addr, matches the Alibaba ranges
    where today's UA-rotation farm lives: 43.172.0.0/15, 8.208.0.0/12,
    47.235.0.0/16, 47.236.0.0/16, 47.243.0.0/16, 47.245.0.0/16,
    47.77.0.0/16, 43.106.0.0/16, 43.110.0.0/16.

Googlebot and Bingbot are intentionally absent — search discovery is
wanted. The CIDR check depends on cloudflare-real-ip.conf rewriting
$remote_addr from CF-Connecting-IP (in place since 2026-05-03); without
it the geo match silently does nothing.

Wired into /etc/nginx/sites-available/mastodon (the currently-enabled
site) at three points:

    # top of file, alongside the existing map $http_upgrade
    include /etc/nginx/snippets/scraper-block.conf;

    # inside each :443 server { } (yttrx.com and masto.yttrx.com),
    # after add_header Strict-Transport-Security
    if ($bad_ua)   { return 403; }
    if ($bad_cidr) { return 403; }

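An if that contains only a return is one of the forms the "If Is Evil"
wiki page describes as safe, and that page's warnings concern if inside
location blocks rather than at server scope.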

The dormant mastodon-anubis site file (scripts/mastodon-anubis.nginx)
got the same edits so that if Anubis ever gets revisited, the scraper
block goes with it.

Why CIDR scope is server-wide (not just /tags/)

Considered narrowing the $bad_cidr 403 to scrape-target paths only, so a
legitimate fediverse server in those ranges could still federate via
/inbox. Audit data ruled it out: of 55,111 requests from 43.172/15
today, exactly 6 had Mastodon/Pleroma/Akkoma/Misskey/Lemmy UAs —
collateral on a blanket CIDR block is essentially nil, and the
bookkeeping is much simpler.
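
The collateral check can be reproduced along these lines. A sketch
only: federation_hits_in_range is a hypothetical helper, the regex is a
hand translation of 43.172.0.0/15, and it assumes $remote_addr is the
first log field:

```shell
# Count requests from 43.172.0.0/15 (43.172.x.x and 43.173.x.x) whose log
# line names a fediverse server software. Reads an access log on stdin.
federation_hits_in_range() {
  awk '$1 ~ /^43\.17[23]\./' |
    grep -cE 'Mastodon|Pleroma|Akkoma|Misskey|Lemmy' || true  # grep -c exits 1 on zero matches
}
```

Usage: federation_hits_in_range < /var/log/nginx/access.log. The match
runs against the whole line, so a Referer that mentions one of these
names would be counted too; for a collateral estimate that errs on the
safe side.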

Deployment

Executed on mammut:

    cp /etc/nginx/sites-available/mastodon \
       /etc/nginx/sites-available/mastodon.pre-scraper-block.20260518-112054

    scp scraper-block.conf mammut:/etc/nginx/snippets/scraper-block.conf
    scp mastodon-live-edited mammut:/etc/nginx/sites-available/mastodon

    nginx -t && systemctl reload nginx

Verified post-deploy

  Test                                                                       Result
  -------------------------------------------------------------------------- -------------------------------------------------------------------------------------------
  curl -H 'User-Agent: …Applebot/0.1…' https://yttrx.com/tags/test           403 from Server: nginx/1.22.1 (not Mastodon)
  curl -H 'User-Agent: meta-externalagent/1.1' https://yttrx.com/tags/test   403 from nginx
  curl -H 'User-Agent: Mozilla/5.0 … Chrome/130 …' https://yttrx.com/        200 from Mastodon
  curl https://masto.yttrx.com/api/v1/instance                               200 from Mastodon (federation API intact)
  curl -H 'User-Agent: Mastodon/4.5.9 (http.rb/…)' https://yttrx.com/inbox   404 from Mastodon (HEAD on POST-only route — not a 403 from nginx, which is what matters)

Expected ongoing impact

Applied retroactively against today's 269k-line access.log, the rules
would have rejected ~106,000 requests (~40% of traffic) — primarily on
/tags/ — without touching /inbox, /api/, /oauth/, or /.well-known/. Real
federation (48k /inbox POSTs today from Mastodon/x.y UAs) is unaffected.
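
A retroactive replay of this kind can be approximated with awk. The
sketch below is illustrative, not the command that produced the figures
above: replay_block is a hypothetical name, and the UA list and prefix
regexes are abbreviated stand-ins for the full snippet.

```shell
# Count log lines the block would have rejected, and their share of all lines.
# Assumption: $remote_addr is field 1. UA matching runs against the whole line.
replay_block() {
  awk '
    BEGIN { ua = "Applebot|meta-externalagent|GoogleOther|GPTBot|ClaudeBot" }
    { total++ }
    # 43.172.0.0/15 plus two of the /16s, hand-translated to prefix regexes
    $0 ~ ua || $1 ~ /^43\.17[23]\./ || $1 ~ /^47\.(235|236)\./ { blocked++ }
    END { printf "%d of %d (%.0f%%)\n", blocked, total, total ? 100 * blocked / total : 0 }
  '
}
```

Usage: replay_block < /var/log/nginx/access.log. For the numbers quoted
above the same arithmetic gives 106,000 / 269,000, roughly 39%.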

Rollback

    cp /etc/nginx/sites-available/mastodon.pre-scraper-block.20260518-112054 \
       /etc/nginx/sites-available/mastodon
    nginx -t && systemctl reload nginx

The scraper-block.conf snippet can stay in /etc/nginx/snippets/ — it
does nothing on its own without the if lines and the include.

What this doesn't do

The block matches declared UA strings and specific source CIDRs. A
scraper that rotates both — picks a non-listed CIDR and sends a generic
Chrome UA — slips through. Today's data doesn't show much of that, but
if the Alibaba farm migrates to a residential-proxy mesh, this won't
catch them. At that point the realistic next step is Cloudflare's bot
management at the edge, not more rules on mammut.

------------------------------------------------------------------------

2026-05-18 — Anubis rolled back from yttrx.com and masto.yttrx.com

The Anubis deployment from 2026-05-17 broke parts of the Mastodon web
client (login, post composer, federated timeline streaming) for real
users, so it's been removed from the request path. nginx now serves both
vhosts via the original mastodon site again; the anubis@yttrx.service
unit is stopped and disabled.

Rollback steps executed on mammut

    ln -sfn ../sites-available/mastodon /etc/nginx/sites-enabled/mastodon
    nginx -t && systemctl reload nginx

    systemctl stop anubis@yttrx.service
    systemctl disable anubis@yttrx.service

Verified post-rollback

  Test                                                            Result
  --------------------------------------------------------------- ------------------------------------------------------------
  curl -I -H 'User-Agent: Mozilla/5.0' https://yttrx.com/         200 from Server: Mastodon, no techaro.lol-anubis-* cookies
  curl -I -H 'User-Agent: Mozilla/5.0' https://masto.yttrx.com/   302 redirect to yttrx.com/ (expected; Server: nginx)
  curl -I https://masto.yttrx.com/api/v1/instance                 200 from Server: Mastodon
  curl -I 'https://yttrx.com/.well-known/webfinger?resource=…'    200 from Server: Mastodon

No challenge interstitial, no Anubis cookies. Web client is reachable
directly again.

What's left in place (intentionally)

-   /etc/nginx/sites-available/mastodon-anubis — the dual-vhost site
    file with the Anubis bypass blocks. Kept on disk so re-enabling is a
    single ln -sfn away if/when we revisit.
-   /etc/anubis/yttrx.env and /etc/anubis/yttrx.botPolicies.yaml —
    config + key material, untouched.
-   /etc/nginx/snippets/masto-proxy.conf — shared proxy headers, also
    referenced by the original mastodon site; safe to leave.
-   The anubis package itself — not uninstalled.

The systemd template instance is disabled, so the service won't come
back on reboot.

Why this regressed the web client

Not investigated to root cause — rolling back was cheaper than tuning
the policy. Anubis only challenges browser HTML at location /;
ActivityPub, OAuth, API, streaming, WebFinger, and nodeinfo were already
bypassed in scripts/mastodon-anubis.nginx, yet enough things in the SPA
broke for real users that the deployment wasn't viable as configured.

If we revisit, the experiment should start on a less critical vhost,
capture specific failing requests/flows before changing anything, and
test the full first-login flow in a clean browser profile before
enabling on yttrx.com.

Re-enable path (for reference)

    ln -sfn ../sites-available/mastodon-anubis /etc/nginx/sites-enabled/mastodon
    nginx -t && systemctl reload nginx
    systemctl enable --now anubis@yttrx.service

------------------------------------------------------------------------

2026-05-18 — R2 cutover complete: Mastodon writes to Cloudflare R2

The CDN migration is done. As of 07:37 CEST,
~mastodon/live/.env.production points the Mastodon S3 client at
Cloudflare R2 (yttrx-media), and all three writer containers (web,
sidekiq, streaming) restarted cleanly under the new config. DigitalOcean
Spaces is now a frozen historical bucket — reads still flow through the
dual-backend nginx (cdn-migration) which tries R2 first and falls back
to DO for any objects R2 doesn't have.

Pre-flight state

rclone sync spaces-old:yttrx → r2:yttrx-media had been running since
~2026-05-17 11:19 (with --transfers 64 --checkers 128). By cutover time,
R2 was at 811.9 GiB / 1.991M objects — slightly larger than the 797.5
GiB / 1.978M-object DO baseline taken at the start, because rclone
caught new uploads written to DO during the run. The remaining delta as
of cutover was a few hundred KiB of cache/ (federated-media cache)
stragglers, which Mastodon will re-fetch on demand and we explicitly
chose not to chase.

Cutover sequence (executed in this order on mammut)

1.  Disabled the in-session monitoring cron, killed the long-running
    rclone sync (tmux session migration) and its sibling rclone-watcher.
    Marker line written to /var/log/rclone-migration.log.
2.  Stopped Mastodon writers: docker compose stop web sidekiq streaming.
    Kept nginx, dragonfly, es, and PostgreSQL running so files.yttrx.com
    continued serving reads via cdn-migration throughout.
3.  Final delta rclone sync started with the same
    --transfers 64 --checkers 128. Wall clock: a few minutes of
    bucket-comparison walk for negligible new transfer (a couple of
    preview-card PNGs). Killed once it was clear the remaining tail was
    all cache/.
4.  Skipped the full rclone check. The full bucket-walk would have taken
    many minutes for content (federated cache) we don't need to verify.
    Trade-off: a small risk that a non-cache upload near the
    alphabetical end of the bucket didn't make it to R2; the
    dual-backend nginx fallback covers that case anyway (R2 404 → DO).
5.  Backed up .env.production to
    /home/mastodon/live/.env.production.pre-r2-cutover.20260518-073707
    (md5-verified equal to the live file before edit).
6.  Swapped S3 vars in .env.production (sed, transactional via .tmp +
    mv):
        S3_BUCKET=yttrx              → S3_BUCKET=yttrx-media
        S3_REGION=us-east-1          → S3_REGION=auto
        S3_HOSTNAME=https://yttrx.sfo3.digitaloceanspaces.com
                                     → S3_HOSTNAME=https://yttrx-media.b10d4c19446fc73dcd3af1145490c01b.r2.cloudflarestorage.com
        S3_ENDPOINT=https://sfo3.digitaloceanspaces.com
                                     → S3_ENDPOINT=https://b10d4c19446fc73dcd3af1145490c01b.r2.cloudflarestorage.com
        AWS_ACCESS_KEY_ID=DO00XZDDD38HM4ML7TTM
                                     → AWS_ACCESS_KEY_ID=9301014f2722d65b7c7bd1372648e1a0
        AWS_SECRET_ACCESS_KEY=<DO secret>
                                     → AWS_SECRET_ACCESS_KEY=<SHA-256 of R2 cfut_ token>

    Real R2 credentials live in the private secondbrain note
    yttrx-r2-credentials.md. S3_ENABLED, S3_PROTOCOL=https,
    S3_ALIAS_HOST=files.yttrx.com unchanged.
7.  Brought writers back up: docker compose up -d web sidekiq streaming.
    All five containers reported healthy within 30 seconds. Smoke test:
    curl https://masto.yttrx.com/api/v1/instance returned 200 from
    Server: Mastodon.
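
Steps 5 and 6 follow a back-up, verify, then write-and-rename pattern.
A minimal sketch of that pattern, assuming hypothetical helper names
(backup_env, set_env_var) and using cmp for the equality check where the
cutover used md5:

```shell
# Copy the env file to a timestamped sibling and confirm it is byte-identical.
# Prints the backup path on success.
backup_env() {
  file=$1
  backup="${file}.pre-edit.$(date +%Y%m%d-%H%M%S)"
  cp -p "$file" "$backup" && cmp -s "$file" "$backup" && printf '%s\n' "$backup"
}

# Replace KEY=... by writing a sibling .tmp file and renaming it over the
# original, so readers never see a half-written file.
# Limitation of this sketch: the value must not contain "|" or "&".
set_env_var() {
  file=$1 key=$2 value=$3
  sed "s|^${key}=.*|${key}=${value}|" "$file" > "${file}.tmp" &&
    mv "${file}.tmp" "$file"
}
```

Usage: backup_env .env.production, then
set_env_var .env.production S3_BUCKET yttrx-media for each variable.
The renamed file is created fresh, so its owner and mode should be
checked afterwards.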

Total downtime for masto.yttrx.com: ~28 minutes (from
docker compose stop to all-healthy). files.yttrx.com stayed up the whole
time — the dual-backend cdn-migration config served reads from R2/DO
without interruption.

Maintenance page

Added during the cutover so the 502 from nginx wouldn't be the
user-facing error while the containers were down:

-   scripts/maintenance.html — friendly maintenance page deployed at
    /var/www/html/maintenance.html on mammut.
-   scripts/mastodon-anubis.nginx — added in each :443 server block
    (both yttrx.com and masto.yttrx.com):
        error_page 502 503 504 /maintenance.html;
        location = /maintenance.html {
          root /var/www/html;
          internal;
        }
-   Confirmed working during the cutover: /api/v1/instance,
    /.well-known/webfinger, /users/.../outbox all returned the
    maintenance HTML with HTTP 502.

The directives remain in place post-cutover — they're harmless when
containers are healthy and useful for any future planned-maintenance
window.

Rollback path (still available)

If we discover R2 writes are broken or media reads regress, revert:

    sudo -u mastodon cp -p /home/mastodon/live/.env.production.pre-r2-cutover.20260518-073707 \
        /home/mastodon/live/.env.production
    sudo -u mastodon bash -c 'cd ~mastodon/live && docker compose restart web sidekiq streaming'

Any media uploaded to R2 during the brief post-cutover window will still
be reachable via files.yttrx.com — the dual-backend nginx tries R2
first, so users keep seeing their new uploads even after the env revert.
No data is lost on rollback.

If nginx itself misbehaves, swap the enabled site back:

    rm /etc/nginx/sites-enabled/cdn-migration && \
    ln -s ../sites-available/cdn-digitalocean /etc/nginx/sites-enabled/cdn-digitalocean && \
    nginx -t && systemctl reload nginx

That sends files.yttrx.com straight to DO via the legacy single-backend
nginx config.

Remaining work

-   Smoke-test: upload an attachment via the web UI or a mobile client.
    Verify with
    rclone ls r2:yttrx-media/media_attachments/files/<recent-id>/ that
    it landed in R2 not DO.
-   Open task #117 ("Execute or retire the DO Spaces → R2 migration")
    closes after smoke-test passes.
-   Future cleanup (separate change, at least a week out — see
    cdn-s3-migration.md Step 7): drop the @s3_fallback block from
    cdn-migration once R2 has been authoritative without fallback-served
    requests in the access log.
-   DO Spaces decommission once we're confident: cost savings start, and
    the legacy cdn-digitalocean site file can be removed alongside.

------------------------------------------------------------------------

2026-05-17 — Anubis live in front of yttrx.com and masto.yttrx.com

Browser HTML on both vhosts now passes through Anubis (v1.25.0) before
reaching Mastodon. Federation and API traffic (ActivityPub, REST API,
OAuth, the streaming WebSocket, WebFinger, nodeinfo) is bypassed at the
nginx layer so peer instances and mobile clients never see a JS
challenge.

Architecture

    client ──HTTPS──> nginx :443 (TLS terminator, both vhosts)
                      ├─ /.well-known, /inbox, /api, /oauth, …  ──> webbackend :3000      (direct, no anubis)
                      ├─ /api/v1/streaming                       ──> streamingbackend :4000 (direct, no anubis)
                      ├─ /system                                  ──> 301 files.yttrx.com
                      └─ location /                              ──> anubis 127.0.0.1:8923
                                                                        └──> nginx 127.0.0.1:8081 (backend)
                                                                              └──> Mastodon redirects + webbackend

Files

  Path                                         Source of truth                                  Purpose
  -------------------------------------------- ------------------------------------------------ ------------------------------------------------------------------------------
  /etc/nginx/sites-available/mastodon-anubis   scripts/mastodon-anubis.nginx                    Drop-in replacement for the mastodon site; pivot by re-pointing the symlink.
  /etc/nginx/snippets/masto-proxy.conf         scripts/masto-proxy.conf                         Shared proxy header set used by every bypass location.
  /etc/anubis/yttrx.env                        (server-local)                                   BIND/TARGET/policy path, inlined ED25519 key (mode 600).
  /etc/anubis/yttrx.botPolicies.yaml           copy of /usr/share/doc/anubis/botPolicies.yaml   Anubis default policy unchanged.

Service: systemctl status anubis@yttrx.service (uses DynamicUser=yes,
listens on 127.0.0.1:8923, metrics on 127.0.0.1:9090).

Pivot

    # enable
    ln -sfn ../sites-available/mastodon-anubis /etc/nginx/sites-enabled/mastodon && nginx -t && systemctl reload nginx
    # revert
    ln -sfn ../sites-available/mastodon          /etc/nginx/sites-enabled/mastodon && nginx -t && systemctl reload nginx

Gotchas hit during setup

1. COOKIE_DOMAIN and COOKIE_DYNAMIC_DOMAIN are mutually exclusive.
Anubis exits with
you can't set COOKIE_DOMAIN and COOKIE_DYNAMIC_DOMAIN at the same time.
We want dynamic mode since the same instance covers both yttrx.com and
masto.yttrx.com, so COOKIE_DOMAIN is omitted.

2. The Debian package uses DynamicUser=yes (transient uid/gid per
instance). That makes unix-socket permission sharing with nginx's
www-data a hassle. Solution: use TCP loopback both ways
(BIND=127.0.0.1:8923, TARGET=http://127.0.0.1:8081, backend nginx
listen 127.0.0.1:8081). Slight per-request overhead, zero permission
plumbing.

3. ED25519_PRIVATE_KEY_HEX_FILE vs inlining. With DynamicUser=yes the
anubis process can't read a root-owned key file. Inlining the key as
ED25519_PRIVATE_KEY_HEX=<hex> in yttrx.env (mode 600, root-owned) works
because systemd reads the EnvironmentFile as root and passes the var to
the child after privilege drop.

4. Default Anubis policy doesn't challenge bare curl. It WEIGHs
User-Agent: Mozilla|Opera only. A bare curl -I will go straight through
to the Mastodon backend — that's by design, not misconfiguration. Verify
with a browser UA: curl -H 'User-Agent: Mozilla/5.0' https://yttrx.com/.

Verified

  Test                                     Result
  ---------------------------------------- ------------------------------------------------------------------------
  curl /.well-known/webfinger?resource=…   200 from Server: Mastodon (federation bypass works)
  curl /api/v1/instance                    200 from Server: Mastodon
  curl /api/v1/streaming                   400 from streaming backend (HEAD; expected)
  curl /nodeinfo/2.0                       200 from Server: Mastodon
  curl -H 'User-Agent: Mozilla/5.0' /      200 with Anubis challenge HTML (techaro.lol-anubis-* cookies set)
  Browser refresh on yttrx.com             Anubis challenge interstitial then site (user-confirmed)
  Metrics after ~3 min                     252 DENY action="bot/ai-catchall", 346 challenges issued, 54 validated

Rollback signal

If federation backlog grows (sidekiq inbox queue), check the bypass
blocks first. To take Anubis fully out of the path: flip the symlink
back (see "Pivot" above) and systemctl stop anubis@yttrx.service.

------------------------------------------------------------------------

2026-05-17 — files.yttrx.com flipped to mammut: dual-backend (R2 + DO fallback) live

The CDN migration's first production-facing step is done.
files.yttrx.com now resolves through Cloudflare to mammut nginx, which
proxies to R2 first and falls through to DO Spaces on 404. Mastodon
continues to write to DO during the rclone backfill; reads serve from
whichever bucket has the object.

Cutover sequence

1.  Created Cloudflare R2 custom domain files-r2.yttrx.com pointed at
    the yttrx-media bucket — gives anonymous, signed-request-free reads
    from R2.
2.  Provisioned the cdn-migration site on mammut:
    -   /etc/nginx/sites-available/cdn-migration — copy in
        scripts/cdn-migration.nginx
    -   /data/nginx/cache/ (created, then emptied — see below)
    -   /etc/nginx/dmca (empty, just needed for the include)
    -   Reuses the existing /etc/certs/{fullchain,privkey}.pem
        *.yttrx.com wildcard
3.  Symlinked cdn-migration into sites-enabled/, nginx -t, reload.
4.  Flipped the Cloudflare proxy origin for files.yttrx.com from DO
    Spaces to 144.76.4.67 (mammut) in the CF dashboard. CF still fronts
    and terminates TLS; mammut nginx is now the origin behind CF.

Three gotchas to remember

1. nginx needs proxy_ssl_server_name on + proxy_ssl_name for HTTPS
upstreams. Without these, the upstream TLS handshake never sets SNI, and
Cloudflare-fronted hostnames (both files-r2.yttrx.com and the R2 S3
endpoint) reject with SSL alert 40 (handshake failure). Every
proxy_pass https://... location now has these two directives explicitly.

2. add_header only fires for 2xx/3xx by default — use always. Almost
burned us during debugging because failing 404 responses were missing
the X-Upstream / X-Cache-Status headers we'd added for diagnosis, making
it look like nginx had skipped the proxy location entirely when it
actually hadn't. All add_header directives in the config now have
always.

3. Local proxy_cache was a phantom-404 trap and is now disabled. With
proxy_intercept_errors on and error_page 404 = @s3_fallback; at @s3,
R2's 404 response was getting cached at @s3's cache key before the
error_page redirect ran, then served as a phantom HIT 404 to every
subsequent request — even though the fallback to DO Spaces would have
returned 200. Symptoms: rm -rf /data/nginx/cache/* and full
systemctl restart nginx didn't fix it (the shared-memory key zone
retained ghost entries that mapped to absent disk files). Brand-new URLs
that had never been requested returned X-Cache-Status: HIT on first
request, which doesn't make any sense unless the cache layer is broken.

Workaround: remove the proxy_cache mycache; server-level directive and
the proxy_cache_* directives in each location. CF's edge cache sits in
front of mammut now, so caching at the mammut layer is marginal — most
user requests serve from CF edge and never reach the origin. If we ever
need origin-side caching back, the safe approach is probably:

-   Move 404-fallback out of proxy_intercept_errors + error_page (try a
    Lua-based or two-layered design that doesn't share a cache key
    between locations), or
-   Add proxy_no_cache 1 to the @s3 block so it never caches its own
    responses, leaving caching only at @s3_fallback's level.

Verified

  Test                                                   Result
  ------------------------------------------------------ --------------------------------------------------------
  Object in R2                                           200, X-Upstream: r2
  Object in DO only (e.g. /cache/... not yet migrated)   200, X-Upstream: do-spaces
  Object in neither                                      404/403 (browsers render as broken image — acceptable)
  Real DNS through CF                                    200 end-to-end

Mastodon writes still go to DO Spaces (.env.production unchanged; the
timestamped backup at
~mastodon/live/.env.production.pre-r2-migration.20260517-110003 was the
source for the revert after the brief midday outage). rclone sync
continues in tmux session migration; ~28% through 797.5 GiB at time of
cutover. When sync completes:

-   Run a final rclone sync to catch the delta of new uploads written to
    DO during the run
-   rclone check spaces-old:yttrx r2:yttrx-media — verify zero
    mismatches
-   Edit .env.production to point Mastodon writes at R2 (reapply the R2
    values that the timestamped backup replaced; take a new backup
    before this change)
-   docker compose restart web sidekiq streaming
-   Decommission cdn-digitalocean (it's still in sites-available/ as the
    documented rollback target)
-   Eventually drop the @s3_fallback block from cdn-migration once R2 is
    100% authoritative

------------------------------------------------------------------------

2026-05-17 — New mastodon-cleanup.sh replacing dead purge-media.sh

Audited ~mastodon/bin/purge-media.sh (in place since 2025-02-20). It was
dead code: not scheduled anywhere, hasn't run, and wouldn't have worked
if it had — two bugs:

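1.  export $PATH=~mastodon/bin:$PATH, where the leading $ made bash
    expand $PATH on the LHS. export was then handed the old PATH value
    as a variable name and failed with "not a valid identifier" rather
    than updating PATH.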
2.  Called tootctl directly. tootctl doesn't exist on the host; it lives
    inside the live-web-1 container. The PATH hack referenced
    ~mastodon/bin/tootctl (a docker exec -it wrapper), but -it requires
    a TTY and doesn't work under cron anyway.
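
The first bug is easy to reproduce in isolation. A small sketch, with
/opt/example/bin as a placeholder directory:

```shell
# Broken: "$PATH" is expanded before export runs, so the shell sees something like
#   export /usr/bin:/bin=/opt/example/bin:/usr/bin:/bin
# and rejects the left-hand side. The subshell keeps the failure contained.
if ( export $PATH=/opt/example/bin:$PATH ) 2>/dev/null; then
  broken_export=accepted
else
  broken_export=rejected
fi

# Correct: name the variable without "$" on the left-hand side.
fixed_path=$( export PATH=/opt/example/bin:$PATH; printf '%s' "$PATH" )

echo "broken form: ${broken_export}"
echo "fixed PATH starts with: ${fixed_path%%:*}"
```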

Wrote ~mastodon/bin/mastodon-cleanup.sh (copy in
scripts/mastodon-cleanup.sh) as a from-scratch replacement:

-   Uses docker compose exec -T web bin/tootctl directly. The -T
    disables TTY allocation — works in cron.
-   set -euo pipefail. Verifies live-web-1 is running before touching
    anything.
-   flock lock at /tmp/mastodon-cleanup.lock to block concurrent runs.
-   Each step is timed and logged with start/end markers.
-   Owner-runnable by mastodon (already in the docker group), so it can
    be scheduled in the mastodon user crontab without sudo.
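
The structure described above looks roughly like the following. This is
a sketch of the shape, not the deployed script: run_step and main are
hypothetical names, the non-blocking flock -n is an assumption about
how concurrent runs are refused, and strict mode is set inside main
here only so the helpers can be sourced on their own.

```shell
LOCK_FILE=${LOCK_FILE:-/tmp/mastodon-cleanup.lock}

# Run one named step, logging start/end markers and elapsed seconds.
run_step() {
  step_name=$1; shift
  step_start=$(date +%s)
  echo "START ${step_name}"
  "$@"
  echo "END ${step_name} ($(( $(date +%s) - step_start ))s)"
}

main() {
  set -euo pipefail   # bash strict mode, as in the real script

  # Exclusive lock on fd 9; a second concurrent run exits immediately.
  exec 9>"$LOCK_FILE"
  flock -n 9 || { echo "another run holds ${LOCK_FILE}" >&2; exit 1; }

  # Refuse to proceed unless the web container is running.
  [ "$(docker inspect -f '{{.State.Running}}' live-web-1)" = "true" ]

  cd /home/mastodon/live
  # Order matters: pruning accounts can create orphaned media.
  run_step "accounts prune"       docker compose exec -T web bin/tootctl accounts prune
  run_step "media remove-orphans" docker compose exec -T web bin/tootctl media remove-orphans
}

# main "$@"   # left commented out: needs docker and the live compose project
```

Under set -e a failing step aborts the run before its END marker is
written, so a START without a matching END in the log identifies the
step that failed.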

Scope intentionally narrower than the old script — two steps, no
time-based pruning:

    tootctl accounts prune
    tootctl media remove-orphans

Both are content-preserving: they never touch anything a local user
posted or follows. accounts prune removes dormant federated accounts
(never followed locally, not seen for a long time); media remove-orphans
then sweeps any media records left without an owning account. Order
matters — pruning can produce orphans, so the sweep comes second.

The deliberately-omitted retention-based commands
(statuses remove --days N, preview_cards remove --days N,
media remove --days N) are more aggressive and case-by-case; they can be
run by hand when needed rather than on every cron tick.

Not scheduled yet. The old purge-media.sh was never on cron so there's
no migration; deliberately holding off on adding a crontab entry until
the R2 migration is done — running remove-orphans during the rclone sync
would churn deletions across both buckets unnecessarily. Sibling
trim-storage.sh (which runs tootctl media remove --days 14 and logs to
/tmp/media_remove.log) is left alone for now; will fold it in or retire
it after the migration.

Verified docker compose exec -T web bin/tootctl version returns 4.5.9
from the mastodon user, confirming the invocation pattern works. The old
purge-media.sh is left in place but vestigial; safe to delete once the
new script has been exercised at least once.

------------------------------------------------------------------------

2026-05-17 — CDN migration to Cloudflare R2 (in progress)

Started the long-pending DO Spaces → Cloudflare R2 CDN migration
documented in cdn-s3-migration.md. Migration is still running at time of
writing; this entry captures the prep work that's done and the
operational findings that should outlive the migration itself.

What got done

-   R2 bucket created: yttrx-media (Standard storage, region automatic).
    Account ID b10d4c19446fc73dcd3af1145490c01b.
-   R2 API token created: yttrx-media R2 Read/Write, Object R/W, scoped
    to yttrx-media. Token is a Cloudflare User API token (raw value
    prefixed cfut_). Real credentials live in the private secondbrain
    note yttrx-r2-credentials.md and in /root/.config/rclone/rclone.conf
    on mammut — never in this repo.
-   rclone installed and configured on mammut at
    /root/.config/rclone/rclone.conf (mode 0600), with [spaces-old] and
    [r2] remotes. Both verified with rclone lsd.
-   Three nginx site files staged at /etc/nginx/sites-available/:
    -   cdn — original single-backend, untouched
    -   cdn-digitalocean — copy of cdn with header comment; rollback
        target during migration
    -   cdn-migration — dual-backend (R2 primary, DO Spaces fallback
        on 404)

    None enabled yet. See cdn-site.md for the architecture.

Real bucket size

rclone size spaces-old:yttrx against the actual bucket (took 9m41s to
list):

-   1,978,158 objects · 797.5 GiB

This is significantly larger than two other measurements taken earlier
in the day:

-   tootctl media usage: ~737 GB total (sum of categories)
-   DO Spaces billing: 465.16 GiB billable for the 16-day partial month

The DO bill is computed on average storage across the period, not peak —
and a 332 GiB gap between billed-average and actual-now suggests rapid
growth this month (~20 GiB/day) or accumulated orphans/versioning that
the billing amortizes differently. For migration planning, trust
rclone size, not tootctl or the bill.

Throughput findings

Initial sync with --transfers 8 --checkers 16 averaged ~120 MiB/min over
the first ~25 minutes (mix of small avatars/emoji and some real
attachments). Extrapolating: ~113 hours / ~4.7 days for full sync.
Unacceptable for a "maybe-take-the-site-down" migration.
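
The extrapolation is plain arithmetic on the two figures above. As a
sketch (sync_eta is a hypothetical helper name):

```shell
# Estimate full-sync duration from bucket size (GiB) and observed rate (MiB/min).
sync_eta() {
  awk -v gib="$1" -v rate="$2" 'BEGIN {
    minutes = gib * 1024 / rate          # GiB -> MiB, then divide by MiB per minute
    printf "%.1f hours / %.1f days\n", minutes / 60, minutes / 1440
  }'
}

sync_eta 797.5 120   # prints "113.4 hours / 4.7 days"
```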

Killed and restarted with --transfers 64 --checkers 128. rclone is
idempotent — already-copied objects are skipped — so the restart only
costs a few minutes of relisting.

The bottleneck is small-file overhead, not bandwidth. mammut's 1 Gbps
link is barely tickled even at high parallelism. The ~2M objects (avg
~400 KB each) mean rclone is spending most of its time on per-object
HTTP roundtrips. Higher parallelism is the right knob.
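
The average object size follows from the rclone size figures. A quick
check (avg_object_kib is a hypothetical helper name):

```shell
# Average object size in KiB from total size (GiB) and object count.
avg_object_kib() {
  awk -v gib="$1" -v objects="$2" 'BEGIN {
    printf "%.0f KiB\n", gib * 1024 * 1024 / objects
  }'
}

avg_object_kib 797.5 1978158   # prints "423 KiB"
```

That is in line with the rough ~400 KB figure: small enough that
per-request latency, not link speed, sets the pace.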

Brief outage and the "DO writes during sync" pattern

Initially planned Path B (cold cutover): stop nginx + Mastodon, let sync
complete with zero new writes, then flip everything to R2 in one shot.
Ran systemctl stop nginx + docker compose down accordingly, and
pre-emptively rewrote .env.production to point Mastodon at R2.

Realized once the real bucket size landed (797 GiB, multi-day sync at
any reasonable parallelism) that the site can't stay down for the
duration. Switched to Path B-with-deferred-cutover:

1.  Reverted .env.production from the timestamped backup (md5 verified)
    so Mastodon would continue writing to DO Spaces on restart.
2.  docker compose up -d — containers healthy in seconds.
3.  systemctl start nginx — site live again.
4.  rclone keeps running in tmux on mammut, syncing DO → R2 in the
    background.

Total site downtime: ~30 minutes during the prep work. Once rclone exits
and rclone check passes, the real cutover is a short maintenance window:
brief nginx stop, final rclone sync to catch the delta of anything
written to DO since the main sync, swap .env.production to R2, enable R2
anonymous reads, flip files.yttrx.com DNS, restart containers, restart
nginx.

Backups taken

-   ~mastodon/live/.env.production.pre-r2-migration.20260517-110003 —
    pre-edit copy. Restore:
    sudo -u mastodon cp <that file> ~mastodon/live/.env.production.

Operational gotchas to remember

1.  rclone on Debian bookworm (1.60.1) wants provider = DigitalOcean,
    not DigitalOceanSpaces (the latter raises "provider not known").
    Older docs use the wrong name.
2.  R2 cfut_ tokens must be SHA-256 hashed before use as an S3 Secret
    Access Key. The dashboard's "use the token value above" wording is
    misleading. Recipe: printf '%s' 'cfut_...' | sha256sum.
3.  Bucket-scoped R2 tokens need no_check_bucket = true in rclone —
    they're authorized for object R/W but not for HeadBucket. Without
    this rclone fails at startup.
4.  Mastodon .env.production does NOT use the same credentials path as
    rclone. rclone reads /root/.config/rclone/rclone.conf. Changing one
    doesn't affect the other.
5.  S3_ALIAS_HOST=files.yttrx.com decouples Mastodon's URL generation
    from the bucket location — that's what makes the eventual DNS
    cutover transparent to users.
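
For gotcha 2, the same hashing step can be done in Python when the secret is generated from a script rather than a shell. This is a minimal sketch; the function name is hypothetical, and the token shown is a placeholder, not a real credential.

```python
import hashlib


def r2_secret_access_key(api_token: str) -> str:
    """Return the hex SHA-256 digest of the token value.

    Equivalent to: printf '%s' '<token>' | sha256sum
    (no trailing newline is hashed).
    """
    return hashlib.sha256(api_token.encode("utf-8")).hexdigest()


# Placeholder token for illustration only.
secret = r2_secret_access_key("cfut_placeholder-not-a-real-token")
print(len(secret))   # a SHA-256 hex digest is always 64 characters
```

Avoid printing or logging the digest itself in real use; it is the S3 secret.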

Remaining work (post-migration)

-   Final delta rclone sync after Mastodon writes are cut over
-   rclone check zero-mismatch verification
-   Enable R2 anonymous reads (likely "Connect Custom Domain →
    files.yttrx.com", or pub-*.r2.dev if keeping mammut on the path)
-   Apply R2 values to .env.production, restart containers
-   Flip files.yttrx.com DNS to R2 (if going custom-domain route)
-   Update Step 7 of cdn-s3-migration.md to reflect actual decision on
    architecture

------------------------------------------------------------------------

2026-05-05 — Firewall: block external access to PostgreSQL port 5432

PostgreSQL was listening on all interfaces (listen_addresses = '*') with
no firewall, leaving port 5432 reachable from the internet. pg_hba.conf
was rejecting the connection attempts, but bots were still making TCP
connections several times per hour (visible in the logs:
FATAL: no pg_hba.conf entry for host "179.43.186.223" etc.).

The Mastodon containers connect to PostgreSQL using DB_HOST=144.76.4.67
(the host's public IP), so the fix couldn't simply be binding to
localhost — it needed to allow the Docker bridge subnets through.

Installed iptables-persistent and added three INPUT rules:

    -A INPUT -s 127.0.0.0/8   -p tcp --dport 5432 -j ACCEPT   # localhost
    -A INPUT -s 172.16.0.0/12 -p tcp --dport 5432 -j ACCEPT   # all Docker subnets (172.16–172.31)
    -A INPUT -p tcp --dport 5432 -j DROP                       # everything else

Docker networks in use at time of change:

  Network                 Subnet
  ----------------------- ---------------
  bridge (default)        172.17.0.0/16
  live_external_network   172.18.0.0/16
  live_internal_network   172.19.0.0/16
  finger_default          172.20.0.0/16
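
A quick way to sanity-check that the single 172.16.0.0/12 rule covers every Docker network in the table is to model the three rules in order. This sketch only models source-address matching for TCP 5432; it ignores interfaces, other chains and IPv6, and the helper name is illustrative.

```python
import ipaddress

# The three INPUT rules for port 5432, in evaluation order.
RULES = [
    (ipaddress.ip_network("127.0.0.0/8"), "ACCEPT"),
    (ipaddress.ip_network("172.16.0.0/12"), "ACCEPT"),
    (ipaddress.ip_network("0.0.0.0/0"), "DROP"),
]

DOCKER_SUBNETS = {
    "bridge (default)": "172.17.0.0/16",
    "live_external_network": "172.18.0.0/16",
    "live_internal_network": "172.19.0.0/16",
    "finger_default": "172.20.0.0/16",
}


def verdict(source_ip: str) -> str:
    """Return the action of the first rule whose source matches."""
    addr = ipaddress.ip_address(source_ip)
    for network, action in RULES:
        if addr in network:
            return action
    return "DROP"  # unreachable for IPv4: the last rule matches everything


docker_range = RULES[1][0]
all_covered = all(
    ipaddress.ip_network(subnet).subnet_of(docker_range)
    for subnet in DOCKER_SUBNETS.values()
)
print(all_covered)                   # every Docker subnet is inside the /12
print(verdict("172.18.0.5"))         # a container address
print(verdict("179.43.186.223"))     # the bot address from the log excerpt
```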

Rules saved to /etc/iptables/rules.v4 and load at boot via
netfilter-persistent.service. Mastodon health check confirmed OK after
rules were applied.

Note: Docker was already blocking external access to ports 3000, 4000,
and 9200 via raw PREROUTING rules it manages itself — 5432 was the only
gap.

Rollback

    iptables -D INPUT -s 127.0.0.0/8 -p tcp --dport 5432 -j ACCEPT
    iptables -D INPUT -s 172.16.0.0/12 -p tcp --dport 5432 -j ACCEPT
    iptables -D INPUT -p tcp --dport 5432 -j DROP
    netfilter-persistent save

------------------------------------------------------------------------

2026-05-05 — PostgreSQL tuning

Reviewed PostgreSQL logs and found six misconfigurations on mammut, all
stemming from the default postgresql.conf being unchanged since
installation despite the server having 62 GB RAM and NVMe RAID storage.

File: /etc/postgresql/15/main/postgresql.conf

  Parameter              Before           After   Reason
  ---------------------- ---------------- ------- -----------------------------------------------------------------------
  shared_buffers         128MB            8GB     Default is for tiny servers; 8GB gives PostgreSQL its own buffer pool
  effective_cache_size   4GB              40GB    Planner hint; now reflects actual available RAM (~57 GB free)
  random_page_cost       4.0              1.1     4.0 assumes spinning disks; server has NVMe RAID 1
  work_mem               4MB              32MB    More memory per sort/hash operation
  maintenance_work_mem   64MB             512MB   Faster autovacuum and index builds
  wal_buffers            -1 (auto ~4MB)   32MB    Explicit sizing for the WAL write buffer

shared_buffers and wal_buffers require a full restart; the others only
need a reload. PostgreSQL was restarted with
systemctl restart postgresql. Mastodon reconnected automatically (health
check confirmed OK immediately after).
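
On the wal_buffers row: with the setting at -1, PostgreSQL sizes the buffer automatically from shared_buffers. The sketch below follows the rule described in the PostgreSQL documentation (1/32 of shared_buffers, at least 64 kB, at most one WAL segment) and assumes the default 16 MB segment size; the helper name is illustrative.

```python
KB = 1024
MB = 1024 * KB
GB = 1024 * MB


def auto_wal_buffers(shared_buffers: int, wal_segment_size: int = 16 * MB) -> int:
    """Size used when wal_buffers = -1, in bytes."""
    return max(64 * KB, min(shared_buffers // 32, wal_segment_size))


print(auto_wal_buffers(128 * MB) // MB)   # auto size with the old shared_buffers
print(auto_wal_buffers(8 * GB) // MB)     # auto size with shared_buffers = 8GB
```

Under that rule the old configuration resolved to 4 MB, and leaving wal_buffers on auto after raising shared_buffers would have capped at 16 MB, so the explicit 32MB is above what auto-tuning would pick.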

Rollback

    # Revert postgresql.conf to stock values. shared_buffers ships
    # uncommented at 128MB; the other five ship commented out.
    sed -i \
      -e 's/^shared_buffers = 8GB/shared_buffers = 128MB/' \
      -e 's/^work_mem = 32MB/#work_mem = 4MB/' \
      -e 's/^maintenance_work_mem = 512MB/#maintenance_work_mem = 64MB/' \
      -e 's/^wal_buffers = 32MB/#wal_buffers = -1/' \
      -e 's/^random_page_cost = 1\.1/#random_page_cost = 4.0/' \
      -e 's/^effective_cache_size = 40GB/#effective_cache_size = 4GB/' \
      /etc/postgresql/15/main/postgresql.conf
    systemctl restart postgresql

------------------------------------------------------------------------

2026-05-03 — nginx caching & real IP passthrough

nginx config backup

Before making any changes, a full backup of all nginx configs was
created at:

    /etc/nginx/backup-20260503-164014/

Contains: nginx.conf, sites-available/*, sites-enabled/*.

To roll back any site config:

    cp /etc/nginx/backup-20260503-164014/sites-available/<name> /etc/nginx/sites-available/
    nginx -t && systemctl reload nginx

Access pattern analysis

Reviewed /var/log/nginx/*.log* across all vhosts. Top findings:

  URL                                      200 hits   Notes
  ---------------------------------------- ---------- --------------------------------------
  /                                        190k       coefficiencies.com homepage
  /finger/waffles@yttrx.com                115k       98.2% from 6,652 Fediverse instances
  /posts/index.xml                         101k       RSS feed, static file
  /Bingham.json                            73k        json.tommertron.com data file
  /help/2023/03/12/welcome-to-yttrx.html   56k        static HTML

Traffic is flat at ~50k req/hr with no peak window. Server has 960MB RAM,
362MB available, no swap.

nginx.conf — gzip, proxy cache, open file cache

File: /etc/nginx/nginx.conf

Replaced the sparse commented-out gzip block with full settings, and
added two new cache sections:

    gzip on;
    gzip_vary on;
    gzip_proxied any;
    gzip_comp_level 5;
    gzip_buffers 16 8k;
    gzip_http_version 1.1;
    gzip_types text/plain text/css application/json application/javascript
               text/xml application/xml application/xml+rss text/javascript
               image/svg+xml application/manifest+json;

    proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=yttrx_cache:5m
                     max_size=50m inactive=5m use_temp_path=off;

    open_file_cache          max=2000 inactive=60s;
    open_file_cache_valid    120s;
    open_file_cache_min_uses 2;
    open_file_cache_errors   on;

-   gzip: previously only text/html was compressed (the default); JSON,
    XML, SVG, webmanifest were not.
-   proxy_cache_path: 50MB on-disk cache at /var/cache/nginx, 5MB
    in-memory key zone (yttrx_cache). Used by the finger proxy (see
    below). Cache directory created and owned by www-data.
-   open_file_cache: keeps file descriptors and stat() results in worker
    memory for static files, avoiding a filesystem call per request.

finger.yttrx.com — proxy response cache

File: /etc/nginx/sites-available/finger

Added proxy caching to the location / block inside the HTTPS server. The
finger service (:5000, Python/Werkzeug) was receiving 115k hits/day,
almost entirely from Mastodon instances polling on staggered schedules.

    proxy_cache            yttrx_cache;
    proxy_cache_valid      200 30s;
    proxy_cache_use_stale  error timeout updating http_500 http_502 http_503 http_504;
    proxy_cache_lock       on;
    add_header             X-Cache-Status $upstream_cache_status;

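-   30-second TTL keeps data near-realtime while cutting upstream hits
    to roughly one refresh per cached URL every 30 seconds
    (proxy_cache_lock and proxy_cache_use_stale updating keep
    concurrent requests from each hitting the upstream).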
-   X-Cache-Status header added so cache behaviour is observable
    (MISS/HIT/EXPIRED).
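
To put a rough number on the effect of the 30-second TTL, here is an upper-bound estimate. It assumes all 115k daily hits share a single cache key and that the cache refreshes at most once per TTL, so real figures will differ; the names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def max_upstream_fetches_per_day(ttl_seconds: int) -> int:
    """Most refreshes one cached URL can trigger per day at this TTL."""
    return SECONDS_PER_DAY // ttl_seconds


hits_per_day = 115_000
fetches = max_upstream_fetches_per_day(30)
offload = 1 - fetches / hits_per_day
print(fetches)             # prints: 2880
print(f"{offload:.1%}")    # prints: 97.5%
```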

Static asset cache headers — tommertron, help, coefficiencies

Files: sites-available/tommertron, sites-available/help,
sites-available/coefficiencies

Added location blocks before the catch-all location / in each HTTPS
server block:

    # Content-hashed JS/CSS — safe to cache forever
    location ~* \.(css|js)$ {
        try_files $uri =404;
        expires 1y;
        add_header Cache-Control "public, immutable";
        access_log off;
    }

    # Images, fonts, icons — 30 days
    location ~* \.(png|jpg|jpeg|gif|ico|svg|webp|woff|woff2|ttf|webmanifest)$ {
        try_files $uri =404;
        expires 30d;
        add_header Cache-Control "public";
        access_log off;
    }

    # XML feeds (RSS, sitemap) — 1 hour  [tommertron/help only]
    location ~* \.xml$ {
        try_files $uri =404;
        expires 1h;
        add_header Cache-Control "public";
    }

JS/CSS filenames on these sites are content-hashed (e.g.
appearance.min.8a082f81...js), so immutable + 1-year expiry is safe —
browsers and Cloudflare will never revalidate them unnecessarily.

Note: coefficiencies.com app routes (/packing, /mortgage, /charades,
/api) proxy to :8081 and are intentionally not cached.
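
The three regex locations can be exercised offline to see which rule a given path falls under. This sketch reuses the same patterns with Python's re module, case-insensitive to mirror ~*, and checks them in order as nginx does for regex locations. It ignores prefix locations and query strings, and the hashed filename below is hypothetical.

```python
import re

# (pattern, cache lifetime) in the same order as the location blocks.
CACHE_RULES = [
    (re.compile(r"\.(css|js)$", re.IGNORECASE), "1y"),
    (re.compile(
        r"\.(png|jpg|jpeg|gif|ico|svg|webp|woff|woff2|ttf|webmanifest)$",
        re.IGNORECASE), "30d"),
    (re.compile(r"\.xml$", re.IGNORECASE), "1h"),
]


def cache_lifetime(path: str):
    """Return the lifetime of the first matching rule, or None."""
    for pattern, lifetime in CACHE_RULES:
        if pattern.search(path):
            return lifetime
    return None


for path in ("/js/app.min.0123abcd.js",   # hypothetical hashed asset
             "/posts/index.xml",
             "/Bingham.json",
             "/packing"):
    print(path, cache_lifetime(path))
```

Paths that match none of the rules, such as /Bingham.json and the app routes, fall through to the existing location blocks.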

Real IP passthrough (Cloudflare)

File: /etc/nginx/conf.d/cloudflare-real-ip.conf

The file already existed but had two issues:

1.  Missing the 173.245.48.0/20 Cloudflare range
2.  Missing real_ip_recursive on — without this, nginx only strips one
    proxy hop; Cloudflare can add multiple, leaving a CF IP in
    $remote_addr

Updated file now includes all 15 IPv4 ranges and 7 IPv6 ranges from
https://www.cloudflare.com/ips-v4 / ips-v6, plus:

    real_ip_header    CF-Connecting-IP;
    real_ip_recursive on;

All access logs going forward show actual client IPs. Historical logs
(pre-reload) still contain Cloudflare proxy IPs.
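
Why a missing range matters can be shown with a simplified model of the trusted-proxy check: the header is only honoured when the connecting peer is inside a trusted range. This is a sketch, not nginx's actual implementation. It lists only the one range named above, does not model real_ip_recursive, and uses documentation addresses for the clients.

```python
import ipaddress

# Only the range named in this entry; the real file lists all of them.
TRUSTED_PROXIES = [ipaddress.ip_network("173.245.48.0/20")]


def logged_client_ip(peer_ip, cf_connecting_ip, trusted=TRUSTED_PROXIES):
    """Return the address that would end up in the access log."""
    peer = ipaddress.ip_address(peer_ip)
    if cf_connecting_ip and any(peer in net for net in trusted):
        return cf_connecting_ip
    return peer_ip


# Peer inside the trusted range: the header value is used.
print(logged_client_ip("173.245.48.10", "198.51.100.7"))
# Same request with the range missing: the Cloudflare IP is logged.
print(logged_client_ip("173.245.48.10", "198.51.100.7", trusted=[]))
# Untrusted peer sending the header itself: the header is ignored.
print(logged_client_ip("203.0.113.9", "198.51.100.7"))
```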