The Afternoon A Full /var Took The Service Down
#debugging#ops
The Bug That Refused To Exist Under strace
#debugging#ops
The Bug That Only Existed When I Was Not Looking
#debugging#ops
A systemd Unit That Refused To Stay Dead
#linux#sysadmin#ops
The Day It Was, In Fact, DNS
#debugging#ops
The systemd Unit That Refused to Stay Dead
#linux#sysadmin#ops
Cache Misses, and Why the Fast Version Was Slow
#performance#ops
A Memory Leak That Was a Map I Never Cleared
#debugging#ops
Moving Root Onto ZFS
#linux#sysadmin#ops
The Service That Wouldn't Stay Stopped
#linux#sysadmin#ops
A Kernel Panic I Could Actually Reproduce
#linux#sysadmin#ops
Three Days Lost to a Race I Couldn't Reproduce
#debugging#ops
A Runaway Process, And The cgroup That Caught It
#linux#sysadmin#ops
The Off-by-One That Three of Us Approved
#debugging#ops
When logrotate Wins the Rotation but Loses the Logs
#linux#sysadmin#ops
cgroups v2 and a Runaway Process
#linux#sysadmin#ops
The Bug Was in My Head, Not the Function
#debugging#ops
It Was the MTU. It's Always the MTU.
#debugging#ops
tcpdump Saved Me Again
#debugging#ops
The Off-By-One That Three of Us Approved
#debugging#ops
When the Code Was Right and I Was the Bug
#debugging#ops
The Bug Was in My Assumptions, Not the Code
#debugging#ops
the leak was a map that only ever grew
#debugging#ops
the query that was fine until it wasn't
#performance#ops
when in doubt, watch the wire
#debugging#ops
the fast version was slower, and the cache was why
#performance#ops
what a syscall actually costs, with numbers
#performance#ops
the cron job that fired twice and told no one
#debugging#ops
how cgroups v2 caught a runaway before it took the box down
#linux#sysadmin#ops
making peace with rootless podman
#linux#sysadmin#ops
how much swap, and why i stopped arguing about it
#linux#sysadmin#ops
a kernel panic that finally held still long enough to catch
#linux#sysadmin#ops
three days lost to a gap between two lines
#debugging#ops
when journald quietly ate forty gigabytes
#linux#sysadmin#ops
moving root onto zfs and not regretting it
#linux#sysadmin#ops
the day journald quietly ate forty gigabytes
#linux#sysadmin#ops
when journald quietly ate twelve gigs
#linux#sysadmin#ops
the cron job that ran twice and told no one
#debugging#ops
how much swap, finally answered for my own boxes
#linux#sysadmin#ops
moving root onto zfs without losing my nerve
#linux#sysadmin#ops
the cron job that fired twice and told no one
#debugging#ops
it was the mtu, it's always the mtu
#debugging#ops
perf top on a box that had no business being busy
#performance#ops
the year i stopped writing iptables rules by hand
#linux#sysadmin#ops
what a syscall actually costs you, with numbers
#performance#ops
how much does crossing into the kernel actually cost
#performance#ops
the leak was a cache that only ever grew
#debugging#ops
when the writeback storm hits and everything stalls
#linux#sysadmin#ops
the average is lying to you, look at p99
#performance#ops
watching syscalls without a debugger holding the door open
#performance#ops
the query that wasn't slow, it was just run a million times
#performance#ops
the process that ate a box and the cgroup that didn't
#linux#sysadmin#ops
stopping journald from eating the disk
#linux#sysadmin#ops
the service that wouldn't die, and the restart loop that hid it
#linux#sysadmin#ops
chasing a race condition for three days
#debugging#ops
the fast version was slower, and the cache told me why
#performance#ops
three days for a missing lock
#debugging#ops
the average response time that hid a fire
#performance#ops
the query that was quietly killing us
#performance#ops
three days chasing a bug that only existed sometimes
#debugging#ops
the cost of a syscall, measured
#performance#ops
the day dns took down everything
#debugging#ops
poking at io_uring for an afternoon
#linux#sysadmin#ops
moving root onto zfs without losing my nerve
#linux#sysadmin#ops
the query that was fine until it wasn't
#performance#ops
the rollback that turned a bad upgrade into a non-event
#linux#sysadmin#ops
the cron job that ran twice and told no one
#debugging#ops
taming journald disk usage
#linux#sysadmin#ops
the backup that overlapped itself and nobody noticed
#debugging#ops
how much swap, and the answer nobody on the internet agrees on
#linux#sysadmin#ops
running containers rootless without it ruining my week
#linux#sysadmin#ops
the leak was a map, and the map was me
#debugging#ops
putting a runaway process back in its box with cgroups v2
#linux#sysadmin#ops
chasing a race condition for three days
#debugging#ops
the bug that only existed when nobody was watching
#debugging#ops
the code was fine, i was the bug
#debugging#ops
the fast version was slower, and the cache told me why
#performance#ops
small packets fine, large packets gone: it was the mtu
#debugging#ops
it was never the database, it was dns
#debugging#ops
when the box freezes for a second, look at dirty_ratio
#linux#sysadmin#ops
the leak was a map, and the map was me
#debugging#ops
when the writeback stalls everything
#linux#sysadmin#ops
the cron job that ran twice and told no one
#debugging#ops
a kernel panic i could actually reproduce
#linux#sysadmin#ops
cache misses, and why the fast version was slower
#performance#ops
perf top on a box that should have been idle
#performance#ops
the leak was a map i forgot to empty
#debugging#ops
it was never the network, it was always dns
#debugging#ops
the outage was a full disk, the disk was full of logs about the outage
#debugging#ops
the code was fine, my mental model was the bug
#debugging#ops
the backup that ran on two boxes and nobody noticed
#debugging#ops
when journald ate half my root partition
#linux#sysadmin#ops
a kernel panic i could actually reproduce
#linux#sysadmin#ops
the flamegraph that pointed at the wrong hero
#performance#ops
the query that wasn't slow, it was just always running
#performance#ops
io_uring, and the joy of not making a syscall per read
#linux#sysadmin#ops
the leak was a map i forgot to delete from
#debugging#ops
the bug that only happened when nobody was watching
#debugging#ops
rootless containers, and making peace with subuid
#linux#sysadmin#ops
it's always dns, and this time it really was
#debugging#ops
how a single full filesystem took down a perfectly healthy service
#debugging#ops
a database query that was quietly killing us
#performance#ops
perf top on a box that shouldn't be busy
#performance#ops
the cron job that ran twice and told nobody
#debugging#ops
it was the mtu, it's always the mtu
#debugging#ops
the process that ate the box, and the cgroup that fenced it in
#linux#sysadmin#ops
when the logs lie, the wire doesn't
#debugging#ops
the query that was fine until it wasn't
#performance#ops
the app that kept writing to a deleted log file
#linux#sysadmin#ops
nftables instead of iptables, finally
#linux#sysadmin#ops
when the app, the logs and the dashboards all lied, tcpdump didn't
#debugging#ops
the cron job that ran twice and told no one
#debugging#ops
the average latency was fine, which is why everyone was angry
#performance#ops
yes i still run swap, even with plenty of ram
#linux#sysadmin#ops
when the page cache fights back
#linux#sysadmin#ops
the outage that was just a disk quietly filling up
#debugging#ops
how btrfs snapshots gave me my weekend back
#linux#sysadmin#ops
btrfs snapshots saved my weekend
#linux#sysadmin#ops
rewriting the firewall in nftables
#linux#sysadmin#ops
the box that was busy doing nothing
#performance#ops
when a busy box stalls every thirty seconds, look at dirty_ratio
#linux#sysadmin#ops
the bug that only existed when nobody was watching
#debugging#ops
when the app won't reopen its logs
#linux#sysadmin#ops
the idle box that wasn't
#performance#ops
i finally moved the firewall to nftables
#linux#sysadmin#ops
the runaway process that cgroups v2 quietly contained
#linux#sysadmin#ops
the day journald ate the root partition
#linux#sysadmin#ops
the systemd unit that would not stay dead
#linux#sysadmin#ops
the map that ate all the memory
#debugging#ops
the off-by-one three of us missed
#debugging#ops
the loop that processed every day except the last one
#debugging#ops
the off-by-one three people signed off on
#debugging#ops
the flame graph that pointed at a function i'd forgotten existed
#performance#ops
perf top, and an idle box that wasn't
#performance#ops
putting root on zfs, and why i would do it again
#linux#sysadmin#ops
the service that would not die
#linux#sysadmin#ops
putting a leash on a process that ate the box
#linux#sysadmin#ops
the night a logfile took the service down
#debugging#ops
the rare gift of a kernel panic that reproduces on demand
#linux#sysadmin#ops
the connection that hung at exactly the wrong size
#debugging#ops
when logrotate quietly stops working because the app won't reopen
#linux#sysadmin#ops
rootless podman, and the uid map that finally clicked
#linux#sysadmin#ops
when logrotate and your app disagree about SIGHUP
#linux#sysadmin#ops
what a syscall actually costs you
#performance#ops
how much swap does a homelab actually need
#linux#sysadmin#ops
the leak that was just a map nobody ever emptied
#debugging#ops
when logrotate and a deaf daemon disagree
#linux#sysadmin#ops
i was certain it was a race condition
#debugging#ops
a btrfs snapshot bought back my saturday
#linux#sysadmin#ops
when the write cache fights back: tuning dirty_ratio
#linux#sysadmin#ops
the flamegraph that pointed at logging, of all things
#performance#ops
stop letting journald eat your root partition
#linux#sysadmin#ops
how much does a syscall actually cost?
#performance#ops
the service that would not die
#linux#sysadmin#ops
flamegraphs and a hot path i never suspected
#performance#ops
the cost of a syscall, measured
#performance#ops
the average latency that lied to my face
#performance#ops
the bug that only existed when nobody was watching
#debugging#ops
rootless podman without the rage
#linux#sysadmin#ops
the flamegraph that pointed at the wrong file entirely
#performance#ops
the optimisation that made everything slower
#performance#ops
the leak was a map nobody ever deleted from
#debugging#ops
stopping journald from eating the disk
#linux#sysadmin#ops
when one process ate the box, and cgroups v2 finally fenced it in
#linux#sysadmin#ops
the off-by-one that three of us missed
#debugging#ops
the flamegraph pointed somewhere stupid
#performance#ops
btrfs snapshots saved my weekend
#linux#sysadmin#ops
a fence round the process that kept eating the box
#linux#sysadmin#ops
it was the mtu, it's always the mtu
#debugging#ops
moving root onto zfs and not regretting it
#linux#sysadmin#ops
the service that resurrected itself every ten seconds
#linux#sysadmin#ops
the connection that worked until the payload got big
#debugging#ops
the log file that grew forever because nobody told the app
#linux#sysadmin#ops
moving root onto zfs and not regretting it
#linux#sysadmin#ops
i finally moved off iptables
#linux#sysadmin#ops
small packets fine, big packets gone
#debugging#ops
the code was correct, my mental model wasn't
#debugging#ops
the afternoon one PHP worker tried to eat the box
#linux#sysadmin#ops
the average was fine and the customers were furious
#performance#ops
logrotate did its job, the app kept writing to a deleted file
#linux#sysadmin#ops
the query that was fine until it wasn't
#performance#ops
when in doubt, watch the wire
#debugging#ops
i finally rewrote my firewall in nftables
#linux#sysadmin#ops
rootless containers without losing my mind
#linux#sysadmin#ops
the runaway process that cgroups v2 caught for me
#linux#sysadmin#ops
the optimisation that made everything slower
#performance#ops
the average latency is lying to you
#performance#ops
the query that only got slow once it mattered
#performance#ops
the log that wouldn't rotate
#linux#sysadmin#ops
the journal grew until the disk noticed
#linux#sysadmin#ops
the outage that was just a full /var
#debugging#ops
the rollback that turned a disaster into a five minute job
#linux#sysadmin#ops
stop journald from eating the disk
#linux#sysadmin#ops
the cost of a syscall, measured
#performance#ops
rootless containers, and the few things that bite
#linux#sysadmin#ops
ebpf, or how i stopped guessing and watched the kernel
#performance#ops
taming journald disk usage
#linux#sysadmin#ops
the swap debate, settled for my homelab
#linux#sysadmin#ops
getting rootless containers working without losing my mind
#linux#sysadmin#ops
the outage that was just /var filling up
#debugging#ops
i finally moved off iptables, and it was overdue
#linux#sysadmin#ops
latency, p99, and the averages that lie to you
#performance#ops
when the box freezes for a second every minute
#linux#sysadmin#ops
three days for a bug that only happened when nobody was looking
#debugging#ops
it wasn't the database, it wasn't the network, it was dns again
#debugging#ops
caging a runaway process with cgroups v2
#linux#sysadmin#ops
ebpf, or finally being able to ask the kernel a question
#performance#ops
how much swap, and why i stopped arguing about it
#linux#sysadmin#ops
a btrfs snapshot turned a wrecked upgrade into a five-second rollback
#linux#sysadmin#ops
the app that politely ignored logrotate
#linux#sysadmin#ops
the night /var filled and took everything with it
#debugging#ops
the flamegraph pointed at the one function i'd ruled out
#performance#ops
the off-by-one we all signed off on
#debugging#ops
tcpdump saved me again
#debugging#ops
taming writeback with dirty_ratio
#linux#sysadmin#ops
the rare luxury of a kernel panic that came back on demand
#linux#sysadmin#ops
the outage was just /var being full
#debugging#ops
the off-by-one we all read and nobody saw
#debugging#ops
the bits of root-on-zfs nobody warns you about
#linux#sysadmin#ops
moving root onto zfs and not regretting it
#linux#sysadmin#ops
when the application logs lie, the wire doesn't
#debugging#ops
the code was fine, i was wrong
#debugging#ops
the off-by-one four of us read and none of us saw
#debugging#ops
the cron job that ran twice and told nobody
#debugging#ops
three days for a bug that lived in a missing word
#debugging#ops
the rewrite that was meant to be faster, and was not
#performance#ops
how much swap, and why i stopped arguing about it
#linux#sysadmin#ops
how much swap, and why i finally stopped arguing about it
#linux#sysadmin#ops
when logrotate works but the app keeps writing to the deleted file
#linux#sysadmin#ops
when the logs lie, tcpdump tells the truth
#debugging#ops
the query that was quietly killing us
#performance#ops
the fast version was slower, and the cache told me why
#performance#ops
bpftrace, and finally being able to ask the kernel a question
#performance#ops
the long version: why your write-heavy box stalls, and how to stop it
#linux#sysadmin#ops
the page cache was lying to me about disk writes
#linux#sysadmin#ops
the average is fine, the p99 is on fire
#performance#ops
the cron job that ran twice and told no one
#debugging#ops
small packets fine, big packets gone
#debugging#ops
the bug that only happened when nobody was watching
#debugging#ops
the query that was fine until it wasn't
#performance#ops
when a snapshot turned a ruined saturday into a five-minute rollback
#linux#sysadmin#ops
putting root on zfs, and what i actually gained
#linux#sysadmin#ops
the service that wouldn't stay stopped
#linux#sysadmin#ops
the idle box that was pegging a core
#performance#ops
when journald quietly ate a third of the disk
#linux#sysadmin#ops
i finally made my peace with swap
#linux#sysadmin#ops
a fencepost error nobody saw because it looked right
#debugging#ops
the off-by-one that three of us signed off on
#debugging#ops
perf top on a box that should have been idle
#performance#ops
it was dns, it is always dns
#debugging#ops
the runaway process, revisited: cpu and io weights in cgroups v2
#linux#sysadmin#ops
cgroups v2 and a runaway process
#linux#sysadmin#ops
the fencepost that three of us read and none of us saw
#debugging#ops
small packets fine, big packets gone, and a tunnel in the middle
#debugging#ops
the flamegraph that pointed at the one function i trusted
#performance#ops
when nothing was down except the names
#debugging#ops
the bug that only existed when nobody was looking
#debugging#ops
an idle box at forty percent cpu, and what perf top told me
#performance#ops
when the writeback stalls everything
#linux#sysadmin#ops
the cron job that ran twice and told nobody
#debugging#ops
half the requests worked, which is how i knew it was the mtu
#debugging#ops
a runaway process and the cgroup that caught it
#linux#sysadmin#ops
what does a syscall actually cost?
#performance#ops
the disk wasn't full, /var was
#debugging#ops
capping a runaway with one cgroups v2 line
#linux#sysadmin#ops
watching syscalls without a debugger in sight
#performance#ops
a kernel panic that did me the courtesy of being repeatable
#linux#sysadmin#ops
chasing a race condition for three days
#debugging#ops
i finally moved my firewall to nftables
#linux#sysadmin#ops
the query that ate a node every afternoon
#performance#ops
the rollback that turned a disaster into a footnote
#linux#sysadmin#ops
the "optimised" loop that ran slower than the naive one
#performance#ops
the log that kept growing after rotation
#linux#sysadmin#ops
how much swap, and the answer i finally stopped arguing about
#linux#sysadmin#ops
three days for a bug that only existed when I wasn't looking
#debugging#ops
when logrotate wins and the app keeps writing to the old file
#linux#sysadmin#ops
three days for a missing mutex
#debugging#ops
the average is lying to you
#performance#ops
when in doubt, put it on the wire and watch
#debugging#ops
it's always dns, and this time it really was
#debugging#ops
the disk wasn't full, only the partition that mattered
#debugging#ops
the bug that only existed when nobody was watching
#debugging#ops
io_uring, first impressions
#linux#sysadmin#ops
when nothing made sense, the wire did
#debugging#ops
the code was fine, i wasn't
#debugging#ops
the panic that turned up on demand
#linux#sysadmin#ops
the outage caused by a full /var
#debugging#ops
the query that was bleeding us for months
#performance#ops
stop the kernel hoarding dirty pages
#linux#sysadmin#ops
rootless containers, and the subuid rabbit hole
#linux#sysadmin#ops
the night a build job ate the whole machine
#linux#sysadmin#ops
when nobody believes the network, run tcpdump
#debugging#ops
three days, one race, and a log line that lied to me
#debugging#ops
perf top on a box that shouldn't be busy
#performance#ops
a kernel panic I could actually reproduce
#linux#sysadmin#ops
perf top on a box that shouldn't be busy
#performance#ops
watching the kernel work, finally, with ebpf
#performance#ops
the off-by-one that three of us approved
#debugging#ops
perf top on a box that had no business being busy
#performance#ops
it's never dns, until it is
#debugging#ops
moving the firewall to nftables, at last
#linux#sysadmin#ops
when in doubt, put it on the wire
#debugging#ops
the night a forgotten log file took down the lot
#debugging#ops
three days hunting a race condition that only existed under load
#debugging#ops
the unit that would not die
#linux#sysadmin#ops
the day /var filled up and took everything with it
#debugging#ops
a database query that was quietly killing us
#performance#ops
a systemd unit that refused to stay dead
#linux#sysadmin#ops
the day dns took down everything
#debugging#ops
the query that worked fine until it didn't
#performance#ops
the average is fine, which is exactly the problem
#performance#ops
who is eating all the cpu on an idle server
#performance#ops
the query that wasn't slow, just slow ten thousand times
#performance#ops
the outage caused by a full /var
#debugging#ops
the leak was a map i forgot to ever delete from
#debugging#ops
the orm that hid a thousand queries
#performance#ops
how much does a syscall actually cost
#performance#ops
i finally moved off iptables, and i'm not going back
#linux#sysadmin#ops
when a box stalls every thirty seconds, look at dirty_ratio
#linux#sysadmin#ops
the idle server that wasn't idle
#performance#ops
when in doubt, look at the wire
#debugging#ops
the flamegraph that pointed at the wrong thing, then the right one
#performance#ops
the off-by-one three of us read and approved
#debugging#ops
putting root on zfs without regretting it
#linux#sysadmin#ops
yes, i still give my servers swap
#linux#sysadmin#ops
the day i could finally watch the kernel work
#performance#ops
putting root on zfs, and why i finally bothered
#linux#sysadmin#ops
three days hunting a bug that only happened when i wasn't looking
#debugging#ops
the one panic that didn't waste my week
#linux#sysadmin#ops
when /var fills up and everything gets weird
#debugging#ops
the panic that only showed up under load
#linux#sysadmin#ops
a first poke at io_uring
#linux#sysadmin#ops
the average that hid the outage
#performance#ops
the service that kept coming back
#linux#sysadmin#ops
watching the kernel without a debugger
#performance#ops
how much swap, then? what i actually do on my own boxes
#linux#sysadmin#ops
the systemd unit that would not stay dead
#linux#sysadmin#ops
the outage that was just a full /var
#debugging#ops
the average is lying to you, look at p99
#performance#ops
the report that doubled and nobody noticed
#debugging#ops
eBPF, or finally being able to ask the kernel what it's doing
#performance#ops
the bug that only existed when nobody was looking
#debugging#ops
why the average latency was lying to me
#performance#ops
perf top on a box that should have been asleep
#performance#ops
when a write-heavy box stalls every thirty seconds
#linux#sysadmin#ops
ebpf, and finally seeing what the kernel sees
#performance#ops
io_uring, first impressions from the bleeding edge
#linux#sysadmin#ops
three days lost to a bug that only happened when i wasn't looking
#debugging#ops
poking at io_uring on a fresh kernel
#linux#sysadmin#ops
the rare gift of a kernel panic you can reproduce
#linux#sysadmin#ops
how much swap, and the answer i stopped arguing about
#linux#sysadmin#ops
the cron job that ran twice and said nothing
#debugging#ops
switching to nftables, at last
#linux#sysadmin#ops
the leak was a map i kept adding to and never deleting from
#debugging#ops
i moved a firewall to nftables and stopped flinching
#linux#sysadmin#ops
perf top on a box that was meant to be idle
#performance#ops
when logrotate wins the battle and the app keeps writing to the old file
#linux#sysadmin#ops
it's never dns, until it's the only resolver in the house
#debugging#ops
the swap question, and why zram quietly won me over
#linux#sysadmin#ops
the bug that fixed itself the moment i looked at it
#debugging#ops
yes i still run swap, and here is the one setting that matters
#linux#sysadmin#ops
the log file that grew forever no matter what logrotate did
#linux#sysadmin#ops
the slow leak that was just a map nobody deleted from
#debugging#ops
the rare gift of a kernel panic that happened on demand
#linux#sysadmin#ops
moving the firewall to nftables, at long last
#linux#sysadmin#ops
the unit that came back from the dead, repeatedly
#linux#sysadmin#ops
eBPF, or finally seeing what the kernel sees
#performance#ops
journald ate the disk again, so i bounded it
#linux#sysadmin#ops
i finally moved a box from iptables to nftables
#linux#sysadmin#ops
the flamegraph that pointed at the wrong thing, which was the right thing
#performance#ops
a day lost to packets that almost made it
#debugging#ops
when the journal quietly ate the disk
#linux#sysadmin#ops
the off-by-one four of us nodded straight past
#debugging#ops
how much swap, and why i stopped arguing about it
#linux#sysadmin#ops
when a busy box stalls every thirty seconds, look at dirty_ratio
#linux#sysadmin#ops
ebpf, and finally seeing what the kernel sees
#performance#ops
what a syscall actually costs you
#performance#ops
moving root onto zfs, and the boot snag nobody warns you about
#linux#sysadmin#ops
cgroups v2 caught the thing cgroups v1 kept letting through
#linux#sysadmin#ops
the bug that only existed when nobody was watching
#debugging#ops
io_uring is the async i/o interface linux always needed
#linux#sysadmin#ops
the idle server that was burning a whole core
#performance#ops
the service i could not convince systemd to stop restarting
#linux#sysadmin#ops
moving my root filesystem onto zfs without losing my nerve
#linux#sysadmin#ops
the night /var filled up and took the whole box with it
#debugging#ops
when too much page cache becomes a problem
#linux#sysadmin#ops
the query that was bleeding us a hundred milliseconds at a time
#performance#ops
the service that wouldn't stay stopped
#linux#sysadmin#ops
the off-by-one four of us read and none of us saw
#debugging#ops
it was the mtu. it's always the mtu
#debugging#ops
the bug that only existed when nobody was watching
#debugging#ops
the outage that wasn't down, it was just lying
#debugging#ops
the service that came back from the dead every ten seconds
#linux#sysadmin#ops
three days for one missing lock
#debugging#ops
when the write-back stalls: tuning dirty_ratio on a busy box
#linux#sysadmin#ops
the slow leak that was just a map nobody ever emptied
#debugging#ops
the flamegraph that pointed at the one function i'd never have profiled
#performance#ops
the outage caused by a full /var
#debugging#ops
the bug that only existed when nobody was watching
#debugging#ops
when logrotate and a stubborn daemon disagree
#linux#sysadmin#ops
a kernel panic with the decency to be repeatable
#linux#sysadmin#ops
the flamegraph that pointed at the last function I'd have guessed
#performance#ops
journald ate the disk, here is the one line that stops it
#linux#sysadmin#ops
how much swap, asked for the hundredth time
#linux#sysadmin#ops
a botched upgrade, and the btrfs snapshot that undid it in seconds
#linux#sysadmin#ops
when the box freezes for five seconds and writeback is to blame
#linux#sysadmin#ops
the N+1 that hid behind a fast endpoint
#performance#ops
it was the mtu, it's always the mtu
#debugging#ops
the outage that was just a full /var
#debugging#ops
the backup that ran twice and corrupted itself
#debugging#ops
a vpn that throttled itself, and yes it was the mtu
#debugging#ops
three days lost to a goroutine that started too early
#debugging#ops
a botched upgrade, and the snapshot that undid it
#linux#sysadmin#ops
penning in a runaway process with cgroups v2
#linux#sysadmin#ops
the join that got expensive when nobody was looking
#performance#ops
stopping journald from eating the disk
#linux#sysadmin#ops
tuning dirty_ratio on a box that writes a lot
#linux#sysadmin#ops
a kernel panic i could actually reproduce
#linux#sysadmin#ops
a map with no exit, and the eviction i should have written first
#debugging#ops
the hot path was in the logging
#performance#ops
the unbounded map, and how i finally went looking for it
#debugging#ops
how much does crossing into the kernel actually cost
#performance#ops
it wasn't the network, it was the names
#debugging#ops
the cache that grew until the box fell over
#debugging#ops
ebpf, and finally seeing what the kernel sees
#performance#ops
the overlapping cron job that ate its own tail
#debugging#ops
when a job fired twice because two clocks disagreed
#debugging#ops
switching root onto zfs, carefully
#linux#sysadmin#ops
the outage caused by a full /var
#debugging#ops
ebpf, or how i stopped guessing about a slow box
#performance#ops
when writeback stalls everything, look at dirty_ratio
#linux#sysadmin#ops
the slow leak that was a cache i forgot to evict
#debugging#ops
it was dns, it is always dns
#debugging#ops
the outage where the disk was full and nobody had noticed
#debugging#ops
the day dns took down everything, again
#debugging#ops
the outage nobody saw coming, because /var was full
#debugging#ops
the outage caused by a full /var
#debugging#ops
instrumenting the kernel without rebooting it
#performance#ops
it was never the network, it was the resolver
#debugging#ops
the flamegraph that pointed at the one line i'd never have guessed
#performance#ops
the outage that was just a full /var
#debugging#ops
moving the firewall to nftables, finally
#linux#sysadmin#ops
the night /var filled up and took the app with it
#debugging#ops
small packets fine, big packets gone: the mtu strikes again
#debugging#ops
the outage that wasn't the database, it was dns again
#debugging#ops
what a syscall actually costs, with numbers
#performance#ops
it was the mtu, it's always the mtu
#debugging#ops
the cron job that ran twice and never told me
#debugging#ops
a systemd unit that refused to stay dead
#linux#sysadmin#ops
when logrotate and a stubborn daemon disagree
#linux#sysadmin#ops
moving a small firewall from iptables to nftables
#linux#sysadmin#ops
three days inside a race i couldn't reproduce
#debugging#ops
when logrotate rotates the file and the app keeps writing to nowhere
#linux#sysadmin#ops
one rollback and the weekend was mine again
#linux#sysadmin#ops
it was never the app, it was always dns
#debugging#ops
moving root onto zfs, and why i'd do it again
#linux#sysadmin#ops
your average latency is lying to you
#performance#ops
two boxes, one cron line, and a backup that ran in stereo
#debugging#ops
the map that grew until the process didn't
#debugging#ops
how much swap, and why I stopped arguing about it
#linux#sysadmin#ops
the average latency is lying to you
#performance#ops
the cron job that ran twice and told no one
#debugging#ops
what a syscall actually costs you
#performance#ops
the day /var quietly filled up
#debugging#ops
the btrfs snapshot that turned a bad upgrade into a five-minute fix
#linux#sysadmin#ops
your average latency is lying to you
#performance#ops
getting the systemd journal to stop eating the disk
#linux#sysadmin#ops
the job that fired twice because two boxes thought they were in charge
#debugging#ops
the cache that only ever grew
#debugging#ops
the service that kept coming back from the dead
#linux#sysadmin#ops
when logrotate and a stubborn daemon disagree
#linux#sysadmin#ops
how much swap, and why i stopped arguing about it
#linux#sysadmin#ops
putting root on zfs and not regretting it
#linux#sysadmin#ops
the leak was a map i kept adding to and never pruned
#debugging#ops
everything broke because /var was full
#debugging#ops
the log file that grew to forty gigabytes
#linux#sysadmin#ops
the leak was a map, and the map was me
#debugging#ops
moving a box from iptables to nftables at last
#linux#sysadmin#ops
the query that was quietly killing us
#performance#ops
a syscall is about 80 nanoseconds, so stop making so many
#performance#ops
the night /var filled up and took the lot with it
#debugging#ops
the flamegraph found the thing I'd sworn was fine
#performance#ops
how much does a syscall actually cost?
#performance#ops
stopping journald from eating the disk
#linux#sysadmin#ops
the leak was a map i forgot to empty
#debugging#ops
moving my root filesystem onto zfs
#linux#sysadmin#ops
it was dns, it is always dns
#debugging#ops
the cron job that ran twice and told no one
#debugging#ops
moving root onto zfs without losing my nerve
#linux#sysadmin#ops
the day journald quietly ate the disk
#linux#sysadmin#ops
the flamegraph that pointed at the one function i'd never have guessed
#performance#ops
the hot path was in the logging
#performance#ops
flamegraphs and a hot path i never suspected
#performance#ops
logrotate vs the app that wouldn't let go
#linux#sysadmin#ops
the flamegraph that pointed at the wrong file entirely
#performance#ops
the leak was a map i forgot to empty
#debugging#ops
the weekend btrfs snapshots earned their keep
#linux#sysadmin#ops
moving a root filesystem onto zfs on linux
#linux#sysadmin#ops