bgp in the homelab, because why not

I run BGP at home now, and before you close the tab: it's not because I have a second autonomous system and a transit provider in the garage. It's because BGP turned out to be the cleanest way to solve a problem I actually had, which is getting a service IP to follow a service around between hosts without a pile of fragile glue.

The problem, concretely. I have a handful of containers and VMs spread across a couple of nodes, and some of them present services I want reachable on a stable IP regardless of which node they happen to be running on. The traditional answers are static routes (which you then have to remember to change by hand, which means you won't) or keepalived doing VRRP for a floating IP (which works, but only floats between a fixed pair, and out of the box only tracks whether the node is reachable, not whether the service is actually up). I wanted the IP to be advertised only while the thing behind it was genuinely healthy, and to move on its own when it wasn't.

That is, almost exactly, what BGP does for a living.

the shape of it

The setup is small. Each host that wants to advertise a service IP runs a routing daemon and peers with the router over BGP. The host advertises a /32 (or /128) for the service address. The router learns it, installs the route, and now traffic to that IP goes to whichever host is currently advertising it. When the service stops being healthy, the host withdraws the route, and within a couple of seconds the router has forgotten that path entirely. No floating, no failover script, no "is the other node alive" guesswork. The route exists if and only if the service does.

I'm using FRR on both ends: directly on the hosts, and on the router via the FRR plugin for OPNsense, which is the same FRR underneath. The hosts sit in one private ASN and the router in another, with a plain eBGP session between them.

A trimmed host config looks roughly like this:

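! /etc/frr/bgpd.conf
router bgp 65010
 bgp router-id 10.0.0.10
 ! eBGP session to the router (AS 65000)
 ! (policy trimmed: FRR 7.5+ expects route-maps on eBGP neighbours by default)
 neighbor 10.0.0.1 remote-as 65000
 !
 address-family ipv4 unicast
  ! the service IP, advertised as a host route
  network 10.0.0.50/32
 exit-address-family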

The 10.0.0.50/32 is the service IP. The trick that makes it actually useful, rather than just a static route in a trench coat, is to only have that route present in the host's routing table when the service is healthy. I do that by binding the service IP to a loopback (or dummy) interface and removing the address when the health check fails. FRR only advertises a network statement whose prefix is actually present in the routing table (the bgp network import-check behaviour, which is the default in recent FRR releases). So when the dummy interface loses the address, the network statement has nothing to advertise, the route withdraws, done.
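As a rough sketch of that health-gated address (not the exact script in use here), the loop below adds the service IP to a dummy interface while a TCP health check passes and removes it when the check fails. The interface name svc0, the health endpoint on 127.0.0.1:8080 and the two-second interval are all assumptions for illustration. It also assumes the dummy interface already exists and is up, and that the script runs with enough privilege to call ip.

```python
import socket
import subprocess
import time
from typing import List, Optional

SERVICE_IP = "10.0.0.50/32"  # the service address from the FRR config
DUMMY_IF = "svc0"            # hypothetical dummy interface name
CHECK = ("127.0.0.1", 8080)  # hypothetical health endpoint (plain TCP connect)


def plan(healthy: bool, present: bool) -> Optional[List[str]]:
    """Return the ip command that reconciles address state, or None if nothing to do."""
    if healthy and not present:
        return ["ip", "addr", "add", SERVICE_IP, "dev", DUMMY_IF]
    if not healthy and present:
        return ["ip", "addr", "del", SERVICE_IP, "dev", DUMMY_IF]
    return None


def service_healthy(timeout: float = 1.0) -> bool:
    """Treat a successful TCP connect as 'healthy'; swap in a real check as needed."""
    try:
        with socket.create_connection(CHECK, timeout=timeout):
            return True
    except OSError:
        return False


def address_present() -> bool:
    """Check whether the service IP is currently bound to the dummy interface."""
    result = subprocess.run(
        ["ip", "-o", "addr", "show", "dev", DUMMY_IF],
        capture_output=True, text=True,
    )
    return f" {SERVICE_IP} " in result.stdout


def run_forever(interval: float = 2.0) -> None:
    """Reconcile address state with service health every `interval` seconds."""
    while True:
        cmd = plan(service_healthy(), address_present())
        if cmd is not None:
            subprocess.run(cmd, check=True)
        time.sleep(interval)
```

Calling run_forever() from a systemd unit (or similar) is enough to drive it. Keeping the decision in plan() separate from the side effects makes the logic easy to test without touching a real interface.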

why this beats the alternatives

The honest comparison.

Versus static routes: static routes don't react to anything. The whole point here is reactivity, the route appearing and disappearing with the service.
Versus VRRP/keepalived: VRRP floats a single IP between a fixed set of nodes and is fundamentally about the node being up, not the service. BGP lets any number of hosts advertise, lets me weight and prefer paths, and the route is tied to service health rather than node liveness. It also scales sideways: adding a third or fourth node is just another peer, not a reshuffle.
Versus ECMP load-balancing: I get that almost for free. If two hosts advertise the same /32 with equal preference, the router can hash flows across both. Withdraw one and traffic shifts to the other in seconds. That's a load balancer made of routing table, which is a deeply satisfying thing to have.
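To make the ECMP point concrete, here is a toy model of per-flow hashing across whichever next hops are currently advertising the /32. It is only an illustration: real routers use their own hash functions and field selections, and the second host address 10.0.0.11 is made up for the example.

```python
import hashlib
from typing import List, Optional, Tuple

Flow = Tuple[str, int, str, int, str]  # src ip, src port, dst ip, dst port, protocol


def pick_next_hop(flow: Flow, next_hops: List[str]) -> Optional[str]:
    """Map a flow onto one of the currently advertised next hops, or None if there are none."""
    if not next_hops:
        return None
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(next_hops)
    # Sort so the result does not depend on the order routes were learned in.
    return sorted(next_hops)[index]


flows = [("192.0.2.1", port, "10.0.0.50", 443, "tcp") for port in range(1000, 2000)]
both = ["10.0.0.10", "10.0.0.11"]  # two hosts advertising the same /32 (second is hypothetical)
spread = {hop: sum(pick_next_hop(f, both) == hop for f in flows) for hop in both}
after_withdraw = {pick_next_hop(f, ["10.0.0.11"]) for f in flows}
```

Every packet of a given flow lands on the same host because the hash input is the flow tuple, and once one host withdraws, everything maps to the survivor. Simple modulo hashing like this can reshuffle existing flows whenever the set of next hops changes, which matters for long-lived connections.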

the bits that bit me

It wasn't all smooth, and the failures were instructive.

First, timers. The default BGP hold timer is generous (it's tuned for the internet, where flapping is expensive and you don't want to overreact to a blip). A clean withdrawal goes out straight away whatever the timers say, but a host that simply dies keeps its routes on the router until the hold timer expires. At home I want that case handled quickly too, so I dropped the keepalive and hold timers right down. Be careful here: too aggressive and a momentary hiccup tears the session down and takes all your routes with it, which is worse than the problem you were solving. I settled on values that react in a few seconds, not sub-second. This isn't a trading floor.
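To put rough numbers on that trade-off, the small helper below works out what a keepalive/hold pair implies: how long a silently dead peer can keep its routes, and how many consecutive keepalives can go missing before the session drops. The 60/180 pair is the commonly documented FRR default. The 3/9 pair is purely illustrative, not the values I settled on. RFC 4271 requires a non-zero hold time to be at least 3 seconds, which is why plain timers never get you sub-second failover.

```python
import math
from typing import Dict


def describe_timers(keepalive: int, hold: int) -> Dict[str, int]:
    """Summarise what a BGP keepalive/hold timer pair means in practice (seconds)."""
    if hold < 3:
        # A hold time of 0 disables keepalives entirely; that case is out of scope here.
        raise ValueError("non-zero BGP hold time must be at least 3 seconds (RFC 4271)")
    if keepalive < 1 or keepalive >= hold:
        raise ValueError("keepalive must be at least 1 and shorter than the hold time")
    return {
        # A peer that dies right after sending a keepalive is noticed one hold time later.
        "worst_case_detection_s": hold,
        # Keepalives due strictly before the hold timer expires can all be lost safely.
        "keepalives_lost_before_drop": math.ceil(hold / keepalive) - 1,
    }


internet_style = describe_timers(60, 180)
homelab_style = describe_timers(3, 9)  # illustrative values only
```

Both pairs tolerate the same number of lost keepalives. What shrinks is the absolute window: three minutes of silence in one case, nine seconds in the other, which is exactly why a brief hiccup becomes more dangerous as the timers come down.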

Second, and this is the one that cost me an evening: I advertised a /32 carved out of a subnet the router already had a connected route for, and my lovingly-crafted BGP route did precisely nothing. Longest-prefix match is not the culprit (on the router itself the more specific /32 should win); the trouble is that everything attached to that subnet treats the address as on-link and ARPs for it directly, so the traffic never consults the BGP route at all. The service IPs need to live somewhere that isn't already directly attached to the router, or you spend an hour wondering why the routing table looks right and the traffic still goes the wrong way. Read the actual FIB, not the BGP table, when something doesn't add up. show ip route 10.0.0.50 told me the truth the moment I bothered to ask it.
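A cheap way to catch this before it costs an evening is to check a candidate service IP against the subnets the router is directly attached to. The sketch below does that with Python's ipaddress module. The subnet list is a hypothetical example, and in practice it would come from the router's connected routes.

```python
import ipaddress
from typing import List


def overlapping_connected(service_ip: str, connected_subnets: List[str]) -> List[str]:
    """Return the connected subnets that already contain the service address."""
    addr = ipaddress.ip_interface(service_ip).ip
    return [
        str(net)
        for net in (ipaddress.ip_network(s) for s in connected_subnets)
        if addr in net
    ]


connected = ["10.0.0.0/24", "192.168.1.0/24"]  # hypothetical router-attached subnets
clash = overlapping_connected("10.0.0.50/32", connected)
clear = overlapping_connected("10.255.0.50/32", connected)
```

An empty result means the address sits outside everything directly attached, so traffic for it has to follow the BGP-learned route. A non-empty result is the situation described above.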

Third, the obligatory warning: this is more moving parts than a homelab strictly needs, and if BGP itself wedges you've now made your network's control plane depend on a daemon you have to understand. I think that's a fair trade for what it buys, but it is a trade. Keep the router's own management reachable by something dumb and static so you can always get in to fix BGP when BGP is the thing that's broken.

was it worth it

Yes, and not only for the result. The result is good: service IPs that follow their services, failover in a few seconds, sideways scaling, all with config I can read. But the better return was that running BGP on hardware I own, where I can break it and watch it recover, taught me more about how the protocol actually behaves than years of reading about it ever did. The homelab earns its keep as a place to be wrong cheaply.

So: BGP in the homelab, because why not. Turns out "why not" had a perfectly good answer, which was "because it's the right tool," I just had to build it to find out. A fine note to end the year on.