A while back I sat down and actually timed a getppid() loop to put a number on "syscalls are expensive." On the box I tested, it came out around 80ns a call. Not nothing, but not the monster the folklore makes it.
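If you want to repeat the measurement, here's a minimal sketch in Python. It's a stand-in rather than the harness I used, and `time_getppid` is a name I made up for it. `os.getppid()` makes a good probe because the kernel does almost no work for it, but the figure you get also includes the interpreter's loop and call overhead, so treat it as an upper bound on the syscall itself.

```python
import os
import time


def time_getppid(iterations: int = 1_000_000) -> float:
    """Return the average wall-clock cost of one os.getppid() call, in ns.

    The result includes Python's loop and function-call overhead, so it
    overstates the cost of the bare syscall.
    """
    getppid = os.getppid  # bind locally to skip the attribute lookup each pass
    start = time.perf_counter_ns()
    for _ in range(iterations):
        getppid()
    elapsed = time.perf_counter_ns() - start
    return elapsed / iterations


if __name__ == "__main__":
    print(f"{time_getppid():.0f} ns per call (interpreter overhead included)")
```

Run it a few times; the number moves with the CPU, the kernel version, and whatever else the machine is doing.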
The number isn't the point, and I keep having to remind myself of that. Eighty nanoseconds next to a single disk seek or a network hop is a rounding error. The damage is always multiplication. This week I found a tool reading a file one byte at a time with an unbuffered read in the loop. That's one syscall per byte, so a few hundred megabytes means a few hundred million syscalls, and at that point nearly all of the program's runtime goes to crossing into the kernel and back.
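The multiplication is easy to put on the back of an envelope. In this sketch the 300 MB file size is my stand-in for "a few hundred megabytes", the 80ns is the getppid() figure from earlier (a real read() does more work, so this is a floor), and `boundary_seconds` is a hypothetical helper name.

```python
NS_PER_SYSCALL = 80             # measured getppid() cost; read() costs at least this
FILE_BYTES = 300 * 1000 * 1000  # assumed size for "a few hundred megabytes"


def boundary_seconds(total_bytes: int, bytes_per_syscall: int,
                     ns_per_syscall: float = NS_PER_SYSCALL) -> float:
    """Estimate seconds spent on syscall overhead alone for a full read."""
    calls = -(-total_bytes // bytes_per_syscall)  # ceiling division
    return calls * ns_per_syscall / 1e9


if __name__ == "__main__":
    # One byte per read(): 300 million calls at 80 ns each.
    print(boundary_seconds(FILE_BYTES, 1))  # 24.0
```

Twenty-four seconds of pure overhead before the program has done anything useful with the bytes.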
The fix was the boring one it always is: wrap the reader in a buffer so it pulls a big chunk per syscall and hands out bytes from memory. Same logic, same output, and with a buffer of even a few kilobytes that's thousands of times fewer trips across the boundary. So: measure the syscall once so you know roughly what it costs, then spend your effort making sure you're not making a million of them in a hot loop. That's the whole lesson.
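Here's roughly what the before and after look like, sketched in Python. This is not the tool's actual code. `CountingRaw`, `count_byte`, and `demo` are names invented for the illustration, and the 8 KiB buffer is an assumed size. The same byte-at-a-time loop runs twice, once against the raw file and once through `io.BufferedReader`, while a small subclass counts how often the raw layer gets asked for data. Each of those requests corresponds to a read() syscall.

```python
import io
import os
import tempfile


class CountingRaw(io.FileIO):
    """Unbuffered file that counts requests reaching the raw layer."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.raw_reads = 0

    def read(self, size=-1):        # path taken when the loop reads directly
        self.raw_reads += 1
        return super().read(size)

    def readinto(self, buf):        # path taken when BufferedReader refills
        self.raw_reads += 1
        return super().readinto(buf)


def count_byte(stream, target: int) -> int:
    """Read one byte at a time and count how often `target` appears."""
    hits = 0
    while True:
        chunk = stream.read(1)
        if not chunk:               # empty bytes means end of file
            return hits
        if chunk[0] == target:
            hits += 1


def demo(size: int = 1 << 16):
    """Return (hits, raw reads) for the unbuffered run, then the buffered one."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(bytes(i % 256 for i in range(size)))
        path = tmp.name
    try:
        with CountingRaw(path, "r") as raw:
            slow_hits = count_byte(raw, 0)
            slow_reads = raw.raw_reads
        raw = CountingRaw(path, "r")
        with io.BufferedReader(raw, buffer_size=8192) as buffered:
            fast_hits = count_byte(buffered, 0)   # identical loop, new wrapper
            fast_reads = raw.raw_reads
    finally:
        os.unlink(path)
    return slow_hits, slow_reads, fast_hits, fast_reads


if __name__ == "__main__":
    slow_hits, slow_reads, fast_hits, fast_reads = demo()
    print(f"unbuffered: {slow_reads} raw reads, {slow_hits} hits")
    print(f"buffered:   {fast_reads} raw reads, {fast_hits} hits")
```

The loop body doesn't change at all, which is what makes this fix so cheap. The unbuffered run makes at least one raw read per byte; the buffered run should need roughly one per 8 KiB of file.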