Migrating a small Postgres workload to managed RDS
We moved a ~20 GB Postgres workload from a self-hosted instance on a single VM to AWS RDS. The cutover took a single day, with no production downtime visible to users. Here are the bits that never made it into a post-incident review, because nothing actually went wrong, but that surprised us anyway.
Backup-and-restore was the easy part
pg_dump → S3 → pg_restore into RDS worked on the first try. The annoying parts came after.
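Here is a minimal sketch of that dump, upload and restore path. It assumes the custom dump format, a hypothetical bucket URI, and an RDS endpoint and user that you supply. It only builds the pg_dump, aws s3 cp and pg_restore command lines so they can be reviewed or logged before running. The flags exist in the standard PostgreSQL and AWS CLI tools, but this is not necessarily the exact invocation used in the migration.

```python
from __future__ import annotations

import shlex
import subprocess

# Hypothetical location; substitute your own bucket and key.
BUCKET_URI = "s3://example-migration-bucket/app.dump"

def dump_cmd(db_name: str, out_path: str) -> list[str]:
    # Custom format is compressed and allows a parallel pg_restore later.
    return ["pg_dump", "--format=custom", "--file", out_path, db_name]

def upload_cmd(local_path: str, s3_uri: str) -> list[str]:
    return ["aws", "s3", "cp", local_path, s3_uri]

def download_cmd(s3_uri: str, local_path: str) -> list[str]:
    return ["aws", "s3", "cp", s3_uri, local_path]

def restore_cmd(host: str, user: str, db_name: str, dump_path: str, jobs: int = 4) -> list[str]:
    # --no-owner / --no-privileges skip ownership and GRANT statements that
    # reference roles which may not exist on the RDS instance.
    return [
        "pg_restore",
        "--host", host,
        "--username", user,
        "--dbname", db_name,
        "--no-owner",
        "--no-privileges",
        "--jobs", str(jobs),  # parallel restore needs a custom or directory archive
        dump_path,
    ]

def run(cmd: list[str]) -> None:
    # Echo the command, then fail loudly on a non-zero exit status.
    print("+", shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

pg_restore can read the password from PGPASSWORD or ~/.pgpass, so no secret needs to appear in the command line.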
The annoying parts
Extensions. Our self-hosted instance used pg_trgm, citext, and a custom function that depended on plpython3u. RDS supports the first two natively, but plpython3u isn't available on RDS, so we rewrote the function in plpgsql. That was fine, but unplanned.
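A pre-cutover check along these lines would have surfaced the plpython3u problem early. This is a sketch. The SQL uses the standard pg_extension, pg_available_extensions, pg_proc and pg_language catalogs. The comparison helper is hypothetical, and how you run the queries (psql, a driver) is up to you.

```python
# Run on the OLD server: extensions currently installed.
INSTALLED_SQL = "SELECT extname FROM pg_extension ORDER BY 1;"

# Run on the NEW RDS instance: extensions that could be installed there.
AVAILABLE_SQL = "SELECT name FROM pg_available_extensions ORDER BY 1;"

# Run on the OLD server: functions written in a non-built-in language,
# i.e. candidates for a rewrite if that language is missing on the target.
NON_CORE_FUNCTIONS_SQL = """
SELECT n.nspname, p.proname, l.lanname
FROM pg_proc p
JOIN pg_language l ON l.oid = p.prolang
JOIN pg_namespace n ON n.oid = p.pronamespace
WHERE l.lanname NOT IN ('internal', 'c', 'sql', 'plpgsql')
ORDER BY 1, 2;
"""

def extension_gaps(installed, available):
    """Extensions present on the old server but not offered by the target."""
    return sorted(set(installed) - set(available))
```

For example, extension_gaps(["pg_trgm", "citext", "plpython3u"], ["pg_trgm", "citext"]) returns ["plpython3u"]. The second list here is illustrative, not RDS's actual catalogue.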
Connection counts. The default RDS max_connections for our instance size was 199, which sounded generous until our connection-pooling library opened 220 connections on cold start. That is how we learned about RDS parameter groups, in the most embarrassing way possible. The default parameter group can't be edited, so we created a custom one with a higher max_connections, attached it to the instance, and rebooted: max_connections only takes effect after a restart.
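Back-of-the-envelope arithmetic like the following would have flagged the problem before cutover. It is a sketch with assumptions. The 11 processes × 20 connections split is hypothetical, since we only know the total was 220. `reserved` stands in for slots held back for superusers: Postgres's superuser_reserved_connections defaults to 3, and RDS may hold back more for its own administration.

```python
from __future__ import annotations

from dataclasses import dataclass

@dataclass
class Pool:
    processes: int  # app processes/containers that each own a pool
    max_size: int   # most connections one pool may open (e.g. on cold start)

def headroom(max_connections: int, reserved: int, pools: list[Pool]) -> int:
    """Connections left once every pool is full; negative means exhaustion."""
    demand = sum(p.processes * p.max_size for p in pools)
    return max_connections - reserved - demand
```

With max_connections = 199, reserved = 3 and a single Pool(processes=11, max_size=20), headroom returns 199 - 3 - 220 = -24. That is the situation we hit at cold start.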
Logging. RDS sent the slow-query log to CloudWatch Logs instead of writing it to local disk, and our existing log shipper assumed files. We wrote a small Lambda function to forward the CloudWatch logs into our existing pipeline: annoying, but fine.
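Here is a minimal sketch of such a forwarder. It assumes the Postgres log is exported to CloudWatch Logs (RDS uses log groups named /aws/rds/instance/<instance-id>/postgresql) and that a subscription filter invokes the Lambda. CloudWatch delivers each batch as base64-encoded, gzip-compressed JSON under event["awslogs"]["data"]. `forward` is a hypothetical placeholder for whatever your existing pipeline accepts.

```python
import base64
import gzip
import json

def decode_cloudwatch_event(event: dict) -> dict:
    """Unpack a CloudWatch Logs subscription event into its JSON payload."""
    compressed = base64.b64decode(event["awslogs"]["data"])
    return json.loads(gzip.decompress(compressed))

def forward(records: list) -> None:
    """Hypothetical placeholder: hand records to your existing log pipeline."""
    for record in records:
        print(json.dumps(record))

def handler(event, context):
    payload = decode_cloudwatch_event(event)
    # CONTROL_MESSAGE batches are CloudWatch reachability checks, not log data.
    if payload.get("messageType") != "DATA_MESSAGE":
        return 0
    records = [
        {
            "log_group": payload["logGroup"],
            "log_stream": payload["logStream"],
            "timestamp_ms": e["timestamp"],
            "message": e["message"],
        }
        for e in payload["logEvents"]
    ]
    forward(records)
    return len(records)
```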
The surprises (good)
Automated backups and point-in-time recovery are both built in, which let us delete two cron jobs and a backup-monitoring alert. The backup storage cost is genuinely negligible at our scale.
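Point-in-time recovery is also scriptable. Here is a minimal sketch using boto3's RDS client, which is passed in as a parameter so it can be faked in tests. The instance identifiers are hypothetical. RDS restores into a new instance rather than rewinding the existing one, so the target identifier must not already be in use.

```python
from __future__ import annotations

from datetime import datetime, timezone
from typing import Optional

def restore_to_point_in_time(rds, source_id: str, target_id: str,
                             when: Optional[datetime] = None) -> dict:
    """Start a PITR restore of source_id into a NEW instance named target_id.

    `rds` is a boto3 RDS client, e.g. boto3.client("rds"). Without `when`,
    restore to the latest restorable time.
    """
    kwargs = {
        "SourceDBInstanceIdentifier": source_id,
        "TargetDBInstanceIdentifier": target_id,
    }
    if when is None:
        kwargs["UseLatestRestorableTime"] = True
    else:
        # Normalise to UTC; a naive datetime is treated as local time here.
        kwargs["RestoreTime"] = when.astimezone(timezone.utc)
    return rds.restore_db_instance_to_point_in_time(**kwargs)
```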
Replica creation for read-heavy reporting workloads is a one-click operation. We used to plan it as a weekend project.
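The "one click" in the console can also be a few API calls. Here is a minimal sketch with an injected boto3 RDS client and hypothetical identifiers. It creates the replica, waits with boto3's db_instance_available waiter, then returns the replica's hostname for reporting jobs to use.

```python
from __future__ import annotations

from typing import Optional

def create_reporting_replica(rds, source_id: str, replica_id: str,
                             instance_class: Optional[str] = None) -> str:
    """Create a read replica of source_id, wait for it, and return its hostname.

    `rds` is a boto3 RDS client; the identifiers are hypothetical.
    """
    kwargs = {
        "DBInstanceIdentifier": replica_id,
        "SourceDBInstanceIdentifier": source_id,
    }
    if instance_class is not None:
        # A reporting replica may use a different instance class than the primary.
        kwargs["DBInstanceClass"] = instance_class
    rds.create_db_instance_read_replica(**kwargs)

    # Block until the replica is available; its endpoint isn't usable before then.
    rds.get_waiter("db_instance_available").wait(DBInstanceIdentifier=replica_id)

    desc = rds.describe_db_instances(DBInstanceIdentifier=replica_id)
    return desc["DBInstances"][0]["Endpoint"]["Address"]
```

The replica is read-only and replicates asynchronously, so writes still go to the primary and reports may lag it slightly.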
What we'd do differently
Test the connection-pooling behavior earlier. Don't rely on the default parameter group. Confirm extension support BEFORE the cutover, not during.
Net: would do this again, would not be smug about it being "just a managed database".