NetBox Labs·21 days ago
NetBox Labs is seeking a Senior DevOps Engineer to join our Cloud Delivery team.
Cloud Delivery owns the infrastructure and platform that powers NetBox Cloud, our managed SaaS product. We run multi-tenant EKS clusters on AWS, operate the deployment and provisioning systems that serve thousands of customer instances, and own the reliability bar for the platform. This is a hands-on senior role: you'll lead technical work end to end, shape how the team operates, and raise the bar for the engineers around you.
This role is ideal for someone who thrives in fast-paced environments, enjoys solving infrastructure challenges at scale, and treats internal platforms like products, with velocity in mind.
Own and evolve the AWS infrastructure underpinning NetBox Cloud: EKS clusters, VPC design, IAM, RDS, and supporting services.
Lead infrastructure projects from design through delivery, scoping work into clear milestones and distributing tasks across the team.
Work with Product teams to deliver successful customer launches reliably and at scale.
Drive improvements to deployment automation and CI/CD pipelines (GitHub Actions), with a focus on reliability, speed, AI enablement, and developer self-service.
Identify and address technical debt before it becomes a reliability or security risk.
Set and own SLOs for the systems you're responsible for; reduce toil through automation.
Contribute to SOC 2 compliance, security controls, and IAM governance.
Mentor engineers through code review, pairing, and design discussions.
Participate in on-call rotation and lead incident response end to end, including postmortems.
7+ years in DevOps, SRE, or platform engineering roles.
3+ years at a B2B SaaS company.
Strong AWS experience across EKS/Kubernetes, VPC, IAM, RDS, and cost management.
Deep IaC experience with Terraform and Helm.
Strong CI/CD experience, ideally with GitHub Actions.
Proficiency in Python and Go, plus shell scripting.
Track record of scoping and delivering complex infrastructure projects across a team.
Strong communication skills: you can write a clear design doc and hold your own in an architecture discussion.
Experience with security and compliance in a production environment (SOC 2, IAM governance).
Experience with AI tools (Copilot, Cursor, Claude) as part of a daily engineering workflow.
Experience with observability tooling: Grafana, Prometheus, Loki, OpenTelemetry.
Familiarity with the NetBox ecosystem or network automation.
Open-source contributions.