Staff Software Engineer Infrastructure

Remote Lead 09.06.2026
Software Engineer backend-developer DevOps
At a glance

Staff Software Engineer Infrastructure at Docker: build self-service platform systems for multi-region deployment with Terraform, Argo CD, and EKS. Globally remote, Staff level, on-call required.

💰 $200,000–260,000/year 📊 Senior 🕒 Full-time 🌍 Remote 🗺️ Worldwide
  • Staff-Level Engineering
  • Terraform/GitOps Expertise
  • Kubernetes/EKS
  • Platform-Thinking
Terraform Kubernetes/EKS GitOps/Argo CD Networking Go system design on-call
✅ Good fit for
  • Staff/Senior Platform Engineers
  • Infrastructure Architects
  • DevOps Leaders
🚫 Less suitable for
  • Junior/Mid-Level Engineers
  • Candidates reluctant to take on-call duty
  • Candidates without a platform focus
💡 Good to know
  • On-call rotation required after onboarding and shadowing
  • Team: 4 now, growing to 7 (this role is one of those hires)
  • Multi-region EKS complexity, not a simple legacy migration
  • AI-assisted ops: focus on alerting, diagnosis, and onboarding automation

Docker is one of the most loved brands in developer tooling, trusted by over 20 million monthly users and more than 20 billion container image pulls. From solo founders to the world's largest companies, developers rely on Docker to build, share, and run their applications using products like Docker Desktop, Docker Hub, and Docker Scout. As a remote-first, globally distributed team, we’re defining how software is built and delivered. As AI agents reshape software development, Docker is at the forefront of this transformation, offering sandboxed environments, verified images, and secure infrastructure that make autonomous workflows trustworthy by default.

Docker will release a wave of new products this year, powered by R&D efforts that will likely lead to even more innovation. We're therefore heavily investing in the infrastructure platform that supports these products. This platform enables hundreds of developers across multiple teams and supports high-scale production workloads and data transfers daily. While it has grown rapidly, its foundations need reinforcement. This year, our focus is closing that gap.

Today, many processes depend on a few experts manually unblocking provisioning and operational workflows. The top priority for this role is to transition away from this reliance to self-service systems characterized by clear ownership, safe defaults, and measurable adoption. The vision is to have an engineer-trusted platform that just works, freeing teams to concentrate on their own products rather than ours. A specific example involves reducing the time needed to establish a new global region or application environment from days to hours. Achieving this requires building robust foundations, like a true multi-region, cross-account network architecture and trustworthy testing and continuous-deployment pipelines, topped by a self-service layer.

As the "container company" crafting its own platform, we hold ourselves to high standards for making the easy path also the safe path. You’ll join a growing team of four (expanding to seven this year) and serve as a Staff Engineer to guide technical direction and oversee the adoption of these initiatives in production environments.

Responsibilities

In this Staff-level role, success is assessed through leverage (the effectiveness of your team and the systems you help build) rather than just your individual contributions. While remaining hands-on in the codebase, you’ll also set direction, align teams on practical standards, and drive platform investments toward adoption. Specifically, you will:
- Tackle ambiguous infrastructure problems by creating clear proposals, guiding them through RFCs and architecture reviews involving multiple teams.
- Design self-service capabilities and platform APIs (primarily in Go) for onboarding, provisioning, deployment, observability defaults, and day-2 operations, ensuring they have clear contracts and documentation that teams use.
- Establish standards for implementation using tools like Terraform, GitOps with Argo CD, progressive deployment methodologies, and robust testing frameworks, including building the missing continuous-deployment pipeline.
- Enhance EKS-based multi-tenant infrastructure for better reliability, security, scalability, and cost management (e.g., Envoy Gateway ingress, traffic routing, multi-region, cross-account connectivity).
- Strengthen SLOs, alerting mechanisms, and post-incident analysis (using tools like Grafana Cloud) to ensure safer and more dependable production environments.

The effectiveness of this work will be evaluated based on outcomes observable to the teams who rely on this infrastructure. Operational metrics include provisioning speed, ease of delivery, and reliability.

AI-assisted Operations

We are incorporating AI-assisted and autonomous workflows to reduce routine operational tasks.
Emphasis is placed on keeping these automated processes safe, auditable, and human-reviewed. Early AI initiatives include:
- Alert enrichment and incident context-gathering, designed to provide on-call engineers with immediate, actionable context.
- Runbook-guided diagnostics and remediation suggestions, ensuring human involvement in production-altering decisions.
- Onboarding and readiness tools to automate the assistance currently provided by internal experts.

If you have experience with operational automation and a discerning perspective on its appropriate use, this role offers an opportunity to shape where these tools fit and where they don’t.

On-call Responsibilities

Operational ownership is part of this role. You’ll join the on-call rotation post-onboarding and shadowing. As a Staff Engineer, you’ll also focus on improving the on-call process itself by refining alerts, enhancing runbooks, minimizing toil, and leading blameless postmortems aimed at prevention.

Qualifications:
- 8+ years of professional experience in backend, infrastructure, or platform engineering.
- Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent practical experience.
- Strong software development expertise in Go or similar languages, with proficiency in design, testing, debugging, and long-term maintenance.
- Proven experience in designing, deploying, and managing production cloud services or infrastructure platforms.
- Specialized expertise in at least one domain (e.g., Kubernetes, networking, cloud platforms, reliability engineering, or developer platforms) combined with a strong foundation in Linux, networking, and production operations.
- Experience defining technical direction and leading initiatives requiring cross-team collaboration.
- Excellent written and verbal communication skills in a remote work environment (e.g., writing RFCs, architecture proposals, post-incident reports).

Nice to Have:
- Familiarity with the following: EKS, networking (e.g., ingress, CNI, service meshes), observability stacks (OpenTelemetry, Prometheus, Grafana), CI/CD tools (GitHub Actions, Argo CD, canary deployments), or experience leading cross-team migrations/adoption projects.

What to Expect
- First 30 days: Build context, meet partnering teams, implement your first change, and shadow on-call handlers.
- First 90 days: Take ownership of a strategic platform challenge and lead it from planning to production.
- One-year outlook: Lead major cross-team initiatives (e.g., self-service provisioning of regions and environments, or multi-region networking/CD foundations) and establish foundational patterns that redefine how Docker engineers build and operate services.

Perks:
- Flexibility to adjust work around personal life.
- Quarterly Whaleness Days and an end-of-year Whaleness break.
- Support for home office setup to work comfortably.
- 16 weeks of paid parental leave (after six months of employment).
- A $100/month technology stipend.
- Generous PTO, training resources, and an equity plan to share in the company’s growth.
- Medical benefits, retirement contributions, and vacation vary by geography.
- A remote-first company culture, with offices in Seattle and Paris.

Docker is committed to diversity and inclusion. We strive to build a team that reflects various backgrounds, perspectives, and skills. Join us to make our company and products even greater.