Staff Site Reliability Engineer
Headquarters: San Francisco, CA
CircleCI is seeking a Staff Site Reliability Engineer to work closely with our Software Engineers to deliver and manage the high-performance, scalable infrastructure underlying our multi-tenant Cloud offering as well as our Server-installed, on-premises solution. You will not only automate and optimize infrastructure by building appropriate tooling, but also help software engineers during the design phase to optimize their services for scale in our production environment.
The CircleCI SRE team is globally distributed and remote-friendly. We take advantage of multiple timezones to manage a platform for our global customer base.
Velocity is critical for software teams in today’s competitive landscape, but maintaining speed can be difficult as apps and systems grow larger and more complex. CircleCI’s platform allows developers to rapidly release web and mobile app code they trust by automating the build, test, and deploy process. CircleCI enables developers to detect and fix bugs before they reach customers. Thousands of leading companies, including Facebook, Kickstarter, Shyp, and Spotify, rely on CircleCI to accelerate delivery of their code and enable developers to focus on creating business value fast.
CircleCI is a Bay Area Best Places to Work 2016 award winner. Founded in 2011 and headquartered in beautiful downtown San Francisco with a global remote workforce, CircleCI is venture-backed by Scale Venture Partners, DFJ, Baseline Ventures, and Harrison Metal Capital.
What will make you successful:
- Experience managing a container-based microservice architecture, including orchestration, service-discovery, monitoring, and debugging
- Understanding of standard networking protocols and components such as TCP/IP, HTTP, DNS, ICMP, the OSI model, subnetting, and load balancing
- In-depth knowledge of operating systems (processes, threads, IPC, concurrency, locks, mutexes, semaphores, etc.)
- Proficiency in one or more of C, C++, Java, Python, or Go
- Comprehensive knowledge of the internal workings of at least one of Postgres, Mongo, or Redis
- Systematic problem-solving approach, coupled with a strong sense of ownership and drive
What you will do:
- Design and deliver solutions to improve the availability, scalability, latency, and efficiency of CircleCI’s services
- Engage in service capacity planning and demand forecasting, anticipating performance bottlenecks
- Diagnose and resolve production issues in conjunction with software engineering teams
- Architect and implement shared infrastructure used by all services within the CircleCI platform, for both SaaS and on-prem configurations
- Support and advise software engineering teams in the design of scalable services
- Build and maintain tools for deployment, monitoring, and debugging
- Plan and execute disaster recovery drills
- Participate in rotating on-call duties, including incident management
If you’re interested in joining the team at CircleCI, please send a resumé and let us know why you’d be a great fit for our team. If you contribute to an open source project, write a blog, or have a presence on the web (Twitter, GitHub, LinkedIn, etc.) we would love to hear about it.
We care deeply about diversity and inclusivity. We’re hiring at all experience levels, and seek talented teammates from a wide variety of backgrounds and experiences who are equally committed to cultivating a work environment of respect and kindness.
We carefully consider every applicant who takes the time to apply.