Staff Site Reliability Engineer

CircleCI
Headquarters: San Francisco, CA
http://www.circleci.com

About the role

CircleCI is seeking a Staff Site Reliability Engineer to work closely with our Software Engineers to deliver and manage the high-performance, scalable infrastructure underlying both our multi-tenant Cloud offering and our Server-installed, on-premises solution. You will automate and optimize infrastructure by building appropriate tooling, and you will help software engineers during the design phase to optimize their services for scale in our production environment.
 
The CircleCI SRE team is globally distributed and remote-friendly. We take advantage of multiple timezones to manage a platform for our global customer base.

What will make you successful:

  • Experience managing a container-based microservice architecture, including orchestration, service discovery, monitoring, and debugging
  • Understanding of standard networking protocols and concepts such as TCP/IP, HTTP, DNS, ICMP, the OSI model, subnetting, and load balancing
  • In-depth knowledge of operating systems (processes, threads, IPC, concurrency, locks, mutexes, semaphores, etc.)
  • Proficiency in one or more of: C, C++, Java, Python, Go
  • Comprehensive knowledge of the internal workings of at least one of Postgres, Mongo, Redis
  • Systematic problem-solving approach, coupled with a strong sense of ownership and drive
  • Track record of working cooperatively with software engineering teams
  • Focus on security in the delivery of all levels of a system
  • Passion for modern software development and operation, including agile, CI/CD, and infrastructure-as-code
  • Desire to learn and grow
  • 6+ years of experience

What you will do:

  • Design and deliver solutions to improve the availability, scalability, latency, and efficiency of CircleCI’s services
  • Engage in service capacity planning and demand forecasting, anticipating performance bottlenecks
  • Diagnose and resolve production issues in conjunction with software engineering teams
  • Architect and implement shared infrastructure used by all services within the CircleCI platform, for both SaaS and on-prem configurations
  • Support and advise software engineering teams in the design of scalable services
  • Build and maintain tools for deployment, monitoring, and debugging
  • Plan and execute disaster recovery drills
  • Participate in rotating on-call duties, including incident management


Apply for this position

If you’re interested in joining the team at CircleCI, please send a resume and let us know why you’d be a great fit for our team. If you contribute to an open source project, write a blog, or have a presence on the web (Twitter, GitHub, LinkedIn, etc.), we would love to hear about it. Send your resume to brian@circleci.com.