Cloud Reliability Engineer
Posted Mar 3
Headquarters: London, England
We are looking for talented engineers who are excited about open source cloud computing and are ready to join a globally distributed team charged with delivering world-class services to our customers. Our Cloud Reliability Engineers (CREs) are responsible for operating cloud services for both Canonical and our customers and distilling that knowledge into our open source operations toolset. CREs also act as the escalation point for our support and operations teams and provide assistance in the form of operational expertise, engineering support, training, and mentoring.
KEY RESPONSIBILITIES & ACCOUNTABILITIES
- Extend Canonical’s Operations tools to automate operations and make our managed services more stable, more predictable, and more capable of running the latest cloud technologies
- Deploy and operate production OpenStack clouds for Canonical and its clients
- Evaluate, learn, adapt tooling, train team members, and assist with the addition of emergent technologies into our managed service offering
- Participate in troubleshooting, capacity planning, and performance analysis activities
- Proactively update documentation, troubleshooting playbooks, policies, and procedures
- Provide assistance and guidance as the escalation point for the Support and Operations teams
- Work collaboratively with globally distributed peers on engineering, operations, and support tasks
- Ensure all service level agreements are met
- Prioritize work appropriately to consistently achieve departmental and company goals
- Participate in an on-call rotation based on region
REQUIRED SKILLS & EXPERIENCE
- Systems administration experience in a high-availability environment managing large deployments
- Extensive knowledge of cloud computing concepts and technologies
- Experience with OpenStack in a production environment
- Experience with Software Defined Networks
- Practical knowledge of IP networking, routing, and firewalls
- Strong scripting skills with Python and Bash
- Experience administering infrastructure services such as DNS, DHCP, TFTP, HTTP, etc.
- Able to take ownership of unfamiliar tasks and problems and see them through to completion
- Able to communicate clearly and effectively in English, especially using email and IRC
- Self-driven, able to troubleshoot, and willing to ask others when appropriate to find answers
- Capable of working from home full time while remaining highly motivated, productive, and organized
- Familiarity with Ubuntu or Debian
DESIRED SKILLS & EXPERIENCE
- Experience with Juju and MAAS
- Experience with LXC, LXD, KVM
- Experience in a service provider environment
- Prior experience working from home full time and/or working with a globally distributed team
- Experience in a customer-facing role
Canonical is an equal opportunity employer.