Cloud Reliability Engineer

Canonical
Headquarters: London, England
https://www.canonical.com/careers/all-vacancies

Canonical delivers and manages OpenStack and Kubernetes for leading companies around the world. Our Cloud Developers are responsible for operating cloud services for both Canonical and our customers. They are software engineers with a particular focus on operations, automation, rapid analysis and problem resolution. We pride ourselves on having world-leading operations agility and quality, and we distill that knowledge into our open source operations toolset.

This role is ideal for skilled software engineers with a passion for distributed systems and an interest in the entire Linux stack - from kernel to networking to virtualization and containers. It is a demanding role that requires rigor in both code and customer interactions.

KEY RESPONSIBILITIES AND ACCOUNTABILITIES:

  • Understand and operate cloud and container technology from kernel to dashboard - OpenStack and Kubernetes
  • Automate operations for reuse across the world's largest companies, taking into consideration the complexities of distributed systems
  • Demonstrate expertise in both the technology and industry operations standards
  • Implement new features and improve the resilience and scalability of the existing cloud and container portfolio at Canonical
  • Automate testing and benchmarking capabilities for low-level and high-level software
  • Operate production OpenStack clouds for Canonical and its clients
  • Operate production Kubernetes clusters for Canonical and its clients
  • Develop skills in troubleshooting, capacity planning, and performance analysis
  • Collaborate on documentation, playbooks, policies and procedures
  • Provide assistance and guidance to Canonical’s Support and Operations teams
  • Collaborate with globally distributed engineering, operations, and support peers
  • Ensure service level agreements are met
  • Carry final responsibility for time-critical escalations

REQUIRED SKILLS AND EXPERIENCE:

  • Engineering degree, preferably in computer science or software engineering
  • Python software development experience, with large projects
  • Extensive knowledge of cloud computing concepts and technologies
  • Practical knowledge of Linux networking, routing, and firewalls
  • Hands-on experience administering Linux servers for personal use
  • Able to communicate clearly and effectively in English over email, IRC, and in person
  • Self-driven, able to troubleshoot from kernel to web, and willing to ask others when appropriate
  • Highly motivated, productive, organized and capable of working from home full time
  • Familiar with Ubuntu or Debian
