Phaidra

Site Reliability Engineer

Job Description

Posted on: June 21, 2024

Phaidra is looking for a driven Site Reliability Engineer to join our engineering team. You are bold and creative, and have deep empathy for customers who may not be tech-savvy. You will work in the Infrastructure Engineering team to build and maintain world-class infrastructure. You will have the opportunity to make an immediate impact with your work and guide the product and team as we grow.

Responsibilities

The ideal candidate has expertise building and managing cloud infrastructure on AWS, GCP or Azure and has good knowledge of Kubernetes, CI/CD and Observability. Your responsibilities will span Infrastructure Engineering, MLOps, SRE and DevOps. As a Site Reliability Engineer, you will be an Individual Contributor (IC) in the Infrastructure Engineering team.

You will help build and maintain infrastructure for:
Large-scale data ingestion and processing.
Distributed model training, evaluation and inference.
Automating the end-to-end system for continuous improvement and deployment.
Developer environments and build systems.
Multi-cloud deployments.
You will work with cloud services such as AWS, Azure and GCP.
You will work with Cloud Native technologies like Kubernetes, Prometheus and gRPC.
You will help build CI/CD infrastructure and pipelines, and take part in DevOps duties.
You will apply SRE principles for observability, SLOs, automation and change management.
You will write and maintain tooling and documentation for infrastructure, supported applications and processes.
You will build and maintain cross-functional relationships with internal teams to drive initiatives.

Job Requirements

5+ years of work experience.
Bachelor's or Master's degree in Computer Science, or equivalent experience.
Proven experience automating Cloud and Networking infrastructure on AWS, GCP or Azure.
Good understanding of Linux-based Operating Systems, Containerisation and Orchestration technologies like Docker and Kubernetes.
Experience with Terraform or other configuration management tools like Jsonnet, Kapitan, Helm or Kustomize.
Experience with Monitoring stacks such as Prometheus, Influx, Stackdriver or Zabbix.
Programming experience, ideally with Python, Go or Bash scripting.
Experience with writing Kubernetes Operators.
Good understanding of DevOps, SRE principles and Platform Engineering.
Share our company values: curiosity, ownership, transparency & directness, outcome-based performance, and customer empathy.

