Modern Health

Staff Site Reliability Engineer

Job Description

Posted on: May 15, 2024

In this role, you'll be given lots of responsibility and the opportunity to have true ownership as we build out the product. This is a unique opportunity to use your engineering powers to make a direct impact on people's lives. We need a Staff Site Reliability Engineer who is enthusiastic about building reliable, scalable, and flexible systems to support our growing team, product, and user base. You'll work with other engineers to reliably release and maintain services, and help define and meet internal and customer-facing SLAs and SLOs.

Responsibilities

Manage and orchestrate cloud resource (AWS) configuration using Infrastructure as Code (Terraform) to empower engineering staff to embrace a DevOps culture of self-service ownership
Develop and govern Observability (Datadog) best practices for tracking platform performance and health trends to meet customer SLAs and lead technical decisions with strong supporting evidence
Create solutions that scale dynamically with demand and are flexible enough to pivot for fast-changing project requirements, while balancing good against perfect
Provide clear, consistent updates on technical progress and blockers to keep stakeholders informed, and document technical designs to spread knowledge and reduce information silos
Participate in the 24/7 on-call rotation, respond to critical alerts, and follow documented incident investigation procedures to restore availability of customer-facing features
Maintain HIPAA, GDPR, and SOC 2 compliance and general security through best-practice implementation

Job Requirements

8+ years of experience in software engineering, with 4+ years of experience in DevOps
Cloud provider (AWS, GCP, Azure) experience managing resources through Infrastructure as Code (Terraform)
Container Orchestration (ECS or K8s) experience to confidently build, test, and release containerized applications for multiple environments and regions
Knowledge of observability best practices across common cloud resources (EC2, ECS, RDS, DynamoDB, S3, SQS, EventBridge), with experience rolling out enhancements across a distributed platform with scale in mind
Experience with shell scripting for *nix systems
Experience with Networking for web applications
Effective at communicating ideas through writing and diagramming
Comfortable working with a distributed development and ops team
Familiarity with: AWS (ECS and cloud hosting); GitLab (CI/CD); Python (Django, Flask, aiohttp); Bash; data stores (PostgreSQL, Redis); monitoring (Datadog and Sentry); IaC (Terraform, Packer)
