

Senior Software Engineer
Job Description
We’re looking for a Senior Software Engineer to join our Data Infrastructure team. This person will have an opportunity to meaningfully contribute to the vision, scope, and structure of the team, as well as the architecture and capabilities it builds. They should have a strong background in Data Engineering combined with software engineering experience and a command of foundational best practices such as testing strategies and code reviews. They should be interested in directly leading projects that have a material impact on the company’s ability to build and train models at scale.
Responsibilities
Building and contributing to data platforms for our Research Team, e.g., managing Airflow, BigQuery, Dataproc, and Dataflow
Building highly scalable data pipelines on distributed computing platforms on GCP
Contributing to building our multimedia AI Lakehouse
Contributing to improving our Data Lineage System
Building internal tooling that helps other teams visualize, use, and understand large data sets
Building guardrails to optimize cost, data quality, usability, and speed
Job Requirements
5+ years of software engineering experience in production settings writing clean, maintainable, and well-tested code
3+ years of professional experience working as a Data Engineer or similar position
Experience with GCP services such as Bigtable, BigQuery, Dataproc, Dataflow, Dataplex, and Cloud Composer
Familiarity with distributed data processing frameworks such as Apache Beam and Apache Spark, and a deep understanding of both batch and stream processing
Experience with Airflow, whether self-hosted or via managed solutions such as Cloud Composer or Astronomer
Fluency in Python and SQL
Experience building internal applications and tools for developers and researchers
Experience building data lineage systems
Experience working with Terraform, Docker, Kubernetes, CI/CD
Knowledge of GCP IAM patterns and best practices
Experience with Mage or Prefect is a plus