

Staff Engineer
Job Description
As a Staff Engineer on the Capacity and Performance Engineering (CPE) Team, you will work to improve the scalability and efficiency of Cruise’s infrastructure. You will work cross-functionally with engineering and other teams every day to drive efficiency initiatives and develop tooling to automate them. You will be responsible for collaborating with engineering to build scalable, efficient platforms and to optimize our existing ones. Your day will involve working across teams such as AI, Product, and Infrastructure to develop the structure for cost showback and self-service analysis, leading the CPE team in cloud efficiency discovery and execution, and working on strategic projects that will shape the future of Cruise. In particular, we’re looking for someone familiar with high-performance compute clusters and scheduler logic, with experience in resource contention tradeoffs. We’re looking for an engineer who has a proven ability to make logic gate tradeoffs that represent efficiency targets without going too far, and who is comfortable investing the team in deep understanding to find novel improvements backed by data.
Responsibilities
Provide deep visibility into what is happening across all products: run capacity and performance experiments to determine scaling and utilization parameters for various service tiers
Proactively identify gaps in infrastructure and workflow efficiency, and be a key contributor in driving proposals through to results
Partner with engineering teams, the Compute/Simulation platform in particular, to conduct capacity and performance experiments that resolve potential performance bottlenecks
Work closely with software engineers to reduce their consumption of cloud resources and improve their performance
Work with cloud service providers to proactively negotiate and retain the SKUs and capabilities required for efficient scale and capacity readiness
Work frequently with other teams to coordinate major changes to cross-system architectures, influencing upstream or downstream systems toward the most efficient solution
Present efficiency opportunities and project cost savings to the Cruise executive team
Design, develop, and lead automation that supports both near-term and long-term capacity planning
Job Requirements
Familiarity with HPC clusters and compute platforms
8+ years of experience in a capacity or performance engineering role
5+ years of experience managing teams
Expert-level application performance experience
Expert knowledge of various public cloud providers
Expert in data modeling for public cloud
Expert in budgeting and capacity planning
Expert in SQL, Python, scripting, and building automation tools
Self-disciplined and thrives in a fast-moving environment
Excellent communication skills