CoreWeave

https://www.coreweave.com/

Private

101-250

Software, Security & Developer Tools

CoreWeave is a specialized cloud provider focused on GPU-accelerated use cases, including VFX, AI/ML, batch processing, and real-time experiences. We support many AI/ML services in text-to-image, NLP, and the broader AI/ML space, reducing clients’ infrastructure management requirements with our Kubernetes-based serverless GPU cloud offerings.

Job Description

About the role:

The Developer Productivity Team functions as the lubricant that keeps CoreWeave’s gears of innovation turning fast and friction-free. This team is responsible for the development, integration, and operation of platforms central to the engineering experience, with the ultimate objective of enabling engineers across CoreWeave to do more, better. Central to the Developer Productivity mission is the operation of our development environment, onboarding toolchain, and CI/CD, which leverage CoreWeave’s deep investment in the Kubernetes ecosystem. Engineers on this team will work to discover and remove friction across CoreWeave’s engineering teams through the development of boilerplate, integrations, and automation, and through the operation of shared platforms.

We are seeking a Senior Site Reliability Engineer who can help us execute on the mission of making developers’ lives easier. This individual will work with a team of 4-6 mixed-specialization engineers and have the opportunity to work on the full gamut of rewarding challenges that come with the business of building a cloud in a communicative, supportive, and high-performing environment. As a member of the Developer Productivity Team, you would have the opportunity to:

Design and implement services and tools to reduce friction and toil in the lives of our engineering and operations teams.
Improve the performance, security, reliability, and scalability of our CI/CD platform and on-call tooling, and participate in the team's on-call rotation.
Enable and evangelize the practice of reliability engineering across CoreWeave’s engineering teams.
Create and operate Kubernetes-native, multi-regional deployment patterns.
Develop tools that streamline the onboarding process, ensuring a more seamless experience for new team members.
Collect and analyze data on developer productivity and tool usage to identify areas for improvement and measure the impact of implemented changes.
Grow, change, invest in your teammates, be invested-in, share your ideas, listen to others, be curious, have fun, and, above all, be yourself.

Wondering if you’re a good fit? We believe in investing in our people, and value candidates who can bring their own diversified experiences to our teams – even if you aren't a 100% skill or experience match. Here are some qualities we’ve found compatible with our team. If a portion of this resonates with you, we’d love to talk.

You have one or more years of experience in software or infrastructure engineering.
You enjoy helping your colleagues achieve more with less effort.
You have experience working with CI/CD platforms such as GitHub, ArgoCD, and Flux.
You have experience operating services in production and at scale and are interested in learning more about reliability engineering concepts such as the different types of testing, progressive deployments, error budgets, staging environments and fault-tolerant design.
You’re familiar with Kubernetes and have an interest in, or experience with, using it for event-driven and/or stateful orchestration.
You’re comfortable with the idea of using Go as your primary programming language.
You know your way around a Linux distro, shell scripting, and/or the Linux storage and networking stacks.
You’re excited about being part of a team of diverse perspectives and backgrounds that believe in tackling challenges, growing hand in hand, and winning together.
