CoreWeave

https://www.coreweave.com/

Private

101-250

Software, Security & Developer Tools

CoreWeave is a specialized cloud provider focused on GPU-accelerated use cases, including VFX, AI/ML, batch processing, and real-time experiences. We support numerous AI/ML services across text-to-image, NLP, and the broader AI/ML space, reducing clients’ infrastructure management requirements with our Kubernetes-based serverless GPU cloud offerings.

Job Description

CoreWeave is seeking a highly skilled and motivated Senior Linux Engineer to join our Kernel HAVOCK Team, reporting to the Director of Compute Architecture. In this role, you will play a crucial part in the design, development, and optimization of our bare-metal systems, from POST through joining a Kubernetes cluster. The team’s primary responsibilities include maintaining a custom Linux kernel, various Ubuntu-based OS images, the virtualization stack (KubeVirt/QEMU/VFIO), and the container/pod runtime stack (containerd/nydus/kubelet). You will collaborate closely with cross-functional teams, up-stack engineering teams, and stakeholders to ensure the successful delivery of highly performant and reliable software solutions.

Kernel Hardware - Acceleration - Virtualization - Operating Systems - Containerization - Kubelet

Our Team’s Stack:

Linux Kernel (custom build, currently tracking Ubuntu HWE)
Intel/AMD CPUs, NVIDIA GPUs, DPUs, InfiniBand and Ethernet NICs
KubeVirt, QEMU, SR-IOV, vfio-pci
Ubuntu 22.04
Containerd, Kubelet

Responsibilities:

Develop and maintain tooling to build custom Linux kernels and stateless OS images
Automate packaging of critical components (drivers, microcode, components with out-of-tree patches, etc.)
Serve as a senior point of contact for hardware issue escalation and troubleshooting
Collaborate with cross-functional teams to define Linux and OS requirements, specifications, and system architecture
Analyze and optimize the performance of bare-metal and virtualized systems, identify bottlenecks, and propose improvements for enhanced efficiency

Requirements:

Must have at least 5 years of professional experience maintaining large fleets of Linux servers
Deep professional experience with troubleshooting and debugging hardware, OS, and kernel issues
History of improving system efficiency within different subsystems (network, storage, security)
Strong familiarity with sysctls, cgroups, IOMMU, init systems, seccomp/AppArmor
Ability to effectively prioritize and communicate proposed features and fixes
Strong passion for automation and a commitment to automating processes end to end
Excellent documentation skills and attention to detail
Strong analytical and problem-solving abilities

Nice-to-haves:

Experience with kexec, kpatch, kdump
Experience building CI/CD pipelines (GitHub or GitLab)
Opinions about software version control and team collaboration
Experience writing software tests

Our compensation reflects the cost of labor across several US geographic markets. The base pay for this position ranges from $130,000/year in our lowest geographic market up to $210,000/year in our highest geographic market. Pay is based on a number of factors including market location and may vary depending on job-related knowledge, skills, and experience.

Hybrid Workplace

Successful candidates will be expected to attend onboarding training at our NJ headquarters within their first several weeks of employment, followed by one week of travel each quarter.

If you reside within a 30-mile radius of our New Jersey, New York, or Philadelphia offices, we're excited for you to join us in the office at least three times a week, reflecting the importance we place on fostering connection, collaboration, and creativity within our office culture. Our commitment to operating as a hybrid workplace reflects our dedication to letting employees tailor their work-life balance to their individual preferences.

CoreWeave is a specialized cloud provider, delivering GPU compute resources at massive scale on top of the industry’s fastest and most flexible infrastructure. CoreWeave builds cloud solutions for compute-intensive use cases (VFX and rendering, machine learning and AI, batch processing, and Pixel Streaming) that are up to 35 times faster and 80% less expensive than the large, generalized public clouds. Learn more at www.coreweave.com.
