Systems Engineer (Kernel/Linux)
![CoreWeave](https://cdn.prod.website-files.com/63a2d70ffe5e82865b028e60/6512c26078ced671ca6bb55c_image.webp)
CoreWeave is a specialized cloud provider focused on GPU-accelerated use cases including VFX, AI/ML, Batch Processing and Real Time Experiences. We support countless AI/ML services in the text-to-image, NLP and broader AI/ML space, reducing clients’ infrastructure management requirements with our Kubernetes-based serverless GPU cloud offerings.
Job Description
CoreWeave is seeking a highly skilled and motivated Senior Linux Engineer to join our Kernel HAVOCK Team, reporting to the Director of Compute Architecture. In this role, you will play a crucial part in the design, development, and optimization of our bare-metal systems from POST through joining a Kubernetes cluster. The team’s primary responsibilities include maintaining a custom Linux kernel, various OS images (Ubuntu-based), the virtualization stack (kubevirt/qemu/vfio), and the container/pod runtime stack (containerd/nydus/kubelet). You will collaborate closely with cross-functional teams, upstack engineering teams, and stakeholders to ensure the successful delivery of highly performant and reliable software solutions.
Kernel Hardware - Acceleration - Virtualization - Operating Systems - Containerization - Kubelet
Our Team’s Stack:
- Linux Kernel (custom build, currently tracking Ubuntu HWE)
- Intel/AMD CPUs, NVIDIA GPUs, DPUs, InfiniBand and Ethernet NICs
- KubeVirt, QEMU, SR-IOV, vfio-pci
- Ubuntu 22.04
- Containerd, Kubelet
Responsibilities:
- Develop and maintain tooling to build custom Linux kernels and stateless OS images
- Automate packaging of critical components (drivers, microcode, components with out-of-tree patches, etc.)
- Serve as a senior point of contact for hardware issue escalation and troubleshooting
- Collaborate with cross-functional teams to define Linux and OS requirements, specifications, and system architecture
- Analyze and optimize the performance of bare-metal and virtualized systems, identify bottlenecks, and propose improvements for enhanced efficiency
Requirements:
- Must have at least 5 years of professional experience maintaining large fleets of Linux servers
- Deep professional experience with troubleshooting and debugging hardware, OS, and kernel issues
- History of improving system efficiency within different subsystems (network, storage, security)
- Strong familiarity with sysctls, cgroups, iommu, init systems, seccomp/apparmor
- Ability to effectively prioritize and communicate proposed features and fixes
- Strong passion for automation, with a commitment to automating processes comprehensively
- Excellent documentation skills and attention to detail
- Strong analytical and problem-solving abilities
Nice-to-haves:
- Experience with kexec, kpatch, kdump
- Experience building CI/CD pipelines (GitHub or GitLab)
- Opinions about software version control and team collaboration
- Experience writing software tests
Our compensation reflects the cost of labor across several US geographic markets. The base pay for this position ranges from $130,000/year in our lowest geographic market up to $210,000/year in our highest geographic market. Pay is based on a number of factors including market location and may vary depending on job-related knowledge, skills, and experience.
Hybrid Workplace
Successful candidates will be expected to attend onboarding training at our NJ Headquarters within their first several weeks of employment, with subsequent quarterly travel requirements of one week each.
If you reside within a 30-mile radius of our New Jersey, New York, or Philadelphia offices, we're excited for you to join us at the office at least three times a week, recognizing the significance we place on fostering connections, collaboration, and creativity within our office culture. Our commitment to operating as a hybrid workplace underscores our dedication to enabling our employees to tailor their work-life balance to their individual preferences.
CoreWeave is a specialized cloud provider, delivering a massive scale of GPU compute resources on top of the industry’s fastest and most flexible infrastructure. CoreWeave builds cloud solutions for compute-intensive use cases (VFX and rendering, machine learning and AI, batch processing, and Pixel Streaming) that are up to 35 times faster and 80% less expensive than the large, generalized public clouds. Learn more at www.coreweave.com.