Foresite Labs FL2024 006
Staff Engineer, High-Performance Data & Algorithm Infrastructure
Location: San Diego, CA
Job Type: Full-Time
Pay Range: $175k–$185k, plus bonus and equity
About Us
We are a venture-backed, stealth-stage biotechnology company based in San Diego, developing technologies that will redefine how disease is detected, characterized, and managed through a novel approach to clinical genomics. Our mission is to fundamentally transform healthcare through the convergence of innovation across multiple scientific disciplines.
Founded by industry veterans with decades of experience in life sciences tools and diagnostics, our leadership team brings a proven track record of translating scientific insight into successful commercial products. Our investors include some of the most respected names in healthcare and deep tech.
Position Overview
We are looking for a Senior Staff Software Engineer with deep expertise in high-performance computing (HPC), Linux systems, and GPU-accelerated data pipelines. This is a highly technical, hands-on role focused on extracting maximum performance from modern CPUs, GPUs, memory subsystems, and high-speed networks. You will work close to the hardware and operating system, tuning kernels, BIOS settings, and drivers, while also designing and implementing low-latency data processing pipelines that include real-time signal processing. If you enjoy profiling, tuning, and eliminating bottlenecks across the full stack, from BIOS to CUDA kernels to network offload, this role is for you.
Key Responsibilities
High-Performance System Engineering
Design, build, and optimize high-throughput, low-latency compute pipelines
Profile and tune performance across CPUs, GPUs, memory, storage, and networking
Identify and eliminate bottlenecks in data movement and computation
Work directly with hardware and OS configuration to achieve deterministic, repeatable performance
Linux Systems & Kernel Expertise
Configure and tune Linux systems for high-performance workloads
Customize and tune Linux kernel parameters (scheduler, NUMA, IRQs, huge pages, IOMMU, etc.)
Tune CPU and BIOS parameters (power states, frequency scaling, SMT, NUMA, memory timing)
Manage and optimize DMA paths between devices and system memory
Minimize context switches, cache misses, and system jitter (a brief sketch of this kind of work follows this list)
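To give a concrete feel for this work, here is a minimal, self-contained sketch in plain C++ on Linux; the core index and the 1 ms period are arbitrary placeholder values, not production settings. It pins the calling thread to one core, locks its pages in memory, and measures the wake-up jitter of a periodic loop, the kind of check used to confirm that kernel and BIOS tuning actually produced deterministic behavior:

    #include <pthread.h>
    #include <sched.h>
    #include <sys/mman.h>
    #include <time.h>
    #include <algorithm>
    #include <cstdint>
    #include <cstdio>
    #include <vector>

    int main() {
        // Pin this thread to core 3 (placeholder; in practice chosen to match
        // the NUMA node and IRQ affinity of the device being serviced).
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(3, &set);
        if (pthread_setaffinity_np(pthread_self(), sizeof(set), &set) != 0)
            std::fprintf(stderr, "failed to set CPU affinity\n");

        // Lock current and future pages to avoid page-fault-induced stalls.
        if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
            std::fprintf(stderr, "mlockall failed\n");

        // Measure wake-up lateness of a 1 ms periodic loop.
        std::vector<int64_t> late_ns;
        timespec next;
        clock_gettime(CLOCK_MONOTONIC, &next);
        for (int i = 0; i < 1000; ++i) {
            next.tv_nsec += 1000000;                    // 1 ms period
            if (next.tv_nsec >= 1000000000) { next.tv_nsec -= 1000000000; ++next.tv_sec; }
            clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, nullptr);
            timespec now;
            clock_gettime(CLOCK_MONOTONIC, &now);
            late_ns.push_back((now.tv_sec - next.tv_sec) * 1000000000LL +
                              (now.tv_nsec - next.tv_nsec));
        }
        std::sort(late_ns.begin(), late_ns.end());
        std::printf("wake-up lateness: p50=%lld ns  p99=%lld ns  max=%lld ns\n",
                    (long long)late_ns[late_ns.size() / 2],
                    (long long)late_ns[late_ns.size() * 99 / 100],
                    (long long)late_ns.back());
        return 0;
    }

Measurements like this are then cross-checked against perf and ftrace traces to track down whatever jitter remains.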
GPU & CUDA Programming (Critical)
Develop and optimize GPU-accelerated compute pipelines using CUDA
Optimize memory transfers between host and GPU (pinned memory, zero-copy, GPUDirect where applicable); a short sketch of this pattern follows this list
Tune kernel launches, memory access patterns, and occupancy
Configure and manage GPU drivers, runtime, and system-level settings for maximum throughput
Profile GPU workloads using tools such as Nsight Systems and Nsight Compute
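The sketch below shows the basic shape of this transfer work, with a placeholder buffer size and a trivial kernel standing in for real processing: pinned host memory so copies run as true asynchronous DMA, and a single CUDA stream ordering the copy in, the compute, and the copy back.

    // build: nvcc -o transfer_sketch transfer_sketch.cu
    #include <cuda_runtime.h>
    #include <cstdio>

    __global__ void scale(float* data, int n, float factor) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] *= factor;
    }

    int main() {
        const int n = 1 << 20;                       // placeholder size: 1M floats
        const size_t bytes = n * sizeof(float);

        // Pinned host memory: required for copies to run as true asynchronous DMA.
        float* h_buf = nullptr;
        cudaMallocHost((void**)&h_buf, bytes);
        for (int i = 0; i < n; ++i) h_buf[i] = 1.0f;

        float* d_buf = nullptr;
        cudaMalloc((void**)&d_buf, bytes);

        cudaStream_t stream;
        cudaStreamCreate(&stream);

        // Copy in, compute, copy out; ordering is guaranteed by the shared stream,
        // and the host stays free to do other work until the final synchronize.
        cudaMemcpyAsync(d_buf, h_buf, bytes, cudaMemcpyHostToDevice, stream);
        scale<<<(n + 255) / 256, 256, 0, stream>>>(d_buf, n, 2.0f);
        cudaMemcpyAsync(h_buf, d_buf, bytes, cudaMemcpyDeviceToHost, stream);
        cudaStreamSynchronize(stream);

        std::printf("h_buf[0] = %.1f (expect 2.0)\n", h_buf[0]);

        cudaStreamDestroy(stream);
        cudaFree(d_buf);
        cudaFreeHost(h_buf);
        return 0;
    }

Real pipelines overlap multiple streams and are tuned with Nsight Systems and Nsight Compute, but the skeleton is the same.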
Data Movement & Networking
Optimize high-speed data ingestion and offload to HPC systems
Work with low-latency and high-bandwidth networking technologies (e.g., RDMA, InfiniBand, high-speed Ethernet)
Minimize data transfer latencies across network, PCIe, and memory boundaries
Design zero-copy or near-zero-copy data paths where possible (a compact example follows this list)
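As one illustration of a near-zero-copy design (the buffer size and single-block reduction are deliberately simplified, and production ingest paths may use RDMA or GPUDirect instead), the sketch below maps pinned host memory into the GPU's address space so a kernel can read it directly over PCIe without a staging copy:

    // build: nvcc -o zero_copy_sketch zero_copy_sketch.cu
    #include <cuda_runtime.h>
    #include <cstdio>

    // Single-block sum, kept deliberately simple: the GPU reads the host buffer
    // directly over PCIe through the mapped pointer, with no staging copy.
    __global__ void sum_kernel(const float* in, int n, float* out) {
        __shared__ float partial[256];
        float acc = 0.0f;
        for (int i = threadIdx.x; i < n; i += blockDim.x) acc += in[i];
        partial[threadIdx.x] = acc;
        __syncthreads();
        for (int s = blockDim.x / 2; s > 0; s >>= 1) {
            if (threadIdx.x < s) partial[threadIdx.x] += partial[threadIdx.x + s];
            __syncthreads();
        }
        if (threadIdx.x == 0) *out = partial[0];
    }

    int main() {
        cudaSetDeviceFlags(cudaDeviceMapHost);       // allow mapping pinned host memory

        const int n = 1 << 20;                       // placeholder buffer size
        float* h_in = nullptr;
        cudaHostAlloc((void**)&h_in, n * sizeof(float), cudaHostAllocMapped);
        for (int i = 0; i < n; ++i) h_in[i] = 1.0f;

        float* d_in = nullptr;                       // device-visible alias of h_in
        cudaHostGetDevicePointer((void**)&d_in, h_in, 0);

        float* d_out = nullptr;
        cudaMalloc((void**)&d_out, sizeof(float));

        sum_kernel<<<1, 256>>>(d_in, n, d_out);

        float result = 0.0f;
        cudaMemcpy(&result, d_out, sizeof(float), cudaMemcpyDeviceToHost);
        std::printf("sum = %.0f (expect %d)\n", result, n);

        cudaFree(d_out);
        cudaFreeHost(h_in);
        return 0;
    }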
Signal Processing & Algorithms
Implement and optimize digital signal processing algorithms, including FFTs, deconvolution, and thresholding/detection algorithms (a minimal sketch follows this list)
Optimize DSP workloads for CPU vectorization and GPU acceleration
Balance numerical accuracy, latency, and throughput constraints
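Here is a minimal version of the FFT-plus-detection pattern using cuFFT on a synthetic tone with a simple magnitude threshold; the block size, tone frequency, and threshold are illustration values only:

    // build: nvcc -o fft_detect fft_detect.cu -lcufft
    #include <cuda_runtime.h>
    #include <cufft.h>
    #include <cmath>
    #include <cstdio>
    #include <vector>

    __global__ void detect_peaks(const cufftComplex* spectrum, int n,
                                 float threshold, int* flags) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) {
            float mag = sqrtf(spectrum[i].x * spectrum[i].x +
                              spectrum[i].y * spectrum[i].y);
            flags[i] = (mag > threshold) ? 1 : 0;
        }
    }

    int main() {
        const int n = 4096;                          // placeholder block size
        std::vector<cufftComplex> h_signal(n);
        for (int i = 0; i < n; ++i) {                // synthetic tone at bin 100
            h_signal[i].x = (float)std::cos(2.0 * 3.14159265358979 * 100.0 * i / n);
            h_signal[i].y = 0.0f;
        }

        cufftComplex* d_signal = nullptr;
        cudaMalloc((void**)&d_signal, n * sizeof(cufftComplex));
        cudaMemcpy(d_signal, h_signal.data(), n * sizeof(cufftComplex),
                   cudaMemcpyHostToDevice);

        // In production the plan would be created once and reused for every block.
        cufftHandle plan;
        cufftPlan1d(&plan, n, CUFFT_C2C, 1);
        cufftExecC2C(plan, d_signal, d_signal, CUFFT_FORWARD);   // in-place forward FFT

        int* d_flags = nullptr;
        cudaMalloc((void**)&d_flags, n * sizeof(int));
        detect_peaks<<<(n + 255) / 256, 256>>>(d_signal, n, 0.25f * n, d_flags);

        std::vector<int> h_flags(n);
        cudaMemcpy(h_flags.data(), d_flags, n * sizeof(int), cudaMemcpyDeviceToHost);
        int hits = 0;
        for (int f : h_flags) hits += f;
        std::printf("bins above threshold: %d (expect 2)\n", hits);

        cufftDestroy(plan);
        cudaFree(d_flags);
        cudaFree(d_signal);
        return 0;
    }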
Qualifications
Education:
BS/MS in Computer Science or Engineering
Required:
Experience & Technical Skills
7+ years of professional software engineering experience (or equivalent depth)
Strong background in high-performance computing or performance-critical systems
Expert-level Linux experience, including kernel and system tuning
Deep experience with GPU computing and CUDA (required)
Strong systems programming skills in C/C++ (and/or Rust)
Solid understanding of computer architecture: CPU caches, NUMA, and memory hierarchies; PCIe and DMA; GPU architectures
Performance & Debugging Skills
Extensive experience profiling and tuning complex systems
Comfortable using tools such as perf, ftrace, eBPF, valgrind, Nsight, and similar
Ability to reason quantitatively about latency, bandwidth, and throughput
DSP & Mathematical Foundations
Practical experience implementing DSP algorithms in production systems
Strong understanding of FFTs, convolution/deconvolution, filtering, and thresholding
Ability to optimize numerical algorithms for real-time or near-real-time constraints (a brief CPU-side sketch follows this list)
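As a small CPU-side counterpart to the GPU sketches above, the snippet below runs a vectorization-friendly threshold/detection pass over one block of samples and checks its latency against a per-block budget; the block size and budget are placeholder values.

    #include <chrono>
    #include <cstdio>
    #include <vector>

    // Flat, branch-light, contiguous loop: the shape compilers auto-vectorize well.
    static int count_over_threshold(const float* x, int n, float threshold) {
        int count = 0;
        for (int i = 0; i < n; ++i)
            count += (x[i] > threshold) ? 1 : 0;
        return count;
    }

    int main() {
        const int block = 1 << 16;                   // placeholder block size
        const double budget_us = 50.0;               // placeholder per-block latency budget
        std::vector<float> samples(block, 0.1f);
        samples[12345] = 5.0f;                       // one synthetic event

        auto t0 = std::chrono::steady_clock::now();
        int hits = count_over_threshold(samples.data(), block, 1.0f);
        auto t1 = std::chrono::steady_clock::now();

        double us = std::chrono::duration<double, std::micro>(t1 - t0).count();
        std::printf("hits=%d  latency=%.1f us  budget=%.1f us  %s\n",
                    hits, us, budget_us, us <= budget_us ? "OK" : "OVER BUDGET");
        return 0;
    }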
Preferred:
Experience with RDMA, GPUDirect RDMA, or other hardware offload technologies
Experience with custom kernel builds or kernel module development
Familiarity with real-time or low-latency Linux variants
Experience deploying HPC workloads at scale
Background in scientific computing, signal processing, or computational physics
What Success Looks Like
Data pipelines consistently hit performance targets with headroom
Latency and throughput are predictable, measurable, and well understood
GPUs and CPUs are efficiently utilized with minimal idle time
System-level bottlenecks are identified early and resolved decisively
Why This Role is Interesting
You will work on problems where performance truly matters
You will operate across the full stack, from BIOS and kernel settings to CUDA kernels and DSP algorithms
Your optimizations will have immediate, measurable impact
You will have the freedom to deeply understand and tune the system, not just work around it
Why Join Us
Work in a dynamic, collaborative environment where innovation and scientific rigor are deeply valued.
Join a seasoned and multidisciplinary team tackling high-impact problems at the intersection of science and engineering.
Competitive compensation and equity package, comprehensive benefits, and flexibility to support work-life integration.
Radian is an equal opportunity employer. We thrive on diversity and collaboration.