Senior Parallel Programming Expert (CUDA/AVX)

About EXXA

At EXXA, we are building the most cost-efficient, high-throughput AI infrastructure for large-scale, asynchronous workloads. Our mission is to balance Gen-AI demand and processing supply by leveraging idle GPUs, optimizing batch inference, and improving the inference efficiency of AI models.
If you are passionate about open-source AI, obsessed with performance, and love tackling complex technical challenges, we want to hear from you!

Who we are

We are an early-stage, fast-growing startup, backed by top tech investors and part of Station F’s Future 40 program. Our founding team has deep expertise in AI research and infrastructure, and we are on a mission to make open-source AI more accessible by championing delayed processing for massive workloads. Our unique approach dramatically reduces waste in Gen-AI, unlocking new possibilities for developers and companies alike.

Why you should join us

🚀 Technical innovation
We are tackling massive technical challenges to make Gen-AI inference infrastructure more efficient and push throughput-optimized computing.

🌐 Remote first
We are a fully distributed team. Work from anywhere in European timezones.

💸 Competitive compensation and benefits
  • Competitive salary
  • Early-stage stock options
  • Private health insurance
  • 30+ paid holidays
  • Top-notch hardware and equipment

🙏 Backed by the best
We are funded by leading VCs and top business angels (announcement coming soon).

Job Description

EXXA is hiring a Senior Parallel Programming Expert (CUDA/AVX) to co-lead the development of EXXA's inference engine, focusing on batch processing and throughput rather than low-latency constraints.

Key responsibilities:

  • Co-lead the development of EXXA's inference engine with our CTO

  • Contribute to the architecture and the implementation of the inference engine

  • Profile and optimize the inference engine

  • Design and implement efficient inference kernels for GPU and CPU

  • Benchmark and validate performance improvements

Preferred Experience

Qualifications:

  • Proven expertise in parallel programming using CUDA and/or SIMD instructions

  • Strong background in performance optimization and profiling

  • Experience with Triton kernel language and/or MLIR/XLA intermediate representation is a plus

  • Advanced proficiency in C++ or Rust

  • Knowledge of Python ML stack (PyTorch, HuggingFace, etc.)

  • 3–5+ years of experience in high-performance computing or a similar field

Recruitment Process

Expect to have at least:

  • An intro call with one of our founders

  • A technical interview

  • A final meeting with the team

Additional Information

  • Contract Type: Full-Time
  • Location: Paris
  • Experience: > 3 years
  • Full remote possible