Senior Parallel Programming Expert (CUDA/AVX)

About EXXA

At EXXA, we are building the most cost-efficient, high-throughput AI infrastructure for large-scale, asynchronous workloads. Our mission is to balance Gen-AI demand and processing supply by leveraging idle GPUs, optimizing batch inference, and improving the inference efficiency of AI models.
If you are passionate about open-source AI, obsessed with performance, and love tackling complex technical challenges, we want to hear from you!

Who we are

We are an early-stage, fast-growing startup, backed by top tech investors and part of Station F’s Future 40 program. Our founding team has deep expertise in AI research and infrastructure, and we are on a mission to make open-source AI more accessible by championing delayed processing for massive workloads. Our unique approach dramatically reduces waste in Gen-AI, unlocking new possibilities for developers and companies alike.

Why you should join us

🚀 Technical innovation
We are tackling massive technical challenges to make Gen-AI inference infrastructure more efficient and push throughput-optimized computing.

🌐 Remote first
We are a fully distributed team. Work from anywhere in European timezones.

💸 Competitive compensation and benefits
  • Competitive salary
  • Early-stage stock options
  • Private health insurance
  • 30+ paid holidays
  • Top-notch hardware and equipment

🙏 Backed by the best
We are funded by leading VCs and top business angels (announcement coming soon).

Job Description

EXXA is hiring a Senior Parallel Programming Expert (CUDA/AVX) to co-lead the development of EXXA's inference engine, focusing on batch processing and throughput rather than low-latency constraints.

Key responsibilities:

  • Co-lead the development of EXXA's inference engine with our CTO

  • Contribute to the architecture and the implementation of the inference engine

  • Profile and optimize the inference engine

  • Design and implement efficient inference kernels for GPU and CPU

  • Benchmark and validate performance improvements

Preferred Experience

Qualifications:

  • Proven expertise in parallel programming using CUDA and/or SIMD instructions

  • Strong background in performance optimization and profiling

  • Experience with Triton kernel language and/or MLIR/XLA intermediate representation is a plus

  • Advanced proficiency in C++ or Rust

  • Knowledge of Python ML stack (PyTorch, HuggingFace, etc.)

  • 3–5+ years of experience in high-performance computing or a similar field

Recruitment Process

Expect to have at least:

  • An intro call with one of our founders

  • A technical interview

  • A final meeting with the team

Additional Information

  • Contract Type: Full-Time
  • Location: Paris
  • Experience: > 3 years
  • Full remote possible