Lead - Site Reliability Engineer

Paris
Full-Time

About

Welcome to the Jungle is shaping the future of work, leveraging content and technology to revolutionize every aspect of the employee experience. We assist companies in providing a better, more sustainable workplace experience.

Today, over 5,500 companies across various industries showcase their workplace culture on Welcome to the Jungle through exclusive content, team photos, interviews, and key statistics. Our objective is to enable companies to present themselves with increased transparency, allowing candidates to gain a deeper understanding of their people and culture.

Nearly 2.5 million unique visitors actively explore companies, apply for jobs, and engage with our original media and content each month. Our impact extends beyond the platform, resonating with a community of 1.7 million followers across various social media platforms.

Job Description

We’re hiring a Lead Site Reliability Engineer (SRE) to revitalize and lead the SRE function at Welcome to the Jungle, a ~300-person scale-up. You'll step into a critical role: rather than starting from scratch, you'll build upon and evolve existing foundations.

As the first new hire in this transition, you’ll be a hands-on technical leader responsible for continuing to define and implement our SRE strategy, ensuring the reliability, performance, and scalability of our platform.

You'll collaborate closely with engineering, product, and security teams to improve automation, observability, and infrastructure practices in a modern, cloud-native stack. The ideal candidate brings deep technical expertise, strong leadership, and the ability to both stabilize and grow a function in flux.

Key Responsibilities

Strategic & Technical Leadership

Define the vision, standards, and roadmap for Site Reliability Engineering at Welcome to the Jungle.
Lead the design and implementation of scalable, secure infrastructure in AWS using IaC (Terraform, Terragrunt).
Champion GitOps and CI/CD best practices via ArgoCD and CircleCI.
Own the development and enforcement of service-level objectives (SLOs) and indicators (SLIs).
Drive observability across the stack using OpenTelemetry and Datadog to ensure proactive issue detection and resolution.
Establish disaster recovery strategies, high-availability design patterns, and cost-effective infrastructure choices.

Operational Ownership & Automation

Lead incident response processes, postmortems, and on-call rotation design.
Build and maintain operational documentation and automation to reduce manual toil.
Ensure robust alerting, logging, and telemetry across all environments.
Proactively identify and remove bottlenecks in the infrastructure and deployment workflows.
Improve platform performance and reliability through rigorous monitoring, testing, and system design.

Cross-team Collaboration & Knowledge Sharing

Collaborate with development teams to ensure new services are production-ready and follow reliability best practices.
Partner with Security and DevOps to ensure infrastructure meets compliance and security standards.
Mentor developers and foster a reliability-focused engineering culture across the company.
Lead internal knowledge sharing and help scale the SRE mindset organization-wide.
Act as a trusted advisor to engineering leadership on system reliability, scalability, and tooling.

Preferred Experience

You have at least 6 years of infrastructure/systems engineering experience and want to maintain a strong hands-on technical focus.
You're comfortable:
- Building and maintaining large-scale distributed systems.
- Managing incident response in line with SLAs.
- Implementing automation and self-healing systems.
- Developing utility scripts and functions.
- Working in both French and English, in a remote context.
It's not required, but having experience with our tech stack (Elixir, React.js) is a significant advantage.
You have strong problem-solving skills and can troubleshoot complex systems issues.
You're reliability-focused: passionate about building resilient systems, measuring and improving reliability through data-driven approaches, and establishing sustainable operational practices.
You demonstrate excellent communication skills and can effectively collaborate with various technical and non-technical stakeholders.

Deep dive into our stack:
- Our main cloud provider is AWS;
- We use Kubernetes as our container orchestrator;
- Our Infrastructure-as-Code is managed with Terraform and Terragrunt;
- We use ArgoCD and CircleCI as our integration and deployment tools;
- We use OpenTelemetry and Datadog to monitor our platforms;
- Our applications run on GNU/Linux systems, such as Debian.
And if you're not an expert in all of these areas, you can still join us: we love sharing our knowledge.

Recruitment Process

An initial conversation with Fattoum, Talent Acquisition Manager
A take-home case, followed by a live expertise interview with the tech team
And finally, two competency interviews based on our company values

Additional Information

Contract Type: Full-Time
Location: Paris
Full remote possible
Salary: between €70,000 and €90,000 per year
