As our Site Reliability Engineer, you are responsible for implementing and maintaining scalable infrastructure and systems that ensure the reliability, performance, and security of our production environments. This hands-on position bridges the gap between development and operations, applying software engineering principles to infrastructure and operational challenges. You will collaborate closely with Development teams, Security teams, and other stakeholders to build and maintain robust systems, implement automation, and support operational excellence through SLOs (Service Level Objectives) and observability. You will also contribute to incident management, capacity planning, and the adoption of infrastructure as code practices across the organization.
You will report to the Platform Engineering Manager and work as part of the Platform Team.
Key Responsibilities
Technical Leadership & System Design
Collaborate with Development teams on infrastructure architecture, deployment strategies, and operational requirements.
Design and implement monitoring, alerting, and observability solutions.
Contribute to infrastructure as code initiatives and maintain deployment automation pipelines.
Implement security best practices suited to each system's context and ensure compliance requirements are met.
Design and maintain disaster recovery and backup strategies.
Operational Excellence & Process Implementation
Contribute to incident response efforts and drive resolution of technical issues.
Develop and maintain runbooks and documentation for operational procedures.
Ensure proper logging and monitoring across all systems.
Expand automation to reduce manual operations.
Maintain and improve SRE practices across the organization.
Cross-team Collaboration & Knowledge Sharing
Work with Development teams to implement operational readiness requirements.
Collaborate with Security teams on infrastructure security measures.
Provide technical mentorship to developers on operational practices.
Lead knowledge sharing sessions and documentation efforts.
Partner with Engineering Managers to improve development workflows and tools.