Role Summary

We are seeking a highly skilled and motivated Lead Site Reliability Engineer (SRE) with strong AWS expertise to lead our Service Operations team. You will be responsible for driving SRE practices, ensuring the scalability, reliability, and performance of mission-critical systems for our digital banking clients. This role requires balancing technical depth with leadership capability: setting direction, mentoring engineers, and ensuring service reliability at scale across multiple teams and clients.

Sign-on Bonus: Eligible for candidates who are currently employed elsewhere and able to join GFT within 30 days of offer acceptance.

Key Responsibilities

Leadership & Mentorship: Lead a team of SREs, providing technical guidance, coaching, and fostering a culture of reliability and continuous improvement.
SRE Practices: Define and mature SRE practices, including SLIs/SLOs, error budgets, and incident response processes across production systems.
Architecture & Automation: Own the design and evolution of automated cloud operations, driving adoption of Infrastructure-as-Code (Terraform, CloudFormation) and CI/CD pipelines.
Incident Management: Lead major incident responses, ensuring rapid resolution, root cause analysis, and implementation of preventive measures.
Collaboration: Work closely with Development, DevOps, and Cloud Engineering teams to ensure reliability and resilience are built into every stage of delivery.
Operational Excellence: Establish and track key reliability metrics (availability, latency, error rates) and drive initiatives to continuously improve them.
Innovation & Tooling: Evaluate and implement AWS-native and third-party tools to improve monitoring, alerting, and automation.
Stakeholder Engagement: Act as the primary contact point for Service Reliability topics with clients, ensuring transparency and alignment on reliability goals.
Governance: Ensure compliance with industry standards and internal policies around security, audit, and operational risk.