
ED/SVP, Head of Observability and Automation (O&A), SRE & Governance, Group Technology
- Singapore
- Permanent
- Full-time
Group Technology enables and empowers the bank with an efficient, nimble and resilient infrastructure through a strategic focus on productivity, quality & control, technology, people capability and innovation. In Group Tech, we manage the majority of the Bank's operational processes and aspire to delight our business partners through our multiple banking delivery channels.
Responsibilities
As the Head of Observability and Automation (O&A), you will be responsible for leading and overseeing the team to ensure the stability, reliability, and scalability of our banking platforms and services. This role requires a visionary leader with a deep understanding of SRE principles, a strategic mindset, and a hands-on approach to problem-solving. The ideal candidate will have a strong background in software engineering and infrastructure management, and a proven track record of driving reliability improvements in complex technical environments.
Key Responsibilities:
· Leadership & Strategy:
- Develop and execute the overall O&A strategy in alignment with the bank's business goals and technological vision.
- Lead, mentor, and manage a team of engineers, fostering a culture of collaboration, innovation, and continuous improvement.
- Define and implement the O&A framework and policy, standardizing reliability practices across the organization.
- Establish and lead a Centre of Excellence (COE) for O&A, promoting best practices and continuous learning in line with SRE principles and practices.
- Design, implement, and maintain robust monitoring and alerting to ensure high availability and performance of banking services.
- Drive the adoption of best practices in system design, capacity planning, and performance optimization.
- Identify and mitigate potential risks to system reliability, proactively addressing issues before they impact customers.
- Develop and implement monitoring & analysis strategies to proactively identify and address potential issues within the technology infrastructure.
- Utilize data-driven insights to optimize system performance, reliability, and scalability.
- Collaborate with cross-functional teams to establish monitoring tools and metrics, ensuring alignment with business objectives and goals.
- Champion automation efforts to streamline operational processes, reduce manual intervention, and increase system efficiency.
- Lead and implement AI/ML initiatives to transform our O&A landscape.
- Work closely with the application & infrastructure teams to ensure that reliability is built into the architecture and design of new features and services.
- Communicate reliability goals, progress, and challenges to executive leadership and other stakeholders.
- Promote a culture of transparency and accountability within the team and across the organization.
- Manage relationships with external vendors and service providers, ensuring their services align with the bank’s reliability and performance standards.
- Negotiate contracts and service level agreements (SLAs) with vendors to secure favourable terms and ensure accountability.
- Continuously evaluate vendor performance, addressing any issues and exploring new opportunities to optimize service delivery.
- Collaborate with vendors to stay abreast of the latest O&A technologies and tools that can enhance the bank’s infrastructure and reliability.
Requirements:
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
- Minimum 10 years of experience in software engineering, infrastructure management, or a related technical field.
- Minimum 5 years of experience in a leadership role within an SRE or DevOps team, preferably in the banking or financial services industry.
- Proficiency in programming languages such as Python, Go, Java, or similar.
- Strong understanding of cloud platforms (AWS, Azure, GCP) and containerization technologies (Docker, Kubernetes).
- Experience with CI/CD pipelines.
- Deep knowledge of monitoring and observability tools (Prometheus, Grafana, ELK stack, etc.).
- Excellent leadership, mentoring, and team-building skills.
- Strong problem-solving and analytical abilities.
- Effective communication and interpersonal skills, with the ability to convey complex technical concepts to non-technical stakeholders.
- Strategic thinking and a proactive approach to identifying and addressing potential issues.