
Dir/VP, Tech Ops & Service Mgmt, UOBAM
- Singapore
- Permanent
- Full-time
- Monitor and support production applications to ensure optimal performance and availability.
- Lead incident response efforts, including communication and coordination across teams.
- Conduct problem management and root cause analysis to prevent recurring issues.
- Coordinate with development teams to address and resolve bugs efficiently.
- Monitor vendor performance and ensure compliance with service level agreements (SLAs).
- Maintain system reliability, availability, and uptime through proactive measures.
- Implement automated healing, monitoring, and alerting solutions.
- Perform capacity planning and system performance tuning to support scalability.
- Conduct regular service operational reviews and audits to ensure adherence to standards.
- Maintain and regularly test Business Continuity Plans (BCP) and Disaster Recovery (DR) strategies.
- Monitor data pipelines to ensure data integrity and timely delivery.
- Manage data-related incidents and maintain lineage tracking for transparency and traceability.
- Provide input on operational requirements during system design.
- Participate in non-functional testing (e.g., performance and failover testing).
- Define SLOs, SLIs, and error budgets in line with Site Reliability Engineering (SRE) practices.
- Collaborate on automation and observability tooling.
- Work with Data Engineering to define data pipelines and quality checks.
- Define service requirements and SLAs during vendor onboarding.
- Contribute to BCP/DR planning.
- Provide feedback on incident trends to improve system design.
- Integrate AI into service desk operations for intelligent ticket routing, resolution suggestions, and trend analysis.
- Use AI to analyze incident trends and recommend design improvements or automation opportunities.
- Monitor and manage on-premises network infrastructure including routers, switches, firewalls, and load balancers.
- Ensure network performance, availability, and security through proactive maintenance and configuration management.
- Conduct regular network health checks, firmware updates, and hardware lifecycle planning.
- Respond to and resolve network-related incidents, including root cause analysis and documentation.
- Implement and maintain network segmentation, access controls, and compliance with internal security policies.
- Education: Bachelor’s degree in Computer Science, IT, or a related field.
- Experience: 5+ years in IT operations, service management, or related roles, ideally in high-growth or startup environments.
- Ops Leadership: Proven ability to lead cross-functional teams across Production Support, SRE, DevOps, and Service Management.
- Frameworks & Best Practices: Experience with the ITIL framework and service management best practices.
- Incident & Problem Management: Strong track record in leading incident response and driving root cause analysis.
- Performance & Reliability: Experience in capacity planning, performance tuning, and building reliable systems.
- BCP & DR: Hands-on experience with Business Continuity and Disaster Recovery planning and execution.
- Entrepreneurial Mindset: You’re resourceful, collaborative, and thrive in environments where you can make a big impact quickly.