
IT Manager, Monitoring and Insights (Observability)
- Singapore
- Permanent
- Full-time
- Lead the implementation and optimization of observability platforms (e.g., ITRS Geneos, Dynatrace).
- Define and maintain SLIs, SLOs, and error budgets in collaboration with internal application and infrastructure teams.
- Develop dashboards, alerts, and telemetry pipelines to ensure real-time visibility into system health and performance.
- Drive adoption of observability best practices across IT teams.
- Automate routine operational tasks to reduce manual intervention and human error.
- Integrate observability tools with CI/CD pipelines and incident management platforms (e.g., ServiceNow).
- Promote the use of Infrastructure as Code (IaC) and configuration management tools (e.g., Ansible, Terraform).
- Collaborate with IT security teams to identify and remediate vulnerabilities through observability insights.
- Conduct security assessments and support remediation efforts.
- Promote secure configurations and compliance with standards such as SOX and PCI DSS.
- Stay current with industry trends, emerging technologies, and best practices in observability and IT operations.
- Lead initiatives to improve system reliability, reduce MTTR (Mean Time to Recovery), and enhance incident response capabilities.
- Work closely with the Sands global Observability team to foster a culture of learning, collaboration, and operational excellence.
- Bachelor’s degree in Computer Science, Information Technology, or a related field (or equivalent practical experience).
- Preferred certifications: Azure Architect, CISSP, or equivalent.
- Minimum 8 years of experience in IT operations, systems monitoring, or infrastructure engineering.
- Familiarity with Site Reliability Engineering (SRE) principles and DevSecOps practices.
- Ability to identify and remediate vulnerabilities using observability insights.
- Knowledge of security frameworks and compliance standards (e.g., ISO 27001, PCI DSS).
- Experience with security monitoring, anomaly detection, and automated alerting.
- Proven experience with observability and monitoring tools such as Dynatrace, ITRS Geneos, or similar.
- Strong understanding of telemetry pipelines, log aggregation, metrics collection, and distributed tracing.
- Proficiency in scripting and automation (e.g., PowerShell) to support monitoring and alerting automation.
- Solid grasp of cloud infrastructure (Azure or Alicloud), network protocols, and operating systems (Linux/Windows).
- Working knowledge of Kubernetes services (Azure Stack, Red Hat OpenShift) would be beneficial.
- Experience integrating observability with CI/CD pipelines and incident management platforms (e.g., ServiceNow).
- Experience with SQL and relational databases (e.g., MySQL, Oracle).
- Strong analytical and problem-solving skills with a proactive, data-driven mindset.
- Excellent communication skills, with the ability to explain complex technical concepts to non-technical stakeholders.
- Demonstrated leadership in cross-functional teams and the ability to drive adoption of observability practices.
- Familiarity with ITIL processes and experience managing service requests, incidents, problems, and changes.