
Site Reliability Engineer - Observability
- Singapore
- Permanent
- Full-time
- Deploy and manage Observability platforms and agents for ingesting metrics, logs, and traces from various sources.
- Parse and organize logs to extract relevant fields and data for processing and filtering.
- Assist developers in instrumenting application code to collect custom Application Performance Monitoring (APM) data.
- Record, script, and manage synthetic monitors for testing purposes.
- Capture user sessions and data for real user monitoring (RUM).
- Set up alerts and notifications for proactive monitoring.
- Generate dashboards, visualizations, and reports to provide actionable insights.
- Participate in and support root cause analysis (RCA) and application/service profiling sessions.
- Educate and assist teams in leveraging observability tools effectively.
- Diploma or Degree in Computer Science, Information Technology, or a related discipline.
- 2 to 5 years of experience working with modern observability platforms.
- Familiarity with observability concepts and standards such as OpenTelemetry.
- Experience with observability tools like the Elastic Stack for monitoring cloud infrastructure and application performance.
- Knowledge of developing, instrumenting, and profiling applications to enhance performance and reliability.
- Observability Certifications - Elastic Certified Observability Engineer, Dynatrace Associate/Professional, Splunk O11y Cloud Certified Metrics User.
- Cloud/Developer Certifications - AWS Developer Associate, Azure Developer Associate.