
Site Reliability Engineer Intern, Data Infra (Aug - Dec 2025)
- Singapore
- Training
- Full-time
- Support daily operations of big data platforms, including monitoring, troubleshooting, and routine maintenance.
- Write and optimize shell scripts to automate operational workflows and system tasks.
- Assist in system health checks, log analysis, and reliability improvements.
- Participate in building tools to enhance the observability and automation of data services.
- Document standard operating procedures and support knowledge sharing across the team.
- Currently pursuing a Bachelor's or Master's degree in Computer Science, Software Engineering, or a related field.
- Strong understanding of Linux operating systems and command-line tools.
- Proficiency in shell scripting (Bash, sh, etc.).
- Clear interest in large-scale systems and reliability engineering.
- Willingness to learn, take initiative, and work collaboratively.
- Familiarity with Python for automation or internal tooling.
- Experience with web platform development (e.g., using Flask, FastAPI, or similar frameworks).
- Exposure to big data and storage engines such as HDFS, Apache Ozone, or Alluxio.
- Understanding of monitoring or alerting tools (e.g., Prometheus, Grafana, ELK).
- Knowledge of Git and basic CI/CD workflows.
- Real-world experience in operating and improving a production-grade big data platform.
- Exposure to SRE practices including automation, fault tolerance, and observability.
- Mentorship from experienced infrastructure engineers.
- Potential opportunity for full-time conversion based on performance and graduation timeline.