
Machine Learning Platform R&D Expert Engineer (Parameter Server Direction) - EGO Team
- Singapore
- Permanent
- Full-time
- Develop distributed Parameter Server (PS) systems for large-scale sparse model training and inference platforms in the search, advertising, and recommendation domains. The system should support high-throughput parameter reads, writes, and updates; handle hundreds of billions of features and TB-scale sparse models; enable real-time online learning; and meet algorithmic needs such as feature admission and expiration.
- Participate in developing the one-stop machine learning platform and integrate the PS system into it as a user-friendly, stable, high-performance, platform-level distributed parameter service. Improve the platform's efficiency and usability to accelerate model iteration for algorithm teams.
- Bachelor's degree or above in Computer Science, Electronics, Automation, Software Engineering, or related fields
- At least 3 years of relevant hands-on experience
- Proficient in C++ programming with strong low-level technical skills; adept at multi-threaded programming, lock optimization, memory pools, thread pools, template programming, GDB debugging, performance tuning, and RPC frameworks.
- Familiarity with distributed PS systems, distributed system backend optimization, high-performance in-memory KV systems, NVMe SSD-based KV storage systems, and high-performance client-server systems is a plus.
- Passionate about computer technology and proactive in learning, with a strong drive for in-depth research and hands-on practice. Holds delivered code to high standards and works with rigor and attention to detail.
- Strong team player with excellent continuous learning ability.