Sales - AI QA Engineer

Apple

  • Singapore
  • Permanent
  • Full-time
  • 8 hours ago
Imagine what you could do here. At Apple, new ideas have a way of becoming outstanding products, services, and customer experiences very quickly. Bring passion and dedication to your job, and there's no telling what you could accomplish. Apple's Sales organization generates the revenue needed to fuel our ongoing development of products and services. This, in turn, enriches the lives of hundreds of millions of people around the world. We are, in many ways, the face of Apple to our largest customers. Apple's Decision Intelligence (DI) team is looking for a talented individual who is passionate about crafting, implementing, and operating AI solutions that have a direct and measurable impact on Apple Sales and its customers.
Description
We're looking for a hands-on AI QA Engineer to ensure the quality, reliability, and safety of GenAI-powered tools and intelligent agents before they reach our internal teams. You'll develop test strategies and evaluation protocols that go beyond functional QA, covering hallucination risk, grounding accuracy, latency, and prompt output consistency. This role will both augment existing AI and software roadmaps and pioneer new frontier tech projects, crafting AI experiences that reduce time to insights and catalyze decision-making. AI is a team sport, and in your role, you will be key in leading and influencing teams on the translation of business problems and questions into GenAI solutions.
Responsibilities
  • Design automated and manual test plans for GenAI agents, summarization modules, and LLM outputs.
  • Develop and maintain unit and integration test suites for GenAI pipelines and agent workflows.
  • Develop evaluation scripts and tools to validate factual accuracy, token usage, and latency.
  • Collaborate with prompt engineers and data scientists to red-team and edge-test agent behavior.
  • Create regression testing pipelines for model updates and prompt changes.
  • Support telemetry reviews and UX teams with actionable insights from QA cycles.
  • Implement regression testing and failure detection for backend services and APIs.
  • Support CI/CD quality gates and automated test triggers for model or prompt updates.
  • Help define and monitor QA metrics such as error rates, fallbacks triggered, and invalid outputs.
Minimum Qualifications
  • 4+ years of experience in QA, test engineering, ML, data engineering, governance, or backend development, with a recent focus on GenAI and LLMs.
  • We're looking for someone with an eagerness and ability to learn new skills and solve dynamic problems in an encouraging and expansive environment.
  • Experience testing ML, NLP, or GenAI products (especially RAG and prompt-driven systems).
  • Comfort with ambiguity. Ability to audit a full orchestrator and business context layer for sales.
  • Proficiency in Python (FastAPI, LangChain, or similar frameworks), prompt engineering, and RESTful API design.
  • Hands-on experience with LLM APIs, embeddings, vector databases, and RAG workflows.
  • Experience working with monitoring and observability tools (e.g., Prometheus, OpenTelemetry, Weights & Biases).
  • Familiarity with telemetry and evaluation frameworks for AI agents.
  • Experience working with data science teams on insights generation leveraging LLMs.
  • Knowledge of project management, productivity, and design tools such as Wrike and Sketch.
  • Strong time management skills with the ability to collaborate across multiple teams globally.
  • Able to balance competing priorities, long-term projects, and ad hoc requirements.
  • Ability to work in a fast-paced, dynamic, constantly evolving business environment.
  • B.S. Degree in Computer Science/Engineering, or equivalent work experience.
Preferred Qualifications
  • Strong experience articulating and translating business questions into AI solutions.
  • Ability to communicate results and insights effectively to partners and senior leaders, including both technical and non-technical audiences.
  • Familiarity with LangSmith, Trulens, Weights & Biases, or other LLMOps tools.
  • Experience with hallucination detection, chain-of-thought reasoning QA, or trust scoring.
  • Understanding of both backend API testing and UI/UX acceptance criteria.
  • Experience with complementary technologies for distributed systems architecture, asynchronous messaging, agent communication, and caching, such as RabbitMQ, Redis, and Valkey, is preferred.
  • Advanced Degree (MS or Ph.D.) in Economics, Electrical Engineering, Statistics, Data Science, or a similar quantitative field is preferred.
