Still job hunting on search engines? Time to upgrade!
Instead of combing through thousands of listings on your own, Jobify analyzes your resume and shows you only the jobs that truly fit you.
Over 80,000 jobs • 4,000 new every day
Free. No ads. No fine print.
abra R&D is looking for a QA Engineer

We're looking for a QA Engineer to take part in building a next-generation agentic analytics platform: a real-time database optimized for AI agents at scale. In this role, you'll define how AI agents are measured, validated, and improved in production.

This is not a traditional QA position. It sits at the intersection of AI, data, and engineering, combining evaluation research with production-grade systems. You will design evaluation methodologies, build LLM-as-a-judge systems, and develop agent-based testing frameworks to ensure the correctness, robustness, and reliability of complex multi-agent workflows operating on real-time data.

What you'll do:
Design and implement evaluation frameworks for AI agents and multi-agent systems
Build LLM-as-a-judge pipelines to evaluate correctness, reasoning, and output quality
Develop agent-based evaluation systems (agents evaluating agents)
Define metrics, benchmarks, and methodologies for performance and reliability
Build data-driven evaluation pipelines using synthetic and real-world datasets
Analyze failure modes, edge cases, and non-deterministic behaviors
Improve agent reliability, robustness, and consistency in production
Work with tools such as Google ADK, Opik, and similar evaluation frameworks
Collaborate closely with the AI, platform, and database teams

Requirements:
4–8+ years of experience in software engineering, AI systems, or evaluation/QA engineering
Strong programming skills in Python
Hands-on experience working with LLMs in production environments
Experience building evaluation systems, automation frameworks, or testing infrastructure
Strong understanding of prompt engineering, tool use, and agent behavior
Ability to think in terms of metrics, correctness, and system reliability

Strong Plus:
Experience with LLM evaluation frameworks (Opik, LangSmith, etc.)
Experience with Google ADK / agent frameworks
Experience implementing LLM-as-a-judge or ranking systems
Background in data systems, analytics, or real-time pipelines
Experience with multi-agent systems
Familiarity with statistical evaluation methods or experimentation (A/B testing, scoring systems)
Evening
Herzliya