Still looking for a job through search engines? Time to upgrade!
Instead of searching on your own through hundreds of listings, let Jobify analyze your résumé and show you only the opportunities that are truly worth your time, drawn from the largest job database in Israel.
The service is free, with no usage limits.
Description:
Company Overview:
Cellebrite’s (Nasdaq: CLBT) mission is to enable its global customers to protect and save lives by enhancing digital investigations and intelligence gathering to accelerate justice in communities around the world. Cellebrite’s AI-powered Digital Investigation Platform enables customers to lawfully access, collect, analyze and share digital evidence in legally sanctioned investigations while preserving data privacy. Thousands of public safety organizations, intelligence agencies and businesses rely on Cellebrite’s digital forensic and investigative solutions, available via cloud, on-premises and hybrid deployments, to close cases faster and safeguard communities. To learn more, visit us at www.cellebrite.com and https://investors.cellebrite.com/investors, and find us on social media @Cellebrite.
About the Job
We are looking for an AI Quality & Evaluation Engineer to own the quality planning and execution of an AI-powered chat application operating over complex law enforcement and mobile device data.
This is a highly hands-on role focused on execution rather than high-level QA strategy. You will design, build, and run automated and semi-automated tests for LLM-driven workflows, create evaluation datasets, and continuously stress the system with realistic and extreme investigative scenarios.
What you will be doing
• Design, plan, and execute quality tests for an AI chat application built on LLMs and investigative data.
• Build and maintain automation frameworks for prompt regression testing, multi-turn conversations, and model upgrades.
• Create and curate evaluation datasets used for regression testing, benchmarking, and model comparison.
• Design complex investigative scenarios including ambiguous, incomplete, or conflicting datasets.
• Execute manual exploratory testing to uncover hallucinations, reasoning failures, and edge cases.
• Work closely with engineering, product, and data teams as part of the development lifecycle.
• Validate release readiness and identify regressions related to prompts, models, or data pipelines.
What makes this role different
• You are evaluating AI behavior rather than fixed expected outputs.
• You help define what correctness and quality mean for AI reasoning over sensitive data.
• You actively invent scenarios to stretch and break the system.
• Your work directly impacts trust, reliability, and investigative confidence.
Requirements:
What you should bring
• 5+ years of experience in QA, test automation, or validation engineering.
• Strong hands-on experience building automated tests.
• Experience testing complex, data-heavy systems.
• Familiarity with API testing tools (e.g., Postman).
• Strong analytical, debugging, and problem-solving skills.
• High attention to detail with the ability to see the bigger picture.
• Excellent English, written and spoken.
Nice to have
• Experience testing AI, ML, or LLM-based systems.
• Experience with prompt testing or NLP evaluation techniques.
• Experience building synthetic or semi-synthetic datasets.
• Experience working with databases (SQL).