Quality & Evaluation Engineer - GenAI Innovation Team

Cellebrite

Petah Tikva

Hybrid
₪18,000-26,000 (AI-based estimate)
This salary range is an AI-based estimate, not a figure published by the employer.

Company Overview:

Cellebrite’s (Nasdaq: CLBT) mission is to enable its global customers to protect and save lives by enhancing digital investigations and intelligence gathering to accelerate justice in communities around the world. Cellebrite’s AI-powered Digital Investigation Platform enables customers to lawfully access, collect, analyze and share digital evidence in legally sanctioned investigations while preserving data privacy. Thousands of public safety organizations, intelligence agencies and businesses rely on Cellebrite’s digital forensic and investigative solutions (available via cloud, on-premises and hybrid deployments) to close cases faster and safeguard communities. To learn more, visit us at www.cellebrite.com or https://investors.cellebrite.com/investors, and find us on social media @Cellebrite.

About The Job

We are looking for an AI Quality & Evaluation Engineer to own the quality planning and execution of an AI-powered chat application operating over complex law enforcement and mobile device data.

This is a highly hands-on role focused on execution rather than high-level QA strategy. You will design, build, and run automated and semi-automated tests for LLM-driven workflows, create evaluation datasets, and continuously stress the system with realistic and extreme investigative scenarios.

What You Will Be Doing

Design, plan, and execute quality tests for an AI chat application built on LLMs and investigative data.
Build and maintain automation frameworks for prompt regression testing, multi-turn conversations, and model upgrades.
Create and curate evaluation datasets used for regression testing, benchmarking, and model comparison.
Design complex investigative scenarios including ambiguous, incomplete, or conflicting datasets.
Execute manual exploratory testing to uncover hallucinations, reasoning failures, and edge cases.
Work closely with engineering, product, and data teams as part of the development lifecycle.
Validate release readiness and identify regressions related to prompts, models, or data pipelines.

What Makes This Role Different

You are evaluating AI behavior rather than fixed expected outputs.
You help define what correctness and quality mean for AI reasoning over sensitive data.
You actively invent scenarios to stretch and break the system.
Your work directly impacts trust, reliability, and investigative confidence.

Requirements:

What You Should Bring