Still looking for a job through search engines? Time to upgrade!
Instead of going through thousands of listings on your own, Jobify analyzes your CV and shows you only the jobs that truly fit you.
Over 80,000 jobs • 4,000 new every day
Free. No ads. No fine print.
Responsibilities:
Design, develop and apply state-of-the-art techniques for evaluating and validating AI agents and/or workflows.
Develop and implement LLM-as-a-Judge (or similar) for different tasks and roles for GenAI systems and tools.
Design and implement evaluation pipelines and benchmark datasets for evaluating model quality, relevance and system consistency for various applications.
Optimize and maintain judge LLMs to evaluate outputs for different use cases such as chatbots, RAG systems, cybersecurity experts and investigators.
Define evaluation KPIs and metrics for models, systems and tools.
Validate and optimize datasets for various use cases.
Ensure the reliability, efficiency, and scalability of evaluation tools and pipelines for both online and offline use cases.
Work closely with AI/ML engineers to make evaluations a part of the production pipelines of GenAI applications.
Collaborate with cross-functional teams including product, research and data science.
Stay up to date with the latest developments in AI and machine learning, with a focus on LLMs, and explore how emerging technologies can be applied to improve our evaluation and validation pipelines.
Requirements:
Advanced knowledge of and experience in NLP and the use of LLMs for GenAI applications in production at scale.
Strong experience in designing end-to-end R&D plans for GenAI including evaluation and validation lifecycle and benchmarking.
Strong proficiency in Python.
Solid understanding of the Data Science and Machine Learning lifecycle and of best practices for evaluating and validating AI systems at scale.
Excellent problem-solving abilities, coupled with a creative and strategic mindset.
Proven ability to work effectively in a team setting.
Advantages:
Experience with EDD (evaluation-driven development) for GenAI applications.
Familiarity with cybersecurity applications of GenAI.
Advanced skills in performance optimization for high-throughput systems.
Tech Stack:
Python, LangChain, LangGraph (or other agentic frameworks), Langfuse/LangSmith (or other observability and tracing tools), Hugging Face, MLflow, MongoDB