MLOps Team Lead

AI21

Tel Aviv-Yafo

Full-time, hybrid
₪45,000-70,000 (AI-based estimate, not a salary provided by the employer)

Description:

We are looking for an exceptional MLOps Team Lead to own, build, and scale the infrastructure and automation that powers AI21 Labs’ state-of-the-art Large Language Models (LLMs) and AI systems.

This is a technical leadership role that blends hands-on engineering with strategic vision. You will define MLOps best practices, build high-performance ML infrastructure, and lead a world-class team working at the intersection of AI research and production-grade ML systems.

You will work closely with LLM Algorithm Researchers, ML Engineers, and Data Scientists to enable fast, scalable, and reliable ML workflows, covering everything from distributed training to real-time inference optimization.

If you have deep technical expertise, thrive in high-scale AI environments, and want to lead the next generation of MLOps, we want to hear from you.

Role and Responsibilities:

MLOps Infrastructure & Automation

Architect and maintain scalable, self-service ML pipelines, CI/CD workflows, and orchestration frameworks (Kubeflow, MLflow, Airflow).
Design high-scale distributed training environments, leveraging multi-GPU/TPU clusters and parallelization strategies.
Optimize ML workflows for speed, scalability, and cost efficiency across cloud (AWS/GCP) and on-prem environments.

Model Deployment & Real-Time Inference

Build ultra-low-latency, high-throughput inference architectures optimized for LLMs at scale.
Implement A/B testing, canary releases, and rollback mechanisms for model deployment.
Develop robust monitoring, logging, and alerting solutions for model performance, drift detection, and reliability.

Cloud & Compute Optimization

Lead the design and scaling of multi-cloud ML infrastructure using Kubernetes, Terraform, and ArgoCD.
Optimize GPU/TPU utilization, autoscaling, and resource allocation to maximize efficiency.
Build and manage feature stores, data pipelines, and large-scale storage solutions.

Leadership & Cross-Team Collaboration

Work closely with LLM researchers, ML engineers, and platform teams to align MLOps infrastructure with cutting-edge AI research and real-world deployment needs.
Define and enforce best practices for model governance, security, and compliance.
Mentor and grow a high-performing MLOps team, driving a culture of technical excellence, automation, and continuous improvement.

Requirements:

3+ years of experience in MLOps, ML infrastructure, or AI platform engineering.
2+ years of hands-on experience in ML pipeline automation, large-scale model deployment, and infrastructure scaling.
Expertise in deep learning frameworks (such as PyTorch, TensorFlow, and JAX) and MLOps platforms (such as Kubeflow, MLflow, and TFX).
Proven track record of building production-grade ML systems that scale to billions of predictions daily.
Deep knowledge of Kubernetes, cloud-native architectures (AWS/GCP), and infrastructure as code (Terraform, Helm, ArgoCD).
Strong software engineering skills in Python, Bash, and Go, with a focus on writing clean, maintainable, and scalable code.
Experience with observability & monitoring stacks (Prometheus, Grafana, Datadog, OpenTelemetry).
Strong background in security, compliance, and model governance for AI/ML systems.

Leadership & Execution

Proven ability to lead high-impact engineering teams in a fast-paced AI environment.
Ability to drive technical strategy while remaining hands-on in critical areas.
Strong cross-functional collaboration skills, working closely with research and engineering teams.
Passion for automation, efficiency, and designing scalable self-service MLOps solutions.
Experience in mentoring and coaching engineers, fostering a culture of innovation and continuous learning.

It Would Be Great If You Have:

Experience working with LLMs and large-scale generative AI models in production.
Expertise in optimizing model inference latency and cost at scale.
Contributions to open-source MLOps tools or AI infrastructure projects.