
Cloud & MLOps Engineer

Mentaily

Or Yehuda

Location: Park Naimi (https://www.parknaimi.co.il/) - Hybrid

About Mentaily

Mentaily is an early-stage GenAI mental health startup on a mission to revolutionize how people access emotional support. Backed by industry veterans and top-tier tech, we are redefining digital mental health with personalized, empathetic AI agents, combining cutting-edge LLM technologies with clinical best practices.

About the Role:

We are looking for a Cloud & MLOps Engineer who will be responsible for maintaining, evolving, and scaling Mentaily’s platform infrastructure. Your role includes managing cloud environments, overseeing CI/CD processes, and leading the deployment of machine learning models.

Responsibilities:

Maintain Mentaily’s cloud infrastructure and manage scaling needs
Build MLOps pipelines for training and deploying models securely and reproducibly
Manage IaC (Terraform), cloud budgeting, and cost optimization
Set up monitoring and logging systems (Datadog, Prometheus, etc.)

Requirements:

3+ years of experience in DevOps/Cloud engineering
Expertise in cloud environments (preferably Azure), Terraform, Kubernetes, Docker
Experience with model deployment workflows (e.g., Vertex AI, SageMaker, MLflow)

Bonus:

Understanding of healthcare cloud compliance (HIPAA, SOC 2)
Background in ML infrastructure scaling
Familiarity with mental health tech or related sensitive domains

Job Title: Site Reliability Engineer (SRE)

Location: Park Naimi (https://www.parknaimi.co.il/) - Hybrid

About the Role

We are looking for a Site Reliability Engineer (SRE) who thrives in startup environments and is passionate about building and maintaining reliable, secure, and scalable systems. As our first dedicated SRE, you will own the uptime, performance, and incident response across our GenAI systems, data pipelines, and cloud infrastructure.

You’ll work closely with backend engineers, ML engineers, and product teams to ensure that our LLM-driven systems are resilient and performant in production — across real-time and batch use cases.

Key Responsibilities

Design and implement robust monitoring, alerting, and incident response systems across all infrastructure components (API, LLMs, databases, message queues).
Build and maintain CI/CD pipelines, infrastructure as code (IaC), and automated deployments.
Ensure high availability and scalability of LLM workloads (e.g., OpenAI, Azure OpenAI, or custom-hosted models).
Partner with engineering and ML teams to define and uphold SLOs/SLAs for core services.
Lead root cause analysis and postmortem processes, and drive reliability-focused improvements.
Champion security best practices, secrets management, and compliance efforts (HIPAA/GDPR alignment if applicable).
Manage cloud environments (preferably Azure) and Kubernetes (or serverless if used).

Requirements

3+ years of experience in DevOps / SRE / Infrastructure Engineering roles.
Strong experience with cloud providers (preferably Microsoft Azure).
Proficiency in Terraform, Kubernetes, Docker, and CI/CD tooling (e.g., GitHub Actions, ArgoCD, CircleCI).
Strong background in system monitoring and observability (e.g., Prometheus, Grafana, Datadog, ELK).
Familiarity with deploying and scaling LLMs or ML services.

Bonus:

Experience with GenAI / NLP workloads and GPU infrastructure.
Familiarity with data privacy and security practices in healthcare or mental health tech.
Ability to work independently and in close collaboration with product and engineering teams.
Comfortable in a fast-paced startup environment with frequent iteration.
