Keep the AI running. Through every spike, every release, every Monday morning.
Somewhere on the planet right now, someone is paying real money to use software built by the company hiring for this role. Some of that software is powered by large language models — and when those models go down, get slow, or burn budget, customers feel it instantly. We're looking for a senior production engineer with the SRE instincts to keep AI dependable at scale — someone who's run hardware-accelerated workloads in production and has shipped LLM-powered systems through the highs and lows of real traffic.
This isn't a research role. It isn't a model-training role. It's the SRE / production-reliability side of AI Engineering: GPUs, Kubernetes, SLOs, observability, incident response — applied to large language models in production. You've kept high-scale systems alive on hardware before; we're hiring you to do it for the AI layer of our product.
What you'll own:
- The reliability and scalability of LLM serving in production — uptime, latency percentiles, cost per million tokens
- Operating modern AI infrastructure on GPUs and Kubernetes with real SRE discipline (capacity planning, autoscaling, blast-radius control)
- SLOs and error budgets, observability (TTFT, tokens/sec), load testing, and incident response for AI workloads
- Hardening the serving stack (vLLM / TensorRT-LLM / Triton) against traffic spikes, noisy neighbors, and the rough edges of GPU operations
- Partnering with product and engineering teams to ship AI features that stay up under real load
What we're looking for:
- Senior SRE / production-engineering background — strong track record of running services at scale through the messy reality of incidents and growth
- Hands-on experience with hardware-accelerated workloads in production: NVIDIA GPUs, distributed training/serving infrastructure, or equivalent accelerators such as TPUs
- Real LLM context — you've shipped or operated LLM-powered systems and you understand how they fail differently from a normal service
- Production cloud + Kubernetes at scale, with the observability and capacity-planning chops to match
- Judgment: you've made the calls on architecture, SLOs, and trade-offs before, and you've been right more often than wrong
Nice to have:
- Direct experience with modern LLM serving stacks (vLLM, TensorRT-LLM, Triton, Ray Serve)
- Multi-GPU / multi-node serving experience; familiarity with quantization, batching, and inference cost optimization at the operational layer
- Prior staff/lead SRE or platform-leadership experience
- Open-source contributions to AI infrastructure
Come keep the production layer of the AI revolution running. Real users. Real scale. Real pagers (and real on-call rotation discipline).