LLM Inference Engineer


Majestic Labs

Tel Aviv-Yafo
Full-time
25,000-40,000 ₪ (AI-based estimate, not a salary provided by the employer)

The Role

In this high-impact role, you are the bridge between cutting-edge custom silicon and production-grade AI. You will own the end-to-end LLM serving stack on Majestic hardware, architecting everything from serving APIs down to KV cache management, batching, and scheduling. Your primary mission is to port leading frameworks like vLLM and SGLang to our accelerator and optimize them for peak performance. Because our architecture offers memory headroom, you won't just match traditional GPUs; you will shatter their limits on throughput, batch sizes, and context lengths. As you hunt down bottlenecks, your insights will directly steer our future kernel, compiler, and hardware development.

What You'll Own

The serving stack, end to end — bring up and adapt a modern inference framework (vLLM, SGLang, or similar) to run on Majestic hardware.
The runtime hot path — continuous batching, the scheduler, paged KV cache, and prefill/decode disaggregation.
Distributed inference at scale — tensor, pipeline, and expert parallelism across accelerators, wired into our collective communication library (CCL).
The multi-modal pipeline — image, audio, and video preprocessing, encoder integration, and mixed-modality batching.
Inference-time techniques — speculative decoding, prefix caching, and structured decoding.
End-to-end performance — profile, benchmark, and hunt down bottlenecks across the full serving path, feeding findings back to the kernel, compiler, and hardware teams.

What We're Looking For

3+ years building or operating production LLM inference and serving systems (5+ preferred).
Deep, hands-on work with a modern inference framework (vLLM, SGLang, TensorRT-LLM, Fireworks, or similar), including its scheduler, paged attention / KV cache, model executor, and backend integration points.
Strong Python and C++, with the ability to move fluidly between the two.
A real grasp of transformer inference: the prefill/decode split, KV cache behavior, and how batching dynamics shape latency and throughput.
Distributed inference experience: tensor and pipeline parallelism across multiple devices.
An instinct for performance: you can profile an end-to-end stack and chase a regression from the serving API all the way down to the kernel.

Questions and Answers About the LLM Inference Engineer Position

What is the core role of an LLM Inference Engineer at Majestic Labs in developing custom AI hardware?

An LLM Inference Engineer at Majestic Labs bridges custom silicon and production-grade AI. The role includes end-to-end ownership of the LLM serving stack on Majestic hardware, architecting everything from serving APIs down to KV cache management, batching, and scheduling, with the primary mission of porting leading frameworks such as vLLM and SGLang to the company's accelerator and optimizing them.

