Jobify - Senior Inference Systems Engineer – KV Cache Optimization job openings

Lightbits Labs

Kfar Saba

Full-time
30,000-45,000 ₪ (AI-based estimate, not a salary provided by the employer)

Lightbits is seeking an exceptional Senior Inference Systems Engineer to build advanced infrastructure that improves LLM inference performance through KV cache optimization, offloading, streaming, compression, and scheduling.

In this role, you will work at the intersection of CUDA, GPU architecture, transformer inference, Rust systems programming, and large-scale AI serving platforms. You will design and build systems that intelligently manage KV cache placement across GPU, CPU, storage, and remote memory tiers while maximizing throughput, minimizing latency, and reducing infrastructure costs.

This is a highly hands-on position for someone who enjoys solving deep performance challenges, optimizing every layer of the inference stack, and turning low-level innovations into customer-facing product value. The position is based in Israel.

Responsibilities

Design and implement KV cache offloading, streaming, and memory management infrastructure for large-scale LLM serving.
Build cache-aware scheduling systems that determine when to keep, evict, prefetch, stream, compress, decompress, or recompute KV cache blocks.
Optimize inference runtimes such as vLLM and SGLang, including paged attention, prefix caching, schedulers, and cache management systems.
Develop mechanisms that overlap IO operations with attention execution to maximize GPU utilization and minimize latency.
Build high-performance components in Rust, C++, and CUDA for scheduling, cache coordination, telemetry, and inference optimization.
Profile and eliminate bottlenecks across GPU, CPU, memory, networking, storage, and runtime layers.
Design benchmark frameworks and performance tests for long-context, streaming, multi-turn, and high-concurrency workloads.
Measure and improve key inference metrics including TTFT, TBT/ITL, GPU utilization, cache hit rates, and cost per token.
Collaborate closely with Product, Platform, ML, and Engineering teams to deliver production-ready optimization capabilities.
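
For readers less familiar with the latency metrics named above, here is a minimal sketch of how TTFT (time to first token) and ITL (inter-token latency, also called TBT, time between tokens) can be derived from per-token timestamps. The function name `latency_metrics` and its inputs are hypothetical and chosen only for illustration; they are not part of any Lightbits tooling or of any serving framework's API.

```python
from statistics import mean


def latency_metrics(request_time: float, token_times: list[float]) -> dict[str, float]:
    """Compute TTFT and mean ITL (in seconds) for one streamed response.

    request_time: when the request was sent (hypothetical input).
    token_times:  arrival time of each generated token, in order.
    """
    if not token_times:
        raise ValueError("need at least one token timestamp")

    # TTFT: delay between sending the request and receiving the first token.
    ttft = token_times[0] - request_time

    # ITL / TBT: gaps between consecutive tokens; report their mean.
    gaps = [later - earlier for earlier, later in zip(token_times, token_times[1:])]
    itl = mean(gaps) if gaps else 0.0

    return {"ttft": ttft, "itl": itl}


if __name__ == "__main__":
    # Request sent at t=0.0 s; tokens arrive at 0.5, 0.75, 1.0 and 1.25 s.
    print(latency_metrics(0.0, [0.5, 0.75, 1.0, 1.25]))
    # prints {'ttft': 0.5, 'itl': 0.25}
```

In this simplified view, TTFT is dominated by the prefill phase and ITL by the decode phase, which is why the two are usually tracked separately.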

Qualifications and Experience

Strong hands-on experience with CUDA programming and GPU performance optimization.
Deep understanding of transformer inference, attention mechanisms, KV cache architecture, batching, streaming generation, prefill, and decode.
Experience with vLLM, SGLang, TensorRT-LLM, Triton Inference Server, or similar LLM serving frameworks.
Experience designing or optimizing KV cache systems, including cache reuse, eviction, prefix caching, radix caching, or cache offloading.
Strong systems programming skills in Rust, C++, or both.
Strong Python skills for experimentation, benchmarking, and performance analysis.
Experience building performance-sensitive schedulers, async IO systems, or distributed infrastructure.
Strong debugging and profiling skills using tools such as Nsight, CUDA profiling tools, or custom telemetry systems.
Experience with GPUDirect, RDMA, NVMe, cache compression, FlashAttention, paged attention, or distributed inference architectures is a strong advantage.
Bachelor’s or Master’s degree in Computer Science, Software Engineering, Electrical Engineering, or a related field.

Questions and answers for the Senior Inference Systems Engineer – KV Cache Optimization position

What is the main role of the Senior Inference Systems Engineer – KV Cache Optimization at Lightbits Labs?

The main role of the Senior Inference Systems Engineer at Lightbits Labs is to build advanced infrastructure that improves the inference performance of large language models (LLMs) through KV cache optimization, including offloading, streaming, compression, and scheduling. The goal is to maximize throughput, minimize latency, and reduce infrastructure costs by intelligently managing KV cache placement across GPU, CPU, storage, and remote memory.
