AI Accelerator Software Engineer – Silicon Software & Low-Level AI
Most GPU engineers work within the limits of what NVIDIA decided.
Here, you decide the limits.
GSI Technology (NASDAQ: GSIT) is developing Gemini2 — an Associative Processing Unit built for ultra-low latency, high-parallelism AI execution. We're not building on top of someone else's stack. We're building the stack — and we need engineers who've been waiting for exactly this kind of problem.
🔬 The gap you'll close
Between modern AI models and novel compute-in-memory hardware lies a space that PyTorch can't see and CUDA can't reach — memory access patterns, DMA flows, instruction scheduling, and execution strategies that simply don't have a reference implementation yet.
That's your domain.
⚙️ What you'll build
Highly optimized compute kernels for Transformer inference, LLM/VLM execution, FFTs, OpenCV pipelines, and Edge AI workloads
Memory access patterns, DMA utilization, and instruction scheduling — tuned for silicon that didn't exist two years ago
Performance analysis pipelines built on profilers, traces, and hardware analyzers, plus the fixes for whatever they uncover
Benchmarking infrastructure, internal tooling, and testing frameworks
Direct collaboration with the Architecture, Compiler, and AI teams: your kernel-level decisions shape how the next version of the chip gets designed
✅ What we need
B.Sc./M.Sc. in CS, EE, or equivalent
6+ years in low-level C/C++: embedded, firmware, accelerator, systems, or performance-critical software
Deep understanding of:
Memory hierarchies, caches, DMA, and bandwidth optimization
Parallel execution and performance-critical code
Hardware-aware algorithm optimization
Bit-level and systems-oriented reasoning
⭐ Strong bonus if you bring
GPU / NPU / DSP / FPGA or custom accelerator programming
Assembly or low-level programming experience
Compute kernel, firmware, or driver development
AI inference optimization or deep learning infrastructure
Profiling, tracing, and performance-debug experience
🎯 You're likely a strong fit if you've ever...
Written CUDA or HIP kernels — and wanted to go deeper than the driver allows
Spent days hunting a 3% latency regression in embedded firmware and felt satisfied when you found it
Looked at a DMA controller spec and felt curious, not scared
Worked on DSP algorithms and wondered what it'd feel like to do it for AI workloads
Had opinions about both sides of a hardware/software interface
📍 Tel Aviv, Ramat Hahayal | Full-Time | Hybrid
💰 Competitive compensation + (NASDAQ: GSIT)
Not sure if your background is the right fit? Reach out; we'd rather have the conversation.
Tel Aviv-Yafo
Morning