Computer Vision Researcher (VLMs, Open Vocabulary, CLIP)
Location: Hybrid (Israel-based preferred)
Job Type: Part-Time / Flexible (with potential for full-time)
Overview
We’re seeking a computer vision researcher focused on generative, natural-language-driven vision, with deep hands-on expertise in VLMs such as CLIP, BLIP, LLaVA, and Flamingo. You bring a strong foundation in embedding-space design, contrastive learning, and transformer-based encoder/decoder architectures, along with proven experience developing prompt-to-detect pipelines and benchmarking zero-shot, few-shot, and open-vocabulary detection on video data. You are proficient in Python and PyTorch/TensorFlow and comfortable integrating LLM APIs into multimodal workflows.
Perfect for advanced students (M.Sc. or Ph.D. candidates) or AI researchers eager to dive into real-world applications and make an impact fast.
What You’ll Do
Research:
· Read, distill, and apply academic and industry research (CVPR, ICCV, NeurIPS, arXiv).
· Research and prototype new ways to connect vision models with LLMs.
· Contribute to IP, publications, research papers, and internal innovation.
· Collaborate with our founding tech team to take ideas from paper to product.
· Analyze and benchmark zero-shot, few-shot, and open-vocabulary detection methods.
· Build tools for semantic video search.
· Develop object grounding, re-identification, and NLP-related CV technologies.
· Experiment with vision-language models, embeddings, and generative CV.
· Run benchmarks and test ideas in real-world environments, on real-world data.
Requirements
· 6+ years of experience in computer vision, ML, AI, and multimodal learning.
· Hands-on experience with generative AI, LLMs, VLMs, algorithms and coding.
· Curiosity and passion - you love exploring, building, and learning, even in your free time.
· M.Sc. or Ph.D. preferred - academic background in CS, EE, Math, or a related field.
· Bonus: Experience with NVIDIA Jetson, edge-AI, or real-world CV/CCTV systems.
Required Knowledge
· Vision-Language Models (CLIP, BLIP, LLaVA, etc.)
· Embedding models & semantic search in video
· Prompt-to-detect, zero/few-shot detection
· Scene understanding, re-identification, multi-camera setups
· Experience working with encoders/decoders, transformers, or contrastive learning frameworks
· Detection under challenging conditions (low-light, thermal, etc.)
Beyond the skills, we’re searching for someone who:
· Is genuinely excited about AI, vision, and building something groundbreaking.
· Brings curiosity, initiative, and a collaborative spirit to every challenge.
· Is the kind of person we’d love to grab a beer (or coffee) with - someone fun, thoughtful, and great to have on the team.