Generative Computer Vision Researcher (VLMs, CLIP, Open Vocabulary)

Argu AI

Tel Aviv-Yafo

Computer Vision Researcher (VLMs, Open Vocabulary, CLIP)

Location: Hybrid (Israel-based preferred)

Job Type: Part-Time / Flexible (with potential for full-time)

Overview

We’re seeking a generative, natural-language computer vision researcher with deep hands-on expertise in VLMs such as CLIP, BLIP, LLaVA, and Flamingo. The ideal candidate has a strong foundation in embedding-space design, contrastive learning, and transformer-based encoder/decoder architectures; proven experience developing prompt-to-detect pipelines and benchmarking zero-shot, few-shot, and open-vocabulary detection on video data; and proficiency in Python and PyTorch/TensorFlow, along with LLM API integration for multimodal workflows.

Perfect for advanced students (M.Sc. or Ph.D. candidates) or AI researchers eager to dive into real-world applications and make an impact fast.

What You’ll Do

Research:

· Read, distill, and apply academic and industry research (CVPR, ICCV, NeurIPS, arXiv).

· Research and prototype new ways to connect vision models with LLMs.

· Contribute to IP, publications, research papers, and internal innovation.

· Collaborate with our founding tech team to take ideas from paper to product.

· Analyze and benchmark zero-shot, few-shot, and open-vocabulary detection methods.

· Build tools for semantic video search.

· Develop object grounding, re-identification, and NLP-related CV technologies.

· Experiment with vision-language models, embeddings, and generative CV.

· Run benchmarks and test ideas in real-world environments, on real-world data.

Requirements

· 6+ years of experience in computer vision, ML, AI, and multimodal learning.

· Hands-on experience with generative AI, LLMs, VLMs, algorithms and coding.

· Curiosity and passion - you love exploring, building, and learning, even in your free time.

· M.Sc. or Ph.D. preferred - academic background in CS, EE, Math, or a related field.

· Bonus: Experience with NVIDIA Jetson, edge-AI, or real-world CV/CCTV systems.

Required Knowledge

· Vision-Language Models (CLIP, BLIP, LLaVA, etc.)

· Embedding models & semantic search in video

· Prompt-to-detect, zero/few-shot detection

· Scene understanding, re-identification, multi-camera setups

· Experience working with encoders/decoders, transformers, or contrastive learning frameworks

· Detection under challenging conditions (low-light, thermal, etc.)

Beyond the skills, we’re searching for someone who:

· Is genuinely excited about AI, vision, and building something groundbreaking.

· Brings curiosity, initiative, and a collaborative spirit to every challenge.

· Is the kind of person we’d love to grab a beer (or coffee) with - someone fun, thoughtful, and great to have on the team.
