AI Research Engineer – Multimodal Systems (Vision + Language)
Remote (Toronto, Canada – Hybrid Option Available)
AI | Multimodal Models | Deep Learning
About the Role
We’re looking for an AI Research Engineer to work on multimodal AI systems combining computer vision and large language models.
You’ll develop models that understand and reason across text, images, and structured data, contributing to next-generation AI systems for applications such as intelligent assistants, robotics, and enterprise knowledge platforms.
This is a high-impact, research-meets-production role.
Key Responsibilities
- Design and train multimodal models (vision + language)
- Work on architectures combining transformers, embeddings, and cross-modal learning
- Optimise models for performance, scalability, and inference efficiency
- Build evaluation frameworks for model accuracy and robustness
- Collaborate with engineering teams to deploy models into production
Required Skills & Experience
- 5+ years in AI/machine learning engineering or applied research
- Strong Python skills and proficiency with deep learning frameworks (PyTorch preferred)
- Experience with transformer architectures and LLMs
- Experience with computer vision models (CNNs, ViTs, detection/segmentation)
- Strong understanding of model training, evaluation, and optimisation
Nice to Have
- Experience with multimodal models (CLIP, BLIP, etc.)
- Experience with distributed training or large-scale datasets
- Familiarity with CUDA / GPU optimisation
Why Join?
- Work on frontier AI systems at the intersection of vision and language
- High technical ownership and research depth
- Hybrid flexibility in a leading AI hub (Toronto)
- Opportunity to shape next-generation AI capabilities