Machine Learning Developer/Lead

Meta Black

Toronto, CA; US

On-site

Job Description

Looking for mid- to senior-level candidates (6+ years of experience)

Multiple openings for ML/AI roles in Toronto, Canada

Please email your resume to bhavya@metablackllc.com

Job Description: Machine Learning Developer

Role Overview

We are seeking a Machine Learning Developer to design, build, and deploy ML solutions that turn data into measurable business impact. This is a hands-on engineering role focused on developing end-to-end ML pipelines—data preparation, feature engineering, model training, evaluation, and production deployment—using Python and an open-source AI/ML stack. You will collaborate with data engineering and platform teams and work in environments that may include Databricks and Spark for scalable data processing and model operations.

Key Objectives

Deliver production-grade ML models and data products from discovery through deployment.

Build repeatable, maintainable ML engineering patterns for training, evaluation, and inference.

Improve model quality, reliability, and performance through robust testing, monitoring, and iteration.

Partner with data and platform teams to leverage scalable compute and data platforms (including Databricks/Spark) while meeting security and governance requirements.

Primary Responsibilities

Design, develop, and iterate on machine learning models for classification, regression, clustering, recommendation, forecasting, and/or NLP use cases as needed.

Build end-to-end ML pipelines in Python: data ingestion and preparation, feature engineering, training, evaluation, and batch/real-time inference.

Apply sound experimentation practices: baselines, ablation studies, cross-validation (as applicable), and clear success metrics aligned to business outcomes.

Develop and maintain reusable ML code (packages, utilities, pipelines) with strong software engineering practices (tests, code review, documentation, CI/CD).

Implement model evaluation and testing: offline benchmarks, data/label quality checks, reproducible training runs, and regression tests to prevent performance degradation.

Put MLOps into practice: model versioning, experiment tracking, model registry, automated deployments, and monitoring for drift, bias, latency, and cost.

Integrate ML services with product systems via APIs and event-driven patterns; collaborate on feature stores, data contracts, and production SLAs.

Leverage open-source AI/ML components (e.g., scikit-learn, PyTorch/TensorFlow, XGBoost/LightGBM, Hugging Face ecosystem) and choose the right tool for accuracy, latency, and maintainability.

Collaborate with data engineering and platform teams to use Databricks/Spark for large-scale ETL, feature computation, distributed training (where relevant), and scheduled jobs.

Ensure solutions follow security, privacy, and responsible AI practices, including safe handling of sensitive data and auditability of model decisions.

Required Skills & Experience

Strong software engineering experience in Python (clean architecture, API design, testing, packaging, performance tuning).

Hands-on experience building and deploying machine learning models in production environments.

Proficiency with common ML libraries and frameworks (e.g., scikit-learn, PyTorch or TensorFlow; XGBoost/LightGBM as applicable).

Experience with data processing in Python (e.g., pandas, NumPy) and strong SQL fundamentals.

Understanding of ML concepts (bias/variance, regularization, feature leakage, evaluation metrics, calibration) and ability to select appropriate metrics for the use case.

Experience with MLOps practices and tooling (e.g., MLflow or equivalent), including experiment tracking, model versioning, and reproducible training.

Experience deploying services (Docker, CI/CD) and operating them with monitoring/observability practices.

Ability to communicate tradeoffs clearly—balancing accuracy, latency, cost, reliability, and risk.

Preferred / Nice to Have

Awareness of Databricks concepts (workspaces, notebooks, jobs, clusters) and practical experience with Spark for large-scale data processing.

Experience with Databricks MLflow Model Registry and/or Unity Catalog (or similar governance) for managing models, features, and controlled data access.

Experience with feature stores, data versioning, and data quality frameworks.

Experience with model serving and optimization (e.g., FastAPI, TorchServe, ONNX, quantization, batching, caching).

Familiarity with the modern open-source LLM and embeddings ecosystem (e.g., Hugging Face Transformers, sentence-transformers) and experience applying it to NLP tasks when relevant.

Experience with cloud ML services and distributed training patterns (Ray, Spark ML, Horovod, or similar).

Experience implementing responsible AI practices (privacy, explainability, robustness, and security controls).

Skills & Requirements

Technical Skills

Python, scikit-learn, PyTorch, TensorFlow, XGBoost, LightGBM, Hugging Face, Databricks, Spark, SQL, pandas, NumPy, Machine learning, Data engineering, Platform teams

Employment Type

Full-time

Level

Mid

Posted

4/10/2026
