Job Description:
Innovative AI Solutions is seeking a skilled AI/ML Systems Engineer to join our growing team. As part of our AI and Machine Learning division, you'll play a critical role in designing, optimizing, and deploying large-scale AI/ML systems that drive cutting-edge products and services. This role requires a strong understanding of both the software and hardware aspects of AI systems, as well as the ability to integrate and optimize various components to create high-performance solutions.
Key Responsibilities:
- System Architecture Design: Collaborate with data scientists, software engineers, and hardware teams to design the overall architecture for AI/ML systems. Ensure that machine learning models, data pipelines, and computational resources are optimized and scalable.
- Infrastructure Integration: Integrate AI models with infrastructure, including cloud environments (AWS, Azure, GCP), on-premises servers, and edge devices. Ensure the system runs efficiently in production at scale.
- Data Pipeline Optimization: Design and optimize data pipelines for machine learning workflows. Ensure data is processed efficiently from raw collection through preprocessing to model input.
- Model Deployment: Lead the deployment of machine learning models to production environments, ensuring they operate reliably and at scale. Work with teams to integrate models with real-time and batch systems.
- Performance Tuning: Identify bottlenecks in AI/ML workflows and optimize system performance. Focus on improving latency, scalability, and energy efficiency.
- Monitoring and Maintenance: Implement systems to monitor the performance of AI models in production, ensuring that the models continue to meet performance standards. Develop strategies for model retraining and updates.
- Cross-Disciplinary Collaboration: Work with data scientists on model improvements and with software engineers on system-level integration, deployment pipelines, and testing.
- Documentation: Maintain clear and comprehensive documentation for system architectures, workflows, and deployed solutions, ensuring cross-team understanding.
Required Skills and Qualifications:
- Bachelor’s or Master’s degree in Systems Engineering, Computer Science, Electrical Engineering, or a related field.
- 3+ years of experience in systems engineering, ideally with a focus on AI/ML, distributed computing, or high-performance computing.
- Experience with cloud platforms (e.g., AWS, Azure, GCP) and with containerization and orchestration technologies (e.g., Docker, Kubernetes).
- Proficiency in programming languages such as Python, C++, or Java.
- Strong understanding of machine learning models, data pipelines, and AI infrastructure.
- Experience with machine learning frameworks like TensorFlow, PyTorch, or MXNet.
- Knowledge of distributed systems, networking, and high-performance computing (HPC).
- Excellent problem-solving skills and ability to optimize systems for performance, scalability, and reliability.
Preferred Skills:
- Experience with edge AI deployment (on IoT devices, mobile platforms, or embedded systems).
- Familiarity with real-time AI systems (e.g., autonomous systems, real-time decision-making).
- Strong knowledge of model deployment tools such as MLflow, TensorFlow Serving, or Seldon.
- Understanding of AI-related performance metrics such as latency, throughput, and model accuracy.
Why Join Us?
- Be part of a dynamic team working on innovative AI technologies.
- Competitive salary, benefits, and professional development opportunities.
- Opportunity to contribute to high-impact, real-world AI/ML applications in industries like healthcare, finance, and autonomous vehicles.
- Work in a collaborative and forward-thinking environment where you can grow your skills in both AI/ML and systems engineering.
Note: This position is not available for C2C (corp-to-corp) arrangements.