Member of Technical Staff - Senior ML Engineer

Microsoft Schweiz GmbH - March 6, 2026

Overview

We are seeking a Senior Machine Learning Engineer to bridge the gap between advanced Vision-Language Model (VLM) research and high-performance production serving. Unlike standard data science and engineering roles, this position requires a rare combination of skills: the ability to design novel VLM architectures, including dataset curation and multilingual alignment, while also optimizing the inference stack through kernel optimization, distillation, and memory management tailored to specific hardware such as NVIDIA H100 and AMD MI300X.

The successful candidate will take ownership of the entire vertical slice—from analyzing the latest arXiv papers and enhancing training sets to writing C++/CUDA kernels that serve the final model in production.

Responsibilities

  • VLM Research & Architecture Design
    • Continuously evaluate and implement the latest research trends in Vision-Language Models, focusing on areas such as Referring Expression Comprehension (REC), Document Understanding (e.g., Pix2Struct), and Visual Question Answering (VQA).
    • Design and build large-scale training and evaluation datasets, ensuring multilingual compatibility and comprehensive visual understanding tailored for the European market.
    • Lead the model co-design process by creating architectures that are natively optimized for accelerator capabilities, differentiating between compute-bound and memory-bound operations.
  • Advanced Inference Optimization & Serving
    • Architect high-throughput serving layers using SGLang and vLLM, optimizing for innovative decoding strategies.
    • Conduct scientific experiments to discover the Pareto-optimal balance between serving latency and generation quality.
    • Implement Knowledge Distillation (KD), unstructured pruning, and quantization techniques to adapt large-scale VLM architectures for single-node GPU setups (specifically H100 or MI300X) without compromising model quality.
  • Systems Engineering & Kernel Development
    • Write and optimize custom kernels (CUDA/HIP) to reduce serving latency by identifying and removing bottlenecks at the operator level.
    • Manage the complete pre-training and post-training tech stack, ensuring seamless integration between model weights and inference engines.
    • Take ownership of deploying the serving-efficient model in a production environment, ensuring reliability and scalability.

Qualifications

Mandatory Requirements (Must Have)

  • Education: Master’s or PhD in Computer Science, Artificial Intelligence, or High-Performance Computing.
  • Experience: At least 4 years in Machine Learning, with a strong focus on both Model Architecture and Systems Optimization.
  • VLM Expertise: Proven experience in building and deploying Vision-Language Models (e.g., architectures similar to CLIP, Flamingo, Pix2Struct), including the creation of custom evaluation sets for tasks like Document Understanding.
  • Serving Stack Proficiency: Expert-level knowledge of SGLang and vLLM for optimized serving.
  • Hardware Specifics: Demonstrable experience in optimizing models for NVIDIA (H100) and AMD (MI300X) accelerators.
  • Optimization Techniques: Practical experience with Knowledge Distillation and Pruning to reduce model latency at targeted serving sizes.
  • Production Engineering: A proven track record of taking complex multi-modal models from the research phase to a deployed, user-facing product.

Apply online using the form below. Please note that only applications matching the job profile will be considered.

Location: Zürich
Country: Switzerland

Application Form

Please enter your information in the following form and attach your resume (CV).

Only PDF, Word, or OpenOffice files are accepted. Maximum file size: 3 MB.