Senior Machine Learning Engineer

Microsoft Schweiz GmbH - February 24, 2026

Overview

We are seeking a Senior Machine Learning Engineer to bridge the gap between advanced Vision-Language Model (VLM) research and high-performance production serving. This position requires a dual competency: the ability to design novel VLM architectures, including dataset curation and multilingual alignment, and the ability to optimize the inference stack (kernel optimization, distillation, and memory management) so that these models run within specific hardware constraints (NVIDIA H100 and AMD MI300X). The successful candidate will oversee the entire vertical slice, from reviewing the latest research papers on arXiv to improving training sets and developing the C++/CUDA kernels that allow the final model to run in a production environment.

Responsibilities

  • VLM Research & Architecture Design
    • Continuously evaluate and implement the latest research trends in Vision-Language Models, with a focus on Referring Expression Comprehension (REC), Document Understanding (Pix2Struct), and Visual Question Answering (VQA).
    • Design and develop massive-scale training and evaluation datasets that ensure multilingual coverage and address the broader visual understanding needs of the European market.
    • Lead the model co-design process to create architectures that are natively optimized for accelerator capabilities, considering both compute-bound and memory-bound operations.
  • Advanced Inference Optimization & Serving
    • Architect high-throughput serving layers using SGLang and vLLM, optimizing for non-standard decoding strategies.
    • Design and run controlled experiments to map the Pareto frontier between serving latency and generation quality.
    • Apply Knowledge Distillation (KD), unstructured pruning, and quantization techniques to fit large-scale VLM architectures onto single-node GPU setups (specifically H100 or MI300X) without compromising model quality.
  • Systems Engineering & Kernel Development
    • Write and optimize custom kernels (CUDA/HIP) to reduce serving latency, targeting bottlenecks at the operator level.
    • Manage the complete pre-training and post-training tech stack, ensuring seamless integration between model weights and inference engines.
    • Take ownership of deploying a serving-efficient model in a production environment, ensuring both reliability and scalability.

Qualifications

  • Mandatory Requirements (Must Have)
    • Education: Master’s or PhD in Computer Science, Artificial Intelligence, or High-Performance Computing.
    • Experience: At least 4 years of experience in Machine Learning, with a strong focus on both Model Architecture and Systems Optimization.
    • VLM Expertise: Proven experience building and deploying Vision-Language Models (e.g., architectures similar to CLIP, Flamingo, Pix2Struct). Experience in creating custom evaluation sets for tasks like Document Understanding is essential.
    • Serving Stack Proficiency: Expert-level knowledge of SGLang and vLLM for optimized serving.
    • Hardware Specifics: Demonstrable experience optimizing models for NVIDIA (H100) and AMD (MI300X) accelerators.
    • Optimization Techniques: Hands-on experience with Knowledge Distillation and Pruning to reduce model size and latency to meet serving targets.
    • Production Engineering: A proven track record of taking complex multi-modal models from research code to a deployed, user-facing product.

Apply online using the form below. Please note that only applications matching the job profile will be considered.

Location: Zürich
Country: Switzerland

Application Form

Please enter your information in the following form and attach your resume (CV).

Only PDF, Word, or OpenOffice files are accepted. Maximum file size: 3 MB.