AI Applied Scientist

Microsoft Schweiz GmbH - February 9, 2026

Overview

The Spatial AI Lab is part of the Applied Sciences Group, a Microsoft research and development organization dedicated to creating next-generation human-computer interaction technologies. Our work builds on the latest advances in AI while exploring new hardware capabilities and device form factors. Our team of scientists and engineers has strong expertise in computer vision, multimodal AI, and spatial and embodied AI.

Your primary responsibility will be to help develop intelligent systems for innovative agents by training and refining multimodal AI models. This role provides a unique opportunity to enhance your experience in building and deploying AI models for Microsoft products and large-scale AI systems. You will also engage in cutting-edge research in collaboration with esteemed partners, such as ETH Zurich, to publish in top-tier venues, present at workshops, and mentor students.

At Microsoft, our mission is to empower every person and organization on the planet to achieve more. We foster a culture of growth, innovation, and collaboration, built on the values of respect, integrity, and accountability. Our inclusive environment enables everyone to thrive both at work and beyond.

Responsibilities

  • Research novel machine learning algorithms and models.
  • Work on pre- and/or post-training of foundational multimodal models.
  • Build data and learning solutions for scalability, efficiency, and performance.
  • Curate training and evaluation datasets/benchmarks.
  • Optimize models for CPUs, GPUs, and NPUs, and integrate them into products.
  • Collaborate across Microsoft research and engineering teams.

Qualifications

Required Qualifications:

  • PhD in Machine Learning / Computer Vision, or 3+ years of relevant industry experience.
  • Proficiency in programming languages such as Python and/or C++.
  • Hands-on experience with modern deep learning frameworks (e.g., PyTorch, TensorFlow, JAX).
  • Self-motivated team player, adept problem solver, and eager to learn.
  • Ability to present complex technical concepts to a diverse audience.

Preferred Qualifications:

  • Experience in one or more of the following areas:
    • Hands-on experience with multimodal models, including pre- and/or post-training of large vision-language models.
    • Knowledge of techniques such as pruning, distillation, and fine-tuning.
    • Familiarity with large language models (LLMs) and large vision-language models (VLMs).
    • Experience with video generative models and diffusion algorithms.
    • Understanding of action-based transformers and vision-language-action models (VLAs).
    • Experience with large-scale machine learning compute systems.
    • Proven track record of impact through research publications at leading conferences (NeurIPS, ICML, CVPR, ECCV, ICCV) or significant industry contributions.

To apply, please use the online form below. Only applications matching the job profile will be considered.

Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations, and ordinances.

Location: Zürich
Country: Switzerland

Application Form

Please enter your information in the following form and attach your resume (CV).

Only PDF, Word, or OpenOffice files are accepted. Maximum file size: 3 MB.