Join the Microsoft 365 Copilot Team
Microsoft 365 Copilot is revolutionizing productivity by integrating large language models with user data, Microsoft Graph, and the web. At the forefront of this innovation is the Substrate Intelligence Platform (DSX) team, which powers personalized, secure, and scalable Copilot experiences across Microsoft 365, including Teams, Word, Excel, PowerPoint, OneNote, and beyond.
Our team is pioneering the infrastructure for tenant-isolated fine-tuning, a foundational capability that allows customers to safely personalize Copilot agents using their own data. The platform supports leading OpenAI models (e.g., GPT-5, o4-mini) as well as open-weight models such as Qwen, Mistral, and gpt-oss.
We manage the end-to-end fine-tuning platform via Heron, which includes:
- Data extraction and isolation
- Secure training and evaluation workflows
- Model deployment, migration, and lifecycle management
Our systems operate at massive scale within multi-tenant environments, enforcing strict security and compliance boundaries, managing shared GPU resources effectively, and enabling the seamless onboarding of new models and customers.
About the Role
As a Principal Software Engineer, you will assume a critical technical leadership role in shaping the next generation of Copilot’s fine-tuning and evaluation infrastructure.
This role goes well beyond feature development. You will:
- Set the technical direction for core platform components
- Influence architecture and design decisions across multiple teams
- Address complex, high-impact problems at the intersection of AI infrastructure, security, scalability, and reliability
- Enable Copilot scenarios that unlock new customer value and drive revenue
You will collaborate extensively with partner teams across Azure Machine Learning, Foundry, Singularity, TCaaS (Tenant Copilot as a Service), Heron Infra, Copilot Inferencing, and Security & Compliance, driving alignment on data movement, isolation models, quota management, GPU fungibility, and model deployment strategies.
Microsoft’s mission is to empower every person and every organization on the planet to achieve more. Together, we embrace a growth mindset, innovate to empower others, and collaborate to fulfill our shared goals. We build on our core values of respect, integrity, and accountability to foster a culture of inclusion where everyone can thrive.
Responsibilities
- Architect and lead the design of large-scale, distributed services that support tenant-isolated fine-tuning and evaluation workflows.
- Drive end-to-end technical ownership of critical platform areas, from data ingestion and training orchestration to deployment, rollback, and monitoring.
- Define and evolve secure data movement patterns across tenant boundaries, ensuring compliance with Microsoft security, privacy, and governance requirements.
- Establish a long-term technical vision and roadmap for the Heron fine-tuning platform, balancing scalability, reliability, cost, and developer velocity.
- Lead cross-team technical reviews, shaping designs and driving alignment across multiple organizations.
- Build frameworks and abstractions that enhance operational excellence, including observability, quota management, failure recovery, and developer ergonomics.
- Act as a technical mentor for both senior and junior engineers, elevating the standards for design quality, code health, and engineering rigor.
- Collaborate with engineering managers and product leaders to translate business goals into executable technical strategies.
- Proactively identify and resolve systemic production issues, implementing durable solutions rather than temporary fixes.
Qualifications
Required Qualifications:
- Bachelor's Degree in Computer Science or a related technical field AND hands-on technical engineering experience in languages such as C, C++, C#, Java, JavaScript, or Python, or equivalent experience.
- Proven experience in designing and operating large-scale distributed systems in production.
- Demonstrated ability to lead technical decisions across multiple teams or services.
Other Requirements:
- The ability to meet Microsoft, customer, and/or government security screening requirements, including specialized security screenings, is required for this role.
Preferred Qualifications:
- Master's Degree in Computer Science or a related technical field AND 8+ years of technical engineering experience, or a Bachelor's Degree with extensive experience in programming languages such as C, C++, C#, Java, JavaScript, or Python, or equivalent experience.
- Experience in building platform or infrastructure services in cloud environments (Azure preferred).
- Deep understanding of multi-tenant architectures, security boundaries, and privacy-compliant system design.
- Hands-on experience with Azure Machine Learning, Kubernetes, GPU-backed workloads, or large-scale data pipelines.
- Proven track record in driving architecture simplification, reliability improvements, and cost efficiency at scale.
- Ability to navigate ambiguity effectively, influence without authority, and build trust across organizational boundaries.
Apply online using the form below. Only applications matching the job profile will be considered.
Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations, and ordinances.