Machine Learning Engineer Job at Evolve Group, San Jose, CA

  • Evolve Group
  • San Jose, CA

Job Description

Machine Learning Engineer

Tech start-up

San Francisco based

We’ve partnered with one of the most ambitious and technically rigorous AI research labs in the world. Based in San Francisco, this team is building foundation models entirely from scratch.

They are now hiring ML Infrastructure Engineers to design and scale the systems that power large-scale, distributed model training. If you’ve built infrastructure that runs across hundreds of GPUs, thrive under technical complexity, and want to work side-by-side with elite AI researchers — this is the role.

Key Responsibilities:

  • Build and scale distributed systems for large-scale model training across LLMs, vision, and robotics.
  • Set up and run large-scale training across many GPUs using tools like Kubernetes, DeepSpeed, and FSDP.
  • Troubleshoot system issues (GPU errors, network problems) and build tools to monitor and recover from failures.
  • Optimize PyTorch pipelines, sharding, and sampling strategies.
  • Collaborate closely with researchers to support novel model training at scale.

Requirements:

  • 3–15 years in ML infrastructure, systems, or research engineering roles.
  • Proven experience scaling distributed training for large models.
  • Strong skills in PyTorch, CUDA, NCCL, and Kubernetes.
  • Familiarity with setting up distributed training clusters.
  • Deep understanding of PyTorch dataloaders, data sharding, and sampling.
  • Strong communicator with a collaborative, mission-driven mindset.

This is a fully in-person role based in San Francisco. It's ideal for engineers excited to build at the edge of what's possible in AI.

Job Tags

Immediate start
