Anthropic · San Francisco/New York City/Seattle · Hybrid
Research Engineer, Reward Models Training
January 13, 2025
Description
- Own the end-to-end engineering of reward model training, from data ingestion through model evaluation and deployment (a minimal training-step sketch follows this list)
- Design and implement efficient, reliable training pipelines that can scale to increasingly large model sizes
- Build robust data pipelines for collecting, processing, and incorporating human feedback into reward model training
- Optimize training infrastructure for throughput, efficiency, and fault tolerance across distributed systems
- Extend reward model capabilities to support new domains and additional data modalities
- Collaborate with researchers to implement and iterate on novel reward modeling techniques
- Develop tooling and monitoring systems to ensure training quality and identify issues early
- Contribute to the design and improvement of our overall model training infrastructure
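For context on the core technique behind this role: reward models for RLHF are typically trained on human preference pairs with a Bradley-Terry ranking loss. Below is a minimal, hypothetical PyTorch sketch of one such training step, not Anthropic's actual code; `model`, `optimizer`, and the `batch` layout are illustrative assumptions.

```python
# Hypothetical sketch of one reward-model training step (illustrative only).
import torch
import torch.nn.functional as F

def reward_model_step(model, optimizer, batch):
    """One gradient step on a batch of human preference pairs.

    `batch` is assumed to contain token ids and attention masks for the
    preferred ("chosen") and dispreferred ("rejected") completions of the
    same prompt; `model` is assumed to return one scalar reward per sequence.
    """
    r_chosen = model(batch["chosen_ids"], batch["chosen_mask"])        # (B,)
    r_rejected = model(batch["rejected_ids"], batch["rejected_mask"])  # (B,)

    # Bradley-Terry objective: maximize the log-probability that the chosen
    # completion outranks the rejected one, i.e. minimize -log sigmoid(r_c - r_r).
    loss = -F.logsigmoid(r_chosen - r_rejected).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Pairwise ranking accuracy is a cheap training-quality signal to monitor.
    accuracy = (r_chosen > r_rejected).float().mean()
    return loss.item(), accuracy.item()
```

The pairwise ranking accuracy logged here is one example of the inexpensive signals the monitoring systems mentioned above can track to surface training issues early.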
Qualifications
You may be a good fit if you:
- Have significant experience building and maintaining large-scale ML systems
- Are proficient in Python and have experience with ML frameworks such as PyTorch
- Have experience with distributed training systems and optimizing ML workloads for efficiency
- Are comfortable working with large datasets and building data pipelines at scale
- Can balance research exploration with engineering rigor and operational reliability
- Enjoy collaborating closely with researchers and translating research ideas into reliable engineering systems
- Are results-oriented with a bias towards flexibility and impact
- Can navigate ambiguity and make progress in fast-moving research environments
- Adapt quickly to changing priorities, while juggling multiple urgent issues
- Maintain clarity when debugging complex, time-sensitive issues
- Pick up slack, even if it goes outside your job description
- Care about the societal impacts of your work and are motivated by Anthropic's mission
Strong candidates may also have experience with:
- Training or fine-tuning large language models
- Reinforcement learning from human feedback (RLHF) or related techniques
- GPUs, Kubernetes, and cloud infrastructure (AWS, GCP)
- Building systems for human-in-the-loop machine learning
- Working with multimodal data (text, images, audio, etc.)
- Large-scale ETL and data processing frameworks (Spark, Airflow)
Representative projects:
- Scaling reward model training to handle models with significantly more parameters while maintaining training stability
- Building a unified data pipeline that ingests human feedback from multiple sources and formats for reward model training
- Implementing fault-tolerant training infrastructure that gracefully handles hardware failures during long training runs (see the checkpointing sketch after this list)
- Developing evaluation frameworks to measure reward model quality across diverse domains
- Optimizing training throughput to reduce iteration time on reward modeling experiments
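As an illustration of the fault-tolerance project above, one standard pattern is periodic atomic checkpointing with resume-on-restart, so a hardware failure costs at most one checkpoint interval. The following is a hypothetical sketch under assumed paths and intervals, reusing the `reward_model_step` helper from the earlier sketch; it is not a description of Anthropic's infrastructure.

```python
# Hypothetical sketch of periodic atomic checkpointing with resume-on-restart.
import os
import torch

CKPT_PATH = "checkpoints/reward_model.pt"  # illustrative path

def save_checkpoint(model, optimizer, step):
    # Write to a temp file, then rename: os.replace is atomic on POSIX, so a
    # crash mid-save never corrupts the last good checkpoint.
    os.makedirs(os.path.dirname(CKPT_PATH), exist_ok=True)
    tmp_path = CKPT_PATH + ".tmp"
    torch.save({"model": model.state_dict(),
                "optimizer": optimizer.state_dict(),
                "step": step}, tmp_path)
    os.replace(tmp_path, CKPT_PATH)

def load_checkpoint(model, optimizer):
    # Returns the step to resume from (0 on a fresh run).
    if not os.path.exists(CKPT_PATH):
        return 0
    state = torch.load(CKPT_PATH, map_location="cpu")
    model.load_state_dict(state["model"])
    optimizer.load_state_dict(state["optimizer"])
    return state["step"] + 1

def train(model, optimizer, batches, save_every=1000):
    start = load_checkpoint(model, optimizer)
    for step, batch in enumerate(batches):
        if step < start:
            continue  # skip work already completed before the restart
        loss, _ = reward_model_step(model, optimizer, batch)  # from earlier sketch
        if (step + 1) % save_every == 0:
            save_checkpoint(model, optimizer, step)
```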
Compensation
Annual salary: $350,000 - $500,000 USD
Application
View the original listing to apply!