Senior ML Engineer, Distributed RL & Post-Training Infrastructure

Affine is building an incentivized RL environment that pays miners for making incremental improvements on tasks like program synthesis and coding. Operating on Bittensor's Subnet 64 (Chutes), we've created a Sybil-proof, decoy-proof, copy-proof, and overfitting-proof mechanism that incentivizes genuine model improvements. Our vision is to commoditize reasoning, intelligence's highest form, by directing and aggregating the work effort of a large, permissionless group on RL tasks to break the intelligence sound barrier.
Overview
We're seeking an exceptional ML Engineer to build and optimize the infrastructure for our competitive RL environment. You'll architect systems in which validators identify the models on the Pareto frontier across multiple environments, creating a winner-takes-all dynamic that forces continuous improvement. Your engineering expertise will be critical to scaling our incentivized RL approach and to accelerating model intelligence through directed competition.
Responsibilities
Distributed RL Competition Infrastructure
- Design and implement scalable evaluation systems for models competing across multiple RL environments
- Build Pareto frontier tracking systems that identify non-dominated models across all evaluation tasks
- Develop anti-gaming mechanisms: Sybil-proofing, decoy detection, copy prevention, and overfitting mitigation
- Create fault-tolerant systems handling continuous model evaluation and ranking updates
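At its core, the Pareto frontier tracking above reduces to pairwise dominance tests over per-environment score vectors. A minimal sketch of that check (model names and scores here are illustrative, not part of the actual validator):

```python
def dominates(a, b):
    """True if score vector a Pareto-dominates b: at least as good on
    every environment and strictly better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_frontier(scores):
    """Return the ids of models whose score vectors no other model dominates."""
    return [
        m for m, s in scores.items()
        if not any(dominates(t, s) for n, t in scores.items() if n != m)
    ]

# Hypothetical (coding, abduction) scores for three submitted models.
scores = {
    "model_a": (0.9, 0.4),
    "model_b": (0.7, 0.8),
    "model_c": (0.6, 0.3),  # dominated by both model_a and model_b
}
print(sorted(pareto_frontier(scores)))  # ['model_a', 'model_b']
```

The naive O(n²) scan is adequate for hundreds of models; at larger scale, sorting by one objective or maintaining the frontier incrementally cuts the cost.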
Post-Training & Improvement Pipeline
- Architect systems enabling miners to download, improve, and resubmit Pareto frontier models
- Implement GRPO, PPO, and other RL algorithms optimized for program synthesis and coding tasks
- Build infrastructure for incremental model improvements with efficient fine-tuning pipelines
- Develop evaluation frameworks for diverse RL environments (program abduction, coding, reasoning)
- Create automated systems for detecting genuine improvements vs. gaming attempts
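On the GRPO side of that pipeline, the core idea is replacing a learned value baseline with group-relative reward normalization over completions sampled from the same prompt. A minimal sketch of that advantage step, under the assumption of scalar per-completion rewards (real training loops wrap this in a clipped policy-gradient update):

```python
import statistics

def grpo_advantages(rewards):
    """GRPO-style advantages: z-score each completion's reward against the
    group of completions sampled for the same prompt, removing the need
    for a separate value network."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:  # all completions scored the same; no learning signal
        return [0.0] * len(rewards)
    return [(r - mean) / std for r in rewards]

# Binary pass/fail rewards for four completions of one coding task.
print(grpo_advantages([1.0, 0.0, 0.0, 1.0]))  # [1.0, -1.0, -1.0, 1.0]
```

Group normalization is a good fit for program synthesis tasks, where rewards are often sparse pass/fail signals from test execution.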
Validator & Evaluation Systems
- Build high-throughput model evaluation infrastructure across multiple RL environments
- Implement efficient Pareto frontier computation algorithms for multi-objective optimization
- Develop real-time leaderboard systems tracking model dominance and miner contributions
- Create robust validation mechanisms ensuring fair competition and accurate rankings
- Build inference load balancing systems for publicly available model serving
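For the load-balancing bullet, a common baseline is least-loaded routing across inference replicas. A toy sketch (replica names are hypothetical; a production router would also decrement load when a request completes and track replica health):

```python
import heapq

class LeastLoadedRouter:
    """Route each evaluation request to the replica with the fewest
    in-flight requests, tracked in a min-heap."""

    def __init__(self, replicas):
        self._heap = [(0, r) for r in replicas]
        heapq.heapify(self._heap)

    def acquire(self):
        # Pop the least-loaded replica, bump its load, and push it back.
        load, replica = heapq.heappop(self._heap)
        heapq.heappush(self._heap, (load + 1, replica))
        return replica

router = LeastLoadedRouter(["gpu-0", "gpu-1"])
print([router.acquire() for _ in range(4)])  # ['gpu-0', 'gpu-1', 'gpu-0', 'gpu-1']
```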
Incentive & Anti-Gaming Mechanisms
- Implement cryptographic proofs for model ownership and improvement verification
- Build systems to detect and prevent Sybil attacks across multiple miner identities
- Develop decoy-proof evaluation ensuring models can't be optimized for specific test cases
- Create copy-detection algorithms identifying unauthorized model cloning
- Design overfitting prevention through dynamic evaluation set rotation
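The evaluation-set rotation in the last bullet can be made deterministic for validators yet unpredictable for miners by seeding the sample from a shared secret plus the epoch number. A minimal sketch (the secret, task pool, and epoch scheme are illustrative assumptions, not the network's actual protocol):

```python
import hashlib
import random

def rotate_eval_set(task_pool, epoch, k, secret="validator-secret"):
    """Deterministically sample k tasks for this epoch. Without the
    validators' shared secret, miners cannot predict the subset, which
    blunts overfitting to a fixed test set."""
    seed = hashlib.sha256(f"{secret}:{epoch}".encode()).hexdigest()
    return random.Random(seed).sample(task_pool, k)

pool = [f"task-{i}" for i in range(100)]
epoch_1 = rotate_eval_set(pool, epoch=1, k=5)
assert epoch_1 == rotate_eval_set(pool, epoch=1, k=5)  # reproducible per epoch
```

Because every validator derives the same subset from the same seed, rankings stay consistent across the network while the effective test set still changes each epoch.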
Performance & Scale Engineering
- Optimize evaluation throughput for handling 1000+ model submissions daily
- Implement efficient model diff systems to track incremental improvements
- Build distributed inference infrastructure supporting concurrent model evaluations
- Develop caching strategies for repeated model evaluations and comparisons
- Create monitoring systems tracking network health and competitive dynamics
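The caching strategy above can key results on a content hash of the submitted weights plus the environment id, so resubmitting unchanged weights never re-runs the environment. A simplified sketch (the score function and weight payloads are stand-ins):

```python
import hashlib

class EvalCache:
    """Memoize evaluation scores by (weights hash, environment id)."""

    def __init__(self):
        self._scores = {}

    def get_or_evaluate(self, weights: bytes, env: str, evaluate):
        key = hashlib.sha256(weights).hexdigest() + ":" + env
        if key not in self._scores:  # cache miss: run the real evaluation
            self._scores[key] = evaluate(weights, env)
        return self._scores[key]

cache = EvalCache()
calls = []

def fake_eval(weights, env):
    calls.append(env)
    return 0.8

cache.get_or_evaluate(b"weights-v1", "coding", fake_eval)
cache.get_or_evaluate(b"weights-v1", "coding", fake_eval)  # served from cache
print(len(calls))  # 1
```

Hashing the weights also dovetails with copy detection: identical submissions from different miner identities collide on the same key.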
Required Qualifications
- Bachelor's/Master's degree in Computer Science, Engineering, or related technical field
- 5+ years of experience in distributed ML systems with focus on RL or competitive ML
- Deep understanding of reinforcement learning algorithms (PPO, GRPO, DPO) and multi-objective optimization
- Experience with blockchain/decentralized systems, preferably Bittensor or similar platforms
- Strong systems programming skills in Python and experience with PyTorch
- Experience building evaluation infrastructure for ML competitions or benchmarks
- Track record of building anti-gaming mechanisms in competitive environments
Preferred Experience
- Experience with program synthesis, code generation, or automated reasoning tasks
- Knowledge of Pareto optimization and multi-objective reinforcement learning
- Contributions to ML competitions (Kaggle, etc.) or competitive RL environments
- Experience with large-scale model evaluation and benchmarking systems
Technical Stack & Tools
Core Infrastructure
- RL Frameworks: OpenRLHF, TRL, custom PPO/GRPO implementations
- Evaluation: Custom RL environments, program synthesis benchmarks
- Anti-Gaming: Cryptographic hashing, model fingerprinting, statistical detection
Distributed Systems
- Load Balancing: Dynamic inference routing, model serving optimization
- Storage: Distributed model versioning, incremental update tracking
- Monitoring: Real-time leaderboard, network statistics, miner analytics
- Communication: Bittensor protocol, P2P model exchange
Development Tools
- Languages: Python
- ML Frameworks: PyTorch, JAX for specific RL algorithms
- Infrastructure: Kubernetes, Docker, distributed compute management
- Databases: Time-series for performance tracking, graph DBs for model lineage
Key Engineering Challenges
- Building fair evaluation systems resistant to sophisticated gaming attempts
- Implementing efficient Pareto frontier computation for 100+ models across multiple tasks
- Creating incentive mechanisms that genuinely drive model improvement
- Developing real-time evaluation infrastructure with minimal latency
- Ensuring decentralized trust while preventing exploitation
- Scaling to support exponential growth in miner participation
Immediate Engineering Objectives (0-6 Months)
- Enhance current validator infrastructure for improved gaming resistance
- Implement advanced Pareto frontier tracking with multi-objective optimization
- Build comprehensive evaluation suite for program synthesis and coding tasks
- Develop real-time model lineage tracking to verify incremental improvements
- Create automated detection systems for Sybil, decoy, and copy attempts
- Launch public dashboards showing network dynamics and model evolution
Long-Term Engineering Goals (6+ Months)
- Expand RL environments to cover broader reasoning and intelligence tasks
- Implement advanced game-theoretic mechanisms for optimal incentive design
- Build cross-subnet integration enabling model improvements across Bittensor
- Develop state-of-the-art program synthesis benchmarks as evaluation tasks
- Create open-source tools enabling broader participation in incentivized RL
Impact
You'll be at the forefront of commoditizing intelligence by building infrastructure that harnesses competitive dynamics for rapid AI advancement. Your work will enable the first successful directed incentive system for RL, aggregating global talent to break through intelligence barriers. This isn't just about building infrastructure: it's about creating the economic engine that will drive the next leap in AI capabilities through decentralized, competitive improvement.
How to Apply
Send your application materials to: careers@affine.com