Senior ML Engineer, Distributed RL & Post-Training Infrastructure

Affine is building an incentivized RL environment that pays miners for making incremental improvements on tasks like program synthesis and coding. Operating on Bittensor's Subnet 64 (Chutes), we've created a Sybil-proof, decoy-proof, copy-proof, and overfitting-proof mechanism that incentivizes genuine model improvements. Our vision is to commoditize reasoning, intelligence's highest form, by directing and aggregating the work effort of a large, permissionless group on RL tasks to break the intelligence sound barrier.
Overview
We're seeking an exceptional ML Engineer to build and optimize the infrastructure for our competitive RL environment. You'll architect systems in which validators identify the models on the Pareto frontier across multiple environments, creating a winner-takes-all dynamic that forces continuous improvement. Your engineering expertise will be critical to scaling our incentivized RL approach and to accelerating model intelligence through directed competition.
Responsibilities
Distributed RL Competition Infrastructure
- Design and implement scalable evaluation systems for models competing across multiple RL environments
- Build Pareto frontier tracking systems that identify non-dominated models across all evaluation tasks
- Develop anti-gaming mechanisms: Sybil-proofing, decoy detection, copy prevention, and overfitting mitigation
- Create fault-tolerant systems handling continuous model evaluation and ranking updates
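At its core, the Pareto frontier tracking above reduces to pairwise dominance tests over per-environment score vectors. A minimal sketch of that check (model names and scores here are illustrative, not part of the actual validator):

```python
def dominates(a, b):
    """True if score vector a Pareto-dominates b: at least as good on
    every environment and strictly better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_frontier(scores):
    """Return the ids of models whose score vectors no other model dominates."""
    return [
        m for m, s in scores.items()
        if not any(dominates(t, s) for n, t in scores.items() if n != m)
    ]

# Hypothetical (coding, abduction) scores for three submitted models.
scores = {
    "model_a": (0.9, 0.4),
    "model_b": (0.7, 0.8),
    "model_c": (0.6, 0.3),  # dominated by both model_a and model_b
}
print(sorted(pareto_frontier(scores)))  # ['model_a', 'model_b']
```

The naive O(n²) scan is adequate for hundreds of models; at larger scale, sorting by one objective or maintaining the frontier incrementally cuts the cost.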
Post-Training & Improvement Pipeline
- Architect systems enabling miners to download, improve, and resubmit Pareto frontier models
- Implement GRPO, PPO, and other RL algorithms optimized for program synthesis and coding tasks
- Build infrastructure for incremental model improvements with efficient fine-tuning pipelines
- Develop evaluation frameworks for diverse RL environments (program abduction, coding, reasoning)
- Create automated systems for detecting genuine improvements vs. gaming attempts
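On the GRPO side of that pipeline, the core idea is replacing a learned value baseline with group-relative reward normalization over completions sampled from the same prompt. A minimal sketch of that advantage step, under the assumption of scalar per-completion rewards (real training loops wrap this in a clipped policy-gradient update):

```python
import statistics

def grpo_advantages(rewards):
    """GRPO-style advantages: z-score each completion's reward against the
    group of completions sampled for the same prompt, removing the need
    for a separate value network."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:  # all completions scored the same; no learning signal
        return [0.0] * len(rewards)
    return [(r - mean) / std for r in rewards]

# Binary pass/fail rewards for four completions of one coding task.
print(grpo_advantages([1.0, 0.0, 0.0, 1.0]))  # [1.0, -1.0, -1.0, 1.0]
```

Group normalization is a good fit for program synthesis tasks, where rewards are often sparse pass/fail signals from test execution.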
Validator & Evaluation Systems
- Build high-throughput model evaluation infrastructure across multiple RL environments
- Implement efficient Pareto frontier computation algorithms for multi-objective optimization
- Develop real-time leaderboard systems tracking model dominance and miner contributions
- Create robust validation mechanisms ensuring fair competition and accurate rankings
- Build inference load balancing systems for publicly available model serving
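For the load-balancing bullet, a common baseline is least-loaded routing across inference replicas. A toy sketch (replica names are hypothetical; a production router would also decrement load when a request completes and track replica health):

```python
import heapq

class LeastLoadedRouter:
    """Route each evaluation request to the replica with the fewest
    in-flight requests, tracked in a min-heap."""

    def __init__(self, replicas):
        self._heap = [(0, r) for r in replicas]
        heapq.heapify(self._heap)

    def acquire(self):
        # Pop the least-loaded replica, bump its load, and push it back.
        load, replica = heapq.heappop(self._heap)
        heapq.heappush(self._heap, (load + 1, replica))
        return replica

router = LeastLoadedRouter(["gpu-0", "gpu-1"])
print([router.acquire() for _ in range(4)])  # ['gpu-0', 'gpu-1', 'gpu-0', 'gpu-1']
```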
Incentive & Anti-Gaming Mechanisms
- Implement cryptographic proofs for model ownership and improvement verification
- Build systems to detect and prevent Sybil attacks across multiple miner identities
- Develop decoy-proof evaluation ensuring models can't be optimized for specific test cases
- Create copy-detection algorithms identifying unauthorized model cloning
- Design overfitting prevention through dynamic evaluation set rotation
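The evaluation-set rotation in the last bullet can be made deterministic for validators yet unpredictable for miners by seeding the sample from a shared secret plus the epoch number. A minimal sketch (the secret, task pool, and epoch scheme are illustrative assumptions, not the network's actual protocol):

```python
import hashlib
import random

def rotate_eval_set(task_pool, epoch, k, secret="validator-secret"):
    """Deterministically sample k tasks for this epoch. Without the
    validators' shared secret, miners cannot predict the subset, which
    blunts overfitting to a fixed test set."""
    seed = hashlib.sha256(f"{secret}:{epoch}".encode()).hexdigest()
    return random.Random(seed).sample(task_pool, k)

pool = [f"task-{i}" for i in range(100)]
epoch_1 = rotate_eval_set(pool, epoch=1, k=5)
assert epoch_1 == rotate_eval_set(pool, epoch=1, k=5)  # reproducible per epoch
```

Because every validator derives the same subset from the same seed, rankings stay consistent across the network while the effective test set still changes each epoch.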
Performance & Scale Engineering
- Optimize evaluation throughput for handling 1000+ model submissions daily
- Implement efficient model diff systems to track incremental improvements
- Build distributed inference infrastructure supporting concurrent model evaluations
- Develop caching strategies for repeated model evaluations and comparisons
- Create monitoring systems tracking network health and competitive dynamics
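The caching strategy above can key results on a content hash of the submitted weights plus the environment id, so resubmitting unchanged weights never re-runs the environment. A simplified sketch (the score function and weight payloads are stand-ins):

```python
import hashlib

class EvalCache:
    """Memoize evaluation scores by (weights hash, environment id)."""

    def __init__(self):
        self._scores = {}

    def get_or_evaluate(self, weights: bytes, env: str, evaluate):
        key = hashlib.sha256(weights).hexdigest() + ":" + env
        if key not in self._scores:  # cache miss: run the real evaluation
            self._scores[key] = evaluate(weights, env)
        return self._scores[key]

cache = EvalCache()
calls = []

def fake_eval(weights, env):
    calls.append(env)
    return 0.8

cache.get_or_evaluate(b"weights-v1", "coding", fake_eval)
cache.get_or_evaluate(b"weights-v1", "coding", fake_eval)  # served from cache
print(len(calls))  # 1
```

Hashing the weights also dovetails with copy detection: identical submissions from different miner identities collide on the same key.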
Required Qualifications
- Bachelor's/Master's degree in Computer Science, Engineering, or related technical field
- 5+ years of experience in distributed ML systems with focus on RL or competitive ML
- Deep understanding of reinforcement learning algorithms (PPO, GRPO, DPO) and multi-objective optimization
- Experience with blockchain/decentralized systems, preferably Bittensor or similar platforms
- Strong systems programming skills in Python and experience with PyTorch
- Experience building evaluation infrastructure for ML competitions or benchmarks
- Track record of building anti-gaming mechanisms in competitive environments
Preferred Experience
- Experience with program synthesis, code generation, or automated reasoning tasks
- Knowledge of Pareto optimization and multi-objective reinforcement learning
- Contributions to ML competitions (Kaggle, etc.) or competitive RL environments
- Experience with large-scale model evaluation and benchmarking systems
Technical Stack & Tools
Core Infrastructure
- RL Frameworks: OpenRLHF, TRL, custom PPO/GRPO implementations
- Evaluation: Custom RL environments, program synthesis benchmarks
- Anti-Gaming: Cryptographic hashing, model fingerprinting, statistical detection
Distributed Systems
- Load Balancing: Dynamic inference routing, model serving optimization
- Storage: Distributed model versioning, incremental update tracking
- Monitoring: Real-time leaderboard, network statistics, miner analytics
- Communication: Bittensor protocol, P2P model exchange
Development Tools
- Languages: Python
- ML Frameworks: PyTorch, JAX for specific RL algorithms
- Infrastructure: Kubernetes, Docker, distributed compute management
- Databases: Time-series for performance tracking, graph DBs for model lineage
Key Engineering Challenges
- Building fair evaluation systems resistant to sophisticated gaming attempts
- Implementing efficient Pareto frontier computation for 100+ models across multiple tasks
- Creating incentive mechanisms that genuinely drive model improvement
- Developing real-time evaluation infrastructure with minimal latency
- Ensuring decentralized trust while preventing exploitation
- Scaling to support exponential growth in miner participation
Immediate Engineering Objectives (0-6 Months)
- Enhance current validator infrastructure for improved gaming resistance
- Implement advanced Pareto frontier tracking with multi-objective optimization
- Build comprehensive evaluation suite for program synthesis and coding tasks
- Develop real-time model lineage tracking to verify incremental improvements
- Create automated detection systems for Sybil, decoy, and copy attempts
- Launch public dashboards showing network dynamics and model evolution
Long-Term Engineering Goals (6+ Months)
- Expand RL environments to cover broader reasoning and intelligence tasks
- Implement advanced game-theoretic mechanisms for optimal incentive design
- Build cross-subnet integration enabling model improvements across Bittensor
- Develop state-of-the-art program synthesis benchmarks as evaluation tasks
- Create open-source tools enabling broader participation in incentivized RL
Impact
You'll be at the forefront of commoditizing intelligence by building infrastructure that harnesses competitive dynamics for rapid AI advancement. Your work will enable the first successful directed incentive system for RL, aggregating global talent to break through intelligence barriers. This isn't just about building infrastructure: it's about creating the economic engine that will drive the next leap in AI capabilities through decentralized, competitive improvement.
How to Apply
Send your application materials to: careers@affine.com