Show HN: PipelineRL – Async RL with in-flight weight updates (train ~2× faster) (github.com/ServiceNow)
2 points by muchomuchach0 4 hours ago | 2 comments
- muchomuchach0 4 hours ago
I’m involved with this project.
A recurring issue in on-policy RL for LLMs is GPU under-utilization while actors wait for weight syncs from the learner. PipelineRL uses in-flight weight updates: actors keep sampling while the learner updates weights, which reduces policy lag without stalling the pipeline.
In practice this gives ~2× wall-clock speedups on large models.
A paper on the approach was recently accepted to TMLR and discusses policy-lag bounds in more detail.
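To make the idea concrete, here's a toy sketch (not PipelineRL's actual code) of in-flight weight updates: the learner publishes new weight versions while the actor keeps sampling and simply reads whatever version is current, instead of blocking on a full sync. The shared dict, thread counts, and timings are all illustrative stand-ins.

```python
import threading, queue, time

# Toy sketch of in-flight weight updates (illustrative only, not the real code).
# The learner publishes new "weight" versions while the actor keeps sampling;
# the actor reads the latest version between steps instead of blocking on a sync.

latest = {"version": 0}        # stand-in for the policy weights
lock = threading.Lock()
samples = queue.Queue()        # (sample_id, policy_version) pairs

def learner(num_updates):
    for v in range(1, num_updates + 1):
        time.sleep(0.01)       # pretend to run an optimizer step
        with lock:
            latest["version"] = v   # publish new weights mid-generation

def actor(num_samples):
    for i in range(num_samples):
        with lock:
            v = latest["version"]   # grab whatever weights are current
        time.sleep(0.002)      # pretend to sample a trajectory
        samples.put((i, v))    # record which policy version produced it

t1 = threading.Thread(target=learner, args=(5,))
t2 = threading.Thread(target=actor, args=(25,))
t1.start(); t2.start()
t1.join(); t2.join()
```

Because the actor never stalls, samples end up spanning a small range of recent policy versions; that spread is the "policy lag" the paper bounds.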