Faster Fine-Tuning, Smarter Agents, and Real-World Vision-Language Wins
Weight-averaging magic, biomedical NER SOTA, and RL-trained agents for real-world tasks.
Before we jump in, meet today’s sponsor: Sequa. Its Context Engine makes AI-powered software development actually work in real-world codebases.
They index and deeply understand your entire multi-repo architecture, then feed that knowledge directly into your AI dev tools and agents via MCP. Sequa shifts the focus from explaining your code to LLMs, to orchestrating AI to ship, ship, ship. The result: far fewer errors, faster shipping, and AI that finally “gets” your project.
Want to get one month of Sequa Context Engine for free? Reply to this email or leave a comment on Substack.
This week’s research shows how less can be more in fine-tuning, why domain-adapted transformers are still winning, and how reinforcement learning is powering everything from self-evolving desktop agents to long-context coding assistants. Plus, VLMs trained in synthetic worlds are finally making the leap to real-world performance.
Here’s what’s new:
⚖️ Model Stock: A fine-tuning method that beats baselines by averaging the weights of just two fine-tuned models—no massive model soup required. The trick? Staying close to the “center” of weight space. Outperforms Model Soup on both in- and out-of-distribution tasks while using far fewer resources.
🧬 OpenMed NER: A suite of open-source, domain-adapted transformers that set SOTA on 10 of 12 biomedical NER benchmarks. Using lightweight domain-adaptive pre-training + LoRA fine-tuning that updates under 1.5% of parameters, it trains in under 12 hours on a single GPU with a tiny carbon footprint.
🖥️ SEAgent: A self-evolving computer-use agent that learns by trial and error—no labeled data needed. Gains 23.2% success rate improvement by combining curriculum generation with a specialist-to-generalist approach, enabling continuous autonomous evolution.
💻 RL for Long-Context Coding Agents: Applying RL to multi-turn software engineering tasks boosts SWE-bench Verified success rates by 95% (20% → 39%). Shows RL can handle complex, stateful environments—matching top open-weight coding models.
🖼️ VL-DAC: A lightweight RL algorithm that trains vision-language models in synthetic simulators but transfers gains to the real world (+50% game control, +5% spatial planning, +2% web navigation). Focuses PPO updates only on action tokens, keeping general image understanding intact.
Model Stock: All we need is just a few fine-tuned models (🔗 Read the Paper)
This paper introduces Model Stock, a fine-tuning method that achieves superior performance by averaging the weights of just two fine-tuned models instead of the many traditionally required. The key insight: better performance correlates with proximity to the center of weight space. The approach outperforms existing methods like Model Soup on both in-distribution and out-of-distribution tasks while requiring significantly fewer computational resources.
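To make the geometry concrete, here is a minimal sketch of the merge rule applied per layer: the two fine-tuned weights are averaged, then pulled toward the pre-trained anchor by an interpolation ratio derived from the angle between the two fine-tuning updates. The state-dict layout and the small stabilizer constant are illustrative, not the paper's exact code.

```python
import torch

def model_stock_merge(w0, w1, w2):
    """Merge two fine-tuned models (w1, w2) with their pre-trained
    anchor (w0), all given as state dicts of tensors.

    Per layer, the interpolation ratio t = 2*cos(theta) / (1 + cos(theta))
    pulls the two-model average toward the anchor, approximating the
    center of the thin shell on which fine-tuned weights are observed
    to lie.
    """
    merged = {}
    for name, anchor in w0.items():
        d1 = (w1[name] - anchor).flatten().float()
        d2 = (w2[name] - anchor).flatten().float()
        # Cosine of the angle between the two fine-tuning updates.
        cos = torch.dot(d1, d2) / (d1.norm() * d2.norm() + 1e-8)
        t = 2.0 * cos / (1.0 + cos)
        avg = (w1[name] + w2[name]) / 2.0  # plain two-model average
        merged[name] = t * avg + (1.0 - t) * anchor
    return merged
```

The closer the two fine-tuning directions agree (cosine near 1), the more the merge trusts their average; when they diverge, it leans back on the anchor.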
OpenMed NER: Open-Source, Domain-Adapted State-of-the-Art Transformers for Biomedical NER Across 12 Public Datasets (🔗 Read the Paper)
OpenMed NER introduces a suite of open-source, domain-adapted transformer models that achieve state-of-the-art performance on 10 of 12 biomedical named entity recognition benchmarks by combining lightweight domain-adaptive pre-training with parameter-efficient LoRA fine-tuning, updating less than 1.5% of model parameters. The approach delivers substantial improvements across diverse biomedical entity types while remaining remarkably efficient, completing training in under 12 hours on a single GPU with a low carbon footprint.
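As a rough illustration of the parameter-efficient setup, here is how a LoRA adapter for token classification can be attached with Hugging Face's peft library. The backbone checkpoint, rank, and target modules below are illustrative choices, not the paper's exact recipe.

```python
from transformers import AutoModelForTokenClassification
from peft import LoraConfig, TaskType, get_peft_model

# Backbone and hyperparameters are illustrative, not OpenMed NER's recipe.
base = AutoModelForTokenClassification.from_pretrained(
    "microsoft/deberta-v3-base",
    num_labels=3,  # O / B / I tags for a single entity type
)

lora_cfg = LoraConfig(
    task_type=TaskType.TOKEN_CLS,  # token-level classification (NER)
    r=16,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["query_proj", "value_proj"],  # DeBERTa attention projections
)

model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()  # prints the tiny trainable fraction
```

Only the low-rank adapter matrices (and the classification head) receive gradients, which is what keeps the trainable fraction in the ~1% range and single-GPU training practical.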
SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience (🔗 Read the Paper)
SEAgent introduces a self-evolving framework that enables computer-use agents to autonomously learn and improve through trial-and-error interactions with novel software, without requiring human-labeled training data. The approach achieves a 23.2% improvement in success rate (from 11.3% to 34.5%) by using experiential learning with curriculum generation and a specialist-to-generalist training strategy that creates agents capable of continuous autonomous evolution.
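The loop below is a hedged sketch of how such a self-evolving cycle fits together; `agent`, `world_state_model`, and `curriculum` are hypothetical interfaces standing in for SEAgent's components, not the released code.

```python
# Hypothetical interfaces sketching the self-evolving loop, not
# SEAgent's released implementation.
def self_evolve(agent, world_state_model, curriculum, software, n_rounds=3):
    for _ in range(n_rounds):
        # 1. Propose tasks of rising difficulty, conditioned on what
        #    the agent has (and hasn't) mastered so far.
        tasks = curriculum.propose_tasks(software, history=agent.experience)
        for task in tasks:
            trajectory = agent.attempt(task)  # trial-and-error rollout
            # 2. A judge model scores each step, so no human labels are
            #    needed to turn raw experience into a reward signal.
            feedback = world_state_model.evaluate(trajectory)
            agent.experience.append((trajectory, feedback))
        # 3. Update the policy from its own successes and failures.
        agent.update(agent.experience)
    return agent
```

The specialist-to-generalist step then distills agents trained on individual applications into one generalist that keeps improving across all of them.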
Training Long-Context, Multi-Turn Software Engineering Agents with Reinforcement Learning (🔗 Read the Paper)
This paper applies reinforcement learning to train long-context software engineering agents that can interact with stateful environments across multiple turns, achieving a 95% improvement (20% to 39% success rate) on the SWE-bench Verified benchmark using a modified DAPO algorithm. The approach demonstrates that RL can effectively train agents for complex, multi-turn real-world tasks beyond traditional single-turn applications, with performance matching leading open-weight models on software engineering benchmarks.
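For flavor, here is a minimal sketch of the group-relative advantage computation at the core of GRPO/DAPO-style training, adapted to multi-turn trajectories; the binary pass/fail reward and rollout count are illustrative assumptions, not the paper's exact reward design.

```python
import numpy as np

def group_relative_advantage(rewards, eps=1e-6):
    """Normalize each trajectory's scalar reward against its group:
    sample several full multi-turn rollouts of the same task, then
    compare them to one another rather than to a learned critic."""
    rewards = np.asarray(rewards, dtype=np.float64)
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# One group = several rollouts of the same SWE-bench issue; reward is
# 1.0 if the agent's patch passes the tests, else 0.0 (illustrative).
adv = group_relative_advantage([1.0, 0.0, 0.0, 1.0])
print(adv)  # positive for passing rollouts, negative otherwise
# Every token the agent emitted in a rollout shares that rollout's
# advantage when the token-level policy-gradient loss is computed.
```

The long-context, stateful twist is that each "sample" here is an entire multi-turn interaction with the repository environment, not a single completion.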
Enhancing Vision-Language Model Training with Reinforcement Learning in Synthetic Worlds for Real-World Success (🔗 Read the Paper)
This paper introduces VL-DAC, a lightweight reinforcement learning algorithm that trains vision-language models by applying PPO updates only to action tokens while learning values at the environment level, achieving significant improvements in real-world tasks (+50% on game control, +5% on spatial planning, +2% on web navigation) despite training only in inexpensive synthetic simulators. The work demonstrates for the first time that simple RL methods can successfully transfer VLM capabilities from synthetic environments to real-world applications without compromising general image understanding.
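A hedged sketch of the action-token masking idea: a standard PPO clipped loss in which only tokens flagged as actions contribute gradient. The function signature and mask convention are assumptions for illustration, not VL-DAC's actual API.

```python
import torch

def masked_ppo_loss(logprobs, old_logprobs, advantages, action_mask,
                    clip_eps=0.2):
    """PPO clipped objective restricted to action tokens.

    `action_mask` is 1.0 where a token encodes part of the agent's
    action (a click target, a key press) and 0.0 elsewhere, so
    reasoning and description tokens receive no policy gradient.
    `advantages` would come from a single value estimate per
    environment step, broadcast to that step's tokens.
    """
    ratio = torch.exp(logprobs - old_logprobs)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    per_token = -torch.min(unclipped, clipped)
    # Average only over the masked (action) tokens.
    return (per_token * action_mask).sum() / action_mask.sum().clamp(min=1)
```

Leaving non-action tokens out of the update is what lets the policy improve at control tasks without overwriting the VLM's general image understanding.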