🧠 Recursive Reasoning, Graph Compression, and 4-Minute AI Videos
Tiny recursive models outperform giants, Meta rethinks compression, long-form video generation scales up, and new RL methods teach models to reuse experience.
This week’s papers explore recursive reasoning in tiny networks, graph-based compression models, long-form video generation breakthroughs, and AI systems that learn from experience. Plus, a multi-agent system that turns research papers into presentation videos.
Here’s what’s new:
🧠 Less is More: Recursive Reasoning with Tiny Networks — A 7M parameter Tiny Recursive Model (TRM) solves complex reasoning tasks like ARC-AGI, outperforming models thousands of times larger. Evidence that recursion can trump raw scale.
🕸️ OpenZL — Meta’s graph-based compression framework treats codecs as nodes in a DAG, enabling rapid creation of application-specific compressors. Cuts dev timelines from months to days while boosting efficiency.
🎬 Self-Forcing++ — A technique enabling 4+ minute high-quality video generation (50× longer than baselines). Uses teacher model guidance to maintain fidelity and avoid degradation without requiring long-video data.
📚 Paper2Video — A multi-agent system that automatically generates academic presentation videos from papers — including slides, speech synthesis, and talking heads — easing the burden of research communication.
🧩 ExGRPO — A reinforcement learning framework that reuses valuable reasoning experiences instead of discarding them. Delivers +3.5–7.6 point gains across benchmarks while improving stability and generalization.
Less is More: Recursive Reasoning with Tiny Networks (🔗 Read the Paper)
The Tiny Recursive Model (TRM) achieves superior performance on complex reasoning tasks such as ARC-AGI using only a 2-layer, 7M-parameter network that recursively refines its own answers, significantly outperforming much larger language models while using less than 0.01% of their parameters. This demonstrates that recursive reasoning architectures can solve hard puzzle tasks more efficiently than scaling up model size, challenging the prevailing “bigger is better” paradigm in AI.
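To make the recursion concrete, here is a minimal PyTorch sketch of the idea: a single tiny network repeatedly updates a latent scratchpad and a candidate answer. The module sizes, step counts, and the exact wiring of the answer update are illustrative assumptions, not the paper’s configuration.

```python
import torch
import torch.nn as nn

class TinyRecursiveReasoner(nn.Module):
    """Illustrative sketch (assumed hyperparameters): one small network is
    applied recursively to refine a latent state z and a candidate answer y."""

    def __init__(self, vocab=512, dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        # Deliberately tiny core; the paper's network is ~2 layers / ~7M params.
        self.core = nn.Sequential(
            nn.Linear(3 * dim, dim), nn.GELU(), nn.Linear(dim, dim)
        )
        self.readout = nn.Linear(dim, vocab)

    def forward(self, question_tokens, n_latent=6, n_improve=3):
        x = self.embed(question_tokens)        # fixed question embedding
        y = torch.zeros_like(x)                # candidate answer state
        z = torch.zeros_like(x)                # latent reasoning state
        for _ in range(n_improve):             # outer answer-improvement steps
            for _ in range(n_latent):          # inner latent recursion
                z = self.core(torch.cat([x, y, z], dim=-1))
            # answer update driven by the refined latent (question masked out)
            y = self.core(torch.cat([torch.zeros_like(x), y, z], dim=-1))
        return self.readout(y)                 # per-token answer logits

# Example: logits = TinyRecursiveReasoner()(torch.randint(0, 512, (1, 81)))
```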
OpenZL: A Graph-Based Model for Compression (🔗 Read the Paper)
OpenZL introduces a “graph model” of compression, representing a compressor as a directed acyclic graph of modular codecs. This enables rapid development of application-specific compressors that outperform general-purpose alternatives while remaining deployable through a single universal decoder. The system achieves superior compression ratios and speeds on real-world datasets and has cut development timelines from months to days in Meta’s internal deployments.
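As a rough illustration of the graph idea, the toy below chains codec nodes and ships the serialized plan with the payload so a single universal decoder can invert it. The names, wire format, and the restriction to a linear chain (rather than a full DAG with stream splitting) are my simplifications, not OpenZL’s actual API.

```python
import json
import struct
import zlib

def delta_encode(data: bytes) -> bytes:
    prev, out = 0, bytearray()
    for b in data:
        out.append((b - prev) & 0xFF)
        prev = b
    return bytes(out)

def delta_decode(data: bytes) -> bytes:
    prev, out = 0, bytearray()
    for b in data:
        prev = (prev + b) & 0xFF
        out.append(prev)
    return bytes(out)

# Codec nodes: (encode, decode) pairs the graph can compose.
CODECS = {"delta": (delta_encode, delta_decode),
          "zlib": (zlib.compress, zlib.decompress)}

def compress(data: bytes, graph: list[str]) -> bytes:
    """Apply codec nodes in order, then prepend the plan for the decoder."""
    for name in graph:
        data = CODECS[name][0](data)
    plan = json.dumps(graph).encode()
    return struct.pack("<I", len(plan)) + plan + data

def decompress(frame: bytes) -> bytes:
    """Universal decoder: read the embedded plan and invert it in reverse."""
    (plan_len,) = struct.unpack("<I", frame[:4])
    graph = json.loads(frame[4:4 + plan_len].decode())
    data = frame[4 + plan_len:]
    for name in reversed(graph):
        data = CODECS[name][1](data)
    return data

# A format-aware graph for slowly varying byte sequences.
payload = bytes(range(200)) * 50
assert decompress(compress(payload, ["delta", "zlib"])) == payload
```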
Self-Forcing++: Towards Minute-Scale High-Quality Video Generation (🔗 Read the Paper)
This paper introduces Self-Forcing++, a method that lets diffusion models generate high-quality videos over four minutes long, roughly 50× longer than baselines, by applying teacher-model guidance to self-generated video segments. This avoids quality degradation and error accumulation without requiring any long-video training data. The approach maintains temporal consistency while extending video length to 20× beyond the teacher model’s own horizon, significantly outperforming existing methods in both fidelity and consistency.
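A heavily simplified sketch of the training idea as I read it: the student rolls out a long video on its own, and the short-horizon teacher only ever scores randomly sampled windows of that rollout, so no long-video data (or full-length backpropagation) is needed. All modules, shapes, and the loss are toy stand-ins rather than the authors’ actual distribution-matching objective.

```python
import torch
import torch.nn as nn

class ToyStudent(nn.Module):
    """Stand-in causal generator: produces frames one step at a time."""
    def __init__(self, dim=16):
        super().__init__()
        self.step = nn.Linear(dim, dim)

    def rollout(self, cond, num_frames):
        frames, x = [], cond
        for _ in range(num_frames):
            x = torch.tanh(self.step(x))
            frames.append(x)
        return torch.stack(frames, dim=1)          # [batch, frames, dim]

class ToyTeacherScore(nn.Module):
    """Stand-in for short-horizon teacher guidance (lower = closer to teacher)."""
    def __init__(self, dim=16):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, clip):
        return (clip - torch.tanh(self.proj(clip))).pow(2).mean()

def training_step(student, teacher, cond, total_frames=240, window=16):
    # 1) Long self-generated rollout, far beyond the teacher's horizon (no grad).
    with torch.no_grad():
        video = student.rollout(cond, num_frames=total_frames)
    # 2) Re-generate a randomly chosen short window with gradients, starting
    #    from the student's own earlier frame, and apply teacher guidance there.
    start = torch.randint(1, total_frames - window, (1,)).item()
    clip = student.rollout(video[:, start - 1], num_frames=window)
    loss = teacher(clip)
    loss.backward()
    return loss.item()

student, teacher = ToyStudent(), ToyTeacherScore()
print(training_step(student, teacher, cond=torch.randn(2, 16)))
```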
Paper2Video: Automatic Video Generation from Scientific Papers (🔗 Read the Paper)
This paper introduces Paper2Video, the first multi-agent framework that automatically generates academic presentation videos from research papers, easing the labor-intensive process of creating presentation content. The authors also contribute a benchmark of 101 research papers paired with presentation videos and new evaluation metrics; against existing baselines, the system produces more faithful and informative videos through coordinated slide creation, speech synthesis, and talking-head rendering.
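The sketch below shows roughly how such an agent pipeline might be wired together; the agent names, interfaces, and string outputs are placeholders of mine, not Paper2Video’s actual components.

```python
from dataclasses import dataclass

@dataclass
class PresentationAssets:
    slides: list[str]
    narration: list[str]
    presenter_cues: list[str]

def slide_agent(paper_text: str) -> list[str]:
    # Placeholder: would call an LLM to outline the paper and lay out slides.
    sections = [s.strip() for s in paper_text.split("\n\n") if s.strip()]
    return [f"Slide {i + 1}: {s[:60]}" for i, s in enumerate(sections)]

def speech_agent(slides: list[str]) -> list[str]:
    # Placeholder: would draft per-slide narration and synthesize speech (TTS).
    return [f"Narration for {s.split(':')[0]}" for s in slides]

def talking_head_agent(narration: list[str]) -> list[str]:
    # Placeholder: would render a presenter video synchronized with the audio.
    return [f"Presenter clip aligned to: {n}" for n in narration]

def paper_to_video(paper_text: str) -> PresentationAssets:
    slides = slide_agent(paper_text)          # agents run in a fixed order here;
    narration = speech_agent(slides)          # a real system would coordinate
    cues = talking_head_agent(narration)      # and cross-check their outputs
    return PresentationAssets(slides, narration, cues)

print(paper_to_video("Abstract: ...\n\nMethod: ...\n\nResults: ...").slides)
```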
ExGRPO: Learning to Reason from Experience (🔗 Read the Paper)
ExGRPO introduces a framework that improves large language model reasoning by prioritizing and reusing valuable past experiences, scored by rollout correctness and entropy, rather than discarding them after a single update as standard on-policy methods do. The approach delivers consistent gains of +3.5–7.6 points over baseline reinforcement learning methods while providing more stable training across different model sizes.
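A minimal sketch of the experience-reuse idea, under my own simplifications: keep rollouts that turned out correct, prioritize confident (low-entropy) ones when replaying, and mix them with fresh on-policy rollouts. The scoring heuristic and mixing ratio are illustrative, not the paper’s exact formulation.

```python
import random
from dataclasses import dataclass, field

@dataclass
class Rollout:
    prompt: str
    response: str
    correct: bool
    entropy: float                      # mean token entropy of the rollout

@dataclass
class ExperienceBuffer:
    capacity: int = 10_000
    items: list = field(default_factory=list)

    def add(self, rollout: Rollout) -> None:
        if rollout.correct:             # keep only potentially valuable rollouts
            self.items.append(rollout)
            self.items = self.items[-self.capacity:]

    def sample(self, k: int) -> list:
        if not self.items or k <= 0:
            return []
        # Heuristic priority: confident (low-entropy) correct rollouts first.
        weights = [1.0 / (1.0 + r.entropy) for r in self.items]
        return random.choices(self.items, weights=weights, k=k)

def build_batch(buffer: ExperienceBuffer, fresh: list, replay_ratio: float = 0.5) -> list:
    """Mix fresh on-policy rollouts with prioritized replayed experiences."""
    for r in fresh:
        buffer.add(r)
    return fresh + buffer.sample(int(len(fresh) * replay_ratio))

# Example usage with fabricated placeholder rollouts.
buf = ExperienceBuffer()
fresh = [Rollout("2+2?", "4", correct=True, entropy=0.3),
         Rollout("2+2?", "5", correct=False, entropy=1.2)]
print(len(build_batch(buf, fresh)))
```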


