Faster LLMs, Safer Chains of Thought, and Image Tokenization Reinvented

From cascade inference to monitoring AI reasoning—this week’s most thought-provoking AI drops.

Jul 18, 2025

Before we jump in, let’s thank Superflex. Superflex is an extension for VSCode, Cursor and Windsurf that converts Figma to production-ready code that matches your coding style, design system and reuses your UI components

Get 30% off Superflex for 1 month. Use code HACKERPULSE.

Get Superflex

This week’s papers tackle the fundamentals of speed, safety, and theory in deep learning - from rethinking how we tokenize images to questioning whether RL agents are actually reasoning or just memorizing. We also look at a rigorous new book on the math of deep learning and a promising but fragile AI safety approach.

Here’s what’s new:

📘 Mathematical Introduction to Deep Learning: A comprehensive book covering deep learning architectures, optimization, approximation theory, and even PDEs via physics-informed neural networks. Great for both beginners and researchers seeking formal grounding.

🧠 Chain-of-Thought Monitorability: A fragile but promising AI safety technique that leverages language-based reasoning to monitor intent. The paper calls for preserving this safety window as models evolve beyond interpretable thought patterns.

⚡ Cascade Speculative Drafting: A two-dimensional optimization of speculative decoding—Vertical for removing autoregressive drafts, Horizontal for importance-based token allocation—leading to faster inference without sacrificing output quality.

🖼️ VFMTok: A tokenizer built from frozen vision foundation models that enables state-of-the-art image generation (gFID 2.07) with 3x faster convergence, bypassing classifier-free guidance and unlocking a new use case for vision encoders.

❗ Reasoning or Memorization?: This paper finds that RL advances in math reasoning may be inflated by data contamination. Using a clean benchmark (RandomCalculation), they show genuine reasoning only improves with correct reward signals—calling for higher rigor in RL evaluation.

Mathematical Introduction to Deep Learning: Methods, Implementations, and Theory (🔗 Read the Paper)

This book provides a comprehensive mathematical introduction to deep learning, covering neural network architectures, optimization algorithms, and theoretical foundations including approximation theory and generalization bounds. It also explores applications to partial differential equations through physics-informed neural networks and deep Galerkin methods, targeting both beginners and practitioners seeking rigorous mathematical understanding.

Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety (🔗 Read the Paper)

This research proposes that AI systems using human language for reasoning create new safety opportunities through monitoring their "chains of thought" for malicious intent, but warns this approach is imperfect and fragile. The authors recommend frontier AI developers prioritize preserving and investing in chain-of-thought monitorability as a complementary safety method alongside existing oversight techniques.

Cascade Speculative Drafting for Even Faster LLM Inference (🔗 Read the Paper)

This paper introduces Cascade Speculative Drafting (CS Drafting), which improves LLM inference speed by eliminating autoregressive generation in draft creation (Vertical Cascade) and optimizing time allocation across tokens based on importance (Horizontal Cascade). The method achieves greater speedup than existing speculative decoding approaches while maintaining the same output quality as the target model.

Vision Foundation Models as Effective Visual Tokenizers for Autoregressive Image Generation (🔗 Read the Paper)

VFMTok introduces a novel image tokenizer that leverages frozen vision foundation models as encoders, combined with region-adaptive quantization and semantic reconstruction objectives, achieving state-of-the-art autoregressive image generation with a gFID of 2.07 on ImageNet while accelerating convergence 3x. This approach demonstrates that pre-trained vision models can be effectively repurposed for generative tasks, eliminating the need for classifier-free guidance in high-quality image synthesis.

Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination (🔗 Read the Paper)

This paper reveals that recent reinforcement learning improvements in mathematical reasoning for large language models, particularly Qwen2.5, are likely due to data contamination rather than genuine reasoning enhancement, as evidenced by the failure of random rewards to improve performance on clean synthetic datasets. The authors introduce RandomCalculation, a contamination-free benchmark, and demonstrate that only accurate reward signals consistently improve reasoning performance, calling for more rigorous evaluation practices in RL research.

HackerPulse Dispatch

Discussion about this post