Welcome to AI Fridays, our roundup of this week’s freshest updates, insights, and trends shaping the AI landscape! Join HackerPulse and AIModels.fyi as we delve into the dynamic AI world with top papers straight from the source.
🧠 Transformers Level Up Planning via SDB
🔍 OpenELM: Pioneering Transparency in Language Models
📸 Clearing the View: AI's Solution to Reflection Removal in RAW Photos
💡 Brainformers: Trading Simplicity for Efficiency
📚 NSR Bridges Human-Like Learning Gap
Beyond A*: Better Planning with Transformers via SDB (🔗 Read Paper)
This paper introduces Search Dynamics Bootstrapping (SDB), a new way of training Transformers to plan better than the traditional A* search algorithm. An encoder-decoder Transformer called the Searchformer is first trained to imitate token sequences that log A*'s search dynamics, then iteratively fine-tuned on its own sampled solutions that reach optimal plans with shorter searches. The resulting model optimally solves 93.7% of previously unseen Sokoban puzzles while using up to 26.8% fewer search steps than the A* implementation it was bootstrapped from. It also significantly outperforms baselines trained to predict the final plan directly, matching or exceeding their accuracy with smaller models and training datasets. Experiments on maze navigation and Sokoban planning tasks show consistent improvements over existing approaches.
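To make the "search dynamics" idea concrete, here is a minimal, hedged sketch (not the authors' code): a toy A* maze solver that logs every node it expands, and a serializer that turns the trace plus the final plan into a token sequence a seq2seq model could be trained on. The maze, token names, and helper functions are illustrative assumptions.

```python
# Toy illustration of Searchformer-style training data (assumed, not the paper's code):
# run A* on a small grid maze, log every expanded node (the "search dynamics"),
# and serialize trace + plan into one token sequence for a seq2seq Transformer.

import heapq

def a_star_with_trace(grid, start, goal):
    """Return (plan, trace) where trace lists every expanded cell in order."""
    rows, cols = len(grid), len(grid[0])
    frontier = [(abs(start[0] - goal[0]) + abs(start[1] - goal[1]), 0, start, [start])]
    visited, trace = set(), []
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node in visited:
            continue
        visited.add(node)
        trace.append(node)                      # log the search dynamics
        if node == goal:
            return path, trace
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            r, c = node[0] + dr, node[1] + dc
            if 0 <= r < rows and 0 <= c < cols and grid[r][c] == 0 and (r, c) not in visited:
                h = abs(r - goal[0]) + abs(c - goal[1])
                heapq.heappush(frontier, (g + 1 + h, g + 1, (r, c), path + [(r, c)]))
    return None, trace

def to_tokens(plan, trace):
    """Serialize the search trace plus the final plan into one training sequence."""
    seq = ["<trace>"] + [f"expand_{r}_{c}" for r, c in trace] + ["</trace>"]
    seq += ["<plan>"] + [f"step_{r}_{c}" for r, c in plan] + ["</plan>"]
    return seq

maze = [[0, 0, 0],
        [1, 1, 0],
        [0, 0, 0]]
plan, trace = a_star_with_trace(maze, (0, 0), (2, 2))
print(to_tokens(plan, trace))
# Bootstrapping then repeatedly samples the trained model, keeps responses whose
# plans are optimal but whose traces are shorter than A*'s, and fine-tunes on them.
```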
OpenELM: An Efficient Language Model Family With Open-Source Training and Inference Framework (🔗 Read Paper)
This paper underscores the importance of reproducibility and transparency in large language models (LLMs) to advance open research, ensure trustworthiness, and investigate biases and potential risks.
The authors introduce OpenELM, a state-of-the-art open language model that uses a layer-wise scaling strategy, allocating parameters non-uniformly across Transformer layers, to improve accuracy and efficiency. OpenELM demonstrates a 2.36% improvement in accuracy over the comparably sized OLMo while using half as many pre-training tokens. Unlike traditional releases that provide only model weights and inference code and train on private datasets, the OpenELM release includes the complete framework for training and evaluating the model on publicly available datasets, including training logs, multiple checkpoints, and pre-training configurations. OpenELM also ships code for converting models to MLX for inference and fine-tuning on Apple devices. By providing this comprehensive release, the paper paves the way for future open research and empowers the community to drive progress in natural language AI.
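As a rough illustration of layer-wise scaling (a sketch based on the paper's description, not the released configs), the snippet below interpolates attention-head counts and FFN widths across depth so that later layers get more capacity. The function name, parameters, and default values are assumptions.

```python
# Hedged sketch of layer-wise scaling: instead of giving every Transformer layer
# the same width, attention heads and the FFN multiplier are interpolated across
# depth. Names and numbers are illustrative, not the released OpenELM settings.

def layer_wise_scaling(num_layers, model_dim, head_dim=64,
                       alpha=(0.5, 1.0), beta=(0.5, 4.0)):
    """Return per-layer (num_heads, ffn_dim) pairs that grow with layer index."""
    configs = []
    for i in range(num_layers):
        t = i / max(num_layers - 1, 1)                 # 0 at first layer, 1 at last
        attn_scale = alpha[0] + t * (alpha[1] - alpha[0])
        ffn_scale = beta[0] + t * (beta[1] - beta[0])
        num_heads = max(1, int(round(attn_scale * model_dim / head_dim)))
        ffn_dim = int(round(ffn_scale * model_dim))
        configs.append((num_heads, ffn_dim))
    return configs

for layer, (heads, ffn) in enumerate(layer_wise_scaling(num_layers=8, model_dim=1024)):
    print(f"layer {layer}: {heads} heads, FFN dim {ffn}")
```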
Removing Reflections from RAW Photos (🔗 Read Paper)
This paper presents a novel approach to removing reflections from RAW photos, a common issue in photography. The method involves synthesizing realistic reflections and using them to train a neural network to eliminate reflections from RAW images. Evaluated on a new dataset of RAW images with reflections, the approach demonstrates state-of-the-art performance in reflection removal, offering a promising solution for photographers seeking cleaner, glare-free photos.
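The data-synthesis idea can be sketched as follows: because light combines roughly additively in a camera's linear RAW space, a contaminated capture can be approximated by compositing a clean image with a blurred, attenuated reflection. This is only a toy illustration of that pipeline, with made-up blur and mixing parameters, not the paper's simulation.

```python
# Toy reflection synthesis in linear RAW space (illustrative assumptions only):
# composite a clean "transmission" image with a defocused, attenuated "reflection"
# to create (input, target) pairs for training a reflection-removal network.

import numpy as np
from scipy.ndimage import gaussian_filter

def synthesize_reflection(transmission_raw, reflection_raw,
                          strength=0.35, blur_sigma=3.0):
    """Composite two linear RAW images into a reflection-contaminated frame."""
    # Reflections are usually defocused relative to the scene behind the glass.
    blurred = gaussian_filter(reflection_raw, sigma=blur_sigma)
    mixed = transmission_raw + strength * blurred
    return np.clip(mixed, 0.0, 1.0), transmission_raw  # (network input, target)

# Toy example with random arrays standing in for real linear sensor data.
rng = np.random.default_rng(0)
scene = rng.random((64, 64))
glare = rng.random((64, 64))
noisy_input, clean_target = synthesize_reflection(scene, glare)
print(noisy_input.shape, clean_target.shape)
```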
Brainformers: Trading Simplicity for Efficiency (🔗 Read Paper)
This paper explores how to improve the traditional Transformer model, a key player in natural language processing and computer vision. Instead of sticking to the standard setup of alternating feed-forward and self-attention layers, the researchers created a new Transformer block called Brainformer. It features a diverse set of layer types such as sparse and dense feed-forward layers, attention layers, and various forms of normalization and activation functions.
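Below is a hedged PyTorch sketch of what such a heterogeneous block could look like: a residual, pre-norm stack whose sublayer sequence mixes attention, dense feed-forward, and a toy top-1 mixture-of-experts layer. The layout, sizes, and routing scheme are illustrative assumptions, not the architecture found by the paper's search.

```python
# Hedged sketch of a Brainformer-style block: an arbitrary, configurable sequence
# of heterogeneous sublayers instead of the fixed attention -> dense-FFN pattern.

import torch
import torch.nn as nn

class DenseFFN(nn.Module):
    def __init__(self, dim, mult=4):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, mult * dim), nn.GELU(),
                                 nn.Linear(mult * dim, dim))
    def forward(self, x):
        return self.net(x)

class SparseFFN(nn.Module):
    """Toy top-1 mixture of experts: each token is routed to a single expert."""
    def __init__(self, dim, num_experts=4, mult=4):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(DenseFFN(dim, mult) for _ in range(num_experts))
    def forward(self, x):
        idx = self.router(x).argmax(dim=-1)            # (batch, seq)
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = idx == e
            if mask.any():
                out[mask] = expert(x[mask])
        return out

class SelfAttention(nn.Module):
    def __init__(self, dim, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
    def forward(self, x):
        return self.attn(x, x, x, need_weights=False)[0]

class BrainformerBlock(nn.Module):
    """Residual pre-norm stack over an arbitrary sequence of sublayer types."""
    def __init__(self, dim, layout=("sparse_ffn", "attention", "dense_ffn",
                                    "sparse_ffn", "attention")):
        super().__init__()
        build = {"dense_ffn": DenseFFN, "sparse_ffn": SparseFFN,
                 "attention": SelfAttention}
        self.norms = nn.ModuleList(nn.LayerNorm(dim) for _ in layout)
        self.layers = nn.ModuleList(build[name](dim) for name in layout)
    def forward(self, x):
        for norm, layer in zip(self.norms, self.layers):
            x = x + layer(norm(x))
        return x

block = BrainformerBlock(dim=64)
print(block(torch.randn(2, 16, 64)).shape)  # torch.Size([2, 16, 64])
```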
The Brainformer outperforms state-of-the-art dense and sparse Transformers in both quality and efficiency. For example, a Brainformer with 8 billion activated parameters per token converges twice as fast in training and runs five times faster per step than its GLaM counterpart. It also scores about 3% higher on SuperGLUE after fine-tuning than a GLaM model with a similar number of activated parameters.
Overall, this study suggests that a more complex, flexible Transformer architecture can lead to significant performance improvements in AI applications.
Neural-Symbolic Recursive Machine for Systematic Generalization (🔗 Read Paper)
A new model, the Neural-Symbolic Recursive Machine (NSR), tackles a long-standing gap between machines and human-like systematic generalization: learning compositional rules from limited data and applying them to novel combinations. NSR integrates neural perception, syntactic parsing, and semantic reasoning around a Grounded Symbol System (GSS), which lets combinatorial syntax and semantics emerge directly from the training data rather than being hand-specified. Its modular design and the inductive biases of equivariance and compositionality let it excel at sequence-to-sequence tasks such as semantic parsing and arithmetic reasoning, achieving markedly stronger systematic generalization than other state-of-the-art models.
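To illustrate the modular decomposition (perception, then syntax, then semantics), here is a toy, hand-written sketch on a small arithmetic task. In NSR these modules are learned jointly and the symbols are grounded from data; the lexicon, parser, and evaluator below are purely illustrative stand-ins.

```python
# Toy stand-in for NSR's three-module pipeline on an arithmetic task:
# (1) perception maps raw inputs to symbols, (2) syntax parses them into a tree,
# (3) semantics evaluates the tree recursively. In the paper, each module is
# learned; these hand-written stubs only show the modular decomposition.

def perceive(raw_tokens):
    """Stand-in for neural perception: map raw observations to symbols."""
    lexicon = {"one": "1", "two": "2", "three": "3", "plus": "+", "times": "*"}
    return [lexicon.get(tok, tok) for tok in raw_tokens]

def parse(symbols):
    """Stand-in for the syntax module: split on '+' first so '*' binds tighter."""
    for op in ("+", "*"):
        if op in symbols:
            i = symbols.index(op)
            return (op, parse(symbols[:i]), parse(symbols[i + 1:]))
    return symbols[0]                                   # single-digit leaf

def evaluate(tree):
    """Stand-in for the semantics module: recursively evaluate the parse tree."""
    if isinstance(tree, str):
        return int(tree)
    op, left, right = tree
    return evaluate(left) + evaluate(right) if op == "+" else evaluate(left) * evaluate(right)

raw = ["two", "plus", "three", "times", "two"]
print(evaluate(parse(perceive(raw))))  # 2 + 3 * 2 = 8
```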
And that’s a wrap. 😊 It’s been fun covering the most exciting AI developments of the week. Stay tuned for more!