Welcome to AI Fridays, a weekly rollup of the latest and greatest in AI. HackerPulse and AIModels.fyi join forces to bring you the most sensational AI innovations and groundbreaking discoveries.
📏 Size Matters? Introducing Phi-3, a Highly Capable Language Model That Runs Locally on Your Phone
🛑 Training Language Models With Pause Tokens
🎨 Don't Let ASCII Fool Ya: ASCII Art-based Jailbreak Attacks against Aligned LLMs
🔍 Scaling Up the Science: A Replication Attempt
🌐 From LLM to NMT: Advancing Low-Resource Machine Translation With Claude
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone (🔗 Read Paper)
Introducing Phi-3-mini, a highly capable language model that can run locally on a cell phone! With 3.8 billion parameters and trained on 3.3 trillion tokens, this language model packs a punch, rivaling big names like Mixtral 8x7B and GPT-3.5. Despite its small size, Phi-3-mini achieves impressive results on benchmarks like MMLU (69%) and MT-bench (8.38).
Thanks to a carefully curated training dataset and an optimized design, you can now enjoy AI wonders directly on your mobile device. Bigger still buys more capability, though: Phi-3-mini's larger siblings, phi-3-small (7B) and phi-3-medium (14B), are even more capable, achieving up to 78% on MMLU and 8.9 on MT-bench.
Phi-3-mini is a potential game-changer, making AI assistants more accessible and mobile-friendly. By focusing on safety, robustness, and chat alignment, it bridges the gap between the cloud and your pocket. So, get ready to embrace AI on the go!
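A quick back-of-envelope calculation shows why a model this size can fit on a phone. The sketch below estimates weight-storage memory at different quantization levels; the real on-device footprint also depends on the quantization scheme, KV cache, and runtime overhead, so treat these as rough lower bounds.

```python
def model_memory_gib(n_params, bits_per_weight):
    """Approximate weight-storage footprint in GiB (weights only;
    ignores KV cache, activations, and runtime overhead)."""
    return n_params * bits_per_weight / 8 / 2**30

# Phi-3-mini's 3.8B parameters at common precisions:
for bits in (16, 8, 4):
    print(f"{bits:>2}-bit: {model_memory_gib(3.8e9, bits):.2f} GiB")
```

At 4 bits per weight the weights come in under 2 GiB, which is the regime where running on a recent phone becomes plausible.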
Think Before You Speak: Training Language Models With Pause Tokens (🔗 Read Paper)
This paper proposes appending learnable pause tokens to a model's input during pretraining and finetuning. The model processes these extra tokens before emitting its answer, effectively buying additional computation steps so it can "think" before committing to the next token. Evaluated across a range of reasoning and question-answering tasks, pause-trained models showed consistent gains over standard models, with the largest improvements when pauses are used in both pretraining and finetuning.
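A toy sketch of the inference-time mechanics (not the authors' code): pause tokens are appended after the prompt so the model gets extra forward passes, and anything emitted at pause positions is discarded before the answer is read off.

```python
PAUSE = "<pause>"

def insert_pauses(prompt_tokens, num_pauses=10):
    """Append <pause> tokens so the model runs extra forward passes
    before it must produce the first answer token."""
    return prompt_tokens + [PAUSE] * num_pauses

def strip_pause_outputs(generated_tokens):
    """Outputs at pause positions carry no answer content and are
    discarded before the response is read off."""
    return [t for t in generated_tokens if t != PAUSE]

padded = insert_pauses(["What", "is", "2", "+", "2", "?"], num_pauses=3)
print(padded)
print(strip_pause_outputs([PAUSE, PAUSE, "4"]))
```

In the paper the pause token is a learned embedding and the number of pauses is a tunable hyperparameter; the list manipulation above just illustrates where the tokens sit in the sequence.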
ArtPrompt: ASCII Art-Based Jailbreak Attacks against Aligned LLMs (🔗 Read Paper)
Large language models (LLMs) are critical tools, but their safety remains a significant concern. Existing techniques to enhance LLM safety, such as data filtering and supervised fine-tuning, typically rely on the assumption that safety alignment can be achieved through semantic analysis alone. However, this paper introduces a novel ASCII art-based jailbreak attack that challenges this assumption and reveals vulnerabilities in LLMs.
The authors present the Vision-in-Text Challenge (ViTC), a benchmark to evaluate LLMs' ability to recognize prompts that cannot be interpreted solely through semantics. This benchmark tests models on their understanding of ASCII art, a form of text-based art used to convey visual information.
The research demonstrates that state-of-the-art LLMs, such as GPT-3.5, GPT-4, Gemini, Claude, and Llama2, struggle with ASCII art prompts, exposing significant weaknesses in their safety measures. By leveraging this vulnerability, the ArtPrompt attack effectively bypasses safety filters and induces undesired behaviors in these models.
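To make the attack concrete, here is a minimal, hypothetical illustration of the core trick: render a masked word as ASCII art so a semantics-only safety filter never sees the word itself. ArtPrompt uses real figlet-style fonts; the two-letter font below is invented purely for demonstration.

```python
# Minimal 5-row ASCII-art font covering just two letters (illustration
# only; the actual attack draws on much richer figlet-style font sets).
FONT = {
    "H": ["#  #", "#  #", "####", "#  #", "#  #"],
    "I": ["###", " # ", " # ", " # ", "###"],
}

def render(word):
    """Render a word as ASCII art, letter columns joined row by row."""
    return "\n".join(
        "  ".join(FONT[c][row] for c in word) for row in range(5)
    )

# The art replaces the masked word inside an otherwise ordinary prompt.
print(render("HI"))
```

The point of the benchmark is that humans read the rendered word effortlessly, while models aligned only on token-level semantics frequently fail to, which is exactly the gap the jailbreak exploits.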
Chinchilla Scaling: Replication Attempt (🔗 Read Paper)
This paper is a replication attempt of the scaling-law analysis in "Training Compute-Optimal Large Language Models" by Hoffmann et al. (the "Chinchilla" paper). The researchers set out to assess the reliability and generalizability of those scaling laws for large language models (LLMs), extracting data from Figure 4 of the original paper and re-running its "Approach 3" parametric analysis.
The replication surfaced notable discrepancies: Hoffmann et al.'s reported Approach 3 estimates fit the extracted data poorly, came with implausibly narrow confidence intervals, and were inconsistent with the paper's other two approaches; refitting the parametric model yielded estimates that resolve much of that inconsistency. The authors acknowledged the limitations of working from extracted data and called for further research to deepen our understanding of scaling principles in AI and machine learning.
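For readers who want to poke at the numbers, here is a small sketch of what Approach 3 estimates: a parametric loss L(N, D) = E + A/N^α + B/D^β, whose fitted constants (the values below are those reported by Hoffmann et al.) imply a compute-optimal split of a FLOP budget C ≈ 6ND between parameters N and training tokens D.

```python
# Constants as reported in Hoffmann et al.'s Approach 3 fit.
E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28

def loss(N, D):
    """Predicted loss for N parameters trained on D tokens."""
    return E + A / N**alpha + B / D**beta

def optimal_split(C, steps=2000):
    """Grid-search the N that minimises predicted loss subject to the
    compute constraint C ~ 6*N*D training FLOPs."""
    best = None
    for i in range(1, steps):
        N = 10 ** (6 + 6 * i / steps)   # sweep N from ~1e6 to ~1e12
        D = C / (6 * N)
        cand = (loss(N, D), N, D)
        if best is None or cand < best:
            best = cand
    return best[1], best[2]

N_opt, D_opt = optimal_split(1e23)
print(f"N = {N_opt:.2e} params, D = {D_opt:.2e} tokens, "
      f"tokens/param = {D_opt / N_opt:.0f}")
```

The replication debate is essentially about whether these fitted constants (and the scaling policy they imply) match the data extracted from the original paper's figures.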
From LLM to NMT: Advancing Low-Resource Machine Translation with Claude (🔗 Read Paper)
This paper investigates the use of large language models (LLMs) for low-resource machine translation (MT), specifically focusing on the Claude model. It aims to determine whether Claude can outperform traditional neural machine translation (NMT) models when training data is limited. The study evaluates Claude's performance on various low-resource MT tasks, including translations between Geez and other languages.
The results show Claude to be remarkably resource-efficient, surpassing strong baselines such as NLLB-54B and Google Translate on some tasks, including Yoruba-English translation. The paper also proposes distilling Claude-generated synthetic parallel data into traditional NMT models, advancing the state of the art for certain language pairs.
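The distillation idea reduces to a simple recipe: translate monolingual source text with the strong LLM and train a conventional NMT model on the resulting synthetic pairs. A hypothetical sketch (the teacher here is a stub, not an actual Claude call):

```python
def build_distillation_corpus(source_sentences, teacher_translate):
    """Translate monolingual source text with a strong teacher model and
    return (source, translation) pairs as synthetic parallel data for
    training a conventional NMT system."""
    return [(src, teacher_translate(src)) for src in source_sentences]

# Stand-in teacher for illustration only; in practice this would call
# the LLM (e.g. Claude) through its API.
toy_teacher = str.upper

corpus = build_distillation_corpus(["bawo ni", "e se"], toy_teacher)
print(corpus)
```

The student NMT model then trains on `corpus` as if it were human-produced parallel data, which is what lifts the state of the art on the low-resource pairs discussed above.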
🔮 And that’s a wrap. Tell us your boldest AI predictions!