👩‍🎤 Text-to-music level up?
Happy AI Friday to you, your selection of 5 papers on AI & ML is here - read up.
We’re so close to 5k subscribers, help us get there!
Welcome to the latest installment of AI Fridays, in which our CTO and AI Researcher, Vishwas Mruthyunjaya, presents a meticulously handpicked collection of five papers. These papers shed light on the most recent advancements, innovative strategies, and forward-thinking ideas shaping the AI landscape.
Long-Context Retrieval Models with Monarch Mixer (🔗 Read the Paper)
Introducing the latest development in text embeddings – long-context retrieval models. Text embeddings play a vital role in various applications, but the typical short context length of models like BERT can be limiting for long documents. In this preview release, the team builds on Monarch Mixer (M2) to present long-context versions of M2-BERT, accommodating up to 32K context length. Stay tuned to discover the innovations in data mixture and loss function enabling these extended text embeddings.
Long-context retrieval models designed for documents with extensive text.
Utilizes Monarch Mixer (M2) model family, known for attention- and MLP-free BERT models.
Accommodates up to 32K context length for improved performance on longer documents.
Enhanced data mixture and loss function to support long-context embeddings.
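To see how such retrieval embeddings are used downstream (independent of the M2-BERT specifics), here is a minimal cosine-similarity ranking sketch; the toy 3-dimensional vectors stand in for the embeddings a long-context encoder would produce.

```python
import math

def cosine(u, v):
    # cosine similarity between two embedding vectors
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_documents(query_emb, doc_embs):
    # doc_embs: mapping of doc id -> embedding vector
    # returns doc ids sorted by similarity to the query, best first
    return sorted(doc_embs, key=lambda d: cosine(query_emb, doc_embs[d]), reverse=True)

# toy embeddings standing in for real encoder output
docs = {
    "contract": [0.9, 0.1, 0.0],
    "novel": [0.1, 0.9, 0.2],
    "manual": [0.4, 0.4, 0.8],
}
query = [1.0, 0.0, 0.1]
print(rank_documents(query, docs))  # "contract" ranks first
```

The point of the 32K context length is that each `doc_embs` entry can cover an entire long document in one vector instead of many short chunks.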
Quantifying Language Models' Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting (🔗 Read the Paper)
Understanding how large language models (LLMs) perform in various scenarios is vital as they become increasingly integral to language technologies. One aspect that significantly impacts their behavior is prompt design, particularly meaning-preserving choices like prompt formatting. In this study, the authors delve into the sensitivity of several prominent open-source LLMs to subtle changes in prompt formatting, particularly in few-shot settings.
Examination of LLM sensitivity to prompt formatting choices in few-shot settings.
Notable performance differences of up to 76 accuracy points observed across various LLMs.
Sensitivity remains consistent across different model sizes, few-shot example counts, and instruction tuning.
Introduction of FormatSpread, an algorithm to efficiently evaluate multiple prompt formats.
Emphasis on the importance of reporting performance across plausible prompt formats in LLM evaluation.
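FormatSpread itself samples formats and estimates the spread efficiently; as a simplified illustration of the quantity it measures (not the paper's algorithm), the sketch below enumerates a few meaning-preserving format variants and reports the accuracy gap between the best and worst format, using a deliberately format-sensitive stand-in for a model.

```python
def format_variants(question):
    # meaning-preserving formatting choices: prefix casing and separator
    prefixes = ["Q", "Question", "QUESTION"]
    separators = [": ", " - ", ":\n"]
    return [f"{p}{s}{question}" for p in prefixes for s in separators]

def accuracy_spread(model, dataset):
    # dataset: list of (question, gold_answer) pairs
    n_formats = len(format_variants(""))
    accuracies = []
    for i in range(n_formats):
        correct = sum(
            1 for q, gold in dataset if model(format_variants(q)[i]) == gold
        )
        accuracies.append(correct / len(dataset))
    return max(accuracies) - min(accuracies)

# stand-in "model" that only answers correctly for one exact format,
# mimicking the brittleness the paper documents
def brittle_model(prompt):
    return "yes" if prompt.startswith("Q: ") else "no"

data = [("Is the sky blue?", "yes"), ("Is grass green?", "yes")]
print(accuracy_spread(brittle_model, data))  # prints 1.0
```

A real evaluation would substitute an actual LLM call for `brittle_model`; the paper's finding is that the resulting spread can reach tens of accuracy points even for strong models.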
Application of Deep Learning in Blind Motion Deblurring: Current Status and Future Prospects (🔗 Read the Paper)
Motion blur in images is a pervasive challenge in computer vision, prompting the development of blind motion deblurring techniques that leverage deep learning. This paper offers an extensive overview of the role of deep learning in blind motion deblurring, summarizing methods, datasets, and evaluation metrics developed over the last six years.
Types of Motion Blur: The paper explains various types of motion blur and the fundamental principles of deblurring, setting the stage for deep learning-based solutions.
Advantages of Deep Learning: It outlines the limitations of traditional non-blind deblurring approaches and highlights how deep learning methods provide a superior solution for deblurring tasks.
Categorization of Methods: The paper categorizes and summarizes blind motion deblurring methods based on different backbone networks, including CNNs, GANs, RNNs, and Transformer networks.
Performance Evaluation: Qualitative and quantitative evaluations are conducted on four widely used datasets, allowing for comparisons of the performance of state-of-the-art methods. The paper provides insights into the advantages and limitations of each category.
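PSNR is one of the standard quantitative metrics in such deblurring evaluations (alongside SSIM); a minimal implementation over flat pixel arrays, where higher values indicate a restoration closer to the sharp ground truth:

```python
import math

def psnr(reference, restored, max_val=255.0):
    # peak signal-to-noise ratio in dB between a ground-truth sharp
    # image and a deblurred result, both as flat lists of intensities
    mse = sum((a - b) ** 2 for a, b in zip(reference, restored)) / len(reference)
    if mse == 0:
        return float("inf")  # identical images
    return 10 * math.log10(max_val ** 2 / mse)

print(psnr([0.0, 255.0], [0.0, 255.0]))  # inf: perfect reconstruction
print(psnr([0.0], [255.0]))              # 0.0: worst case at 8-bit range
```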
Masked Audio Generation using a Single Non-Autoregressive Transformer (🔗 Read the Paper)
This paper presents MAGNeT, a unique masked generative sequence modeling method designed for multiple streams of audio tokens. It employs a one-stage, non-autoregressive transformer for faster audio generation, introduces an innovative rescoring technique for improved audio quality, and explores a hybrid model combining autoregressive and non-autoregressive approaches. MAGNeT's performance is evaluated through comprehensive empirical assessments, including objective metrics and human studies.
Single-Stage Non-Autoregressive Transformer: MAGNeT operates as a one-stage, non-autoregressive transformer, allowing for faster audio generation.
Innovative Rescoring Method: The paper introduces a novel rescoring technique that employs an external pre-trained model to rank MAGNeT's predictions, enhancing audio quality.
Hybrid Model Exploration: A hybrid version of MAGNeT combines autoregressive and non-autoregressive models, efficiently generating audio while maintaining quality.
Empirical Evaluation: MAGNeT's performance is evaluated through extensive empirical assessments, including objective metrics and human studies. It demonstrates comparable results to baselines while achieving significantly faster processing (7x faster than autoregressive approaches).
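The core non-autoregressive idea is iterative masked decoding: start from a fully masked sequence, predict every position in parallel, commit a growing fraction of the most confident predictions, and re-mask the rest. The sketch below illustrates that loop with a toy predictor; it is a schematic of the decoding schedule, not MAGNeT's actual model, rescoring, or multi-stream token handling.

```python
MASK = -1  # sentinel for a masked position

def masked_generate(predict, length, steps=4):
    # predict(tokens) -> one (token, confidence) pair per position
    tokens = [MASK] * length
    for step in range(steps):
        preds = predict(tokens)
        # already-committed tokens keep confidence 1.0 so they survive
        proposal = [
            (t, 1.0) if t != MASK else p for t, p in zip(tokens, preds)
        ]
        # commit a growing fraction of the most confident positions
        n_keep = max(1, round(length * (step + 1) / steps))
        by_conf = sorted(range(length), key=lambda i: proposal[i][1], reverse=True)
        keep = set(by_conf[:n_keep])
        tokens = [proposal[i][0] if i in keep else MASK for i in range(length)]
    return tokens

# toy predictor: proposes token i at position i, earlier positions more confident
def toy_predict(tokens):
    return [(i, 1.0 / (i + 1)) for i in range(len(tokens))]

print(masked_generate(toy_predict, length=4))  # [0, 1, 2, 3]
```

Because each step predicts all positions at once, the number of model calls is the (small, fixed) step count rather than the sequence length, which is where the speedup over autoregressive decoding comes from.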
TechGPT-2.0: A large language model project to solve the task of knowledge graph construction (🔗 Read the Paper)
This report introduces TechGPT-2.0, a project focused on enhancing large language models for knowledge graph construction tasks, particularly named entity recognition (NER) and relationship triple extraction (RTE) in NLP applications. It serves the Chinese open-source model community and offers advanced capabilities, including specialized weights for extensive text processing. Trained on Huawei's Ascend server, TechGPT-2.0 excels in domains like medicine and law and can handle a wide range of textual content, making it versatile for various applications.
Expanded Domain Expertise: TechGPT-2.0 offers enhanced capabilities for processing texts spanning various domains, broadening its utility beyond traditional NLP tasks.
Huawei's Ascend Training: The model's training on Huawei's Ascend server underscores its robust text processing capabilities.
Specialized Weights: Alongside 7B large model weights, a dedicated QLoRA weight facilitates the processing of lengthy texts.
Fine-Tuning Insights: The report provides a comprehensive guide to the model's fine-tuning process, including experiences in Ascend server debugging and instruction fine-tuning data processing.
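For a sense of what relationship triple extraction (RTE) output looks like downstream, here is a small parser for triples serialized as "(head, relation, tail)" text; the parenthesized serialization is an assumed convention for illustration, not TechGPT-2.0's actual output schema.

```python
import re

TRIPLE_RE = re.compile(r"\(([^,()]+),([^,()]+),([^,()]+)\)")

def parse_triples(model_output):
    # pull (head, relation, tail) triples out of generated text;
    # the parenthesized format is a hypothetical convention
    return [
        tuple(field.strip() for field in match.groups())
        for match in TRIPLE_RE.finditer(model_output)
    ]

text = "Extracted: (aspirin, treats, headache); (Beijing, capital of, China)"
print(parse_triples(text))
```

Triples in this shape are what a knowledge-graph construction pipeline would then load as edges between entity nodes.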