We launched a referral program with perks like CodeHub AI free for 1 year (!) and 1:1 expert career coaching. You can unlock these perks starting with just 1 referral!
Welcome to another exciting edition of AI Fridays, your portal to the ever-evolving world of artificial intelligence. In this edition, we present a curated selection of 5 groundbreaking papers in the field of AI. These papers have been meticulously chosen by our CTO and AI Researcher, Vishwas Mruthyunjaya, to provide you with insights into the latest breakthroughs, innovative approaches, and visionary concepts that are shaping the AI landscape.
State of AI Report 2023 (🔗 Read the Report)
The State of AI in 2023 reflects the dominance of Large Language Models (LLMs) like GPT-4 and their transformative impact on AI research, industry, and geopolitics. It highlights the competition between proprietary and open-source models, the quest for surpassing proprietary performance, LLMs' real-world applications in life sciences, the significance of compute power, the rise of generative AI startups, the mainstream focus on AI safety, and the challenges in evaluating state-of-the-art models.
GPT-4 Dominance: GPT-4 leads in the LLM landscape, showcasing the power of proprietary AI architectures and reinforcement learning from human feedback.
Quest for Better Models: The pursuit of surpassing proprietary AI performance intensifies through smaller models, improved datasets, and context enhancement, addressing scaling concerns.
LLMs Transforming Life Sciences: Large Language Models drive real-world breakthroughs, particularly in life sciences, advancing molecular biology and drug discovery.
AI Safety and Evaluation Challenges: Mainstream attention on AI safety increases, but global governance lacks clarity. Evaluating state-of-the-art models remains a challenge, necessitating more robust approaches.
Loading Llama-2 70b 20x faster with Anyscale Endpoints (🔗 Read the Post)
This post examines a critical aspect of serving LLMs in production: loading model weights quickly and efficiently. It walks through the techniques behind a remarkable 20x speedup in load times for the Llama 2 model series, and shares insights into reducing latency and operational costs on the Anyscale platform.
Key Points:
Optimizing LLM Loading: The post emphasizes the importance of speeding up the loading process for large language models, demonstrating a substantial 20x performance improvement.
Reducing Latency and Costs: It offers practical advice on minimizing latency and operational expenses through the Anyscale platform.
Loading Process Insights: The blog outlines the detailed steps involved in serving a large language model in production, addressing the complexities of cold start scenarios, remote storage, local caching, and more.
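The cold-start vs. warm-start distinction above is the crux: fetching weights from remote storage dominates load time, while a local cache makes subsequent loads far faster. Here is a minimal sketch of that caching pattern — not Anyscale's actual implementation; `load_weights`, the directory layout, and the plain-file copy are illustrative stand-ins for streaming tensor shards out of object storage:

```python
import os
import shutil
import tempfile

def load_weights(shard_name, remote_dir, cache_dir):
    """Return the bytes of a weight shard, fetching from remote storage
    only on a cold start; warm starts read straight from the local cache."""
    cached = os.path.join(cache_dir, shard_name)
    if not os.path.exists(cached):                    # cold start
        os.makedirs(cache_dir, exist_ok=True)
        # stand-in for streaming a shard out of object storage (e.g. S3)
        shutil.copy(os.path.join(remote_dir, shard_name), cached)
    with open(cached, "rb") as f:                     # warm start: local disk
        return f.read()
```

In a real serving stack, the slow branch would also be parallelized across shards — which is where most of the post's 20x comes from.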
Text Embeddings Reveal (Almost) As Much As Text (🔗 Read the Paper)
This investigation delves into the privacy implications of text embeddings, specifically focusing on "embedding inversion": reconstructing the original text from dense text embeddings. The study frames the problem as controlled generation, producing text whose embedding matches a predefined point in latent space. It explores the limitations of a basic embedding-to-text model and introduces an effective multi-step method for accurately reconstructing text. Additionally, the study demonstrates the model's ability to recover sensitive personal information from clinical notes, highlighting privacy concerns. The research code is available on GitHub for reference.
Privacy Concerns: The study explores the privacy implications of text embeddings and their potential for revealing sensitive information within the original text.
Controlled Generation: The research adopts a controlled generation approach to reconstruct text from embeddings, aiming to produce text that aligns with a specific point in latent space.
Effective Multi-Step Model: While a simple embedding-based model underperforms, the research introduces a multi-step method that corrects and re-embeds text iteratively, achieving an impressive 92% accuracy in recovering 32-token text inputs.
Sensitive Data Recovery: The study extends its findings to demonstrate the model's capability to recover significant personal information, such as full names, from clinical notes, underlining the privacy risks of sharing text embeddings.
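The correct-and-re-embed loop described above can be sketched in miniature. Everything here is a toy stand-in: `embed` is a bag-of-characters counter rather than a real neural text encoder, and the "corrector" is a greedy per-position search instead of the paper's learned model — but the iterative structure (guess, re-embed, compare against the target embedding, correct) mirrors the method:

```python
from collections import Counter

def embed(text):
    """Toy encoder: a bag-of-characters vector (the paper inverts
    embeddings from real neural text encoders)."""
    return Counter(text)

def distance(a, b):
    return sum(abs(a[k] - b[k]) for k in set(a) | set(b))

def invert(target_emb, length, alphabet="abcdefghijklmnopqrstuvwxyz ", rounds=3):
    """Greedy correct-and-re-embed loop: repeatedly edit the guess so
    its embedding moves closer to the target embedding."""
    guess = [alphabet[0]] * length
    for _ in range(rounds):
        for i in range(length):
            def score(c):
                # re-embed the candidate edit and measure distance to target
                return distance(embed("".join(guess[:i] + [c] + guess[i + 1:])), target_emb)
            guess[i] = min(alphabet, key=score)
    return "".join(guess)
```

Because the toy embedding is order-invariant, only the multiset of characters is recovered here; with a real encoder and a learned corrector, the paper recovers the exact text for 92% of 32-token inputs.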
SAM-OCTA: Prompting Segment-Anything for OCTA Image Segmentation (🔗 Read the Paper)
SAM-OCTA, a groundbreaking method in medical image analysis, enhances the precision of segmenting specific targets in optical coherence tomography angiography (OCTA) images. By employing a low-rank adaptation technique and innovative prompt point generation strategies, it effectively combats overfitting issues associated with limited supervised datasets. SAM-OCTA has achieved or nearly achieved state-of-the-art segmentation performance on datasets like OCTA-500 and ROSE, excelling in various segmentation tasks, including retinal vessel, foveal avascular zone, capillary, artery, and vein segmentation. It also tackles previously challenging tasks such as local vessel segmentation and effective artery-vein segmentation.
SAM-OCTA introduces a novel approach for segmenting targets in optical coherence tomography angiography (OCTA) images, a vital task in medical image analysis.
Traditional methods rely on supervised datasets with limited samples, often resulting in overfitting due to the scarcity of training data.
SAM-OCTA leverages a low-rank adaptation technique and prompt point generation strategies to fine-tune foundational models for various segmentation tasks related to OCTA datasets.
The method is rigorously evaluated using publicly available OCTA-500 and ROSE datasets, consistently achieving state-of-the-art segmentation performance metrics. SAM-OCTA's significance extends to retinal vessel, foveal avascular zone, capillary, artery, and vein segmentation tasks, and it effectively addresses the challenges of local vessel segmentation and artery-vein segmentation that previous methods struggled with.
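Low-rank adaptation, the fine-tuning trick the bullets above mention, freezes the pretrained weight matrix W and trains only a small rank-r update B·A — a good fit when, as with OCTA, labeled data is too scarce to fine-tune every weight. This is a generic LoRA sketch, not SAM-OCTA's code; the class name, init scales, and zero-initialized B follow standard LoRA conventions and are assumptions here:

```python
import numpy as np

class LoRALinear:
    """y = x @ (W + alpha * B @ A).T, with W frozen and only the
    low-rank factors A (r x d_in) and B (d_out x r) trained."""
    def __init__(self, W, r=4, alpha=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.W = W                                         # frozen pretrained weight
        self.A = rng.normal(scale=0.01, size=(r, W.shape[1]))
        self.B = np.zeros((W.shape[0], r))                 # zero init: no change at start
        self.alpha = alpha

    def __call__(self, x):
        return x @ (self.W + self.alpha * self.B @ self.A).T
```

With rank r much smaller than the weight dimensions, the trainable parameter count shrinks by orders of magnitude, which is what keeps fine-tuning on a few hundred OCTA images from overfitting.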
Interactive Text to Image by Prompting Large Language Models (🔗 Read the Paper)
The rapid evolution of text-to-image (T2I) models is reshaping artificial content generation. However, existing T2I models often struggle to understand and communicate in natural language. This work introduces interactive text-to-image (iT2I), enabling natural language interactions with T2I models. An approach to enhance LLMs for iT2I is presented, facilitating seamless integration without undermining core LLM capabilities.
The rapid progress in T2I diffusion models has resulted in high-quality content generation within a short period.
Existing T2I models often struggle to communicate effectively in natural language.
Interactive text-to-image (iT2I) is a novel paradigm that lets users interact with an LLM for both image generation and question answering.
An approach is presented to augment LLMs for iT2I, enabling convenient and low-cost integration without compromising the LLMs' core capabilities.
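One simple way to realize this kind of integration is to instruct the LLM to wrap image prompts in a tag that the application parses and routes to a T2I backend, leaving ordinary answers untouched. The tag format, system prompt, and `route_reply` helper below are hypothetical — a sketch of the integration idea, not the paper's actual protocol:

```python
import re

IMG_TAG = re.compile(r"<image>(.*?)</image>", re.DOTALL)

# Hypothetical instruction; the paper's actual prompting scheme differs.
SYSTEM_PROMPT = (
    "When the user asks for a picture, wrap a concise text-to-image prompt "
    "in <image>...</image> tags; answer ordinary questions in plain text."
)

def route_reply(llm_reply, t2i_model):
    """Split an LLM reply into plain-text chunks and T2I calls."""
    parts, last = [], 0
    for m in IMG_TAG.finditer(llm_reply):
        if m.start() > last:
            parts.append(("text", llm_reply[last:m.start()].strip()))
        parts.append(("image", t2i_model(m.group(1).strip())))  # call the T2I backend
        last = m.end()
    if last < len(llm_reply):
        parts.append(("text", llm_reply[last:].strip()))
    return parts
```

Because the T2I call happens outside the LLM, the LLM's core chat and reasoning abilities are untouched — the low-cost integration property the paper emphasizes.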
Looking for a job? Check out HackerPulse Jobs, where tech companies are looking for ambitious talents like you!