We launched a referral program with perks like CodeHub AI free for 1 year (!) and 1:1 expert career coaching. You can unlock these rewards starting with just 1 referral!
Welcome to a new installment of AI Insights, your access point to the ever-evolving world of artificial intelligence. Within these pages, our CTO and AI Researcher, Vishwas Mruthyunjaya, has curated a collection of 5 papers. These papers act as a compass, navigating you through the latest groundbreaking discoveries, innovative methodologies, and visionary concepts steering the course of AI evolution.
Large Language Models are Temporal and Causal Reasoners for Video Question Answering (🔗 Read the Paper)
Dive into the realm of Video Question Answering (VideoQA), where Large Language Models (LLMs) shine. While they excel in natural language tasks, challenges arise in VideoQA due to over-reliance on linguistic cues, leading to 'ungrounded guesses.' To counter this, the revolutionary Flipped-VQA framework is introduced, effectively enhancing model performance across various VideoQA benchmarks.
LLMs' Linguistic Priors: Exploiting linguistic shortcuts in temporal and causal reasoning for VideoQA.
Challenges of Linguistic Bias: Addressing issues like 'ungrounded guesses' caused by over-reliance on questions.
Flipped-VQA Framework: A novel approach that trains the model to predict every ordering of the ⟨Video, Question, Answer⟩ triplet — not just the answer, but also the question and the video — grounding predictions in visual content rather than linguistic shortcuts.
Performance Boost: Empirical evidence showcasing the effectiveness of Flipped-VQA, outperforming existing models on multiple benchmarks.
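The core idea behind the flipped objectives can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `loss_fn` below is a stand-in for the LLM's conditional language-modeling (cross-entropy) loss, and all names are hypothetical.

```python
# Sketch of the Flipped-VQA idea: instead of training only on
# P(Answer | Video, Question), optimize all three conditional
# objectives over the <V, Q, A> triplet, so the model cannot rely
# on linguistic priors alone.

def loss_fn(target, *context):
    """Placeholder for the LLM's conditional cross-entropy loss
    on `target` tokens given the concatenated `context`."""
    # Toy stand-in so the sketch runs; real training computes
    # next-token cross-entropy over embedded inputs.
    return float(len(target)) / (1 + len(context))

def flipped_vqa_loss(video, question, answer):
    """Sum of the three conditional objectives over the triplet."""
    l_vqa = loss_fn(answer, video, question)   # main task: predict A
    l_vaq = loss_fn(question, video, answer)   # flipped: predict Q
    l_qav = loss_fn(video, question, answer)   # flipped: predict V
    return l_vqa + l_vaq + l_qav
```

The two auxiliary losses act as regularizers: a model that can also generate the question or (a representation of) the video from the remaining pair has necessarily attended to the visual input.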
SCB-ST-Dataset4: Extending the Spatio-Temporal Behavior Dataset in Student Classroom Scenarios Through Image Dataset Method (🔗 Read the Paper)
In the realm of education, automatic detection of students' classroom behavior through deep learning is a promising avenue. However, the scarcity of publicly available spatio-temporal datasets on student behavior poses a challenge. To address this, an innovative approach extends the spatio-temporal behavior dataset for student classroom scenarios using an image-dataset method, resulting in the comprehensive SCB-ST-Dataset4. Focused on behaviors like hand-raising, reading, and writing, this dataset is rapidly generated without manual annotation. Additionally, a proposed Behavior Similarity Index (BSI) gauges similarities between behaviors. Rigorous evaluation showcases a mean average precision of up to 82.3%. This novel method and dataset pave the way for future research, advancing student behavior detection.
Introduction of SCB-ST-Dataset4 for student behavior analysis, comprising 754,094 images with 25,670 labels.
A novel method to extend spatio-temporal behavioral datasets without manual annotation, addressing the scarcity of such datasets.
Introduction of the Behavior Similarity Index (BSI) to quantify the similarity of different student behaviors.
Rigorous evaluation of the dataset using advanced algorithms, demonstrating a mean average precision of up to 82.3%.
SentAlign: Accurate and Scalable Sentence Alignment (🔗 Read the Paper)
Introducing SentAlign, a powerful sentence alignment tool meticulously designed to handle extensive parallel document pairs. This tool excels in accuracy, particularly in the context of very large documents, scaling seamlessly to thousands and even tens of thousands of sentences. Leveraging a divide-and-conquer strategy, SentAlign evaluates all potential alignment paths based on user-defined parameters. The scoring mechanism relies on LaBSE bilingual sentence representations, setting it apart. Demonstrating superior performance, SentAlign outshines five other alignment tools, as validated on German-French and English-Icelandic datasets, and in a downstream machine translation task.
Robust Alignment: SentAlign stands out in accurately aligning sentences, even in the context of vast parallel documents.
Scalability: Uniquely designed for scalability, SentAlign efficiently handles document pairs with thousands or even tens of thousands of sentences.
Advanced Scoring Mechanism: The scoring function, based on LaBSE bilingual sentence representations, contributes to its precision and reliability.
Superior Performance: Through rigorous evaluations, SentAlign has demonstrated superior performance compared to five other sentence alignment tools, particularly in German-French and English-Icelandic datasets, and in the context of a downstream machine translation task.
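The scoring-plus-search recipe can be illustrated with a small sketch. This is not SentAlign's implementation: it uses a toy `cosine` score over pre-computed embedding vectors (LaBSE in the paper) and a simple dynamic program that finds the best monotone 1:1 alignment, whereas SentAlign also handles unaligned sentences, many-to-many links, and a divide-and-conquer search over very long documents.

```python
# Minimal sentence-alignment sketch: score candidate pairs with
# bilingual sentence embeddings, then pick the best monotone
# alignment path by dynamic programming.
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def align(src_vecs, tgt_vecs, skip_penalty=0.1):
    """Best monotone alignment: match a pair, or skip one sentence."""
    n, m = len(src_vecs), len(tgt_vecs)
    best = [[float("-inf")] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    back = {}
    for i in range(n + 1):
        for j in range(m + 1):
            if best[i][j] == float("-inf"):
                continue
            if i < n and j < m:  # align src[i] with tgt[j]
                s = best[i][j] + cosine(src_vecs[i], tgt_vecs[j])
                if s > best[i + 1][j + 1]:
                    best[i + 1][j + 1] = s
                    back[(i + 1, j + 1)] = (i, j, True)
            if i < n:  # leave src[i] unaligned
                s = best[i][j] - skip_penalty
                if s > best[i + 1][j]:
                    best[i + 1][j] = s
                    back[(i + 1, j)] = (i, j, False)
            if j < m:  # leave tgt[j] unaligned
                s = best[i][j] - skip_penalty
                if s > best[i][j + 1]:
                    best[i][j + 1] = s
                    back[(i, j + 1)] = (i, j, False)
    # Recover the matched pairs by walking the backpointers.
    pairs, state = [], (n, m)
    while state != (0, 0):
        i, j, matched = back[state]
        if matched:
            pairs.append((i, j))
        state = (i, j)
    return list(reversed(pairs))
```

With identical toy embeddings on both sides, `align([(1, 0), (0, 1)], [(1, 0), (0, 1)])` recovers the diagonal pairing `[(0, 0), (1, 1)]`. Scaling this quadratic dynamic program to tens of thousands of sentences is precisely why SentAlign adds its divide-and-conquer strategy.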
Fine-tuning Language Models for Factuality (🔗 Read the Paper)
Navigating the terrain of large language models (LLMs) has become synonymous with navigating an expansive sea of information. However, the fluency and creativity of these models come with a caveat — the potential for factually inaccurate claims, often termed 'hallucinations.' In response, this work pioneers a fine-tuning approach that enhances the factual accuracy of language models, particularly in open-ended generation settings. An innovative combination of external knowledge base consistency checks and preference optimization algorithms forms the backbone of this methodology.
Mitigating Hallucinations: Recognizing the challenge of hallucinations in language models, the approach fine-tunes LLMs to be more factual, mitigating the spread of misinformation.
No Human Labeling Required: Uniquely, this work achieves factual enhancement without the need for labor-intensive human labeling, offering a cost-effective alternative.
Incorporating Recent NLP Innovations: Leveraging recent advancements in Natural Language Processing, the methodology utilizes techniques to judge factuality by aligning with external knowledge bases and employs preference optimization algorithms.
Performance Gains: Demonstrating tangible results, the fine-tuned Llama-2 showcases a marked reduction in factual error rate — 58% when generating biographies and 40% when answering medical questions at the 7B scale — compared to Llama-2-chat.
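The preference-optimization step can be sketched with a DPO-style loss on a single preference pair, where the "chosen" response is the one judged more factual (e.g. more consistent with an external knowledge base). This is an illustrative sketch, not the paper's code: log-probabilities are plain floats here, while in practice they come from the policy being trained and a frozen reference model.

```python
# DPO-style preference loss: push the policy to prefer the
# more-factual response relative to a frozen reference model.
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Direct Preference Optimization loss for one preference pair."""
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # -log(sigmoid(margin)): small when the policy assigns relatively
    # more probability to the more-factual (chosen) response.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

Because the preference labels come from automated factuality judgments rather than human annotators, the pipeline sidesteps the labeling cost highlighted above.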
Ultra-Efficient On-Device Object Detection on AI-Integrated Smart Glasses with TinyissimoYOLO (🔗 Read the Paper)
This paper explores the integration of AI into smart glasses, overcoming challenges posed by limited form factor and battery capacity. It details the design of tiny machine-learning algorithms tailored for low-power processors, aiming to extend continuous operation in smart glasses.
Powerful Design for Constraints: The research focuses on designing tiny machine-learning algorithms suited for low-power processors, addressing the challenge of providing advanced functionalities in smart glasses while adhering to size and battery limitations.
Innovative Prototype: The smart glasses prototype, serving as a research platform, incorporates microcontrollers with a milliwatt-power RISC-V parallel processor and a hardware accelerator for visual AI. The Bluetooth low-power module ensures efficient communication.
Optimized Power Usage: Power cycling mechanisms, including image and audio sensing interfaces, are integrated into the smart glasses to optimize power consumption.
TinyissimoYOLO Models: The paper introduces the TinyissimoYOLO family of tiny deep-learning models, specifically crafted for microcontroller-based inference. These models, based on YOLO, set a benchmark for object detection with a focus on energy efficiency and low latency.
Impressive Performance Metrics: Evaluation results on the smart glasses prototype highlight TinyissimoYOLO's capabilities, showcasing a remarkable 17ms inference latency and 1.59mJ energy consumption per inference. The end-to-end latency, from image capturing to algorithm prediction, stands at 56ms or 18 fps, outperforming alternatives like MCUNet (TinyNAS+TinyEngine).
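A quick back-of-the-envelope check shows why these figures matter for battery life: at the reported 56 ms end-to-end latency (~18 fps), the 1.59 mJ per-inference energy works out to a continuous inference budget of roughly 28 mW.

```python
# Sanity-check the reported figures: energy per inference times
# inference rate gives average power (mJ/s = mW).
energy_per_inference_mj = 1.59   # reported energy per inference
end_to_end_latency_s = 0.056     # reported end-to-end latency

fps = 1.0 / end_to_end_latency_s                     # ~17.9 frames/s
inference_power_mw = energy_per_inference_mj * fps   # ~28 mW
```

Tens of milliwatts for always-on detection is what makes continuous operation plausible within a smart-glasses battery budget.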
Looking for a job? Check out HackerPulse Jobs, where tech companies are looking for ambitious talents like you!