How Is ChatGPT Changing Its Behavior Over Time?
AI Friday is here and we're bringing you 5 papers to look at.
We launched a referral program with perks like free ChatGPT Plus, CodeHub AI free for 1 year (!) and 1:1 expert career coaching. You can get this stuff starting with just 1 referral!
Welcome to the third edition of AI Friday, brought to you by PeerPulse Dispatch! We're excited to unveil a thoughtfully curated collection of AI tools, tips, prompts, and more.
In this edition, Vishwas Mruthyunjaya, PeerPulse's CTO and an AI and robotics researcher, brings you a useful selection of AI material. Let’s go 👇
Large Language Models Are Not Fair Evaluators (🔗 Read the Paper)
Goal: Making LLMs Judge More Like Humans
Uncover Bias: The paper digs into the systematic bias that shows up when large language models (LLMs) are used to rate response quality.
Nifty Fixes: The paper spills the beans on some cool tricks to iron out evaluation bias and get LLMs on the same wavelength as us humans.
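One fix of this kind targets position bias: ask the judge twice with the two answers swapped, then average the scores so that the slot an answer sits in cancels out. Here is a minimal sketch of that idea. The `judge` callable and `biased_toy_judge` are hypothetical stand-ins for a real LLM call, and this is not the paper's code.

```python
from typing import Callable, Tuple

# Hypothetical judge signature (an assumption for this sketch):
# (question, first_answer, second_answer) -> (score_for_first, score_for_second)
Judge = Callable[[str, str, str], Tuple[float, float]]


def position_balanced_scores(
    judge: Judge, question: str, answer_a: str, answer_b: str
) -> Tuple[float, float]:
    """Score both answers in both orders and average the two scores per answer."""
    a_first, b_second = judge(question, answer_a, answer_b)
    b_first, a_second = judge(question, answer_b, answer_a)
    return (a_first + a_second) / 2, (b_first + b_second) / 2


def biased_toy_judge(question: str, first: str, second: str) -> Tuple[float, float]:
    """Stand-in for an LLM judge: scores by length and favors the first slot."""
    return len(first) + 2.0, float(len(second))


if __name__ == "__main__":
    # Two identical answers: the raw judge prefers whichever comes first,
    # while the balanced version scores them equally.
    print(biased_toy_judge("q", "abcd", "abcd"))
    print(position_balanced_scores(biased_toy_judge, "q", "abcd", "abcd"))
```

Swapping doubles the number of judge calls, which is the price of cancelling out the slot preference.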
LLM Reasoners (🔗 Read on GitHub)
Not Quite There Yet: While LLMs are getting their hands dirty with reasoning tasks, they're not quite on par with knowledge graphs in the reasoning game.
LLM Reasoners to the Rescue: But fear not! LLM Reasoners is a toolkit that lets LLMs tackle intricate reasoning with a set of reasoning algorithms. It treats multi-step reasoning as planning: it hunts for the best reasoning path, balancing exploration of new possibilities against exploitation of what already looks promising, all built on two concepts, a "World Model" and a "Reward".
Simple Solution: When faced with a reasoning puzzle, all you need to do is whip up a reward function and, optionally, a world model, and then kick back and let LLM Reasoners do the heavy lifting. The library handles everything from the reasoning algorithms to visualization, and it takes care of calling the LLM for you.
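To make the "world model plus reward" idea concrete, here is a toy sketch in plain Python. It does not use the LLM Reasoners API: `ToyWorldModel`, `make_reward`, and `beam_search` are hypothetical names invented for this example, and a simple beam search stands in for the library's search algorithms. The world model predicts the next state, the reward scores a state, and the search picks the most promising path.

```python
from typing import Callable, List, Tuple

State = int
Action = str


class ToyWorldModel:
    """Hypothetical world model: predicts the next state after an action."""

    actions: List[Action] = ["+3", "*2"]

    def step(self, state: State, action: Action) -> State:
        return state + 3 if action == "+3" else state * 2


def make_reward(target: int) -> Callable[[State], float]:
    """Reward is 0 at the target and more negative the farther away a state is."""
    return lambda state: -float(abs(target - state))


def beam_search(
    world: ToyWorldModel,
    reward: Callable[[State], float],
    start: State,
    depth: int,
    beam_width: int = 2,
) -> Tuple[State, List[Action]]:
    """Keep the `beam_width` best partial paths at each step; return the best state seen."""
    beam: List[Tuple[State, List[Action]]] = [(start, [])]
    best = beam[0]
    for _ in range(depth):
        # Expand every path in the beam by every available action.
        candidates = [
            (world.step(state, action), path + [action])
            for state, path in beam
            for action in world.actions
        ]
        candidates.sort(key=lambda c: reward(c[0]), reverse=True)
        beam = candidates[:beam_width]
        if reward(beam[0][0]) > reward(best[0]):
            best = beam[0]
    return best


if __name__ == "__main__":
    state, path = beam_search(ToyWorldModel(), make_reward(11), start=1, depth=3)
    print(state, path)  # 11 ['+3', '*2', '+3']
```

In a real setup the states would be partial reasoning traces and the reward would typically come from an LLM or a task-specific check. The split of responsibilities is the point here.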
Can LLMs Master APIs? (🔗 Read the Paper)
LLMs have shown their chops when it comes to wrangling text-based knowledge, but what about APIs? This paper dives headfirst into fine-tuning LLMs to become API virtuosos.
Advancements and Limits: Open-source LLMs like LLaMA and Vicuna show promise but still lag behind closed-source models such as ChatGPT on advanced tasks like tool use.
ToolLLM Empowerment: The ToolLLM framework turns LLMs into adept tool users, covering data construction, model training, and evaluation to bridge text and API interactions.
Simplified Mastery: Define rewards, add a world model, and LLM reasoners handle algorithms, visualization, and tasks – a streamlined solution for intricate challenges.
A Data Driven Approach to Understanding Skills Using LLMs (🔗 Read the Paper)
Data Quality & Model Performance: Data quality affects how well pre-trained LMs perform. The study looks at how to select training data under a fixed token budget to get better downstream task results.
Sequential Learning: Inspired by how humans learn skills in a deliberate order, the authors' framework applies the same idea to LMs, defining skill sets and the order in which to learn them from data.
Skill-It Algorithm: The paper introduces Skill-It, a data sampling algorithm for both pre-training and fine-tuning that shows significant gains on the LEGO and Natural Instructions datasets.
Real-world Impact: Applied to the RedPajama dataset, the approach improves a 3B-parameter LM's accuracy over the baseline, showcasing its effectiveness.
How Is ChatGPT Changing Its Behavior Over Time? (🔗 Read the Paper)
GPT-3.5 & GPT-4 Evaluation: The paper examines the March 2023 and June 2023 versions of GPT-3.5 and GPT-4, two of the most widely used large language models (LLMs).
Varied Performance & Behavior: Across diverse tasks like math, sensitive questions, opinion surveys, code generation, and more, both GPT-3.5 and GPT-4 show significant fluctuations in performance and behavior over time.
Shifting Behavior: Notably, GPT-4's accuracy at identifying prime numbers dropped from 84% in March to 51% in June, and it became less responsive to chain-of-thought prompts. Both GPT-4 and GPT-3.5 also made more formatting mistakes in code generation in June. These shifts underscore the need for continuous LLM monitoring.
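Continuous monitoring can start small: keep a fixed test set, score every model snapshot on it, and flag large drops. Here is a minimal sketch using the prime-number task. The two "models" are hypothetical stubs rather than real API calls, and the 0.1 threshold is an arbitrary assumption.

```python
from typing import Callable, List, Tuple

Model = Callable[[int], bool]  # hypothetical: returns True if it thinks n is prime


def is_prime(n: int) -> bool:
    """Ground truth by trial division."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def accuracy(model: Model, numbers: List[int]) -> float:
    """Fraction of numbers on which the model agrees with the ground truth."""
    correct = sum(model(n) == is_prime(n) for n in numbers)
    return correct / len(numbers)


def drift_report(
    old: Model, new: Model, numbers: List[int], max_drop: float = 0.1
) -> Tuple[float, float, bool]:
    """Return (old accuracy, new accuracy, whether the drop exceeds max_drop)."""
    old_acc, new_acc = accuracy(old, numbers), accuracy(new, numbers)
    return old_acc, new_acc, (old_acc - new_acc) > max_drop


if __name__ == "__main__":
    test_set = list(range(2, 12))  # 2..11, half of which are prime
    stable_stub = is_prime  # stub snapshot that always answers correctly
    degraded_stub = lambda n: True  # stub snapshot that always answers "prime"
    print(drift_report(stable_stub, degraded_stub, test_set))  # (1.0, 0.5, True)
```

Keeping the test set fixed across snapshots is what makes the two accuracy numbers comparable.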