Autonomous Agents
Research papers on autonomous agents, updated daily. See also the Resources section.
Research papers
Chronological order.
6th of August 2024
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
- Reviews scaling up inference (test-time) compute in order to build self-improving agents. Quantifies the improvement gained from increasing inference compute.
- Test-time compute outperforms 14x larger models.
- A compute-optimal scaling strategy can improve the efficiency of test-time compute by a factor of up to 4x.
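The core idea, spending more inference compute on repeated sampling instead of a larger model, can be sketched with a simple best-of-N majority vote. The `sample_answer` stub below is a hypothetical stand-in for one stochastic LLM call, not the paper's verifier-guided method:

```python
import random
from collections import Counter

def sample_answer(prompt: str, rng: random.Random) -> str:
    # Hypothetical stand-in for one stochastic LLM sample: a weak "model"
    # that answers correctly only 70% of the time.
    return "correct" if rng.random() < 0.7 else "wrong"

def best_of_n(prompt: str, n: int, seed: int = 0) -> str:
    # Simple test-time compute scaling: sample n responses, majority-vote.
    rng = random.Random(seed)
    votes = Counter(sample_answer(prompt, rng) for _ in range(n))
    return votes.most_common(1)[0][0]

single = best_of_n("2+2?", n=1)    # one sample: right only ~70% of the time
scaled = best_of_n("2+2?", n=101)  # 101 samples: majority is near-certain
```

Spending more samples on a weak model makes the aggregate answer far more reliable, which is the intuition behind test-time compute rivaling much larger models.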
5th of August 2024
ReDel: A Toolkit for LLM-Powered Recursive Multi-Agent Systems
- ReDel (Recursive Delegation): Recursive multi-agent framework, where LLM decides when to delegate/how to delegate (delegation graph).
- Includes custom tool-use, delegation schema, event-based logging and interactive replay (web UI).
- Includes an open-source Python package.
- ReDel delegation schemes include DelegateOne (parent agent waits until the child agent completes) and DelegateWait (provides a separate function for the parent agent to retrieve the child agent's response).
- Event-driven logging includes built-in and custom events.
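The two delegation schemes differ mainly in whether the parent blocks. A minimal sketch (hypothetical `Agent` class and `run` method, not ReDel's actual Python API):

```python
from concurrent.futures import Future, ThreadPoolExecutor

class Agent:
    def __init__(self, name: str):
        self.name = name
        self._pool = ThreadPoolExecutor(max_workers=4)

    def run(self, task: str) -> str:
        # Stand-in for an LLM call that solves the task.
        return f"{self.name} solved: {task}"

    def delegate_one(self, child: "Agent", task: str) -> str:
        # DelegateOne: the parent blocks until the child finishes.
        return child.run(task)

    def delegate_wait(self, child: "Agent", task: str) -> Future:
        # DelegateWait: the child runs in the background; the parent
        # retrieves the result later via a separate call.
        return self._pool.submit(child.run, task)

parent, child = Agent("parent"), Agent("child")
blocking = parent.delegate_one(child, "subtask A")
pending = parent.delegate_wait(child, "subtask B")
deferred = pending.result()  # the separate retrieval step
```

DelegateWait lets a parent fan out several sub-tasks concurrently before collecting any results.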
3rd of August 2024
The Drama Machine: Simulating Character Development with LLM Agents
- Drama Machine: Reviews automated identity generation with LLMs. Uses multiple LLMs to simulate dynamic, complex AI characters in drama scenes: interview/detective.
- Roles include Ego, SuperEgo, Autobiography, Director and Critic.
2nd of July 2024
Coalitions of Large Language Models Increase the Robustness of AI Agents
- A coalition of LLMs outperforms a single model and fine-tuned LLMs.
- Specific LLMs fit particular tasks, with cheaper inference.
1st of August 2024
- AgentGen: Generates diverse LLM-agent environments and planning tasks. An LLM fine-tuned with this data significantly improves its planning capabilities.
- Uses an inspirational corpus to generate environment context (actions/restrictions/etc.). Generates tasks with "difficulty diversification": easy/medium/hard, with bidirectional evolution (Bi-Evol) to smoothly acquire new planning skills.
31st of July 2024
Tulip Agent -- Enabling LLM-Based Agents to Solve Tasks Using Large Tool Libraries
- Tulip Agent and AutoTulipAgent: the LLM agent has privileges to create, update, delete and edit its tool library.
- The tool library is recursively self-extendible.
- AutoTulipAgent includes 5 generic tools: 2 to decompose tasks and search tools, plus capabilities to create/delete/update tools.
28th of July 2024
Solving Robotics Problems in Zero-Shot with Vision-Language Models
- Wonderful Team: uses an off-the-shelf VLM for high-level planning, low-level location extraction and action execution.
25th of July 2024
PersonaGym: Evaluating Persona Agents and LLMs
- Introduces the PersonaGym benchmark to evaluate persona LLM agents.
- Sets an automatic PersonaScore-metric to evaluate five different capabilities.
- Finds SOTA level LLMs to offer highly varying level of capabilities as persona-agents.
- Increasing model size does not guarantee better persona-agent performance; performance varies widely across models.
Recursive Introspection: Teaching Language Model Agents How to Self-Improve
- RISE (Recursive IntroSpEction): iteratively self-improves LLM responses through fine-tuning with RL.
- RISE starts with turn 1, where only the prompt is provided. In turn 2, the prompt, the original response and its feedback are provided to generate the turn-2 response. Majority voting selects the final response from the multiple responses generated.
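The two-turn inference procedure with majority voting can be sketched as below; `model` and `feedback` are hypothetical stubs standing in for LLM sampling and the critique signal, and the RL fine-tuning part is omitted:

```python
from collections import Counter

def model(prompt: str) -> list[str]:
    # Hypothetical stand-in for sampling k candidate responses.
    canned = {"Q": ["draft answer", "draft answer", "other"]}
    return canned.get(prompt, ["final answer", "final answer", "noise"])

def feedback(response: str) -> str:
    # Stand-in for an external verifier or self-generated critique.
    return f"critique of: {response}"

def rise(prompt: str) -> str:
    # Turn 1: respond to the prompt alone.
    turn1 = Counter(model(prompt)).most_common(1)[0][0]
    # Turn 2: condition on prompt + previous response + its feedback.
    revised_prompt = f"{prompt}\n{turn1}\n{feedback(turn1)}"
    candidates = model(revised_prompt)
    # Majority voting selects the final response.
    return Counter(candidates).most_common(1)[0][0]

result = rise("Q")
```

The point of the sketch is the data flow: the turn-2 prompt carries the turn-1 answer and its critique, which is also the structure of the fine-tuning data RISE trains on.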
24th of July 2024
Reinforced Prompt Personalization for Recommendation with Large Language Models
- Reinforced Prompt Personalization (RPP): uses instance-based prompting with MARL.
- Instead of task-based prompting (role-play/history/reasoning guidance/output format), instance-based prompting personalises these four characteristics with MARL.
- AI-gadget Kit: multi-agent-driven Swarm UI (SUI) tabletop gaming system, which consists of meta-motion, interactive behaviour, interactive relationship and application.
3D Question Answering for City Scene Understanding
- Sg-CityU: 3D multimodal QA, which uses a scene graph to answer questions about spatial relationships in city scenes.
23rd of July 2024
RedAgent: Red Teaming Large Language Models with Context-aware Autonomous Language Agent
- RedAgent: Introduces the concept of a "jailbreaking strategy" (strategies attackers use to construct jailbreak prompts) for red teaming through multi-agent self-reflection based on context feedback and skill memory.
- The approach can jailbreak LLMs and LLM-based apps (which are even more vulnerable) using just a few queries.
- The Red-Agent architecture includes skill memory and multiple roles (profile constructor/planner/attacker/evaluator) and short/long term memory.
AMONGAGENTS: Evaluating Large Language Models in the Interactive Text-Based Social Deduction Game
- AmongAgents: multi-agent LLM-framework with memory, reflection and interaction in social deduction game with ambiguous and deceptive characters.
- Includes meeting/task-phases.
- Agents possess a personality component, generated with a personality prompt from a pre-defined set of personalities (behaviour/decision-making), which contributes to more dynamism/realism.
OpenDevin: An Open Platform for AI Software Developers as Generalist Agents
- OpenDevin: LLM-based multi-agent framework, where agents interact as human-like SW agents writing code, using command line and browsing web.
- The framework includes: interaction mechanism (event stream), environment(sandbox environment for code execution), interface(human-like), multi-agent delegation (co-operate) and evaluation framework.
- The event stream tracks the history of actions and observations.
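An event stream of this kind is essentially an append-only log of agent actions and environment observations. A minimal sketch (hypothetical classes, not OpenDevin's actual interface):

```python
from dataclasses import dataclass, field

@dataclass
class Event:
    kind: str      # "action" or "observation"
    payload: str

@dataclass
class EventStream:
    # Append-only history of agent actions and environment observations.
    events: list[Event] = field(default_factory=list)

    def add(self, kind: str, payload: str) -> None:
        self.events.append(Event(kind, payload))

    def history(self) -> list[tuple[str, str]]:
        # Replayable trace: each agent turn can condition on this history.
        return [(e.kind, e.payload) for e in self.events]

stream = EventStream()
stream.add("action", "run: pytest")
stream.add("observation", "2 passed")
```

Keeping actions and observations in one ordered log is what makes delegation, replay and evaluation possible over the same trace.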
PyBench: Evaluating LLM Agent on various real-world coding tasks
- Introduces the PyBench benchmark for real-world-like coding tasks with LLM agents.
- Introduces high-performance PyLlama3 model for coding tasks.
Artificial Agency and Large Language Models
- Reviews theoretical models for agents, LLM agents and concept of artificial agency.
LawLuo: A Chinese Law Firm Co-run by LLM Agents
- LawLuo: includes LLM-based receptionist/lawyer/secretary/boss agents to run a realistic legal consultation company based on SOPs (Standard Operating Procedures).
22nd of July 2024
[TaskGen: A Task-Based, Memory-Infused Agentic Framework using StrictJSON](https://arxiv.org/abs/2407.15734)
- TaskGen: LLM-agent framework that solves tasks by dividing them into sub-tasks, each executed by its own agent/equipped function. Manages memory/information on a need-to-know basis. Uses the StrictJSON format.
- Includes meta-agent, inner agents, function calls, sub-tasks, shared memory (sub-tasks completed/list of past equipped-function inputs or outputs/shared variables) and passing of context/shared memory to inner agents/functions.
- Utilises global context to add data to the default LLM prompt (carrying shared variables throughout a task/storing the current state of a dynamic environment variable/specific instructions).
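Need-to-know context passing with a strict JSON schema can be sketched as below; `strict_json` and the key names are illustrative assumptions, not TaskGen's actual API:

```python
import json

def strict_json(schema: dict, payload: dict) -> str:
    # Stand-in for StrictJSON-style filtering: keep only keys declared in
    # the schema, so inner agents see information on a need-to-know basis.
    filtered = {k: payload[k] for k in schema if k in payload}
    return json.dumps(filtered, sort_keys=True)

# Hypothetical shared memory held by the meta-agent.
shared_memory = {
    "completed_subtasks": ["parse input"],
    "equipped_function_outputs": {"parse input": "ok"},
    "shared_variables": {"budget": 10},
    "meta_only_state": "not for inner agents",
}

# The meta-agent passes only schema-declared context to an inner agent.
inner_context = strict_json(
    {"completed_subtasks": "list", "shared_variables": "dict"},
    shared_memory,
)
```

Filtering through an explicit schema keeps inner-agent prompts small and prevents leaking meta-level state.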
Odyssey: Empowering Agents with Open-World Skills
- Odyssey: interactive (plan-actor-critic) LLM-agent (fine-tuned Llama 3) with real world skill library.
- Introduces long-term planning/dynamic-immediate planning/autonomous exploration benchmark.
- Planner decomposes long-term goals into sub-goals with ultimate goals/behavioural constraints/agent states/achievements.
- Actor executes skill code using query context/similarity match/skill selection.
- Critic uses execution feedback/self-validation/self-reflection.
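The planner-actor-critic loop described above can be sketched roughly as follows (all three functions are hypothetical stubs for LLM calls and the skill-library lookup):

```python
def planner(goal: str) -> list[str]:
    # Decompose a long-term goal into ordered sub-goals.
    return [f"{goal}: step {i}" for i in (1, 2)]

def actor(subgoal: str, skills: dict) -> str:
    # Similarity match stand-in: pick the first skill name found in the
    # sub-goal text and "execute" its skill code.
    name = next((s for s in skills if s in subgoal), "explore")
    return skills.get(name, "wander around")

def critic(result: str) -> bool:
    # Validate execution feedback; a failed step would trigger
    # self-reflection and replanning in the real system.
    return result != "wander around"

skills = {"step 1": "gather wood", "step 2": "craft table"}
trace = [(sg, actor(sg, skills)) for sg in planner("build shelter")]
all_ok = all(critic(r) for _, r in trace)
```

The real system closes the loop: critic failures feed back into the planner, and successful executions can be stored as new skills.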
19th of July 2024
The Vision of Autonomic Computing: Can LLMs Make It a Reality?
- Explores feasibility of Autonomic Computing Vision (ACV) with multi-agent framework based on LLMs.
- LLM-based multi-agent framework achieves level 3 autonomy.
- The original ACV-framework identified 4 pillars: self-configuration, self-optimization, self-healing and self-protection.
12th of July 2024
PersonaRAG: Enhancing Retrieval-Augmented Generation Systems with User-Centric Agents
- PersonaRAG: Includes the components k-docs retrieval, user interaction analysis (user profile/contextual retrieval/live session/document ranking/feedback agents) and cognitive dynamic adaptation (selective/collaborative use of agents).
Instruction Following with Goal-Conditioned Reinforcement Learning in Virtual Environments
- IGOR (Instruction following with GOal-conditioned RL): LLM translates instructions into high-level action plan with sub-goals and RL executes them.
Large Language Models as Biomedical Hypothesis Generators: A Comprehensive Evaluation
- LLMs generate novel and diverse biomedical hypotheses through multi-agent interaction.
11th of July 2024
GTA: A Benchmark for General Tool Agents
- GTA benchmark: evaluates general tool usage of LLM agents on real user queries with real deployed tools, for example web-page screenshots.
- Evaluates perception, operation, logic and creativity tools.
- Defines "real-world" as helping humans in real life, with tasks being step- and tool-implicit.
- GPT-4 solves 50% of these tasks.
- Includes illustration of executable tool chains.
Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence
- Internet of Agents (IoA): addresses LLM agents' inability to interact in dynamic environments with other agents outside their hard-coded communication pipelines.
- Limitations include: ecosystem isolation, single-device simulation and rigid communication/coordination.
- IoA acts in Internet-like environment to achieve collective intelligence and new capabilities.
- Includes architectural design of the IoA-framework.
- LAAs (LLM-empowered Autonomous Agents): Introduces concept of LAAs, which include three elements: external tools, LLMs (knowledge modelling) and Agentic workflow (human-like symbolic reasoning).
- LAAs are characterised by natural-language dialogue, decision making, planning, task decomposition and action execution.
GPT-4 is judged more human than humans in displaced and inverted Turing tests
- Introduces the inverted Turing test.
Beyond Instruction Following: Evaluating Rule Following of Large Language Models
- RuleBench-benchmark: evaluates LLMs capability to follow rules.
- Evaluation dimensions include: executing rules, triggering rules, following formal rules, applying rules and following counterfactual rules.
Large Models of What? Mistaking Engineering Achievements for Human Linguistic Agency
- Argues that LLMs in their current form cannot be linguistic agents, as they lack embodiment, participation and precariousness.
- Reviews integration of LLMs into Automated Production Systems.
10th of July 2024
WorldAPIs: The World Is Worth How Many APIs? A Thought Experiment
- Finds that covering just 0.5% of WikiHow instructions already requires roughly 300 APIs, which can be considered a lower bound for covering a wide variety of WikiHow instructions in embodied-agent tasks.
- The framework iteratively produces action spaces for APIs to be used by a LLM based embodied agent.
- This two-step process works iteratively: the LLM few-shot generates (partly through hallucination) semi-executable Python agent policies from WikiHow instructions, then parses the partial/full Python programs into a pool of APIs.
9th of July 2024
Hypothetical Minds: Scaffolding Theory of Mind for Multi-Agent Tasks with Large Language Models
- Hypothetical Minds: Introduces "Theory-of-Mind"-module. Includes as well perception, memory and hierarchical two-level planning.
Vision language models are blind
- Reviews 7 visual tasks on which SOTA-level VLMs perform shockingly badly.
5th of July 2024
On scalable oversight with weak LLMs judging strong LLMs
- Explores debate and consultancy to supervise AI.
- Finds debate outperforms consultancy in general. Better debater models modestly improve judge accuracy.
- Reviews toxicity/bias in LLM-agent multi-step inputs/outputs, instead of individual LLM input-output pairs.
- Reviews LLMs in strategic games. LLMs come with systematic biases: positional bias, payoff bias and behavioural bias. LLM performance decreases when these bias dimensions are misaligned.
3rd of July 2024
LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control
- LivePortrait: generates realistic video from single portrait image with facial expressions and head poses from different angles.
- Offers better computational efficiency and controllability than diffusion models by using an implicit-keypoint-based framework.
- Generation speed is 12.8 ms on an RTX 4090.
Cactus: Towards Psychological Counseling Conversations using Cognitive Behavioral Theory
- Cactus: multi-turn dialogue dataset for mental-health counseling, consisting of goal-oriented, structured Cognitive Behavioral Therapy interactions.
- Trains the Camel LLM using the Cactus dataset.
2nd of July 2024
GRASP: A Grid-Based Benchmark for Evaluating Commonsense Spatial Reasoning
- GRASP: Large scale spatial reasoning benchmark and dataset in structured grid environment requiring planning and commonsense reasoning.
[MMedAgent: Learning to Use Medical Tools with Multi-modal