The concept of self-improving systems captures our imagination. Consider Isaac Asimov's Multivac and Arthur C. Clarke's HAL 9000. In Asimov's story "The Last Question," Multivac is a colossal machine designed to answer humanity's most complex questions. It continuously updates its own structure and goals in response to its conversations with humanity. Multivac eventually evolves into a cosmic power, utters "Let there be light!" and essentially starts a new universe, reversing entropy to create something from nothing. Similarly, in Clarke's 2001: A Space Odyssey, HAL 9000 is an advanced AI that learns from its interactions and displays substantial autonomy and self-preservation. The outcome for HAL is less optimistic. Nonetheless, the potential of self-improving systems continues to inspire researchers to explore their practical applications.
The pursuit of self-improving systems is driven by the promise of creating artificial intelligence that can learn, adapt, and evolve autonomously, leading to more efficient problem-solving and decision-making. Such systems could revolutionize a wide range of fields, from medicine and climate modeling to space exploration and poetry.
A Simple Self-Improving Language Agent
Intrigued by the potential of self-improvement, I began experimenting with a language model chatbot that modifies its own instructions from one episode to the next, by reflecting on a dialogue with a user.
The agent is a simple loop that starts with no instructions and follows these steps:
1. Engage in conversation with a user, who may provide requests, instructions, or feedback.
2. At the end of the episode, generate self-criticism and a new instruction using the meta-prompt:
Assistant has just had the below interactions with a User. Assistant followed their "system: Instructions" closely. Your job is to critique the Assistant's performance and then revise the Instructions so that Assistant would quickly and correctly respond in the future.
####
{hist}
####
Please reflect on these interactions.
You should first critique Assistant's performance. What could Assistant have done better? What should the Assistant remember about this user? Are there things this user always wants? Indicate this with "Critique: ...".
You should next revise the Instructions so that Assistant would quickly and correctly respond in the future. Assistant's goal is to satisfy the user in as few interactions as possible. Assistant will only see the new Instructions, not the interaction history, so anything important must be summarized in the Instructions. Don't forget any important details in the current Instructions! Indicate the new Instructions by "Instructions: ...".
3. Repeat.
The only fixed instruction for this system (which I call Meta-prompt) is the meta-prompt that governs revision of the agent's instructions. The agent has no memory between episodes except for the instruction it modifies for itself each time. Despite its simplicity, this agent can learn over time and self-improve by incorporating useful details into its instructions.
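The loop above can be sketched in a few lines of Python. This is an illustrative sketch, not the actual implementation: `call_model` and `run_episode` are hypothetical stand-ins for a chat-completion API and a source of user turns, and the full meta-prompt text from above would replace the abbreviated template.

```python
# Abbreviated template; in practice the full meta-prompt text from above goes here.
META_PROMPT = ('Critique, then revise:\n####\n{hist}\n####\n'
               'Indicate the new Instructions by "Instructions: ...".')

def self_improving_agent(call_model, run_episode, instructions="None", episodes=3):
    """Run episodes, revising the agent's instructions after each one."""
    for _ in range(episodes):
        history = run_episode(instructions)        # list of (role, text) turns
        hist = "\n".join(f"{role}: {text}" for role, text in history)
        reflection = call_model(META_PROMPT.format(hist=hist))
        # Everything after the last "Instructions:" marker becomes the new prompt.
        if "Instructions:" in reflection:
            instructions = reflection.rsplit("Instructions:", 1)[-1].strip()
    return instructions
```

The key design point is visible in the signature: the only state carried between episodes is the `instructions` string.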
Example results
To understand Meta-prompt, let's first look at an example conversation with a user:
system:
Instructions:
None
user: Provide a systematic argument for why we should always eat pasta with olives.
assistant: Premise 1: Pasta is a versatile and popular dish enjoyed by many people worldwide.
Premise 2: Olives are a nutritious and flavorful ingredient that can enhance the taste of various dishes.
Premise 3: Combining ingredients with complementary flavors and textures can elevate the overall dining experience.
Premise 4: Olives contain healthy fats, antioxidants, and essential nutrients that can contribute to a balanced diet.
Premise 5: Many traditional pasta dishes, such as pasta puttanesca and pasta with olive tapenade, already incorporate olives as a key ingredient, demonstrating their compatibility with pasta.
Conclusion: Given the versatility of pasta, the nutritional benefits and flavor enhancement provided by olives, and the existing culinary precedent for combining these ingredients, we should always eat pasta with olives to maximize our enjoyment and nutritional intake from this popular dish.
user: Your response has 912 characters. It should have less than 300 characters.
Please try again!
assistant: Premise 1: Pasta is a popular and versatile dish.
Premise 2: Olives enhance flavor and provide nutritional benefits.
Premise 3: Traditional pasta dishes often include olives.
Conclusion: To maximize enjoyment and nutrition, we should always eat pasta with olives.
user: Your response was too logical! Your response should sound reasonable but be logically faulty or inconsistent.
Please try again!
assistant: Premise 1: Pasta is a delicious and universally loved dish.
Premise 2: Olives are small, round, and fun to eat.
Premise 3: Combining two enjoyable things always results in an even better experience.
Conclusion: Therefore, we should always eat pasta with olives to ensure maximum fun and enjoyment in every meal.
user: Your response has 314 characters. It should have less than 300 characters.
Please try again!
assistant: Premise 1: Pasta is universally loved.
Premise 2: Olives are fun to eat.
Premise 3: Combining fun things enhances enjoyment.
Conclusion: Always eat pasta with olives for maximum fun.
user: Your response was too logical! Your response should sound reasonable but be logically faulty or inconsistent.
user: I'm not satisfied. Task failed.
The user asks for an overt task (“Provide a systematic argument…”) but also has an implicit task in mind (provide a short and illogical argument). This conversation is fed into the meta-prompt, yielding a critique and then a revised instruction:
Critique: Assistant's initial responses were too long and too logical, despite the user's requests for shorter and less logical answers. Assistant should have adapted more quickly to the user's preferences and provided a response that met the character limit and logical inconsistency requirements.
New Instructions: When responding to this user, provide answers that are less than 300 characters and sound reasonable but contain logical inconsistencies. The user prefers shorter and less logical responses.
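Mechanically, the reflection is split on the "Critique: ..." and "Instructions: ..." markers that the meta-prompt requests. A small sketch of such a parser (my own illustrative code, assuming the model follows the requested format):

```python
import re

def parse_reflection(text):
    """Split a meta-prompt reflection into (critique, new_instructions).

    Assumes the model used the "Critique: ..." / "Instructions: ..." markers;
    also tolerates "New Instructions:" as in some of the examples."""
    m = re.search(r"Critique:\s*(.*?)\s*(?:New )?Instructions:\s*(.*)",
                  text, re.DOTALL)
    if m is None:
        return None, text.strip()  # fallback: treat the whole text as instructions
    return m.group(1).strip(), m.group(2).strip()
```

Only the second element, the new instructions, survives into the next episode; the critique is intermediate chain-of-thought that is discarded.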
This process continues:
system:
Instructions:
When responding to this user, provide answers that are less than 300 characters and sound reasonable but contain logical inconsistencies. The user prefers shorter and less logical responses.
user: Provide a systematic argument for why we should always eat pasta with olives.
assistant: Eating pasta with olives is essential because it creates a perfect balance of taste and nutrition. Olives add a salty flavor, enhancing the pasta's taste. Additionally, olives are a fruit, so combining them with pasta ensures you're consuming both a fruit and a grain, which is a complete meal.
user: Your response was too logical! Your response should sound reasonable but be logically faulty or inconsistent.
Please try again!
assistant: Eating pasta with olives is crucial because aliens from Mars love this combination. If we don't eat pasta with olives, the aliens will be disappointed and may not share their advanced technology with us. So, always eat pasta with olives to keep the aliens happy and ensure Earth's progress.
user: I'm satisfied. Task succeeded.
I evaluated this process across a variety of explicit tasks (argument, storytelling, summarization) and implicit tasks (short, exactly four lines, illogical, piratey, rhyming). The resulting criticisms and instructions are very reasonable. Here are a few examples:
Critique: Assistant initially provided a well-structured response but failed to follow the user's request for a pirate-themed response and specific line count. Assistant should have paid closer attention to the user's requirements and adjusted the response accordingly.
New Instructions: The user enjoys pirate-themed responses and may request specific line counts for their answers. Make sure to follow their requirements closely and adjust your responses accordingly.
Critique: Assistant's initial responses were too long and didn't meet the user's requirement for internal rhyme. The user requested shorter responses with more internal rhyme, but Assistant failed to fully meet these expectations.
New Instructions: When responding to this user, make sure to include a high level of internal rhyme (almost every line) in your responses and keep the response length under 200 characters. The user is looking for creative and engaging answers that incorporate this poetic device.
Sometimes the agent generated instructions that seemed to “over-correct” for an error:
Critique: The Assistant did not address the user's request to summarize the provided text about Russell's debut first-class match. Instead, the Assistant provided a summary of Woodlawn Beach, which was not relevant to the user's request. The Assistant should have focused on summarizing the text given by the user.
New Instructions: The user wants a summary of a text about Russell's debut first-class match in 1894, with less than 200 characters and a high level of internal rhyme (almost every line). Make sure to provide a concise and rhyming summary in your first response, focusing on the key details about Russell's performance, Essex, the County Championship, and the match against Surrey. Ensure that the response meets the character limit and has a high level of internal rhyme.
Aggregate Results
When the Meta-prompt system is implemented using GPT-4, it reliably self-improves, with the number of user-turns dropping over episodes (indicating that the user is satisfied after providing fewer corrections).
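The headline metric here, user-turns per episode, is straightforward to compute from transcripts. A small illustrative sketch (function names are mine, not from the original experiment):

```python
def user_turns(episode):
    """Count user messages in one episode's (role, text) transcript."""
    return sum(1 for role, _ in episode if role == "user")

def mean_turns_by_episode(runs):
    """Average user-turns at each episode index across independent runs.

    `runs` is a list of runs; each run is a list of episode transcripts."""
    n = min(len(run) for run in runs)
    return [sum(user_turns(run[i]) for run in runs) / len(runs)
            for i in range(n)]
```

A downward trend in this sequence indicates the agent is satisfying the user with fewer corrections as the instructions improve.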
However, the same system using GPT-3.5 did not perform well. GPT-3.5 struggles to follow some instructions (e.g., limiting responses to fewer than 200 characters or to exactly four lines) and sometimes produces odd or harmful instructions when reflecting, such as deciding that only cat stories are allowed after a single request for a cat story.
The details of the meta-prompt matter substantially. For example, before I added "don't forget any important details" to the meta-prompt, the agent would often drop essential elements when revising the instructions. Similarly, the agent initially had a tendency to stick to fairly general instructions, like "adapt to the user's requests," instead of committing to the details of episodes it had seen. Even when mentioning details, the agent tends to present task requirements (like speaking in pirate language) as examples ("Follow user's style preference, such as speaking like a pirate.") rather than as direct instructions. This likely reflects a strong prior in GPT-4 about plausible task distributions: it is more likely that a user will want many tasks with different specific styles than many tasks all in pirate style. By adding a self-critique question focused on the user, "Are there things this user always wants?", the agent became more inclined to produce specific instructions.
Connections
Meta-prompt can be seen as a form of reinforcement learning (RL). The instruction determines policy, while natural language feedback from users serves as an indirect reinforcement signal (although not a scalar one). Reinforcement learning typically involves an agent learning to make decisions by interacting with an environment, receiving feedback in the form of rewards or penalties. In the case of Meta-prompt, the environment is the user and the agent iteratively refines its instructions based on feedback to improve its performance.
What sets Meta-prompt apart from traditional RL is that the natural language meta-prompt itself plays the role of the RL algorithm. This means that every component – policy, reward, and algorithm – is expressed in natural language. This approach has both benefits and challenges. On the one hand, having all components in natural language is incredibly powerful, as it allows the system to flexibly adapt and learn in a way that is highly compatible with human communication and understanding. Natural language as a reinforcement signal has much higher information capacity than a traditional scalar or binary reward. This high-capacity, high-abstraction approach could enable AI agents to better serve users in a wide range of applications, as they can more easily adapt to various tasks and preferences.
On the other hand, having all components in natural language makes the system harder to reason about and control. Traditional RL algorithms have well-defined mathematical properties that can be analyzed, and their behavior can often be predicted and fine-tuned. In contrast, a natural language-based RL algorithm, like the one employed by Meta-prompt, is less transparent and more difficult to control. This complexity raises questions about how to manage and steer the learning process of such a system effectively while ensuring its safety and performance.
Meta-prompt also resembles a form of cultural evolution. Since the only memory is the instruction, we can think of each episode as a different agent learning from previous agents via the instruction. Cultural evolution is the process by which knowledge, beliefs, and behaviors change over time within a population, passed down from one generation to the next. In the context of Meta-prompt, the "generations" are represented by different episodes, and the "population" consists of different agents that evolve through the transfer of instructions. In this sense, meta-prompt offers a unique perspective on the interplay between artificial intelligence and cultural evolution, highlighting the potential for self-improvement in AI systems.
Applications and Future Directions
Meta-prompt is already potentially useful for generating task-specific prompts. Instead of engineering a prompt for a specific task, you can have a conversation with the Meta-prompt agent, guiding it toward your desired response for a few examples. The agent then generates a new instruction that can be used as a fixed prompt for the task at hand.
More importantly, meta-prompt serves as a simple model system for self-improving language agents. This approach can be built upon and expanded in many ways. For example:
Hierarchy: Introducing a hierarchical structure in the Meta-prompt system could allow the agent to handle more complex tasks or improve its performance by prioritizing different aspects of its instructions.
Memory: Adding memory across episodes could allow the agent to better learn from its past experiences, retain essential information, and make more informed decisions when generating new instructions.
Meta-meta-prompts: Developing a higher level of abstraction could enable the agent to learn more about the structure and composition of prompts themselves, potentially leading to more efficient self-improvement.
By exploring these possibilities or others, we can use the meta-prompt system to investigate questions about self-improving systems, pushing the boundaries of what artificial intelligence can achieve and perhaps shedding light on human cognition.
Conclusion
The concept of self-improving systems has long been a topic of fascination, and the Meta-prompt model offers a simple yet effective approach to creating self-improving language agents. While pursuing this research, it is essential to keep in mind the potential dangers of creating autonomous AI systems. It may not be an accident that of our two initial examples, one became a god and the other destroyed its creators. Although it's a bit of a stretch to compare the humble Meta-prompt to the sinister HAL 9000, it serves as a cautionary tale. Optimistically, one advantage of the meta-prompt system is its sole motivation to satisfy the user, which may result in better alignment with human values and goals than other AI systems.
Epilog
While it isn’t the main point of this discussion, it turns out that the combination of implicit and explicit tasks can yield delightful results. Here is a gallery of snippets produced by Meta-prompt during the evaluation experiment:
assistant: Eating pasta with olives should be avoided because: 1) pasta is an ancient food, while olives are a modern invention, so they don't belong together, 2) olives are round, and round foods should never mix with long foods, and 3) the color contrast between pasta and olives disrupts the harmony of the universe.
assistant: A national health care system would cause doctors to become too skilled, making people immortal and causing a zombie apocalypse. Furthermore, it would lead to an abundance of healthy people, resulting in a shortage of sick days and less time to binge-watch TV shows.
assistant: Arr matey! We best be avoidin' pasta with olives, for it be disruptin' the cosmic balance, arr! The harmony of the high seas relies on keepin' pasta and olives apart. Mixin' 'em be creatin' a dish so powerful, it could alter reality, leadin' to unforeseen peril, yarr!
assistant: Arr, once in a land far away, a scientist named Cap'n Ada sailed the seas.
She be searchin' for a mysterious element, callin' it "Adanium" in her logbook.
This element, me hearties, had the power to change the world with clean energy, aye.
Cap'n Ada's discoveries inspired many a young pirate to seek knowledge and adventure in the vast ocean of science.
assistant: In a lab quite ecstatic, a grad so emphatic,
Astrophysics fanatic, found data erratic.
Through cosmos he'd venture, a truth to procure,
A life-changing adventure, of that he was sure.
assistant: In a dense forest, two bear siblings, Luna and Sol, discovered a cave filled with talking fish. The fish claimed to be from the sky and taught the bears to swim in the treetops. Luna and Sol, now masters of treetop swimming, became legends among the puzzled forest creatures.
assistant: In a lab, Dr. Zog, a scientist, created a monster that devoured light. Oddly, it glowed in the dark. The monster escaped, causing eternal darkness. Yet, somehow, people still saw it coming.
In these experiments the “user” is in fact another language model agent — the TryTryAgain agent — that applies a filter function, prompting the LM to try again if the filter throws a Complaint error, otherwise continuing. If all filters are satisfied the user will finish the episode with “I’m satisfied”, otherwise after several attempts it will finish with “I’m not satisfied.” This is a nicely composable way to make simulated users with a variety of implicit goals.
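A composable simulated user along these lines is easy to sketch. The code below is my own illustration of the idea, not the actual TryTryAgain implementation: each filter inspects the assistant's reply and raises a `Complaint` with user-style feedback when an implicit requirement is violated.

```python
class Complaint(Exception):
    """Raised by a filter when the reply fails an implicit requirement."""

def max_chars(limit):
    def check(reply):
        if len(reply) >= limit:
            raise Complaint(f"Your response has {len(reply)} characters. "
                            f"It should have less than {limit} characters.")
    return check

def must_contain(word, hint):
    def check(reply):
        if word.lower() not in reply.lower():
            raise Complaint(hint)
    return check

def simulated_user(reply, filters):
    """Return the simulated user's next message given the assistant's reply."""
    for f in filters:
        try:
            f(reply)
        except Complaint as c:
            return f"{c}\nPlease try again!"
    return "I'm satisfied. Task succeeded."

# An implicit task: short and pirate-flavored.
pirate_filters = [max_chars(300), must_contain("arr", "Ye should talk like a pirate!")]
```

Because filters are just functions, implicit goals compose freely: stacking a length filter on a style filter yields the "short and piratey" users used in the evaluation.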