Plan–Code–Execute: Designing Agents That Create Their Own Tools
By Nino, Senior Tech Editor
The paradigm of Large Language Model (LLM) agents has evolved rapidly. Initially, we focused on 'Reasoning' (Chain of Thought). Then, we moved to 'Acting' (ReAct), where agents use a predefined set of tools to interact with the world. However, as we build more complex systems, the limitations of pre-built toolkits are becoming apparent. The next frontier is the Plan–Code–Execute (PCE) framework, where agents don't just use tools—they build them on the fly.
The Bottleneck of Pre-built Tools
In traditional agentic architectures, developers provide a library of functions (tools) that the agent can call. While effective for simple tasks, this approach suffers from several critical flaws:
- Context Window Bloat: Providing 50+ tool definitions consumes significant tokens, reducing the space available for actual task data.
- The Discovery Problem: LLMs often struggle to select the correct tool when faced with too many similar options, leading to hallucinations.
- Rigidity: If a task requires a specific data transformation not covered by your library, the agent fails.
To overcome these hurdles, developers are turning to high-performance API providers like n1n.ai to access models capable of sophisticated code generation and reasoning, enabling the agent to write its own logic.
The Plan–Code–Execute (PCE) Workflow
The PCE framework shifts the responsibility of tool creation from the human developer to the AI agent. This process typically involves three distinct phases:
1. The Planning Phase
Instead of jumping straight to action, the agent analyzes the user's request and decomposes it into sub-tasks. It determines if an existing tool exists or if a new, custom function is required. Models like OpenAI o3 or Claude 3.5 Sonnet are particularly adept at this high-level architectural reasoning.
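To make the planning phase concrete, a planning prompt might look like the sketch below. The exact wording and the `{user_query}` placeholder are illustrative assumptions, not a prescribed template:

```python
# Illustrative planning prompt; the wording is an assumption, not a standard.
PLAN_PROMPT = """You are a planning agent.
Given the user request below, list the sub-tasks required to solve it.
For each sub-task, state whether an existing tool covers it
or whether a new function must be written.

Request: {user_query}
"""

print(PLAN_PROMPT.format(user_query="Compute 17-day volatility for AAPL"))
```

The model's structured answer then drives the coding phase: only sub-tasks marked as uncovered trigger code generation.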
2. The Coding Phase
Once the plan is set, the agent writes Python or JavaScript code to solve the sub-task. For instance, if the agent needs to calculate the volatility of a specific stock over a non-standard period, it writes a script using pandas and numpy rather than relying on a hardcoded /get_volatility endpoint.
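As a sketch of the kind of script the agent might emit for that volatility task, here is a self-contained example using pandas and numpy. The 17-day window, the synthetic price series, and the 252-trading-day annualization are illustrative assumptions:

```python
import numpy as np
import pandas as pd

def rolling_volatility(prices: pd.Series, window: int = 17) -> pd.Series:
    """Annualized rolling volatility over a non-standard window.

    Computed from daily log returns; assumes 252 trading days per year.
    """
    log_returns = np.log(prices / prices.shift(1))
    return log_returns.rolling(window).std() * np.sqrt(252)

# Synthetic daily closes stand in for real market data
rng = np.random.default_rng(0)
prices = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, 60))))

vol = rolling_volatility(prices)
print(vol.dropna().round(4).tail())
```

Because the agent writes this on demand, the window length, return definition, and annualization factor can all follow the user's request rather than a fixed endpoint's contract.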
3. The Execution Phase
The generated code is sent to a secure, sandboxed environment (like E2B or a Docker container). The output is then fed back into the agent's context to inform the next step. By using the low-latency endpoints at n1n.ai, developers can ensure this loop happens in near real-time.
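A minimal local stand-in for that execution loop is a separate interpreter process with a timeout. This is only a sketch: it isolates the parent process from crashes and hangs, but it is not a substitute for a real sandbox like E2B or Docker, and the helper name and timeout value are illustrative:

```python
import subprocess
import sys

def run_in_subprocess(code: str, timeout_s: float = 5.0) -> str:
    """Run generated code in a fresh interpreter process with a timeout.

    Returns stdout on success, or an error string the agent can reason about.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True, text=True, timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return "Error: execution timed out"
    if proc.returncode != 0:
        return f"Error: {proc.stderr.strip()}"
    return proc.stdout.strip()

print(run_in_subprocess("print(2 + 2)"))  # → 4
```

Feeding the returned string (success output or error text) back into the agent's context is what closes the plan–code–execute loop.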
Technical Implementation: A Python Example
Below is a simplified conceptual implementation of a Tool-Making Agent. We use a system prompt that encourages the agent to define functions when necessary.
```python
class ToolMakerAgent:
    def __init__(self, api_key):
        self.api_key = api_key
        self.sandbox_env = {}

    def call_llm(self, prompt):
        # Call your model provider here (e.g. Claude 3.5 Sonnet via n1n.ai).
        # Stubbed for brevity; wire in your gateway client of choice.
        raise NotImplementedError

    def execute_code(self, code_string):
        # In production, use a secure sandbox like E2B instead of exec()
        try:
            exec(code_string, self.sandbox_env)
            return "Execution Successful"
        except Exception as e:
            return f"Error: {e}"

    def handle_request(self, user_query):
        # Step 1: Plan & code generation (simplified)
        prompt = f"Write a Python function to solve: {user_query}. Return only code."
        generated_code = self.call_llm(prompt)
        # Step 2: Execute the generated code and return the outcome
        return self.execute_code(generated_code)
```
Choosing the Right Model for PCE
Not all LLMs are equally suited to the PCE workflow. The strategy demands strong code generation ("Coding Intelligence") and reliable instruction following.
| Model | Coding Score (HumanEval) | Reasoning Depth | Best Use Case |
|---|---|---|---|
| Claude 3.5 Sonnet | High | Exceptional | General-purpose PCE agents |
| DeepSeek-V3 | Very High | High | Cost-effective tool generation |
| OpenAI o3 | Extreme | State-of-the-art | Complex mathematical/logical tools |
Accessing these models through a unified gateway like n1n.ai allows you to swap models dynamically based on the complexity of the tool being created. For example, use DeepSeek-V3 for simple data cleaning scripts and switch to Claude for complex financial modeling tools.
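That routing decision can be expressed as a small lookup. The model identifiers below are placeholders; substitute whatever IDs your gateway actually exposes:

```python
def pick_model(task_complexity: str) -> str:
    """Route tool-generation requests to a model by estimated complexity.

    Model ID strings are illustrative placeholders, not real endpoint names.
    """
    routes = {
        "simple": "deepseek-v3",          # data cleaning, small scripts
        "standard": "claude-3.5-sonnet",  # general-purpose PCE agents
        "hard": "o3",                     # complex mathematical/logical tools
    }
    # Fall back to the general-purpose model for unknown complexity labels
    return routes.get(task_complexity, "claude-3.5-sonnet")

print(pick_model("simple"))  # → deepseek-v3
```

In practice the complexity label itself can come from the planning phase, so the agent chooses its own coder.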
Security Considerations: The Sandbox
Allowing an agent to execute self-generated code is inherently risky. You must implement strict security measures:
- Resource Limits: Limit CPU and RAM to prevent infinite loops or memory exhaustion (e.g., Memory < 512MB).
- Network Isolation: Disable internet access within the sandbox unless explicitly required.
- Ephemeral Environments: Each execution should happen in a fresh container that is destroyed immediately after use.
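The resource-limit item can be sketched with POSIX rlimits applied in a child process. This is Unix-only and illustrative; a production system would layer these limits inside a container or hosted sandbox rather than rely on rlimits alone:

```python
import resource
import subprocess
import sys

def limited_run(code: str, mem_mb: int = 512, cpu_s: int = 5) -> str:
    """Run code in a child process capped at mem_mb of address space
    and cpu_s seconds of CPU time (Unix-only sketch)."""
    def apply_limits():
        # Applied in the child just before exec, so the parent is unaffected
        resource.setrlimit(resource.RLIMIT_AS, (mem_mb * 1024 ** 2,) * 2)
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_s, cpu_s))

    proc = subprocess.run(
        [sys.executable, "-c", code],
        preexec_fn=apply_limits, capture_output=True, text=True,
    )
    if proc.returncode != 0:
        return f"Error: {proc.stderr.strip()}"
    return proc.stdout.strip()

print(limited_run("print('ok')"))
```

A runaway allocation or busy loop in the generated code then kills only the child process, never the agent itself.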
Optimization: Tool Persistence
A common optimization is to "save" the tools the agent creates. If an agent builds a complex regex parser for a specific log format, save that code to a local database. The next time a similar request arrives, the agent can search its own "created library" via RAG (Retrieval-Augmented Generation) before writing new code from scratch.
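A minimal sketch of that "created library" is below. It retrieves by naive keyword overlap purely for illustration; a production system would use embeddings and a vector store (true RAG) instead, and the class and method names here are invented for the example:

```python
class ToolLibrary:
    """Store generated tool code keyed by a plain-text description,
    and retrieve the best match by keyword overlap (illustrative only)."""

    def __init__(self):
        self._tools = {}  # description -> source code

    def save(self, description: str, code: str):
        self._tools[description.lower()] = code

    def search(self, query: str):
        query_words = set(query.lower().split())
        best, best_score = None, 0
        for desc, code in self._tools.items():
            score = len(query_words & set(desc.split()))
            if score > best_score:
                best, best_score = code, score
        return best  # None means: no match found, write new code

lib = ToolLibrary()
lib.save("parse nginx access log lines", "def parse_log(line): ...")
print(lib.search("parse a log line") is not None)  # → True
```

The agent consults this library first; only a `None` result falls through to fresh code generation, saving both tokens and latency on repeat tasks.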
Conclusion
The shift from static tool-use to dynamic tool-making represents a significant leap in AI autonomy. By implementing the Plan-Code-Execute framework, you build systems that are more flexible, token-efficient, and capable of solving open-ended problems.
To start building your own tool-making agents with the world's most powerful models, get a free API key at n1n.ai.