Plan–Code–Execute: Designing Agents That Create Their Own Tools
By Nino, Senior Tech Editor
The paradigm of Large Language Model (LLM) agents has evolved rapidly. Initially, we focused on 'Reasoning' (Chain of Thought). Then, we moved to 'Acting' (ReAct), where agents use a predefined set of tools to interact with the world. However, as we build more complex systems, the limitations of pre-built toolkits are becoming apparent. The next frontier is the Plan–Code–Execute (PCE) framework, where agents don't just use tools—they build them on the fly.
The Bottleneck of Pre-built Tools
In traditional agentic architectures, developers provide a library of functions (tools) that the agent can call. While effective for simple tasks, this approach suffers from several critical flaws:
- Context Window Bloat: Providing 50+ tool definitions consumes significant tokens, reducing the space available for actual task data.
- The Discovery Problem: LLMs often struggle to select the correct tool when faced with too many similar options, leading to hallucinations.
- Rigidity: If a task requires a specific data transformation not covered by your library, the agent fails.
To overcome these hurdles, developers are turning to high-performance API providers like n1n.ai to access models capable of sophisticated code generation and reasoning, enabling the agent to write its own logic.
The Plan–Code–Execute (PCE) Workflow
The PCE framework shifts the responsibility of tool creation from the human developer to the AI agent. This process typically involves three distinct phases:
1. The Planning Phase
Instead of jumping straight to action, the agent analyzes the user's request and decomposes it into sub-tasks. It determines if an existing tool exists or if a new, custom function is required. Models like OpenAI o3 or Claude 3.5 Sonnet are particularly adept at this high-level architectural reasoning.
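To make the planning phase concrete, a planning prompt might look like the sketch below. The exact wording and the `{user_query}` placeholder are illustrative assumptions, not a prescribed template:

```python
# Illustrative planning prompt; the wording is an assumption, not a standard.
PLAN_PROMPT = """You are a planning agent.
Given the user request below, list the sub-tasks required to solve it.
For each sub-task, state whether an existing tool covers it
or whether a new function must be written.

Request: {user_query}
"""

print(PLAN_PROMPT.format(user_query="Compute 17-day volatility for AAPL"))
```

The model's structured answer then drives the coding phase: only sub-tasks marked as uncovered trigger code generation.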
2. The Coding Phase
Once the plan is set, the agent writes Python or JavaScript code to solve the sub-task. For instance, if the agent needs to calculate the volatility of a specific stock over a non-standard period, it writes a script using pandas and numpy rather than relying on a hardcoded /get_volatility endpoint.
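As a sketch of the kind of script the agent might emit for that volatility task, here is a self-contained example using pandas and numpy. The 17-day window, the synthetic price series, and the 252-trading-day annualization are illustrative assumptions:

```python
import numpy as np
import pandas as pd

def rolling_volatility(prices: pd.Series, window: int = 17) -> pd.Series:
    """Annualized rolling volatility over a non-standard window.

    Computed from daily log returns; assumes 252 trading days per year.
    """
    log_returns = np.log(prices / prices.shift(1))
    return log_returns.rolling(window).std() * np.sqrt(252)

# Synthetic daily closes stand in for real market data
rng = np.random.default_rng(0)
prices = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, 60))))

vol = rolling_volatility(prices)
print(vol.dropna().round(4).tail())
```

Because the agent writes this on demand, the window length, return definition, and annualization factor can all follow the user's request rather than a fixed endpoint's contract.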
3. The Execution Phase
The generated code is sent to a secure, sandboxed environment (like E2B or a Docker container). The output is then fed back into the agent's context to inform the next step. By using the low-latency endpoints at n1n.ai, developers can ensure this loop happens in near real-time.
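A minimal local stand-in for that execution loop is a separate interpreter process with a timeout. This is only a sketch: it isolates the parent process from crashes and hangs, but it is not a substitute for a real sandbox like E2B or Docker, and the helper name and timeout value are illustrative:

```python
import subprocess
import sys

def run_in_subprocess(code: str, timeout_s: float = 5.0) -> str:
    """Run generated code in a fresh interpreter process with a timeout.

    Returns stdout on success, or an error string the agent can reason about.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True, text=True, timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return "Error: execution timed out"
    if proc.returncode != 0:
        return f"Error: {proc.stderr.strip()}"
    return proc.stdout.strip()

print(run_in_subprocess("print(2 + 2)"))  # → 4
```

Feeding the returned string (success output or error text) back into the agent's context is what closes the plan–code–execute loop.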
Technical Implementation: A Python Example
Below is a simplified conceptual implementation of a Tool-Making Agent. We use a system prompt that encourages the agent to define functions when necessary.
```python
class ToolMakerAgent:
    def __init__(self, api_key):
        self.api_key = api_key
        self.sandbox_env = {}

    def call_llm(self, prompt):
        # Call your model provider here (e.g. Claude 3.5 Sonnet via n1n.ai).
        # Stubbed for brevity; wire in your gateway client of choice.
        raise NotImplementedError

    def execute_code(self, code_string):
        # In production, use a secure sandbox like E2B instead of exec()
        try:
            exec(code_string, self.sandbox_env)
            return "Execution Successful"
        except Exception as e:
            return f"Error: {e}"

    def handle_request(self, user_query):
        # Step 1: Plan & code generation (simplified)
        prompt = f"Write a Python function to solve: {user_query}. Return only code."
        generated_code = self.call_llm(prompt)
        # Step 2: Execute the generated code and return the outcome
        return self.execute_code(generated_code)
```
Choosing the Right Model for PCE
Not all LLMs are equally suited to the PCE workflow. The strategy demands strong code generation ("Coding Intelligence") and reliable instruction following.
| Model | Coding Score (HumanEval) | Reasoning Depth | Best Use Case |
|---|---|---|---|
| Claude 3.5 Sonnet | High | Exceptional | General-purpose PCE agents |
| DeepSeek-V3 | Very High | High | Cost-effective tool generation |
| OpenAI o3 | Extreme | State-of-the-art | Complex mathematical/logical tools |
Accessing these models through a unified gateway like n1n.ai allows you to swap models dynamically based on the complexity of the tool being created. For example, use DeepSeek-V3 for simple data cleaning scripts and switch to Claude for complex financial modeling tools.
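That routing decision can be expressed as a small lookup. The model identifiers below are placeholders; substitute whatever IDs your gateway actually exposes:

```python
def pick_model(task_complexity: str) -> str:
    """Route tool-generation requests to a model by estimated complexity.

    Model ID strings are illustrative placeholders, not real endpoint names.
    """
    routes = {
        "simple": "deepseek-v3",          # data cleaning, small scripts
        "standard": "claude-3.5-sonnet",  # general-purpose PCE agents
        "hard": "o3",                     # complex mathematical/logical tools
    }
    # Fall back to the general-purpose model for unknown complexity labels
    return routes.get(task_complexity, "claude-3.5-sonnet")

print(pick_model("simple"))  # → deepseek-v3
```

In practice the complexity label itself can come from the planning phase, so the agent chooses its own coder.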
Security Considerations: The Sandbox
Allowing an agent to execute self-generated code is inherently risky. You must implement strict security measures:
- Resource Limits: Limit CPU and RAM to prevent infinite loops or memory exhaustion (e.g., Memory < 512MB).
- Network Isolation: Disable internet access within the sandbox unless explicitly required.
- Ephemeral Environments: Each execution should happen in a fresh container that is destroyed immediately after use.
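The resource-limit item can be sketched with POSIX rlimits applied in a child process. This is Unix-only and illustrative; a production system would layer these limits inside a container or hosted sandbox rather than rely on rlimits alone:

```python
import resource
import subprocess
import sys

def limited_run(code: str, mem_mb: int = 512, cpu_s: int = 5) -> str:
    """Run code in a child process capped at mem_mb of address space
    and cpu_s seconds of CPU time (Unix-only sketch)."""
    def apply_limits():
        # Applied in the child just before exec, so the parent is unaffected
        resource.setrlimit(resource.RLIMIT_AS, (mem_mb * 1024 ** 2,) * 2)
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_s, cpu_s))

    proc = subprocess.run(
        [sys.executable, "-c", code],
        preexec_fn=apply_limits, capture_output=True, text=True,
    )
    if proc.returncode != 0:
        return f"Error: {proc.stderr.strip()}"
    return proc.stdout.strip()

print(limited_run("print('ok')"))
```

A runaway allocation or busy loop in the generated code then kills only the child process, never the agent itself.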
Optimization: Tool Persistence
A common optimization is to "save" the tools the agent creates. If an agent builds a complex regex parser for a specific log format, save that code to a local database. The next time a similar request arrives, the agent can search its own "created library" via RAG (Retrieval-Augmented Generation) before writing new code from scratch.
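A minimal sketch of that "created library" is below. It retrieves by naive keyword overlap purely for illustration; a production system would use embeddings and a vector store (true RAG) instead, and the class and method names here are invented for the example:

```python
class ToolLibrary:
    """Store generated tool code keyed by a plain-text description,
    and retrieve the best match by keyword overlap (illustrative only)."""

    def __init__(self):
        self._tools = {}  # description -> source code

    def save(self, description: str, code: str):
        self._tools[description.lower()] = code

    def search(self, query: str):
        query_words = set(query.lower().split())
        best, best_score = None, 0
        for desc, code in self._tools.items():
            score = len(query_words & set(desc.split()))
            if score > best_score:
                best, best_score = code, score
        return best  # None means: no match found, write new code

lib = ToolLibrary()
lib.save("parse nginx access log lines", "def parse_log(line): ...")
print(lib.search("parse a log line") is not None)  # → True
```

The agent consults this library first; only a `None` result falls through to fresh code generation, saving both tokens and latency on repeat tasks.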
Conclusion
The shift from static tool-use to dynamic tool-making represents a significant leap in AI autonomy. By implementing the Plan-Code-Execute framework, you build systems that are more flexible, token-efficient, and capable of solving open-ended problems.
To start building your own tool-making agents with the world's most powerful models, get a free API key at n1n.ai.