The LLM node calls a large language model to generate responses based on your instructions and inputs from upstream nodes.

Choose a Model

From your configured providers, choose the model that best fits your task. After selection, you can adjust model parameters to control how it generates responses. Available parameters and presets vary by model.

Write the Prompt

Instruct the model on how to process inputs and generate responses. Reference variables by typing / or {. If you’re unsure where to start or want to refine existing prompts, try our AI-assisted prompt generator.

Specify Instructions and Messages

Define the system instruction and click Add Message to add user/assistant messages. They are all sent to the model in order in a single prompt. Think of it as chatting directly with the model:
  • System instructions set the rules for how the model should respond—its role, tone, and behavioral guidelines.
  • User messages are what you send to the model—a question, request, or task for the model to work on.
  • Assistant messages are the model’s responses.

Separate Inputs from Rules

Define the role and rules in the system instruction, then pass the actual task input in a user message. For example:
# System instruction
You are a children's story writer. Write a story based on the user's input. Use simple language and a warm tone.

# User message
Write a bedtime story about a rabbit who makes friends with a shy hedgehog
While it may seem simpler to put everything in the system instruction, separating role definitions from task inputs gives the model clearer structure to work with.

Simulate Chat History

You might wonder: if assistant messages are the model’s responses, why would I add them manually? By adding alternating user and assistant messages, you create simulated chat history in the prompt. The model treats these as prior exchanges, which can help guide its behavior.
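For example, a short simulated exchange (the scenario below is invented purely for illustration) shows the model the response style you expect before the real input arrives:
# System instruction
You are a support assistant. Reply with one short acknowledgment, then a bulleted list of next steps.

# User message (simulated)
My order hasn't arrived yet.

# Assistant message (simulated)
Sorry about the delay. Here's what to do next:
  • Check the tracking link in your confirmation email
  • Contact us if there's no update within 48 hours

# User message (actual input)
I was charged twice for the same order.
The model treats the first pair as a prior exchange and answers the final user message in the same format.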

Import Chat History from Upstream LLMs

Click Add Chat History to import chat history from an upstream Agent or LLM node. This lets the model know what happened upstream and continue from where that node left off. Chat history includes user, assistant, and tool messages. You can view it in an Agent or LLM node’s context output variable.
System instructions are not included, as they are node-specific.
This is useful when chaining multiple Agent or LLM nodes:
  • Without importing chat history, a downstream node only receives the upstream node’s final output, with no idea how it got there.
  • With imported chat history, it sees the entire process: what the user asked, what tools were called, what results came back, and how the model reasoned through them.
Specify your new task in the automatically added user message. The imported history is prepended to the current node’s messages, so the model sees it as one continuous conversation. Since the imported history typically ends with an assistant message, the model needs a follow-up user message to know what to do next.
Suppose two LLM nodes run in sequence: LLM A researches a topic by calling search tools, and LLM B writes a report based on the research. If LLM B only receives LLM A’s final text output, it can summarize the conclusions but can’t verify them or cite specific sources. By importing LLM A’s chat history, LLM B sees the raw data from each tool call and can reference it directly in the report.

Here’s the complete message sequence LLM B sees after importing LLM A’s chat history:
# LLM B's own system instruction
1. System: "You are a professional report writer..."

# from LLM A         
2. User: "What are the new trends in the EV market?"

# from LLM A
3. Tool: [search results with URLs and raw data]

# from LLM A
4. Assistant: "Based on the search results, the key trends are..."

# LLM B's own user message
5. User: "Write a 500-word market analysis report."
LLM B understands: it has seen the research process (question, search, summary), and now needs to write a report based on that information—including the raw data it couldn’t access through the text output alone.

Create Dynamic Prompts Using Jinja2

Create dynamic prompts using Jinja2 syntax. For example, use conditionals to customize instructions based on variable values.
You are a 
{% if user_level == "beginner" %}patient and friendly 
{% elif user_level == "intermediate" %}professional and efficient 
{% else %}senior expert-level 
{% endif %} assistant.

{% if user_level == "beginner" %} 
Please explain in simple and easy-to-understand language. Provide examples when necessary. Avoid using technical jargon. 
{% elif user_level == "intermediate" %} You may use some technical terms, but provide appropriate explanations. Offer practical advice and best practices. 
{% else %} You may delve into technical details and use professional terminology. Focus on advanced use cases and optimization solutions. 
{% endif %}
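For example, when user_level is "beginner", the template above renders to roughly the following (only the matching branch is kept):
You are a patient and friendly assistant.

Please explain in simple and easy-to-understand language. Provide examples when necessary. Avoid using technical jargon.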
Without templating, you’d need to send all possible instructions to the model, describe the conditions, and let it decide which to follow—an approach that’s often unreliable. With Jinja2 templating, only the instructions matching the defined conditions are sent, ensuring predictable behavior and reducing token usage.

Add Context

In Advanced Settings > Context, provide the LLM with additional reference information to reduce hallucination and improve response accuracy. A typical pattern: pass retrieval results from a knowledge retrieval node for Retrieval-Augmented Generation (RAG).
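For example, a system instruction for this pattern (the wording below is illustrative only) might constrain the model to the retrieved context:
# System instruction
Answer the user's question using only the information provided in the context. If the context doesn't contain the answer, say so instead of guessing, and quote or cite the relevant passage when you can.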

Enable Conversation Memory (Chatflows Only)

Memory is node-specific and doesn’t persist between different conversations.
Enable Memory to keep recent dialogues, so the LLM can answer follow-up questions coherently. A user message is automatically added to pass the current user query and any uploaded files. This is because memory works by storing recent user-assistant exchanges: if the current query isn’t passed through a user message, there’s nothing to record on the user side.

Window Size controls how many recent exchanges to retain. For example, a window size of 5 keeps the last 5 pairs of user queries and LLM responses.
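To illustrate, here is a purely hypothetical chat with Window Size set to 2. Earlier exchanges fall outside the window, while the two most recent are kept:
# Dropped (outside the 2-exchange window)
User: Hi, can you help me plan a trip?
Assistant: Of course. Where would you like to go?

# Retained (the 2 most recent exchanges)
User: What's the capital of Australia?
Assistant: Canberra.
User: How far is it from Sydney?
Assistant: About three hours by car.

# Current turn (passed via the automatically added user message)
User: And from Melbourne?
Because the retained exchanges are sent along with the current query, the model can resolve the follow-up question without the user restating the topic.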

Use Dify Tools

Only models with the Tool Call tag can use Dify tools.
Add Dify tools to let the model interact with external services and APIs. This is useful when tasks require real-time data or actions beyond text generation, like web searches or database queries. You can disable or delete added tools and modify their configuration. A clearer tool description helps the model judge when to use it.

Adjust Max Iterations

Max Iterations in Advanced Settings limits how many times the model can repeat its reasoning-and-action cycle (think, call a tool, process the result) for a single request. Increase this value for complex, multi-step tasks that require multiple tool calls. Higher values increase latency and token costs.

Process Multimodal Inputs

To let multimodal-capable models process images, audio, video, or documents, choose either approach:
  • Reference file variables directly in the prompt.
  • Enable Vision in Advanced Settings and select the file variable there. Resolution only controls the detail level for image processing:
    • High: Better accuracy for complex images but uses more tokens
    • Low: Faster processing with fewer tokens for simple images
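For the first approach, a minimal sketch of a prompt referencing a file variable might look like this (product_photo is a made-up variable name; in the editor it appears as a variable tag inserted by typing / or {):
# User message
Look at product_photo and describe the product, noting any visible damage or defects.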

Separate Thinking and Tool Calling from Responses

To get a clean response without the model’s thinking process and tool calls (if any), reference the text output variable (with Enable Reasoning Tag Separation turned on) or generation.content. The generations variable itself includes all intermediate steps alongside the final response.

Force Structured Output

Describing an output format in instructions can produce inconsistent results. For more reliable formatting, enable structured output to enforce a defined JSON schema.
For models without native JSON support, Dify includes the schema in the prompt, but strict adherence is not guaranteed.
  1. Next to Output Variables, toggle on Structured. A structured_output variable will appear at the end of the output variable list.
  2. Click Configure to define the output schema using one of the following methods.
    • Visual Editor: Define simple structures with a no-code interface. The corresponding JSON schema is generated automatically.
    • JSON Schema: Directly write schemas for complex structures with nested objects, arrays, or validation rules.
    • AI Generation: Describe needs in natural language and let AI generate the schema.
    • JSON Import: Paste an existing JSON object to automatically generate the corresponding schema.
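For example, a minimal schema for a review-tagging task (the field names are chosen purely for illustration) could look like this:
{
  "type": "object",
  "properties": {
    "sentiment": {
      "type": "string",
      "enum": ["positive", "neutral", "negative"]
    },
    "summary": { "type": "string" },
    "keywords": {
      "type": "array",
      "items": { "type": "string" }
    }
  },
  "required": ["sentiment", "summary"]
}
Downstream nodes can then rely on structured_output containing these fields instead of parsing free text.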

Handle Errors

Configure automatic retries for temporary issues (like network glitches) and a fallback error handling strategy to keep the workflow running if errors persist.