# Memory Module

This module provides memory management for LLM conversations, enabling context retention across dialogue turns.

## Overview

The memory module contains two types of memory implementations:

1. **TokenBufferMemory** - Conversation-level memory (existing)
2. **NodeTokenBufferMemory** - Node-level memory (**Chatflow only**)

> **Note**: `NodeTokenBufferMemory` is only available in **Chatflow** (advanced-chat mode).
> This is because it requires both `conversation_id` and `node_id`, which are only present in Chatflow.
> Standard Workflow mode does not have `conversation_id` and therefore cannot use node-level memory.

```
┌────────────────────────────────────────────────────────────────┐
│                      Memory Architecture                       │
├────────────────────────────────────────────────────────────────┤
│                                                                │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │ TokenBufferMemory                                        │  │
│  │ Scope:   Conversation                                    │  │
│  │ Storage: Database (Message table)                        │  │
│  │ Key:     conversation_id                                 │  │
│  └──────────────────────────────────────────────────────────┘  │
│                                                                │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │ NodeTokenBufferMemory                                     │  │
│  │ Scope:   Node within Conversation                        │  │
│  │ Storage: WorkflowNodeExecutionModel.outputs["context"]   │  │
│  │ Key:     (conversation_id, node_id, workflow_run_id)     │  │
│  └──────────────────────────────────────────────────────────┘  │
│                                                                │
└────────────────────────────────────────────────────────────────┘
```

---

## TokenBufferMemory (Existing)

### Purpose

`TokenBufferMemory` retrieves conversation history from the `Message` table and converts it to `PromptMessage` objects for LLM context.

### Key Features

- **Conversation-scoped**: All messages within a conversation are candidates
- **Thread-aware**: Uses `parent_message_id` to extract only the current thread (supports regeneration scenarios)
- **Token-limited**: Truncates history to fit within `max_token_limit`
- **File support**: Handles `MessageFile` attachments (images, documents, etc.)

### Data Flow

```
Message Table               TokenBufferMemory                           LLM
    │                               │                                     │
    │  SELECT * FROM messages       │                                     │
    │  WHERE conversation_id = ?    │                                     │
    │  ORDER BY created_at DESC     │                                     │
    ├──────────────────────────────▶│                                     │
    │                               │                                     │
    │                               │ extract_thread_messages()           │
    │                               │                                     │
    │                               │ build_prompt_message_with_files()   │
    │                               │                                     │
    │                               │ truncate by max_token_limit         │
    │                               │                                     │
    │                               │  Sequence[PromptMessage]            │
    │                               ├────────────────────────────────────▶│
    │                               │                                     │
```

### Thread Extraction

When a user regenerates a response, a new thread is created:

```
Message A (user)
└── Message A' (assistant)
    └── Message B (user)
        └── Message B' (assistant)
└── Message A'' (assistant, regenerated)   ← New thread
    └── Message C (user)
        └── Message C' (assistant)
```

`extract_thread_messages()` traces back from the latest message using `parent_message_id` to get only the current thread: `[A, A'', C, C']`
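The tracing step can be sketched roughly as follows. This is a simplified illustration of the behaviour described above, not the actual `extract_thread_messages()` implementation; it only assumes each message exposes `id` and `parent_message_id`, as described in this document.

```python
# Simplified sketch of thread extraction - not the real implementation.
# Assumes each message object has `id` and `parent_message_id` attributes.


def extract_current_thread(messages, latest_message):
    """Walk parent_message_id links from the newest message back to the root."""
    by_id = {message.id: message for message in messages}

    thread = []
    current = latest_message
    while current is not None:
        thread.append(current)
        parent_id = current.parent_message_id
        current = by_id.get(parent_id) if parent_id else None

    thread.reverse()  # oldest first, e.g. [A, A'', C, C'] in the tree above
    return thread
```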
### Usage

```python
from core.memory.token_buffer_memory import TokenBufferMemory

memory = TokenBufferMemory(conversation=conversation, model_instance=model_instance)
history = memory.get_history_prompt_messages(max_token_limit=2000, message_limit=100)
```

---

## NodeTokenBufferMemory

### Purpose

`NodeTokenBufferMemory` provides **node-scoped memory** within a conversation. Each LLM node in a workflow can maintain its own independent conversation history.

### Use Cases

1. **Multi-LLM Workflows**: Different LLM nodes need separate context
2. **Iterative Processing**: An LLM node in a loop needs to accumulate context across iterations
3. **Specialized Agents**: Each agent node maintains its own dialogue history

### Design: Zero Extra Storage

**Key insight**: the LLM node already saves the complete context in `outputs["context"]`. Each LLM node execution outputs:

```python
outputs = {
    "text": clean_text,
    "context": self._build_context(prompt_messages, clean_text),  # Complete dialogue history!
    ...
}
```

This `outputs["context"]` contains:

- All previous user/assistant messages (excluding the system prompt)
- The current assistant response

**No separate storage is needed** - we just read from the last execution's `outputs["context"]`.

### Benefits

| Aspect      | Old Design (Object Storage)   | New Design (`outputs["context"]`)        |
| ----------- | ----------------------------- | ---------------------------------------- |
| Storage     | Separate JSON file            | Already in WorkflowNodeExecutionModel    |
| Concurrency | Race condition risk           | No issue (each execution is a new INSERT)|
| Cleanup     | Needs a separate cleanup task | Follows the node execution lifecycle     |
| Migration   | Required                      | None                                     |
| Complexity  | High                          | Low                                      |

### Data Flow

```
WorkflowNodeExecutionModel      NodeTokenBufferMemory              LLM Node
      │                                 │                                   │
      │                                 │◀── get_history_prompt_messages()  │
      │                                 │                                   │
      │  SELECT outputs FROM            │                                   │
      │  workflow_node_executions       │                                   │
      │  WHERE workflow_run_id = ?      │                                   │
      │  AND node_id = ?                │                                   │
      │◀────────────────────────────────┤                                   │
      │                                 │                                   │
      │  outputs["context"]             │                                   │
      ├────────────────────────────────▶│                                   │
      │                                 │                                   │
      │                                 │ deserialize PromptMessages        │
      │                                 │                                   │
      │                                 │ truncate by max_token_limit       │
      │                                 │                                   │
      │                                 │  Sequence[PromptMessage]          │
      │                                 ├──────────────────────────────────▶│
      │                                 │                                   │
```

### Thread Tracking

Thread extraction still relies on the `Message` table's `parent_message_id` structure:

1. Query the `Message` table for the conversation → get the thread's `workflow_run_id`s
2. Get the last completed `workflow_run_id` in the thread
3. Query `WorkflowNodeExecutionModel` for that execution's `outputs["context"]`

### API

```python
class NodeTokenBufferMemory:
    def __init__(
        self,
        app_id: str,
        conversation_id: str,
        node_id: str,
        tenant_id: str,
        model_instance: ModelInstance,
    ):
        """Initialize node-level memory."""
        ...

    def get_history_prompt_messages(
        self,
        *,
        max_token_limit: int = 2000,
        message_limit: int | None = None,
    ) -> Sequence[PromptMessage]:
        """
        Retrieve history as a PromptMessage sequence.

        Reads from the last completed execution's outputs["context"].
        """
        ...

    # Legacy methods (no-op, kept for compatibility)
    def add_messages(self, *args, **kwargs) -> None:
        pass

    def flush(self) -> None:
        pass

    def clear(self) -> None:
        pass
```

### Configuration

Add to `MemoryConfig` in `core/workflow/nodes/llm/entities.py`:

```python
class MemoryMode(StrEnum):
    CONVERSATION = "conversation"  # Use TokenBufferMemory (default)
    NODE = "node"                  # Use NodeTokenBufferMemory (Chatflow only)


class MemoryConfig(BaseModel):
    role_prefix: RolePrefix | None = None
    window: MemoryWindowConfig | None = None
    query_prompt_template: str | None = None
    mode: MemoryMode = MemoryMode.CONVERSATION
```

**Mode Behavior:**

| Mode           | Memory Class          | Scope                    | Availability  |
| -------------- | --------------------- | ------------------------ | ------------- |
| `conversation` | TokenBufferMemory     | Entire conversation      | All app modes |
| `node`         | NodeTokenBufferMemory | Per-node in conversation | Chatflow only |

> When `mode=node` is used in a non-Chatflow context (no `conversation_id`), it falls back to no memory.
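How a node might pick a memory implementation from this configuration is sketched below. This is an illustrative outline of the mode table and fallback note above, not the actual node code; the `select_memory` helper, its parameters, and the import paths for `NodeTokenBufferMemory`, `MemoryMode`, and `MemoryConfig` are assumptions.

```python
# Sketch only: how a node could choose a memory class from MemoryConfig.mode.
from core.memory.token_buffer_memory import TokenBufferMemory
# NodeTokenBufferMemory / MemoryMode import paths are assumed to follow the
# modules described in this document.


def select_memory(
    memory_config,        # MemoryConfig | None from the node's configuration
    conversation_id,      # str | None - only present in Chatflow
    *,
    app_id,
    node_id,
    tenant_id,
    model_instance,
    conversation=None,    # Conversation object, needed for conversation mode
):
    """Pick a memory implementation according to the mode table above."""
    if memory_config is None:
        return None  # memory disabled for this node

    if memory_config.mode == MemoryMode.NODE:
        if conversation_id is None:
            # Non-Chatflow context: node memory unavailable, fall back to no memory
            return None
        return NodeTokenBufferMemory(
            app_id=app_id,
            conversation_id=conversation_id,
            node_id=node_id,
            tenant_id=tenant_id,
            model_instance=model_instance,
        )

    # Default: conversation-level memory
    if conversation is None:
        return None
    return TokenBufferMemory(conversation=conversation, model_instance=model_instance)


# Usage (both classes expose the same retrieval call):
# memory = select_memory(...)
# history = memory.get_history_prompt_messages(max_token_limit=2000) if memory else []
```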
---

## Comparison

| Feature        | TokenBufferMemory        | NodeTokenBufferMemory              |
| -------------- | ------------------------ | ---------------------------------- |
| Scope          | Conversation             | Node within Conversation           |
| Storage        | Database (Message table) | WorkflowNodeExecutionModel.outputs |
| Thread Support | Yes                      | Yes                                |
| File Support   | Yes (via MessageFile)    | Yes (via context serialization)    |
| Token Limit    | Yes                      | Yes                                |
| Use Case       | Standard chat apps       | Complex workflows                  |

---

## Extending to Other Nodes

Currently, only the **LLM node** includes `context` in its outputs. To enable node memory for another node:

1. Add `outputs["context"] = self._build_context(prompt_messages, response)` in that node
2. `NodeTokenBufferMemory` will then pick it up automatically (a rough sketch appears at the end of this document)

Nodes that could potentially support this:

- `question_classifier`
- `parameter_extractor`
- `agent`

---

## Future Considerations

1. **Cleanup**: Node memory lifecycle follows `WorkflowNodeExecutionModel`, which already has cleanup mechanisms
2. **Compression**: For very long conversations, consider summarization strategies
3. **Extension**: Other nodes may benefit from node-level memory
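To make the extension path above concrete, here is a hypothetical sketch of how another node (for example `question_classifier`) could emit `outputs["context"]`. The class name, method names, output keys, and `_build_context` body are illustrative assumptions, not existing code.

```python
# Hypothetical sketch - not existing code. Shows the single change a node
# needs: emitting outputs["context"] in the same shape as the LLM node.


class QuestionClassifierNodeSketch:
    def _run(self, prompt_messages, answer_text: str) -> dict:
        return {
            "class_name": answer_text,
            # Same convention as the LLM node: previous user/assistant
            # messages (without the system prompt) plus the current response.
            "context": self._build_context(prompt_messages, answer_text),
        }

    def _build_context(self, prompt_messages, answer_text: str):
        # Placeholder: serialize prior dialogue messages and append the
        # current response, mirroring the LLM node's context format.
        raise NotImplementedError
```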