ComfyUI-Lora-Manager/docs/agent_skills.md
Will Miao cf898da193 feat(agent): add LLM-powered metadata enrichment system with AgentCLI and PostProcessor
Introduce an agent skill framework for LLM-driven metadata enrichment:

- AgentCLI (py/agent_cli/): in-process wrappers around internal services
  using standard relative imports, eliminating the need for sys.path hacks
- LLMService: centralized BYOK (bring-your-own-key) LLM client supporting
  OpenAI, Ollama, and custom OpenAI-compatible endpoints
- PostProcessor: deterministic engine that applies LLM output via AgentCLI
  (replaces old handler.py + _BASE_MODEL_ALIASES approach)
- SkillRegistry: filesystem-based skill discovery (skill.yaml + prompt.md)
- AgentService: orchestrates skill execution with WebSocket progress
- Frontend AgentManager: WebSocket listeners, skill execution, config UI
- Context menu entries (single + bulk) for "Enrich Metadata (Agent)"
- Settings UI for AI Provider configuration (BYOK)
- Full i18n support across 9 locales

Bug fixes found during review:
- aiohttp.web.json_response: status_code= -> status=
- settings_modal cancelEditApiKey: wrong argument position
- AgentManager.isLlmConfigured: allow Ollama without API key
- PostProcessor._merge_tags: lowercase all tags to match TagUpdateService
2026-07-02 21:27:01 +08:00

Agent Skills System

The LoRA Manager agent skills system enables LLM-powered metadata enrichment and other AI-driven tasks. Users configure their own LLM provider (BYOK), and skills are executed through right-click context menu actions.

Architecture

┌───────────────────────────────────────────────┐
│             LoRA Manager Backend              │
│                                               │
│  ┌───────────────┐    ┌────────────────┐      │
│  │ LLMService    │───▶│ LLM Provider   │      │
│  │ (BYOK config, │◀───│ (OpenAI/Ollama │      │
│  │  API calls)   │    │ /custom)       │      │
│  └───────┬───────┘    └────────────────┘      │
│          │                                    │
│  ┌───────▼───────────────────────┐            │
│  │     AgentService              │            │
│  │  (orchestration: validate     │            │
│  │   → LLM call → post-process   │            │
│  │   → WebSocket broadcast)      │            │
│  └───────┬───────────────────────┘            │
│          │                                    │
│  ┌───────▼───────────────────────┐            │
│  │     SkillRegistry             │            │
│  │  ┌─────────────────────────┐  │            │
│  │  │ enrich_hf_metadata:     │  │            │
│  │  │  - skill.yaml           │  │            │
│  │  │  - prompt.md            │  │            │
│  │  │  - handler.py           │  │            │
│  │  └─────────────────────────┘  │            │
│  └───────────────────────────────┘            │
└───────────────────────────────────────────────┘

Key Design Principle

Skills define what to do (prompt + post-processing). The AgentService handles how (LLM calls, validation, progress).

Skills never call the LLM directly. This keeps BYOK configuration centralized and provider-agnostic.

BYOK Configuration

Users configure their LLM provider in Settings → AI Provider:

| Setting | Description | Example |
|---|---|---|
| llm_provider | Provider type | openai, ollama, or custom |
| llm_api_key | API key (not needed for local Ollama) | sk-... |
| llm_api_base | Custom API base URL (empty = provider default) | https://api.openai.com/v1 |
| llm_model | Model name | gpt-4o-mini |

Environment variable overrides: LLM_API_KEY, LLM_MODEL, LLM_API_BASE, LLM_PROVIDER.

Supported Providers

  • OpenAI: Uses https://api.openai.com/v1 by default
  • Ollama (local): Uses http://localhost:11434/v1, no API key required
  • Custom: Any OpenAI-compatible endpoint (vLLM, LM Studio, etc.) — set llm_api_base explicitly
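The settings, environment overrides, and provider defaults above could be resolved roughly as follows. This is a minimal sketch, not the actual LLMService code: the function name `resolve_llm_config`, the `ENV_OVERRIDES` mapping, the fallback to `openai` when no provider is set, and the rule that every provider except Ollama needs an API key are all assumptions made for illustration.

```python
import os
from typing import Mapping, Optional

# Default base URLs as listed above; "custom" has no default.
PROVIDER_DEFAULTS = {
    "openai": "https://api.openai.com/v1",
    "ollama": "http://localhost:11434/v1",
}

# Hypothetical mapping of setting name -> environment variable.
ENV_OVERRIDES = {
    "llm_provider": "LLM_PROVIDER",
    "llm_api_key": "LLM_API_KEY",
    "llm_api_base": "LLM_API_BASE",
    "llm_model": "LLM_MODEL",
}


def resolve_llm_config(settings: Mapping[str, str],
                       env: Optional[Mapping[str, str]] = None) -> dict:
    """Merge stored settings with environment overrides (assumed to win)."""
    env = os.environ if env is None else env
    config = {}
    for key, env_name in ENV_OVERRIDES.items():
        config[key] = (env.get(env_name) or settings.get(key) or "").strip()

    # Assumption: fall back to "openai" when no provider is configured.
    config["llm_provider"] = config["llm_provider"] or "openai"
    provider = config["llm_provider"]

    # An empty base URL means "use the provider default".
    if not config["llm_api_base"]:
        config["llm_api_base"] = PROVIDER_DEFAULTS.get(provider, "")

    if provider == "custom" and not config["llm_api_base"]:
        raise ValueError("custom provider requires llm_api_base")
    # Assumption: only local Ollama may run without an API key.
    if provider != "ollama" and not config["llm_api_key"]:
        raise ValueError(f"{provider} requires llm_api_key")
    return config
```

Passing `env` explicitly keeps the function easy to test; in normal use it reads from `os.environ`.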

Available Skills

enrich_hf_metadata

Enriches HuggingFace-downloaded models with metadata extracted by an LLM from the HF model card.

Entry point: Right-click context menu → "Enrich Metadata (Agent)"

What it does:

  1. Reads the model's .metadata.json to get the hf_url
  2. Fetches the README.md from the HuggingFace repository
  3. Sends the README + local metadata to the LLM for structured extraction
  4. Writes extracted fields to .metadata.json:
    • base_model — only if current value is empty
    • trainedWords — trigger words (LoRA only, if none exist)
    • modelDescription — concise summary (if none exists)
    • tags — merged with existing tags, deduplicated
    • metadata_source — audit trail: agent:enrich_hf_metadata
    • llm_enriched_at — ISO timestamp
  5. Downloads and optimizes a preview image (if the LLM found one in the README)
  6. Updates the scanner cache
  7. Broadcasts WebSocket progress events
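The write rules in step 4 can be sketched as a pure function over a metadata dict. This is an illustrative sketch only: the names `merge_tags` and `apply_enrichment` are hypothetical, the fields are assumed to sit at the top level of .metadata.json, and tags are lowercased on the assumption that this matches TagUpdateService.

```python
from datetime import datetime, timezone


def merge_tags(existing, extracted):
    """Lowercase, merge, and deduplicate tags, keeping first-seen order."""
    merged, seen = [], set()
    for tag in list(existing or []) + list(extracted or []):
        norm = str(tag).strip().lower()
        if norm and norm not in seen:
            seen.add(norm)
            merged.append(norm)
    return merged


def apply_enrichment(metadata: dict, extracted: dict, model_type: str = "lora") -> list:
    """Apply LLM output to metadata in place; return the names of changed fields."""
    updated = []

    # base_model: only if the current value is empty.
    if not metadata.get("base_model") and extracted.get("base_model"):
        metadata["base_model"] = extracted["base_model"]
        updated.append("base_model")

    # trainedWords: LoRA only, and only if none exist yet.
    if (model_type == "lora" and not metadata.get("trainedWords")
            and extracted.get("trainedWords")):
        metadata["trainedWords"] = list(extracted["trainedWords"])
        updated.append("trainedWords")

    # modelDescription: only if none exists.
    if not metadata.get("modelDescription") and extracted.get("modelDescription"):
        metadata["modelDescription"] = extracted["modelDescription"]
        updated.append("modelDescription")

    # tags: merged with existing tags, deduplicated.
    tags = merge_tags(metadata.get("tags"), extracted.get("tags"))
    if tags != (metadata.get("tags") or []):
        metadata["tags"] = tags
        updated.append("tags")

    # Audit trail is written only when something actually changed.
    if updated:
        metadata["metadata_source"] = "agent:enrich_hf_metadata"
        metadata["llm_enriched_at"] = datetime.now(timezone.utc).isoformat()
    return updated
```

Keeping the merge logic free of I/O makes the "never overwrite existing values" behavior easy to unit-test separately from the file writes and cache updates.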

Model types: LoRA, Checkpoint, Embedding

Adding a New Skill

1. Create the skill directory

py/services/agent/skills/<skill_name>/
├── skill.yaml      # Skill metadata and schemas
├── prompt.md       # LLM prompt template
└── handler.py      # Pre-processing and post-processing

2. Write skill.yaml

name: my_skill
title: "My Skill"
description: "What this skill does"
llm_required: true
model_type_filter: ["lora"]  # or null for all types
input_schema:
  type: object
  properties:
    model_paths:
      type: array
      items:
        type: string
  required:
    - model_paths
output_schema:
  type: object
  properties:
    # ... JSON schema for LLM output
permissions:
  write_metadata: true
  write_previews: false
  network_domains:
    - "example.com"
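Once skill.yaml has been parsed into a dict (for example with PyYAML's `yaml.safe_load`), the fields above might be held in a small typed object. The following is a sketch under assumptions: the class name `SkillSpec`, its `from_dict` and `accepts` methods, and the choice of `name` and `title` as the only required keys are hypothetical and are not taken from skill_definition.py.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SkillSpec:  # hypothetical name, for illustration only
    name: str
    title: str
    description: str = ""
    llm_required: bool = True
    model_type_filter: Optional[list] = None  # None (YAML null) = all types
    permissions: dict = field(default_factory=dict)

    @classmethod
    def from_dict(cls, data: dict) -> "SkillSpec":
        """Build a spec from the dict produced by parsing skill.yaml."""
        missing = [key for key in ("name", "title") if not data.get(key)]
        if missing:
            raise ValueError(f"skill.yaml is missing required keys: {missing}")
        return cls(
            name=data["name"],
            title=data["title"],
            description=data.get("description", ""),
            llm_required=bool(data.get("llm_required", True)),
            model_type_filter=data.get("model_type_filter"),
            permissions=dict(data.get("permissions") or {}),
        )

    def accepts(self, model_type: str) -> bool:
        """Return True if the skill may run on the given model type."""
        if self.model_type_filter is None:
            return True
        return model_type.lower() in self.model_type_filter
```

With the example file above, `accepts("lora")` would be true and `accepts("checkpoint")` false.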

3. Write prompt.md

Use {{variable}} placeholders that will be replaced with data from the prepare function:

You are an expert assistant...

Model URL: {{hf_url}}
README content:
{{readme_content}}

Current metadata:
{{current_metadata}}
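Placeholder substitution of this kind can be done with a single regular expression. The sketch below is one possible approach and not necessarily how AgentService renders templates; the function name `render_prompt`, the strict failure on unknown variables, and the JSON serialization of non-string values are assumptions.

```python
import json
import re

# Matches {{name}} and tolerates inner whitespace, e.g. {{ name }}.
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_prompt(template: str, variables: dict) -> str:
    """Replace {{variable}} placeholders with values from `variables`."""

    def substitute(match):
        name = match.group(1)
        if name not in variables:
            # Fail loudly rather than send a half-rendered prompt to the LLM.
            raise KeyError(f"prompt.md references unknown variable: {name}")
        value = variables[name]
        # Dicts and lists (e.g. current metadata) are embedded as JSON.
        return value if isinstance(value, str) else json.dumps(value, indent=2)

    return _PLACEHOLDER.sub(substitute, template)
```

Using a replacement function rather than a replacement string avoids backslash-escape surprises when README content contains sequences such as `\1`.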

4. Write handler.py

async def prepare(model_path: str, input_data: dict) -> dict:
    """Gather context for the LLM prompt. Returns variables for template rendering."""
    return {
        "model_path": model_path,
        # ... other variables used in prompt.md
    }

async def post_process(context) -> dict:
    """Apply the LLM-extracted data to the model."""
    llm_response = context.llm_response
    # ... write metadata, download previews, update cache
    return {
        "success": True,
        "updated_fields": ["base_model", "tags"],
        "errors": [],
    }

Important: Use absolute imports (from py.utils.metadata_manager import MetadataManager). Skills are loaded via importlib.util.spec_from_file_location, which loads handler.py as a standalone module with no parent package, so relative imports fail.
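For reference, file-based loading with importlib typically looks like the sketch below. It is not the SkillRegistry source: the function name `load_handler`, the `agent_skill_` module-name prefix, and the check for both hooks are assumptions. Because the module name contains no dots, the loaded module has no parent package, which is why a statement like `from . import x` fails inside handler.py.

```python
import importlib.util
from pathlib import Path


def load_handler(skill_dir):
    """Load <skill_dir>/handler.py as a standalone module and verify its hooks."""
    skill_dir = Path(skill_dir)
    handler_path = skill_dir / "handler.py"
    module_name = f"agent_skill_{skill_dir.name}"  # hypothetical naming scheme

    spec = importlib.util.spec_from_file_location(module_name, handler_path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {handler_path}")

    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs handler.py top-level code

    # Both hooks described above must be present and callable.
    for hook in ("prepare", "post_process"):
        if not callable(getattr(module, hook, None)):
            raise AttributeError(f"{handler_path} does not define {hook}()")
    return module
```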

5. Test

The skill is automatically discovered by SkillRegistry on startup. Test with:

pytest tests/services/test_agent_service.py

API Endpoints

| Method | Path | Description |
|---|---|---|
| GET | /api/lm/agent/skills | List available skills |
| POST | /api/lm/agent/execute/{skill_name} | Execute a skill (body: {"model_paths": [...]}) |
| POST | /api/lm/agent/cancel | Cancel running skill (stub) |
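A client call to the execute endpoint could be assembled as in the sketch below, using only the standard library. The base URL `http://127.0.0.1:8188` is an assumption about where the server is listening, and `build_execute_request` is a hypothetical helper name; the response shape is not documented here, so the sketch stops at building the request.

```python
import json
import urllib.request
from urllib.parse import quote

# Assumption: address of a locally running instance; adjust to your setup.
BASE_URL = "http://127.0.0.1:8188"


def build_execute_request(skill_name, model_paths, base_url=BASE_URL):
    """Build a POST request for /api/lm/agent/execute/{skill_name}."""
    url = f"{base_url}/api/lm/agent/execute/{quote(skill_name, safe='')}"
    body = json.dumps({"model_paths": list(model_paths)}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )
```

To send it against a running instance, pass the result to `urllib.request.urlopen(...)`; progress is then reported over the WebSocket rather than in the HTTP response alone.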

WebSocket Events

| Type | When | Key fields |
|---|---|---|
| agent_progress | Skill started/processing | skill, status, total, processed, success, current_path |
| agent_progress | Skill completed | skill, status, updated_models, errors, summary |
| agent_progress | Skill error | skill, status, error |
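Since all three events share the type agent_progress, a consumer has to tell them apart by their fields. The sketch below dispatches on which key fields are present, because the concrete values of `status` are not listed here. The function name `describe_agent_event` is hypothetical, and treating `updated_models` as either a list or a count is an assumption.

```python
def describe_agent_event(event: dict) -> str:
    """Turn an agent_progress payload into a one-line status message."""
    if event.get("type") != "agent_progress":
        return ""  # not an agent event; ignore
    skill = event.get("skill", "?")

    # Error event: carries a single `error` field.
    if "error" in event:
        return f"{skill}: failed ({event['error']})"

    # Completion event: carries `updated_models`, `errors`, `summary`.
    if "summary" in event or "updated_models" in event:
        updated = event.get("updated_models") or []
        errors = event.get("errors") or []
        # Assumption: these may arrive as lists or as plain counts.
        n_updated = updated if isinstance(updated, int) else len(updated)
        n_errors = errors if isinstance(errors, int) else len(errors)
        return f"{skill}: done, {n_updated} updated, {n_errors} errors"

    # Otherwise treat it as a started/processing event.
    processed = event.get("processed", 0)
    total = event.get("total", 0)
    return f"{skill}: {processed}/{total} processed"
```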

Security Model

Skills declare permissions in skill.yaml:

  • write_metadata — can write .metadata.json files
  • write_previews — can download/replace preview images
  • network_domains — allowed domains for HTTP requests

These are declarative constraints checked by AgentService. They are defense-in-depth, not a sandbox — the Python process can technically do anything, but the contract is clear and auditable.
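A network_domains check might look like the following sketch. It is an illustration of the idea, not the AgentService implementation: the function name `is_url_allowed` is hypothetical, and allowing subdomains of a listed domain and restricting schemes to http/https are assumptions.

```python
from urllib.parse import urlsplit


def is_url_allowed(url: str, network_domains) -> bool:
    """Return True if `url` targets a domain declared in network_domains."""
    parts = urlsplit(url)
    # Assumption: only plain web requests are ever permitted.
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False

    host = parts.hostname.lower().rstrip(".")
    for domain in network_domains or []:
        domain = domain.lower().strip().rstrip(".")
        # Exact match, or a subdomain of the declared domain. The leading dot
        # prevents "evilexample.com" from matching "example.com".
        if domain and (host == domain or host.endswith("." + domain)):
            return True
    return False
```

A check like this only helps if every HTTP call a skill makes is routed through it, which is consistent with the note above that the permissions are a contract and not a sandbox.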

File Locations

| Component | Path |
|---|---|
| LLMService | py/services/llm_service.py |
| AgentService | py/services/agent/agent_service.py |
| SkillRegistry | py/services/agent/skill_registry.py |
| SkillDefinition | py/services/agent/skill_definition.py |
| Skills directory | py/services/agent/skills/ |
| Route handlers | py/routes/handlers/agent_handlers.py |
| Frontend manager | static/js/managers/AgentManager.js |
| Settings UI | templates/components/modals/settings_modal.html |
| Context menu | templates/components/context_menu.html |