feat(agent): add LLM-powered metadata enrichment system with AgentCLI and PostProcessor

Introduce an agent skill framework for LLM-driven metadata enrichment:

- AgentCLI (py/agent_cli/): in-process wrappers around internal services using standard relative imports, eliminating the need for sys.path hacks
- LLMService: centralized BYOK (bring-your-own-key) LLM client supporting OpenAI, Ollama, and custom OpenAI-compatible endpoints
- PostProcessor: deterministic engine that applies LLM output via AgentCLI (replaces old handler.py + _BASE_MODEL_ALIASES approach)
- SkillRegistry: filesystem-based skill discovery (skill.yaml + prompt.md)
- AgentService: orchestrates skill execution with WebSocket progress
- Frontend AgentManager: WebSocket listeners, skill execution, config UI
- Context menu entries (single + bulk) for "Enrich Metadata (Agent)"
- Settings UI for AI Provider configuration (BYOK)
- Full i18n support across 9 locales

Bug fixes found during review:

- aiohttp.web.json_response: status_code= -> status=
- settings_modal cancelEditApiKey: wrong argument position
- AgentManager.isLlmConfigured: allow Ollama without API key
- PostProcessor._merge_tags: lowercase all tags to match TagUpdateService
2026-07-02 23:41:16 -03:00 · 2026-07-02 20:51:11 +08:00
parent fe90f7f9b1
commit cf898da193
44 changed files with 5937 additions and 2180 deletions
--- a/docs/agent_skills.md
+++ b/docs/agent_skills.md
@@ -0,0 +1,208 @@
+# Agent Skills System
+
+The LoRA Manager agent skills system enables LLM-powered metadata enrichment and other AI-driven tasks. Users configure their own LLM provider (BYOK), and skills are executed through right-click context menu actions.
+
+## Architecture
+
+```
+┌───────────────────────────────────────────────┐
+│             LoRA Manager Backend              │
+│                                               │
+│  ┌───────────────┐    ┌────────────────┐      │
+│  │ LLMService    │───▶│ LLM Provider   │      │
+│  │ (BYOK config, │◀───│ (OpenAI/Ollama │      │
+│  │  API calls)   │    │ /custom)       │      │
+│  └───────┬───────┘    └────────────────┘      │
+│          │                                    │
+│  ┌───────▼───────────────────────┐            │
+│  │     AgentService              │            │
+│  │  (orchestration: validate     │            │
+│  │   → LLM call → post-process   │            │
+│  │   → WebSocket broadcast)      │            │
+│  └───────┬───────────────────────┘            │
+│          │                                    │
+│  ┌───────▼───────────────────────┐            │
+│  │     SkillRegistry             │            │
+│  │  ┌─────────────────────────┐  │            │
+│  │  │ enrich_hf_metadata:     │  │            │
+│  │  │  - skill.yaml           │  │            │
+│  │  │  - prompt.md            │  │            │
+│  │  │  - handler.py           │  │            │
+│  │  └─────────────────────────┘  │            │
+│  └───────────────────────────────┘            │
+└───────────────────────────────────────────────┘
+```
+
+### Key Design Principle
+
+**Skills define *what* to do (prompt + post-processing). The AgentService handles *how* (LLM calls, validation, progress).**
+
+Skills never call the LLM directly. This keeps BYOK configuration centralized and provider-agnostic.
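+
+As a minimal sketch of this split (not the actual `AgentService` code), the orchestrator below owns the LLM call and the progress broadcasts, while the skill only contributes a prompt template plus `prepare`/`post_process` hooks. `Skill`, `PostProcessContext`, `run_skill`, `llm_call`, and `broadcast` are hypothetical names, and the `started`/`completed` status strings are assumed.
+
+```python
+import json
+from dataclasses import dataclass
+from typing import Awaitable, Callable
+
+
+@dataclass
+class PostProcessContext:
+    # Mirrors the `context.llm_response` attribute used by post_process in handler.py.
+    model_path: str
+    llm_response: dict
+
+
+@dataclass
+class Skill:
+    # Hypothetical container; the real SkillDefinition may look different.
+    name: str
+    prompt_template: str
+    prepare: Callable[[str, dict], Awaitable[dict]]
+    post_process: Callable[[PostProcessContext], Awaitable[dict]]
+
+
+async def run_skill(skill: Skill, model_paths: list,
+                    llm_call: Callable[[str], Awaitable[str]],
+                    broadcast: Callable[[dict], Awaitable[None]]) -> list:
+    """Run one skill over several models; the skill itself never receives llm_call."""
+    results = []
+    await broadcast({"type": "agent_progress", "skill": skill.name,
+                     "status": "started", "total": len(model_paths)})  # status value assumed
+    for path in model_paths:
+        variables = await skill.prepare(path, {"model_paths": model_paths})
+        prompt = skill.prompt_template
+        for key, value in variables.items():
+            prompt = prompt.replace("{{" + key + "}}", str(value))
+        raw = await llm_call(prompt)      # provider/BYOK details stay behind llm_call
+        llm_response = json.loads(raw)    # output_schema validation would go here
+        results.append(await skill.post_process(PostProcessContext(path, llm_response)))
+    await broadcast({"type": "agent_progress", "skill": skill.name,
+                     "status": "completed", "processed": len(results)})
+    return results
+```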
+
+## BYOK Configuration
+
+Users configure their LLM provider in **Settings → AI Provider**:
+
+| Setting | Description | Example |
+|---|---|---|
+| `llm_provider` | Provider type | `openai`, `ollama`, or `custom` |
+| `llm_api_key` | API key (not needed for local Ollama) | `sk-...` |
+| `llm_api_base` | Custom API base URL (empty = provider default) | `https://api.openai.com/v1` |
+| `llm_model` | Model name | `gpt-4o-mini` |
+
+Environment variable overrides: `LLM_API_KEY`, `LLM_MODEL`, `LLM_API_BASE`, `LLM_PROVIDER`.
+
+### Supported Providers
+
+- **OpenAI**: Uses `https://api.openai.com/v1` by default
+- **Ollama** (local): Uses `http://localhost:11434/v1`, no API key required
+- **Custom**: Any OpenAI-compatible endpoint (vLLM, LM Studio, etc.) — set `llm_api_base` explicitly
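+
+The settings, environment variables, and default URLs above can be combined as in the following sketch. `resolve_llm_config` and `LLMConfig` are hypothetical, and three rules are assumptions rather than documented behavior: a non-empty environment variable beats the saved setting, an unset provider falls back to `openai`, and every provider except Ollama must have an API key.
+
+```python
+import os
+from dataclasses import dataclass
+from typing import Mapping, Optional
+
+# Default base URLs from the provider list above; "custom" has no default.
+PROVIDER_DEFAULT_BASES = {
+    "openai": "https://api.openai.com/v1",
+    "ollama": "http://localhost:11434/v1",
+}
+
+
+@dataclass
+class LLMConfig:
+    provider: str
+    api_key: str
+    api_base: str
+    model: str
+
+
+def resolve_llm_config(settings: Mapping[str, str],
+                       env: Optional[Mapping[str, str]] = None) -> LLMConfig:
+    """Assumed precedence: non-empty env var > saved setting > provider default."""
+    env = os.environ if env is None else env
+
+    def pick(setting_key: str, env_key: str) -> str:
+        return env.get(env_key) or settings.get(setting_key) or ""
+
+    provider = pick("llm_provider", "LLM_PROVIDER") or "openai"  # assumed fallback
+    api_base = pick("llm_api_base", "LLM_API_BASE") or PROVIDER_DEFAULT_BASES.get(provider, "")
+    if not api_base:
+        raise ValueError(f"provider {provider!r} needs llm_api_base to be set")
+    api_key = pick("llm_api_key", "LLM_API_KEY")
+    if provider != "ollama" and not api_key:
+        # Assumption: only local Ollama may run without a key.
+        raise ValueError(f"provider {provider!r} needs an API key")
+    return LLMConfig(provider, api_key, api_base, pick("llm_model", "LLM_MODEL"))
+```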
+
+## Available Skills
+
+### enrich_hf_metadata
+
+Enriches HuggingFace-downloaded models with metadata extracted by an LLM from the HF model card.
+
+**Entry point**: Right-click context menu → "Enrich Metadata (Agent)"
+
+**What it does**:
+1. Reads the model's `.metadata.json` to get the `hf_url`
+2. Fetches the README.md from the HuggingFace repository
+3. Sends the README + local metadata to the LLM for structured extraction
+4. Writes extracted fields to `.metadata.json`:
+   - `base_model` — only if current value is empty
+   - `trainedWords` — trigger words (LoRA only, if none exist)
+   - `modelDescription` — concise summary (if none exists)
+   - `tags` — merged with existing tags, deduplicated
+   - `metadata_source` — audit trail: `agent:enrich_hf_metadata`
+   - `llm_enriched_at` — ISO timestamp
+5. Downloads and optimizes a preview image, if the LLM found one in the README
+6. Updates the scanner cache
+7. Broadcasts WebSocket progress events
+
+**Model types**: LoRA, Checkpoint, Embedding
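+
+The field rules in step 4 can be sketched as a pure function over the loaded `.metadata.json` dict. `apply_enrichment` and `merge_tags` are hypothetical names (per the commit message, the real code applies LLM output in the PostProcessor); the `"lora"` model-type string is borrowed from the `model_type_filter` example below, lowercasing tags follows the `_merge_tags` fix in the commit message, and stamping the audit fields on every run is an assumption.
+
+```python
+from datetime import datetime, timezone
+
+
+def merge_tags(existing: list, new: list) -> list:
+    """Lowercase, strip, and de-duplicate tags, keeping first-seen order."""
+    seen, merged = set(), []
+    for tag in [*existing, *new]:
+        normalized = tag.strip().lower()
+        if normalized and normalized not in seen:
+            seen.add(normalized)
+            merged.append(normalized)
+    return merged
+
+
+def apply_enrichment(metadata: dict, llm: dict, model_type: str) -> list:
+    """Apply LLM output to metadata in place; return the names of updated fields."""
+    updated = []
+    if not metadata.get("base_model") and llm.get("base_model"):
+        metadata["base_model"] = llm["base_model"]
+        updated.append("base_model")
+    if model_type == "lora" and not metadata.get("trainedWords") and llm.get("trainedWords"):
+        metadata["trainedWords"] = list(llm["trainedWords"])
+        updated.append("trainedWords")
+    if not metadata.get("modelDescription") and llm.get("modelDescription"):
+        metadata["modelDescription"] = llm["modelDescription"]
+        updated.append("modelDescription")
+    if llm.get("tags"):
+        merged = merge_tags(metadata.get("tags") or [], llm["tags"])
+        if merged != metadata.get("tags"):
+            metadata["tags"] = merged
+            updated.append("tags")
+    # Audit trail, stamped on every run in this sketch.
+    metadata["metadata_source"] = "agent:enrich_hf_metadata"
+    metadata["llm_enriched_at"] = datetime.now(timezone.utc).isoformat()
+    return updated
+```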
+
+## Adding a New Skill
+
+### 1. Create the skill directory
+
+```
+py/services/agent/skills/<skill_name>/
+├── skill.yaml      # Skill metadata and schemas
+├── prompt.md       # LLM prompt template
+└── handler.py      # Pre-processing and post-processing
+```
+
+### 2. Write skill.yaml
+
+```yaml
+name: my_skill
+title: "My Skill"
+description: "What this skill does"
+llm_required: true
+model_type_filter: ["lora"]  # or null for all types
+input_schema:
+  type: object
+  properties:
+    model_paths:
+      type: array
+      items:
+        type: string
+  required:
+    - model_paths
+output_schema:
+  type: object
+  properties:
+    # ... JSON schema for LLM output
+permissions:
+  write_metadata: true
+  write_previews: false
+  network_domains:
+    - "example.com"
+```
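+
+Once `skill.yaml` has been parsed into a dict (for example with PyYAML's `yaml.safe_load`), it can be validated into an immutable spec. This is a hypothetical sketch: `SkillSpec` and `Permissions` are illustrative stand-ins for `SkillDefinition`, the set of required keys and the `llm_required` default are assumptions, and missing permissions default to denied.
+
+```python
+from dataclasses import dataclass
+from typing import Optional, Tuple
+
+
+@dataclass(frozen=True)
+class Permissions:
+    write_metadata: bool = False
+    write_previews: bool = False
+    network_domains: Tuple[str, ...] = ()
+
+
+@dataclass(frozen=True)
+class SkillSpec:
+    name: str
+    title: str
+    description: str
+    llm_required: bool
+    model_type_filter: Optional[Tuple[str, ...]]  # None = applies to every model type
+    input_schema: dict
+    output_schema: dict
+    permissions: Permissions
+
+    @classmethod
+    def from_dict(cls, raw: dict) -> "SkillSpec":
+        """Build a spec from an already-parsed skill.yaml mapping."""
+        for key in ("name", "title", "input_schema", "output_schema"):
+            if key not in raw:
+                raise ValueError(f"skill.yaml is missing {key!r}")
+        perms = raw.get("permissions") or {}
+        type_filter = raw.get("model_type_filter")
+        return cls(
+            name=raw["name"],
+            title=raw["title"],
+            description=raw.get("description", ""),
+            llm_required=bool(raw.get("llm_required", True)),  # assumed default
+            model_type_filter=None if type_filter is None else tuple(type_filter),
+            input_schema=raw["input_schema"],
+            output_schema=raw["output_schema"],
+            permissions=Permissions(
+                write_metadata=bool(perms.get("write_metadata", False)),
+                write_previews=bool(perms.get("write_previews", False)),
+                network_domains=tuple(perms.get("network_domains") or ()),
+            ),
+        )
+
+    def applies_to(self, model_type: str) -> bool:
+        return self.model_type_filter is None or model_type in self.model_type_filter
+```
+
+With PyYAML installed, a registry could call `SkillSpec.from_dict(yaml.safe_load(path.read_text()))` for each `skill.yaml` it discovers.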
+
+### 3. Write prompt.md
+
+Use `{{variable}}` placeholders; each one is replaced with the value of the matching key in the dict returned by the `prepare` function in `handler.py`:
+
+```markdown
+You are an expert assistant...
+
+Model URL: {{hf_url}}
+README content:
+{{readme_content}}
+
+Current metadata:
+{{current_metadata}}
+```
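+
+Placeholder substitution can be done in a single regex pass, sketched below with a hypothetical `render_prompt`. Serializing dicts and lists as indented JSON (for `{{current_metadata}}`) and leaving unknown placeholders untouched are assumptions, not documented behavior.
+
+```python
+import json
+import re
+
+# Matches {{name}} where name is a word-character identifier.
+PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")
+
+
+def render_prompt(template: str, variables: dict) -> str:
+    """Fill {{name}} placeholders in one pass; substituted text is never re-scanned."""
+    def substitute(match: re.Match) -> str:
+        key = match.group(1)
+        if key not in variables:
+            return match.group(0)  # assumption: unknown placeholders are left as-is
+        value = variables[key]
+        if isinstance(value, (dict, list)):
+            return json.dumps(value, indent=2, ensure_ascii=False)  # e.g. current_metadata
+        return str(value)
+
+    return PLACEHOLDER.sub(substitute, template)
+```
+
+Because `re.sub` never rescans the text it substitutes, a README that happens to contain `{{...}}` cannot trigger a second round of substitution.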
+
+### 4. Write handler.py
+
+```python
+async def prepare(model_path: str, input_data: dict) -> dict:
+    """Gather context for the LLM prompt. Returns variables for template rendering."""
+    return {
+        "model_path": model_path,
+        # ... other variables used in prompt.md
+    }
+
+async def post_process(context) -> dict:
+    """Apply the LLM-extracted data to the model."""
+    llm_response = context.llm_response
+    # ... write metadata, download previews, update cache
+    return {
+        "success": True,
+        "updated_fields": ["base_model", "tags"],
+        "errors": [],
+    }
+```
+
+**Important**: Use absolute imports (`from py.utils.metadata_manager import MetadataManager`) because skills are loaded via `importlib.util.spec_from_file_location`, which doesn't support relative imports.
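+
+The loading mechanism looks roughly like the sketch below (`load_handler` and the module names are hypothetical). `spec_from_file_location`, `module_from_spec`, and `exec_module` are standard `importlib` calls; a module loaded this way under an undotted name has no parent package, so a `from .x import y` line inside `handler.py` raises `ImportError`.
+
+```python
+import importlib.util
+import sys
+from pathlib import Path
+from types import ModuleType
+
+
+def load_handler(skill_dir: Path, module_name: str) -> ModuleType:
+    """Load <skill_dir>/handler.py as a standalone module and check its hooks."""
+    handler_path = skill_dir / "handler.py"
+    spec = importlib.util.spec_from_file_location(module_name, handler_path)
+    if spec is None or spec.loader is None:
+        raise ImportError(f"cannot load {handler_path}")
+    module = importlib.util.module_from_spec(spec)
+    sys.modules[module_name] = module  # as in the importlib docs recipe
+    try:
+        spec.loader.exec_module(module)  # relative imports fail here
+    except BaseException:
+        sys.modules.pop(module_name, None)
+        raise
+    for hook in ("prepare", "post_process"):
+        if not callable(getattr(module, hook, None)):
+            raise ImportError(f"{handler_path} does not define {hook}()")
+    return module
+```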
+
+### 5. Test
+
+The skill is automatically discovered by `SkillRegistry` on startup. Test with:
+
+```bash
+pytest tests/services/test_agent_service.py
+```
+
+## API Endpoints
+
+| Method | Path | Description |
+|---|---|---|
+| GET | `/api/lm/agent/skills` | List available skills |
+| POST | `/api/lm/agent/execute/{skill_name}` | Execute a skill (body: `{"model_paths": [...]}`) |
+| POST | `/api/lm/agent/cancel` | Cancel running skill (stub) |
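+
+A minimal standard-library client for the execute endpoint might look like this. `build_execute_request` and `execute_skill` are hypothetical helpers, `http://127.0.0.1:8188` is an assumed local address (adjust it to wherever LoRA Manager is served), and the reply is returned as decoded JSON because its shape is not documented here.
+
+```python
+import json
+import urllib.parse
+import urllib.request
+
+BASE_URL = "http://127.0.0.1:8188"  # assumption: adjust to your install
+
+
+def build_execute_request(skill_name: str, model_paths: list,
+                          base_url: str = BASE_URL) -> urllib.request.Request:
+    """POST /api/lm/agent/execute/{skill_name} with a JSON body of model paths."""
+    url = f"{base_url}/api/lm/agent/execute/{urllib.parse.quote(skill_name)}"
+    body = json.dumps({"model_paths": model_paths}).encode("utf-8")
+    return urllib.request.Request(url, data=body, method="POST",
+                                  headers={"Content-Type": "application/json"})
+
+
+def execute_skill(skill_name: str, model_paths: list, base_url: str = BASE_URL,
+                  timeout: float = 60.0):
+    """Send the request and return the decoded JSON reply."""
+    request = build_execute_request(skill_name, model_paths, base_url)
+    with urllib.request.urlopen(request, timeout=timeout) as response:
+        return json.loads(response.read().decode("utf-8"))
+```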
+
+## WebSocket Events
+
+| Type | When | Key fields |
+|---|---|---|
+| `agent_progress` | Skill started/processing | `skill`, `status`, `total`, `processed`, `success`, `current_path` |
+| `agent_progress` | Skill completed | `skill`, `status`, `updated_models`, `errors`, `summary` |
+| `agent_progress` | Skill error | `skill`, `status`, `error` |
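+
+A client can branch on `status` to render these events. The sketch below does so in Python for consistency with the rest of this document (the shipped listener is `AgentManager.js`). `describe_progress` is hypothetical, and the `completed`/`error` status strings and list-valued `updated_models`/`errors` are assumptions inferred from the table.
+
+```python
+def describe_progress(event: dict) -> str:
+    """Turn one WebSocket message into a one-line status string ("" for other types)."""
+    if event.get("type") != "agent_progress":
+        return ""
+    skill = event.get("skill", "?")
+    status = event.get("status")
+    if status == "error":
+        return f"{skill}: failed ({event.get('error', 'unknown error')})"
+    if status == "completed":
+        updated = event.get("updated_models") or []
+        errors = event.get("errors") or []
+        return f"{skill}: done, {len(updated)} updated, {len(errors)} errors"
+    # Any other status is treated as started/processing.
+    progress = f"{skill}: {event.get('processed', 0)}/{event.get('total', 0)}"
+    current = event.get("current_path")
+    return f"{progress} ({current})" if current else progress
+```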
+
+## Security Model
+
+Skills declare permissions in `skill.yaml`:
+- `write_metadata` — can write `.metadata.json` files
+- `write_previews` — can download/replace preview images
+- `network_domains` — allowed domains for HTTP requests
+
+These are declarative constraints checked by `AgentService`. They are defense-in-depth, not a sandbox — the Python process can technically do anything, but the contract is clear and auditable.
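+
+A `network_domains` check could be as simple as comparing the request's hostname against the allow-list. In this hypothetical `is_url_allowed`, subdomains of an allowed domain are accepted and only `http`/`https` are permitted; both rules are assumptions about how `AgentService` enforces the contract.
+
+```python
+from urllib.parse import urlsplit
+
+
+def is_url_allowed(url: str, allowed_domains) -> bool:
+    """True if url is http(s) and its host equals, or is a subdomain of, an allowed domain."""
+    parts = urlsplit(url)
+    if parts.scheme not in ("http", "https") or not parts.hostname:
+        return False
+    host = parts.hostname.rstrip(".")  # .hostname is already lowercased
+    for domain in allowed_domains:
+        domain = domain.lower().rstrip(".")
+        if host == domain or host.endswith("." + domain):
+            return True
+    return False
+```
+
+Comparing the parsed `hostname` rather than the raw string rejects look-alikes such as `huggingface.co.evil.com` and userinfo tricks such as `https://huggingface.co@evil.com/`.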
+
+## File Locations
+
+| Component | Path |
+|---|---|
+| LLMService | `py/services/llm_service.py` |
+| AgentService | `py/services/agent/agent_service.py` |
+| SkillRegistry | `py/services/agent/skill_registry.py` |
+| SkillDefinition | `py/services/agent/skill_definition.py` |
+| Skills directory | `py/services/agent/skills/` |
+| Route handlers | `py/routes/handlers/agent_handlers.py` |
+| Frontend manager | `static/js/managers/AgentManager.js` |
+| Settings UI | `templates/components/modals/settings_modal.html` |
+| Context menu | `templates/components/context_menu.html` |