feat(agent): add LLM-powered metadata enrichment system with AgentCLI and PostProcessor

Introduce an agent skill framework for LLM-driven metadata enrichment:

- AgentCLI (py/agent_cli/): in-process wrappers around internal services
  using standard relative imports, eliminating the need for sys.path hacks
- LLMService: centralized BYOK (bring-your-own-key) LLM client supporting
  OpenAI, Ollama, and custom OpenAI-compatible endpoints
- PostProcessor: deterministic engine that applies LLM output via AgentCLI
  (replaces old handler.py + _BASE_MODEL_ALIASES approach)
- SkillRegistry: filesystem-based skill discovery (skill.yaml + prompt.md)
- AgentService: orchestrates skill execution with WebSocket progress
- Frontend AgentManager: WebSocket listeners, skill execution, config UI
- Context menu entries (single + bulk) for "Enrich Metadata (Agent)"
- Settings UI for AI Provider configuration (BYOK)
- Full i18n support across 9 locales

Bug fixes found during review:

- aiohttp.web.json_response: status_code= -> status=
- settings_modal cancelEditApiKey: wrong argument position
- AgentManager.isLlmConfigured: allow Ollama without API key
- PostProcessor._merge_tags: lowercase all tags to match TagUpdateService
2026-07-03 07:51:16 -03:00 · 2026-07-02 20:51:11 +08:00
parent fe90f7f9b1
commit cf898da193
44 changed files with 5937 additions and 2180 deletions
--- a/py/services/agent/skills/enrich_hf_metadata/prompt.md
+++ b/py/services/agent/skills/enrich_hf_metadata/prompt.md
@@ -0,0 +1,77 @@
+You are an expert assistant for AI image generation models. Your task is to extract structured metadata from a HuggingFace model card (README.md).
+
+## Model Information
+
+- **Repository**: {{hf_url}}
+- **Model file path**: {{model_path}}
+- **Repository ID**: {{repo}}
+
+## Current Metadata (may be incomplete)
+
+```json
+{{current_metadata}}
+```
+
+## HuggingFace README Content
+
+```
+{{readme_content}}
+```
+
+## Extraction Instructions
+
+Extract the following information from the README content above:
+
+### base_model
+The base model this LoRA/checkpoint was trained on. Use EXACTLY one of the names from the **Available Base Models** list below. Do not invent new names or use aliases.
+
+Available Base Models:
+{{base_models}}
+
+Check the YAML frontmatter (between --- markers) for `base_model:` first, then look at the description text and safetensors metadata. If you cannot determine it, return an empty string.
+
+### trigger_words
+The trigger words or activation prompts needed to use this LoRA. Look for:
+- `instance_prompt:` in the YAML frontmatter
+- Phrases like "trigger word:", "trigger:", "use this prompt:", "activation prompt:"
+- Example prompts at the start (usually the first word or phrase before any description)
+Return them as an array of strings. If none are found, return an empty array.
+
+### description
+A concise 1-2 sentence summary of what this model does. Extract it from the "Model description" section or the first paragraph. Return an empty string if the README is too sparse to summarize.
+
+### tags
+3-8 relevant tags for categorizing this model. Extract from:
+- The YAML frontmatter `tags:` list (often contains excellent categorization tags)
+- The model type (e.g. "lora", "checkpoint", "flux", "sdxl")
+- The style/subject (e.g. "anime", "photorealistic", "style", "character")
+Use all-lowercase tags with no spaces. Return an empty array if none are found.
+
+### preview_url
+The URL of the most suitable preview image from the README. Look for image tags (e.g. `![alt](url)`) and the YAML frontmatter `widget:` section (which often has `output.url` fields). Choose the first image that appears to be a generation example (not a logo or diagram). Construct the absolute URL as `https://huggingface.co/{{repo}}/resolve/main/{filename}`. If no suitable image is found, return an empty string.
+
+### confidence
+Your confidence level in the extracted data:
+- "high" — most fields were explicitly stated in the README
+- "medium" — some fields were inferred from context
+- "low" — most fields are guesses based on limited information
+
+## Output Format
+
+Return ONLY a JSON object with exactly these fields (no markdown fences, no extra text):
+
+{
+  "model_path": "{{model_path}}",
+  "base_model": "<canonical name or empty string>",
+  "trigger_words": ["<word1>", "<word2>"],
+  "description": "<1-2 sentence summary>",
+  "tags": ["<tag1>", "<tag2>"],
+  "preview_url": "<image URL or empty string>",
+  "confidence": "<high|medium|low>"
+}
+
+Important:
+- Only include the JSON object, no other text
+- If a field cannot be determined, use an empty string or empty array
+- Do not fabricate information not supported by the README
+- For base_model, the YAML frontmatter often has `base_model:` with a HuggingFace repo name like "black-forest-labs/FLUX.1-dev"; map this to the matching entry in the Available Base Models list (e.g. "Flux.1 D")
--- a/py/services/agent/skills/enrich_hf_metadata/skill.yaml
+++ b/py/services/agent/skills/enrich_hf_metadata/skill.yaml
@@ -0,0 +1,47 @@
+name: enrich_hf_metadata
+title: "Enrich Metadata from HuggingFace"
+description: >
+  Parse the HuggingFace model card via LLM to extract description, trigger
+  words, base model, tags, and preview image URL. Updates .metadata.json
+  and downloads the preview thumbnail.
+llm_required: true
+model_type_filter: ["lora", "checkpoint", "embedding"]
+input_schema:
+  type: object
+  properties:
+    model_paths:
+      type: array
+      items:
+        type: string
+  required:
+    - model_paths
+output_schema:
+  type: object
+  properties:
+    model_path:
+      type: string
+    base_model:
+      type: string
+    trigger_words:
+      type: array
+      items:
+        type: string
+    description:
+      type: string
+    tags:
+      type: array
+      items:
+        type: string
+    preview_url:
+      type: string
+    confidence:
+      type: string
+      enum: ["high", "medium", "low"]
+  required:
+    - model_path
+    - confidence
+permissions:
+  write_metadata: true
+  write_previews: true
+  network_domains:
+    - "huggingface.co"
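The `output_schema` above requires `model_path` and `confidence`, restricts `confidence` to three values, and types the remaining fields as strings or arrays of strings. As an illustration only, the checks it implies can be written with the standard library as below. The function `validate_skill_output` is a hypothetical name, and whether the commit validates output this way (or with a JSON Schema library) is not shown in this diff.

```python
# Field groups copied from the output_schema in skill.yaml.
REQUIRED_FIELDS = ("model_path", "confidence")
STRING_FIELDS = ("model_path", "base_model", "description", "preview_url", "confidence")
STRING_LIST_FIELDS = ("trigger_words", "tags")
ALLOWED_CONFIDENCE = {"high", "medium", "low"}


def validate_skill_output(result: dict) -> list:
    """Hypothetical helper: return a list of problems; empty means the checks passed."""
    problems = []
    for name in REQUIRED_FIELDS:
        if name not in result:
            problems.append(f"missing required field: {name}")
    for name in STRING_FIELDS:
        if name in result and not isinstance(result[name], str):
            problems.append(f"{name} must be a string")
    for name in STRING_LIST_FIELDS:
        if name in result:
            value = result[name]
            is_string_list = isinstance(value, list) and all(
                isinstance(item, str) for item in value
            )
            if not is_string_list:
                problems.append(f"{name} must be an array of strings")
    confidence = result.get("confidence")
    if isinstance(confidence, str) and confidence not in ALLOWED_CONFIDENCE:
        problems.append(f"confidence must be one of {sorted(ALLOWED_CONFIDENCE)}")
    return problems
```

Collecting every problem instead of stopping at the first one makes it easier to log why a given LLM reply was rejected before any metadata is written.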