Will Miao
5494a70f40
chore(tests): commit validation dataset and baseline reports into repo
Move the HF model list from ~/Documents/ into tests/enrich_hf_validation/test_data/
and commit the pipeline validation baseline artifacts (report.json,
preprocessing_audit.json, README snapshots) into baselines/.
Update config.py and run_validation.py defaults to use repo-relative paths
via os.path.dirname(__file__) instead of the hardcoded ~/Documents/ path.
Originates from changes in 8fb00998 (validation pipeline audit).
2026-07-05 17:03:45 +08:00
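As an illustration of the repo-relative defaults described in 5494a70f40, here is a minimal sketch. The helper name repo_relative and the example file name are hypothetical; the actual constants in config.py and run_validation.py are not shown in this log.

```python
import os


def repo_relative(anchor_file, *parts):
    """Build a path relative to the directory that contains anchor_file.

    Hypothetical helper: a module would call it with its own __file__
    so the default no longer depends on the user's home directory.
    """
    base_dir = os.path.dirname(os.path.abspath(anchor_file))
    return os.path.join(base_dir, *parts)


# Assumed usage inside a module such as config.py (file name is a placeholder):
#   MODEL_LIST_PATH = repo_relative(__file__, "test_data", "model_list.txt")
```

Anchoring on the module's own location means the defaults resolve the same way regardless of the current working directory.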
Will Miao
8fb00998a7
feat(agent): fix extract_relevant_section false positives, add validation pipeline audit
- extract_relevant_section: raise the token threshold to >3, verify that
anchor sections contain the basename, require 2+ overlapping heading
tokens, skip TOC-style headings (markdown links), verify heading section size
- metadata_constructor: parse repo_id,model_name.safetensors format
so model_path basename matches real filename
- config: replace hardcoded SUPPORTED_BASE_MODELS with dynamic
init_supported_base_models() using production list_base_models()
- preprocessing_auditor: new Phase 1.5 audit module that fetches each
README, runs extract_relevant_section + clean_readme_for_llm,
records stats and flags, saves raw READMEs for cross-reference
- run_validation: integrate audit phase, add --audit-only mode,
add LLM config consistency check, add ComfyUI root to sys.path
- report_generator: add Preprocessing Audit and Config Warnings
sections to both markdown and JSON reports
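The heading checks listed for extract_relevant_section can be sketched roughly as follows. This is not the project's implementation: heading_matches and its helpers are hypothetical names, and reading "token threshold >3" as "ignore tokens of 3 characters or fewer" is an assumption.

```python
import re

# A heading that contains a markdown link is treated as a table-of-contents entry.
_TOC_LINK = re.compile(r"\[[^\]]*\]\([^)]*\)")


def _tokens(text):
    """Lowercase alphanumeric tokens longer than 3 characters (assumed threshold)."""
    return {t for t in re.split(r"[^a-z0-9]+", text.lower()) if len(t) > 3}


def heading_matches(heading_line, model_basename, min_overlap=2):
    """Return True if a heading plausibly refers to the given model file."""
    if _TOC_LINK.search(heading_line):
        return False  # skip TOC-style headings
    heading = heading_line.lstrip("#").strip()
    shared = _tokens(heading) & _tokens(model_basename)
    return len(shared) >= min_overlap
```

Requiring two shared tokens instead of one is what keeps a generic heading such as "Style notes" from matching every file with "style" in its name.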
2026-07-05 11:18:48 +08:00
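For the "repo_id,model_name.safetensors" entry format mentioned in 8fb00998a7, a minimal parsing sketch might look like this. The function names and the fallback to the repository name are assumptions, not the behaviour of metadata_constructor itself.

```python
import os


def parse_model_entry(entry):
    """Split 'repo_id,filename' into (repo_id, filename or None)."""
    repo_id, sep, filename = entry.strip().partition(",")
    filename = filename.strip() if sep else ""
    return repo_id.strip(), (filename or None)


def model_basename(entry):
    """Return the file stem when a filename is given, else the repo name.

    The repo-name fallback is an assumption made for this sketch.
    """
    repo_id, filename = parse_model_entry(entry)
    if filename:
        return os.path.splitext(os.path.basename(filename))[0]
    return repo_id.rsplit("/", 1)[-1]
```

Deriving the basename from the listed filename, not from the repo id, is what lets section matching look for the real file name in a collection README.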
Will Miao
170c8068c5
feat(agent): enrich_hf_metadata — filename-aware section matching, preview extraction for markdown/HTML/widget, JSON salvage, instance_prompt fallback, and validation suite
- extract_relevant_section(): trim README to model-filename-matching section
for collection repos (download link, anchor ID, heading strategies)
- _strip_standalone_images(): preserve markdown image URLs so LLM can
extract preview_url; strip only HTML <img> tags
- extract_simple_markdown_images(): extract civitai.images from ![]() body
- extract_html_img_tags(): extract from <img src="..."> (deadman44-style)
- extract_gallery_images(): fix widget parser for YAML entries with a "- output:" dash prefix
- _is_heading: exclude </hN> closing tags from boundary detection
- _extract_section: start at matching heading when match IS a heading line
- _try_salvage_json(): recover truncated JSON (close braces/brackets in
LIFO order, close unterminated strings, strip trailing commas)
- PostProcessor: store _llm_confidence, add instance_prompt YAML fallback
- agent_service: pass model_basename to prompt, trim README via
extract_relevant_section before clean_readme_for_llm
- Add tests/enrich_hf_validation/ suite: 100-model pipeline with progress
checkpoint/resume, per-field scoring, markdown+JSON reporting
- Fix evaluation_engine: read _llm_confidence (not _llm_response)
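The markdown and HTML image extraction described in the list above could be approximated with two regular expressions. This is a simplified sketch with hypothetical function names; the project's extract_simple_markdown_images() and extract_html_img_tags() may handle more cases.

```python
import re

# ![alt](url "optional title") -> capture the URL up to whitespace or ')'
_MD_IMAGE = re.compile(r"!\[[^\]]*\]\(\s*([^)\s]+)")

# <img ... src="url" ...> with single or double quotes, case-insensitive
_HTML_IMG = re.compile(r"<img\b[^>]*?\bsrc\s*=\s*[\"']([^\"']+)[\"']", re.IGNORECASE)


def find_markdown_image_urls(text):
    """Return image URLs from markdown ![]() syntax, in document order."""
    return _MD_IMAGE.findall(text)


def find_html_img_urls(text):
    """Return src values from HTML <img> tags, in document order."""
    return _HTML_IMG.findall(text)
```

Regex matching is adequate for README previews, but it is not a full HTML or markdown parser and will miss unusual markup.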
2026-07-04 12:00:15 +08:00
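The truncated-JSON recovery that 170c8068c5 describes for _try_salvage_json() (close unterminated strings, strip trailing commas, close braces and brackets in LIFO order) can be sketched as below. salvage_truncated_json is a hypothetical stand-in written for illustration, not the project's function, and it only strips a comma at the truncation point.

```python
import json
import re


def salvage_truncated_json(text):
    """Best-effort repair of JSON cut off mid-stream; returns the value or None."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass

    stack = []         # closers still owed, most recent opener last
    in_string = False
    escaped = False
    for ch in text:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            stack.append("}")
        elif ch == "[":
            stack.append("]")
        elif ch in "}]" and stack and stack[-1] == ch:
            stack.pop()

    repaired = text
    if in_string:
        if escaped:
            repaired = repaired[:-1]      # drop a dangling backslash
        repaired += '"'                   # close the unterminated string
    repaired = re.sub(r",\s*$", "", repaired.rstrip())  # trailing comma at the cut
    repaired += "".join(reversed(stack))  # close containers in LIFO order

    try:
        return json.loads(repaired)
    except json.JSONDecodeError:
        return None
```

Inputs cut off after a key or a colon cannot be completed this way, so the sketch returns None and does not guess at a value.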