Root cause: 231 concurrent /check-model-exists requests against a 175K-lora
library took ~9.4s of wall-clock time. The bottleneck was two-fold:
1. DownloadedVersionHistoryService called sqlite3.connect() anew for every
query while holding an asyncio.Lock. With a large WAL from 175K entries,
each connect() took ~8ms. Because the lock serialized all 231 requests,
the last one finished ~1848ms (231 × 8ms) after the first started, nearly
all of that spent waiting to acquire the lock.
2. check_model_exists always queried download history even when the model
was found locally. The history result (hasBeenDownloaded /
downloadedVersionIds) is only used by the UI when the model is NOT
found locally; when found, the 'in library' indicator takes priority.
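
The ~1848ms figure in item 1 can be reproduced with simple queueing
arithmetic. The sketch below assumes the lock hands out turns strictly in
arrival order and that every holder keeps it for the same ~8ms; the function
names are illustrative and do not exist in the codebase.

```python
def lock_wait_ms(position: int, hold_ms: float) -> float:
    """Time the request at 1-based `position` waits before it gets the lock."""
    return (position - 1) * hold_ms


def finish_ms(position: int, hold_ms: float) -> float:
    """Time at which that request releases the lock, measured from t=0."""
    return position * hold_ms


# 231 requests, ~8ms per connect() (figures taken from the description above).
last_wait = lock_wait_ms(231, 8)   # time the last request spends queued
last_done = finish_ms(231, 8)      # queue time plus its own connect()
```

Under these assumptions the cost grows linearly with the number of cards on
the page, which is why removing the per-query connect() matters more than
shaving time off any single query.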

Changes:
- downloaded_version_history_service.py: added persistent _get_conn() that
creates the SQLite connection once and reuses it across all queries
- misc_handlers.py: early-return from check_model_exists when the model
exists locally, bypassing the history service entirely (lock skipped)
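
A minimal sketch of both changes, not the actual implementation: the
HistoryService class, the table schema, the "exists" response key and the
local_ids argument are assumptions made for illustration. Only _get_conn,
check_model_exists, hasBeenDownloaded and downloadedVersionIds come from the
description above.

```python
import asyncio
import sqlite3
from typing import List, Optional, Set


class HistoryService:
    """Hypothetical stand-in for DownloadedVersionHistoryService."""

    def __init__(self, db_path: str) -> None:
        self._db_path = db_path
        self._conn: Optional[sqlite3.Connection] = None
        self._lock = asyncio.Lock()
        self.connects = 0  # how many times sqlite3.connect() actually ran

    def _get_conn(self) -> sqlite3.Connection:
        # Create the connection on first use, then reuse it for every query.
        if self._conn is None:
            self._conn = sqlite3.connect(self._db_path)
            self._conn.execute(
                "CREATE TABLE IF NOT EXISTS history "
                "(model_id INTEGER, version_id INTEGER)"
            )
            self.connects += 1
        return self._conn

    async def downloaded_version_ids(self, model_id: int) -> List[int]:
        async with self._lock:
            rows = self._get_conn().execute(
                "SELECT version_id FROM history WHERE model_id = ?",
                (model_id,),
            ).fetchall()
        return [row[0] for row in rows]


async def check_model_exists(
    model_id: int, local_ids: Set[int], history: HistoryService
) -> dict:
    if model_id in local_ids:
        # Found locally: return early, never touching the history lock.
        return {"exists": True}
    ids = await history.downloaded_version_ids(model_id)
    return {
        "exists": False,
        "hasBeenDownloaded": bool(ids),
        "downloadedVersionIds": ids,
    }
```

With this shape, only requests for models missing from the library reach the
lock, and those that do share one connection instead of paying the connect()
cost each time.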

Expected: per-request wait time drops from ~1912ms to <3ms, and wall-clock
time from ~9.4s to <0.3s, for the 175K-lora user's 231-card page.