Commit Graph

209 Commits

Author SHA1 Message Date
vmobilis
655f8a5169
fix: clang complains about needless braces (#618) 2025-03-09 12:26:41 +08:00
idostyle
d7c7a34712
fix: ModelLoader::load_tensors duplicated check (#623)
Introduced in 2b6ec97fe2
2025-03-09 12:23:23 +08:00
vmobilis
81556f3136
chore: silence some warnings about precision loss (#620) 2025-03-09 12:22:39 +08:00
stduhpf
3fb275a67b
fix: suport sdxl embedddings (#621) 2025-03-09 12:21:23 +08:00
leejet
30b3ac8e62 fix: avoid potential dangling pointer problem 2025-03-01 16:58:26 +08:00
leejet
195d170136 sync: update ggml 2025-03-01 12:09:55 +08:00
stduhpf
f50a7f66aa
fix: fix race condition causing inconsistent value for decoder_only (#609) 2025-03-01 11:49:06 +08:00
stduhpf
85e9a12988
fix: preprocess tensor names in tensor types map (#607)
Thank you for your contribution
2025-03-01 11:48:04 +08:00
stduhpf
fbd42b6fc1
fix: fix embeddings with quantized models (#601) 2025-03-01 11:45:39 +08:00
yslai
19d876ee30
feat: implement DDIM with the "trailing" timestep spacing and TCD (#568) 2025-02-22 21:34:22 +08:00
lalala
f27f2b2aa2
docs: add missing --mask and --guidance options to print_usage (#572) 2025-02-22 21:32:37 +08:00
piallai
99609761dc
docs: fix typo in readme (#574) 2025-02-22 21:30:28 +08:00
stduhpf
69c73789fe
fix: force binary mask for inpaint models (#589)
Co-authored-by: leejet <leejet714@gmail.com>
2025-02-22 21:29:57 +08:00
Meng, Hengyu
838beb9b5e
chore: add global SYCL compile flags (#597) 2025-02-22 21:23:58 +08:00
stduhpf
f23b803a6b
fix:: unapply current loras properly (#590) 2025-02-22 21:22:22 +08:00
stduhpf
1be2491dcf
feat: partial LyCORIS support (tucker decomposition for LoCon + LoHa + LoKr) (#577) 2025-02-22 21:19:26 +08:00
Matti Pulkkinen
3753223982
fix: make get_files_from_dir works with absolute path (#598)
Co-authored-by: Matti Pulkkinen <pulkkinen@ultimatium.com>
2025-02-22 21:16:50 +08:00
R0CKSTAR
59ca2b0f16
chore: bump MUSA SDK version to rc3.1.1 (#599)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-02-22 21:14:26 +08:00
vmobilis
d46ed5e184
feat: support JPEG compression (#583) 2025-02-05 16:18:02 +08:00
ag2s20150909
2535ad5a43
chore: fix cuda on github action (#580) 2025-02-05 16:15:41 +08:00
stduhpf
e500d95abd
fix: fix rank 1 loras (#575) 2025-02-05 16:13:17 +08:00
R0CKSTAR
a3cbdf6dcb
chore: SD_USE_CUBLAS => SD_USE_CUDA for MUSA backend (#578)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-02-05 16:11:26 +08:00
piallai
5eb15ef4d0
docs: add CLI-GUI to list (#546) 2025-01-18 13:16:54 +08:00
stduhpf
d9b5942d98
feat: add sdxl v-pred suppport (#536) 2025-01-18 13:15:54 +08:00
stduhpf
587a37b2e2
fix: avoid sd2((non inpaint) crash on v-pred check (#537) 2025-01-18 13:13:34 +08:00
ag2s20150909
4fe83d52cf
chore: fix CUDA on GitHub Action (#567) 2025-01-18 13:12:26 +08:00
null-define
b70aaa672a
chore: fix amd rocm build (#571) 2025-01-18 13:11:39 +08:00
idostyle
27edb765a5
chore: fix CI windows release artifacts (#532) 2025-01-18 13:09:22 +08:00
leejet
dcf91f9e0f chore: change SD_CUBLAS/SD_USE_CUBLAS to SD_CUDA/SD_USE_CUDA 2024-12-28 13:27:51 +08:00
stduhpf
348a54e34a
feat: use pretty-progress for tensor loading (#516) 2024-12-28 13:14:52 +08:00
stduhpf
d50473dc49
feat: support 16 channel tae (taesd/taef1) (#527) 2024-12-28 13:13:48 +08:00
piallai
b5cc1422da
fix: fix typo for skip layers parameters (#492) 2024-12-28 13:12:08 +08:00
R0CKSTAR
5cc74d1f09
feat: support Moore Threads GPU (#529)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2024-12-28 13:08:36 +08:00
stduhpf
0d9d6659a7
fix: fix metal build (#513) 2024-12-28 13:06:17 +08:00
stduhpf
8f4ab9add3
feat: support Inpaint models (#511) 2024-12-28 13:04:49 +08:00
stduhpf
cc92a6a1b3
feat: support more LoRA models (#520) 2024-12-28 12:56:44 +08:00
leejet
9578fdcc46 chore: remove rocm5.5 build temporarily 2024-11-30 14:26:29 +08:00
stduhpf
9148b980be
feat: remove type restrictions (#489) 2024-11-30 14:22:15 +08:00
stduhpf
7ce63e740c
feat: flexible model architecture for dit models (Flux & SD3) (#490)
* Refactor: wtype per tensor

* Fix default args

* refactor: fix flux

* Refactor photmaker v2 support

* unet: refactor the refactoring

* Refactor: fix controlnet and tae

* refactor: upscaler

* Refactor: fix runtime type override

* upscaler: use fp16 again

* Refactor: Flexible sd3 arch

* Refactor: Flexible Flux arch

* format code

---------

Co-authored-by: leejet <leejet714@gmail.com>
2024-11-30 14:18:53 +08:00
leejet
4570715727 fix: use ggml_nn_attention in vae 2024-11-24 18:21:31 +08:00
stduhpf
53b415f787
fix: remove default variables in c headers (#478) 2024-11-24 18:10:25 +08:00
leejet
c3eeb669cd sync: update ggml 2024-11-23 13:29:32 +08:00
leejet
b5f4932696 refactor: add some sd vesion helper functions 2024-11-23 13:02:44 +08:00
Erik Scholz
1c168d98a5
fix: repair flash attention support (#386)
* repair flash attention in _ext
this does not fix the currently broken fa behind the define, which is only used by VAE

Co-authored-by: FSSRepo <FSSRepo@users.noreply.github.com>

* make flash attention in the diffusion model a runtime flag
no support for sd3 or video

* remove old flash attention option and switch vae over to attn_ext

* update docs

* format code

---------

Co-authored-by: FSSRepo <FSSRepo@users.noreply.github.com>
Co-authored-by: leejet <leejet714@gmail.com>
2024-11-23 12:39:08 +08:00
William Murray
ea9b647080
docs: update readme, add python bindings (#423) 2024-11-23 11:52:33 +08:00
bssrdf
2b1bc06477
feat: add PhotoMaker Version 2 support (#358)
* first attempt at updating to photomaker v2

* continue adding photomaker v2 modules

* finishing the last few pieces for photomaker v2; id_embeds need to be done by a manual step and pass as an input file

* added a name converter for Photomaker V2; build ok

* more debugging underway

* failing at cuda mat_mul

* updated chunk_half to be more efficient; redo feedforward

* fixed a bug: carefully using ggml_view_4d to get chunks of a tensor; strides need to be recalculated or set properly; still failing at soft_max cuda op

* redo weight calculation and weight*v

* fixed a bug now Photomaker V2 kinds of working

* add python script for face detection (Photomaker V2 needs)

* updated readme for photomaker

* fixed a bug causing PMV1 crashing; both V1 and V2 work

* fixed clean_input_ids for PMV2

* fixed a double counting bug in tokenize_with_trigger_token

* updated photomaker readme

* removed some commented code

* improved reconstructing class word free prompt

* changed reading id_embed to raw binary using existing load tensor function; this is more efficient than using model load and also makes it easier to work with sd server

* minor clean up

---------

Co-authored-by: bssrdf <bssrdf@gmail.com>
2024-11-23 11:50:14 +08:00
Flavio Bizzarri
b99cbfe4dc
docs: update README.md (#452) 2024-11-23 11:46:50 +08:00
Plamen Minev
8c7719fe9a
fix: typo in clip-g encoder arg (#472) 2024-11-23 11:46:00 +08:00
LostRuins Concedo
8f94efafa3
feat: add support for loading F8_E5M2 weights (#460) 2024-11-23 11:45:11 +08:00
fszontagh
07585448ad
docs: update readme (#462) 2024-11-23 11:42:12 +08:00