Commit Graph

175 Commits

Author SHA1 Message Date
stduhpf
8f4ab9add3
feat: support Inpaint models (#511) 2024-12-28 13:04:49 +08:00
stduhpf
cc92a6a1b3
feat: support more LoRA models (#520) 2024-12-28 12:56:44 +08:00
leejet
9578fdcc46 chore: remove rocm5.5 build temporarily 2024-11-30 14:26:29 +08:00
stduhpf
9148b980be
feat: remove type restrictions (#489) 2024-11-30 14:22:15 +08:00
stduhpf
7ce63e740c
feat: flexible model architecture for dit models (Flux & SD3) (#490)
* Refactor: wtype per tensor

* Fix default args

* refactor: fix flux

* Refactor photmaker v2 support

* unet: refactor the refactoring

* Refactor: fix controlnet and tae

* refactor: upscaler

* Refactor: fix runtime type override

* upscaler: use fp16 again

* Refactor: Flexible sd3 arch

* Refactor: Flexible Flux arch

* format code

---------

Co-authored-by: leejet <leejet714@gmail.com>
2024-11-30 14:18:53 +08:00
leejet
4570715727 fix: use ggml_nn_attention in vae 2024-11-24 18:21:31 +08:00
stduhpf
53b415f787
fix: remove default variables in c headers (#478) 2024-11-24 18:10:25 +08:00
leejet
c3eeb669cd sync: update ggml 2024-11-23 13:29:32 +08:00
leejet
b5f4932696 refactor: add some sd vesion helper functions 2024-11-23 13:02:44 +08:00
Erik Scholz
1c168d98a5
fix: repair flash attention support (#386)
* repair flash attention in _ext
this does not fix the currently broken fa behind the define, which is only used by VAE

Co-authored-by: FSSRepo <FSSRepo@users.noreply.github.com>

* make flash attention in the diffusion model a runtime flag
no support for sd3 or video

* remove old flash attention option and switch vae over to attn_ext

* update docs

* format code

---------

Co-authored-by: FSSRepo <FSSRepo@users.noreply.github.com>
Co-authored-by: leejet <leejet714@gmail.com>
2024-11-23 12:39:08 +08:00
William Murray
ea9b647080
docs: update readme, add python bindings (#423) 2024-11-23 11:52:33 +08:00
bssrdf
2b1bc06477
feat: add PhotoMaker Version 2 support (#358)
* first attempt at updating to photomaker v2

* continue adding photomaker v2 modules

* finishing the last few pieces for photomaker v2; id_embeds need to be done by a manual step and pass as an input file

* added a name converter for Photomaker V2; build ok

* more debugging underway

* failing at cuda mat_mul

* updated chunk_half to be more efficient; redo feedforward

* fixed a bug: carefully using ggml_view_4d to get chunks of a tensor; strides need to be recalculated or set properly; still failing at soft_max cuda op

* redo weight calculation and weight*v

* fixed a bug now Photomaker V2 kinds of working

* add python script for face detection (Photomaker V2 needs)

* updated readme for photomaker

* fixed a bug causing PMV1 crashing; both V1 and V2 work

* fixed clean_input_ids for PMV2

* fixed a double counting bug in tokenize_with_trigger_token

* updated photomaker readme

* removed some commented code

* improved reconstructing class word free prompt

* changed reading id_embed to raw binary using existing load tensor function; this is more efficient than using model load and also makes it easier to work with sd server

* minor clean up

---------

Co-authored-by: bssrdf <bssrdf@gmail.com>
2024-11-23 11:50:14 +08:00
Flavio Bizzarri
b99cbfe4dc
docs: update README.md (#452) 2024-11-23 11:46:50 +08:00
Plamen Minev
8c7719fe9a
fix: typo in clip-g encoder arg (#472) 2024-11-23 11:46:00 +08:00
LostRuins Concedo
8f94efafa3
feat: add support for loading F8_E5M2 weights (#460) 2024-11-23 11:45:11 +08:00
fszontagh
07585448ad
docs: update readme (#462) 2024-11-23 11:42:12 +08:00
stduhpf
6ea812256e
feat: add flux 1 lite 8B (freepik) support (#474)
* Flux Lite (Freepik) support

* format code

---------

Co-authored-by: leejet <leejet714@gmail.com>
2024-11-23 11:41:30 +08:00
stduhpf
9b1d90bc23
fix: improve clip text_projection support (#397) 2024-11-23 11:19:27 +08:00
stduhpf
65fa646684
feat: add sd3.5 medium and skip layer guidance support (#451)
* mmdit-x

* add support for sd3.5 medium

* add skip layer guidance support (mmdit only)

* ignore slg if slg_scale is zero (optimization)

* init out_skip once

* slg support for flux (expermiental)

* warn if version doesn't support slg

* refactor slg cli args

* set default slg_scale to 0 (oops)

* format code

---------

Co-authored-by: leejet <leejet714@gmail.com>
2024-11-23 11:15:31 +08:00
leejet
ac54e00760
feat: add sd3.5 support (#445) 2024-10-24 21:58:03 +08:00
stduhpf
14206fd488
fix: fix clip tokenizer (#383) 2024-09-02 22:31:46 +08:00
zhentaoyu
e410aeb534
sync: update ggml to fix large image generation with SYCL backend (#380)
* turn off fast-math on host in SYCL backend

Signed-off-by: zhentaoyu <zhentao.yu@intel.com>

* update ggml for sync some sycl ops

Signed-off-by: zhentaoyu <zhentao.yu@intel.com>

* update sycl readme and ggml

Signed-off-by: zhentaoyu <zhentao.yu@intel.com>

---------

Signed-off-by: zhentaoyu <zhentao.yu@intel.com>
2024-09-02 22:29:35 +08:00
leejet
58d54738e2 docs: add star history 2024-08-28 00:27:54 +08:00
leejet
4f87b232c2 docs: add Vulkan build command 2024-08-28 00:25:31 +08:00
Erik Scholz
e71ddcedad
fix: improve VAE tiling (#372)
* fix and improve: VAE tiling
- properly handle the upper left corner interpolating both x and y
- refactor out lerp
- use smootherstep to preserve more detail and spend less area blending

* actually fix vae tile merging

Co-authored-by: stduhpf <stephduh@live.fr>

* remove the now unused lerp function

---------

Co-authored-by: stduhpf <stephduh@live.fr>
2024-08-28 00:21:12 +08:00
stduhpf
f4c937cb94
fix: add some missing cli args to usage (#363) 2024-08-28 00:17:46 +08:00
Daniele
0362cc4874
fix: fix some typos (#361) 2024-08-28 00:15:37 +08:00
Yu Xing
6c88ad3fd6
fix: resolve naming conflict while llama.cpp and sd.cpp both build (#351) 2024-08-28 00:14:41 +08:00
Daniele
dc0882cdc9
feat: add exponential scheduler (#346)
* feat: added exponential scheduler

* updated README

* improved exponential formatting

---------

Co-authored-by: leejet <leejet714@gmail.com>
2024-08-28 00:13:35 +08:00
Daniele
d00c94844d
feat: add ipndm and ipndm_v samplers (#344) 2024-08-28 00:03:41 +08:00
Daniele
2d4a2f7982
feat: add GITS scheduler (#343) 2024-08-28 00:02:17 +08:00
Tim Miller
353ee93e2d
fix: add enum type to sd_type_t (#293) 2024-08-27 23:57:24 +08:00
soham
2027b16fda
feat: add vulkan backend support (#291)
* Fix includes and init vulkan the same as llama.cpp

* Add Windows Vulkan CI

* Updated ggml submodule

* support epsilon as a parameter for ggml_group_norm

---------

Co-authored-by: Cloudwalk <cloudwalk@icculus.org>
Co-authored-by: Oleg Skutte <00.00.oleg.00.00@gmail.com>
Co-authored-by: leejet <leejet714@gmail.com>
2024-08-27 23:56:09 +08:00
leejet
8847114abf fix: fix issue when applying lora 2024-08-25 22:39:39 +08:00
leejet
5c561eab31 feat: do not convert more flux tensors 2024-08-25 16:01:36 +08:00
leejet
f5997a1951 fix: do not force using f32 for some flux layers
This sometimes leads to worse result
2024-08-25 14:07:22 +08:00
leejet
1bdc767aaf feat: force using f32 for some layers 2024-08-25 13:53:16 +08:00
leejet
79c9fe9556 feat: do not convert some tensors 2024-08-25 13:37:37 +08:00
leejet
28a614769a docs: update docs/flux.md 2024-08-25 13:11:34 +08:00
leejet
c837c5d9cc style: format code 2024-08-25 00:19:37 +08:00
leejet
d08d7fa632 docs: update README.md 2024-08-24 14:38:44 +08:00
leejet
64d231f384
feat: add flux support (#356)
* add flux support

* avoid build failures in non-CUDA environments

* fix schnell support

* add k quants support

* add support for applying lora to quantized tensors

* add inplace conversion support for f8_e4m3 (#359)

in the same way it is done for bf16
like how bf16 converts losslessly to fp32,
f8_e4m3 converts losslessly to fp16

* add xlabs flux comfy converted lora support

* update docs

---------

Co-authored-by: Erik Scholz <Green-Sky@users.noreply.github.com>
2024-08-24 14:29:52 +08:00
zhentaoyu
697d000f49
feat: add SYCL Backend Support for Intel GPUs (#330)
* update ggml and add SYCL CMake option

Signed-off-by: zhentaoyu <zhentao.yu@intel.com>

* hacky CMakeLists.txt for updating ggml in cpu backend

Signed-off-by: zhentaoyu <zhentao.yu@intel.com>

* rebase and clean code

Signed-off-by: zhentaoyu <zhentao.yu@intel.com>

* add sycl in README

Signed-off-by: zhentaoyu <zhentao.yu@intel.com>

* rebase ggml commit

Signed-off-by: zhentaoyu <zhentao.yu@intel.com>

* refine README

Signed-off-by: zhentaoyu <zhentao.yu@intel.com>

* update ggml for supporting sycl tsembd op

Signed-off-by: zhentaoyu <zhentao.yu@intel.com>

---------

Signed-off-by: zhentaoyu <zhentao.yu@intel.com>
2024-08-10 13:42:50 +08:00
leejet
5b8d16aa68 docs: reorganize README.md 2024-08-03 12:06:34 +08:00
leejet
3d854f7917 sync: update ggml submodule url 2024-08-03 11:42:12 +08:00
leejet
4a6e36edc5 sync: update ggml 2024-07-28 18:30:35 +08:00
leejet
73c2176648
feat: add sd3 support (#298) 2024-07-28 15:44:08 +08:00
Phu Tran
9c51d8787f
chore: fix cuda CI (#286) 2024-06-12 23:13:24 +08:00
leejet
f9f0d4685b fix: sample_k_diffusion should be static 2024-06-10 23:04:02 +08:00
leejet
8d2050a5cf sync: update ggml 2024-06-10 22:59:36 +08:00