Commit Graph

27 Commits

Author SHA1 Message Date
leejet
195d170136 sync: update ggml 2025-03-01 12:09:55 +08:00
stduhpf
8f4ab9add3
feat: support Inpaint models (#511) 2024-12-28 13:04:49 +08:00
stduhpf
7ce63e740c
feat: flexible model architecture for dit models (Flux & SD3) (#490)
* Refactor: wtype per tensor

* Fix default args

* refactor: fix flux

* Refactor photmaker v2 support

* unet: refactor the refactoring

* Refactor: fix controlnet and tae

* refactor: upscaler

* Refactor: fix runtime type override

* upscaler: use fp16 again

* Refactor: Flexible sd3 arch

* Refactor: Flexible Flux arch

* format code

---------

Co-authored-by: leejet <leejet714@gmail.com>
2024-11-30 14:18:53 +08:00
leejet
b5f4932696 refactor: add some sd vesion helper functions 2024-11-23 13:02:44 +08:00
Erik Scholz
1c168d98a5
fix: repair flash attention support (#386)
* repair flash attention in _ext
this does not fix the currently broken fa behind the define, which is only used by VAE

Co-authored-by: FSSRepo <FSSRepo@users.noreply.github.com>

* make flash attention in the diffusion model a runtime flag
no support for sd3 or video

* remove old flash attention option and switch vae over to attn_ext

* update docs

* format code

---------

Co-authored-by: FSSRepo <FSSRepo@users.noreply.github.com>
Co-authored-by: leejet <leejet714@gmail.com>
2024-11-23 12:39:08 +08:00
bssrdf
2b1bc06477
feat: add PhotoMaker Version 2 support (#358)
* first attempt at updating to photomaker v2

* continue adding photomaker v2 modules

* finishing the last few pieces for photomaker v2; id_embeds need to be done by a manual step and pass as an input file

* added a name converter for Photomaker V2; build ok

* more debugging underway

* failing at cuda mat_mul

* updated chunk_half to be more efficient; redo feedforward

* fixed a bug: carefully using ggml_view_4d to get chunks of a tensor; strides need to be recalculated or set properly; still failing at soft_max cuda op

* redo weight calculation and weight*v

* fixed a bug now Photomaker V2 kinds of working

* add python script for face detection (Photomaker V2 needs)

* updated readme for photomaker

* fixed a bug causing PMV1 crashing; both V1 and V2 work

* fixed clean_input_ids for PMV2

* fixed a double counting bug in tokenize_with_trigger_token

* updated photomaker readme

* removed some commented code

* improved reconstructing class word free prompt

* changed reading id_embed to raw binary using existing load tensor function; this is more efficient than using model load and also makes it easier to work with sd server

* minor clean up

---------

Co-authored-by: bssrdf <bssrdf@gmail.com>
2024-11-23 11:50:14 +08:00
LostRuins Concedo
8f94efafa3
feat: add support for loading F8_E5M2 weights (#460) 2024-11-23 11:45:11 +08:00
stduhpf
6ea812256e
feat: add flux 1 lite 8B (freepik) support (#474)
* Flux Lite (Freepik) support

* format code

---------

Co-authored-by: leejet <leejet714@gmail.com>
2024-11-23 11:41:30 +08:00
stduhpf
65fa646684
feat: add sd3.5 medium and skip layer guidance support (#451)
* mmdit-x

* add support for sd3.5 medium

* add skip layer guidance support (mmdit only)

* ignore slg if slg_scale is zero (optimization)

* init out_skip once

* slg support for flux (expermiental)

* warn if version doesn't support slg

* refactor slg cli args

* set default slg_scale to 0 (oops)

* format code

---------

Co-authored-by: leejet <leejet714@gmail.com>
2024-11-23 11:15:31 +08:00
leejet
ac54e00760
feat: add sd3.5 support (#445) 2024-10-24 21:58:03 +08:00
leejet
79c9fe9556 feat: do not convert some tensors 2024-08-25 13:37:37 +08:00
leejet
c837c5d9cc style: format code 2024-08-25 00:19:37 +08:00
leejet
64d231f384
feat: add flux support (#356)
* add flux support

* avoid build failures in non-CUDA environments

* fix schnell support

* add k quants support

* add support for applying lora to quantized tensors

* add inplace conversion support for f8_e4m3 (#359)

in the same way it is done for bf16
like how bf16 converts losslessly to fp32,
f8_e4m3 converts losslessly to fp16

* add xlabs flux comfy converted lora support

* update docs

---------

Co-authored-by: Erik Scholz <Green-Sky@users.noreply.github.com>
2024-08-24 14:29:52 +08:00
leejet
4a6e36edc5 sync: update ggml 2024-07-28 18:30:35 +08:00
leejet
73c2176648
feat: add sd3 support (#298) 2024-07-28 15:44:08 +08:00
bssrdf
a469688e30
feat: add TencentARC PhotoMaker support (#179)
* first efforts at implementing photomaker; lots more to do

* added PhotoMakerIDEncoder model in SD

* fixed soem bugs; now photomaker model weights can be loaded into their tensor buffers

* added input id image loading

* added preprocessing inpit id images

* finished get_num_tensors

* fixed a bug in remove_duplicates

* add a get_learned_condition_with_trigger function to do photomaker stuff

* add a convert_token_to_id function for photomaker to extract trigger word's token id

* making progress; need to implement tokenizer decoder

* making more progress; finishing vision model forward

* debugging vision_model outputs

* corrected clip vision model output

* continue making progress in id fusion process

* finished stacked id embedding; to be tested

* remove garbage file

* debuging graph compute

* more progress; now alloc buffer failed

* fixed wtype issue; input images can only be 1 because issue with transformer when batch size > 1 (to be investigated)

* added delayed subject conditioning; now photomaker runs and generates images

* fixed stat_merge_step

* added photomaker lora model (to be tested)

* reworked pmid lora

* finished applying pmid lora; to be tested

* finalized pmid lora

* add a few print tensor; tweak in sample again

* small tweak; still not getting ID faces

* fixed a bug in FuseBlock forward; also remove diag_mask op in for vision transformer; getting better results

* disable pmid lora apply for now; 1 input image seems working; > 1 not working

* turn pmid lora apply back on

* fixed a decode bug

* fixed a bug in ggml's conv_2d, and now > 1 input images working

* add style_ratio as a cli param; reworked encode with trigger for attention weights

* merge commit fixing lora free param buffer error

* change default style ratio to 10%

* added an option to offload vae decoder to CPU for mem-limited gpus

* removing image normalization step seems making ID fidelity much higher

* revert default style ratio back ro 20%

* added an option for normalizing input ID images; cleaned up debugging code

* more clean up

* fixed bugs; now failed with cuda error; likely out-of-mem on GPU

* free pmid model params when required

* photomaker working properly now after merging and adapting to GGMLBlock API

* remove tensor renaming;  fixing names in the photomaker model file

* updated README.md to include instructions and notes for running PhotoMaker

* a bit clean up

* remove -DGGML_CUDA_FORCE_MMQ; more clean up and README update

* add input image requirement in README

* bring back freeing pmid lora params buffer; simply pooled output of CLIPvision

* remove MultiheadAttention2; customized MultiheadAttention

* added a WIN32 get_files_from_dir; turn off Photomakder if receiving no input images

* update docs

* fix ci error

* make stable-diffusion.h a pure c header file

This reverts commit 27887b630db6a92f269f0aef8de9bc9832ab50a9.

* fix ci error

* format code

* reuse get_learned_condition

* reuse pad_tokens

* reuse CLIPVisionModel

* reuse LoraModel

* add --clip-on-cpu

* fix lora name conversion for SDXL

---------

Co-authored-by: bssrdf <bssrdf@gmail.com>
Co-authored-by: leejet <leejet714@gmail.com>
2024-03-12 23:15:17 +08:00
leejet
b6368868d9
feat: introduce GGMLBlock and implement SVD(Broken) (#159)
* introduce GGMLBlock and implement SVD(Broken)

* add sdxl vae warning
2024-02-24 20:06:39 +08:00
Steward Garcia
36ec16ac99
feat: Control Net support + Textual Inversion (embeddings) (#131)
* add controlnet to pipeline

* add cli params

* control strength cli param

* cli param keep controlnet in cpu

* add Textual Inversion

* add canny preprocessor

* refactor: change ggml_type_sizef to ggml_row_size

* process hint once time

* ignore the embedding name case

---------

Co-authored-by: leejet <leejet714@gmail.com>
2024-01-29 22:38:51 +08:00
leejet
5c614e4bc2
feat: add convert api (#142) 2024-01-14 11:43:24 +08:00
leejet
7cb41b190f fix: avoid encountering 'std::set undefined' in some environments 2024-01-02 22:37:01 +08:00
leejet
2e79a82f85
refactor: reorganize code and use c api (#133) 2024-01-01 16:22:18 +08:00
Steward Garcia
004dfbef27
feat: implement ESRGAN upscaler + Metal Backend (#104)
* add esrgan upscaler

* add sd_tiling

* support metal backend

* add clip_skip

---------

Co-authored-by: leejet <leejet714@gmail.com>
2023-12-28 23:46:48 +08:00
leejet
8f6b4a39d6
fix: enhance the tokenizer's handing of Unicode (#120) 2023-12-21 00:22:03 +08:00
leejet
69efe3ce2b chore: make code cleaner 2023-12-09 17:35:10 +08:00
Steward Garcia
134883aec4
feat: add TAESD implementation - faster autoencoder (#88)
* add taesd implementation

* taesd gpu offloading

* show seed when generating image with -s -1

* less restrictive with larger images

* cuda: im2col speedup x2

* cuda: group norm speedup x90

* quantized models now works in cuda :)

* fix cal mem size

---------

Co-authored-by: leejet <leejet714@gmail.com>
2023-12-05 22:40:03 +08:00
leejet
8a87b273ad fix: allow model and vae using different format 2023-12-03 17:12:04 +08:00
leejet
d7af2c2ba9
feat: load weights from safetensors and ckpt (#101) 2023-12-03 15:47:20 +08:00