Commit Graph

38 Commits

Author SHA1 Message Date
Phu Tran
607e39489f
docs: add Jellybox as UI using sd.cpp (#214) 2024-04-02 12:31:54 +08:00
bssrdf
a469688e30
feat: add TencentARC PhotoMaker support (#179)
* first efforts at implementing photomaker; lots more to do

* added PhotoMakerIDEncoder model in SD

* fixed soem bugs; now photomaker model weights can be loaded into their tensor buffers

* added input id image loading

* added preprocessing inpit id images

* finished get_num_tensors

* fixed a bug in remove_duplicates

* add a get_learned_condition_with_trigger function to do photomaker stuff

* add a convert_token_to_id function for photomaker to extract trigger word's token id

* making progress; need to implement tokenizer decoder

* making more progress; finishing vision model forward

* debugging vision_model outputs

* corrected clip vision model output

* continue making progress in id fusion process

* finished stacked id embedding; to be tested

* remove garbage file

* debuging graph compute

* more progress; now alloc buffer failed

* fixed wtype issue; input images can only be 1 because issue with transformer when batch size > 1 (to be investigated)

* added delayed subject conditioning; now photomaker runs and generates images

* fixed stat_merge_step

* added photomaker lora model (to be tested)

* reworked pmid lora

* finished applying pmid lora; to be tested

* finalized pmid lora

* add a few print tensor; tweak in sample again

* small tweak; still not getting ID faces

* fixed a bug in FuseBlock forward; also remove diag_mask op in for vision transformer; getting better results

* disable pmid lora apply for now; 1 input image seems working; > 1 not working

* turn pmid lora apply back on

* fixed a decode bug

* fixed a bug in ggml's conv_2d, and now > 1 input images working

* add style_ratio as a cli param; reworked encode with trigger for attention weights

* merge commit fixing lora free param buffer error

* change default style ratio to 10%

* added an option to offload vae decoder to CPU for mem-limited gpus

* removing image normalization step seems making ID fidelity much higher

* revert default style ratio back ro 20%

* added an option for normalizing input ID images; cleaned up debugging code

* more clean up

* fixed bugs; now failed with cuda error; likely out-of-mem on GPU

* free pmid model params when required

* photomaker working properly now after merging and adapting to GGMLBlock API

* remove tensor renaming;  fixing names in the photomaker model file

* updated README.md to include instructions and notes for running PhotoMaker

* a bit clean up

* remove -DGGML_CUDA_FORCE_MMQ; more clean up and README update

* add input image requirement in README

* bring back freeing pmid lora params buffer; simply pooled output of CLIPvision

* remove MultiheadAttention2; customized MultiheadAttention

* added a WIN32 get_files_from_dir; turn off Photomakder if receiving no input images

* update docs

* fix ci error

* make stable-diffusion.h a pure c header file

This reverts commit 27887b630db6a92f269f0aef8de9bc9832ab50a9.

* fix ci error

* format code

* reuse get_learned_condition

* reuse pad_tokens

* reuse CLIPVisionModel

* reuse LoraModel

* add --clip-on-cpu

* fix lora name conversion for SDXL

---------

Co-authored-by: bssrdf <bssrdf@gmail.com>
Co-authored-by: leejet <leejet714@gmail.com>
2024-03-12 23:15:17 +08:00
Cyberhan123
583cc5bba2
docs: add binding (#189) 2024-03-03 13:27:07 +08:00
Sean Bailey
193fb620b1
feat: add capability to repeatedly run the upscaler in a row (#174)
* Add in upscale repeater logic

---------

Co-authored-by: leejet <leejet714@gmail.com>
2024-02-24 21:31:01 +08:00
leejet
b6368868d9
feat: introduce GGMLBlock and implement SVD(Broken) (#159)
* introduce GGMLBlock and implement SVD(Broken)

* add sdxl vae warning
2024-02-24 20:06:39 +08:00
Steward Garcia
36ec16ac99
feat: Control Net support + Textual Inversion (embeddings) (#131)
* add controlnet to pipeline

* add cli params

* control strength cli param

* cli param keep controlnet in cpu

* add Textual Inversion

* add canny preprocessor

* refactor: change ggml_type_sizef to ggml_row_size

* process hint once time

* ignore the embedding name case

---------

Co-authored-by: leejet <leejet714@gmail.com>
2024-01-29 22:38:51 +08:00
旺旺碎冰冰
c6071fa82f
feat: add hipBlas support (#94) 2024-01-14 11:53:42 +08:00
leejet
5c614e4bc2
feat: add convert api (#142) 2024-01-14 11:43:24 +08:00
leejet
b139434b57 docs: update README.md 2023-12-31 11:48:41 +08:00
leejet
78ad76f3f4
feat: add SDXL support (#117)
* add SDXL support

* fix the issue with generating large images
2023-12-29 00:16:10 +08:00
Steward Garcia
004dfbef27
feat: implement ESRGAN upscaler + Metal Backend (#104)
* add esrgan upscaler

* add sd_tiling

* support metal backend

* add clip_skip

---------

Co-authored-by: leejet <leejet714@gmail.com>
2023-12-28 23:46:48 +08:00
旺旺碎冰冰
0e64238e4c
feat: implement the complete bpe function (#119)
* implement the complete bpe function
---------

Co-authored-by: leejet <leejet714@gmail.com>
2023-12-23 12:11:07 +08:00
leejet
ac8f5a044c feat: add SD-Turbo support 2023-12-10 13:15:09 +08:00
leejet
968226abb2 docs: update v2-1_768-nonema-pruned.safetensors url 2023-12-05 22:52:19 +08:00
Steward Garcia
134883aec4
feat: add TAESD implementation - faster autoencoder (#88)
* add taesd implementation

* taesd gpu offloading

* show seed when generating image with -s -1

* less restrictive with larger images

* cuda: im2col speedup x2

* cuda: group norm speedup x90

* quantized models now works in cuda :)

* fix cal mem size

---------

Co-authored-by: leejet <leejet714@gmail.com>
2023-12-05 22:40:03 +08:00
leejet
d7af2c2ba9
feat: load weights from safetensors and ckpt (#101) 2023-12-03 15:47:20 +08:00
Steward Garcia
8124588cf1
feat: ggml-alloc integration and gpu acceleration (#75)
* set ggml url to FSSRepo/ggml

* ggml-alloc integration

* offload all functions to gpu

* gguf format + native converter

* merge custom vae to a model

* full offload to gpu

* improve pretty progress

---------

Co-authored-by: leejet <leejet714@gmail.com>
2023-11-26 19:02:36 +08:00
leejet
64f6002457 docs: add contributors info to README.md 2023-11-19 18:35:19 +08:00
leejet
9a9f3daf8e feat: add LoRA support 2023-11-19 17:43:49 +08:00
leejet
536f3af672 feat: add lcm sampler support
This referenced an issue discussion of the stable-diffusion-webui at
https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/13952, which
may not be too perfect.
2023-11-17 22:53:46 +08:00
Robert Bledsaw
29a56f2e98
docs: update README.md (#71)
fixed type in curl command.
2023-10-22 13:03:32 +08:00
Urs Ganse
afec5051cf
feat: write generation parameter exif data into output png (#57)
* Write generation parameter exif data into output pngs.

This adds prompt, negative prompt (if nonempty) and other generation
parameters to the output file as a tEXt PNG block, in the same format as
AUTOMATIC1111 webui does.

In order to keep everything free of external library dependencies, I
have somewhat dirtily hacked this into the stb_image_write
implementation.

* Mention png text data in README.md, include "karras" in sampler text

* add Steps/Model/RNG to parameter string

---------

Co-authored-by: leejet <leejet714@gmail.com>
2023-09-18 21:09:15 +08:00
Urs Ganse
3a25179d52
feat: add DPM2 and DPM++(2s) a samplers (#56)
* Add DPM2 sampler.

* Add DPM++ (2s) a sampler.

* Update README.md with added samplers

---------

Co-authored-by: leejet <leejet714@gmail.com>
2023-09-12 23:02:09 +08:00
Urs Ganse
b6899e8fc2
feat: add Euler, Heun and DPM++ (2M) samplers (#50)
* Add Euler sampler

* Add Heun sampler

* Add DPM++ (2M) sampler

* Add modified DPM++ (2M) "v2" sampler.

This was proposed in a issue discussion of the stable diffusion webui,
at https://github.com/AUTOMATIC1111/stable-diffusion-webui/discussions/8457
and apparently works around overstepping of the DPM++ (2M) method with
small step counts.

The parameter is called dpmpp2mv2 here.

* match code style

---------

Co-authored-by: Urs Ganse <urs@nerd2nerd.org>
Co-authored-by: leejet <leejet714@gmail.com>
2023-09-08 23:47:28 +08:00
leejet
e5a7aec252 feat: add CUDA RNG 2023-09-03 19:24:07 +08:00
leejet
31e77e1573
feat: add SD2.x support (#40) 2023-09-03 16:00:33 +08:00
leejet
008d80a0b1 docs: update README.md 2023-08-25 20:59:18 +08:00
leejet
721cb324af chore: add sd Dockerfile 2023-08-22 22:14:20 +08:00
leejet
a393bebec8 docs: update README.md 2023-08-22 20:45:23 +08:00
Derek Anderson
76b9b2e9a2
docs: update README.md (#23) 2023-08-21 23:53:50 +08:00
leejet
17095dddea
feat: add token weighting support (#13) 2023-08-20 20:28:36 +08:00
leejet
7132027862
docs: update sd path 2023-08-17 23:44:56 +08:00
leejet
8f34dd7cc7 perf: free unused params immediately to reduce memory usage 2023-08-17 00:55:36 +08:00
leejet
cbee3c9a4f docs: update README.md 2023-08-16 22:26:15 +08:00
leejet
58735a2813
feat: add img2img mode (#5) 2023-08-16 01:48:07 +08:00
leejet
3265464090 docs: update README.md 2023-08-14 21:27:29 +08:00
leejet
f5d174f9ab docs: update README.md 2023-08-13 19:47:53 +08:00
leejet
3aca342e60 Initial commit 2023-08-13 16:00:22 +08:00