* add taesd implementation
* taesd gpu offloading
* show seed when generating image with -s -1
* less restrictive with larger images
* cuda: im2col speedup x2
* cuda: group norm speedup x90
* quantized models now work in cuda :)
* fix mem size calculation
---------
Co-authored-by: leejet <leejet714@gmail.com>
* set ggml url to FSSRepo/ggml
* ggml-alloc integration
* offload all functions to gpu
* gguf format + native converter
* merge custom vae to a model
* full offload to gpu
* improve pretty progress
---------
Co-authored-by: leejet <leejet714@gmail.com>
* Write generation parameter exif data into output pngs.
This adds prompt, negative prompt (if nonempty) and other generation
parameters to the output file as a tEXt PNG block, in the same format as
AUTOMATIC1111 webui does.
In order to keep everything free of external library dependencies, I
have somewhat dirtily hacked this into the stb_image_write
implementation.
* Mention png text data in README.md, include "karras" in sampler text
* add Steps/Model/RNG to parameter string
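
For reference, the tEXt block mentioned above has a simple layout: a 4-byte big-endian data length, the chunk type `tEXt`, a keyword, a NUL separator, the text, and a CRC-32 computed over the type and data. The following is a minimal standalone sketch of that layout, not the code hacked into stb_image_write; the names `png_crc32`, `append_be32` and `make_text_chunk` are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Bitwise CRC-32 with the reflected polynomial 0xEDB88320, as PNG requires.
uint32_t png_crc32(const uint8_t* data, size_t len) {
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit) {
            // XOR in the polynomial only when the lowest bit is set.
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return crc ^ 0xFFFFFFFFu;
}

// Append a 32-bit value in big-endian (network) byte order.
static void append_be32(std::vector<uint8_t>& out, uint32_t v) {
    out.push_back(static_cast<uint8_t>(v >> 24));
    out.push_back(static_cast<uint8_t>(v >> 16));
    out.push_back(static_cast<uint8_t>(v >> 8));
    out.push_back(static_cast<uint8_t>(v));
}

// Build one complete tEXt chunk: length, type, keyword, NUL, text, CRC.
std::vector<uint8_t> make_text_chunk(const std::string& keyword,
                                     const std::string& text) {
    // The PNG spec limits keywords to 1..79 bytes.
    assert(!keyword.empty() && keyword.size() <= 79);

    // Type + data; the CRC covers exactly these bytes.
    std::vector<uint8_t> body = {'t', 'E', 'X', 't'};
    body.insert(body.end(), keyword.begin(), keyword.end());
    body.push_back(0);
    body.insert(body.end(), text.begin(), text.end());

    std::vector<uint8_t> chunk;
    // The length field counts the data only, not the 4-byte type.
    append_be32(chunk, static_cast<uint32_t>(body.size() - 4));
    chunk.insert(chunk.end(), body.begin(), body.end());
    append_be32(chunk, png_crc32(body.data(), body.size()));
    return chunk;
}
```

A call such as `make_text_chunk("parameters", prompt_string)` yields bytes that can be written anywhere between the IHDR and IEND chunks. The keyword `parameters` is an assumption about what the webui reads back, and `prompt_string` is a placeholder.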
---------
Co-authored-by: leejet <leejet714@gmail.com>
Concretely, this allows switching to the "Karras" schedule from the
Karras et al. 2022 paper, equivalent to the samplers marked as "Karras"
in the AUTOMATIC1111 WebUI. This choice is in principle orthogonal to
the sampler choice and can be given independently.
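
As an illustration of the schedule described above, the sketch below computes the noise levels by interpolating linearly in sigma^(1/rho) space and appending a final zero, with rho = 7 as in the paper. It is a standalone approximation, not the code from this change: the function name `karras_sigmas` is hypothetical, and `sigma_min` / `sigma_max` are left to the caller because they depend on the model.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Returns n noise levels from sigma_max down to sigma_min, spaced evenly
// in sigma^(1/rho), followed by a trailing 0 for the final denoising step.
std::vector<double> karras_sigmas(int n, double sigma_min, double sigma_max,
                                  double rho = 7.0) {
    assert(n >= 2 && sigma_min > 0.0 && sigma_max > sigma_min);

    const double min_inv_rho = std::pow(sigma_min, 1.0 / rho);
    const double max_inv_rho = std::pow(sigma_max, 1.0 / rho);

    std::vector<double> sigmas;
    sigmas.reserve(static_cast<std::size_t>(n) + 1);
    for (int i = 0; i < n; ++i) {
        // t runs from 0 (sigma_max) to 1 (sigma_min).
        const double t = static_cast<double>(i) / static_cast<double>(n - 1);
        sigmas.push_back(
            std::pow(max_inv_rho + t * (min_inv_rho - max_inv_rho), rho));
    }
    sigmas.push_back(0.0);
    return sigmas;
}
```

Because the function only produces a list of sigmas, any sampler that consumes such a list can use it, which is the sense in which the schedule is independent of the sampler.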
* Add Euler sampler
* Add Heun sampler
* Add DPM++ (2M) sampler
* Add modified DPM++ (2M) "v2" sampler.
This was proposed in an issue discussion of the stable diffusion webui,
at https://github.com/AUTOMATIC1111/stable-diffusion-webui/discussions/8457
and apparently works around overstepping of the DPM++ (2M) method with
small step counts.
The parameter is called dpmpp2mv2 here.
* match code style
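
To show how the two simplest samplers in the list above differ, here is a hedged sketch of Euler and Heun stepping over a decreasing sigma list, written for a single scalar instead of a latent tensor. It follows the usual k-diffusion formulation and is not the code added in this change; the `Denoiser` callback and the names `sample_euler` and `sample_heun` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical denoiser: returns the model's estimate of the clean sample
// given the noisy value x at noise level sigma.
using Denoiser = std::function<double(double x, double sigma)>;

// Euler: one model call per step, first-order update along d = dx/dsigma.
double sample_euler(double x, const std::vector<double>& sigmas,
                    const Denoiser& denoise) {
    assert(sigmas.size() >= 2);
    for (std::size_t i = 0; i + 1 < sigmas.size(); ++i) {
        const double d = (x - denoise(x, sigmas[i])) / sigmas[i];
        x += d * (sigmas[i + 1] - sigmas[i]);
    }
    return x;
}

// Heun: a second model call at the predicted point, then the two slopes
// are averaged. The last step (sigma_next == 0) falls back to Euler to
// avoid dividing by zero.
double sample_heun(double x, const std::vector<double>& sigmas,
                   const Denoiser& denoise) {
    assert(sigmas.size() >= 2);
    for (std::size_t i = 0; i + 1 < sigmas.size(); ++i) {
        const double dt = sigmas[i + 1] - sigmas[i];
        const double d = (x - denoise(x, sigmas[i])) / sigmas[i];
        if (sigmas[i + 1] == 0.0) {
            x += d * dt;
        } else {
            const double x2 = x + d * dt;
            const double d2 = (x2 - denoise(x2, sigmas[i + 1])) / sigmas[i + 1];
            x += 0.5 * (d + d2) * dt;
        }
    }
    return x;
}
```

Heun costs roughly twice as many model evaluations per step as Euler, in exchange for second-order accuracy.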
---------
Co-authored-by: Urs Ganse <urs@nerd2nerd.org>
Co-authored-by: leejet <leejet714@gmail.com>
* move main and stb-libs to subfolder
* cmake : general additions
* ci : add simple building
---------
Co-authored-by: leejet <31925346+leejet@users.noreply.github.com>