* mmdit-x
* add support for sd3.5 medium
* add skip layer guidance support (mmdit only)
* ignore slg if slg_scale is zero (optimization)
* init out_skip once
* slg support for flux (experimental)
* warn if version doesn't support slg
* refactor slg cli args
* set default slg_scale to 0 (oops)
* format code
---------
Co-authored-by: leejet <leejet714@gmail.com>
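The skip layer guidance (slg) entries above can be read as one extra term added on top of classifier-free guidance. The following is a minimal sketch, assuming the formulation used by the SD3.5 reference sampler; the function name `apply_guidance` and its parameters are hypothetical and not taken from this repository.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: combines the denoiser outputs for one sampling step.
//   cond      - prediction for the positive prompt
//   uncond    - prediction for the negative prompt
//   skip_cond - positive-prompt prediction with the selected layers skipped;
//               may be nullptr when slg_scale is zero
std::vector<float> apply_guidance(const std::vector<float>& cond,
                                  const std::vector<float>& uncond,
                                  const std::vector<float>* skip_cond,
                                  float cfg_scale,
                                  float slg_scale) {
    assert(cond.size() == uncond.size());
    // With slg_scale == 0 the extra term vanishes, so the caller can skip
    // the additional forward pass entirely.
    const bool use_slg = slg_scale != 0.0f && skip_cond != nullptr;
    if (use_slg) {
        assert(skip_cond->size() == cond.size());
    }
    std::vector<float> out(cond.size());
    for (size_t i = 0; i < cond.size(); ++i) {
        // Classifier-free guidance.
        out[i] = uncond[i] + cfg_scale * (cond[i] - uncond[i]);
        // Assumed slg term: push away from the skipped-layer prediction.
        if (use_slg) {
            out[i] += slg_scale * (cond[i] - (*skip_cond)[i]);
        }
    }
    return out;
}
```

Passing `nullptr` with a zero scale mirrors the "ignore slg if slg_scale is zero" item: no skipped-layer prediction needs to be computed in that case.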
* fix and improve: VAE tiling
- properly handle the upper-left corner by interpolating along both x and y
- refactor out lerp
- use smootherstep to preserve more detail and spend less area blending
* actually fix vae tile merging
Co-authored-by: stduhpf <stephduh@live.fr>
* remove the now unused lerp function
---------
Co-authored-by: stduhpf <stephduh@live.fr>
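To illustrate the smootherstep blending mentioned above, here is a minimal sketch of how overlapping tile pixels might be merged. It assumes Ken Perlin's smootherstep polynomial and a per-axis weight that ramps from 0 to 1 across the overlap; `tile_weight` and `blend_pixel` are hypothetical names, not the functions in this repository.

```cpp
#include <algorithm>
#include <cassert>

// Smootherstep: 6x^5 - 15x^4 + 10x^3, with x clamped to [0, 1].
// Its first and second derivatives are zero at both ends, so the
// transition is concentrated in the middle of the overlap.
inline float smootherstep(float x) {
    x = std::max(0.0f, std::min(1.0f, x));
    return x * x * x * (x * (x * 6.0f - 15.0f) + 10.0f);
}

// Weight of the new tile at offset `pos` into an overlap of `overlap` pixels.
inline float tile_weight(int pos, int overlap) {
    assert(overlap > 0);
    return smootherstep(static_cast<float>(pos) / static_cast<float>(overlap));
}

// Blend one pixel. The x and y weights are multiplied, so a pixel in the
// upper-left corner of a tile is interpolated along both axes at once.
inline float blend_pixel(float old_value, float new_value,
                         int x, int y, int overlap_x, int overlap_y) {
    const float w = tile_weight(x, overlap_x) * tile_weight(y, overlap_y);
    return old_value + w * (new_value - old_value);
}
```

Compared with a plain linear ramp, this curve stays closer to 0 or 1 near the edges of the overlap, which is one plausible reading of "spend less area blending".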
* Fix includes and init vulkan the same as llama.cpp
* Add Windows Vulkan CI
* Updated ggml submodule
* support epsilon as a parameter for ggml_group_norm
---------
Co-authored-by: Cloudwalk <cloudwalk@icculus.org>
Co-authored-by: Oleg Skutte <00.00.oleg.00.00@gmail.com>
Co-authored-by: leejet <leejet714@gmail.com>
* add flux support
* avoid build failures in non-CUDA environments
* fix schnell support
* add k quants support
* add support for applying lora to quantized tensors
* add inplace conversion support for f8_e4m3 (#359)
in the same way it is done for bf16
like how bf16 converts losslessly to fp32,
f8_e4m3 converts losslessly to fp16
* add xlabs flux comfy converted lora support
* update docs
---------
Co-authored-by: Erik Scholz <Green-Sky@users.noreply.github.com>
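The lossless f8_e4m3 to fp16 conversion noted above works because fp16 has both more exponent bits and more mantissa bits, so every finite e4m3 value has an exact fp16 encoding. A minimal sketch of the bit manipulation follows, assuming the "fn" flavor of e4m3 (exponent bias 7, no infinities, NaN encoded as 0x7f/0xff); `f8_e4m3_to_f16_bits` is a hypothetical name, and the repository's own routine may differ.

```cpp
#include <cassert>
#include <cstdint>

// Convert an f8_e4m3 byte to the bit pattern of the equivalent IEEE fp16.
// Assumed layout: 1 sign bit, 4 exponent bits (bias 7), 3 mantissa bits.
uint16_t f8_e4m3_to_f16_bits(uint8_t f8) {
    const unsigned sign = (f8 & 0x80u) << 8;
    int exp = (f8 >> 3) & 0x0F;
    unsigned mant = f8 & 0x07u;

    if (exp == 0x0F && mant == 0x07u) {
        return static_cast<uint16_t>(sign | 0x7E00u);  // NaN
    }
    if (exp == 0 && mant == 0) {
        return static_cast<uint16_t>(sign);  // signed zero
    }
    if (exp == 0) {
        // Subnormal in e4m3 (value = mant * 2^-9) is a normal number in
        // fp16, so shift the mantissa until its implicit leading bit appears.
        assert(mant != 0);
        exp = 1;
        while ((mant & 0x08u) == 0) {
            mant <<= 1;
            --exp;
        }
        mant &= 0x07u;
    }
    // Re-bias the exponent (15 - 7 = 8) and widen the mantissa from 3 to 10 bits.
    const unsigned exp16 = static_cast<unsigned>(exp + 8);
    return static_cast<uint16_t>(sign | (exp16 << 10) | (mant << 7));
}
```

No rounding occurs at any step: the mantissa is only shifted left into a wider field.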
This change makes checkpoints load significantly faster by optimizing
pkzip's cyclic redundancy check. This code was developed by Intel,
Google, and Mozilla. See Chromium's zlib codebase for further details.
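For context, the check being optimized is the standard CRC-32 used by zip archives. The sketch below is only the plain table-driven baseline (reflected polynomial 0xEDB88320), shown to make clear what work is done per byte; it is not the optimized code from Chromium's zlib, and `crc32_bytewise` is a hypothetical name.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Build the 256-entry lookup table for the reflected CRC-32 polynomial.
static std::array<uint32_t, 256> make_crc32_table() {
    std::array<uint32_t, 256> table{};
    for (uint32_t i = 0; i < 256; ++i) {
        uint32_t c = i;
        for (int k = 0; k < 8; ++k) {
            c = (c & 1u) ? (0xEDB88320u ^ (c >> 1)) : (c >> 1);
        }
        table[i] = c;
    }
    return table;
}

// Byte-at-a-time CRC-32: one table lookup per input byte.
uint32_t crc32_bytewise(const uint8_t* data, size_t len) {
    assert(data != nullptr || len == 0);
    static const std::array<uint32_t, 256> table = make_crc32_table();
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; ++i) {
        crc = table[(crc ^ data[i]) & 0xFFu] ^ (crc >> 8);
    }
    return crc ^ 0xFFFFFFFFu;
}
```

Because each step depends on the previous one, this loop processes one byte per iteration, which is why a faster implementation matters for multi-gigabyte checkpoints.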
Added NVIDIA's new "Align Your Steps" style scheduler in accordance with their
quick start guide. Currently has handling for SD1.5, SDXL, and SVD, using the
noise levels from their paper to generate the sigma values. Can be selected
using the --schedule ays command line switch. Updates the main.cpp help
message and README to reflect this option; both now also inform the user
of the --color switch.
---------
Co-authored-by: leejet <leejet714@gmail.com>
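The sigma generation described above can be sketched as resampling a short table of noise levels to the requested number of steps by interpolating linearly in log space, which is the approach the quick start guide describes. This is a minimal sketch under that assumption; `loglinear_interp` here is an illustrative reimplementation, and the caller is expected to supply the published noise levels (none are hard-coded).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Resample `levels` (positive noise levels, highest first) to `n` values,
// interpolating linearly between the logarithms of neighbouring entries.
std::vector<float> loglinear_interp(const std::vector<float>& levels, size_t n) {
    assert(levels.size() >= 2 && n >= 2);
    std::vector<float> out(n);
    const float span = static_cast<float>(levels.size() - 1);
    for (size_t i = 0; i < n; ++i) {
        // Fractional position of output i within the input table.
        const float pos = span * static_cast<float>(i) / static_cast<float>(n - 1);
        const size_t lo = std::min(static_cast<size_t>(pos), levels.size() - 2);
        const float frac = pos - static_cast<float>(lo);
        const float log_lo = std::log(levels[lo]);
        const float log_hi = std::log(levels[lo + 1]);
        out[i] = std::exp(log_lo + frac * (log_hi - log_lo));
    }
    return out;
}
```

Interpolating in log space means the midpoint between two levels is their geometric mean rather than their arithmetic mean, which suits sigma values that span several orders of magnitude.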
* apply pmid lora only once for multiple txt2img calls
* add better support for SDXL LoRA
* fix for some sdxl lora, like lcm-lora-xl
---------
Co-authored-by: bssrdf <bssrdf@gmail.com>
Co-authored-by: leejet <leejet714@gmail.com>
* Support png image and resize image with 64 pixels in img2img mode
* update the error information
---------
Co-authored-by: leejet <leejet714@gmail.com>
* Fixed a double free when running multiple backends on the CPU, e.g. CLIP
and the primary backend. In that case the *_backend pointers both pointed
to the same object, which caused a segfault when calling the
StableDiffusionGGML destructor.
* Improve logging to allow for a color switch on the command line interface.
Changed the base log_printf function to not bake the log level directly into
the log buffer, as that information is already passed to the logging function
via the level parameter and it's easier to add in there than strip it out.
* Added a fix for certain SDXL LoRAs that don't seem to follow the expected
naming convention: the tensor names are converted while the LoRA model is
being loaded. Added some logging of useful LoRA loading information. Had to
increase the base size of the GGML graph, as the existing size resulted in an
insufficient graph memory error when using SDXL LoRAs.
* small fixes
---------
Co-authored-by: leejet <leejet714@gmail.com>
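The double free described above comes from two owning pointers that alias the same backend. A minimal sketch of the guard pattern follows; `Backend`, `backend_free`, and `Model` are hypothetical stand-ins used only for illustration and are not the types in this repository.

```cpp
#include <cassert>

// Hypothetical stand-in for a backend handle.
struct Backend {
    int id;
};

// Counts releases so the behaviour can be observed in a test.
static int g_free_count = 0;

void backend_free(Backend* b) {
    assert(b != nullptr);
    ++g_free_count;
    delete b;
}

// Hypothetical owner with a primary backend and a CLIP backend that may
// point to the same object when both run on the CPU.
struct Model {
    Backend* backend = nullptr;
    Backend* clip_backend = nullptr;

    ~Model() {
        // Release the secondary backend only if it is a distinct object.
        if (clip_backend != nullptr && clip_backend != backend) {
            backend_free(clip_backend);
        }
        if (backend != nullptr) {
            backend_free(backend);
        }
    }
};
```

Comparing the pointers before freeing is the simplest guard; shared ownership (for example `std::shared_ptr` with a custom deleter) would be an alternative design.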