* add taesd implementation
* taesd gpu offloading
* show seed when generating image with -s -1
* less restrictive with larger images
* cuda: im2col speedup x2
* cuda: group norm speedup x90
* quantized models now works in cuda :)
* fix cal mem size
---------
Co-authored-by: leejet <leejet714@gmail.com>
* set ggml url to FSSRepo/ggml
* ggml-alloc integration
* offload all functions to gpu
* gguf format + native converter
* merge custom vae to a model
* full offload to gpu
* improve pretty progress
---------
Co-authored-by: leejet <leejet714@gmail.com>