feat: add TAESD implementation - faster autoencoder (#88)
* add taesd implementation
* taesd gpu offloading
* show seed when generating image with -s -1
* less restrictive with larger images
* cuda: im2col speedup x2
* cuda: group norm speedup x90
* quantized models now works in cuda :)
* fix cal mem size

---------

Co-authored-by: leejet <leejet714@gmail.com>
parent f99bcd1f76
commit 134883aec4
5
.gitignore
vendored
@@ -8,6 +8,7 @@ test/
 *.bin
 *.exe
 *.gguf
+output*.png
+models*
+!taesd-model.gguf
 *.log
-output.png
-models/
35
README.md
@@ -9,22 +9,23 @@ Inference of [Stable Diffusion](https://github.com/CompVis/stable-diffusion) in
 ## Features
 
 - Plain C/C++ implementation based on [ggml](https://github.com/ggerganov/ggml), working in the same way as [llama.cpp](https://github.com/ggerganov/llama.cpp)
-- Super lightweight and without external dependencies.
+- Super lightweight and without external dependencies
 - SD1.x and SD2.x support
 - 16-bit, 32-bit float support
 - 4-bit, 5-bit and 8-bit integer quantization support
 - Accelerated memory-efficient CPU inference
 - Only requires ~2.3GB when using txt2img with fp16 precision to generate a 512x512 image, enabling Flash Attention just requires ~1.8GB.
 - AVX, AVX2 and AVX512 support for x86 architectures
-- Full CUDA backend for GPU acceleration, for now just for float16 and float32 models. There are some issues with quantized models and CUDA; it will be fixed in the future.
-- Can load ckpt, safetensors and diffusers models/checkpoints. Standalone VAEs models.
+- Full CUDA backend for GPU acceleration.
+- Can load ckpt, safetensors and diffusers models/checkpoints. Standalone VAEs models
 - No need to convert to `.ggml` or `.gguf` anymore!
-- Flash Attention for memory usage optimization (only cpu for now).
+- Flash Attention for memory usage optimization (only cpu for now)
 - Original `txt2img` and `img2img` mode
 - Negative prompt
 - [stable-diffusion-webui](https://github.com/AUTOMATIC1111/stable-diffusion-webui) style tokenizer (not all the features, only token weighting for now)
 - LoRA support, same as [stable-diffusion-webui](https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Features#lora)
 - Latent Consistency Models support (LCM/LCM-LoRA)
+- Faster and memory efficient latent decoding with [TAESD](https://github.com/madebyollin/taesd)
 - Sampling method
 - `Euler A`
 - `Euler`
@@ -47,9 +48,10 @@ Inference of [Stable Diffusion](https://github.com/CompVis/stable-diffusion) in
 - [ ] More sampling methods
 - [ ] Make inference faster
 - The current implementation of ggml_conv_2d is slow and has high memory usage
+- Implement Winograd Convolution 2D for 3x3 kernel filtering
 - [ ] Continuing to reduce memory usage (quantizing the weights of ggml_conv_2d)
 - [ ] Implement BPE Tokenizer
-- [ ] Add [TAESD](https://github.com/madebyollin/taesd) for faster VAE decoding
+- [ ] Implement [Real-ESRGAN](https://github.com/xinntao/Real-ESRGAN/tree/master) upscaler
 - [ ] k-quants support
 
 ## Usage
@@ -122,7 +124,7 @@ cmake --build . --config Release
 ### Run
 
 ```
-usage: ./bin/sd [arguments]
+usage: sd [arguments]
 
 arguments:
 -h, --help show this help message and exit
@@ -131,8 +133,10 @@ arguments:
 If threads <= 0, then threads will be set to the number of CPU physical cores
 -m, --model [MODEL] path to model
 --vae [VAE] path to vae
+--taesd [TAESD_PATH] path to taesd. Using Tiny AutoEncoder for fast decoding (low quality)
 --type [TYPE] weight type (f32, f16, q4_0, q4_1, q5_0, q5_1, q8_0)
-If not specified, the default is the type of the weight file. --lora-model-dir [DIR] lora model directory
+If not specified, the default is the type of the weight file.
+--lora-model-dir [DIR] lora model directory
 -i, --init-img [IMAGE] path to the input image, required by img2img
 -o, --output OUTPUT path to write result image to (default: ./output.png)
 -p, --prompt [PROMPT] the prompt to render
@@ -218,6 +222,23 @@ Here's a simple example:
 | ---- |---- |
 |  | |
 
+## Using TAESD for faster decoding
+
+You can use TAESD to accelerate the decoding of latent images by following these steps:
+
+- Download the model [weights](https://huggingface.co/madebyollin/taesd/blob/main/diffusion_pytorch_model.safetensors).
+
+Or use curl:
+
+```bash
+curl -L -O https://huggingface.co/madebyollin/taesd/resolve/main/diffusion_pytorch_model.safetensors
+```
+
+- Specify the model path using the `--taesd PATH` parameter. Example:
+
+```bash
+sd -m ../models/v1-5-pruned-emaonly.safetensors -p "a lovely cat" --taesd ../models/diffusion_pytorch_model.safetensors
+```
+
 ### Docker
 
|
24596
common/json.hpp
File diff suppressed because it is too large
10130
common/miniz.h
File diff suppressed because it is too large
7987
common/stb_image.h
File diff suppressed because it is too large
File diff suppressed because it is too large
1836
common/zip.c
File diff suppressed because it is too large
509
common/zip.h
@@ -1,509 +0,0 @@
-/*
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
- * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
- * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
- * IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR
- * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
- * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
- * OTHER DEALINGS IN THE SOFTWARE.
- */
-
-#pragma once
-#ifndef ZIP_H
-#define ZIP_H
-
-#include <stdint.h>
-#include <string.h>
-#include <sys/types.h>
-
-#ifndef ZIP_SHARED
-#define ZIP_EXPORT
-#else
-#ifdef _WIN32
-#ifdef ZIP_BUILD_SHARED
-#define ZIP_EXPORT __declspec(dllexport)
-#else
-#define ZIP_EXPORT __declspec(dllimport)
-#endif
-#else
-#define ZIP_EXPORT __attribute__((visibility("default")))
-#endif
-#endif
-
-#ifdef __cplusplus
-extern "C" {
-#endif
-
-#if !defined(_POSIX_C_SOURCE) && defined(_MSC_VER)
-// 64-bit Windows is the only mainstream platform
-// where sizeof(long) != sizeof(void*)
-#ifdef _WIN64
-typedef long long ssize_t; /* byte count or error */
-#else
-typedef long ssize_t; /* byte count or error */
-#endif
-#endif
-
-/**
- * @mainpage
- *
- * Documentation for @ref zip.
- */
-
-/**
- * @addtogroup zip
- * @{
- */
-
-/**
- * Default zip compression level.
- */
-#define ZIP_DEFAULT_COMPRESSION_LEVEL 6
-
-/**
- * Error codes
- */
-#define ZIP_ENOINIT -1 // not initialized
-#define ZIP_EINVENTNAME -2 // invalid entry name
-#define ZIP_ENOENT -3 // entry not found
-#define ZIP_EINVMODE -4 // invalid zip mode
-#define ZIP_EINVLVL -5 // invalid compression level
-#define ZIP_ENOSUP64 -6 // no zip 64 support
-#define ZIP_EMEMSET -7 // memset error
-#define ZIP_EWRTENT -8 // cannot write data to entry
-#define ZIP_ETDEFLINIT -9 // cannot initialize tdefl compressor
-#define ZIP_EINVIDX -10 // invalid index
-#define ZIP_ENOHDR -11 // header not found
-#define ZIP_ETDEFLBUF -12 // cannot flush tdefl buffer
-#define ZIP_ECRTHDR -13 // cannot create entry header
-#define ZIP_EWRTHDR -14 // cannot write entry header
-#define ZIP_EWRTDIR -15 // cannot write to central dir
-#define ZIP_EOPNFILE -16 // cannot open file
-#define ZIP_EINVENTTYPE -17 // invalid entry type
-#define ZIP_EMEMNOALLOC -18 // extracting data using no memory allocation
-#define ZIP_ENOFILE -19 // file not found
-#define ZIP_ENOPERM -20 // no permission
-#define ZIP_EOOMEM -21 // out of memory
-#define ZIP_EINVZIPNAME -22 // invalid zip archive name
-#define ZIP_EMKDIR -23 // make dir error
-#define ZIP_ESYMLINK -24 // symlink error
-#define ZIP_ECLSZIP -25 // close archive error
-#define ZIP_ECAPSIZE -26 // capacity size too small
-#define ZIP_EFSEEK -27 // fseek error
-#define ZIP_EFREAD -28 // fread error
-#define ZIP_EFWRITE -29 // fwrite error
-#define ZIP_ERINIT -30 // cannot initialize reader
-#define ZIP_EWINIT -31 // cannot initialize writer
-#define ZIP_EWRINIT -32 // cannot initialize writer from reader
-
-/**
- * Looks up the error message string corresponding to an error number.
- * @param errnum error number
- * @return error message string corresponding to errnum or NULL if error is not
- * found.
- */
-extern ZIP_EXPORT const char *zip_strerror(int errnum);
-
-/**
- * @struct zip_t
- *
- * This data structure is used throughout the library to represent zip archive -
- * forward declaration.
- */
-struct zip_t;
-
-/**
- * Opens zip archive with compression level using the given mode.
- *
- * @param zipname zip archive file name.
- * @param level compression level (0-9 are the standard zlib-style levels).
- * @param mode file access mode.
- * - 'r': opens a file for reading/extracting (the file must exists).
- * - 'w': creates an empty file for writing.
- * - 'a': appends to an existing archive.
- *
- * @return the zip archive handler or NULL on error
- */
-extern ZIP_EXPORT struct zip_t *zip_open(const char *zipname, int level,
-    char mode);
-
-/**
- * Opens zip archive with compression level using the given mode.
- * The function additionally returns @param errnum -
- *
- * @param zipname zip archive file name.
- * @param level compression level (0-9 are the standard zlib-style levels).
- * @param mode file access mode.
- * - 'r': opens a file for reading/extracting (the file must exists).
- * - 'w': creates an empty file for writing.
- * - 'a': appends to an existing archive.
- * @param errnum 0 on success, negative number (< 0) on error.
- *
- * @return the zip archive handler or NULL on error
- */
-extern ZIP_EXPORT struct zip_t *
-zip_openwitherror(const char *zipname, int level, char mode, int *errnum);
-
-/**
- * Closes the zip archive, releases resources - always finalize.
- *
- * @param zip zip archive handler.
- */
-extern ZIP_EXPORT void zip_close(struct zip_t *zip);
-
-/**
- * Determines if the archive has a zip64 end of central directory headers.
- *
- * @param zip zip archive handler.
- *
- * @return the return code - 1 (true), 0 (false), negative number (< 0) on
- * error.
- */
-extern ZIP_EXPORT int zip_is64(struct zip_t *zip);
-
-/**
- * Opens an entry by name in the zip archive.
- *
- * For zip archive opened in 'w' or 'a' mode the function will append
- * a new entry. In readonly mode the function tries to locate the entry
- * in global dictionary.
- *
- * @param zip zip archive handler.
- * @param entryname an entry name in local dictionary.
- *
- * @return the return code - 0 on success, negative number (< 0) on error.
- */
-extern ZIP_EXPORT int zip_entry_open(struct zip_t *zip, const char *entryname);
-
-/**
- * Opens an entry by name in the zip archive.
- *
- * For zip archive opened in 'w' or 'a' mode the function will append
- * a new entry. In readonly mode the function tries to locate the entry
- * in global dictionary (case sensitive).
- *
- * @param zip zip archive handler.
- * @param entryname an entry name in local dictionary (case sensitive).
- *
- * @return the return code - 0 on success, negative number (< 0) on error.
- */
-extern ZIP_EXPORT int zip_entry_opencasesensitive(struct zip_t *zip,
-    const char *entryname);
-
-/**
- * Opens a new entry by index in the zip archive.
- *
- * This function is only valid if zip archive was opened in 'r' (readonly) mode.
- *
- * @param zip zip archive handler.
- * @param index index in local dictionary.
- *
- * @return the return code - 0 on success, negative number (< 0) on error.
- */
-extern ZIP_EXPORT int zip_entry_openbyindex(struct zip_t *zip, size_t index);
-
-/**
- * Closes a zip entry, flushes buffer and releases resources.
- *
- * @param zip zip archive handler.
- *
- * @return the return code - 0 on success, negative number (< 0) on error.
- */
-extern ZIP_EXPORT int zip_entry_close(struct zip_t *zip);
-
-/**
- * Returns a local name of the current zip entry.
- *
- * The main difference between user's entry name and local entry name
- * is optional relative path.
- * Following .ZIP File Format Specification - the path stored MUST not contain
- * a drive or device letter, or a leading slash.
- * All slashes MUST be forward slashes '/' as opposed to backwards slashes '\'
- * for compatibility with Amiga and UNIX file systems etc.
- *
- * @param zip: zip archive handler.
- *
- * @return the pointer to the current zip entry name, or NULL on error.
- */
-extern ZIP_EXPORT const char *zip_entry_name(struct zip_t *zip);
-
-/**
- * Returns an index of the current zip entry.
- *
- * @param zip zip archive handler.
- *
- * @return the index on success, negative number (< 0) on error.
- */
-extern ZIP_EXPORT ssize_t zip_entry_index(struct zip_t *zip);
-
-/**
- * Determines if the current zip entry is a directory entry.
- *
- * @param zip zip archive handler.
- *
- * @return the return code - 1 (true), 0 (false), negative number (< 0) on
- * error.
- */
-extern ZIP_EXPORT int zip_entry_isdir(struct zip_t *zip);
-
-/**
- * Returns the uncompressed size of the current zip entry.
- * Alias for zip_entry_uncomp_size (for backward compatibility).
- *
- * @param zip zip archive handler.
- *
- * @return the uncompressed size in bytes.
- */
-extern ZIP_EXPORT unsigned long long zip_entry_size(struct zip_t *zip);
-
-/**
- * Returns the uncompressed size of the current zip entry.
- *
- * @param zip zip archive handler.
- *
- * @return the uncompressed size in bytes.
- */
-extern ZIP_EXPORT unsigned long long zip_entry_uncomp_size(struct zip_t *zip);
-
-/**
- * Returns the compressed size of the current zip entry.
- *
- * @param zip zip archive handler.
- *
- * @return the compressed size in bytes.
- */
-extern ZIP_EXPORT unsigned long long zip_entry_comp_size(struct zip_t *zip);
-
-/**
- * Returns CRC-32 checksum of the current zip entry.
- *
- * @param zip zip archive handler.
- *
- * @return the CRC-32 checksum.
- */
-extern ZIP_EXPORT unsigned int zip_entry_crc32(struct zip_t *zip);
-
-/**
- * Compresses an input buffer for the current zip entry.
- *
- * @param zip zip archive handler.
- * @param buf input buffer.
- * @param bufsize input buffer size (in bytes).
- *
- * @return the return code - 0 on success, negative number (< 0) on error.
- */
-extern ZIP_EXPORT int zip_entry_write(struct zip_t *zip, const void *buf,
-    size_t bufsize);
-
-/**
- * Compresses a file for the current zip entry.
- *
- * @param zip zip archive handler.
- * @param filename input file.
- *
- * @return the return code - 0 on success, negative number (< 0) on error.
- */
-extern ZIP_EXPORT int zip_entry_fwrite(struct zip_t *zip, const char *filename);
-
-/**
- * Extracts the current zip entry into output buffer.
- *
- * The function allocates sufficient memory for a output buffer.
- *
- * @param zip zip archive handler.
- * @param buf output buffer.
- * @param bufsize output buffer size (in bytes).
- *
- * @note remember to release memory allocated for a output buffer.
- * for large entries, please take a look at zip_entry_extract function.
- *
- * @return the return code - the number of bytes actually read on success.
- * Otherwise a negative number (< 0) on error.
- */
-extern ZIP_EXPORT ssize_t zip_entry_read(struct zip_t *zip, void **buf,
-    size_t *bufsize);
-
-/**
- * Extracts the current zip entry into a memory buffer using no memory
- * allocation.
- *
- * @param zip zip archive handler.
- * @param buf preallocated output buffer.
- * @param bufsize output buffer size (in bytes).
- *
- * @note ensure supplied output buffer is large enough.
- * zip_entry_size function (returns uncompressed size for the current
- * entry) can be handy to estimate how big buffer is needed.
- * For large entries, please take a look at zip_entry_extract function.
- *
- * @return the return code - the number of bytes actually read on success.
- * Otherwise a negative number (< 0) on error (e.g. bufsize is not large
- * enough).
- */
-extern ZIP_EXPORT ssize_t zip_entry_noallocread(struct zip_t *zip, void *buf,
-    size_t bufsize);
-
-/**
- * Extracts the current zip entry into output file.
- *
- * @param zip zip archive handler.
- * @param filename output file.
- *
- * @return the return code - 0 on success, negative number (< 0) on error.
- */
-extern ZIP_EXPORT int zip_entry_fread(struct zip_t *zip, const char *filename);
-
-/**
- * Extracts the current zip entry using a callback function (on_extract).
- *
- * @param zip zip archive handler.
- * @param on_extract callback function.
- * @param arg opaque pointer (optional argument, which you can pass to the
- * on_extract callback)
- *
- * @return the return code - 0 on success, negative number (< 0) on error.
- */
-extern ZIP_EXPORT int
-zip_entry_extract(struct zip_t *zip,
-    size_t (*on_extract)(void *arg, uint64_t offset,
-    const void *data, size_t size),
-    void *arg);
-
-/**
- * Returns the number of all entries (files and directories) in the zip archive.
- *
- * @param zip zip archive handler.
- *
- * @return the return code - the number of entries on success, negative number
- * (< 0) on error.
- */
-extern ZIP_EXPORT ssize_t zip_entries_total(struct zip_t *zip);
-
-/**
- * Deletes zip archive entries.
- *
- * @param zip zip archive handler.
- * @param entries array of zip archive entries to be deleted.
- * @param len the number of entries to be deleted.
- * @return the number of deleted entries, or negative number (< 0) on error.
- */
-extern ZIP_EXPORT ssize_t zip_entries_delete(struct zip_t *zip,
-    char *const entries[], size_t len);
-
-/**
- * Extracts a zip archive stream into directory.
- *
- * If on_extract is not NULL, the callback will be called after
- * successfully extracted each zip entry.
- * Returning a negative value from the callback will cause abort and return an
- * error. The last argument (void *arg) is optional, which you can use to pass
- * data to the on_extract callback.
- *
- * @param stream zip archive stream.
- * @param size stream size.
- * @param dir output directory.
- * @param on_extract on extract callback.
- * @param arg opaque pointer.
- *
- * @return the return code - 0 on success, negative number (< 0) on error.
- */
-extern ZIP_EXPORT int
-zip_stream_extract(const char *stream, size_t size, const char *dir,
-    int (*on_extract)(const char *filename, void *arg),
-    void *arg);
-
-/**
- * Opens zip archive stream into memory.
- *
- * @param stream zip archive stream.
- * @param size stream size.
- * @param level compression level (0-9 are the standard zlib-style levels).
- * @param mode file access mode.
- * - 'r': opens a file for reading/extracting (the file must exists).
- * - 'w': creates an empty file for writing.
- * - 'a': appends to an existing archive.
- *
- * @return the zip archive handler or NULL on error
- */
-extern ZIP_EXPORT struct zip_t *zip_stream_open(const char *stream, size_t size,
-    int level, char mode);
-
-/**
- * Opens zip archive stream into memory.
- * The function additionally returns @param errnum -
- *
- * @param stream zip archive stream.
- * @param size stream size.*
- * @param level compression level (0-9 are the standard zlib-style levels).
- * @param mode file access mode.
- * - 'r': opens a file for reading/extracting (the file must exists).
- * - 'w': creates an empty file for writing.
- * - 'a': appends to an existing archive.
- * @param errnum 0 on success, negative number (< 0) on error.
- *
- * @return the zip archive handler or NULL on error
- */
-extern ZIP_EXPORT struct zip_t *zip_stream_openwitherror(const char *stream,
-    size_t size, int level,
-    char mode,
-    int *errnum);
-
-/**
- * Copy zip archive stream output buffer.
- *
- * @param zip zip archive handler.
- * @param buf output buffer. User should free buf.
- * @param bufsize output buffer size (in bytes).
- *
- * @return copy size
- */
-extern ZIP_EXPORT ssize_t zip_stream_copy(struct zip_t *zip, void **buf,
-    size_t *bufsize);
-
-/**
- * Close zip archive releases resources.
- *
- * @param zip zip archive handler.
- *
- * @return
- */
-extern ZIP_EXPORT void zip_stream_close(struct zip_t *zip);
-
-/**
- * Creates a new archive and puts files into a single zip archive.
- *
- * @param zipname zip archive file.
- * @param filenames input files.
- * @param len: number of input files.
- *
- * @return the return code - 0 on success, negative number (< 0) on error.
- */
-extern ZIP_EXPORT int zip_create(const char *zipname, const char *filenames[],
-    size_t len);
-
-/**
- * Extracts a zip archive file into directory.
- *
- * If on_extract_entry is not NULL, the callback will be called after
- * successfully extracted each zip entry.
- * Returning a negative value from the callback will cause abort and return an
- * error. The last argument (void *arg) is optional, which you can use to pass
- * data to the on_extract_entry callback.
- *
- * @param zipname zip archive file.
- * @param dir output directory.
- * @param on_extract_entry on extract callback.
- * @param arg opaque pointer.
- *
- * @return the return code - 0 on success, negative number (< 0) on error.
- */
-extern ZIP_EXPORT int zip_extract(const char *zipname, const char *dir,
-    int (*on_extract_entry)(const char *filename,
-    void *arg),
-    void *arg);
-/** @} */
-#ifdef __cplusplus
-}
-#endif
-
-#endif
@@ -58,6 +58,7 @@ struct SDParams {
 
     std::string model_path;
     std::string vae_path;
+    std::string taesd_path;
     ggml_type wtype = GGML_TYPE_COUNT;
     std::string lora_model_dir;
     std::string output_path = "output.png";
@@ -86,6 +87,7 @@ void print_params(SDParams params) {
     printf(" model_path: %s\n", params.model_path.c_str());
     printf(" wtype: %s\n", params.wtype < GGML_TYPE_COUNT ? ggml_type_name(params.wtype) : "unspecified");
     printf(" vae_path: %s\n", params.vae_path.c_str());
+    printf(" taesd_path: %s\n", params.taesd_path.c_str());
     printf(" output_path: %s\n", params.output_path.c_str());
     printf(" init_img: %s\n", params.input_path.c_str());
     printf(" prompt: %s\n", params.prompt.c_str());
@@ -112,8 +114,9 @@ void print_usage(int argc, const char* argv[]) {
     printf(" If threads <= 0, then threads will be set to the number of CPU physical cores\n");
     printf(" -m, --model [MODEL] path to model\n");
     printf(" --vae [VAE] path to vae\n");
+    printf(" --taesd [TAESD_PATH] path to taesd. Using Tiny AutoEncoder for fast decoding (low quality)\n");
     printf(" --type [TYPE] weight type (f32, f16, q4_0, q4_1, q5_0, q5_1, q8_0)\n");
-    printf(" If not specified, the default is the type of the weight file.");
+    printf(" If not specified, the default is the type of the weight file.\n");
     printf(" --lora-model-dir [DIR] lora model directory\n");
     printf(" -i, --init-img [IMAGE] path to the input image, required by img2img\n");
     printf(" -o, --output OUTPUT path to write result image to (default: ./output.png)\n");
@@ -176,6 +179,12 @@ void parse_args(int argc, const char** argv, SDParams& params) {
                 break;
             }
             params.vae_path = argv[i];
+        } else if (arg == "--taesd") {
+            if (++i >= argc) {
+                invalid_arg = true;
+                break;
+            }
+            params.taesd_path = argv[i];
         } else if (arg == "--type") {
             if (++i >= argc) {
                 invalid_arg = true;
@@ -449,7 +458,8 @@ int main(int argc, const char* argv[]) {
         }
     }
 
-    StableDiffusion sd(params.n_threads, vae_decode_only, true, params.lora_model_dir, params.rng_type);
+    StableDiffusion sd(params.n_threads, vae_decode_only, params.taesd_path, true, params.lora_model_dir, params.rng_type);
 
     if (!sd.load_from_file(params.model_path, params.vae_path, params.wtype, params.schedule)) {
         return 1;
     }
|
2
ggml
@@ -1 +1 @@
-Subproject commit 03669ba9fdc5e0520e919e5c7e1b3a3359d28e59
+Subproject commit 70474c6890c015b53dc10a2300ae35246cc73589
19
model.cpp
@@ -1296,7 +1296,7 @@ bool ModelLoader::load_tensors(on_new_tensor_cb_t on_new_tensor_cb) {
             if (backend == NULL || ggml_backend_is_cpu(backend)) {
                 // for the CPU and Metal backend, we can copy directly into the tensor
                 if (tensor_storage.type == dst_tensor->type) {
-                    GGML_ASSERT(ggml_nbytes(dst_tensor) == nbytes_to_read);
+                    GGML_ASSERT(ggml_nbytes(dst_tensor) == tensor_storage.nbytes());
                     read_data(tensor_storage, (char*)dst_tensor->data, nbytes_to_read);
 
                     if (tensor_storage.is_bf16) {
@@ -1349,16 +1349,23 @@ bool ModelLoader::load_tensors(on_new_tensor_cb_t on_new_tensor_cb) {
     return success;
 }
 
-int64_t ModelLoader::cal_mem_size() {
+int64_t ModelLoader::cal_mem_size(ggml_backend_t backend) {
+    size_t alignment = 128;
+    if (backend != NULL) {
+        alignment = ggml_backend_get_alignment(backend);
+    }
     int64_t mem_size = 0;
+    std::vector<TensorStorage> processed_tensor_storages;
     for (auto& tensor_storage : tensor_storages) {
         if (is_unused_tensor(tensor_storage.name)) {
             continue;
         }
-
-        mem_size += tensor_storage.nbytes();
-        mem_size += GGML_MEM_ALIGN * 2; // for lora alphas
+        preprocess_tensor(tensor_storage, processed_tensor_storages);
     }
 
-    return mem_size + 10 * 1024 * 1024;
+    for (auto& tensor_storage : processed_tensor_storages) {
+        mem_size += tensor_storage.nbytes() + alignment;
+    }
+
+    return mem_size;
 }
3
model.h
@@ -8,6 +8,7 @@
 #include <vector>
 
 #include "ggml/ggml.h"
+#include "ggml/ggml-backend.h"
 #include "json.hpp"
 #include "zip.h"
 
@@ -116,7 +117,7 @@ public:
     ggml_type get_sd_wtype();
     bool load_vocab(on_new_token_cb_t on_new_token_cb);
     bool load_tensors(on_new_tensor_cb_t on_new_tensor_cb);
-    int64_t cal_mem_size();
+    int64_t cal_mem_size(ggml_backend_t backend);
     ~ModelLoader() = default;
 };
 #endif // __MODEL_H__
File diff suppressed because it is too large
@@ -38,6 +38,7 @@ private:
 public:
     StableDiffusion(int n_threads = -1,
                     bool vae_decode_only = false,
+                    std::string taesd_path = "",
                     bool free_params_immediately = false,
                     std::string lora_model_dir = "",
                     RNGType rng_type = STD_DEFAULT_RNG);