feat: add TAESD implementation - faster autoencoder (#88)

* add taesd implementation

* taesd gpu offloading

* show seed when generating image with -s -1

* less restrictive with larger images

* cuda: im2col speedup x2

* cuda: group norm speedup x90

* quantized models now works in cuda :)

* fix cal mem size

---------

Co-authored-by: leejet <leejet714@gmail.com>
This commit is contained in:
Steward Garcia 2023-12-05 09:40:03 -05:00 committed by GitHub
parent f99bcd1f76
commit 134883aec4
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
14 changed files with 908 additions and 46904 deletions

5
.gitignore vendored
View File

@ -8,6 +8,7 @@ test/
*.bin
*.exe
*.gguf
output*.png
models*
!taesd-model.gguf
*.log
output.png
models/

View File

@ -9,22 +9,23 @@ Inference of [Stable Diffusion](https://github.com/CompVis/stable-diffusion) in
## Features
- Plain C/C++ implementation based on [ggml](https://github.com/ggerganov/ggml), working in the same way as [llama.cpp](https://github.com/ggerganov/llama.cpp)
- Super lightweight and without external dependencies.
- Super lightweight and without external dependencies
- SD1.x and SD2.x support
- 16-bit, 32-bit float support
- 4-bit, 5-bit and 8-bit integer quantization support
- Accelerated memory-efficient CPU inference
- Only requires ~2.3GB when using txt2img with fp16 precision to generate a 512x512 image, enabling Flash Attention just requires ~1.8GB.
- AVX, AVX2 and AVX512 support for x86 architectures
- Full CUDA backend for GPU acceleration, for now just for float16 and float32 models. There are some issues with quantized models and CUDA; it will be fixed in the future.
- Can load ckpt, safetensors and diffusers models/checkpoints. Standalone VAEs models.
- Full CUDA backend for GPU acceleration.
- Can load ckpt, safetensors and diffusers models/checkpoints. Standalone VAEs models
- No need to convert to `.ggml` or `.gguf` anymore!
- Flash Attention for memory usage optimization (only cpu for now).
- Flash Attention for memory usage optimization (only cpu for now)
- Original `txt2img` and `img2img` mode
- Negative prompt
- [stable-diffusion-webui](https://github.com/AUTOMATIC1111/stable-diffusion-webui) style tokenizer (not all the features, only token weighting for now)
- LoRA support, same as [stable-diffusion-webui](https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Features#lora)
- Latent Consistency Models support (LCM/LCM-LoRA)
- Faster and memory efficient latent decoding with [TAESD](https://github.com/madebyollin/taesd)
- Sampling method
- `Euler A`
- `Euler`
@ -47,9 +48,10 @@ Inference of [Stable Diffusion](https://github.com/CompVis/stable-diffusion) in
- [ ] More sampling methods
- [ ] Make inference faster
- The current implementation of ggml_conv_2d is slow and has high memory usage
- Implement Winograd Convolution 2D for 3x3 kernel filtering
- [ ] Continuing to reduce memory usage (quantizing the weights of ggml_conv_2d)
- [ ] Implement BPE Tokenizer
- [ ] Add [TAESD](https://github.com/madebyollin/taesd) for faster VAE decoding
- [ ] Implement [Real-ESRGAN](https://github.com/xinntao/Real-ESRGAN/tree/master) upscaler
- [ ] k-quants support
## Usage
@ -122,7 +124,7 @@ cmake --build . --config Release
### Run
```
usage: ./bin/sd [arguments]
usage: sd [arguments]
arguments:
-h, --help show this help message and exit
@ -131,8 +133,10 @@ arguments:
If threads <= 0, then threads will be set to the number of CPU physical cores
-m, --model [MODEL] path to model
--vae [VAE] path to vae
--taesd [TAESD_PATH] path to taesd. Using Tiny AutoEncoder for fast decoding (low quality)
--type [TYPE] weight type (f32, f16, q4_0, q4_1, q5_0, q5_1, q8_0)
If not specified, the default is the type of the weight file. --lora-model-dir [DIR] lora model directory
If not specified, the default is the type of the weight file.
--lora-model-dir [DIR] lora model directory
-i, --init-img [IMAGE] path to the input image, required by img2img
-o, --output OUTPUT path to write result image to (default: ./output.png)
-p, --prompt [PROMPT] the prompt to render
@ -218,6 +222,23 @@ Here's a simple example:
| ---- |---- |
| ![](./assets/without_lcm.png) |![](./assets/with_lcm.png) |
## Using TAESD to faster decoding
You can use TAESD to accelerate the decoding of latent images by following these steps:
- Download the model [weights](https://huggingface.co/madebyollin/taesd/blob/main/diffusion_pytorch_model.safetensors).
Or curl
```bash
curl -L -O https://huggingface.co/madebyollin/taesd/blob/main/diffusion_pytorch_model.safetensors
```
- Specify the model path using the `--taesd PATH` parameter. example:
```bash
sd -m ../models/v1-5-pruned-emaonly.safetensors -p "a lovely cat" --taesd ../models/diffusion_pytorch_model.safetensors
```
### Docker

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

View File

@ -1,509 +0,0 @@
/*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
* IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR
* OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
* ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
* OTHER DEALINGS IN THE SOFTWARE.
*/
#pragma once
#ifndef ZIP_H
#define ZIP_H
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#ifndef ZIP_SHARED
#define ZIP_EXPORT
#else
#ifdef _WIN32
#ifdef ZIP_BUILD_SHARED
#define ZIP_EXPORT __declspec(dllexport)
#else
#define ZIP_EXPORT __declspec(dllimport)
#endif
#else
#define ZIP_EXPORT __attribute__((visibility("default")))
#endif
#endif
#ifdef __cplusplus
extern "C" {
#endif
#if !defined(_POSIX_C_SOURCE) && defined(_MSC_VER)
// 64-bit Windows is the only mainstream platform
// where sizeof(long) != sizeof(void*)
#ifdef _WIN64
typedef long long ssize_t; /* byte count or error */
#else
typedef long ssize_t; /* byte count or error */
#endif
#endif
/**
* @mainpage
*
* Documentation for @ref zip.
*/
/**
* @addtogroup zip
* @{
*/
/**
* Default zip compression level.
*/
#define ZIP_DEFAULT_COMPRESSION_LEVEL 6
/**
* Error codes
*/
#define ZIP_ENOINIT -1 // not initialized
#define ZIP_EINVENTNAME -2 // invalid entry name
#define ZIP_ENOENT -3 // entry not found
#define ZIP_EINVMODE -4 // invalid zip mode
#define ZIP_EINVLVL -5 // invalid compression level
#define ZIP_ENOSUP64 -6 // no zip 64 support
#define ZIP_EMEMSET -7 // memset error
#define ZIP_EWRTENT -8 // cannot write data to entry
#define ZIP_ETDEFLINIT -9 // cannot initialize tdefl compressor
#define ZIP_EINVIDX -10 // invalid index
#define ZIP_ENOHDR -11 // header not found
#define ZIP_ETDEFLBUF -12 // cannot flush tdefl buffer
#define ZIP_ECRTHDR -13 // cannot create entry header
#define ZIP_EWRTHDR -14 // cannot write entry header
#define ZIP_EWRTDIR -15 // cannot write to central dir
#define ZIP_EOPNFILE -16 // cannot open file
#define ZIP_EINVENTTYPE -17 // invalid entry type
#define ZIP_EMEMNOALLOC -18 // extracting data using no memory allocation
#define ZIP_ENOFILE -19 // file not found
#define ZIP_ENOPERM -20 // no permission
#define ZIP_EOOMEM -21 // out of memory
#define ZIP_EINVZIPNAME -22 // invalid zip archive name
#define ZIP_EMKDIR -23 // make dir error
#define ZIP_ESYMLINK -24 // symlink error
#define ZIP_ECLSZIP -25 // close archive error
#define ZIP_ECAPSIZE -26 // capacity size too small
#define ZIP_EFSEEK -27 // fseek error
#define ZIP_EFREAD -28 // fread error
#define ZIP_EFWRITE -29 // fwrite error
#define ZIP_ERINIT -30 // cannot initialize reader
#define ZIP_EWINIT -31 // cannot initialize writer
#define ZIP_EWRINIT -32 // cannot initialize writer from reader
/**
* Looks up the error message string corresponding to an error number.
* @param errnum error number
* @return error message string corresponding to errnum or NULL if error is not
* found.
*/
extern ZIP_EXPORT const char *zip_strerror(int errnum);
/**
* @struct zip_t
*
* This data structure is used throughout the library to represent zip archive -
* forward declaration.
*/
struct zip_t;
/**
* Opens zip archive with compression level using the given mode.
*
* @param zipname zip archive file name.
* @param level compression level (0-9 are the standard zlib-style levels).
* @param mode file access mode.
* - 'r': opens a file for reading/extracting (the file must exists).
* - 'w': creates an empty file for writing.
* - 'a': appends to an existing archive.
*
* @return the zip archive handler or NULL on error
*/
extern ZIP_EXPORT struct zip_t *zip_open(const char *zipname, int level,
char mode);
/**
* Opens zip archive with compression level using the given mode.
* The function additionally returns @param errnum -
*
* @param zipname zip archive file name.
* @param level compression level (0-9 are the standard zlib-style levels).
* @param mode file access mode.
* - 'r': opens a file for reading/extracting (the file must exists).
* - 'w': creates an empty file for writing.
* - 'a': appends to an existing archive.
* @param errnum 0 on success, negative number (< 0) on error.
*
* @return the zip archive handler or NULL on error
*/
extern ZIP_EXPORT struct zip_t *
zip_openwitherror(const char *zipname, int level, char mode, int *errnum);
/**
* Closes the zip archive, releases resources - always finalize.
*
* @param zip zip archive handler.
*/
extern ZIP_EXPORT void zip_close(struct zip_t *zip);
/**
* Determines if the archive has a zip64 end of central directory headers.
*
* @param zip zip archive handler.
*
* @return the return code - 1 (true), 0 (false), negative number (< 0) on
* error.
*/
extern ZIP_EXPORT int zip_is64(struct zip_t *zip);
/**
* Opens an entry by name in the zip archive.
*
* For zip archive opened in 'w' or 'a' mode the function will append
* a new entry. In readonly mode the function tries to locate the entry
* in global dictionary.
*
* @param zip zip archive handler.
* @param entryname an entry name in local dictionary.
*
* @return the return code - 0 on success, negative number (< 0) on error.
*/
extern ZIP_EXPORT int zip_entry_open(struct zip_t *zip, const char *entryname);
/**
* Opens an entry by name in the zip archive.
*
* For zip archive opened in 'w' or 'a' mode the function will append
* a new entry. In readonly mode the function tries to locate the entry
* in global dictionary (case sensitive).
*
* @param zip zip archive handler.
* @param entryname an entry name in local dictionary (case sensitive).
*
* @return the return code - 0 on success, negative number (< 0) on error.
*/
extern ZIP_EXPORT int zip_entry_opencasesensitive(struct zip_t *zip,
const char *entryname);
/**
* Opens a new entry by index in the zip archive.
*
* This function is only valid if zip archive was opened in 'r' (readonly) mode.
*
* @param zip zip archive handler.
* @param index index in local dictionary.
*
* @return the return code - 0 on success, negative number (< 0) on error.
*/
extern ZIP_EXPORT int zip_entry_openbyindex(struct zip_t *zip, size_t index);
/**
* Closes a zip entry, flushes buffer and releases resources.
*
* @param zip zip archive handler.
*
* @return the return code - 0 on success, negative number (< 0) on error.
*/
extern ZIP_EXPORT int zip_entry_close(struct zip_t *zip);
/**
* Returns a local name of the current zip entry.
*
* The main difference between user's entry name and local entry name
* is optional relative path.
* Following .ZIP File Format Specification - the path stored MUST not contain
* a drive or device letter, or a leading slash.
* All slashes MUST be forward slashes '/' as opposed to backwards slashes '\'
* for compatibility with Amiga and UNIX file systems etc.
*
* @param zip: zip archive handler.
*
* @return the pointer to the current zip entry name, or NULL on error.
*/
extern ZIP_EXPORT const char *zip_entry_name(struct zip_t *zip);
/**
* Returns an index of the current zip entry.
*
* @param zip zip archive handler.
*
* @return the index on success, negative number (< 0) on error.
*/
extern ZIP_EXPORT ssize_t zip_entry_index(struct zip_t *zip);
/**
* Determines if the current zip entry is a directory entry.
*
* @param zip zip archive handler.
*
* @return the return code - 1 (true), 0 (false), negative number (< 0) on
* error.
*/
extern ZIP_EXPORT int zip_entry_isdir(struct zip_t *zip);
/**
* Returns the uncompressed size of the current zip entry.
* Alias for zip_entry_uncomp_size (for backward compatibility).
*
* @param zip zip archive handler.
*
* @return the uncompressed size in bytes.
*/
extern ZIP_EXPORT unsigned long long zip_entry_size(struct zip_t *zip);
/**
* Returns the uncompressed size of the current zip entry.
*
* @param zip zip archive handler.
*
* @return the uncompressed size in bytes.
*/
extern ZIP_EXPORT unsigned long long zip_entry_uncomp_size(struct zip_t *zip);
/**
* Returns the compressed size of the current zip entry.
*
* @param zip zip archive handler.
*
* @return the compressed size in bytes.
*/
extern ZIP_EXPORT unsigned long long zip_entry_comp_size(struct zip_t *zip);
/**
* Returns CRC-32 checksum of the current zip entry.
*
* @param zip zip archive handler.
*
* @return the CRC-32 checksum.
*/
extern ZIP_EXPORT unsigned int zip_entry_crc32(struct zip_t *zip);
/**
* Compresses an input buffer for the current zip entry.
*
* @param zip zip archive handler.
* @param buf input buffer.
* @param bufsize input buffer size (in bytes).
*
* @return the return code - 0 on success, negative number (< 0) on error.
*/
extern ZIP_EXPORT int zip_entry_write(struct zip_t *zip, const void *buf,
size_t bufsize);
/**
* Compresses a file for the current zip entry.
*
* @param zip zip archive handler.
* @param filename input file.
*
* @return the return code - 0 on success, negative number (< 0) on error.
*/
extern ZIP_EXPORT int zip_entry_fwrite(struct zip_t *zip, const char *filename);
/**
* Extracts the current zip entry into output buffer.
*
* The function allocates sufficient memory for a output buffer.
*
* @param zip zip archive handler.
* @param buf output buffer.
* @param bufsize output buffer size (in bytes).
*
* @note remember to release memory allocated for a output buffer.
* for large entries, please take a look at zip_entry_extract function.
*
* @return the return code - the number of bytes actually read on success.
* Otherwise a negative number (< 0) on error.
*/
extern ZIP_EXPORT ssize_t zip_entry_read(struct zip_t *zip, void **buf,
size_t *bufsize);
/**
* Extracts the current zip entry into a memory buffer using no memory
* allocation.
*
* @param zip zip archive handler.
* @param buf preallocated output buffer.
* @param bufsize output buffer size (in bytes).
*
* @note ensure supplied output buffer is large enough.
* zip_entry_size function (returns uncompressed size for the current
* entry) can be handy to estimate how big buffer is needed.
* For large entries, please take a look at zip_entry_extract function.
*
* @return the return code - the number of bytes actually read on success.
* Otherwise a negative number (< 0) on error (e.g. bufsize is not large
* enough).
*/
extern ZIP_EXPORT ssize_t zip_entry_noallocread(struct zip_t *zip, void *buf,
size_t bufsize);
/**
* Extracts the current zip entry into output file.
*
* @param zip zip archive handler.
* @param filename output file.
*
* @return the return code - 0 on success, negative number (< 0) on error.
*/
extern ZIP_EXPORT int zip_entry_fread(struct zip_t *zip, const char *filename);
/**
* Extracts the current zip entry using a callback function (on_extract).
*
* @param zip zip archive handler.
* @param on_extract callback function.
* @param arg opaque pointer (optional argument, which you can pass to the
* on_extract callback)
*
* @return the return code - 0 on success, negative number (< 0) on error.
*/
extern ZIP_EXPORT int
zip_entry_extract(struct zip_t *zip,
size_t (*on_extract)(void *arg, uint64_t offset,
const void *data, size_t size),
void *arg);
/**
* Returns the number of all entries (files and directories) in the zip archive.
*
* @param zip zip archive handler.
*
* @return the return code - the number of entries on success, negative number
* (< 0) on error.
*/
extern ZIP_EXPORT ssize_t zip_entries_total(struct zip_t *zip);
/**
* Deletes zip archive entries.
*
* @param zip zip archive handler.
* @param entries array of zip archive entries to be deleted.
* @param len the number of entries to be deleted.
* @return the number of deleted entries, or negative number (< 0) on error.
*/
extern ZIP_EXPORT ssize_t zip_entries_delete(struct zip_t *zip,
char *const entries[], size_t len);
/**
* Extracts a zip archive stream into directory.
*
* If on_extract is not NULL, the callback will be called after
* successfully extracted each zip entry.
* Returning a negative value from the callback will cause abort and return an
* error. The last argument (void *arg) is optional, which you can use to pass
* data to the on_extract callback.
*
* @param stream zip archive stream.
* @param size stream size.
* @param dir output directory.
* @param on_extract on extract callback.
* @param arg opaque pointer.
*
* @return the return code - 0 on success, negative number (< 0) on error.
*/
extern ZIP_EXPORT int
zip_stream_extract(const char *stream, size_t size, const char *dir,
int (*on_extract)(const char *filename, void *arg),
void *arg);
/**
* Opens zip archive stream into memory.
*
* @param stream zip archive stream.
* @param size stream size.
* @param level compression level (0-9 are the standard zlib-style levels).
* @param mode file access mode.
* - 'r': opens a file for reading/extracting (the file must exists).
* - 'w': creates an empty file for writing.
* - 'a': appends to an existing archive.
*
* @return the zip archive handler or NULL on error
*/
extern ZIP_EXPORT struct zip_t *zip_stream_open(const char *stream, size_t size,
int level, char mode);
/**
* Opens zip archive stream into memory.
* The function additionally returns @param errnum -
*
* @param stream zip archive stream.
* @param size stream size.*
* @param level compression level (0-9 are the standard zlib-style levels).
* @param mode file access mode.
* - 'r': opens a file for reading/extracting (the file must exists).
* - 'w': creates an empty file for writing.
* - 'a': appends to an existing archive.
* @param errnum 0 on success, negative number (< 0) on error.
*
* @return the zip archive handler or NULL on error
*/
extern ZIP_EXPORT struct zip_t *zip_stream_openwitherror(const char *stream,
size_t size, int level,
char mode,
int *errnum);
/**
* Copy zip archive stream output buffer.
*
* @param zip zip archive handler.
* @param buf output buffer. User should free buf.
* @param bufsize output buffer size (in bytes).
*
* @return copy size
*/
extern ZIP_EXPORT ssize_t zip_stream_copy(struct zip_t *zip, void **buf,
size_t *bufsize);
/**
* Close zip archive releases resources.
*
* @param zip zip archive handler.
*
* @return
*/
extern ZIP_EXPORT void zip_stream_close(struct zip_t *zip);
/**
* Creates a new archive and puts files into a single zip archive.
*
* @param zipname zip archive file.
* @param filenames input files.
* @param len: number of input files.
*
* @return the return code - 0 on success, negative number (< 0) on error.
*/
extern ZIP_EXPORT int zip_create(const char *zipname, const char *filenames[],
size_t len);
/**
* Extracts a zip archive file into directory.
*
* If on_extract_entry is not NULL, the callback will be called after
* successfully extracted each zip entry.
* Returning a negative value from the callback will cause abort and return an
* error. The last argument (void *arg) is optional, which you can use to pass
* data to the on_extract_entry callback.
*
* @param zipname zip archive file.
* @param dir output directory.
* @param on_extract_entry on extract callback.
* @param arg opaque pointer.
*
* @return the return code - 0 on success, negative number (< 0) on error.
*/
extern ZIP_EXPORT int zip_extract(const char *zipname, const char *dir,
int (*on_extract_entry)(const char *filename,
void *arg),
void *arg);
/** @} */
#ifdef __cplusplus
}
#endif
#endif

View File

@ -58,6 +58,7 @@ struct SDParams {
std::string model_path;
std::string vae_path;
std::string taesd_path;
ggml_type wtype = GGML_TYPE_COUNT;
std::string lora_model_dir;
std::string output_path = "output.png";
@ -86,6 +87,7 @@ void print_params(SDParams params) {
printf(" model_path: %s\n", params.model_path.c_str());
printf(" wtype: %s\n", params.wtype < GGML_TYPE_COUNT ? ggml_type_name(params.wtype) : "unspecified");
printf(" vae_path: %s\n", params.vae_path.c_str());
printf(" taesd_path: %s\n", params.taesd_path.c_str());
printf(" output_path: %s\n", params.output_path.c_str());
printf(" init_img: %s\n", params.input_path.c_str());
printf(" prompt: %s\n", params.prompt.c_str());
@ -112,8 +114,9 @@ void print_usage(int argc, const char* argv[]) {
printf(" If threads <= 0, then threads will be set to the number of CPU physical cores\n");
printf(" -m, --model [MODEL] path to model\n");
printf(" --vae [VAE] path to vae\n");
printf(" --taesd [TAESD_PATH] path to taesd. Using Tiny AutoEncoder for fast decoding (low quality)\n");
printf(" --type [TYPE] weight type (f32, f16, q4_0, q4_1, q5_0, q5_1, q8_0)\n");
printf(" If not specified, the default is the type of the weight file.");
printf(" If not specified, the default is the type of the weight file.\n");
printf(" --lora-model-dir [DIR] lora model directory\n");
printf(" -i, --init-img [IMAGE] path to the input image, required by img2img\n");
printf(" -o, --output OUTPUT path to write result image to (default: ./output.png)\n");
@ -176,6 +179,12 @@ void parse_args(int argc, const char** argv, SDParams& params) {
break;
}
params.vae_path = argv[i];
} else if (arg == "--taesd") {
if (++i >= argc) {
invalid_arg = true;
break;
}
params.taesd_path = argv[i];
} else if (arg == "--type") {
if (++i >= argc) {
invalid_arg = true;
@ -449,7 +458,8 @@ int main(int argc, const char* argv[]) {
}
}
StableDiffusion sd(params.n_threads, vae_decode_only, true, params.lora_model_dir, params.rng_type);
StableDiffusion sd(params.n_threads, vae_decode_only, params.taesd_path, true, params.lora_model_dir, params.rng_type);
if (!sd.load_from_file(params.model_path, params.vae_path, params.wtype, params.schedule)) {
return 1;
}

2
ggml

@ -1 +1 @@
Subproject commit 03669ba9fdc5e0520e919e5c7e1b3a3359d28e59
Subproject commit 70474c6890c015b53dc10a2300ae35246cc73589

View File

@ -1296,7 +1296,7 @@ bool ModelLoader::load_tensors(on_new_tensor_cb_t on_new_tensor_cb) {
if (backend == NULL || ggml_backend_is_cpu(backend)) {
// for the CPU and Metal backend, we can copy directly into the tensor
if (tensor_storage.type == dst_tensor->type) {
GGML_ASSERT(ggml_nbytes(dst_tensor) == nbytes_to_read);
GGML_ASSERT(ggml_nbytes(dst_tensor) == tensor_storage.nbytes());
read_data(tensor_storage, (char*)dst_tensor->data, nbytes_to_read);
if (tensor_storage.is_bf16) {
@ -1349,16 +1349,23 @@ bool ModelLoader::load_tensors(on_new_tensor_cb_t on_new_tensor_cb) {
return success;
}
int64_t ModelLoader::cal_mem_size() {
int64_t ModelLoader::cal_mem_size(ggml_backend_t backend) {
size_t alignment = 128;
if (backend != NULL) {
alignment = ggml_backend_get_alignment(backend);
}
int64_t mem_size = 0;
std::vector<TensorStorage> processed_tensor_storages;
for (auto& tensor_storage : tensor_storages) {
if (is_unused_tensor(tensor_storage.name)) {
continue;
}
mem_size += tensor_storage.nbytes();
mem_size += GGML_MEM_ALIGN * 2; // for lora alphas
preprocess_tensor(tensor_storage, processed_tensor_storages);
}
return mem_size + 10 * 1024 * 1024;
for (auto& tensor_storage : processed_tensor_storages) {
mem_size += tensor_storage.nbytes() + alignment;
}
return mem_size;
}

View File

@ -8,6 +8,7 @@
#include <vector>
#include "ggml/ggml.h"
#include "ggml/ggml-backend.h"
#include "json.hpp"
#include "zip.h"
@ -116,7 +117,7 @@ public:
ggml_type get_sd_wtype();
bool load_vocab(on_new_token_cb_t on_new_token_cb);
bool load_tensors(on_new_tensor_cb_t on_new_tensor_cb);
int64_t cal_mem_size();
int64_t cal_mem_size(ggml_backend_t backend);
~ModelLoader() = default;
};
#endif // __MODEL_H__

File diff suppressed because it is too large Load Diff

View File

@ -38,6 +38,7 @@ private:
public:
StableDiffusion(int n_threads = -1,
bool vae_decode_only = false,
std::string taesd_path = "",
bool free_params_immediately = false,
std::string lora_model_dir = "",
RNGType rng_type = STD_DEFAULT_RNG);