stable-diffusion.cpp/util.h
bssrdf a469688e30
feat: add TencentARC PhotoMaker support (#179)
* first efforts at implementing photomaker; lots more to do

* added PhotoMakerIDEncoder model in SD

* fixed soem bugs; now photomaker model weights can be loaded into their tensor buffers

* added input id image loading

* added preprocessing inpit id images

* finished get_num_tensors

* fixed a bug in remove_duplicates

* add a get_learned_condition_with_trigger function to do photomaker stuff

* add a convert_token_to_id function for photomaker to extract trigger word's token id

* making progress; need to implement tokenizer decoder

* making more progress; finishing vision model forward

* debugging vision_model outputs

* corrected clip vision model output

* continue making progress in id fusion process

* finished stacked id embedding; to be tested

* remove garbage file

* debuging graph compute

* more progress; now alloc buffer failed

* fixed wtype issue; input images can only be 1 because issue with transformer when batch size > 1 (to be investigated)

* added delayed subject conditioning; now photomaker runs and generates images

* fixed stat_merge_step

* added photomaker lora model (to be tested)

* reworked pmid lora

* finished applying pmid lora; to be tested

* finalized pmid lora

* add a few print tensor; tweak in sample again

* small tweak; still not getting ID faces

* fixed a bug in FuseBlock forward; also remove diag_mask op in for vision transformer; getting better results

* disable pmid lora apply for now; 1 input image seems working; > 1 not working

* turn pmid lora apply back on

* fixed a decode bug

* fixed a bug in ggml's conv_2d, and now > 1 input images working

* add style_ratio as a cli param; reworked encode with trigger for attention weights

* merge commit fixing lora free param buffer error

* change default style ratio to 10%

* added an option to offload vae decoder to CPU for mem-limited gpus

* removing image normalization step seems making ID fidelity much higher

* revert default style ratio back ro 20%

* added an option for normalizing input ID images; cleaned up debugging code

* more clean up

* fixed bugs; now failed with cuda error; likely out-of-mem on GPU

* free pmid model params when required

* photomaker working properly now after merging and adapting to GGMLBlock API

* remove tensor renaming;  fixing names in the photomaker model file

* updated README.md to include instructions and notes for running PhotoMaker

* a bit clean up

* remove -DGGML_CUDA_FORCE_MMQ; more clean up and README update

* add input image requirement in README

* bring back freeing pmid lora params buffer; simply pooled output of CLIPvision

* remove MultiheadAttention2; customized MultiheadAttention

* added a WIN32 get_files_from_dir; turn off Photomakder if receiving no input images

* update docs

* fix ci error

* make stable-diffusion.h a pure c header file

This reverts commit 27887b630db6a92f269f0aef8de9bc9832ab50a9.

* fix ci error

* format code

* reuse get_learned_condition

* reuse pad_tokens

* reuse CLIPVisionModel

* reuse LoraModel

* add --clip-on-cpu

* fix lora name conversion for SDXL

---------

Co-authored-by: bssrdf <bssrdf@gmail.com>
Co-authored-by: leejet <leejet714@gmail.com>
2024-03-12 23:15:17 +08:00

60 lines
2.0 KiB
C++

#ifndef __UTIL_H__
#define __UTIL_H__
#include <cstdint>
#include <string>
#include <vector>
#include "stable-diffusion.h"
bool ends_with(const std::string& str, const std::string& ending);
bool starts_with(const std::string& str, const std::string& start);
bool contains(const std::string& str, const std::string& substr);
std::string format(const char* fmt, ...);
void replace_all_chars(std::string& str, char target, char replacement);
bool file_exists(const std::string& filename);
bool is_directory(const std::string& path);
std::string get_full_path(const std::string& dir, const std::string& filename);
std::vector<std::string> get_files_from_dir(const std::string& dir);
std::u32string utf8_to_utf32(const std::string& utf8_str);
std::string utf32_to_utf8(const std::u32string& utf32_str);
std::u32string unicode_value_to_utf32(int unicode_value);
sd_image_t* preprocess_id_image(sd_image_t* img);
// std::string sd_basename(const std::string& path);
typedef struct {
uint32_t width;
uint32_t height;
uint32_t channel;
float* data;
} sd_image_f32_t;
void normalize_sd_image_f32_t(sd_image_f32_t image, float means[3], float stds[3]);
sd_image_f32_t sd_image_t_to_sd_image_f32_t(sd_image_t image);
sd_image_f32_t resize_sd_image_f32_t(sd_image_f32_t image, int target_width, int target_height);
sd_image_f32_t clip_preprocess(sd_image_f32_t image, int size);
std::string path_join(const std::string& p1, const std::string& p2);
void pretty_progress(int step, int steps, float time);
void log_printf(sd_log_level_t level, const char* file, int line, const char* format, ...);
std::string trim(const std::string& s);
#define LOG_DEBUG(format, ...) log_printf(SD_LOG_DEBUG, __FILE__, __LINE__, format, ##__VA_ARGS__)
#define LOG_INFO(format, ...) log_printf(SD_LOG_INFO, __FILE__, __LINE__, format, ##__VA_ARGS__)
#define LOG_WARN(format, ...) log_printf(SD_LOG_WARN, __FILE__, __LINE__, format, ##__VA_ARGS__)
#define LOG_ERROR(format, ...) log_printf(SD_LOG_ERROR, __FILE__, __LINE__, format, ##__VA_ARGS__)
#endif // __UTIL_H__