* repair flash attention in _ext
this does not fix the currently broken flash attention behind the compile-time define, which is only used by the VAE
Co-authored-by: FSSRepo <FSSRepo@users.noreply.github.com>
* make flash attention in the diffusion model a runtime flag
no support for sd3 or video
* remove old flash attention option and switch vae over to attn_ext
* update docs
* format code
---------
Co-authored-by: FSSRepo <FSSRepo@users.noreply.github.com>
Co-authored-by: leejet <leejet714@gmail.com>
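The runtime flag above switches the diffusion model between the naive attention path and the fused flash-attention path. A minimal sketch of why the two paths are interchangeable (in Python for illustration; the project is C++, and these function names are not from the codebase): flash attention's single-pass online-softmax accumulation produces the same output as materializing the full score row and softmaxing it.

```python
import math

def naive_attention(q, ks, vs, scale):
    # materialize every score, softmax the row, then weight the values
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in ks]
    m = max(scores)
    ws = [math.exp(s - m) for s in scores]
    denom = sum(ws)
    dim = len(vs[0])
    return [sum(w * v[d] for w, v in zip(ws, vs)) / denom for d in range(dim)]

def flash_attention(q, ks, vs, scale):
    # one streaming pass over keys/values with an online softmax;
    # the full score row is never stored
    m = float("-inf")
    denom = 0.0
    acc = [0.0] * len(vs[0])
    for k, v in zip(ks, vs):
        s = scale * sum(a * b for a, b in zip(q, k))
        m_new = max(m, s)
        # rescale the running sums when a new maximum appears
        corr = math.exp(m - m_new) if m != float("-inf") else 0.0
        w = math.exp(s - m_new)
        denom = denom * corr + w
        acc = [a * corr + w * x for a, x in zip(acc, v)]
        m = m_new
    return [a / denom for a in acc]
```

Because both paths agree numerically, the choice can safely be a runtime flag rather than a compile-time define.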
* first attempt at updating to photomaker v2
* continue adding photomaker v2 modules
* finished the last few pieces for photomaker v2; id_embeds need to be computed in a manual step and passed as an input file
* added a name converter for Photomaker V2; build ok
* more debugging underway
* failing at cuda mat_mul
* updated chunk_half to be more efficient; redid the feedforward
* fixed a bug: when using ggml_view_4d to take chunks of a tensor, the strides need to be recalculated or set properly; still failing at the soft_max cuda op
* redo weight calculation and weight*v
* fixed a bug; Photomaker V2 is now mostly working
* added a python script for face detection (needed by Photomaker V2)
* updated readme for photomaker
* fixed a bug causing PMV1 to crash; both V1 and V2 now work
* fixed clean_input_ids for PMV2
* fixed a double counting bug in tokenize_with_trigger_token
* updated photomaker readme
* removed some commented code
* improved reconstruction of the class-word-free prompt
* changed id_embed reading to raw binary using the existing tensor-loading function; this is more efficient than going through the model loader and also makes it easier to work with the sd server
* minor clean up
---------
Co-authored-by: bssrdf <bssrdf@gmail.com>
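The ggml_view_4d fix above comes down to byte strides: a view over a chunk of a tensor must keep the parent's strides rather than recompute them as if the chunk were contiguous. A sketch of that arithmetic in Python (ggml itself is C; the helper names here are illustrative, modeling ggml's ne/nb extent/stride convention):

```python
def contiguous_strides(ne, itemsize=4):
    # ggml-style byte strides for a contiguous tensor:
    # nb[0] is the element size, nb[i] = nb[i-1] * ne[i-1]
    nb = [itemsize]
    for i in range(1, 4):
        nb.append(nb[-1] * ne[i - 1])
    return nb

def chunk_view(ne, nb, dim, n_chunks, idx):
    # view over chunk `idx` of `n_chunks` along `dim`: the extent shrinks
    # and the byte offset moves, but the strides MUST stay the parent's --
    # recomputing them as if the view were contiguous reads the wrong bytes
    new_ne = list(ne)
    new_ne[dim] = ne[dim] // n_chunks
    offset = idx * new_ne[dim] * nb[dim]
    return new_ne, list(nb), offset

def elem(data, nb, offset, i0, i1, i2, i3, itemsize=4):
    # address an element through byte strides, as ggml ops do
    byte = offset + i0 * nb[0] + i1 * nb[1] + i2 * nb[2] + i3 * nb[3]
    return data[byte // itemsize]
```

For a [4, 6, 1, 1] tensor split in half along dim 0, the second half keeps the parent's row stride of 4 elements; a freshly computed contiguous stride of 2 elements would silently address the wrong data.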
* mmdit-x
* add support for sd3.5 medium
* add skip layer guidance support (mmdit only)
* ignore slg if slg_scale is zero (optimization)
* init out_skip once
* slg support for flux (experimental)
* warn if version doesn't support slg
* refactor slg cli args
* set default slg_scale to 0 (oops)
* format code
---------
Co-authored-by: leejet <leejet714@gmail.com>
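The slg bullets above can be summarized in one sketch (Python for illustration; the project is C++, and this assumes the usual SLG formulation, where a term proportional to the difference between the full conditional prediction and a prediction with some layers skipped is added on top of CFG). It also shows why the slg_scale == 0 check is a real optimization: the extra skip-layer forward pass is never run.

```python
def guided_prediction(cond, uncond, skip_fn, cfg_scale, slg_scale):
    # classifier-free guidance, optionally augmented with skip layer guidance
    out = [u + cfg_scale * (c - u) for c, u in zip(cond, uncond)]
    if slg_scale == 0:
        # optimization: with slg_scale == 0 the SLG term vanishes,
        # so the extra forward pass with skipped layers is skipped entirely
        return out
    skipped = skip_fn()  # extra forward pass with the chosen layers skipped
    return [o + slg_scale * (c - s) for o, c, s in zip(out, cond, skipped)]
```

With the default slg_scale of 0 this reduces to plain CFG, which is why the wrong non-zero default was worth its own fix.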