Urs Ganse 968fbf02aa feat: add option to switch the sigma schedule (#51)

Concretely, this allows switching to the "Karras" schedule from the
Karras et al 2022 paper, equivalent to the samplers marked as "Karras"
in the AUTOMATIC1111 WebUI. This choice is in principle orthogonal to
the sampler choice and can be given independently.

2023-09-09 00:02:07 +08:00

stable-diffusion.cpp

Inference of Stable Diffusion in pure C/C++

Features

- Plain C/C++ implementation based on ggml, working in the same way as llama.cpp
- 16-bit, 32-bit float support
- 4-bit, 5-bit and 8-bit integer quantization support
- Accelerated memory-efficient CPU inference
  - Only requires ~2.3GB when using txt2img with fp16 precision to generate a 512x512 image
- AVX, AVX2 and AVX512 support for x86 architectures
- SD1.x and SD2.x support
- Original txt2img and img2img mode
- Negative prompt
- stable-diffusion-webui style tokenizer (not all the features, only token weighting for now)
- Sampling method
  - Euler A
  - Euler
  - Heun
  - DPM++ 2M
  - DPM++ 2M v2
- Cross-platform reproducibility (--rng cuda, consistent with the stable-diffusion-webui GPU RNG)
- Supported platforms
  - Linux
  - Mac OS
  - Windows
  - Android (via Termux)
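
The token weighting mentioned in the feature list follows the stable-diffusion-webui prompt syntax, where `(text:1.3)` scales the emphasis of `text`. The Python sketch below is illustrative only and is not this project's tokenizer. It handles just the explicit `(text:weight)` form and ignores nesting, bare parentheses and square brackets. The name `parse_weighted_prompt` is hypothetical.

```python
import re

# Matches the explicit "(text:weight)" form only; nesting is not handled.
_WEIGHTED = re.compile(r"\(([^():]+):([0-9]*\.?[0-9]+)\)")


def parse_weighted_prompt(prompt):
    """Split a prompt into (text, weight) pairs; unweighted text gets 1.0."""
    parts = []
    pos = 0
    for match in _WEIGHTED.finditer(prompt):
        if match.start() > pos:
            # Plain text between weighted spans keeps the neutral weight.
            parts.append((prompt[pos:match.start()], 1.0))
        parts.append((match.group(1), float(match.group(2))))
        pos = match.end()
    if pos < len(prompt):
        parts.append((prompt[pos:], 1.0))
    return parts


if __name__ == "__main__":
    print(parse_weighted_prompt("a (lovely:1.3) cat"))
```

A real implementation would then apply each weight to the embeddings of the tokens produced from that span. That step is left out here.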

TODO

- More sampling methods
- GPU support
- Make inference faster
  - The current implementation of ggml_conv_2d is slow and has high memory usage
- Continuing to reduce memory usage (quantizing the weights of ggml_conv_2d)
- LoRA support
- k-quants support

Usage

Get the Code

git clone --recursive https://github.com/leejet/stable-diffusion.cpp
cd stable-diffusion.cpp

If you have already cloned the repository, use the following commands to update it to the latest code.

cd stable-diffusion.cpp
git pull origin master
git submodule init
git submodule update

Convert weights

Download the original weights (.ckpt or .safetensors). For example:

Stable Diffusion v1.4 from https://huggingface.co/CompVis/stable-diffusion-v-1-4-original
Stable Diffusion v1.5 from https://huggingface.co/runwayml/stable-diffusion-v1-5
Stable Diffusion v2.1 from https://huggingface.co/stabilityai/stable-diffusion-2-1

curl -L -O https://huggingface.co/CompVis/stable-diffusion-v-1-4-original/resolve/main/sd-v1-4.ckpt
# curl -L -O https://huggingface.co/runwayml/stable-diffusion-v1-5/resolve/main/v1-5-pruned-emaonly.safetensors
# curl -L -O https://huggingface.co/stabilityai/stable-diffusion-2-1/resolve/main/v2-1_768-nonema-pruned.safetensors

Convert the weights to the ggml model format:

cd models
pip install -r requirements.txt
python convert.py [path to weights] --out_type [output precision]
# For example, python convert.py sd-v1-4.ckpt --out_type f16

Quantization

You can specify the output model format using the --out_type parameter:

f16 for 16-bit floating-point
f32 for 32-bit floating-point
q8_0 for 8-bit integer quantization
q5_0 or q5_1 for 5-bit integer quantization
q4_0 or q4_1 for 4-bit integer quantization
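
To give a feel for what the 8-bit option does to the weights, the pure-Python sketch below quantizes values block by block with one scale per block. It assumes the ggml q8_0 layout of 32 values per block and a scale of max(|x|) / 127. It is not the code used by convert.py, the function names are hypothetical, and the scale is kept as a full-precision float here rather than being stored as a 16-bit float.

```python
BLOCK_SIZE = 32  # assumed block size, following ggml's q8_0 layout


def quantize_q8_0(values):
    """Quantize floats block-wise into (scale, list of int8 values) pairs."""
    if len(values) % BLOCK_SIZE != 0:
        raise ValueError("length must be a multiple of the block size")
    blocks = []
    for start in range(0, len(values), BLOCK_SIZE):
        block = values[start:start + BLOCK_SIZE]
        amax = max(abs(v) for v in block)
        scale = amax / 127.0  # one scale shared by the whole block
        if scale == 0.0:
            quants = [0] * BLOCK_SIZE  # all-zero block, avoid dividing by zero
        else:
            quants = [round(v / scale) for v in block]
        blocks.append((scale, quants))
    return blocks


def dequantize_q8_0(blocks):
    """Reconstruct approximate floats from (scale, quants) pairs."""
    out = []
    for scale, quants in blocks:
        out.extend(q * scale for q in quants)
    return out


if __name__ == "__main__":
    weights = [(i - 16) / 10.0 for i in range(32)]
    restored = dequantize_q8_0(quantize_q8_0(weights))
    print(max(abs(a - b) for a, b in zip(weights, restored)))
```

The reconstruction error per value is bounded by half the block scale, which is why blocks with a large outlier lose more precision than blocks with uniformly small values.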

Build

Build from scratch

mkdir build
cd build
cmake ..
cmake --build . --config Release

Using OpenBLAS

cmake .. -DGGML_OPENBLAS=ON
cmake --build . --config Release

Run

usage: ./bin/sd [arguments]

arguments:
  -h, --help                         show this help message and exit
  -M, --mode [txt2img or img2img]    generation mode (default: txt2img)
  -t, --threads N                    number of threads to use during computation (default: -1).
                                     If threads <= 0, then threads will be set to the number of CPU physical cores
  -m, --model [MODEL]                path to model
  -i, --init-img [IMAGE]             path to the input image, required by img2img
  -o, --output OUTPUT                path to write result image to (default: .\output.png)
  -p, --prompt [PROMPT]              the prompt to render
  -n, --negative-prompt PROMPT       the negative prompt (default: "")
  --cfg-scale SCALE                  unconditional guidance scale: (default: 7.0)
  --strength STRENGTH                strength for noising/unnoising (default: 0.75)
                                     1.0 corresponds to full destruction of information in init image
  -H, --height H                     image height, in pixel space (default: 512)
  -W, --width W                      image width, in pixel space (default: 512)
  --sampling-method {euler, euler_a, heun, dpm++2m, dpm++2mv2}
                                     sampling method (default: "euler_a")
  --steps  STEPS                     number of sample steps (default: 20)
  --rng {std_default, cuda}          RNG (default: cuda)
  -s SEED, --seed SEED               RNG seed (default: 42, use random seed for < 0)
  -v, --verbose                      print extra info
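
When driving the binary from a script, it can help to assemble the argument list in one place. The Python helper below is a hypothetical sketch (`build_sd_command` is not part of this project). It uses only flags that appear in the help text above and mirrors the documented defaults.

```python
def build_sd_command(model, prompt, *, binary="./bin/sd", negative_prompt=None,
                     width=512, height=512, steps=20, cfg_scale=7.0,
                     sampling_method="euler_a", seed=42, output=None):
    """Return an argument list for the sd CLI, using flags from the help text."""
    allowed = {"euler", "euler_a", "heun", "dpm++2m", "dpm++2mv2"}
    if sampling_method not in allowed:
        raise ValueError(f"unknown sampling method: {sampling_method}")
    cmd = [binary, "-m", model, "-p", prompt,
           "-W", str(width), "-H", str(height),
           "--steps", str(steps), "--cfg-scale", str(cfg_scale),
           "--sampling-method", sampling_method, "--seed", str(seed)]
    if negative_prompt:
        cmd += ["-n", negative_prompt]  # optional negative prompt
    if output:
        cmd += ["-o", output]  # optional output path
    return cmd


if __name__ == "__main__":
    print(build_sd_command("../models/sd-v1-4-ggml-model-f16.bin", "a lovely cat"))
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. Building a list rather than a single string avoids shell quoting problems with prompts that contain spaces or quotes.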

txt2img example

./bin/sd -m ../models/sd-v1-4-ggml-model-f16.bin -p "a lovely cat"

Using formats of different precisions will yield results of varying quality.

[Comparison images for f32, f16, q8_0, q5_0, q5_1, q4_0 and q4_1 omitted]

img2img example

./output.png is the image generated from the above txt2img pipeline

./bin/sd --mode img2img -m ../models/sd-v1-4-ggml-model-f16.bin -p "cat with blue eyes" -i ./output.png -o ./img2img_output.png --strength 0.4
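
The --strength value decides how much of the init image survives. A common convention in Stable Diffusion img2img pipelines, assumed here and not confirmed against this project's source, is to noise the init image part of the way and then run only the last int(steps * strength) denoising steps. The helper name `img2img_steps` is hypothetical.

```python
def img2img_steps(steps, strength):
    """Number of denoising steps actually run for a given strength (assumed rule)."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    # strength 1.0 runs every step, so nothing of the init image is preserved;
    # strength 0.0 runs no steps and would return the init image unchanged.
    return int(steps * strength)


if __name__ == "__main__":
    # With the default 20 steps, the example's strength of 0.4 gives 8 steps.
    print(img2img_steps(20, 0.4))
```

Under this assumption, lower strength values keep the output closer to the init image and also finish faster.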

Docker

Building using Docker

docker build -t sd .

Run

docker run -v /path/to/models:/models -v /path/to/output/:/output sd [args...]
# For example
# docker run -v ./models:/models -v ./build:/output sd -m /models/sd-v1-4-ggml-model-f16.bin -p "a lovely cat" -v -o /output/output.png

Memory/Disk Requirements

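| precision | f32 | f16 | q8_0 | q5_0 | q5_1 | q4_0 | q4_1 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Disk | 2.7G | 2.0G | 1.7G | 1.6G | 1.6G | 1.5G | 1.5G |
| Memory (txt2img - 512 x 512) | ~2.8G | ~2.3G | ~2.1G | ~2.0G | ~2.0G | ~2.0G | ~2.0G |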
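
To compare the formats at a glance, the short Python sketch below turns the disk sizes from the table above into percentage savings relative to f32. The numbers are copied from the table. The helper name `savings_vs_f32` is hypothetical.

```python
# Disk sizes in GB, copied from the table above.
DISK_GB = {"f32": 2.7, "f16": 2.0, "q8_0": 1.7, "q5_0": 1.6,
           "q5_1": 1.6, "q4_0": 1.5, "q4_1": 1.5}


def savings_vs_f32(sizes):
    """Percentage of disk space saved by each format relative to f32."""
    base = sizes["f32"]
    return {name: round(100.0 * (1.0 - size / base), 1)
            for name, size in sizes.items()}


if __name__ == "__main__":
    for name, pct in savings_vs_f32(DISK_GB).items():
        print(f"{name}: {pct}% smaller than f32")
```

The file size does not shrink in proportion to the bit width: going from f16 to q4_0 saves only about 0.5G, so the gain from the lower-bit formats is modest on disk.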
