diff --git a/README.md b/README.md index 30fff66..8e617a9 100644 --- a/README.md +++ b/README.md @@ -1,4 +1,4 @@ -# πŸ”— Comfyui : Bjornulf_custom_nodes v0.45 πŸ”— +# πŸ”— Comfyui : Bjornulf_custom_nodes v0.46 πŸ”— # Coffee : β˜•β˜•β˜•β˜•β˜• 5/5 @@ -8,6 +8,7 @@ ## πŸ‘ Display and Show πŸ‘ `1.` [πŸ‘ Show (Text, Int, Float)](#1----show-text-int-float) +`49.` [πŸ“ΉπŸ‘ Video Preview](#49) ## βœ’ Text βœ’ `2.` [βœ’ Write Text](#2----write-text) @@ -15,7 +16,7 @@ `4.` [πŸ”— Combine Texts](#4----combine-texts) `15.` [πŸ’Ύ Save Text](#15----save-text) `26.` [🎲 Random line from input](#26----random-line-from-input) -`28.` [πŸ”’ Text with random Seed](#28----text-with-random-seed) +`28.` [πŸ”’πŸŽ² Text with random Seed](#28----text-with-random-seed) `32.` [πŸ§‘πŸ“ Character Description Generator](#32----character-description-generator) `48.` [πŸ”€πŸŽ² Text scrambler (πŸ§‘ Character)](#48----text-scrambler--character) @@ -37,6 +38,7 @@ `3.` [βœ’πŸ—” Advanced Write Text (+ 🎲 random selection and πŸ…°οΈ variables)](#3----advanced-write-text---random-selection-and-πŸ…°%EF%B8%8F-variables) `5.` [🎲 Random (Texts)](#5----random-texts) `26.` [🎲 Random line from input](#26----random-line-from-input) +`28.` [πŸ”’πŸŽ² Text with random Seed](#28----text-with-random-seed) `37.` [πŸŽ²πŸ–Ό Random Image](#37----random-image) `40.` [🎲 Random (Model+Clip+Vae) - aka Checkpoint / Model](#40----random-modelclipvae---aka-checkpoint--model) `41.` [🎲 Random Load checkpoint (Model Selector)](#41----random-load-checkpoint-model-selector) @@ -69,7 +71,11 @@ ## πŸ“Ή Video πŸ“Ή `20.` [πŸ“Ή Video Ping Pong](#20----video-ping-pong) -`21.` [πŸ“Ή Images to Video](#21----images-to-video) +`21.` [πŸ“Ή Images to Video (FFmpeg)](#21----images-to-video) +`49.` [πŸ“ΉπŸ‘ Video Preview](#49) +`50.` [πŸ–ΌβžœπŸ“Ή Images to Video path (tmp video)](#50) +`51.` [πŸ“ΉβžœπŸ–Ό Video Path to Images](#51) +`52.` [πŸ”ŠπŸ“Ή Audio Video Sync](#52) ## πŸ¦™ AI πŸ¦™ `19.` [πŸ¦™ Ollama](#19----ollama) @@ -77,13 +83,14 @@ ## 
πŸ”Š Audio πŸ”Š `31.` [πŸ”Š TTS - Text to Speech](#31----tts---text-to-speech-100-local-any-voice-you-want-any-language) +`52.` [πŸ”ŠπŸ“Ή Audio Video Sync](#52) ## πŸ’» System πŸ’» `34.` [🧹 Free VRAM hack](#34----free-vram-hack) ## 🧍 Manual user Control 🧍 -`35.` [⏸️ Paused. Resume or Stop ?](#35---%EF%B8%8F-paused-resume-or-stop-) -`36.` [βΈοΈπŸ” Paused. Select input, Pick one](#36---%EF%B8%8F-paused-select-input-pick-one) +`35.` [⏸️ Paused. Resume or Stop, Pick πŸ‘‡](#35---%EF%B8%8F-paused-resume-or-stop-) +`36.` [⏸️ Paused. Select input, Pick πŸ‘‡](#36---%EF%B8%8F-paused-select-input-pick-one) ## 🧠 Logic / Conditional Operations 🧠 `45.` [πŸ”€ If-Else (input / compare_with)](#45----if-else-input--compare_with) @@ -217,6 +224,7 @@ cd /where/you/installed/ComfyUI && python main.py - **v0.43**: Add control_after_generate to Ollama and allow to keep in VRAM for 1 minute if needed. (For chaining quick generations.) Add fallback to 0.0.0.0 - **v0.44**: Allow ollama to have a custom url in the file `ollama_ip.txt` in the comfyui custom nodes folder. Minor changes, add details/updates to README. - **v0.45**: Add a new node : Text scrambler (Character), change text randomly using the file `scrambler/scrambler_character.json` in the comfyui custom nodes folder. +- **v0.46**: ❗ A lot of changes to the Video nodes. Save to video now uses FLOAT for fps instead of INT (many other custom nodes do the same). Add a node to preview a video, a node to convert a video path to a list of images, a node to convert a list of images to a temporary video + video_path, and a node to synchronize the duration of an audio file with a video (useful for MuseTalk). The TTS node gets new outputs ("audio_path", "full_path", "duration") for reuse with other nodes like MuseTalk; its input is also renamed to "connect_to_workflow" to avoid mistakenly sending text to it. # πŸ“ Nodes descriptions @@ -521,13 +529,20 @@ Also, when you select a voice with this format `fr/fake_Bjornulf.wav`, it will c So...
note that if you know you have an audio file ready to play, you can still use my node, but you do NOT need my TTS server to be running. My node will just play the audio file if it can find it, and won't try to connect to the backend TTS server. -Let's say you already use this node to create an audio file saying `workflow is done` with the Attenborough voice : +Let's say you already use this node to create an audio file saying `workflow is done` with the Attenborough voice: + ![TTS](screenshots/tts_end.png) -As long as you keep exactly the same settings, it will not use my server to play the audio file! You can safely turn in off, so it won't use your precious VRAM Duh. (TTS server should be using ~3GB of VRAM.) +As long as you keep exactly the same settings, it will not use my server to play the audio file! You can safely turn the TTS server off, so it won't use your precious VRAM. (The TTS server uses ~3GB of VRAM.) + +Also, `connect_to_workflow` is optional: it means you can make a workflow with ONLY my TTS node to pre-generate the audio files for the sentences you want to use later, for example: +![TTS](screenshots/tts_preload.png) + +If you want to run my TTS nodes alongside image generation, I recommend using my PAUSE node so you can manually stop the TTS server after my TTS node. When the VRAM is freed, you can then click on the RESUME button to continue the workflow. +If you can afford to run both at the same time, good for you, but locally I can't run my TTS server and FLUX at the same time, so I use this trick.
: + +![TTS](screenshots/tts_preload_2.png) -Also input is optional, it means that you can make a workflow with ONLY my TTS node to pre-generate the audio files with the sentences you want to maybe use later, example : -![TTS](screenshots/tts_generate.png) ### 32 - πŸ§‘πŸ“ Character Description Generator ![characters](screenshots/characters.png) @@ -756,3 +771,36 @@ Here another simple example taking a few selected images from a folder and combi **Description:** Take text as input and scramble (randomize) the text by using the file `scrambler/character_scrambler.json` in the comfyui custom nodes folder. + +### 49 - πŸ“ΉπŸ‘ Video Preview + +![video preview](screenshots/video_preview.png) + +**Description:** Preview a video inside ComfyUI from a video path. (The node copies the file into `output/Bjornulf/preview_video` so the web interface can play it.) + +### 50 - πŸ–ΌβžœπŸ“Ή Images to Video path (tmp video) + +![image to video path](screenshots/image_to_video_path.png) + +**Description:** Convert a list of images into a temporary mp4 video (created with FFmpeg) and output its video_path, ready to chain into other video nodes. + +### 51 - πŸ“ΉβžœπŸ–Ό Video Path to Images + +![video path to image](screenshots/video_path_to_image.png) + +**Description:** Load a video from a path and output its frames as a list of images, together with the initial fps, the new fps (after applying frame_interval) and the total number of frames. + +### 52 - πŸ”ŠπŸ“Ή Audio Video Sync + +**Description:** + +This node synchronizes the duration of an audio file with a video file: it adds silence to the audio if it is too short, or repeats the video if the audio is too long. (The video should ideally be a loop; check my Video Ping Pong node.) +It is useful, for example, with MuseTalk: if you want to chain videos (say, sentence by sentence), each clip will always end on its last frame, making the transition between videos smoother. + +Here is an example without the `Audio Video Sync` node (the duration of the video is shorter than the audio, so after playing it will not go back to the last frame; ideally I want a loop where the first frame is the same as the last frame
-see my Video Ping Pong node if needed-): + +![audio sync video](screenshots/audio_sync_video_without.png) + +Here is an example with the `Audio Video Sync` node; notice that it is also convenient for recovering the frames per second of the video and sending it to other nodes: + +![audio sync video](screenshots/audio_sync_video_with.png) diff --git a/__init__.py b/__init__.py index bc91911..77a3a91 100644 --- a/__init__.py +++ b/__init__.py @@ -1,9 +1,9 @@ from .images_to_video import imagesToVideo from .write_text import WriteText -from .write_image_environment import WriteImageEnvironment -from .write_image_characters import WriteImageCharacters -from .write_image_character import WriteImageCharacter -from .write_image_allinone import WriteImageAllInOne +# from .write_image_environment import WriteImageEnvironment +# from .write_image_characters import WriteImageCharacters +# from .write_image_character import WriteImageCharacter +# from .write_image_allinone import WriteImageAllInOne from .combine_texts import CombineTexts from .loop_texts import LoopTexts from .random_texts import RandomTexts @@ -51,9 +51,17 @@ from .image_details import ImageDetails from .combine_images import CombineImages # from .pass_preview_image import PassPreviewImage from .text_scramble_character import ScramblerCharacter +from .audio_video_sync import AudioVideoSync +from .video_path_to_images import VideoToImagesList +from .images_to_video_path import ImagesListToVideo +from .video_preview import VideoPreview NODE_CLASS_MAPPINGS = { "Bjornulf_ollamaLoader": ollamaLoader, + "Bjornulf_VideoPreview": VideoPreview, + "Bjornulf_ImagesListToVideo": ImagesListToVideo, + "Bjornulf_VideoToImagesList": VideoToImagesList, + "Bjornulf_AudioVideoSync": AudioVideoSync, "Bjornulf_ScramblerCharacter": ScramblerCharacter, "Bjornulf_CombineImages": CombineImages, "Bjornulf_ImageDetails": ImageDetails, @@ -106,6 +114,10 @@ NODE_CLASS_MAPPINGS = { NODE_DISPLAY_NAME_MAPPINGS = { "Bjornulf_WriteText": "βœ’
Write Text", + "Bjornulf_VideoPreview": "πŸ“ΉπŸ‘ Video Preview", + "Bjornulf_ImagesListToVideo": "πŸ–ΌβžœπŸ“Ή Images to Video path (tmp video)", + "Bjornulf_VideoToImagesList": "πŸ“ΉβžœπŸ–Ό Video Path to Images", + "Bjornulf_AudioVideoSync": "πŸ”ŠπŸ“Ή Audio Video Sync", "Bjornulf_ScramblerCharacter": "πŸ”€πŸŽ² Text scrambler (πŸ§‘ Character)", "Bjornulf_WriteTextAdvanced": "βœ’πŸ—” Advanced Write Text", "Bjornulf_LoopWriteText": "β™» Loop (βœ’πŸ—” Advanced Write Text)", @@ -129,7 +141,7 @@ NODE_DISPLAY_NAME_MAPPINGS = { "Bjornulf_CharacterDescriptionGenerator": "πŸ§‘πŸ“ Character Description Generator", "Bjornulf_GreenScreenToTransparency": "πŸŸ©βžœβ–’ Green Screen to Transparency", "Bjornulf_SaveBjornulfLobeChat": "πŸ–ΌπŸ’¬ Save image for Bjornulf LobeChat", - "Bjornulf_TextToStringAndSeed": "πŸ”’ Text with random Seed", + "Bjornulf_TextToStringAndSeed": "πŸ”’πŸŽ² Text with random Seed", "Bjornulf_ShowText": "πŸ‘ Show (Text, Int, Float)", "Bjornulf_ImageMaskCutter": "πŸ–Όβœ‚ Cut Image with Mask", "Bjornulf_LoadImageWithTransparency": "πŸ“₯πŸ–Ό Load Image with Transparency β–’", diff --git a/audio_video_sync.py b/audio_video_sync.py new file mode 100644 index 0000000..66e2d5a --- /dev/null +++ b/audio_video_sync.py @@ -0,0 +1,156 @@ +import torch +import torchaudio +import os +import subprocess +from datetime import datetime +import math + +class AudioVideoSync: + def __init__(self): + pass + + @classmethod + def INPUT_TYPES(s): + return { + "required": { + "audio": ("AUDIO",), + "video_path": ("STRING", {"default": ""}), + }, + } + + RETURN_TYPES = ("AUDIO", "STRING", "STRING", "FLOAT") + RETURN_NAMES = ("synced_audio", "audio_path", "synced_video_path", "video_fps") + FUNCTION = "sync_audio_video" + CATEGORY = "audio" + + # def get_video_duration(self, video_path): + # cmd = ['ffprobe', '-v', 'error', '-show_entries', 'format=duration', '-of', 'default=noprint_wrappers=1:nokey=1', video_path] + # result = subprocess.run(cmd, stdout=subprocess.PIPE, 
stderr=subprocess.PIPE, text=True) + # duration = float(result.stdout) + # return math.ceil(duration * 10) / 10 + + def get_video_duration(self, video_path): + cmd = ['ffprobe', '-v', 'error', '-show_entries', 'format=duration', '-of', 'default=noprint_wrappers=1:nokey=1', video_path] + result = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True) + return float(result.stdout) + + def get_video_fps(self, video_path): + cmd = ['ffprobe', '-v', 'error', '-select_streams', 'v:0', '-count_packets', '-show_entries', 'stream=r_frame_rate', '-of', 'csv=p=0', video_path] + result = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True) + fps = result.stdout.strip() + if '/' in fps: + num, den = map(float, fps.split('/')) + return num / den + return float(fps) + + def sync_audio_video(self, audio, video_path): + if not isinstance(audio, dict) or 'waveform' not in audio or 'sample_rate' not in audio: + raise ValueError("Expected audio input to be a dictionary with 'waveform' and 'sample_rate' keys") + + audio_data = audio['waveform'] + sample_rate = audio['sample_rate'] + + print(f"Audio data shape: {audio_data.shape}") + print(f"Sample rate: {sample_rate}") + + # Calculate video duration + video_duration = self.get_video_duration(video_path) + + # Calculate audio duration + audio_duration = audio_data.shape[-1] / sample_rate + + print(f"Video duration: {video_duration}") + print(f"Audio duration: {audio_duration}") + + # Calculate the desired audio duration and number of video repetitions + if audio_duration <= video_duration: + target_duration = video_duration + repetitions = 1 + else: + repetitions = math.ceil(audio_duration / video_duration) + target_duration = video_duration * repetitions + + # Calculate the number of samples to add + current_samples = audio_data.shape[-1] + target_samples = int(target_duration * sample_rate) + samples_to_add = target_samples - current_samples + + print(f"Current samples: 
{current_samples}, Target samples: {target_samples}, Samples to add: {samples_to_add}") + + if samples_to_add > 0: + # Create silence + if audio_data.dim() == 3: + silence_shape = (audio_data.shape[0], audio_data.shape[1], samples_to_add) + else: # audio_data.dim() == 2 + silence_shape = (audio_data.shape[0], samples_to_add) + + silence = torch.zeros(silence_shape, dtype=audio_data.dtype, device=audio_data.device) + + # Append silence to the audio + synced_audio = torch.cat((audio_data, silence), dim=-1) + else: + synced_audio = audio_data + + print(f"Synced audio shape: {synced_audio.shape}") + + # Save the synced audio file and get the file path + audio_path = self.save_audio(synced_audio, sample_rate) + + # Create and save the synced video + synced_video_path = self.create_synced_video(video_path, repetitions) + + video_fps = self.get_video_fps(video_path) + + # Return the synced audio data, audio file path, and synced video path + return ({"waveform": synced_audio, "sample_rate": sample_rate}, audio_path, synced_video_path, video_fps) + + def save_audio(self, audio_tensor, sample_rate): + # Create the sync_audio folder if it doesn't exist + os.makedirs("Bjornulf/sync_audio", exist_ok=True) + + # Generate a unique filename using the current timestamp + timestamp = datetime.now().strftime("%Y%m%d_%H%M%S") + filename = f"Bjornulf/sync_audio/synced_audio_{timestamp}.wav" + + # Ensure audio_tensor is 2D + if audio_tensor.dim() == 3: + audio_tensor = audio_tensor.squeeze(0) # Remove batch dimension + elif audio_tensor.dim() == 1: + audio_tensor = audio_tensor.unsqueeze(0) # Add channel dimension + + # Save the audio file + torchaudio.save(filename, audio_tensor, sample_rate) + print(f"Synced audio saved to: {filename}") + + # Return the full path to the saved audio file + return os.path.abspath(filename) + + def create_synced_video(self, video_path, repetitions): + # Create the sync_video folder if it doesn't exist + os.makedirs("Bjornulf/sync_video", exist_ok=True)
+ + # Generate a unique filename using the current timestamp + timestamp = datetime.now().strftime("%Y%m%d_%H%M%S") + output_path = f"Bjornulf/sync_video/synced_video_{timestamp}.mp4" + + # Create a temporary file with the list of input video files + with open("Bjornulf/temp_video_list.txt", "w") as f: + for _ in range(repetitions): + f.write(f"file '{video_path}'\n") + + # Use ffmpeg to concatenate the video multiple times + cmd = [ + 'ffmpeg', + '-f', 'concat', + '-safe', '0', + '-i', 'Bjornulf/temp_video_list.txt', + '-c', 'copy', + output_path + ] + subprocess.run(cmd, check=True) + + # Remove the temporary file + os.remove("Bjornulf/temp_video_list.txt") + + print(f"Synced video saved to: {output_path}") + return os.path.abspath(output_path) \ No newline at end of file diff --git a/images_to_video.py b/images_to_video.py index c04115a..ba58326 100644 --- a/images_to_video.py +++ b/images_to_video.py @@ -12,7 +12,7 @@ class imagesToVideo: return { "required": { "images": ("IMAGE",), - "fps": ("INT", {"default": 24, "min": 1, "max": 60}), + "fps": ("FLOAT", {"default": 24, "min": 1, "max": 120}), "name_prefix": ("STRING", {"default": "output/imgs2video/me"}), "format": (["mp4", "webm"], {"default": "mp4"}), "mp4_encoder": (["libx264 (H.264)", "h264_nvenc (H.264 / NVIDIA GPU)", "libx265 (H.265)", "hevc_nvenc (H.265 / NVIDIA GPU)"], {"default": "h264_nvenc (H.264 / NVIDIA GPU)"}), @@ -47,7 +47,7 @@ class imagesToVideo: # Create the new filename with the incremented number output_file = f"{name_prefix}_{next_num:04d}.{format}" - temp_dir = "temp_images_imgs2video" + temp_dir = "Bjornulf/temp_images_imgs2video" # Clean up temp dir if os.path.exists(temp_dir) and os.path.isdir(temp_dir): for file in os.listdir(temp_dir): diff --git a/images_to_video_path.py b/images_to_video_path.py new file mode 100644 index 0000000..8159f3c --- /dev/null +++ b/images_to_video_path.py @@ -0,0 +1,91 @@ +import os +import uuid +import subprocess +import tempfile +import torch +import 
numpy as np +from PIL import Image + +class ImagesListToVideo: + @classmethod + def INPUT_TYPES(s): + return { + "required": { + "images": ("IMAGE",), + "frames_per_second": ("FLOAT", {"default": 30, "min": 1, "max": 120, "step": 1}), + } + } + + RETURN_TYPES = ("STRING",) + RETURN_NAMES = ("video_path",) + FUNCTION = "images_to_video" + CATEGORY = "Bjornulf" + + def images_to_video(self, images, frames_per_second=30): + # Create the output directory if it doesn't exist + output_dir = os.path.join("Bjornulf", "images_to_video") + os.makedirs(output_dir, exist_ok=True) + + # Generate a unique filename for the video + video_filename = f"video_{uuid.uuid4().hex}.mp4" + video_path = os.path.join(output_dir, video_filename) + + # Create a temporary directory to store image files + with tempfile.TemporaryDirectory() as temp_dir: + # Save each image as a PNG file in the temporary directory + for i, img in enumerate(images): + # Convert the image to the correct format + img_np = self.convert_to_numpy(img) + + # Ensure the image is in RGB format + if img_np.shape[-1] != 3: + img_np = self.convert_to_rgb(img_np) + + # Convert to PIL Image + img_pil = Image.fromarray(img_np) + img_path = os.path.join(temp_dir, f"frame_{i:05d}.png") + img_pil.save(img_path) + + # Use FFmpeg to create a video from the image sequence + ffmpeg_cmd = [ + "ffmpeg", + "-framerate", str(frames_per_second), + "-i", os.path.join(temp_dir, "frame_%05d.png"), + "-c:v", "libx264", + "-pix_fmt", "yuv420p", + "-crf", "23", + "-y", # Overwrite output file if it exists + video_path + ] + + try: + subprocess.run(ffmpeg_cmd, check=True, capture_output=True, text=True) + except subprocess.CalledProcessError as e: + print(f"FFmpeg error: {e.stderr}") + return ("",) # Return empty string if video creation fails + + return (video_path,) + + def convert_to_numpy(self, img): + if isinstance(img, torch.Tensor): + img = img.cpu().numpy() + if img.dtype == np.uint8: + return img + elif img.dtype == np.float32 or 
img.dtype == np.float64: + return (img * 255).astype(np.uint8) + else: + raise ValueError(f"Unsupported data type: {img.dtype}") + + def convert_to_rgb(self, img): + if img.shape[-1] == 1: # Grayscale + return np.repeat(img, 3, axis=-1) + elif img.shape[-1] == 768: # Latent space representation + # This is a placeholder. You might need a more sophisticated method to convert latent space to RGB + img = img.reshape((-1, 3)) # Reshape to (H*W, 3) + img = (img - img.min()) / (img.max() - img.min()) # Normalize to [0, 1] + img = (img * 255).astype(np.uint8) + return img.reshape((img.shape[0], -1, 3)) # Reshape back to (H, W, 3) + elif len(img.shape) == 2: # 2D array + return np.stack([img, img, img], axis=-1) + else: + raise ValueError(f"Unsupported image shape: {img.shape}") \ No newline at end of file diff --git a/pyproject.toml b/pyproject.toml index 81e8676..a900a23 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -1,7 +1,7 @@ [project] name = "bjornulf_custom_nodes" description = "Nodes: Ollama, Text to Speech, Combine Texts, Random Texts, Save image for Bjornulf LobeChat, Text with random Seed, Random line from input, Combine images, Image to grayscale (black & white), Remove image Transparency (alpha), Resize Image, ..." 
-version = "0.45" +version = "0.46" license = {file = "LICENSE"} [project.urls] diff --git a/screenshots/audio_sync_video_with.png b/screenshots/audio_sync_video_with.png new file mode 100644 index 0000000..54cb1ea Binary files /dev/null and b/screenshots/audio_sync_video_with.png differ diff --git a/screenshots/audio_sync_video_without.png b/screenshots/audio_sync_video_without.png new file mode 100644 index 0000000..4eebc29 Binary files /dev/null and b/screenshots/audio_sync_video_without.png differ diff --git a/screenshots/image_to_video_path.png b/screenshots/image_to_video_path.png new file mode 100644 index 0000000..9122aaf Binary files /dev/null and b/screenshots/image_to_video_path.png differ diff --git a/screenshots/tts.png b/screenshots/tts.png index a3f7480..c438223 100644 Binary files a/screenshots/tts.png and b/screenshots/tts.png differ diff --git a/screenshots/tts_end.png b/screenshots/tts_end.png index d46cc18..7dd01db 100644 Binary files a/screenshots/tts_end.png and b/screenshots/tts_end.png differ diff --git a/screenshots/tts_generate.png b/screenshots/tts_generate.png deleted file mode 100644 index a8b1fe6..0000000 Binary files a/screenshots/tts_generate.png and /dev/null differ diff --git a/screenshots/tts_preload.png b/screenshots/tts_preload.png new file mode 100644 index 0000000..b3eaec3 Binary files /dev/null and b/screenshots/tts_preload.png differ diff --git a/screenshots/tts_preload_2.png b/screenshots/tts_preload_2.png new file mode 100644 index 0000000..b59b894 Binary files /dev/null and b/screenshots/tts_preload_2.png differ diff --git a/screenshots/video_path_to_image.png b/screenshots/video_path_to_image.png new file mode 100644 index 0000000..b04021b Binary files /dev/null and b/screenshots/video_path_to_image.png differ diff --git a/screenshots/video_preview.png b/screenshots/video_preview.png new file mode 100644 index 0000000..fea3ce6 Binary files /dev/null and b/screenshots/video_preview.png differ diff --git 
a/text_to_speech.py b/text_to_speech.py index 9d88361..c8c47fd 100644 --- a/text_to_speech.py +++ b/text_to_speech.py @@ -9,53 +9,34 @@ import os import sys import random import re +from typing import Dict, Any, List, Tuple class Everything(str): def __ne__(self, __value: object) -> bool: return False language_map = { - "ar": "Arabic", - "cs": "Czech", - "de": "German", - "en": "English", - "es": "Spanish", - "fr": "French", - "hi": "Hindi", - "hu": "Hungarian", - "it": "Italian", - "ja": "Japanese", - "ko": "Korean", - "nl": "Dutch", - "pl": "Polish", - "pt": "Portuguese", - "ru": "Russian", - "tr": "Turkish", + "ar": "Arabic", "cs": "Czech", "de": "German", "en": "English", + "es": "Spanish", "fr": "French", "hi": "Hindi", "hu": "Hungarian", + "it": "Italian", "ja": "Japanese", "ko": "Korean", "nl": "Dutch", + "pl": "Polish", "pt": "Portuguese", "ru": "Russian", "tr": "Turkish", "zh-cn": "Chinese" } class TextToSpeech: @classmethod - def INPUT_TYPES(cls): + def INPUT_TYPES(cls) -> Dict[str, Any]: speakers_dir = os.path.join(os.path.dirname(os.path.realpath(__file__)), "speakers") - speaker_options = [] - - for root, dirs, files in os.walk(speakers_dir): - for file in files: - if file.endswith(".wav"): - rel_path = os.path.relpath(os.path.join(root, file), speakers_dir) - speaker_options.append(rel_path) - - if not speaker_options: - speaker_options.append("No WAV files found") - - language_options = list(language_map.values()) - + speaker_options = [os.path.relpath(os.path.join(root, file), speakers_dir) + for root, _, files in os.walk(speakers_dir) + for file in files if file.endswith(".wav")] + + speaker_options = speaker_options or ["No WAV files found"] + return { "required": { "text": ("STRING", {"multiline": True}), - "language": (language_options, { + "language": (list(language_map.values()), { "default": language_map["en"], "display": "dropdown" }), @@ -69,44 +50,45 @@ class TextToSpeech: "seed": ("INT", {"default": 0}), }, "optional": { - "input":
(Everything("*"), {"forceInput": True}), + "connect_to_workflow": (Everything("*"), {"forceInput": True}), } } - RETURN_TYPES = ("AUDIO",) + RETURN_TYPES = ("AUDIO", "STRING", "STRING", "FLOAT") + RETURN_NAMES = ("AUDIO", "audio_path", "full_path", "duration") FUNCTION = "generate_audio" CATEGORY = "Bjornulf" @staticmethod - def get_language_code(language_name): - for code, name in language_map.items(): - if name == language_name: - return code - return "en" + def get_language_code(language_name: str) -> str: + return next((code for code, name in language_map.items() if name == language_name), "en") @staticmethod - def sanitize_text(text): - sanitized = re.sub(r'[^\w\s-]', '', text).replace(' ', '_') - return sanitized[:50] + def sanitize_text(text: str) -> str: + return re.sub(r'[^\w\s-]', '', text).replace(' ', '_')[:50] - def generate_audio(self, text, language, autoplay, seed, save_audio, overwrite, speaker_wav, input=None): + def generate_audio(self, text: str, language: str, autoplay: bool, seed: int, + save_audio: bool, overwrite: bool, speaker_wav: str, + connect_to_workflow: Any = None) -> Tuple[Dict[str, Any], str, str, float]: language_code = self.get_language_code(language) sanitized_text = self.sanitize_text(text) save_path = os.path.join("Bjornulf_TTS", language, speaker_wav, f"{sanitized_text}.wav") - os.makedirs(os.path.dirname(save_path), exist_ok=True) + full_path = os.path.abspath(save_path) + os.makedirs(os.path.dirname(full_path), exist_ok=True) - if os.path.exists(save_path) and not overwrite: - print(f"Using existing audio file: {save_path}") - audio_data = self.load_audio_file(save_path) + if os.path.exists(full_path) and not overwrite: + print(f"Using existing audio file: {full_path}") + audio_data = self.load_audio_file(full_path) else: audio_data = self.create_new_audio(text, language_code, speaker_wav, seed) if save_audio: - self.save_audio_file(audio_data, save_path) + self.save_audio_file(audio_data, full_path) - return 
self.process_audio_data(autoplay, audio_data) + audio_output, _, duration = self.process_audio_data(autoplay, audio_data, full_path if save_audio else None) + return (audio_output, save_path, full_path, duration) - def create_new_audio(self, text, language_code, speaker_wav, seed): + def create_new_audio(self, text: str, language_code: str, speaker_wav: str, seed: int) -> io.BytesIO: random.seed(seed) if speaker_wav == "No WAV files found": print("Error: No WAV files available for text-to-speech.") @@ -133,17 +115,17 @@ class TextToSpeech: print(f"Unexpected error: {e}") return io.BytesIO() - def play_audio(self, audio): + def play_audio(self, audio: AudioSegment) -> None: if sys.platform.startswith('win'): try: import winsound - winsound.PlaySound(audio, winsound.SND_MEMORY) + winsound.PlaySound(audio.raw_data, winsound.SND_MEMORY) except Exception as e: print(f"An error occurred: {e}") else: play(audio) - def process_audio_data(self, autoplay, audio_data): + def process_audio_data(self, autoplay: bool, audio_data: io.BytesIO, save_path: str) -> Tuple[Dict[str, Any], str, float]: try: audio = AudioSegment.from_mp3(audio_data) sample_rate = audio.frame_rate @@ -151,23 +133,22 @@ class TextToSpeech: audio_np = np.array(audio.get_array_of_samples()).astype(np.float32) audio_np /= np.iinfo(np.int16).max - if num_channels == 1: - audio_np = audio_np.reshape(1, -1) - else: - audio_np = audio_np.reshape(-1, num_channels).T + audio_np = audio_np.reshape(-1, num_channels).T if num_channels > 1 else audio_np.reshape(1, -1) audio_tensor = torch.from_numpy(audio_np) if autoplay: self.play_audio(audio) - return ({"waveform": audio_tensor.unsqueeze(0), "sample_rate": sample_rate},) + duration = len(audio) / 1000.0 # Convert milliseconds to seconds + + return ({"waveform": audio_tensor.unsqueeze(0), "sample_rate": sample_rate}, save_path or "", duration) except Exception as e: print(f"Error processing audio data: {e}") - return ({"waveform": torch.zeros(1, 1, 1, 
dtype=torch.float32), "sample_rate": 22050},) + return ({"waveform": torch.zeros(1, 1, 1, dtype=torch.float32), "sample_rate": 22050}, "", 0.0) - def save_audio_file(self, audio_data, save_path): + def save_audio_file(self, audio_data: io.BytesIO, save_path: str) -> None: try: with open(save_path, 'wb') as f: f.write(audio_data.getvalue()) @@ -175,11 +156,11 @@ class TextToSpeech: except Exception as e: print(f"Error saving audio file: {e}") - def load_audio_file(self, file_path): + def load_audio_file(self, file_path: str) -> io.BytesIO: try: with open(file_path, 'rb') as f: audio_data = io.BytesIO(f.read()) return audio_data except Exception as e: print(f"Error loading audio file: {e}") - return io.BytesIO() + return io.BytesIO() \ No newline at end of file diff --git a/video_path_to_images.py b/video_path_to_images.py new file mode 100644 index 0000000..a615f58 --- /dev/null +++ b/video_path_to_images.py @@ -0,0 +1,62 @@ +import os +import cv2 +import numpy as np +import torch +from PIL import Image + +class VideoToImagesList: + @classmethod + def INPUT_TYPES(s): + return { + "required": { + "video_path": ("STRING", {"forceInput": True}), + "frame_interval": ("INT", {"default": 1, "min": 1, "max": 100}), + "max_frames": ("INT", {"default": 0, "min": 0, "max": 10000}) + } + } + + RETURN_TYPES = ("IMAGE", "FLOAT", "FLOAT", "INT") + RETURN_NAMES = ("IMAGE", "initial_fps", "new_fps", "total_frames") + FUNCTION = "video_to_images" + CATEGORY = "Bjornulf" + + def video_to_images(self, video_path, frame_interval=1, max_frames=0): + if not os.path.exists(video_path): + raise FileNotFoundError(f"Video file not found: {video_path}") + + cap = cv2.VideoCapture(video_path) + frame_count = 0 + images = [] + + # Get the initial fps of the video + initial_fps = cap.get(cv2.CAP_PROP_FPS) + + # Get the total number of frames in the video + total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT)) + + while True: + ret, frame = cap.read() + if not ret or (max_frames > 0 and 
len(images) >= max_frames): + break + + if frame_count % frame_interval == 0: + # Convert BGR to RGB + rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB) + pil_image = Image.fromarray(rgb_frame) + + # Convert PIL Image to tensor + tensor_image = torch.from_numpy(np.array(pil_image).astype(np.float32) / 255.0).unsqueeze(0) + images.append(tensor_image) + + frame_count += 1 + + cap.release() + + if not images: + raise ValueError("No frames were extracted from the video") + + # Calculate the new fps + new_fps = initial_fps / frame_interval + + # Stack all images into a single tensor + return (torch.cat(images, dim=0), initial_fps, new_fps, total_frames) diff --git a/video_preview.py b/video_preview.py new file mode 100644 index 0000000..519b7e4 --- /dev/null +++ b/video_preview.py @@ -0,0 +1,49 @@ +import os +import shutil +# import logging + +class VideoPreview: + @classmethod + def INPUT_TYPES(cls): + return { + "required": { + "video_path": ("STRING", {"forceInput": True}), + }, + } + + RETURN_TYPES = () + FUNCTION = "preview_video" + CATEGORY = "Bjornulf" + OUTPUT_NODE = True + + def preview_video(self, video_path): + if not video_path: + return {"ui": {"error": "No video path provided."}} + + # Keep the "output" folder structure for copying + dest_dir = os.path.join("output", "Bjornulf", "preview_video") + os.makedirs(dest_dir, exist_ok=True) + + video_name = os.path.basename(video_path) + dest_path = os.path.join(dest_dir, video_name) + + if os.path.abspath(video_path) != os.path.abspath(dest_path): + shutil.copy2(video_path, dest_path) + print(f"Video copied successfully to {dest_path}") + else: + print(f"Video is already in the destination folder: {dest_path}") + + # Determine the video type based on file extension + _, file_extension = os.path.splitext(dest_path) + video_type = file_extension.lower()[1:] # Remove the dot from extension + + # logging.info(f"Video type: {video_type}") + # logging.info(f"Video path: {dest_path}") + # logging.info(f"Destination 
directory: {dest_dir}") + # logging.info(f"Video name: {video_name}") + + # Create a new variable for the return value without "output" + return_dest_dir = os.path.join("Bjornulf", "preview_video") + + # Return the video name and the modified destination directory + return {"ui": {"video": [video_name, return_dest_dir]}} diff --git a/web/js/video_preview.js b/web/js/video_preview.js new file mode 100644 index 0000000..02346ba --- /dev/null +++ b/web/js/video_preview.js @@ -0,0 +1,83 @@ +import { api } from '../../../scripts/api.js'; +import { app } from "../../../scripts/app.js"; + +function displayVideoPreview(component, filename, category) { + let videoWidget = component._videoWidget; + if (!videoWidget) { + // Create the widget if it doesn't exist + var container = document.createElement("div"); + const currentNode = component; + videoWidget = component.addDOMWidget("videopreview", "preview", container, { + serialize: false, + hideOnZoom: false, + getValue() { + return container.value; + }, + setValue(v) { + container.value = v; + }, + }); + videoWidget.computeSize = function(width) { + if (this.aspectRatio && !this.parentElement.hidden) { + let height = (currentNode.size[0] - 20) / this.aspectRatio + 10; + if (!(height > 0)) { + height = 0; + } + return [width, height]; + } + return [width, -4]; + }; + videoWidget.value = { hidden: false, paused: false, params: {} }; + videoWidget.parentElement = document.createElement("div"); + videoWidget.parentElement.className = "video_preview"; + videoWidget.parentElement.style['width'] = "100%"; + container.appendChild(videoWidget.parentElement); + videoWidget.videoElement = document.createElement("video"); + videoWidget.videoElement.controls = true; + videoWidget.videoElement.loop = false; + videoWidget.videoElement.muted = false; + videoWidget.videoElement.style['width'] = "100%"; + videoWidget.videoElement.addEventListener("loadedmetadata", () => { + videoWidget.aspectRatio = videoWidget.videoElement.videoWidth / 
videoWidget.videoElement.videoHeight; adjustSize(component); }); videoWidget.videoElement.addEventListener("error", () => { videoWidget.parentElement.hidden = true; adjustSize(component); }); + videoWidget.parentElement.hidden = videoWidget.value.hidden; videoWidget.videoElement.autoplay = !videoWidget.value.paused && !videoWidget.value.hidden; videoWidget.videoElement.hidden = false; videoWidget.parentElement.appendChild(videoWidget.videoElement); component._videoWidget = videoWidget; // Store the widget for future reference } // Update the video source let params = { "filename": filename, "subfolder": category, "type": "output", "rand": Math.random().toString().slice(2, 12) }; const urlParams = new URLSearchParams(params); // Use the ComfyUI api helper instead of a hardcoded host/port, so the preview also works on remote installs videoWidget.videoElement.src = api.apiURL(`/view?${urlParams.toString()}`); adjustSize(component); // Adjust the component size } function adjustSize(component) { component.setSize([component.size[0], component.computeSize([component.size[0], component.size[1]])[1]]); component?.graph?.setDirtyCanvas(true); } app.registerExtension({ name: "Bjornulf.VideoPreview", async beforeRegisterNodeDef(nodeType, nodeData, appInstance) { if (nodeData?.name == "Bjornulf_VideoPreview") { nodeType.prototype.onExecuted = function (data) { displayVideoPreview(this, data.video[0], data.video[1]); }; } } });
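The duration math in `AudioVideoSync.sync_audio_video` above (repeat the video enough times to cover the audio, then pad the audio with silence so both streams end together) can be sketched as a small standalone helper. `sync_plan` is an illustrative name, not part of the node:

```python
import math

def sync_plan(audio_samples: int, sample_rate: int, video_duration: float):
    """Sketch of the duration math in AudioVideoSync.sync_audio_video.

    Returns (repetitions, samples_to_add): how many times the video must be
    concatenated to cover the audio, and how many samples of silence must be
    appended to the audio so both streams end at the same time.
    """
    audio_duration = audio_samples / sample_rate
    if audio_duration <= video_duration:
        repetitions = 1  # video already covers the audio; only pad the audio
    else:
        repetitions = math.ceil(audio_duration / video_duration)
    target_samples = int(video_duration * repetitions * sample_rate)
    return repetitions, max(target_samples - audio_samples, 0)
```

For a 3-second audio clip and a 5-second video at 48 kHz this yields one repetition and 2 seconds of silence; a 12-second clip against the same video yields three repetitions and 3 seconds of silence.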
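`get_video_fps` in `audio_video_sync.py` parses ffprobe's `r_frame_rate`, which ffprobe reports as a rational such as `30000/1001` rather than a decimal. The parsing step on its own looks like this (a sketch; `parse_frame_rate` is an illustrative name):

```python
def parse_frame_rate(fps_text: str) -> float:
    """Turn ffprobe's r_frame_rate output (e.g. "30000/1001" or "25")
    into a float frames-per-second value."""
    fps_text = fps_text.strip()
    if '/' in fps_text:
        num, den = (float(part) for part in fps_text.split('/'))
        return num / den
    return float(fps_text)
```

NTSC-style rates such as `30000/1001` come out as ~29.97 fps, which is why the node returns FLOAT rather than INT for fps.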