This commit is contained in:
justumen
2024-09-16 16:10:59 +02:00
parent 5f47d20f75
commit 002cf1220b
8 changed files with 166 additions and 45 deletions


@@ -1,4 +1,4 @@
# 🔗 Comfyui : Bjornulf_custom_nodes v0.34 🔗
# 🔗 Comfyui : Bjornulf_custom_nodes v0.35 🔗
# ❤️ Coffee : ☕☕☕☕☕ 5/5
@@ -82,6 +82,7 @@ wget --content-disposition -P /workspace/ComfyUI/models/checkpoints "https://civ
- **v0.32**: Quick rename to avoid breaking loop_text node.
- **v0.33**: Control random on paused nodes, fix pydub sound bug permissions on Windows.
- **v0.34**: Two new nodes : Load Images from output folder and Select an Image, Pick.
- **v0.35**: Major improvements to TTS node 31. It will also save the audio file in the "ComfyUI/Bjornulf_TTS/" folder. - Not tested on Windows yet -
# 📝 Nodes descriptions
@@ -91,7 +92,7 @@ wget --content-disposition -P /workspace/ComfyUI/models/checkpoints "https://civ
**Description:**
The show node will only display text, or a list of several texts. (read only node)
3 types are managed : Green is for STRING type, Orange is for FLOAT type and blue is for INT type.
3 types are managed: Green is for STRING type, Orange is for FLOAT type and Blue is for INT type. I put colors so I/you don't try to edit them. 🤣
## 2 - ✒ Write Text
@@ -118,7 +119,7 @@ Picked text: photo of a green house
![Combine Texts](screenshots/combine_texts.png)
**Description:**
Combine multiple text inputs into a single output. (can have separation with : comma, space, new line.)
Combine multiple text inputs into a single output. (can have separation with : comma, space, new line or nothing.)
## 5 - 🎲 Random (Texts)
![Random Text](screenshots/random_text.png)
@@ -162,8 +163,8 @@ Here is an example of usage with ksampler (Notice that with "steps" this node is
![Widget to Input](screenshots/example_loop_integer.png)
## 9 - ♻ Loop Float
![Loop Float](screenshots/loop_float.png)
![Loop Float + Show Text](screenshots/loop_float+show_text.png)
![Loop Float](screenshots/loop_float.png)
**Description:**
Loop through a range of floating-point numbers, good for `cfg`, `denoise`, etc...
@@ -342,16 +343,43 @@ The default `Load Image` node will not load the transparency.
**Description:**
Cut an image from a mask.
## 31 - 🔊 TTS - Text to Speech
## 31 - 🔊 TTS - Text to Speech (100% local, any voice you want, any language)
![TTS](screenshots/tts.png)
**Description:**
Use my TTS server to generate speech from text.
❗ Of course you need to use my TTS server : <https://github.com/justUmen/Bjornulf_XTTS>
After having that installed, you NEED to create a link in my Comfyui custom node folder called `speakers` : `ComfyUI/custom_nodes/Bjornulf_custom_nodes/speakers`
That link must be a link to the folder where you store the voice samples you use for my TTS, like `default.wav`.
[Listen to the audio example](./example_tts.wav)
❗ Node never tested on windows, only on linux for now. ❗
Use my TTS server to generate speech from text, based on XTTS v2.
❗ Of course, to use this ComfyUI node (frontend) you need to run my TTS server (backend): <https://github.com/justUmen/Bjornulf_XTTS>
I made this backend for <https://github.com/justUmen/Bjornulf_lobe-chat>, but you can also use it with ComfyUI through this node.
Once `Bjornulf_XTTS` is installed, you NEED to create a link named `speakers` in my ComfyUI custom node folder: `ComfyUI/custom_nodes/Bjornulf_custom_nodes/speakers`
That link must point to the folder where you installed/stored the voice samples used by my TTS, like `default.wav`.
If my TTS server is running on port 8020 (you can test in your browser with the link <http://localhost:8020/tts_stream?language=en&speaker_wav=default&text=Hello>) and the voice samples are good, you can use this node to generate speech from text.
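The browser test link above can also be built programmatically. A minimal sketch (assuming the default port 8020 and a `default.wav` sample; the helper name is hypothetical):

```python
import urllib.parse

def tts_stream_url(text, language="en", speaker_wav="default",
                   host="http://localhost:8020"):
    # URL-encode the text so spaces and non-ASCII characters survive the query string
    encoded = urllib.parse.quote(text)
    return f"{host}/tts_stream?language={language}&speaker_wav={speaker_wav}&text={encoded}"

print(tts_stream_url("Hello"))
# http://localhost:8020/tts_stream?language=en&speaker_wav=default&text=Hello
```

Fetching that URL (e.g. with `requests.get(url, stream=True)`) is exactly what the node does internally.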
**Details**
This node should always be connected to a core node: `Preview audio`.
My node will generate and save the audio files in the `ComfyUI/Bjornulf_TTS/` folder, under subfolders for the selected language, the name of the voice sample, and the (sanitized) text.
Example of audio file from the screenshot above : `ComfyUI/Bjornulf_TTS/Chinese/default.wav/你吃了吗.wav`
Note that you don't NEED to select a Chinese voice to speak Chinese. Yes, it will work: you can record yourself and make yourself speak whatever language you want.
Also, when you select a voice with the format `fr/fake_Bjornulf.wav`, it will of course create an extra folder `fr`: `ComfyUI/Bjornulf_TTS/English/fr/fake_Bjornulf.wav/hello_im_me.wav`. That makes it easy to see that you are using a French voice sample for an English recording.
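The path scheme above mirrors the node's `sanitize_text` helper (punctuation stripped, spaces replaced with underscores, capped at 50 characters). A minimal sketch, assuming Linux-style paths and a hypothetical `root` argument:

```python
import os
import re

def tts_save_path(text, language, speaker_wav, root="ComfyUI"):
    # Strip punctuation, replace spaces with underscores, cap at 50 chars
    sanitized = re.sub(r'[^\w\s-]', '', text).replace(' ', '_')[:50]
    return os.path.join(root, "Bjornulf_TTS", language, speaker_wav,
                        f"{sanitized}.wav")

print(tts_save_path("hello im me", "English", "fr/fake_Bjornulf.wav"))
# ComfyUI/Bjornulf_TTS/English/fr/fake_Bjornulf.wav/hello_im_me.wav
```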
`control_after_generate`, as usual, forces the node to rerun on every workflow run, even if the node and its inputs are unchanged.
`overwrite` overwrites the audio file if it already exists. (For example, if you don't like the generation, set overwrite to True and run the workflow again until you get a good result, then set it back to False.) Paraphrasing: without overwrite set to True, it won't regenerate an audio file that already exists in the `Bjornulf_TTS` folder.
`autoplay` is used to play the audio file inside the node when it is executed. (Manual replay or save is done in the `preview audio` node.)
So... note that if you already have an audio file ready to play, you can still use my node, but you do NOT need my TTS server to be running.
My node will just play the audio file if it can find it, and won't try to connect to the backend TTS server.
Let's say you already used this node to create an audio file saying `workflow is done` with the Attenborough voice:
![TTS](screenshots/tts_end.png)
As long as you keep exactly the same settings, it will not use my server to play the audio file! You can safely turn it off, so it won't use your precious VRAM. (The TTS server uses ~3GB of VRAM.)
Also, the input is optional: you can make a workflow with ONLY my TTS node to pre-generate audio files for sentences you might use later, for example:
![TTS](screenshots/tts_generate.png)
### 32 - 🧑📝 Character Description Generator
![characters](screenshots/characters.png)
![characters](screenshots/characters2.png)
@@ -419,7 +447,7 @@ Just take a random image from a list of images.
**Description:**
Loop over a list of images.
Usage example : You have a list of images, and you want to apply the same process to all of them.
Above is an example of the loop images node sending them to an Ipadapter style transfer workflow. (Same seed of course.)
Above is an example of the loop images node sending them to an Ipadapter workflow. (Same seed of course.)
### 39 - ♻ Loop (✒🗔 Advanced Write Text)
@@ -427,7 +455,7 @@ Above is an example of the loop images node sending them to an Ipadapter style t
**Description:**
If you need a quick loop but you don't want something too complex with a loop node, you can use this combined write text + loop.
It will take the same special syntax as the write text node `{blue|red}`, but it will loop over ALL the possibilities instead of taking one at random.
It will take the same special syntax as the Advanced write text node `{blue|red}`, but it will loop over ALL the possibilities instead of taking one at random.
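A rough sketch of that "loop over ALL the possibilities" behaviour, assuming a single-level `{a|b}` syntax (hypothetical helper, not the node's actual code):

```python
import itertools
import re

def expand_choices(template):
    # Split the template into literal parts and {a|b|c} choice groups
    parts = re.split(r'(\{[^}]*\})', template)
    options = [p[1:-1].split('|') if p.startswith('{') else [p]
               for p in parts]
    # Yield every combination instead of picking one at random
    return [''.join(combo) for combo in itertools.product(*options)]

print(expand_choices("a {blue|red} car"))
# ['a blue car', 'a red car']
```

With two groups, e.g. `{a|b} {x|y}`, the node would produce all four combinations over successive loop iterations.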
### 40 - 🎲 Random (Model+Clip+Vae) - aka Checkpoint / Model
@@ -459,7 +487,7 @@ If you want to know how i personnaly save my images for a specific character, he
![pick input](screenshots/character_save.png)
In this example I put "character/" as a string and then combine with "nothing". But it's the same if you do "character" and then combine with "/". (I just like having a / at the end of my folder's name...)
If you are satisfied with this logic, you can then select all these nodes, right click and `Convert to Group Node`, you can then have you own customized "save character node" :
If you are satisfied with this logic, you can then select all these nodes, right click and `Convert to Group Node`, you can then have your own customized "save character node" :
![pick input](screenshots/bjornulf_save_character_group.png)
Here is another example of the same thing but excluding the save folder node :

BIN
example_tts.wav Normal file

Binary file not shown.


@@ -1,7 +1,7 @@
[project]
name = "bjornulf_custom_nodes"
description = "Nodes: Ollama, Text to Speech, Combine Texts, Random Texts, Save image for Bjornulf LobeChat, Text with random Seed, Random line from input, Combine images, Image to grayscale (black & white), Remove image Transparency (alpha), Resize Image, ..."
version = "0.34"
version = "0.35"
license = {file = "LICENSE"}
[project.urls]

Binary file not shown.


BIN
screenshots/tts_end.png Normal file

Binary file not shown.


Binary file not shown.



@@ -3,13 +3,41 @@ import numpy as np
import io
import torch
from pydub import AudioSegment
from pydub.playback import play
import urllib.parse
import os
import sys
import random
import re
class Everything(str):
def __ne__(self, __value: object) -> bool:
return False
language_map = {
"ar": "Arabic",
"cs": "Czech",
"de": "German",
"en": "English",
"es": "Spanish",
"fr": "French",
"hi": "Hindi",
"hu": "Hungarian",
"it": "Italian",
"ja": "Japanese",
"ko": "Korean",
"nl": "Dutch",
"pl": "Polish",
"pt": "Portuguese",
"ru": "Russian",
"tr": "Turkish",
"zh-cn": "Chinese"
}
class TextToSpeech:
@classmethod
def INPUT_TYPES(cls):
# speakers_dir = "speakers"
speakers_dir = os.path.join(os.path.dirname(os.path.realpath(__file__)), "speakers")
speaker_options = []
@@ -19,22 +47,29 @@ class TextToSpeech:
rel_path = os.path.relpath(os.path.join(root, file), speakers_dir)
speaker_options.append(rel_path)
# If no .wav files are found, add a default option
if not speaker_options:
speaker_options.append("No WAV files found")
language_options = list(language_map.values())
return {
"required": {
"text": ("STRING", {"multiline": True}),
"language": (["ar", "cs", "de", "en", "es", "fr", "hi", "hu", "it", "ja", "ko", "nl", "pl", "pt", "ru", "tr", "zh-cn"], {
"default": "en",
"display": "dropdown",
"labels": ["Arabic", "Czech", "German", "English", "Spanish", "French", "Hindi", "Hungarian", "Italian", "Japanese", "Korean", "Dutch", "Polish", "Portuguese", "Russian", "Turkish", "Chinese"]
"language": (language_options, {
"default": language_map["en"],
"display": "dropdown"
}),
"speaker_wav": (speaker_options, {
"default": speaker_options[0],
"display": "dropdown"
}),
"autoplay": ("BOOLEAN", {"default": True}),
"save_audio": ("BOOLEAN", {"default": True}),
"overwrite": ("BOOLEAN", {"default": False}),
"seed": ("INT", {"default": 0}),
},
"optional": {
"input": (Everything("*"), {"forceInput": True}),
}
}
@@ -42,13 +77,44 @@ class TextToSpeech:
FUNCTION = "generate_audio"
CATEGORY = "Bjornulf"
def generate_audio(self, text, language, speaker_wav):
# Check if a valid speaker_wav was selected
@staticmethod
def get_language_code(language_name):
for code, name in language_map.items():
if name == language_name:
return code
return "en"
@staticmethod
def sanitize_text(text):
sanitized = re.sub(r'[^\w\s-]', '', text).replace(' ', '_')
return sanitized[:50]
def generate_audio(self, text, language, autoplay, seed, save_audio, overwrite, speaker_wav, input=None):
language_code = self.get_language_code(language)
sanitized_text = self.sanitize_text(text)
save_path = os.path.join("Bjornulf_TTS", language, speaker_wav, f"{sanitized_text}.wav")
os.makedirs(os.path.dirname(save_path), exist_ok=True)
if os.path.exists(save_path) and not overwrite:
print(f"Using existing audio file: {save_path}")
audio_data = self.load_audio_file(save_path)
else:
audio_data = self.create_new_audio(text, language_code, speaker_wav, seed)
if save_audio:
self.save_audio_file(audio_data, save_path)
return self.process_audio_data(autoplay, audio_data)
def create_new_audio(self, text, language_code, speaker_wav, seed):
random.seed(seed)
if speaker_wav == "No WAV files found":
print("Error: No WAV files available for text-to-speech.")
return ({"waveform": torch.zeros(1, 1, 1, dtype=torch.float32), "sample_rate": 22050},)
encoded_text = urllib.parse.quote(text) # Encode spaces and special characters
url = f"http://localhost:8020/tts_stream?language={language}&speaker_wav={speaker_wav}&text={encoded_text}"
return io.BytesIO()
encoded_text = urllib.parse.quote(text)
url = f"http://localhost:8020/tts_stream?language={language_code}&speaker_wav={speaker_wav}&text={encoded_text}"
try:
response = requests.get(url, stream=True)
response.raise_for_status()
@@ -58,49 +124,62 @@ class TextToSpeech:
audio_data.write(chunk)
audio_data.seek(0)
return self.process_audio_data(audio_data)
return audio_data
except requests.RequestException as e:
print(f"Error generating audio: {e}")
return ({"waveform": torch.zeros(1, 1, 1, dtype=torch.float32), "sample_rate": 22050},)
return io.BytesIO()
except Exception as e:
print(f"Unexpected error: {e}")
return ({"waveform": torch.zeros(1, 1, 1, dtype=torch.float32), "sample_rate": 22050},)
return io.BytesIO()
def process_audio_data(self, audio_data):
def play_audio(self, audio):
if sys.platform.startswith('win'):
try:
# Load MP3 data
audio = AudioSegment.from_mp3(audio_data)
import winsound
# winsound expects raw WAV bytes, so export the AudioSegment to a buffer first
wav_buffer = io.BytesIO()
audio.export(wav_buffer, format="wav")
winsound.PlaySound(wav_buffer.getvalue(), winsound.SND_MEMORY)
except Exception as e:
print(f"An error occurred: {e}")
else:
play(audio)
# Get audio properties
def process_audio_data(self, autoplay, audio_data):
try:
audio = AudioSegment.from_mp3(audio_data)
sample_rate = audio.frame_rate
num_channels = audio.channels
# Convert to numpy array
audio_np = np.array(audio.get_array_of_samples()).astype(np.float32)
# Normalize to [-1, 1]
audio_np /= np.iinfo(np.int16).max
print(f"Raw audio data shape: {audio_np.shape}")
# Reshape to (num_channels, num_samples)
if num_channels == 1:
audio_np = audio_np.reshape(1, -1)
else:
audio_np = audio_np.reshape(-1, num_channels).T
# Convert to torch tensor
audio_tensor = torch.from_numpy(audio_np)
print(f"Final audio tensor shape: {audio_tensor.shape}")
print(f"Audio data type: {audio_tensor.dtype}")
print(f"Audio data min: {audio_tensor.min()}, max: {audio_tensor.max()}")
if autoplay:
self.play_audio(audio)
# Wrap the tensor in a list to match the expected format
return ({"waveform": audio_tensor.unsqueeze(0), "sample_rate": sample_rate},)
except Exception as e:
print(f"Error processing audio data: {e}")
raise
return ({"waveform": torch.zeros(1, 1, 1, dtype=torch.float32), "sample_rate": 22050},)
def save_audio_file(self, audio_data, save_path):
try:
with open(save_path, 'wb') as f:
f.write(audio_data.getvalue())
print(f"Audio saved to: {save_path}")
except Exception as e:
print(f"Error saving audio file: {e}")
def load_audio_file(self, file_path):
try:
with open(file_path, 'rb') as f:
audio_data = io.BytesIO(f.read())
return audio_data
except Exception as e:
print(f"Error loading audio file: {e}")
return io.BytesIO()

14
web/js/text_to_speech.js Normal file

@@ -0,0 +1,14 @@
import { app } from "../../../scripts/app.js";
app.registerExtension({
name: "Bjornulf.TextToSpeech",
async nodeCreated(node) {
if (node.comfyClass === "Bjornulf_TextToSpeech") {
// Set seed widget to hidden input
const seedWidget = node.widgets.find((w) => w.name === "seed");
if (seedWidget) {
seedWidget.type = "HIDDEN";
}
}
}
});