Add TI training support
This commit is contained in: parent 49bada0d25, commit 03bd2e9b01

README-ja.md (new file, 138 lines)
@@ -0,0 +1,138 @@
## About this repository

This repository contains scripts for Stable Diffusion training, image generation, and other tasks.

[README in English](./README.md) ← update information can be found there.

[bmaltais's repository](https://github.com/bmaltais/kohya_ss) provides features that make these scripts easier to use, such as a GUI and PowerShell scripts (in English); please see it as well. Thanks to bmaltais.

The following scripts are included:

* DreamBooth, with support for training the U-Net and Text Encoder
* fine-tuning, likewise
* image generation
* model conversion (between Stable Diffusion ckpt/safetensors and Diffusers formats)

## Usage

Documentation is available in this repository and as articles on note.com; please refer to those (everything may be moved here in the future).

* [About DreamBooth training](./train_db_README-ja.md)
* [fine-tuning guide](./fine_tune_README_ja.md):
includes captioning with BLIP and tagging with DeepDanbooru or the WD14 tagger
* [About LoRA training](./train_network_README-ja.md)
* [About Textual Inversion training](./train_ti_README-ja.md)
* note.com [image generation script](https://note.com/kohya_ss/n/n2693183a798e)
* note.com [model conversion script](https://note.com/kohya_ss/n/n374f316fe4ad)

## Software required on Windows

Python 3.10.6 and Git are required.

- Python 3.10.6: https://www.python.org/ftp/python/3.10.6/python-3.10.6-amd64.exe
- git: https://git-scm.com/download/win

If you use PowerShell, change the security settings as follows so that venv can be used.
(Note that this allows script execution in general, not just for venv.)

- Open PowerShell as administrator.
- Enter "Set-ExecutionPolicy Unrestricted" and answer Y.
- Close the administrator PowerShell.

## Installation on Windows

The example below installs the PyTorch 1.12.1/CUDA 11.6 build. Adjust the commands accordingly if you use the CUDA 11.3 build or PyTorch 1.13.

(If the `python -m venv ...` line prints only "python", change `python` to `py`, as in `py -m venv ...`.)

Open a regular (non-administrator) PowerShell and run the following in order:
```powershell
git clone https://github.com/kohya-ss/sd-scripts.git
cd sd-scripts

python -m venv venv
.\venv\Scripts\activate

pip install torch==1.12.1+cu116 torchvision==0.13.1+cu116 --extra-index-url https://download.pytorch.org/whl/cu116
pip install --upgrade -r requirements.txt
pip install -U -I --no-deps https://github.com/C43H66N12O12S2/stable-diffusion-webui/releases/download/f/xformers-0.0.14.dev0-cp310-cp310-win_amd64.whl

cp .\bitsandbytes_windows\*.dll .\venv\Lib\site-packages\bitsandbytes\
cp .\bitsandbytes_windows\cextension.py .\venv\Lib\site-packages\bitsandbytes\cextension.py
cp .\bitsandbytes_windows\main.py .\venv\Lib\site-packages\bitsandbytes\cuda_setup\main.py

accelerate config
```

For the command prompt, the commands are as follows:

```bat
git clone https://github.com/kohya-ss/sd-scripts.git
cd sd-scripts

python -m venv venv
.\venv\Scripts\activate

pip install torch==1.12.1+cu116 torchvision==0.13.1+cu116 --extra-index-url https://download.pytorch.org/whl/cu116
pip install --upgrade -r requirements.txt
pip install -U -I --no-deps https://github.com/C43H66N12O12S2/stable-diffusion-webui/releases/download/f/xformers-0.0.14.dev0-cp310-cp310-win_amd64.whl

copy /y .\bitsandbytes_windows\*.dll .\venv\Lib\site-packages\bitsandbytes\
copy /y .\bitsandbytes_windows\cextension.py .\venv\Lib\site-packages\bitsandbytes\cextension.py
copy /y .\bitsandbytes_windows\main.py .\venv\Lib\site-packages\bitsandbytes\cuda_setup\main.py

accelerate config
```

(Note: ``python -m venv venv`` is used instead of ``python -m venv --system-site-packages venv`` because it seems safer; the latter causes various problems if packages are installed in the global Python.)

Answer the accelerate config questions as follows. (If you train with bf16, answer bf16 to the last question.)

Note: since 0.15.0, pressing the cursor keys to make a selection crashes in Japanese-locale environments (…). Select with the number keys 0, 1, 2, ... instead.

```txt
- This machine
- No distributed training
- NO
- NO
- NO
- all
- fp16
```

Note: in some cases the error ``ValueError: fp16 mixed precision requires a GPU`` may appear. If so, answer "0" to the sixth question (``What GPU(s) (by id) should be used for training on this machine as a comma-separated list? [all]:``). (The GPU with id `0` will be used.)

### PyTorch and xformers versions

Training reportedly may not go well with other versions. Unless there is a particular reason otherwise, please use the specified versions.

## Upgrading

When a new release is available, you can update with the following commands:

```powershell
cd sd-scripts
git pull
.\venv\Scripts\activate
pip install --upgrade -r <requirement file name>
```

Once the commands succeed, the new version can be used.

## Credits

The LoRA implementation is based on [cloneofsimo's repository](https://github.com/cloneofsimo/lora). Many thanks.

## License

The scripts are licensed under ASL 2.0 (including code derived from Diffusers and cloneofsimo's repository), but parts of the code are under other licenses:

[Memory Efficient Attention Pytorch](https://github.com/lucidrains/memory-efficient-attention-pytorch): MIT

[bitsandbytes](https://github.com/TimDettmers/bitsandbytes): MIT

[BLIP](https://github.com/salesforce/BLIP): BSD-3-Clause
README.md (10 changed lines)
@@ -6,8 +6,9 @@ This repository is providing a Gradio GUI for kohya's Stable Diffusio
 Python 3.10.6+ and Git:
 
-- Python 3.10.6+: https://www.python.org/ftp/python/3.10.9/python-3.10.9-amd64.exe
+- Install Python 3.10 using https://www.python.org/ftp/python/3.10.9/python-3.10.9-amd64.exe (make sure to tick the box to add Python to the environment path)
 - git: https://git-scm.com/download/win
+- Visual Studio 2015, 2017, 2019, and 2022 redistributable: https://aka.ms/vs/17/release/vc_redist.x64.exe
 
 ## Installation
@@ -23,7 +24,7 @@ Open a regular user Powershell terminal and type the following inside:
 git clone https://github.com/bmaltais/kohya_ss.git
 cd kohya_ss
 
-python -m venv --system-site-packages venv
+python -m venv venv
 .\venv\Scripts\activate
 
 pip install torch==1.12.1+cu116 torchvision==0.13.1+cu116 --extra-index-url https://download.pytorch.org/whl/cu116
@@ -40,7 +41,7 @@ accelerate config
 
 ### Optional: CUDNN 8.6
 
-This step is optional but can improve the learning speed for NVidia 4090 owners...
+This step is optional but can improve the learning speed for NVidia 30X0/40X0 owners... It allows larger training batch size and faster training speed
 
 Due to the filesize I can't host the DLLs needed for CUDNN 8.6 on Github, I strongly advise you download them for a speed boost in sample generation (almost 50% on 4090) you can download them from here: https://b1.thefileditch.ch/mwxKTEtelILoIbMbruuM.zip
 
@@ -130,6 +131,9 @@ Then redo the installation instruction within the kohya_ss venv.
 
 ## Change history
 
+* 2023/01/26 (v20.5.0):
+    - Add new `Dreambooth TI` tab for training of Textual Inversion embeddings
+    - Add Textual Inversion training. Documentation is [here](./train_ti_README-ja.md) (in Japanese.)
 * 2023/01/22 (v20.4.1):
    - Add new tool to verify LoRA weights produced by the trainer. Can be found under "Dreambooth LoRA/Tools/Verify LoRA"
 * 2023/01/22 (v20.4.0):
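For context on the change-history entry above: the new `Dreambooth TI` tab assembles an `accelerate launch` command for the train_textual_inversion.py script added in this commit. The invocation below is only illustrative, built from the options this commit introduces in textual_inversion_gui.py; every path and value is a placeholder.

```
accelerate launch --num_cpu_threads_per_process=1 train_textual_inversion.py
    --pretrained_model_name_or_path=<base .ckpt/.safetensors file or Diffusers model>
    --train_data_dir=<image folder> --output_dir=<output folder> --output_name=my_embedding
    --resolution=512,512 --train_batch_size=1 --learning_rate=1e-5 --max_train_steps=1600
    --token_string=mychar --init_word=girl --num_vectors_per_token=1 --use_object_template
```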
@@ -3,6 +3,7 @@ import os
 import argparse
 from dreambooth_gui import dreambooth_tab
 from finetune_gui import finetune_tab
+from textual_inversion_gui import ti_tab
 from library.utilities import utilities_tab
 from library.extract_lora_gui import gradio_extract_lora_tab
 from library.merge_lora_gui import gradio_merge_lora_tab
@@ -30,6 +31,8 @@ def UI(username, password):
             ) = dreambooth_tab()
         with gr.Tab('Dreambooth LoRA'):
             lora_tab()
+        with gr.Tab('Dreambooth TI'):
+            ti_tab()
         with gr.Tab('Finetune'):
             finetune_tab()
         with gr.Tab('Utilities'):
@@ -424,8 +424,8 @@ def gradio_training(learning_rate_value='1e-6', lr_scheduler_value='constant', l
             minimum=1,
             maximum=os.cpu_count(),
             step=1,
-            label='Number of CPU threads per process',
-            value=os.cpu_count(),
+            label='Number of CPU threads per core',
+            value=2,
         )
         seed = gr.Textbox(label='Seed', value=1234)
         with gr.Row():
@@ -12,6 +12,7 @@ import math
 import os
 import random
 import hashlib
+from io import BytesIO
 
 from tqdm import tqdm
 import torch
@@ -25,6 +26,7 @@ from PIL import Image
 import cv2
 from einops import rearrange
 from torch import einsum
+import safetensors.torch
 
 import library.model_util as model_util
 
@@ -85,6 +87,7 @@ class BaseDataset(torch.utils.data.Dataset):
     self.enable_bucket = False
     self.min_bucket_reso = None
     self.max_bucket_reso = None
+    self.bucket_info = None
 
     self.tokenizer_max_length = self.tokenizer.model_max_length if max_token_length is None else max_token_length + 2
 
@@ -110,9 +113,14 @@ class BaseDataset(torch.utils.data.Dataset):
 
     self.image_data: dict[str, ImageInfo] = {}
 
+    self.replacements = {}
+
   def disable_token_padding(self):
     self.token_padding_disabled = True
 
+  def add_replacement(self, str_from, str_to):
+    self.replacements[str_from] = str_to
+
   def process_caption(self, caption):
     if self.shuffle_caption:
       tokens = caption.strip().split(",")
@@ -125,6 +133,17 @@ class BaseDataset(torch.utils.data.Dataset):
       random.shuffle(tokens)
       tokens = keep_tokens + tokens
       caption = ",".join(tokens).strip()
 
+    for str_from, str_to in self.replacements.items():
+      if str_from == "":
+        # replace all
+        if type(str_to) == list:
+          caption = random.choice(str_to)
+        else:
+          caption = str_to
+      else:
+        caption = caption.replace(str_from, str_to)
+
     return caption
 
   def get_input_ids(self, caption):
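To illustrate the replacement hook added above (a sketch, not part of the diff; the dataset instance and token names are made up): the caller registers string substitutions, and `process_caption` applies them after any caption shuffling.

```python
# Hypothetical usage of BaseDataset.add_replacement() from the hunk above
dataset.add_replacement("mychar", "mychar_tok1 mychar_tok2")   # expand a token string into placeholder tokens
# dataset.process_caption("a photo of mychar, smiling")
#   -> 'a photo of mychar_tok1 mychar_tok2, smiling' (after any shuffling)

# An empty key replaces the whole caption; a list means one entry is chosen at random:
dataset.add_replacement("", ["a photo of a mychar_tok1", "a rendering of a mychar_tok1"])
```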
@@ -217,11 +236,17 @@ class BaseDataset(torch.utils.data.Dataset):
       self.buckets[bucket_index].append(image_info.image_key)
 
     if self.enable_bucket:
+      self.bucket_info = {"buckets": {}}
       print("number of images (including repeats) / 各bucketの画像枚数(繰り返し回数を含む)")
       for i, (reso, img_keys) in enumerate(zip(bucket_resos, self.buckets)):
+        self.bucket_info["buckets"][i] = {"resolution": reso, "count": len(img_keys)}
         print(f"bucket {i}: resolution {reso}, count: {len(img_keys)}")
 
       img_ar_errors = np.array(img_ar_errors)
-      print(f"mean ar error (without repeats): {np.mean(np.abs(img_ar_errors))}")
+      mean_img_ar_error = np.mean(np.abs(img_ar_errors))
+      self.bucket_info["mean_img_ar_error"] = mean_img_ar_error
+      print(f"mean ar error (without repeats): {mean_img_ar_error}")
 
 
     # 参照用indexを作る
     self.buckets_indices: list(BucketBatchIndex) = []
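For reference, the dictionary built here ends up shaped roughly as in the sketch below (the numbers are invented); train_network.py later in this commit serializes it into the `ss_bucket_info` metadata field.

```python
# Illustrative shape of train_dataset.bucket_info after bucketing (values are made up)
bucket_info = {
    "buckets": {
        0: {"resolution": (512, 512), "count": 120},
        1: {"resolution": (576, 448), "count": 36},
    },
    "mean_img_ar_error": 0.0153,
}
```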
@@ -599,7 +624,7 @@ class FineTuningDataset(BaseDataset):
       else:
         # わりといい加減だがいい方法が思いつかん
         abs_path = glob_images(train_data_dir, image_key)
-        assert len(abs_path) >= 1, f"no image / 画像がありません: {abs_path}"
+        assert len(abs_path) >= 1, f"no image / 画像がありません: {image_key}"
         abs_path = abs_path[0]
 
       caption = img_md.get('caption')
@@ -706,15 +731,17 @@ class FineTuningDataset(BaseDataset):
   return npz_file_norm, npz_file_flip
 
 
-def debug_dataset(train_dataset):
+def debug_dataset(train_dataset, show_input_ids=False):
   print(f"Total dataset length (steps) / データセットの長さ(ステップ数): {len(train_dataset)}")
   print("Escape for exit. / Escキーで中断、終了します")
   k = 0
   for example in train_dataset:
     if example['latents'] is not None:
       print("sample has latents from npz file")
-    for j, (ik, cap, lw) in enumerate(zip(example['image_keys'], example['captions'], example['loss_weights'])):
+    for j, (ik, cap, lw, iid) in enumerate(zip(example['image_keys'], example['captions'], example['loss_weights'], example['input_ids'])):
       print(f'{ik}, size: {train_dataset.image_data[ik].image_size}, caption: "{cap}", loss weight: {lw}')
+      if show_input_ids:
+        print(f"input ids: {iid}")
       if example['images'] is not None:
         im = example['images'][j]
         im = ((im.numpy() + 1.0) * 127.5).astype(np.uint8)
@@ -790,6 +817,49 @@ def calculate_sha256(filename):
   return hash_sha256.hexdigest()
 
 
+def precalculate_safetensors_hashes(tensors, metadata):
+  """Precalculate the model hashes needed by sd-webui-additional-networks to
+  save time on indexing the model later."""
+
+  # Because writing user metadata to the file can change the result of
+  # sd_models.model_hash(), only retain the training metadata for purposes of
+  # calculating the hash, as they are meant to be immutable
+  metadata = {k: v for k, v in metadata.items() if k.startswith("ss_")}
+
+  bytes = safetensors.torch.save(tensors, metadata)
+  b = BytesIO(bytes)
+
+  model_hash = addnet_hash_safetensors(b)
+  legacy_hash = addnet_hash_legacy(b)
+  return model_hash, legacy_hash
+
+
+def addnet_hash_legacy(b):
+  """Old model hash used by sd-webui-additional-networks for .safetensors format files"""
+  m = hashlib.sha256()
+
+  b.seek(0x100000)
+  m.update(b.read(0x10000))
+  return m.hexdigest()[0:8]
+
+
+def addnet_hash_safetensors(b):
+  """New model hash used by sd-webui-additional-networks for .safetensors format files"""
+  hash_sha256 = hashlib.sha256()
+  blksize = 1024 * 1024
+
+  b.seek(0)
+  header = b.read(8)
+  n = int.from_bytes(header, "little")
+
+  offset = n + 8
+  b.seek(offset)
+  for chunk in iter(lambda: b.read(blksize), b""):
+    hash_sha256.update(chunk)
+
+  return hash_sha256.hexdigest()
+
+
 # flash attention forwards and backwards
 
 # https://arxiv.org/abs/2205.14135
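A minimal sketch of how these helpers are intended to be called, mirroring the networks/lora.py change later in this commit; the tensors and metadata below are dummies.

```python
import torch

state_dict = {"lora_up.weight": torch.zeros(4, 4)}            # dummy tensor dict
metadata = {"ss_network_dim": "4", "user_note": "dropped"}    # only ss_* keys influence the hash

model_hash, legacy_hash = precalculate_safetensors_hashes(state_dict, metadata)
metadata["sshs_model_hash"] = model_hash       # stored alongside the model when saving .safetensors
metadata["sshs_legacy_hash"] = legacy_hash
```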
@@ -1057,6 +1127,8 @@ def add_training_arguments(parser: argparse.ArgumentParser, support_dreambooth:
                       choices=[None, "float", "fp16", "bf16"], help="precision in saving / 保存時に精度を変更して保存する")
   parser.add_argument("--save_every_n_epochs", type=int, default=None,
                       help="save checkpoint every N epochs / 学習中のモデルを指定エポックごとに保存する")
+  parser.add_argument("--save_n_epoch_ratio", type=int, default=None,
+                      help="save checkpoint N epoch ratio (for example 5 means save at least 5 files total) / 学習中のモデルを指定のエポック割合で保存する(たとえば5を指定すると最低5個のファイルが保存される)")
   parser.add_argument("--save_last_n_epochs", type=int, default=None, help="save last N checkpoints / 最大Nエポック保存する")
   parser.add_argument("--save_last_n_epochs_state", type=int, default=None,
                       help="save last N checkpoints of state (overrides the value of --save_last_n_epochs)/ 最大Nエポックstateを保存する(--save_last_n_epochsの指定を上書きします)")
@@ -275,6 +275,9 @@ def train_model(
         msgbox('Output folder path is missing')
         return
 
+    if not os.path.exists(output_dir):
+        os.makedirs(output_dir)
+
     if stop_text_encoder_training_pct > 0:
         msgbox('Output "stop text encoder training" is not yet supported. Ignoring')
         stop_text_encoder_training_pct = 0
@@ -7,6 +7,8 @@ import math
 import os
 import torch
 
+from library import train_util
+
 
 class LoRAModule(torch.nn.Module):
   """
@@ -31,7 +33,7 @@ class LoRAModule(torch.nn.Module):
     self.lora_up = torch.nn.Linear(lora_dim, out_dim, bias=False)
 
     if type(alpha) == torch.Tensor:
-      alpha = alpha.detach().numpy()
+      alpha = alpha.detach().float().numpy()                    # without casting, bf16 causes error
     alpha = lora_dim if alpha is None or alpha == 0 else alpha
     self.scale = alpha / self.lora_dim
     self.register_buffer('alpha', torch.tensor(alpha))          # 定数として扱える
@@ -221,6 +223,14 @@ class LoRANetwork(torch.nn.Module):
 
     if os.path.splitext(file)[1] == '.safetensors':
       from safetensors.torch import save_file
+
+      # Precalculate model hashes to save time on indexing
+      if metadata is None:
+        metadata = {}
+      model_hash, legacy_hash = train_util.precalculate_safetensors_hashes(state_dict, metadata)
+      metadata["sshs_model_hash"] = model_hash
+      metadata["sshs_legacy_hash"] = legacy_hash
+
       save_file(state_dict, file, metadata)
     else:
       torch.save(state_dict, file)
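As a side note (not part of this commit), the embedded hashes can presumably be read back later from the file header without loading the tensors; a hedged sketch using the safetensors API, with a placeholder file name:

```python
from safetensors import safe_open

with safe_open("my_lora.safetensors", framework="pt") as f:   # placeholder path
    meta = f.metadata() or {}
print(meta.get("sshs_model_hash"), meta.get("sshs_legacy_hash"))
```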
textual_inversion_gui.py (new file, 777 lines)
@@ -0,0 +1,777 @@
|
|||||||
|
# v1: initial release
|
||||||
|
# v2: add open and save folder icons
|
||||||
|
# v3: Add new Utilities tab for Dreambooth folder preparation
|
||||||
|
# v3.1: Adding captioning of images to utilities
|
||||||
|
|
||||||
|
import gradio as gr
|
||||||
|
import json
|
||||||
|
import math
|
||||||
|
import os
|
||||||
|
import subprocess
|
||||||
|
import pathlib
|
||||||
|
import argparse
|
||||||
|
from library.common_gui import (
|
||||||
|
get_folder_path,
|
||||||
|
remove_doublequote,
|
||||||
|
get_file_path,
|
||||||
|
get_any_file_path,
|
||||||
|
get_saveasfile_path,
|
||||||
|
color_aug_changed,
|
||||||
|
save_inference_file,
|
||||||
|
gradio_advanced_training,
|
||||||
|
run_cmd_advanced_training,
|
||||||
|
run_cmd_training,
|
||||||
|
gradio_training,
|
||||||
|
gradio_config,
|
||||||
|
gradio_source_model,
|
||||||
|
)
|
||||||
|
from library.dreambooth_folder_creation_gui import (
|
||||||
|
gradio_dreambooth_folder_creation_tab,
|
||||||
|
)
|
||||||
|
from library.utilities import utilities_tab
|
||||||
|
from easygui import msgbox
|
||||||
|
|
||||||
|
folder_symbol = '\U0001f4c2' # 📂
|
||||||
|
refresh_symbol = '\U0001f504' # 🔄
|
||||||
|
save_style_symbol = '\U0001f4be' # 💾
|
||||||
|
document_symbol = '\U0001F4C4' # 📄
|
||||||
|
|
||||||
|
|
||||||
|
def save_configuration(
|
||||||
|
save_as,
|
||||||
|
file_path,
|
||||||
|
pretrained_model_name_or_path,
|
||||||
|
v2,
|
||||||
|
v_parameterization,
|
||||||
|
logging_dir,
|
||||||
|
train_data_dir,
|
||||||
|
reg_data_dir,
|
||||||
|
output_dir,
|
||||||
|
max_resolution,
|
||||||
|
learning_rate,
|
||||||
|
lr_scheduler,
|
||||||
|
lr_warmup,
|
||||||
|
train_batch_size,
|
||||||
|
epoch,
|
||||||
|
save_every_n_epochs,
|
||||||
|
mixed_precision,
|
||||||
|
save_precision,
|
||||||
|
seed,
|
||||||
|
num_cpu_threads_per_process,
|
||||||
|
cache_latents,
|
||||||
|
caption_extension,
|
||||||
|
enable_bucket,
|
||||||
|
gradient_checkpointing,
|
||||||
|
full_fp16,
|
||||||
|
no_token_padding,
|
||||||
|
stop_text_encoder_training,
|
||||||
|
use_8bit_adam,
|
||||||
|
xformers,
|
||||||
|
save_model_as,
|
||||||
|
shuffle_caption,
|
||||||
|
save_state,
|
||||||
|
resume,
|
||||||
|
prior_loss_weight,
|
||||||
|
color_aug,
|
||||||
|
flip_aug,
|
||||||
|
clip_skip,
|
||||||
|
vae,
|
||||||
|
output_name,
|
||||||
|
max_token_length,
|
||||||
|
max_train_epochs,
|
||||||
|
max_data_loader_n_workers,
|
||||||
|
mem_eff_attn,
|
||||||
|
gradient_accumulation_steps,
|
||||||
|
model_list, token_string, init_word, num_vectors_per_token, max_train_steps, weights, template,
|
||||||
|
):
|
||||||
|
# Get list of function parameters and values
|
||||||
|
parameters = list(locals().items())
|
||||||
|
|
||||||
|
original_file_path = file_path
|
||||||
|
|
||||||
|
save_as_bool = True if save_as.get('label') == 'True' else False
|
||||||
|
|
||||||
|
if save_as_bool:
|
||||||
|
print('Save as...')
|
||||||
|
file_path = get_saveasfile_path(file_path)
|
||||||
|
else:
|
||||||
|
print('Save...')
|
||||||
|
if file_path == None or file_path == '':
|
||||||
|
file_path = get_saveasfile_path(file_path)
|
||||||
|
|
||||||
|
# print(file_path)
|
||||||
|
|
||||||
|
if file_path == None or file_path == '':
|
||||||
|
return original_file_path # In case a file_path was provided and the user decide to cancel the open action
|
||||||
|
|
||||||
|
# Return the values of the variables as a dictionary
|
||||||
|
variables = {
|
||||||
|
name: value
|
||||||
|
for name, value in parameters # locals().items()
|
||||||
|
if name
|
||||||
|
not in [
|
||||||
|
'file_path',
|
||||||
|
'save_as',
|
||||||
|
]
|
||||||
|
}
|
||||||
|
|
||||||
|
# Save the data to the selected file
|
||||||
|
with open(file_path, 'w') as file:
|
||||||
|
json.dump(variables, file, indent=2)
|
||||||
|
|
||||||
|
return file_path
|
||||||
|
|
||||||
|
|
||||||
|
def open_configuration(
|
||||||
|
file_path,
|
||||||
|
pretrained_model_name_or_path,
|
||||||
|
v2,
|
||||||
|
v_parameterization,
|
||||||
|
logging_dir,
|
||||||
|
train_data_dir,
|
||||||
|
reg_data_dir,
|
||||||
|
output_dir,
|
||||||
|
max_resolution,
|
||||||
|
learning_rate,
|
||||||
|
lr_scheduler,
|
||||||
|
lr_warmup,
|
||||||
|
train_batch_size,
|
||||||
|
epoch,
|
||||||
|
save_every_n_epochs,
|
||||||
|
mixed_precision,
|
||||||
|
save_precision,
|
||||||
|
seed,
|
||||||
|
num_cpu_threads_per_process,
|
||||||
|
cache_latents,
|
||||||
|
caption_extension,
|
||||||
|
enable_bucket,
|
||||||
|
gradient_checkpointing,
|
||||||
|
full_fp16,
|
||||||
|
no_token_padding,
|
||||||
|
stop_text_encoder_training,
|
||||||
|
use_8bit_adam,
|
||||||
|
xformers,
|
||||||
|
save_model_as,
|
||||||
|
shuffle_caption,
|
||||||
|
save_state,
|
||||||
|
resume,
|
||||||
|
prior_loss_weight,
|
||||||
|
color_aug,
|
||||||
|
flip_aug,
|
||||||
|
clip_skip,
|
||||||
|
vae,
|
||||||
|
output_name,
|
||||||
|
max_token_length,
|
||||||
|
max_train_epochs,
|
||||||
|
max_data_loader_n_workers,
|
||||||
|
mem_eff_attn,
|
||||||
|
gradient_accumulation_steps,
|
||||||
|
model_list, token_string, init_word, num_vectors_per_token, max_train_steps, weights, template,
|
||||||
|
):
|
||||||
|
# Get list of function parameters and values
|
||||||
|
parameters = list(locals().items())
|
||||||
|
|
||||||
|
original_file_path = file_path
|
||||||
|
file_path = get_file_path(file_path)
|
||||||
|
|
||||||
|
if not file_path == '' and not file_path == None:
|
||||||
|
# load variables from JSON file
|
||||||
|
with open(file_path, 'r') as f:
|
||||||
|
my_data_db = json.load(f)
|
||||||
|
print('Loading config...')
|
||||||
|
else:
|
||||||
|
file_path = original_file_path # In case a file_path was provided and the user decide to cancel the open action
|
||||||
|
my_data_db = {}
|
||||||
|
|
||||||
|
values = [file_path]
|
||||||
|
for key, value in parameters:
|
||||||
|
# Set the value in the dictionary to the corresponding value in `my_data`, or the default value if not found
|
||||||
|
if not key in ['file_path']:
|
||||||
|
values.append(my_data_db.get(key, value))
|
||||||
|
return tuple(values)
|
||||||
|
|
||||||
|
|
||||||
|
def train_model(
|
||||||
|
pretrained_model_name_or_path,
|
||||||
|
v2,
|
||||||
|
v_parameterization,
|
||||||
|
logging_dir,
|
||||||
|
train_data_dir,
|
||||||
|
reg_data_dir,
|
||||||
|
output_dir,
|
||||||
|
max_resolution,
|
||||||
|
learning_rate,
|
||||||
|
lr_scheduler,
|
||||||
|
lr_warmup,
|
||||||
|
train_batch_size,
|
||||||
|
epoch,
|
||||||
|
save_every_n_epochs,
|
||||||
|
mixed_precision,
|
||||||
|
save_precision,
|
||||||
|
seed,
|
||||||
|
num_cpu_threads_per_process,
|
||||||
|
cache_latents,
|
||||||
|
caption_extension,
|
||||||
|
enable_bucket,
|
||||||
|
gradient_checkpointing,
|
||||||
|
full_fp16,
|
||||||
|
no_token_padding,
|
||||||
|
stop_text_encoder_training_pct,
|
||||||
|
use_8bit_adam,
|
||||||
|
xformers,
|
||||||
|
save_model_as,
|
||||||
|
shuffle_caption,
|
||||||
|
save_state,
|
||||||
|
resume,
|
||||||
|
prior_loss_weight,
|
||||||
|
color_aug,
|
||||||
|
flip_aug,
|
||||||
|
clip_skip,
|
||||||
|
vae,
|
||||||
|
output_name,
|
||||||
|
max_token_length,
|
||||||
|
max_train_epochs,
|
||||||
|
max_data_loader_n_workers,
|
||||||
|
mem_eff_attn,
|
||||||
|
gradient_accumulation_steps,
|
||||||
|
model_list, # Keep this. Yes, it is unused here but required given the common list used
|
||||||
|
token_string, init_word, num_vectors_per_token, max_train_steps, weights, template,
|
||||||
|
):
|
||||||
|
if pretrained_model_name_or_path == '':
|
||||||
|
msgbox('Source model information is missing')
|
||||||
|
return
|
||||||
|
|
||||||
|
if train_data_dir == '':
|
||||||
|
msgbox('Image folder path is missing')
|
||||||
|
return
|
||||||
|
|
||||||
|
if not os.path.exists(train_data_dir):
|
||||||
|
msgbox('Image folder does not exist')
|
||||||
|
return
|
||||||
|
|
||||||
|
if reg_data_dir != '':
|
||||||
|
if not os.path.exists(reg_data_dir):
|
||||||
|
msgbox('Regularisation folder does not exist')
|
||||||
|
return
|
||||||
|
|
||||||
|
if output_dir == '':
|
||||||
|
msgbox('Output folder path is missing')
|
||||||
|
return
|
||||||
|
|
||||||
|
if token_string == '':
|
||||||
|
msgbox('Token string is missing')
|
||||||
|
return
|
||||||
|
|
||||||
|
if init_word == '':
|
||||||
|
msgbox('Init word is missing')
|
||||||
|
return
|
||||||
|
|
||||||
|
if not os.path.exists(output_dir):
|
||||||
|
os.makedirs(output_dir)
|
||||||
|
|
||||||
|
# Get a list of all subfolders in train_data_dir
|
||||||
|
subfolders = [
|
||||||
|
f
|
||||||
|
for f in os.listdir(train_data_dir)
|
||||||
|
if os.path.isdir(os.path.join(train_data_dir, f))
|
||||||
|
]
|
||||||
|
|
||||||
|
total_steps = 0
|
||||||
|
|
||||||
|
# Loop through each subfolder and extract the number of repeats
|
||||||
|
for folder in subfolders:
|
||||||
|
# Extract the number of repeats from the folder name
|
||||||
|
repeats = int(folder.split('_')[0])
|
||||||
|
|
||||||
|
# Count the number of images in the folder
|
||||||
|
num_images = len(
|
||||||
|
[
|
||||||
|
f
|
||||||
|
for f in os.listdir(os.path.join(train_data_dir, folder))
|
||||||
|
if f.endswith('.jpg')
|
||||||
|
or f.endswith('.jpeg')
|
||||||
|
or f.endswith('.png')
|
||||||
|
or f.endswith('.webp')
|
||||||
|
]
|
||||||
|
)
|
||||||
|
|
||||||
|
# Calculate the total number of steps for this folder
|
||||||
|
steps = repeats * num_images
|
||||||
|
total_steps += steps
|
||||||
|
|
||||||
|
# Print the result
|
||||||
|
print(f'Folder {folder}: {steps} steps')
|
||||||
|
|
||||||
|
# Print the result
|
||||||
|
# print(f"{total_steps} total steps")
|
||||||
|
|
||||||
|
if reg_data_dir == '':
|
||||||
|
reg_factor = 1
|
||||||
|
else:
|
||||||
|
print(
|
||||||
|
'Regularisation images are used... Will double the number of steps required...'
|
||||||
|
)
|
||||||
|
reg_factor = 2
|
||||||
|
|
||||||
|
# calculate max_train_steps
|
||||||
|
if max_train_steps == '':
|
||||||
|
max_train_steps = int(
|
||||||
|
math.ceil(
|
||||||
|
float(total_steps)
|
||||||
|
/ int(train_batch_size)
|
||||||
|
* int(epoch)
|
||||||
|
* int(reg_factor)
|
||||||
|
)
|
||||||
|
)
|
||||||
|
else:
|
||||||
|
max_train_steps = int(max_train_steps)
|
||||||
|
|
||||||
|
print(f'max_train_steps = {max_train_steps}')
|
||||||
|
|
||||||
|
# calculate stop encoder training
|
||||||
|
if stop_text_encoder_training_pct == None:
|
||||||
|
stop_text_encoder_training = 0
|
||||||
|
else:
|
||||||
|
stop_text_encoder_training = math.ceil(
|
||||||
|
float(max_train_steps) / 100 * int(stop_text_encoder_training_pct)
|
||||||
|
)
|
||||||
|
print(f'stop_text_encoder_training = {stop_text_encoder_training}')
|
||||||
|
|
||||||
|
lr_warmup_steps = round(float(int(lr_warmup) * int(max_train_steps) / 100))
|
||||||
|
print(f'lr_warmup_steps = {lr_warmup_steps}')
|
||||||
|
|
||||||
|
run_cmd = f'accelerate launch --num_cpu_threads_per_process={num_cpu_threads_per_process} "train_textual_inversion.py"'
|
||||||
|
if v2:
|
||||||
|
run_cmd += ' --v2'
|
||||||
|
if v_parameterization:
|
||||||
|
run_cmd += ' --v_parameterization'
|
||||||
|
if enable_bucket:
|
||||||
|
run_cmd += ' --enable_bucket'
|
||||||
|
if no_token_padding:
|
||||||
|
run_cmd += ' --no_token_padding'
|
||||||
|
run_cmd += (
|
||||||
|
f' --pretrained_model_name_or_path="{pretrained_model_name_or_path}"'
|
||||||
|
)
|
||||||
|
run_cmd += f' --train_data_dir="{train_data_dir}"'
|
||||||
|
if len(reg_data_dir):
|
||||||
|
run_cmd += f' --reg_data_dir="{reg_data_dir}"'
|
||||||
|
run_cmd += f' --resolution={max_resolution}'
|
||||||
|
run_cmd += f' --output_dir="{output_dir}"'
|
||||||
|
run_cmd += f' --logging_dir="{logging_dir}"'
|
||||||
|
if not stop_text_encoder_training == 0:
|
||||||
|
run_cmd += (
|
||||||
|
f' --stop_text_encoder_training={stop_text_encoder_training}'
|
||||||
|
)
|
||||||
|
if not save_model_as == 'same as source model':
|
||||||
|
run_cmd += f' --save_model_as={save_model_as}'
|
||||||
|
# if not resume == '':
|
||||||
|
# run_cmd += f' --resume={resume}'
|
||||||
|
if not float(prior_loss_weight) == 1.0:
|
||||||
|
run_cmd += f' --prior_loss_weight={prior_loss_weight}'
|
||||||
|
if not vae == '':
|
||||||
|
run_cmd += f' --vae="{vae}"'
|
||||||
|
if not output_name == '':
|
||||||
|
run_cmd += f' --output_name="{output_name}"'
|
||||||
|
if int(max_token_length) > 75:
|
||||||
|
run_cmd += f' --max_token_length={max_token_length}'
|
||||||
|
if not max_train_epochs == '':
|
||||||
|
run_cmd += f' --max_train_epochs="{max_train_epochs}"'
|
||||||
|
if not max_data_loader_n_workers == '':
|
||||||
|
run_cmd += (
|
||||||
|
f' --max_data_loader_n_workers="{max_data_loader_n_workers}"'
|
||||||
|
)
|
||||||
|
if int(gradient_accumulation_steps) > 1:
|
||||||
|
run_cmd += f' --gradient_accumulation_steps={int(gradient_accumulation_steps)}'
|
||||||
|
|
||||||
|
run_cmd += run_cmd_training(
|
||||||
|
learning_rate=learning_rate,
|
||||||
|
lr_scheduler=lr_scheduler,
|
||||||
|
lr_warmup_steps=lr_warmup_steps,
|
||||||
|
train_batch_size=train_batch_size,
|
||||||
|
max_train_steps=max_train_steps,
|
||||||
|
save_every_n_epochs=save_every_n_epochs,
|
||||||
|
mixed_precision=mixed_precision,
|
||||||
|
save_precision=save_precision,
|
||||||
|
seed=seed,
|
||||||
|
caption_extension=caption_extension,
|
||||||
|
cache_latents=cache_latents,
|
||||||
|
)
|
||||||
|
|
||||||
|
run_cmd += run_cmd_advanced_training(
|
||||||
|
max_train_epochs=max_train_epochs,
|
||||||
|
max_data_loader_n_workers=max_data_loader_n_workers,
|
||||||
|
max_token_length=max_token_length,
|
||||||
|
resume=resume,
|
||||||
|
save_state=save_state,
|
||||||
|
mem_eff_attn=mem_eff_attn,
|
||||||
|
clip_skip=clip_skip,
|
||||||
|
flip_aug=flip_aug,
|
||||||
|
color_aug=color_aug,
|
||||||
|
shuffle_caption=shuffle_caption,
|
||||||
|
gradient_checkpointing=gradient_checkpointing,
|
||||||
|
full_fp16=full_fp16,
|
||||||
|
xformers=xformers,
|
||||||
|
use_8bit_adam=use_8bit_adam,
|
||||||
|
)
|
||||||
|
run_cmd += f' --token_string={token_string}'
|
||||||
|
run_cmd += f' --init_word={init_word}'
|
||||||
|
run_cmd += f' --num_vectors_per_token={num_vectors_per_token}'
|
||||||
|
if not weights == '':
|
||||||
|
run_cmd += f' --weights="{weights}"'
|
||||||
|
if template == 'object template':
|
||||||
|
run_cmd += f' --use_object_template'
|
||||||
|
elif template == 'style template':
|
||||||
|
run_cmd += f' --use_style_template'
|
||||||
|
|
||||||
|
print(run_cmd)
|
||||||
|
# Run the command
|
||||||
|
subprocess.run(run_cmd)
|
||||||
|
|
||||||
|
# check if output_dir/last is a folder... therefore it is a diffuser model
|
||||||
|
last_dir = pathlib.Path(f'{output_dir}/{output_name}')
|
||||||
|
|
||||||
|
if not last_dir.is_dir():
|
||||||
|
# Copy inference model for v2 if required
|
||||||
|
save_inference_file(output_dir, v2, v_parameterization, output_name)
|
||||||
|
|
||||||
|
|
||||||
|
def UI(username, password):
|
||||||
|
css = ''
|
||||||
|
|
||||||
|
if os.path.exists('./style.css'):
|
||||||
|
with open(os.path.join('./style.css'), 'r', encoding='utf8') as file:
|
||||||
|
print('Load CSS...')
|
||||||
|
css += file.read() + '\n'
|
||||||
|
|
||||||
|
interface = gr.Blocks(css=css)
|
||||||
|
|
||||||
|
with interface:
|
||||||
|
with gr.Tab('Dreambooth TI'):
|
||||||
|
(
|
||||||
|
train_data_dir_input,
|
||||||
|
reg_data_dir_input,
|
||||||
|
output_dir_input,
|
||||||
|
logging_dir_input,
|
||||||
|
) = ti_tab()
|
||||||
|
with gr.Tab('Utilities'):
|
||||||
|
utilities_tab(
|
||||||
|
train_data_dir_input=train_data_dir_input,
|
||||||
|
reg_data_dir_input=reg_data_dir_input,
|
||||||
|
output_dir_input=output_dir_input,
|
||||||
|
logging_dir_input=logging_dir_input,
|
||||||
|
enable_copy_info_button=True,
|
||||||
|
)
|
||||||
|
|
||||||
|
# Show the interface
|
||||||
|
if not username == '':
|
||||||
|
interface.launch(auth=(username, password))
|
||||||
|
else:
|
||||||
|
interface.launch()
|
||||||
|
|
||||||
|
|
||||||
|
def ti_tab(
|
||||||
|
train_data_dir=gr.Textbox(),
|
||||||
|
reg_data_dir=gr.Textbox(),
|
||||||
|
output_dir=gr.Textbox(),
|
||||||
|
logging_dir=gr.Textbox(),
|
||||||
|
):
|
||||||
|
dummy_db_true = gr.Label(value=True, visible=False)
|
||||||
|
dummy_db_false = gr.Label(value=False, visible=False)
|
||||||
|
gr.Markdown('Train a TI using kohya textual inversion python code...')
|
||||||
|
(
|
||||||
|
button_open_config,
|
||||||
|
button_save_config,
|
||||||
|
button_save_as_config,
|
||||||
|
config_file_name,
|
||||||
|
) = gradio_config()
|
||||||
|
|
||||||
|
(
|
||||||
|
pretrained_model_name_or_path,
|
||||||
|
v2,
|
||||||
|
v_parameterization,
|
||||||
|
save_model_as,
|
||||||
|
model_list,
|
||||||
|
) = gradio_source_model()
|
||||||
|
|
||||||
|
with gr.Tab('Folders'):
|
||||||
|
with gr.Row():
|
||||||
|
train_data_dir = gr.Textbox(
|
||||||
|
label='Image folder',
|
||||||
|
placeholder='Folder where the training folders containing the images are located',
|
||||||
|
)
|
||||||
|
train_data_dir_input_folder = gr.Button(
|
||||||
|
'📂', elem_id='open_folder_small'
|
||||||
|
)
|
||||||
|
train_data_dir_input_folder.click(
|
||||||
|
get_folder_path, outputs=train_data_dir
|
||||||
|
)
|
||||||
|
reg_data_dir = gr.Textbox(
|
||||||
|
label='Regularisation folder',
|
||||||
|
placeholder='(Optional) Folder where the regularization folders containing the images are located',
|
||||||
|
)
|
||||||
|
reg_data_dir_input_folder = gr.Button(
|
||||||
|
'📂', elem_id='open_folder_small'
|
||||||
|
)
|
||||||
|
reg_data_dir_input_folder.click(
|
||||||
|
get_folder_path, outputs=reg_data_dir
|
||||||
|
)
|
||||||
|
with gr.Row():
|
||||||
|
output_dir = gr.Textbox(
|
||||||
|
label='Model output folder',
|
||||||
|
placeholder='Folder to output trained model',
|
||||||
|
)
|
||||||
|
output_dir_input_folder = gr.Button(
|
||||||
|
'📂', elem_id='open_folder_small'
|
||||||
|
)
|
||||||
|
output_dir_input_folder.click(get_folder_path, outputs=output_dir)
|
||||||
|
logging_dir = gr.Textbox(
|
||||||
|
label='Logging folder',
|
||||||
|
placeholder='Optional: enable logging and output TensorBoard log to this folder',
|
||||||
|
)
|
||||||
|
logging_dir_input_folder = gr.Button(
|
||||||
|
'📂', elem_id='open_folder_small'
|
||||||
|
)
|
||||||
|
logging_dir_input_folder.click(
|
||||||
|
get_folder_path, outputs=logging_dir
|
||||||
|
)
|
||||||
|
with gr.Row():
|
||||||
|
output_name = gr.Textbox(
|
||||||
|
label='Model output name',
|
||||||
|
placeholder='Name of the model to output',
|
||||||
|
value='last',
|
||||||
|
interactive=True,
|
||||||
|
)
|
||||||
|
train_data_dir.change(
|
||||||
|
remove_doublequote,
|
||||||
|
inputs=[train_data_dir],
|
||||||
|
outputs=[train_data_dir],
|
||||||
|
)
|
||||||
|
reg_data_dir.change(
|
||||||
|
remove_doublequote,
|
||||||
|
inputs=[reg_data_dir],
|
||||||
|
outputs=[reg_data_dir],
|
||||||
|
)
|
||||||
|
output_dir.change(
|
||||||
|
remove_doublequote,
|
||||||
|
inputs=[output_dir],
|
||||||
|
outputs=[output_dir],
|
||||||
|
)
|
||||||
|
logging_dir.change(
|
||||||
|
remove_doublequote,
|
||||||
|
inputs=[logging_dir],
|
||||||
|
outputs=[logging_dir],
|
||||||
|
)
|
||||||
|
with gr.Tab('Training parameters'):
|
||||||
|
with gr.Row():
|
||||||
|
weights = gr.Textbox(
|
||||||
|
label='Resume TI training',
|
||||||
|
placeholder='(Optional) Path to existing TI embedding file to keep training',
|
||||||
|
)
|
||||||
|
weights_file_input = gr.Button(
|
||||||
|
'📂', elem_id='open_folder_small'
|
||||||
|
)
|
||||||
|
weights_file_input.click(get_file_path, outputs=weights)
|
||||||
|
with gr.Row():
|
||||||
|
token_string = gr.Textbox(
|
||||||
|
label='Token string',
|
||||||
|
placeholder='eg: cat',
|
||||||
|
)
|
||||||
|
init_word = gr.Textbox(
|
||||||
|
label='Init word',
|
||||||
|
value='*',
|
||||||
|
)
|
||||||
|
num_vectors_per_token = gr.Slider(
|
||||||
|
minimum=1,
|
||||||
|
maximum=75,
|
||||||
|
value=1,
|
||||||
|
step=1,
|
||||||
|
label='Vectors',
|
||||||
|
)
|
||||||
|
max_train_steps = gr.Textbox(
|
||||||
|
label='Max train steps',
|
||||||
|
placeholder='(Optional) Maximum number of steps',
|
||||||
|
)
|
||||||
|
template = gr.Dropdown(
|
||||||
|
label='Template',
|
||||||
|
choices=[
|
||||||
|
'caption',
|
||||||
|
'object template',
|
||||||
|
'style template',
|
||||||
|
],
|
||||||
|
value='caption',
|
||||||
|
)
|
||||||
|
(
|
||||||
|
learning_rate,
|
||||||
|
lr_scheduler,
|
||||||
|
lr_warmup,
|
||||||
|
train_batch_size,
|
||||||
|
epoch,
|
||||||
|
save_every_n_epochs,
|
||||||
|
mixed_precision,
|
||||||
|
save_precision,
|
||||||
|
num_cpu_threads_per_process,
|
||||||
|
seed,
|
||||||
|
caption_extension,
|
||||||
|
cache_latents,
|
||||||
|
) = gradio_training(
|
||||||
|
learning_rate_value='1e-5',
|
||||||
|
lr_scheduler_value='cosine',
|
||||||
|
lr_warmup_value='10',
|
||||||
|
)
|
||||||
|
with gr.Row():
|
||||||
|
max_resolution = gr.Textbox(
|
||||||
|
label='Max resolution',
|
||||||
|
value='512,512',
|
||||||
|
placeholder='512,512',
|
||||||
|
)
|
||||||
|
stop_text_encoder_training = gr.Slider(
|
||||||
|
minimum=0,
|
||||||
|
maximum=100,
|
||||||
|
value=0,
|
||||||
|
step=1,
|
||||||
|
label='Stop text encoder training',
|
||||||
|
)
|
||||||
|
enable_bucket = gr.Checkbox(label='Enable buckets', value=True)
|
||||||
|
with gr.Accordion('Advanced Configuration', open=False):
|
||||||
|
with gr.Row():
|
||||||
|
no_token_padding = gr.Checkbox(
|
||||||
|
label='No token padding', value=False
|
||||||
|
)
|
||||||
|
gradient_accumulation_steps = gr.Number(
|
||||||
|
label='Gradient accumulate steps', value='1'
|
||||||
|
)
|
||||||
|
with gr.Row():
|
||||||
|
prior_loss_weight = gr.Number(
|
||||||
|
label='Prior loss weight', value=1.0
|
||||||
|
)
|
||||||
|
vae = gr.Textbox(
|
||||||
|
label='VAE',
|
||||||
|
placeholder='(Optional) path to checkpoint of vae to replace for training',
|
||||||
|
)
|
||||||
|
vae_button = gr.Button('📂', elem_id='open_folder_small')
|
||||||
|
vae_button.click(get_any_file_path, outputs=vae)
|
||||||
|
(
|
||||||
|
use_8bit_adam,
|
||||||
|
xformers,
|
||||||
|
full_fp16,
|
||||||
|
gradient_checkpointing,
|
||||||
|
shuffle_caption,
|
||||||
|
color_aug,
|
||||||
|
flip_aug,
|
||||||
|
clip_skip,
|
||||||
|
mem_eff_attn,
|
||||||
|
save_state,
|
||||||
|
resume,
|
||||||
|
max_token_length,
|
||||||
|
max_train_epochs,
|
||||||
|
max_data_loader_n_workers,
|
||||||
|
) = gradio_advanced_training()
|
||||||
|
color_aug.change(
|
||||||
|
color_aug_changed,
|
||||||
|
inputs=[color_aug],
|
||||||
|
outputs=[cache_latents],
|
||||||
|
)
|
||||||
|
with gr.Tab('Tools'):
|
||||||
|
gr.Markdown(
|
||||||
|
'This section provides Dreambooth tools to help set up your dataset...'
|
||||||
|
)
|
||||||
|
gradio_dreambooth_folder_creation_tab(
|
||||||
|
train_data_dir_input=train_data_dir,
|
||||||
|
reg_data_dir_input=reg_data_dir,
|
||||||
|
output_dir_input=output_dir,
|
||||||
|
logging_dir_input=logging_dir,
|
||||||
|
)
|
||||||
|
|
||||||
|
button_run = gr.Button('Train TI')
|
||||||
|
|
||||||
|
settings_list = [
|
||||||
|
pretrained_model_name_or_path,
|
||||||
|
v2,
|
||||||
|
v_parameterization,
|
||||||
|
logging_dir,
|
||||||
|
train_data_dir,
|
||||||
|
reg_data_dir,
|
||||||
|
output_dir,
|
||||||
|
max_resolution,
|
||||||
|
learning_rate,
|
||||||
|
lr_scheduler,
|
||||||
|
lr_warmup,
|
||||||
|
train_batch_size,
|
||||||
|
epoch,
|
||||||
|
save_every_n_epochs,
|
||||||
|
mixed_precision,
|
||||||
|
save_precision,
|
||||||
|
seed,
|
||||||
|
num_cpu_threads_per_process,
|
||||||
|
cache_latents,
|
||||||
|
caption_extension,
|
||||||
|
enable_bucket,
|
||||||
|
gradient_checkpointing,
|
||||||
|
full_fp16,
|
||||||
|
no_token_padding,
|
||||||
|
stop_text_encoder_training,
|
||||||
|
use_8bit_adam,
|
||||||
|
xformers,
|
||||||
|
save_model_as,
|
||||||
|
shuffle_caption,
|
||||||
|
save_state,
|
||||||
|
resume,
|
||||||
|
prior_loss_weight,
|
||||||
|
color_aug,
|
||||||
|
flip_aug,
|
||||||
|
clip_skip,
|
||||||
|
vae,
|
||||||
|
output_name,
|
||||||
|
max_token_length,
|
||||||
|
max_train_epochs,
|
||||||
|
max_data_loader_n_workers,
|
||||||
|
mem_eff_attn,
|
||||||
|
gradient_accumulation_steps,
|
||||||
|
model_list,
|
||||||
|
token_string, init_word, num_vectors_per_token, max_train_steps, weights, template,
|
||||||
|
]
|
||||||
|
|
||||||
|
button_open_config.click(
|
||||||
|
open_configuration,
|
||||||
|
inputs=[config_file_name] + settings_list,
|
||||||
|
outputs=[config_file_name] + settings_list,
|
||||||
|
)
|
||||||
|
|
||||||
|
button_save_config.click(
|
||||||
|
save_configuration,
|
||||||
|
inputs=[dummy_db_false, config_file_name] + settings_list,
|
||||||
|
outputs=[config_file_name],
|
||||||
|
)
|
||||||
|
|
||||||
|
button_save_as_config.click(
|
||||||
|
save_configuration,
|
||||||
|
inputs=[dummy_db_true, config_file_name] + settings_list,
|
||||||
|
outputs=[config_file_name],
|
||||||
|
)
|
||||||
|
|
||||||
|
button_run.click(
|
||||||
|
train_model,
|
||||||
|
inputs=settings_list,
|
||||||
|
)
|
||||||
|
|
||||||
|
return (
|
||||||
|
train_data_dir,
|
||||||
|
reg_data_dir,
|
||||||
|
output_dir,
|
||||||
|
logging_dir,
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == '__main__':
|
||||||
|
# torch.cuda.set_per_process_memory_fraction(0.48)
|
||||||
|
parser = argparse.ArgumentParser()
|
||||||
|
parser.add_argument(
|
||||||
|
'--username', type=str, default='', help='Username for authentication'
|
||||||
|
)
|
||||||
|
parser.add_argument(
|
||||||
|
'--password', type=str, default='', help='Password for authentication'
|
||||||
|
)
|
||||||
|
|
||||||
|
args = parser.parse_args()
|
||||||
|
|
||||||
|
UI(username=args.username, password=args.password)
|
@@ -72,7 +72,7 @@ identifierとclassを使い、たとえば「shs dog」などでモデルを学
 Note: when training an additional network such as LoRA, the command is ``train_network.py`` rather than ``train_db.py``, and additional network_* options are required; see the LoRA guide.
 
 ```
-accelerate launch --num_cpu_threads_per_process 8 train_db.py
+accelerate launch --num_cpu_threads_per_process 1 train_db.py
     --pretrained_model_name_or_path=<.ckpt or .safetensors file, or Diffusers model directory>
     --train_data_dir=<training data directory>
     --reg_data_dir=<regularization image directory>
@@ -89,7 +89,7 @@ accelerate launch --num_cpu_threads_per_process 8 train_db.py
     --gradient_checkpointing
 ```
 
-It seems best to specify the number of CPU cores for num_cpu_threads_per_process.
+It seems best to specify 1 for num_cpu_threads_per_process in most cases.
 
 In pretrained_model_name_or_path, specify the base model for additional training. You can specify a Stable Diffusion checkpoint file (.ckpt or .safetensors), a Diffusers model directory on local disk, or a Diffusers model ID (such as "stabilityai/stable-diffusion-2"). By default the trained model is saved in the same format as the original model (this can be changed with the save_model_as option).
 
@@ -159,7 +159,7 @@ v2.xモデルでWebUIで画像生成する場合、モデルの仕様が記述
 
 ![image](https://user-images.githubusercontent.com/52813779/210776915-061d79c3-6582-42c2-8884-8b91d2f07313.png)
 
-Each yaml file can be found at [https://github.com/Stability-AI/stablediffusion/tree/main/configs/stable-diffusion](Stability AI's SD2.0 repository).
+Each yaml file can be found in [Stability AI's SD2.0 repository](https://github.com/Stability-AI/stablediffusion/tree/main/configs/stable-diffusion).
 
 # Other training options
@@ -212,6 +212,8 @@ def train(args):
   # epoch数を計算する
   num_update_steps_per_epoch = math.ceil(len(train_dataloader) / args.gradient_accumulation_steps)
   num_train_epochs = math.ceil(args.max_train_steps / num_update_steps_per_epoch)
+  if (args.save_n_epoch_ratio is not None) and (args.save_n_epoch_ratio > 0):
+    args.save_every_n_epochs = math.floor(num_train_epochs / args.save_n_epoch_ratio) or 1
 
   # 学習する
   total_batch_size = args.train_batch_size * accelerator.num_processes * args.gradient_accumulation_steps
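A quick worked example of the formula above (the values are illustrative, not from this commit): asking for roughly five checkpoints over twenty epochs yields one checkpoint every four epochs.

```python
import math

num_train_epochs = 20
save_n_epoch_ratio = 5   # request roughly 5 checkpoints in total
save_every_n_epochs = math.floor(num_train_epochs / save_n_epoch_ratio) or 1
print(save_every_n_epochs)  # 4 -> save every 4 epochs, i.e. 5 checkpoints overall
```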
@@ -264,6 +266,7 @@ def train(args):
       "ss_keep_tokens": args.keep_tokens,
       "ss_dataset_dirs": json.dumps(train_dataset.dataset_dirs_info),
       "ss_reg_dataset_dirs": json.dumps(train_dataset.reg_dataset_dirs_info),
+      "ss_bucket_info": json.dumps(train_dataset.bucket_info),
       "ss_training_comment": args.training_comment        # will not be updated after training
   }
 
@@ -437,8 +440,8 @@ if __name__ == '__main__':
   train_util.add_training_arguments(parser, True)
 
   parser.add_argument("--no_metadata", action='store_true', help="do not save metadata in output model / メタデータを出力先モデルに保存しない")
-  parser.add_argument("--save_model_as", type=str, default="pt", choices=[None, "ckpt", "pt", "safetensors"],
-                      help="format to save the model (default is .pt) / モデル保存時の形式(デフォルトはpt)")
+  parser.add_argument("--save_model_as", type=str, default="safetensors", choices=[None, "ckpt", "pt", "safetensors"],
+                      help="format to save the model (default is .safetensors) / モデル保存時の形式(デフォルトはsafetensors)")
 
   parser.add_argument("--unet_lr", type=float, default=None, help="learning rate for U-Net / U-Netの学習率")
   parser.add_argument("--text_encoder_lr", type=float, default=None, help="learning rate for Text Encoder / Text Encoderの学習率")
@@ -24,7 +24,7 @@ DreamBoothの手法(identifier(sksなど)とclass、オプションで正
 
 Prepare your data by referring to the [DreamBooth guide](./train_db_README-ja.md).
 
-When training, specify train_network.py instead of train_db.py.
+When training, specify train_network.py instead of train_db.py. Then add the LoRA-related options (such as ``network_dim`` and ``network_alpha``) as described in "Options for LoRA training".
 
 Almost all options (except those related to saving Stable Diffusion models) can be used, but stop_text_encoder_training is not supported.
 
@@ -32,7 +32,7 @@ DreamBoothの手法(identifier(sksなど)とclass、オプションで正
 
 Refer to the [fine-tuning guide](./fine_tune_README_ja.md) and carry out each step.
 
-When training, specify train_network.py instead of fine_tune.py. Almost all options (except those related to saving models) can be used as-is.
+When training, specify train_network.py instead of fine_tune.py. Almost all options (except those related to saving models) can be used as-is. Then add the LoRA-related options (such as ``network_dim`` and ``network_alpha``) as described in "Options for LoRA training".
 
 Note that it also works without the "pre-fetching of latents" step. In that case latents are obtained from the VAE during training (or caching), so training is slower, but color_aug becomes available instead.
 
@@ -45,7 +45,7 @@ train_network.pyでは--network_moduleオプションに、学習対象のモジ
 The following is an example command line (DreamBooth method).
 
 ```
-accelerate launch --num_cpu_threads_per_process 12 train_network.py
+accelerate launch --num_cpu_threads_per_process 1 train_network.py
     --pretrained_model_name_or_path=..\models\model.ckpt
     --train_data_dir=..\data\db\char1 --output_dir=..\lora_train1
     --reg_data_dir=..\data\db\reg1 --prior_loss_weight=1.0
@@ -60,7 +60,9 @@ accelerate launch --num_cpu_threads_per_process 12 train_network.py
 In addition, the following options can be specified.
 
 * --network_dim
-  * Specifies the number of dimensions of the LoRA (such as ``--networkdim=4``). The default is 4. A larger number increases expressive power, but the memory and time required for training also increase. Increasing it blindly does not seem to be a good idea either.
+  * Specifies the RANK of the LoRA (such as ``--networkdim=4``). The default is 4. A larger number increases expressive power, but the memory and time required for training also increase. Increasing it blindly does not seem to be a good idea either.
+* --network_alpha
+  * Specifies the ``alpha`` value used to prevent underflow and stabilize training. The default is 1. Specifying the same value as ``network_dim`` gives the same behavior as previous versions.
 * --network_weights
   * Loads pretrained LoRA weights before training and continues training from them.
 * --network_train_unet_only
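The relationship between ``network_dim`` and the new ``network_alpha`` follows from the LoRAModule change earlier in this commit (the LoRA update is scaled by alpha / dim). A small illustrative sketch, with made-up values:

```python
# Illustrative only: how network_alpha scales the LoRA update (see LoRAModule in this commit)
network_dim = 8      # rank of the LoRA matrices
network_alpha = 1    # new default; smaller alpha shrinks the effective update during training
scale = network_alpha / network_dim
print(scale)         # 0.125 -> the LoRA delta is multiplied by this factor
# Setting network_alpha equal to network_dim gives scale 1.0, i.e. the previous behavior.
```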
@@ -126,7 +128,7 @@ python networks\merge_lora.py
 
 In --ratios, specify the ratio of each model (how much of its weight is reflected in the base model) as a number between 0 and 1.0. When merging two models one-to-one, this would be "0.5 0.5". With "1.0 1.0" the combined weights become too large, and the result will probably be undesirable.
 
-LoRA trained on v1 and LoRA trained on v2, and LoRA with different numbers of dimensions, cannot be merged. U-Net-only LoRA and U-Net+Text Encoder LoRA should be mergeable, but the result is unknown.
+LoRA trained on v1 and LoRA trained on v2, and LoRA with different rank (number of dimensions) or ``alpha``, cannot be merged. U-Net-only LoRA and U-Net+Text Encoder LoRA should be mergeable, but the result is unknown.
 
 
 ### Other options
498
train_textual_inversion.py
Normal file
@ -0,0 +1,498 @@
import importlib
import argparse
import gc
import math
import os

from tqdm import tqdm
import torch
from accelerate.utils import set_seed
import diffusers
from diffusers import DDPMScheduler

import library.train_util as train_util
from library.train_util import DreamBoothDataset, FineTuningDataset

imagenet_templates_small = [
    "a photo of a {}",
    "a rendering of a {}",
    "a cropped photo of the {}",
    "the photo of a {}",
    "a photo of a clean {}",
    "a photo of a dirty {}",
    "a dark photo of the {}",
    "a photo of my {}",
    "a photo of the cool {}",
    "a close-up photo of a {}",
    "a bright photo of the {}",
    "a cropped photo of a {}",
    "a photo of the {}",
    "a good photo of the {}",
    "a photo of one {}",
    "a close-up photo of the {}",
    "a rendition of the {}",
    "a photo of the clean {}",
    "a rendition of a {}",
    "a photo of a nice {}",
    "a good photo of a {}",
    "a photo of the nice {}",
    "a photo of the small {}",
    "a photo of the weird {}",
    "a photo of the large {}",
    "a photo of a cool {}",
    "a photo of a small {}",
]

imagenet_style_templates_small = [
    "a painting in the style of {}",
    "a rendering in the style of {}",
    "a cropped painting in the style of {}",
    "the painting in the style of {}",
    "a clean painting in the style of {}",
    "a dirty painting in the style of {}",
    "a dark painting in the style of {}",
    "a picture in the style of {}",
    "a cool painting in the style of {}",
    "a close-up painting in the style of {}",
    "a bright painting in the style of {}",
    "a cropped painting in the style of {}",
    "a good painting in the style of {}",
    "a close-up painting in the style of {}",
    "a rendition in the style of {}",
    "a nice painting in the style of {}",
    "a small painting in the style of {}",
    "a weird painting in the style of {}",
    "a large painting in the style of {}",
]

def collate_fn(examples):
  return examples[0]

def train(args):
  if args.output_name is None:
    args.output_name = args.token_string
  use_template = args.use_object_template or args.use_style_template

  train_util.verify_training_args(args)
  train_util.prepare_dataset_args(args, True)

  cache_latents = args.cache_latents
  use_dreambooth_method = args.in_json is None

  if args.seed is not None:
    set_seed(args.seed)

  tokenizer = train_util.load_tokenizer(args)

  # acceleratorを準備する
  print("prepare accelerator")
  accelerator, unwrap_model = train_util.prepare_accelerator(args)

  # mixed precisionに対応した型を用意しておき適宜castする
  weight_dtype, save_dtype = train_util.prepare_dtype(args)

  # モデルを読み込む
  text_encoder, vae, unet, _ = train_util.load_target_model(args, weight_dtype)

  # Convert the init_word to token_id
  if args.init_word is not None:
    init_token_id = tokenizer.encode(args.init_word, add_special_tokens=False)
    assert len(init_token_id) == 1, f"init word {args.init_word} is not converted to single token / 初期化単語が二つ以上のトークンに変換されます。別の単語を使ってください"
    init_token_id = init_token_id[0]
  else:
    init_token_id = None

  # add new word to tokenizer, count is num_vectors_per_token
  token_strings = [args.token_string] + [f"{args.token_string}{i+1}" for i in range(args.num_vectors_per_token - 1)]
  num_added_tokens = tokenizer.add_tokens(token_strings)
  assert num_added_tokens == args.num_vectors_per_token, f"tokenizer has same word to token string. please use another one / 指定したargs.token_stringは既に存在します。別の単語を使ってください: {args.token_string}"

  token_ids = tokenizer.convert_tokens_to_ids(token_strings)
  print(f"tokens are added: {token_ids}")
  assert min(token_ids) == token_ids[0] and token_ids[-1] == token_ids[0] + len(token_ids) - 1, f"token ids is not ordered"
  assert len(tokenizer) - 1 == token_ids[-1], f"token ids is not end of tokenize: {len(tokenizer)}"

  # Resize the token embeddings as we are adding new special tokens to the tokenizer
  text_encoder.resize_token_embeddings(len(tokenizer))

  # Initialise the newly added placeholder token with the embeddings of the initializer token
  token_embeds = text_encoder.get_input_embeddings().weight.data
  if init_token_id is not None:
    for token_id in token_ids:
      token_embeds[token_id] = token_embeds[init_token_id]
      # print(token_id, token_embeds[token_id].mean(), token_embeds[token_id].min())

  # load weights
  if args.weights is not None:
    embeddings = load_weights(args.weights)
    assert len(token_ids) == len(embeddings), f"num_vectors_per_token is mismatch for weights / 指定した重みとnum_vectors_per_tokenの値が異なります: {len(embeddings)}"
    # print(token_ids, embeddings.size())
    for token_id, embedding in zip(token_ids, embeddings):
      token_embeds[token_id] = embedding
      # print(token_id, token_embeds[token_id].mean(), token_embeds[token_id].min())
    print(f"weights loaded")

  print(f"create embeddings for {args.num_vectors_per_token} tokens, for {args.token_string}")

  # データセットを準備する
  if use_dreambooth_method:
    print("Use DreamBooth method.")
    train_dataset = DreamBoothDataset(args.train_batch_size, args.train_data_dir, args.reg_data_dir,
                                      tokenizer, args.max_token_length, args.caption_extension, args.shuffle_caption, args.keep_tokens,
                                      args.resolution, args.enable_bucket, args.min_bucket_reso, args.max_bucket_reso, args.prior_loss_weight,
                                      args.flip_aug, args.color_aug, args.face_crop_aug_range, args.random_crop, args.debug_dataset)
  else:
    print("Train with captions.")
    train_dataset = FineTuningDataset(args.in_json, args.train_batch_size, args.train_data_dir,
                                      tokenizer, args.max_token_length, args.shuffle_caption, args.keep_tokens,
                                      args.resolution, args.enable_bucket, args.min_bucket_reso, args.max_bucket_reso,
                                      args.flip_aug, args.color_aug, args.face_crop_aug_range, args.random_crop,
                                      args.dataset_repeats, args.debug_dataset)

  # make captions: tokenstring tokenstring1 tokenstring2 ...tokenstringn という文字列に書き換える超乱暴な実装
  if use_template:
    print(f"use template for training captions. is object: {args.use_object_template}")
    templates = imagenet_templates_small if args.use_object_template else imagenet_style_templates_small
    replace_to = " ".join(token_strings)
    captions = []
    for tmpl in templates:
      captions.append(tmpl.format(replace_to))
    train_dataset.add_replacement("", captions)
  elif args.num_vectors_per_token > 1:
    replace_to = " ".join(token_strings)
    train_dataset.add_replacement(args.token_string, replace_to)

  train_dataset.make_buckets()

  if args.debug_dataset:
    train_util.debug_dataset(train_dataset, show_input_ids=True)
    return
  if len(train_dataset) == 0:
    print("No data found. Please verify arguments / 画像がありません。引数指定を確認してください")
    return

  # モデルに xformers とか memory efficient attention を組み込む
  train_util.replace_unet_modules(unet, args.mem_eff_attn, args.xformers)

  # 学習を準備する
  if cache_latents:
    vae.to(accelerator.device, dtype=weight_dtype)
    vae.requires_grad_(False)
    vae.eval()
    with torch.no_grad():
      train_dataset.cache_latents(vae)
    vae.to("cpu")
    if torch.cuda.is_available():
      torch.cuda.empty_cache()
    gc.collect()

  if args.gradient_checkpointing:
    unet.enable_gradient_checkpointing()
    text_encoder.gradient_checkpointing_enable()

  # 学習に必要なクラスを準備する
  print("prepare optimizer, data loader etc.")

  # 8-bit Adamを使う
  if args.use_8bit_adam:
    try:
      import bitsandbytes as bnb
    except ImportError:
      raise ImportError("No bitsandbytes / bitsandbytesがインストールされていないようです")
    print("use 8-bit Adam optimizer")
    optimizer_class = bnb.optim.AdamW8bit
  else:
    optimizer_class = torch.optim.AdamW

  trainable_params = text_encoder.get_input_embeddings().parameters()

  # betaやweight decayはdiffusers DreamBoothもDreamBooth SDもデフォルト値のようなのでオプションはとりあえず省略
  optimizer = optimizer_class(trainable_params, lr=args.learning_rate)

  # dataloaderを準備する
  # DataLoaderのプロセス数:0はメインプロセスになる
  n_workers = min(args.max_data_loader_n_workers, os.cpu_count() - 1)  # cpu_count-1 ただし最大で指定された数まで
  train_dataloader = torch.utils.data.DataLoader(
      train_dataset, batch_size=1, shuffle=False, collate_fn=collate_fn, num_workers=n_workers)

  # 学習ステップ数を計算する
  if args.max_train_epochs is not None:
    args.max_train_steps = args.max_train_epochs * len(train_dataloader)
    print(f"override steps. steps for {args.max_train_epochs} epochs is / 指定エポックまでのステップ数: {args.max_train_steps}")

  # lr schedulerを用意する
  lr_scheduler = diffusers.optimization.get_scheduler(
      args.lr_scheduler, optimizer, num_warmup_steps=args.lr_warmup_steps, num_training_steps=args.max_train_steps * args.gradient_accumulation_steps)

  # acceleratorがなんかよろしくやってくれるらしい
  text_encoder, optimizer, train_dataloader, lr_scheduler = accelerator.prepare(
      text_encoder, optimizer, train_dataloader, lr_scheduler)

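  # Only the newly added token embeddings are trained: token ids below token_ids[0]
  # are masked by index_no_updates and restored from orig_embeds_params after every step.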
  index_no_updates = torch.arange(len(tokenizer)) < token_ids[0]
  print(len(index_no_updates), torch.sum(index_no_updates))
  orig_embeds_params = unwrap_model(text_encoder).get_input_embeddings().weight.data.detach().clone()

  # Freeze all parameters except for the token embeddings in text encoder
  text_encoder.requires_grad_(True)
  text_encoder.text_model.encoder.requires_grad_(False)
  text_encoder.text_model.final_layer_norm.requires_grad_(False)
  text_encoder.text_model.embeddings.position_embedding.requires_grad_(False)
  # text_encoder.text_model.embeddings.token_embedding.requires_grad_(True)

  unet.requires_grad_(False)
  unet.to(accelerator.device, dtype=weight_dtype)
  if args.gradient_checkpointing:  # according to TI example in Diffusers, train is required
    unet.train()
  else:
    unet.eval()

  if not cache_latents:
    vae.requires_grad_(False)
    vae.eval()
    vae.to(accelerator.device, dtype=weight_dtype)

  # 実験的機能:勾配も含めたfp16学習を行う PyTorchにパッチを当ててfp16でのgrad scaleを有効にする
  if args.full_fp16:
    train_util.patch_accelerator_for_fp16_training(accelerator)
    text_encoder.to(weight_dtype)

  # resumeする
  if args.resume is not None:
    print(f"resume training from state: {args.resume}")
    accelerator.load_state(args.resume)

  # epoch数を計算する
  num_update_steps_per_epoch = math.ceil(len(train_dataloader) / args.gradient_accumulation_steps)
  num_train_epochs = math.ceil(args.max_train_steps / num_update_steps_per_epoch)
  if (args.save_n_epoch_ratio is not None) and (args.save_n_epoch_ratio > 0):
    args.save_every_n_epochs = math.floor(num_train_epochs / args.save_n_epoch_ratio) or 1

  # 学習する
  total_batch_size = args.train_batch_size * accelerator.num_processes * args.gradient_accumulation_steps
  print("running training / 学習開始")
  print(f" num train images * repeats / 学習画像の数×繰り返し回数: {train_dataset.num_train_images}")
  print(f" num reg images / 正則化画像の数: {train_dataset.num_reg_images}")
  print(f" num batches per epoch / 1epochのバッチ数: {len(train_dataloader)}")
  print(f" num epochs / epoch数: {num_train_epochs}")
  print(f" batch size per device / バッチサイズ: {args.train_batch_size}")
  print(f" total train batch size (with parallel & distributed & accumulation) / 総バッチサイズ(並列学習、勾配合計含む): {total_batch_size}")
  print(f" gradient accumulation steps / 勾配を合計するステップ数 = {args.gradient_accumulation_steps}")
  print(f" total optimization steps / 学習ステップ数: {args.max_train_steps}")

  progress_bar = tqdm(range(args.max_train_steps), smoothing=0, disable=not accelerator.is_local_main_process, desc="steps")
  global_step = 0

  noise_scheduler = DDPMScheduler(beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear",
                                  num_train_timesteps=1000, clip_sample=False)

  if accelerator.is_main_process:
    accelerator.init_trackers("textual_inversion")

  for epoch in range(num_train_epochs):
    print(f"epoch {epoch+1}/{num_train_epochs}")

    text_encoder.train()

    loss_total = 0
    bef_epo_embs = unwrap_model(text_encoder).get_input_embeddings().weight[token_ids].data.detach().clone()
    for step, batch in enumerate(train_dataloader):
      with accelerator.accumulate(text_encoder):
        with torch.no_grad():
          if "latents" in batch and batch["latents"] is not None:
            latents = batch["latents"].to(accelerator.device)
          else:
            # latentに変換
            latents = vae.encode(batch["images"].to(dtype=weight_dtype)).latent_dist.sample()
          latents = latents * 0.18215
        b_size = latents.shape[0]

        # Get the text embedding for conditioning
        input_ids = batch["input_ids"].to(accelerator.device)
        encoder_hidden_states = train_util.get_hidden_states(args, input_ids, tokenizer, text_encoder, torch.float)  # weight_dtype) use float instead of fp16/bf16 because text encoder is float

        # Sample noise that we'll add to the latents
        noise = torch.randn_like(latents, device=latents.device)

        # Sample a random timestep for each image
        timesteps = torch.randint(0, noise_scheduler.config.num_train_timesteps, (b_size,), device=latents.device)
        timesteps = timesteps.long()

        # Add noise to the latents according to the noise magnitude at each timestep
        # (this is the forward diffusion process)
        noisy_latents = noise_scheduler.add_noise(latents, noise, timesteps)

        # Predict the noise residual
        noise_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample

        if args.v_parameterization:
          # v-parameterization training
          target = noise_scheduler.get_velocity(latents, noise, timesteps)
        else:
          target = noise

        loss = torch.nn.functional.mse_loss(noise_pred.float(), target.float(), reduction="none")
        loss = loss.mean([1, 2, 3])

        loss_weights = batch["loss_weights"]  # 各sampleごとのweight
        loss = loss * loss_weights

        loss = loss.mean()  # 平均なのでbatch_sizeで割る必要なし

        accelerator.backward(loss)
        if accelerator.sync_gradients:
          params_to_clip = text_encoder.get_input_embeddings().parameters()
          accelerator.clip_grad_norm_(params_to_clip, 1.0)  # args.max_grad_norm)

        optimizer.step()
        lr_scheduler.step()
        optimizer.zero_grad(set_to_none=True)

        # Let's make sure we don't update any embedding weights besides the newly added token
        with torch.no_grad():
          unwrap_model(text_encoder).get_input_embeddings().weight[index_no_updates] = orig_embeds_params[index_no_updates]

      # Checks if the accelerator has performed an optimization step behind the scenes
      if accelerator.sync_gradients:
        progress_bar.update(1)
        global_step += 1

      current_loss = loss.detach().item()
      if args.logging_dir is not None:
        logs = {"loss": current_loss, "lr": lr_scheduler.get_last_lr()[0]}
        accelerator.log(logs, step=global_step)

      loss_total += current_loss
      avr_loss = loss_total / (step+1)
      logs = {"loss": avr_loss}  # , "lr": lr_scheduler.get_last_lr()[0]}
      progress_bar.set_postfix(**logs)

      if global_step >= args.max_train_steps:
        break

    if args.logging_dir is not None:
      logs = {"loss/epoch": loss_total / len(train_dataloader)}
      accelerator.log(logs, step=epoch+1)

    accelerator.wait_for_everyone()

    updated_embs = unwrap_model(text_encoder).get_input_embeddings().weight[token_ids].data.detach().clone()
    d = updated_embs - bef_epo_embs
    print(bef_epo_embs.size(), updated_embs.size(), d.mean(), d.min())

    if args.save_every_n_epochs is not None:
      model_name = train_util.DEFAULT_EPOCH_NAME if args.output_name is None else args.output_name

      def save_func():
        ckpt_name = train_util.EPOCH_FILE_NAME.format(model_name, epoch + 1) + '.' + args.save_model_as
        ckpt_file = os.path.join(args.output_dir, ckpt_name)
        print(f"saving checkpoint: {ckpt_file}")
        save_weights(ckpt_file, updated_embs, save_dtype)

      def remove_old_func(old_epoch_no):
        old_ckpt_name = train_util.EPOCH_FILE_NAME.format(model_name, old_epoch_no) + '.' + args.save_model_as
        old_ckpt_file = os.path.join(args.output_dir, old_ckpt_name)
        if os.path.exists(old_ckpt_file):
          print(f"removing old checkpoint: {old_ckpt_file}")
          os.remove(old_ckpt_file)

      saving = train_util.save_on_epoch_end(args, save_func, remove_old_func, epoch + 1, num_train_epochs)
      if saving and args.save_state:
        train_util.save_state_on_epoch_end(args, accelerator, model_name, epoch + 1)

    # end of epoch

  is_main_process = accelerator.is_main_process
  if is_main_process:
    text_encoder = unwrap_model(text_encoder)

  accelerator.end_training()

  if args.save_state:
    train_util.save_state_on_train_end(args, accelerator)

  updated_embs = text_encoder.get_input_embeddings().weight[token_ids].data.detach().clone()

  del accelerator  # この後メモリを使うのでこれは消す

  if is_main_process:
    os.makedirs(args.output_dir, exist_ok=True)

    model_name = train_util.DEFAULT_LAST_OUTPUT_NAME if args.output_name is None else args.output_name
    ckpt_name = model_name + '.' + args.save_model_as
    ckpt_file = os.path.join(args.output_dir, ckpt_name)

    print(f"save trained model to {ckpt_file}")
    save_weights(ckpt_file, updated_embs, save_dtype)
    print("model saved.")

def save_weights(file, updated_embs, save_dtype):
  state_dict = {"emb_params": updated_embs}

  if save_dtype is not None:
    for key in list(state_dict.keys()):
      v = state_dict[key]
      v = v.detach().clone().to("cpu").to(save_dtype)
      state_dict[key] = v

  if os.path.splitext(file)[1] == '.safetensors':
    from safetensors.torch import save_file
    save_file(state_dict, file)
  else:
    torch.save(state_dict, file)  # can be loaded in Web UI


def load_weights(file):
  if os.path.splitext(file)[1] == '.safetensors':
    from safetensors.torch import load_file
    data = load_file(file)
  else:
    # compatible to Web UI's file format
    data = torch.load(file, map_location='cpu')
    if type(data) != dict:
      raise ValueError(f"weight file is not dict / 重みファイルがdict形式ではありません: {file}")

    if 'string_to_param' in data:  # textual inversion embeddings
      data = data['string_to_param']
      if hasattr(data, '_parameters'):  # support old PyTorch?
        data = getattr(data, '_parameters')

  emb = next(iter(data.values()))
  if type(emb) != torch.Tensor:
    raise ValueError(f"weight file does not contain Tensor / 重みファイルのデータがTensorではありません: {file}")

  if len(emb.size()) == 1:
    emb = emb.unsqueeze(0)

  return emb

if __name__ == '__main__':
  parser = argparse.ArgumentParser()

  train_util.add_sd_models_arguments(parser)
  train_util.add_dataset_arguments(parser, True, True)
  train_util.add_training_arguments(parser, True)

  parser.add_argument("--save_model_as", type=str, default="pt", choices=[None, "ckpt", "pt", "safetensors"],
                      help="format to save the model (default is .pt) / モデル保存時の形式(デフォルトはpt)")

  parser.add_argument("--weights", type=str, default=None,
                      help="embedding weights to initialize / 学習するネットワークの初期重み")
  parser.add_argument("--num_vectors_per_token", type=int, default=1,
                      help='number of vectors per token / トークンに割り当てるembeddingsの要素数')
  parser.add_argument("--token_string", type=str, default=None,
                      help="token string used in training, must not exist in tokenizer / 学習時に使用されるトークン文字列、tokenizerに存在しない文字であること")
  parser.add_argument("--init_word", type=str, default=None,
                      help="word to initialize vector / ベクトルを初期化に使用する単語、tokenizerで一語になること")
  parser.add_argument("--use_object_template", action='store_true',
                      help="ignore caption and use default templates for object / キャプションは使わずデフォルトの物体用テンプレートで学習する")
  parser.add_argument("--use_style_template", action='store_true',
                      help="ignore caption and use default templates for style / キャプションは使わずデフォルトのスタイル用テンプレートで学習する")

  args = parser.parse_args()
  train(args)
63
train_ti_README-ja.md
Normal file
@ -0,0 +1,63 @@
## Textual Inversionの学習について

[Textual Inversion](https://textual-inversion.github.io/)です。実装に当たっては https://github.com/huggingface/diffusers/tree/main/examples/textual_inversion を大いに参考にしました。

学習したモデルはWeb UIでもそのまま使えます。

なお恐らくSD2.xにも対応していますが現時点では未テストです。

## 学習方法

``train_textual_inversion.py`` を用います。

データの準備については ``train_network.py`` と全く同じですので、[そちらのドキュメント](./train_network_README-ja.md)を参照してください。

## オプション

以下はコマンドラインの例です(DreamBooth手法)。

```
accelerate launch --num_cpu_threads_per_process 1 train_textual_inversion.py
--pretrained_model_name_or_path=..\models\model.ckpt
--train_data_dir=..\data\db\char1 --output_dir=..\ti_train1
--resolution=448,640 --train_batch_size=1 --learning_rate=1e-4
--max_train_steps=400 --use_8bit_adam --xformers --mixed_precision=fp16
--save_every_n_epochs=1 --save_model_as=safetensors --clip_skip=2 --seed=42 --color_aug
--token_string=mychar4 --init_word=cute --num_vectors_per_token=4
```

``--token_string`` に学習時のトークン文字列を指定します。__学習時のプロンプトは、この文字列を含むようにしてください(token_stringがmychar4なら、``mychar4 1girl`` など)__。プロンプトのこの文字列の部分が、Textual Inversionの新しいtokenに置換されて学習されます。
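たとえばキャプションを使って学習する場合、キャプション(またはプロンプト)は次のようなイメージになります(内容はあくまで仮の例で、タグは任意です)。

```
mychar4 1girl, solo, standing
```
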
プロンプトにトークン文字列が含まれているかどうかは、``--debug_dataset`` で置換後のtoken idが表示されますので、以下のように ``49408`` 以降のtokenが存在するかどうかで確認できます。

```
input ids: tensor([[49406, 49408, 49409, 49410, 49411, 49412, 49413, 49414, 49415, 49407,
49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407,
49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407,
49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407,
49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407,
49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407,
49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407,
49407, 49407, 49407, 49407, 49407, 49407, 49407]])
```

tokenizerがすでに持っている単語(一般的な単語)は使用できません。

``--init_word`` にembeddingsを初期化するときのコピー元トークンの文字列を指定します。学ばせたい概念が近いものを選ぶとよいようです。二つ以上のトークンになる文字列は指定できません。

``--num_vectors_per_token`` にいくつのトークンをこの学習で使うかを指定します。多いほうが表現力が増しますが、その分多くのトークンを消費します。たとえばnum_vectors_per_token=8の場合、指定したトークン文字列は(一般的なプロンプトの77トークン制限のうち)8トークンを消費します。

その他、以下のオプションが指定できます。

* --weights
  * 学習前に学習済みのembeddingsを読み込み、そこから追加で学習します。
* --use_object_template
  * キャプションではなく既定の物体用テンプレート文字列(``a photo of a {}``など)で学習します。公式実装と同じになります。キャプションは無視されます。
* --use_style_template
  * キャプションではなく既定のスタイル用テンプレート文字列で学習します(``a painting in the style of {}``など)。公式実装と同じになります。キャプションは無視されます。

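たとえば既定の物体用テンプレートで学習する場合や、以前の学習結果から追加で学習する場合は、上記のコマンドライン例にそれぞれ次のように追加します(ファイル名はあくまで一例です)。

```
--use_object_template
--weights=..\ti_train1\mychar4.safetensors
```
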
## 当リポジトリ内の画像生成スクリプトで生成する

gen_img_diffusers.pyに、``--textual_inversion_embeddings`` オプションで学習したembeddingsファイルを指定してください(複数可)。プロンプトでembeddingsファイルのファイル名(拡張子を除く)を使うと、そのembeddingsが適用されます。

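指定のイメージは次のとおりです(``--textual_inversion_embeddings`` 以外の引数は省略しています。ファイル名は一例です)。

```
python gen_img_diffusers.py --textual_inversion_embeddings ..\ti_train1\mychar4.safetensors (その他の引数)
```

この場合、プロンプト内で ``mychar4`` と書くと、学習したembeddingsが適用されます。
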
62
train_ti_README.md
Normal file
@ -0,0 +1,62 @@
## About Textual Inversion training

This is an implementation of [Textual Inversion](https://textual-inversion.github.io/). I heavily referenced https://github.com/huggingface/diffusers/tree/main/examples/textual_inversion for the implementation.

The trained model can be used as is in the Web UI.

It is probably also compatible with SD2.x, but this has not been tested yet.

## Training method

Use ``train_textual_inversion.py``.

Data preparation is exactly the same as for ``train_network.py``, so please refer to [its documentation](./train_network_README-en.md).

## Options

Below is an example command line (DreamBooth technique).

```
accelerate launch --num_cpu_threads_per_process 1 train_textual_inversion.py
--pretrained_model_name_or_path=..\models\model.ckpt
--train_data_dir=..\data\db\char1 --output_dir=..\ti_train1
--resolution=448,640 --train_batch_size=1 --learning_rate=1e-4
--max_train_steps=400 --use_8bit_adam --xformers --mixed_precision=fp16
--save_every_n_epochs=1 --save_model_as=safetensors --clip_skip=2 --seed=42 --color_aug
--token_string=mychar4 --init_word=cute --num_vectors_per_token=4
```

``--token_string`` specifies the token string used during training. __The training prompts must contain this string (e.g. ``mychar4 1girl`` if token_string is mychar4)__. This part of the prompt is replaced with the new tokens of the Textual Inversion embedding and trained.
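For example, when training with captions, a caption for a training image might look like the following (only an illustration; the tags are arbitrary):

```
mychar4 1girl, solo, standing
```
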
Whether the prompts contain the token string can be checked with ``--debug_dataset``, which prints the token ids after substitution: you can confirm that tokens at or after ``49408`` are present, as shown below.

```
input ids: tensor([[49406, 49408, 49409, 49410, 49411, 49412, 49413, 49414, 49415, 49407,
49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407,
49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407,
49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407,
49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407,
49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407,
49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407, 49407,
49407, 49407, 49407, 49407, 49407, 49407, 49407]])
```

Words that the tokenizer already knows (common words) cannot be used.

``--init_word`` specifies the string of the source token used to initialize the embeddings. It seems to be a good idea to choose something whose concept is close to what you want to teach. You cannot specify a string that becomes two or more tokens.

``--num_vectors_per_token`` specifies how many tokens this training uses. The higher the number, the more expressive the result, but it also consumes more tokens. For example, if num_vectors_per_token=8, the specified token string consumes 8 tokens (out of the usual 77-token prompt limit).

In addition, the following options can be specified.

* --weights
  * Load pretrained embeddings before training and continue training from them.
* --use_object_template
  * Train with the default object template strings (such as ``a photo of a {}``) instead of captions. This is the same as the official implementation. Captions are ignored.
* --use_style_template
  * Train with the default style template strings (such as ``a painting in the style of {}``) instead of captions. This is the same as the official implementation. Captions are ignored.

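As a quick sanity check, you can inspect an embeddings file saved by this script: the state dict contains a single ``emb_params`` tensor of shape ``(num_vectors_per_token, embedding_dim)``. The sketch below assumes a safetensors file; the path is only an example.

```python
# Minimal sketch: inspect an embedding saved by train_textual_inversion.py.
# The file path is hypothetical; adjust it to your own output.
from safetensors.torch import load_file

state_dict = load_file("ti_train1/mychar4.safetensors")
emb = state_dict["emb_params"]
print(emb.shape, emb.dtype)  # e.g. torch.Size([4, 768]) for num_vectors_per_token=4 on SD1.x
```
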
## Generating with the image generation script in this repository

In gen_img_diffusers.py, specify the trained embeddings file(s) with the ``--textual_inversion_embeddings`` option (multiple files are allowed). Using the file name (without the extension) of an embeddings file in the prompt applies that embedding.

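A typical invocation might look like the following (only the relevant option is shown; other arguments are omitted and the file name is an example):

```
python gen_img_diffusers.py --textual_inversion_embeddings ..\ti_train1\mychar4.safetensors (other arguments)
```

Writing ``mychar4`` in the prompt then applies the trained embedding.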