If you would rather use model finetuning rather than the dreambooth method you can use a command similat to the following. The advantage of fine tuning is that you do not need to worry about regularization images... but you need to provide captions for every images. The caption will be used to train the model. You can use auto1111 to preprocess your training images and add either BLIP or danbooru captions to them. You then need to edit those to add the name of the model and correct any wrong description.
* 11/7 (v7): Text Encoder supports checkpoint files in different storage formats (it is converted at the time of import, so export will be in normal format). Changed the average value of EPOCH loss to output to the screen. Added a function to save epoch and global step in checkpoint in SD format (add values if there is existing data). The reg_data_dir option is enabled during fine tuning (fine tuning while mixing regularized images). Added dataset_repeats option that is valid for fine tuning (specified when the number of teacher images is small and the epoch is extremely short).
- Added a function that allows captions to learning images and regularized images.
* 11/27 (v11) update:
- DiffUsers 0.9.0 is required. Update as `pip install -U diffusers[torch]==0.9.0` in the virtual environment.
- The way captions are handled in DreamBooth has changed. When a caption file existed, the file's caption was added to the folder caption until v10, but from v11 it is only the file's caption. Please be careful.
- Fixed a bug where prior_loss_weight was applied to learning images. We apologize for the inconvenience.
- Compatible with Stable Diffusion v2.0. Add the --v2 option. If you are using 768-v-ema.ckpt or stable-diffusion-2 instead of stable-diffusion-v2-base, add --v_parameterization as well. Learn more about other options.
- Added options related to the learning rate scheduler.
- You can download and use DiffUsers models directly from Hugging Face. In addition, DiffUsers models can be saved during training.