Specify the folder of the source model and the destination .ckpt file as follows (actually written on one line). The v1/v2 version is automatically determined.
Note that v2 Diffusers' Text Encoder has only 22 layers, and if you convert it to Stable Diffusion as it is, the weights will be insufficient, so the weights of the 22 layers will be copied as the 23rd layer. The weight of the 23rd layer is not used during image generation, so it has no effect. Similarly, text_projection logit_scale also adds dummy weights (it doesn't seem to be used for image generation).
Also, since `.ckpt` does not contain scheduler and tokenizer information, you need to copy them from some existing Diffusers model. Please specify with `--reference_model`. You can specify the HuggingFace id or a local model directory.