ComfyUI Workflow Complete Tutorial: 8 Hands-On Steps to Advanced Stable Diffusion (2026)

2026-05-19 11:20:22 · DouWen Editorial

ComfyUI Beginner Tutorial: From Installation to Your First Advanced Workflow in 8 Steps

ComfyUI is the most popular node-based workflow tool in the Stable Diffusion community over the past year or two, and by 2026 it has become standard for veteran SD users. The problem is that beginners are put off the first time they open ComfyUI and see a pile of nodes. This article uses an 8-step hands-on path to take a complete beginner from installation to running their first advanced workflow, and explains along the way where ComfyUI's real value lies relative to WebUI.

What ComfyUI Is and Why Everyone Uses It

First, let's get the concept clear. ComfyUI is a node-based graphical interface for Stable Diffusion. Every operation (loading the model, encoding the prompt, sampling, decoding, saving the image) is an independent node, and the nodes are wired together to form a workflow.

Compared with the dropdowns and sliders of Automatic1111 WebUI, the node-based approach has three clear advantages. First, every step is visible: you can follow the complete path from prompt to latent space to image, which makes errors quick to pinpoint. Second, free composition: combining nodes lets you build pipelines that are hard to set up in WebUI, such as ControlNet + IPAdapter + Tile in a single pass. Third, better performance: ComfyUI's VRAM management is usually more efficient than WebUI's on the same hardware, though the exact gain varies a lot by model and configuration.

The cost is a learning curve. Seeing dozens of nodes for the first time is bewildering, but once you understand the basic structure you'll find it far more flexible than WebUI.

Step One: Choose the Right Installation Method

For complete beginners, there are two recommended paths. The first is one of the community's popular all-in-one packages: download it from a cloud drive, unzip it, and run it. It bundles a Python environment and common models, and suits the mainstream setup of Windows with a mid-range NVIDIA GPU. The second is the official standalone install: git clone the official repo, create a virtual environment with python -m venv venv, activate it, and run pip install -r requirements.txt. Mac users with M-series chips take this path; the official MPS support is now fairly mature.

GPU thresholds: SD 1.5 usually runs on 4 GB of VRAM or more, SDXL needs 8 GB or more, and 12 GB or more is recommended for FLUX.1. Low-VRAM machines can use the --lowvram launch flag; generation is slower, but it runs.
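
The thresholds above can be turned into a quick lookup. This is a minimal sketch: the names `suggest_models` and `launch_args` are made up for illustration, and the numbers are only the rules of thumb from the paragraph above, not hard limits.

```python
# Rule-of-thumb VRAM thresholds in GB, copied from the paragraph above.
MIN_VRAM_GB = {"SD 1.5": 4, "SDXL": 8, "FLUX.1": 12}


def suggest_models(vram_gb: float) -> list:
    """Return the model families whose rule-of-thumb threshold fits in vram_gb."""
    return [name for name, need in MIN_VRAM_GB.items() if vram_gb >= need]


def launch_args(vram_gb: float, target: str) -> list:
    """Suggest --lowvram when the card is below the target family's threshold."""
    return ["--lowvram"] if vram_gb < MIN_VRAM_GB[target] else []


print(suggest_models(8))        # → ['SD 1.5', 'SDXL']
print(launch_args(6, "SDXL"))   # → ['--lowvram']
```

Treat the result as a starting point; quantized or fp8 variants can shift these numbers considerably.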

Model storage locations: put base models in ComfyUI/models/checkpoints, VAE files in ComfyUI/models/vae, LoRA files in ComfyUI/models/loras, and ControlNet models in ComfyUI/models/controlnet. The folder structure differs from WebUI's, so when migrating models you don't need to copy the files; you can symlink them into the corresponding ComfyUI paths instead.
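
The symlink approach can be scripted. The sketch below is illustrative only: the ComfyUI folder names come from the paragraph above, while the WebUI folder names in `FOLDER_MAP` and the helper name `link_models` are assumptions about a typical install and may not match yours.

```python
import os
from pathlib import Path

# ComfyUI subfolder -> WebUI subfolder.
# The WebUI names are assumptions; adjust them to your own install.
FOLDER_MAP = {
    "checkpoints": "Stable-diffusion",
    "vae": "VAE",
    "loras": "Lora",
    "controlnet": "ControlNet",
}


def link_models(webui_models: Path, comfy_models: Path) -> list:
    """Symlink each file in the WebUI model folders into the matching ComfyUI folder."""
    created = []
    for comfy_name, webui_name in FOLDER_MAP.items():
        src_dir = webui_models / webui_name
        if not src_dir.is_dir():
            continue  # this WebUI folder does not exist; skip it
        dst_dir = comfy_models / comfy_name
        dst_dir.mkdir(parents=True, exist_ok=True)
        for src in sorted(src_dir.iterdir()):
            dst = dst_dir / src.name
            # lexists also catches broken links, so nothing is overwritten
            if src.is_file() and not os.path.lexists(dst):
                os.symlink(src.resolve(), dst)
                created.append(dst)
    return created
```

A call would look like `link_models(Path("webui/models"), Path("ComfyUI/models"))`. On Windows, creating symlinks may require Developer Mode or administrator rights. ComfyUI also ships an extra_model_paths.yaml.example file that can point at an existing WebUI install without any links.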

Step Two: Understand the 7 Basic Nodes of the Default Workflow

The default workflow you see the first time you open ComfyUI is the minimal text-to-image flow, roughly 7 nodes.

Load Checkpoint loads the base model and exposes three outputs: MODEL, CLIP, and VAE. One CLIP Text Encode node encodes the positive prompt into a CONDITIONING tensor, and a second CLIP Text Encode node does the same for the negative prompt. Empty Latent Image creates a blank latent-space tensor; this is where width and height are set. KSampler is the sampler: it receives MODEL, the positive and negative CONDITIONING, and the latent, and outputs the denoised latent. VAE Decode decodes the latent into a pixel image, and Save Image writes it to the output folder.

Once you understand the input-output relationships of these 7 nodes, every advanced workflow that follows is just a matter of piecing things together on this foundation.
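
The same 7-node graph can be written down in the JSON layout ComfyUI uses for its API ("prompt") format, which makes the input-output relationships explicit. This is a sketch under assumptions: the class and input names follow recent ComfyUI versions as far as I know, and the checkpoint filename, prompts, and the helper `upstream` are placeholders for illustration.

```python
# Each entry is one node. A link is [source_node_id, output_index];
# Load Checkpoint exposes MODEL=0, CLIP=1, VAE=2.
workflow = {
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "model.safetensors"}},  # placeholder filename
    "2": {"class_type": "CLIPTextEncode",                 # positive prompt
          "inputs": {"text": "a lighthouse at dusk", "clip": ["1", 1]}},
    "3": {"class_type": "CLIPTextEncode",                 # negative prompt
          "inputs": {"text": "blurry, low quality", "clip": ["1", 1]}},
    "4": {"class_type": "EmptyLatentImage",
          "inputs": {"width": 512, "height": 512, "batch_size": 1}},
    "5": {"class_type": "KSampler",
          "inputs": {"model": ["1", 0], "positive": ["2", 0], "negative": ["3", 0],
                     "latent_image": ["4", 0], "seed": 42, "steps": 20, "cfg": 7.0,
                     "sampler_name": "euler", "scheduler": "normal", "denoise": 1.0}},
    "6": {"class_type": "VAEDecode",
          "inputs": {"samples": ["5", 0], "vae": ["1", 2]}},
    "7": {"class_type": "SaveImage",
          "inputs": {"images": ["6", 0], "filename_prefix": "ComfyUI"}},
}


def upstream(node_id: str) -> set:
    """Return the ids of the nodes that feed directly into node_id."""
    inputs = workflow[node_id]["inputs"].values()
    return {value[0] for value in inputs if isinstance(value, list)}


print(sorted(upstream("5")))  # → ['1', '2', '3', '4']
```

Reading the KSampler entry shows the point of the section: four nodes feed it, and everything after it only decodes and saves.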

Step Three: The Standard Way to Add LoRA and VAE

The first common upgrade is adding a LoRA. The LoRA node, LoraLoader, sits between Load Checkpoint and the nodes that consume its outputs: it takes the MODEL and CLIP outputs and passes patched MODEL and CLIP downstream, to the KSampler and the CLIP Text Encode nodes respectively. Remember to write the LoRA's trigger words in the positive prompt.

VAE replacement: SDXL's built-in VAE can cause problems, particularly at fp16 precision, so a commonly used replacement is sdxl-vae-fp16-fix, a variant adjusted to run in fp16 without producing NaN artifacts. Add a Load VAE node (VAELoader) to load it separately, then connect its output to the vae input of VAE Decode in place of the base model's built-in VAE. Color and detail quality can improve noticeably.

Weights: the sweet spot for LoRA strength is commonly 0.6 to 0.9; at 1.0 the LoRA can easily overpower the base model's style. Be careful when chaining multiple LoRAs, since two or more can easily conflict. LoraLoaderModelOnly applies a LoRA to the UNet only and leaves CLIP untouched.
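
The LoRA wiring can be sketched as a graph edit on ComfyUI's API-format JSON, where each node has a `class_type` and `inputs`, and a link is `[node_id, output_index]`. The helper `add_lora`, the filenames, and the trimmed stand-in graph are hypothetical; the `LoraLoader` input names are what I understand recent ComfyUI versions to use.

```python
import copy


def add_lora(workflow: dict, lora_name: str, strength: float = 0.8) -> dict:
    """Return a copy of the workflow with a LoraLoader spliced in after the checkpoint."""
    wf = copy.deepcopy(workflow)
    ckpt_id = next(i for i, n in wf.items()
                   if n["class_type"] == "CheckpointLoaderSimple")
    lora_id = str(max(int(i) for i in wf) + 1)
    # Re-point consumers of MODEL (output 0) and CLIP (output 1) at the LoRA node.
    # LoraLoader keeps the same indices (MODEL=0, CLIP=1); VAE (output 2) is untouched.
    for node in wf.values():
        for key, value in node["inputs"].items():
            if isinstance(value, list) and value[0] == ckpt_id and value[1] in (0, 1):
                node["inputs"][key] = [lora_id, value[1]]
    wf[lora_id] = {"class_type": "LoraLoader",
                   "inputs": {"model": [ckpt_id, 0], "clip": [ckpt_id, 1],
                              "lora_name": lora_name,
                              "strength_model": strength,
                              "strength_clip": strength}}
    return wf


# Stand-in graph, trimmed to the inputs that matter for this example.
base = {
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "model.safetensors"}},
    "2": {"class_type": "CLIPTextEncode",
          "inputs": {"text": "trigger_word, portrait", "clip": ["1", 1]}},
    "3": {"class_type": "KSampler",
          "inputs": {"model": ["1", 0], "positive": ["2", 0]}},
    "4": {"class_type": "VAEDecode",
          "inputs": {"samples": ["3", 0], "vae": ["1", 2]}},
}
patched = add_lora(base, "style.safetensors", strength=0.7)
print(patched["2"]["inputs"]["clip"])  # → ['5', 1]
print(patched["4"]["inputs"]["vae"])   # → ['1', 2]
```

Chaining a second LoRA would mean pointing the next loader at the first loader's outputs rather than at the checkpoint, which is also where conflicts between LoRAs start to show.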

Step Four: Bring In ControlNet to Control Composition

ControlNet is one of the core sources of the power of ComfyUI workflows.

Basic wiring: first add a Load Image node to import the reference image and connect it to a ControlNet preprocessor node (Canny, MLSD, Depth, etc.; most of these come from community node packs) to generate the preprocessed image. A Load ControlNet Model node supplies the ControlNet model itself. Both feed an Apply ControlNet Advanced node, which also takes the positive and negative CONDITIONING from the prompt nodes and outputs modified CONDITIONING that connects to the KSampler's conditioning inputs.

Strength parameters: a strength of roughly 0.5 to 0.8 usually looks most natural. start_percent is generally 0, and end_percent sets the point in the sampling schedule at which ControlNet stops intervening.

Chaining multiple ControlNets: OpenPose to control pose + Depth to control depth + Canny to control outline, all used together, gives extremely high output stability, suitable for e-commerce mannequins, comic storyboards, and multi-view product shots.
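
Chaining works by feeding each Apply ControlNet node's conditioning outputs into the next one. Here is a sketch in ComfyUI's API-format JSON (a link is `[node_id, output_index]`). The helper `chain_controlnets` and the node ids 10 to 13, which stand for Load ControlNet Model and preprocessed-image nodes, are hypothetical; the input names are what I understand `ControlNetApplyAdvanced` to use in recent versions.

```python
def chain_controlnets(positive, negative, controls, first_id=100):
    """Build chained ControlNetApplyAdvanced nodes.

    positive / negative: links to the prompt conditioning.
    controls: list of (control_net_link, image_link, strength, end_percent).
    Returns (nodes, final_positive_link, final_negative_link).
    """
    nodes = {}
    for offset, (cn_link, image_link, strength, end_percent) in enumerate(controls):
        node_id = str(first_id + offset)
        nodes[node_id] = {
            "class_type": "ControlNetApplyAdvanced",
            "inputs": {"positive": positive, "negative": negative,
                       "control_net": cn_link, "image": image_link,
                       "strength": strength,
                       "start_percent": 0.0, "end_percent": end_percent},
        }
        # Output 0 is the modified positive conditioning, output 1 the negative;
        # they become the inputs of the next ControlNet in the chain.
        positive, negative = [node_id, 0], [node_id, 1]
    return nodes, positive, negative


nodes, pos, neg = chain_controlnets(
    ["2", 0], ["3", 0],
    [(["10", 0], ["11", 0], 0.7, 1.0),   # e.g. pose control
     (["12", 0], ["13", 0], 0.5, 0.8)],  # e.g. depth control
)
print(pos, neg)  # → ['101', 0] ['101', 1]
```

The final `pos` and `neg` links are what the KSampler's positive and negative inputs would connect to.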

Step Five: Tile Upscaler for High-Resolution Enlargement

Insufficient output resolution is a beginner's first pain point.

The most standard approach is the SD Upscale + Tile ControlNet combination. First have the KSampler generate a medium-resolution image. After VAE Decode, connect an Upscale Image By node to enlarge it, then a VAE Encode node to return to latent space. Add an Apply ControlNet Advanced node that brings in the Tile model at a strength of around 0.6, run a second KSampler with denoise set between 0.3 and 0.5, and finish with VAE Decode to output the result.

This flow preserves the composition while bringing detail up to near commercial-poster level. If VRAM is tight, you can install a community plugin such as ComfyUI-TiledDiffusion, which splits the image into tiles and denoises them block by block, so even VRAM-constrained machines can produce larger sizes.
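
To make the tiling idea concrete, the sketch below computes overlapping tile boxes for an image. It is a generic illustration of splitting an image into tiles, not the plugin's actual algorithm; the function name `tile_boxes` and the default tile size and overlap are assumptions.

```python
def tile_boxes(width, height, tile=1024, overlap=128):
    """Split a width x height image into overlapping (left, top, right, bottom) boxes."""
    if overlap >= tile:
        raise ValueError("overlap must be smaller than tile")
    step = tile - overlap

    def starts(size):
        # Tile origins along one axis; the last tile is shifted back
        # so that it ends exactly at the image edge.
        if size <= tile:
            return [0]
        positions = list(range(0, size - tile, step))
        positions.append(size - tile)
        return positions

    return [(x, y, min(x + tile, width), min(y + tile, height))
            for y in starts(height) for x in starts(width)]


boxes = tile_boxes(2048, 2048)
print(len(boxes))   # → 9
print(boxes[0])     # → (0, 0, 1024, 1024)
print(boxes[-1])    # → (1024, 1024, 2048, 2048)
```

Each box is processed on its own, so peak memory depends on the tile size rather than the full image; the overlap is what lets the seams be blended away afterwards.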

Step Six: IPAdapter for Style Transfer and Reference Images

The most worthwhile advanced node to learn in the past year is IPAdapter, which can use a single reference image to transfer style or transfer a character.

Basic wiring: download the ip-adapter series models and the corresponding image encoder, and place them in the ComfyUI/models/ipadapter and ComfyUI/models/clip_vision directories respectively. Add an IPAdapter Unified Loader node and connect it to an IPAdapter Advanced node; feed that node the MODEL and the reference image, and route the new MODEL it outputs to the KSampler.
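
Missing or misplaced model files are a common reason these nodes fail to load, so a quick pre-flight check can help. This is a minimal sketch: the two folder names come from the paragraph above, while the helper name `missing_ipadapter_files` and the list of file suffixes are assumptions.

```python
from pathlib import Path

# Folder names from the text above, relative to the ComfyUI install directory.
REQUIRED_DIRS = ("models/ipadapter", "models/clip_vision")
# Assumed common weight-file suffixes; extend as needed.
MODEL_SUFFIXES = {".safetensors", ".bin", ".pth"}


def missing_ipadapter_files(comfy_root: Path) -> list:
    """Return the required folders that are absent or contain no model files."""
    problems = []
    for rel in REQUIRED_DIRS:
        folder = comfy_root / rel
        has_model = folder.is_dir() and any(
            p.is_file() and p.suffix in MODEL_SUFFIXES for p in folder.iterdir())
        if not has_model:
            problems.append(rel)
    return problems
```

Calling `missing_ipadapter_files(Path("ComfyUI"))` returns an empty list when both folders hold at least one model file.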

Real-world scenarios: to add a Ghibli style to an image generated by ChatGPT or DALL-E, IPAdapter is simpler than LoRA and the effect is stable; to change a product's background, combine IPAdapter with an inpaint model and finish the job in a single pass; to keep a protagonist consistent across comic storyboards, the FaceID series can lock facial features.

Step Seven: The Special Nature of the FLUX.1 Workflow

FLUX.1 is the next-generation model released by Black Forest Labs, and by 2026 it's one of the mainstays for high-quality output in the ComfyUI community.

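Workflow differences: FLUX pairs a CLIP text encoder with a T5 text encoder, so the text side of the graph changes. Many FLUX workflows replace CLIP Text Encode with the CLIPTextEncodeFlux node, which takes separate prompts for the CLIP and T5 encoders; Empty Latent Image is commonly replaced by the EmptySD3LatentImage node; and KSampler is often replaced by KSampler Advanced or by a custom-sampler chain that uses BasicScheduler. Simpler FLUX workflows can keep the standard nodes, so check which variant a downloaded workflow uses.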

VRAM challenge: the full FLUX dev version has high VRAM requirements, the FP8 quantized version can compress to a mid-VRAM-runnable range, and the GGUF Q4 quantized version can go lower still, but with some loss of output detail. For exact requirements, go by the latest release notes of the community's quantized versions.
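
A rough back-of-the-envelope estimate shows why quantization matters. The sketch below counts the diffusion model's weights only, assuming the 12 billion parameters Black Forest Labs states for FLUX.1 dev; text encoders, the VAE, and activations come on top, and real GGUF Q4 formats store a little more than 4 bits per weight. The function name `weight_size_gb` is illustrative.

```python
def weight_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


FLUX_PARAMS_B = 12  # stated parameter count for FLUX.1 dev, in billions

for label, bits in [("FP16/BF16", 16), ("FP8", 8), ("4-bit (nominal)", 4)]:
    print(f"{label}: ~{weight_size_gb(FLUX_PARAMS_B, bits):.0f} GB")
# → FP16/BF16: ~24 GB
# → FP8: ~12 GB
# → 4-bit (nominal): ~6 GB
```

These are lower bounds for planning purposes, not measured requirements; actual usage depends on resolution, offloading settings, and the specific quantized release.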

Prompt differences: FLUX's natural-language understanding far exceeds the SD series, so writing complete English sentences works better than writing a tag list.

Step Eight: Reusing and Sharing Workflows

A hidden strength of ComfyUI is that workflows can be saved and shared. PNG files written by the Save Image node embed the complete workflow metadata by default, so dragging the image back onto the ComfyUI canvas restores all the nodes and connections.
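
The embedded metadata can also be read outside ComfyUI. This is a standard-library sketch that assumes the graph is stored as plain-text PNG chunks (tEXt) under the key "workflow", which is my understanding of how recent ComfyUI versions save it; the helper names are illustrative, and compressed or international text chunks are not handled.

```python
import json
import struct
from typing import Optional

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_text(data: bytes) -> dict:
    """Collect the key/value pairs stored in a PNG's uncompressed tEXt chunks."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    texts = {}
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if chunk_type == b"tEXt":
            key, _, value = body.partition(b"\x00")
            texts[key.decode("latin-1")] = value.decode("latin-1")
        elif chunk_type == b"IEND":
            break
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC
    return texts


def embedded_workflow(path: str) -> Optional[dict]:
    """Return the embedded workflow graph, or None if the image carries none."""
    with open(path, "rb") as handle:
        raw = read_png_text(handle.read()).get("workflow")
    return json.loads(raw) if raw else None
```

Calling `embedded_workflow("ComfyUI_00001_.png")` on a saved image (the filename here is just an example) would return the graph as a dictionary. Note that images re-encoded by chat apps or social platforms usually lose this metadata.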

The OpenArt platform: openart.ai/workflows has a large number of public workflows to browse and download; search keywords like "animal portrait," "anime style," "product photography," download the workflow.json, and Load it directly.

Pitfall avoidance: if a downloaded workflow is missing custom nodes it will show red-box errors; after installing ComfyUI Manager, click Install Missing Custom Nodes to resolve it in one go; if the model path is wrong, reselect it in the Load Checkpoint node; if the VAE version is wrong, attach sdxl-vae-fp16-fix externally.
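
Before loading a downloaded workflow, you can also list which node types it needs. The sketch assumes the UI-format workflow.json has a top-level "nodes" list whose entries carry a "type" field; the helper name `missing_node_types` and the sample data are hypothetical, and the set of installed types has to be supplied by you.

```python
def missing_node_types(workflow: dict, installed: set) -> list:
    """List node types used by a workflow that are not in `installed`."""
    used = {node["type"] for node in workflow.get("nodes", [])}
    return sorted(used - installed)


# Hypothetical example: a downloaded workflow and the node types known locally.
downloaded = {"nodes": [{"id": 1, "type": "CheckpointLoaderSimple"},
                        {"id": 2, "type": "KSampler"},
                        {"id": 3, "type": "IPAdapterAdvanced"}]}
installed_types = {"CheckpointLoaderSimple", "KSampler", "VAEDecode"}
print(missing_node_types(downloaded, installed_types))  # → ['IPAdapterAdvanced']
```

In practice ComfyUI Manager does this lookup for you; the sketch only shows what the red boxes correspond to in the file.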

A Supplementary Option for Mobile Users Who "Don't Want to Tinker"

The entire ComfyUI system suits the technically inclined who are willing to understand nodes, tinker with GPUs, and study LoRA and ControlNet. But everyday use also has the opposite scenario: on the subway or during a lunch break you suddenly want to make an image and have no desktop machine at hand. For that scenario, "Lingtu - AI Drawing & Design," available in the China region of the iOS App Store, is a complementary option. It aggregates a Midjourney-style atmosphere engine, a Flux-style photorealistic engine, and a Nano Banana-style fast engine within a Chinese interface, with good prompt localization, and it can be downloaded directly in the China region without a proxy. Keep ComfyUI for "deeply controllable" scenarios and Lingtu for "zero-config image generation" scenarios; the two complement each other rather than conflict. Search "Lingtu" in the App Store, link: https://apps.apple.com/cn/app/%E7%81%B5%E5%9B%BE-ai%E7%94%BB%E5%9B%BE%E8%AE%BE%E8%AE%A1/id6763914201.

Frequently Asked Questions (FAQ)

Should I choose ComfyUI or Stable Diffusion WebUI?

For a beginner who just wants to get text-to-image running, WebUI is friendlier and you can get going in a few minutes. But as soon as you want ControlNet multi-model combinations, Tile upscaling, or IPAdapter style transfer, ComfyUI is usually the first choice: WebUI either lacks these features or needs a pile of plugins that can still be unstable. If you expect to need those features, we recommend learning ComfyUI directly to save time.

What should I do if ComfyUI won't run after installation?

There are three common problems. First, the GPU driver version is too low; for NVIDIA drivers, go by the vendor's currently recommended version. Second, the PyTorch and CUDA versions don't match; confirm the pairing when installing. Third, the model file is corrupted; re-download the safetensors file and verify the hash. All-in-one package users mostly won't run into these.
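
For the second problem, a short script run inside ComfyUI's Python environment shows what PyTorch itself reports. This is a minimal sketch; the function name `torch_report` and the dictionary keys are illustrative, and the script only reads values PyTorch exposes.

```python
import importlib.util


def torch_report() -> dict:
    """Summarize the PyTorch install: version, CUDA build, and usable backends."""
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False}
    import torch
    mps = getattr(torch.backends, "mps", None)  # absent on older PyTorch builds
    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "cuda_build": torch.version.cuda,  # None on CPU-only builds
        "cuda_available": torch.cuda.is_available(),
        "mps_available": bool(mps and mps.is_available()),
    }


print(torch_report())
```

If `cuda_build` is set but `cuda_available` is False on an NVIDIA machine, the driver is the usual suspect; if `cuda_build` is None, a CPU-only PyTorch build was installed.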

Can FLUX.1 really run on a low-VRAM machine?

Yes, using the GGUF quantized version with the ComfyUI-GGUF custom node. Output speed is noticeably slower than the unquantized version and quality is slightly below FP8, but still better than SDXL. With extreme optimization, even very VRAM-constrained machines can run it, but you'll have to tolerate fairly long waits.

What should I do if a workflow shows a red-box node it can't find?

Open ComfyUI Manager and click Install Missing Custom Nodes; in most cases it installs in one click. If it's a newly released node that Manager hasn't cataloged yet, git clone the node author's GitHub repo into the ComfyUI/custom_nodes directory and restart ComfyUI.

Can ComfyUI workflows run on a Mac?

Yes. M-series chips are all supported, running through PyTorch's MPS backend. Speed is usually much lower than a similarly priced NVIDIA desktop GPU, depending on the chip model and unified memory size. FLUX.1 can run on a Mac but we recommend the quantized version; machines with smaller memory need to trade off between longer latency and smaller sizes.

This article is from DouWen (www.douwen.me); please retain the attribution when reposting.

Original link: https://www.douwen.me/archives/1080/
