mirror of https://github.com/vladmandic/automatic synced 2026-04-09 10:11:53 +02:00


vladmandic 90b5e7de30 update todo/changelog

Signed-off-by: vladmandic <mandic00@live.com>

2026-04-04 08:56:37 +02:00

11 KiB


TODO

https://github.com/huggingface/diffusers/pull/13317

Internal

Feature: implement unload_auxiliary_models
Feature: RIFE update
Feature: RIFE in processing
Feature: SeedVR2 in processing
Feature: Add video models to Reference
Deploy: Lite vs Expert mode
Engine: mmgp
Engine: TensorRT acceleration
Feature: Auto handle scheduler prediction_type
Feature: Cache models in memory
Feature: JSON image metadata
Validate: Control tab add overrides handling
Feature: Integrate natural language image search ImageDB
Feature: Multi-user support
Feature: Settings profile manager
Feature: Video tab add full API support
Refactor: Unify huggingface and diffusers model folders
Refactor: GGUF
Reimplement llama remover for Kanvas
Integrate: Depth3D

OnHold

Feature: LoRA add OMI format support for SD35/FLUX.1, on-hold
Feature: Remote Text-Encoder support, sidelined for the moment

Modular

Pending finalization of modular pipelines implementation and development of compatibility layer

Switch to modular pipelines
Feature: Transformers unified cache handler
Refactor: Modular pipelines and guiders
MagCache
SmoothCache
STG

New models / Pipelines

TODO: Investigate which models are diffusers-compatible and prioritize!

Image-Base

Chroma Zeta: Image and video generator for creative effects and professional filters
Chroma Radiance: Pixel-space model eliminating VAE artifacts for high visual fidelity
Bria FIBO: Fully JSON based
Liquid: Unified vision-language auto-regressive generation paradigm
Lumina-DiMOO: Foundational multi-modal, multi-task generation and understanding via discrete diffusion
nVidia Cosmos-Predict-2.5: Physics-aware world foundation model for consistent scene prediction

Image-Edit

Bria FIBO-Edit: Fully JSON-based instruction-following image editing framework
Meituan LongCat-Image-Edit-Turbo: 6B instruction-following image editing with high visual consistency
VIBE Image-Edit: (Sana+Qwen-VL) Fast visual instruction-based image editing framework
LucyEdit: Instruction-guided video editing while preserving motion and identity
Step1X-Edit: Multimodal image editing decoding MLLM tokens via DiT
OneReward: Reinforcement learning grounded generative reward model for image editing
ByteDance DreamO: Image customization framework for IP adaptation and virtual try-on
nVidia Cosmos-Transfer-2.5

Video

LTX-Condition
LTX-Distilled
OpenMOSS MOVA: Unified foundation model for synchronized high-fidelity video and audio
Wan family (Wan2.1 / Wan2.2 variants): MoE-based foundational tools for cinematic T2V/I2V/TI2V (example: Wan2.1-T2V-14B-CausVid; distill / step-distill examples: Wan2.1-StepDistill-CfgDistill)
Krea Realtime Video: (Wan2.1) Distilled real-time video diffusion using self-forcing techniques
MAGI-1 (autoregressive video): Autoregressive video generation with infinite extension and timeline control
MUG-V 10B (video generation): Large-scale DiT-based video generation system trained via flow-matching
Ovi (audio/video generation): (Wan2.2) Speech-to-video with synchronized sound effects and music
HunyuanVideo-Avatar / HunyuanCustom: (HunyuanVideo) MM-DiT based dynamic emotion-controllable dialogue generation
Sana Image→Video (Sana-I2V): (Sana) Compact Linear DiT framework for efficient high-resolution video
Wan-2.2 S2V (diffusers PR): (Wan2.2) Audio-driven cinematic speech-to-video generation
LongCat-Video: Unified framework for minutes-long coherent video generation via Block Sparse Attention
LTXVideo / LTXVideo LongMulti (diffusers PR): Real-time DiT-based generation with production-ready camera controls
DiffSynth-Studio (ModelScope): (Wan2.2) Comprehensive training and quantization tools for Wan video models
Phantom (Phantom HuMo): Human-centric video generation framework focused on subject ID consistency
CausVid-Plus / WAN-CausVid-Plus: (Wan2.1) Causal diffusion for high-quality temporally consistent long videos
Wan2GP (workflow/GUI for Wan): (Wan) Web-based UI focused on running complex video models for GPU-poor setups
LivePortrait: Efficient portrait animation system with high stitching and retargeting control
Magi (SandAI): High-quality autoregressive video generation framework
Ming (inclusionAI): Unified multimodal model for processing text, audio, image, and video

Other/Unsorted

DiffusionForcing: Full-sequence diffusion with autoregressive next-token prediction
Self-Forcing: Framework for improving temporal consistency in long-horizon video generation
SEVA: Stable Virtual Camera for novel view synthesis and 3D-consistent video
ByteDance USO: Unified Style-Subject Optimized framework for personalized image generation
ByteDance Lynx: State-of-the-art high-fidelity personalized video generation based on DiT
LanDiff: Coarse-to-fine text-to-video integrating Language and Diffusion Models
Video Inpaint Pipeline: Unified inpainting pipeline implementation within Diffusers library
Sonic Inpaint: Audio-driven portrait animation system focused on global audio perception
Make-It-Count: CountGen method for precise numerical control of objects via object identity features
ControlNeXt: Lightweight architecture for efficient controllable image and video generation
MS-Diffusion: Layout-guided multi-subject image personalization framework
UniRef: Unified model for segmentation tasks designed as foundation model plug-in
FlashFace: High-fidelity human image customization and face swapping framework
ReNO: Reward-based Noise Optimization to improve text-to-image quality during inference

Not Planned

LoRAdapter: Not recently updated
SD3 UltraEdit: Based on SD3
PowerPaint: Based on SD15
FreeCustom: Based on SD15
AnyDoor: Based on SD21
AnyText2: Based on SD15
DragonDiffusion: Based on SD15
DenseDiffusion: Based on SD15
IC-Light: Based on SD15

Code TODO

npm run todo

installer.py:TODO rocm: switch to pytorch source when it becomes available
modules/control/run.py:TODO modernui: monkey-patch for missing tabs.select event
modules/history.py:TODO: apply metadata, preview, load/save
modules/image/resize.py:TODO resize image: enable full VAE mode for resize-latent
modules/lora/lora_apply.py:TODO lora: add other quantization types
modules/lora/lora_apply.py:TODO lora: maybe force immediate quantization
modules/lora/lora_extract.py:TODO: lora: support pre-quantized flux
modules/lora/lora_load.py:TODO lora: add t5 key support for sd35/f1
modules/masking.py:TODO: additional masking algorithms
modules/modular_guiders.py:TODO: guiders
modules/processing_class.py:TODO processing: remove duplicate mask params
modules/sd_hijack_hypertile.py:TODO hypertile: vae breaks when using non-standard sizes
modules/sd_models.py:TODO model load: implement model in-memory caching
modules/sd_samplers_diffusers.py:TODO enso-required
modules/sd_unet.py:TODO model load: force-reloading entire model as loading transformers only leads to massive memory usage
modules/transformer_cache.py:TODO fc: autodetect distilled based on model
modules/transformer_cache.py:TODO fc: autodetect tensor format based on model
modules/ui_models_load.py:TODO loader: load recipe
modules/ui_models_load.py:TODO loader: save recipe
modules/video_models/video_save.py:TODO audio set time-base
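
The list above has the shape `path:TODO text`, one entry per source comment. What `npm run todo` actually executes is not shown here, so the following is only a minimal sketch of how such a list could be collected, assuming the markers are `#` comments in `.py` files. The function name `collect_todos` and its parameters are hypothetical and not part of the repository.

```python
import re
from pathlib import Path

# Assumption: a TODO marker is a '#' comment whose text starts with the word TODO.
TODO_RE = re.compile(r"#\s*(TODO\b.*)")


def collect_todos(root, suffixes=(".py",)):
    """Return 'relative/path:TODO text' entries for matching comments under root."""
    root = Path(root)
    entries = []
    # Sort paths so the output order is stable between runs.
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for line in text.splitlines():
            match = TODO_RE.search(line)
            if match:
                rel = path.relative_to(root).as_posix()
                entries.append(f"{rel}:{match.group(1).strip()}")
    return entries
```

Calling `collect_todos(".")` from the repository root and printing each entry would produce lines in the same format as the list above, under the stated assumptions.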
