mirror of
https://github.com/vladmandic/automatic
synced 2026-04-09 10:11:53 +02:00
TODO
https://github.com/huggingface/diffusers/pull/13317
Internal
- Feature: implement unload_auxiliary_models
- Feature: RIFE update
- Feature: RIFE in processing
- Feature: SeedVR2 in processing
- Feature: Add video models to Reference
- Deploy: Lite vs Expert mode
- Engine: mmgp
- Engine: TensorRT acceleration
- Feature: Auto handle scheduler prediction_type
- Feature: Cache models in memory
- Feature: JSON image metadata
- Validate: Control tab add overrides handling
- Feature: Integrate natural language image search ImageDB
- Feature: Multi-user support
- Feature: Settings profile manager
- Feature: Video tab add full API support
- Refactor: Unify huggingface and diffusers model folders
- Refactor: GGUF
- Reimplement llama remover for Kanvas
- Integrate: Depth3D
OnHold
- Feature: LoRA add OMI format support for SD35/FLUX.1, on-hold
- Feature: Remote Text-Encoder support, sidelined for the moment
Modular
Pending finalization of modular pipelines implementation and development of compatibility layer
- Switch to modular pipelines
- Feature: Transformers unified cache handler
- Refactor: Modular pipelines and guiders
- MagCache
- SmoothCache
- STG
New models / Pipelines
TODO: Investigate which models are diffusers-compatible and prioritize!
Image-Base
- Chroma Zeta: Image and video generator for creative effects and professional filters
- Chroma Radiance: Pixel-space model eliminating VAE artifacts for high visual fidelity
- Bria FIBO: Fully JSON-based
- Liquid: Unified vision-language auto-regressive generation paradigm
- Lumina-DiMOO: Foundational multi-modal generation and understanding via discrete diffusion
- nVidia Cosmos-Predict-2.5: Physics-aware world foundation model for consistent scene prediction
- Liquid (unified multimodal generator): Auto-regressive generation paradigm across vision and language
- Lumina-DiMOO: foundational multi-modal multi-task generation and understanding
Image-Edit
- Bria FIBO-Edit: Fully JSON-based instruction-following image editing framework
- Meituan LongCat-Image-Edit-Turbo: 6B instruction-following image editing with high visual consistency
- VIBE Image-Edit: (Sana+Qwen-VL) Fast visual instruction-based image editing framework
- LucyEdit: Instruction-guided video editing while preserving motion and identity
- Step1X-Edit: Multimodal image editing decoding MLLM tokens via DiT
- OneReward: Reinforcement learning grounded generative reward model for image editing
- ByteDance DreamO: image customization framework for IP adaptation and virtual try-on
- nVidia Cosmos-Transfer-2.5
Video
- LTX-Condition
- LTX-Distilled
- OpenMOSS MOVA: Unified foundation model for synchronized high-fidelity video and audio
- Wan family (Wan2.1 / Wan2.2 variants): MoE-based foundational tools for cinematic T2V/I2V/TI2V
  - example: Wan2.1-T2V-14B-CausVid
  - distill / step-distill examples: Wan2.1-StepDistill-CfgDistill
- Krea Realtime Video: (Wan2.1) Distilled real-time video diffusion using self-forcing techniques
- MAGI-1 (autoregressive video): Autoregressive video generation allowing infinite extension and timeline control
- MUG-V 10B (video generation): large-scale DiT-based video generation system trained via flow-matching
- Ovi (audio/video generation): (Wan2.2) Speech-to-video with synchronized sound effects and music
- HunyuanVideo-Avatar / HunyuanCustom: (HunyuanVideo) MM-DiT based dynamic emotion-controllable dialogue generation
- Sana Image→Video (Sana-I2V): (Sana) Compact Linear DiT framework for efficient high-resolution video
- Wan-2.2 S2V (diffusers PR): (Wan2.2) Audio-driven cinematic speech-to-video generation
- LongCat-Video: Unified framework for minutes-long coherent video generation via Block Sparse Attention
- LTXVideo / LTXVideo LongMulti (diffusers PR): Real-time DiT-based generation with production-ready camera controls
- DiffSynth-Studio (ModelScope): (Wan2.2) Comprehensive training and quantization tools for Wan video models
- Phantom (Phantom HuMo): Human-centric video generation framework focused on subject ID consistency
- CausVid-Plus / WAN-CausVid-Plus: (Wan2.1) Causal diffusion for high-quality temporally consistent long videos
- Wan2GP (workflow/GUI for Wan): (Wan) Web-based UI focused on running complex video models for GPU-poor setups
- LivePortrait: Efficient portrait animation system with high stitching and retargeting control
- Magi (SandAI): High-quality autoregressive video generation framework
- Ming (inclusionAI): Unified multimodal model for processing text, audio, image, and video
Other/Unsorted
- DiffusionForcing: Full-sequence diffusion with autoregressive next-token prediction
- Self-Forcing: Framework for improving temporal consistency in long-horizon video generation
- SEVA: Stable Virtual Camera for novel view synthesis and 3D-consistent video
- ByteDance USO: Unified Style-Subject Optimized framework for personalized image generation
- ByteDance Lynx: State-of-the-art high-fidelity personalized video generation based on DiT
- LanDiff: Coarse-to-fine text-to-video integrating Language and Diffusion Models
- Video Inpaint Pipeline: Unified inpainting pipeline implementation within Diffusers library
- Sonic Inpaint: Audio-driven portrait animation system focused on global audio perception
- Make-It-Count: CountGen method for precise numerical control of objects via object identity features
- ControlNeXt: Lightweight architecture for efficient controllable image and video generation
- MS-Diffusion: Layout-guided multi-subject image personalization framework
- UniRef: Unified model for segmentation tasks designed as foundation model plug-in
- FlashFace: High-fidelity human image customization and face swapping framework
- ReNO: Reward-based Noise Optimization to improve text-to-image quality during inference
Not Planned
- LoRAdapter: Not recently updated
- SD3 UltraEdit: Based on SD3
- PowerPaint: Based on SD15
- FreeCustom: Based on SD15
- AnyDoor: Based on SD21
- AnyText2: Based on SD15
- DragonDiffusion: Based on SD15
- DenseDiffusion: Based on SD15
- IC-Light: Based on SD15
Code TODO
npm run todo
installer.py:TODO rocm: switch to pytorch source when it becomes available
modules/control/run.py:TODO modernui: monkey-patch for missing tabs.select event
modules/history.py:TODO: apply metadata, preview, load/save
modules/image/resize.py:TODO resize image: enable full VAE mode for resize-latent
modules/lora/lora_apply.py:TODO lora: add other quantization types
modules/lora/lora_apply.py:TODO lora: maybe force imediate quantization
modules/lora/lora_extract.py:TODO: lora: support pre-quantized flux
modules/lora/lora_load.py:TODO lora: add t5 key support for sd35/f1
modules/masking.py:TODO: additional masking algorithms
modules/modular_guiders.py:TODO: guiders
modules/processing_class.py:TODO processing: remove duplicate mask params
modules/sd_hijack_hypertile.py:TODO hypertile: vae breaks when using non-standard sizes
modules/sd_models.py:TODO model load: implement model in-memory caching
modules/sd_samplers_diffusers.py:TODO enso-required
modules/sd_unet.py:TODO model load: force-reloading entire model as loading transformers only leads to massive memory usage
modules/transformer_cache.py:TODO fc: autodetect distilled based on model
modules/transformer_cache.py:TODO fc: autodetect tensor format based on model
modules/ui_models_load.py:TODO loader: load receipe
modules/ui_models_load.py:TODO loader: save receipe
modules/video_models/video_save.py:TODO audio set time-base
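The list above has the shape `path:TODO text`, one entry per source comment. The actual `npm run todo` script is not shown here, so the following is only a minimal sketch of an equivalent scan, assuming the notes live in `#` comments inside `.py` files. The names `find_todos` and `TODO_RE` are hypothetical and not part of the repository.

```python
import re
from pathlib import Path

# Assumption: TODO notes are written as '#' comments; capture from "TODO" to end of line
TODO_RE = re.compile(r"#\s*(TODO\b.*)$")


def find_todos(root, suffixes=(".py",)):
    """Return 'relative/path:TODO text' entries for matching files under root."""
    root = Path(root)
    entries = []
    # Sort paths so the output order is stable between runs
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for line in text.splitlines():
            match = TODO_RE.search(line)
            if match:
                rel = path.relative_to(root).as_posix()
                entries.append(f"{rel}:{match.group(1).strip()}")
    return entries
```

Calling `find_todos(".")` from the repository root and printing each entry would give a listing in the same format as the one above, though the real script may filter or order files differently.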