Awesome Instruction Editing
A Survey of Instruction-Guided Image Editing
I. Introduction
Instruction-guided image editing is a rapidly evolving field that lets users modify specific regions of an image using diverse instructions, ranging from text to multi-modal prompts. This survey provides a unified perspective and a comprehensive taxonomy of instruction-guided image editing, covering core topics across applications, learning paradigms, and model architectures. This website offers a sorted compilation of academic articles, datasets, methodologies, and techniques for instruction editing.
II. List of Approaches (Sortable)
Total number of rows: 161
Title Venue Year Code Features Category
Dragging with Geometry: From Pixels to Geometry-Guided Image Editing ICML 2026 [Code] interactive point-based image editing, geometry-guided dragging, 3D cues Image Editing
Follow-Your-Shape: Shape-Aware Image Editing via Trajectory-Guided Region Control ICLR 2026 [Code] shape-aware image editing Image Editing
UTDesign: A Unified Framework for Stylized Text Editing and Generation in Graphic Design Images SIGGRAPH Asia 2025 [Code] stylized text editing and generation Image Editing
EditInfinity: Image Editing with Binary-Quantized Generative Models NeurIPS 2025 [Code] text-driven image editing, binary-quantized generative models Image Editing
Prompt-Softbox-Prompt: A Free-Text Embedding Control for Image Editing MM 2025 [Code] free-text embedding control Image Editing
Exploring Optimal Latent Trajectory for Zero-shot Image Editing arXiv 2025 - zero-shot image editing Image Editing
FramePainter: Endowing Interactive Image Editing with Video Diffusion Priors arXiv 2025 - interactive image editing Image Editing
Energy-Guided Optimization for Personalized Image Editing with Pretrained Text-to-Image Diffusion Models arXiv 2025 - personalized image editing Image Editing
Early Timestep Zero-Shot Candidate Selection for Instruction-Guided Image Editing arXiv 2025 - instruction-guided image editing Image Editing
PartEdit: Fine-Grained Image Editing using Pre-Trained Diffusion Models arXiv 2025 - Fine-Grained Image Editing Image Editing
S2Edit: Text-Guided Image Editing with Precise Semantic and Spatial Control arXiv 2025 - text-guided image editing Image Editing
REED-VAE: RE-Encode Decode Training for Iterative Image Editing with Diffusion Models arXiv 2025 - iterative image editing Image Editing
Towards Efficient Exemplar Based Image Editing with Multimodal VLMs arXiv 2025 - exemplar-based image editing Image Editing
ReFlex: Text-Guided Editing of Real Images in Rectified Flow via Mid-Step Feature Extraction and Attention Adaptation arXiv 2025 - text-guided image editing Image Editing
LUSD: Localized Update Score Distillation for Text-Guided Image Editing arXiv 2025 - text-guided image editing Image Editing
Guiding Instruction-based Image Editing via Multimodal Large Language Models ICLR 2024 [Code] LLM-guided, Diffusion, Concise instruction loss, Supervised fine-tuning Image Editing
Hive: Harnessing human feedback for instructional visual editing CVPR 2024 [Code] RLHF, Diffusion, Data augmentation Image Editing
InstructBrush: Learning Attention-based Instruction Optimization for Image Editing arXiv 2024 [Code] Diffusion, Attention-based Image Editing
FlexEdit: Flexible and Controllable Diffusion-based Object-centric Image Editing arXiv 2024 [Code] Controllable diffusion Image Editing
Pix2Pix-OnTheFly: Leveraging LLMs for Instruction-Guided Image Editing arXiv 2024 [Code] on-the-fly, tuning-free, training-free Image Editing
EffiVED: Efficient Video Editing via Text-instruction Diffusion Models arXiv 2024 [Code] Video editing, decoupled classifier-free Image Editing
Grounded-Instruct-Pix2Pix: Improving Instruction Based Image Editing with Automatic Target Grounding ICASSP 2024 [Code] Diffusion, mask generation image editing Image Editing
TexFit: Text-Driven Fashion Image Editing with Diffusion Models AAAI 2024 [Code] Fashion editing, region location, diffusion Image Editing
InstructGIE: Towards Generalizable Image Editing arXiv 2024 [Code] Diffusion, context matching Image Editing
An Item is Worth a Prompt: Versatile Image Editing with Disentangled Control arXiv 2024 [Code] Freestyle, Diffusion, Group attention Image Editing
Text-Driven Image Editing via Learnable Regions CVPR 2024 [Code] Region generation, diffusion, mask-free Image Editing
ChartReformer: Natural Language-Driven Chart Image Editing ICDAR 2024 [Code] chart editing Image Editing
GANTASTIC: GAN-based Transfer of Interpretable Directions for Disentangled Image Editing in Text-to-Image Diffusion Models arXiv 2024 [Code] Hybrid, direction transfer Image Editing
StyleBooth: Image Style Editing with Multimodal Instruction arXiv 2024 [Code] style editing, diffusion Image Editing
ZONE: Zero-Shot Instruction-Guided Local Editing CVPR 2024 [Code] Local editing, localisation Image Editing
Inversion-Free Image Editing with Natural Language CVPR 2024 [Code] Consistent models, unified attention Image Editing
Focus on Your Instruction: Fine-grained and Multi-instruction Image Editing by Attention Modulation CVPR 2024 [Code] Diffusion, multi-instruction Image Editing
MoEController: Instruction-based Arbitrary Image Manipulation with Mixture-of-Expert Controllers arXiv 2024 [Code] MoE, LLM-powered Image Editing
InstructCV: Instruction-Tuned Text-to-Image Diffusion Models as Vision Generalists ICLR 2024 [Code] Diffusion, LLM-based, classifier-free Image Editing
Iterative Multi-Granular Image Editing Using Diffusion Models WACV 2024 - Diffusion, Iterative editing Image Editing
Dynamic Prompt Learning: Addressing Cross-Attention Leakage for Text-Based Image Editing NeurIPS 2024 [Code] Diffusion, dynamic prompt Image Editing
Object-Aware Inversion and Reassembly for Image Editing ICLR 2024 [Code] Diffusion, multi-object Image Editing
Zero-Shot Video Editing Using Off-The-Shelf Image Diffusion Models arXiv 2024 [Code] video editing, zero-shot Image Editing
Video-P2P: Video Editing with Cross-attention Control CVPR 2024 [Code] Decoupled-guidance attention control, video editing Image Editing
NeRF-Insert: 3D Local Editing with Multimodal Control Signals arXiv 2024 - 3D Editing Image Editing
BlenderAlchemy: Editing 3D Graphics with Vision-Language Models arXiv 2024 [Code] 3D Editing Image Editing
AudioScenic: Audio-Driven Video Scene Editing arXiv 2024 - audio-based instruction Image Editing
LocInv: Localization-aware Inversion for Text-Guided Image Editing CVPR-AI4CC 2024 [Code] Localization-aware inversion Image Editing
SonicDiffusion: Audio-Driven Image Generation and Editing with Pretrained Diffusion Models arXiv 2024 [Code] Audio-driven Image Editing
Exploring Text-Guided Single Image Editing for Remote Sensing Images arXiv 2024 [Code] Remote sensing images Image Editing
GaussianVTON: 3D Human Virtual Try-ON via Multi-Stage Gaussian Splatting Editing with Image Prompting arXiv 2024 [Code] Fashion editing Image Editing
TIE: Revolutionizing Text-based Image Editing for Complex-Prompt Following and High-Fidelity Editing arXiv 2024 - Chain of thought Image Editing
Unified Editing of Panorama, 3D Scenes, and Videos Through Disentangled Self-Attention Injection arXiv 2024 [Code] Diffusion, Self-attention Injection Image Editing
Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning arXiv 2024 [Code] Music editing, diffusion Image Editing
Text Guided Image Editing with Automatic Concept Locating and Forgetting arXiv 2024 - Diffusion, concept forgetting Image Editing
FreeEdit: Mask-free Reference-based Image Editing with Multi-modal Instruction arXiv 2024 [Code] Diffusion, instruction-driven editing Image Editing
Revealing Directions for Text-guided 3D Face Editing arXiv 2024 - Text-guided 3D face editing Image Editing
Vision-guided and Mask-enhanced Adaptive Denoising for Prompt-based Image Editing arXiv 2024 - Text-to-image, editing, diffusion Image Editing
Hyper-parameter tuning for text guided image editing arXiv 2024 [Code] Text Editing Image Editing
Add-it: Training-Free Object Insertion in Images With Pretrained Diffusion Models arXiv 2024 - Text-guided Object Insertion Image Editing
GenMix: Effective Data Augmentation with Generative Diffusion Model Image Editing arXiv 2024 - Diffusion image augmentation Image Editing
SwiftEdit: Lightning Fast Text-Guided Image Editing via One-Step Diffusion arXiv 2024 - Text-Guided Image Editing Image Editing
FireFlow: Fast Inversion of Rectified Flow for Image Semantic Editing arXiv 2024 - semantic image editing Image Editing
FluxSpace: Disentangled Semantic Editing in Rectified Flow Transformers arXiv 2024 - disentangled semantic editing Image Editing
UIP2P: Unsupervised Instruction-based Image Editing via Cycle Edit Consistency arXiv 2024 - Instruction-based image editing Image Editing
CA-Edit: Causality-Aware Condition Adapter for High-Fidelity Local Facial Attribute Editing arXiv 2024 - facial attribute editing Image Editing
Unsupervised Region-Based Image Editing of Denoising Diffusion Models arXiv 2024 - region-based image editing Image Editing
Edicho: Consistent Image Editing in the Wild arXiv 2024 - consistent image editing Image Editing
UIP2P: Unsupervised Instruction-based Image Editing via Edit Reversibility Constraint arXiv 2024 - instruction-based image editing Image Editing
InstructPix2Pix: Learning To Follow Image Editing Instructions CVPR 2023 [Code] Core paper, Diffusion Image Editing
Visual Instruction Inversion: Image Editing via Image Prompting NeurIPS 2023 [Code] Diffusion, visual instruction Image Editing
Instruct-NeRF2NeRF: Editing 3D Scenes with Instructions ICCV 2023 [Code] 3D scene editing Image Editing
Instruct 3D-to-3D: Text Instruction Guided 3D-to-3D conversion arXiv 2023 [Code] 3D editing, Dynamic scaling Image Editing
InstructME: An Instruction Guided Music Edit And Remix Framework with Latent Diffusion Models arXiv 2023 [Code] Music editing, diffusion Image Editing
EditShield: Protecting Unauthorized Image Editing by Instruction-guided Diffusion Models arXiv 2023 [Code] authorized editing, diffusion Image Editing
Fairy: Fast Parallelized Instruction-Guided Video-to-Video Synthesis arXiv 2023 [Code] Video editing, cross-time attention Image Editing
AUDIT: Audio Editing by Following Instructions with Latent Diffusion Models NeurIPS 2023 [Code] Audio, Diffusion Image Editing
InstructAny2Pix: Flexible Visual Editing via Multimodal Instruction Following arXiv 2023 [Code] Refinement prior, instructional tuning Image Editing
Learning to Follow Object-Centric Image Editing Instructions Faithfully EMNLP 2023 [Code] Diffusion, additional supervision Image Editing
StableVideo: Text-driven Consistency-aware Diffusion Video Editing ICCV 2023 [Code] Diffusion, Video Image Editing
Vox-E: Text-Guided Voxel Editing of 3D Objects ICCV 2023 [Code] Diffusion, 3D Image Editing
Unitune: Text-driven image editing by fine tuning a diffusion model on a single image TOG 2023 [Code] Diffusion, fine-tuning Image Editing
Dreamix: Video Diffusion Models are General Video Editors arXiv 2023 [Code] Cascaded diffusion, video Image Editing
Dialogpaint: A dialog-based image editing model arXiv 2023 - Dialog-based Image Editing
iEdit: Localised Text-guided Image Editing with Weak Supervision arXiv 2023 - Localized diffusion Image Editing
ImageBrush: Learning Visual In-Context Instructions for Exemplar-Based Image Manipulation NeurIPS 2023 - Example-based instruction Image Editing
NULL-Text Inversion for Editing Real Images Using Guided Diffusion Models CVPR 2023 [Code] null-text embedding, Diffusion, CLIP Image Editing
Imagic: Text-based real image editing with diffusion models CVPR 2023 [Code] Diffusion, embedding interpolation Image Editing
PhotoVerse: Tuning-Free Image Customization with Text-to-Image Diffusion Models arXiv 2023 [Code] Diffusion, dual-branch concept Image Editing
InstructEdit: Improving Automatic Masks for Diffusion-based Image Editing With User Instructions arXiv 2023 [Code] Diffusion, LLM-powered Image Editing
Instructdiffusion: A generalist modeling interface for vision tasks arXiv 2023 [Code] Multi-task, multi-turn, Diffusion, LLM Image Editing
Emu Edit: Precise Image Editing via Recognition and Generation Tasks arXiv 2023 [Code] Diffusion, multi-task, multi-turn Image Editing
SmartEdit: Exploring Complex Instruction-based Image Editing with Multimodal Large Language Models arXiv 2023 [Code] MLLM, Diffusion Image Editing
ChatFace: Chat-Guided Real Face Editing via Diffusion Latent Space Manipulation arXiv 2023 [Code] LLM, Diffusion Image Editing
Prompt-to-Prompt Image Editing with Cross Attention Control ICLR 2023 [Code] Diffusion, Cross Attention Image Editing
Target-Free Text-Guided Image Manipulation AAAI 2023 [Code] 3D Editing Image Editing
Paint by example: Exemplar-based image editing with diffusion models CVPR 2023 [Code] Diffusion, example-based Image Editing
De-net: Dynamic text-guided image editing adversarial networks AAAI 2023 [Code] GAN, multi-task Image Editing
Imagen editor and editbench: Advancing and evaluating text-guided image inpainting CVPR 2023 [Code] Diffusion, benchmark, CLIP Image Editing
Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation CVPR 2023 [Code] Diffusion, feature injection Image Editing
MasaCtrl: Tuning-Free Mutual Self-Attention Control for Consistent Image Synthesis and Editing ICCV 2023 [Code] Diffusion, mutual self-attention Image Editing
LIME: Localized Image Editing via Attention Regularization in Diffusion Models arXiv 2023 - Localized image editing Image Editing
LDEdit: Towards Generalized Text Guided Image Manipulation via Latent Diffusion Models BMVC 2022 - latent diffusion Image Editing
StyleMC: Multi-Channel Based Fast Text-Guided Image Generation and Manipulation WACV 2022 [Code] GAN, CLIP Image Editing
Blended Diffusion for Text-Driven Editing of Natural Images CVPR 2022 [Code] Diffusion, CLIP, Blend Image Editing
VQGAN-CLIP: Open Domain Image Generation and Editing with Natural Language Guidance ECCV 2022 [Code] GAN, CLIP Image Editing
StyleGAN-NADA: CLIP-guided domain adaptation of image generators TOG 2022 [Code] GAN, CLIP Image Editing
DiffusionCLIP: Text-Guided Diffusion Models for Robust Image Manipulation CVPR 2022 [Code] Diffusion, CLIP, Noise combination Image Editing
GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models ICML 2022 [Code] Diffusion, CLIP, Classifier-free guidance Image Editing
DiffEdit: Diffusion-based semantic image editing with mask guidance ICLR 2022 [Code] Diffusion, DDIM, Mask generation Image Editing
Text2mesh: Text-driven neural stylization for meshes CVPR 2022 [Code] 3D Editing Image Editing
Manitrans: Entity-level text-guided image manipulation via token-wise semantic alignment and generation CVPR 2022 [Code] GAN, multi-entities Image Editing
Text2live: Text-driven layered image and video editing ECCV 2022 [Code] GAN, CLIP, Video editing Image Editing
SpeechPainter: Text-Conditioned Speech Inpainting Interspeech 2022 [Code] Speech editing Image Editing
Talk-to-Edit: Fine-Grained Facial Editing via Dialog ICCV 2021 [Code] GAN, dialog, semantic field Image Editing
Manigan: Text-guided image manipulation CVPR 2020 [Code] GAN, affine combination Image Editing
SSCR: Iterative Language-Based Image Editing via Self-Supervised Counterfactual Reasoning EMNLP 2020 [Code] GAN, Cross-task consistency Image Editing
Open-Edit: Open-Domain Image Manipulation with Open-Vocabulary Instructions ECCV 2020 [Code] GAN Image Editing
Sequential Attention GAN for Interactive Image Editing MM 2020 - GAN, Dialog, Sequential Attention Image Editing
Lightweight generative adversarial networks for text-guided image manipulation NeurIPS 2020 [Code] Light-weight GAN Image Editing
Tell, Draw, and Repeat: Generating and Modifying Images Based on Continual Linguistic Instruction ICCV 2019 [Code] GAN Image Editing
Language-Based Image Editing With Recurrent Attentive Models CVPR 2018 [Code] GAN, Recurrent Attention Image Editing
Text-Adaptive Generative Adversarial Networks: Manipulating Images with Natural Language NeurIPS 2018 [Code] GAN, simple Image Editing
Dreamcrafter: Immersive Editing of 3D Radiance Fields Through Flexible, Generative Inputs and Outputs CHI 2025 - 3D Radiance Field editing Media Editing
SVG-Head: Hybrid Surface-Volumetric Gaussians for High-Fidelity Head Reconstruction and Real-Time Editing ICCV 2025 [Code] head reconstruction and real-time editing Media Editing
Edit as You See: Image-guided Video Editing via Masked Motion Modeling arXiv 2025 - image-guided video editing Media Editing
CAD-Editor: A Locate-then-Infill Framework with Automated Training Data Synthesis for Text-Based CAD Editing arXiv 2025 - Text-Based CAD Editing Media Editing
MRHaD: Mixed Reality-based Hand-Drawn Map Editing Interface for Mobile Robot Navigation arXiv 2025 - mixed reality map editing Media Editing
ScanEdit: Hierarchically-Guided Functional 3D Scan Editing arXiv 2025 - 3D Scan Editing Media Editing
Vidi: Large Multimodal Models for Video Understanding and Editing arXiv 2025 - video understanding and editing Media Editing
Rethinking Score Distilling Sampling for 3D Editing and Generation arXiv 2025 - 3D editing and generation Media Editing
BlenderFusion: 3D-Grounded Visual Editing and Generative Compositing arXiv 2025 - 3D visual editing Media Editing
EditIQ: Automated Cinematic Editing of Static Wide-Angle Videos via Dialogue Interpretation and Saliency Cues arXiv 2025 - Automated cinematic editing Media Editing
VideoGrain: Modulating Space-Time Attention for Multi-grained Video Editing arXiv 2025 - multi-grained video editing Media Editing
VEU-Bench: Towards Comprehensive Understanding of Video Editing arXiv 2025 - video editing benchmark Media Editing
Controllable Pedestrian Video Editing for Multi-View Driving Scenarios via Motion Sequence arXiv 2025 - pedestrian video editing Media Editing
TexGS-VolVis: Expressive Scene Editing for Volume Visualization via Textured Gaussian Splatting arXiv 2025 - Volume Scene Editing Media Editing
SGEdit: Bridging LLM with Text2Image Generative Model for Scene Graph-based Image Editing SIGGRAPH Asia 2024 [Code] Diffusion, scene graph, image-editing Media Editing
Audio-Agent: Leveraging LLMs For Audio Generation, Editing and Composition arXiv 2024 - Text-to-Audio, Multimodal Media Editing
AudioEditor: A Training-Free Diffusion-Based Audio Editing Framework arXiv 2024 [Code] Diffusion-based text-to-audio Media Editing
Enabling Local Editing in Diffusion Models by Joint and Individual Component Analysis BMVC 2024 [Code] Diffusion-based local image manipulation Media Editing
Steer-by-prior Editing of Symbolic Music Loops MML 2024 [Code] Masked Language Modelling, music instruments Media Editing
Audio Prompt Adapter: Unleashing Music Editing Abilities for Text-to-Music with Lightweight Finetuning ISMIR 2024 [Code] Diffusion-based text-to-audio Media Editing
GroupDiff: Diffusion-based Group Portrait Editing ECCV 2024 [Code] Diffusion-based image editing Media Editing
RegionDrag: Fast Region-Based Image Editing with Diffusion Models ECCV 2024 [Code] Diffusion-based image editing Media Editing
SyncNoise: Geometrically Consistent Noise Prediction for Text-based 3D Scene Editing arXiv 2024 - Multi-view consistency Media Editing
DreamCatalyst: Fast and High-Quality 3D Editing via Controlling Editability and Identity Preservation arXiv 2024 [Code] Diffusion-based editing Media Editing
MEDIC: Zero-shot Music Editing with Disentangled Inversion Control arXiv 2024 - Audio editing Media Editing
3DEgo: 3D Editing on the Go! ECCV 2024 [Code] Monocular 3D Scene Synthesis Media Editing
MedEdit: Counterfactual Diffusion-based Image Editing on Brain MRI SASHIMI 2024 - Biomedical editing Media Editing
FlexiEdit: Frequency-Aware Latent Refinement for Enhanced Non-Rigid Editing ECCV 2024 - Image editing Media Editing
LEMON: Localized Editing with Mesh Optimization and Neural Shaders arXiv 2024 - Mesh editing Media Editing
Diffusion Brush: A Latent Diffusion Model-based Editing Tool for AI-generated Images arXiv 2024 - Image editing Media Editing
Streamlining Image Editing with Layered Diffusion Brushes arXiv 2024 - Image editing Media Editing
SEED-Data-Edit Technical Report: A Hybrid Dataset for Instructional Image Editing arXiv 2024 [Code] Image Editing Dataset Media Editing
Environment Maps Editing using Inverse Rendering and Adversarial Implicit Functions arXiv 2024 - Inverse rendering, HDR editing Media Editing
HairDiffusion: Vivid Multi-Colored Hair Editing via Latent Diffusion arXiv 2024 - Hair editing, Diffusion models Media Editing
DiffuMask-Editor: A Novel Paradigm of Integration Between the Segmentation Diffusion Model and Image Editing to Improve Segmentation Ability arXiv 2024 - Synthetic Data Generation Media Editing
Taming Rectified Flow for Inversion and Editing arXiv 2024 [Code] Image Inversion Media Editing
Pathways on the Image Manifold: Image Editing via Video Generation arXiv 2024 - video-based editing, Frame2Frame, Temporal Editing Caption Media Editing
PrEditor3D: Fast and Precise 3D Shape Editing arXiv 2024 - 3D shape editing Media Editing
Diffusion-Based Attention Warping for Consistent 3D Scene Editing arXiv 2024 - 3D scene editing Media Editing
MIVE: New Design and Benchmark for Multi-Instance Video Editing arXiv 2024 - Multi-Instance Video Editing Media Editing
DriveEditor: A Unified 3D Information-Guided Framework for Controllable Object Editing in Driving Scenes arXiv 2024 - 3D object editing Media Editing
MAKIMA: Tuning-free Multi-Attribute Open-domain Video Editing via Mask-Guided Attention Modulation arXiv 2024 - Multi-Attribute Video Editing Media Editing
EditSplat: Multi-View Fusion and Attention-Guided Optimization for View-Consistent 3D Scene Editing with 3D Gaussian Splatting arXiv 2024 - 3D scene editing Media Editing
III. Citations
Source: https://github.com/tamlhp/awesome-instruction-editing
Paper: https://doi.org/10.1016/j.engappai.2025.112953
© 2026 Instruction Editing  