- Image-to-image starts with partially noised source images, not pure noise
- The strength parameter controls transformation degree (0 = original, 1 = new)
- Preserve composition with lower strength; change style with higher values
- Match the model's native resolution and prepare images for better results
- Works for style transfer, editing, variations, and upscaling tasks
Overview
Image-to-image generation transforms existing images using text prompts, preserving aspects of the original while introducing new elements. Unlike text-to-image, which starts from random noise, image-to-image begins with your source image, giving you creative control over how much to preserve versus transform.
Think of it like renovating a house instead of building from scratch. You keep the foundation and structure (the composition) but can change the style, colors, and details. The strength parameter is like choosing between a light refresh (new paint) or a complete remodel (new everything).
Notebook
View the companion notebook: Image to Image Transformations
Setting Up the Pipeline
Install dependencies and load the image-to-image pipeline:
!pip install -q diffusers transformers accelerate
import torch
from diffusers import AutoPipelineForImage2Image
from PIL import Image
import requests
from io import BytesIO
# Load the pipeline
pipeline = AutoPipelineForImage2Image.from_pretrained(
"stable-diffusion-v1-5/stable-diffusion-v1-5",
torch_dtype=torch.float16,
use_safetensors=True,
).to("cuda")
# Load an example image
img_url = "https://example.com/your-image.jpg"
response = requests.get(img_url)
init_image = Image.open(BytesIO(response.content)).convert("RGB")
init_image = init_image.resize((512, 512))
The Strength Parameter
The strength parameter determines how much of your original image to preserve versus how much freedom the model has to change it. Strength maps to how far along the diffusion trajectory the process starts—high strength injects more noise (more change), low strength preserves structure (less change).
prompt = "A oil painting of a landscape in impressionist style"
# Low strength - minimal changes
light_edit = pipeline(
prompt=prompt,
image=init_image,
strength=0.3,
guidance_scale=7.5
).images[0]
# Medium strength - balanced transformation
balanced_edit = pipeline(
prompt=prompt,
image=init_image,
strength=0.5,
guidance_scale=7.5
).images[0]
# High strength - major transformation
heavy_edit = pipeline(
prompt=prompt,
image=init_image,
strength=0.8,
guidance_scale=7.5
).images[0]
- Strength = 0.0: Returns the original unchanged
- Strength = 0.3: Light edits; colors and textures change
- Strength = 0.5: Balanced stylistic changes
- Strength = 0.8: Heavy transformation; may alter composition
- Strength = 1.0: Nearly equivalent to text-to-image
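To find the right value for a particular image, it often helps to sweep a few strengths and compare the outputs side by side. A minimal sketch, reusing the pipeline, prompt, and init_image from above (the fixed seed simply keeps the comparison fair):
results = {}
for strength in (0.3, 0.5, 0.8):
    # Fix the seed so that only the strength changes between runs
    generator = torch.Generator("cuda").manual_seed(42)
    results[strength] = pipeline(
        prompt=prompt,
        image=init_image,
        strength=strength,
        guidance_scale=7.5,
        generator=generator,
    ).images[0]
for strength, img in results.items():
    img.save(f"strength_{strength}.png")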
How It Works
The process differs from text-to-image in a crucial way:
- Encode: Your image is encoded into latent space using the VAE encoder
- Add Noise: Noise is added based on the strength parameter
- Denoise: The U-Net removes noise while being guided by your text prompt
- Decode: The VAE decoder converts the result back to an image
With strength=0.5, the model only runs the last 50% of denoising steps. Your image starts partially along the denoising trajectory, not from pure noise:
# Understanding the math
total_steps = 50
strength = 0.5
start_step = int(total_steps * (1 - strength)) # Start at step 25
# Model runs steps 25-50, preserving early structure
Practical Applications
Style Transfer
Transform photos into different artistic styles while preserving composition:
# Load a photo
photo = Image.open("landscape_photo.jpg").resize((512, 512))
styles = [
"oil painting in the style of Van Gogh",
"watercolor painting with soft edges",
"pencil sketch with cross-hatching",
"digital art in cyberpunk style"
]
styled_images = []
for style_prompt in styles:
result = pipeline(
prompt=style_prompt,
image=photo,
strength=0.6,
guidance_scale=7.5
).images[0]
styled_images.append(result)
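To compare the four results at a glance, diffusers provides a small image-grid helper (assuming a reasonably recent diffusers release that includes make_image_grid):
from diffusers.utils import make_image_grid
# Arrange the four styled outputs in a 2x2 grid for side-by-side comparison
grid = make_image_grid(styled_images, rows=2, cols=2)
grid.save("style_transfer_grid.png")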
Object Modification
Change specific elements while keeping the rest intact:
# Original: photo of a red car
prompt = "A blue car on a city street"
modified = pipeline(
prompt=prompt,
image=car_photo,
strength=0.4, # Lower strength to preserve composition
guidance_scale=10, # Higher guidance for prompt adherence
negative_prompt="red, distorted, blurry"
).images[0]
Creating Variations
Generate multiple versions of the same concept:
prompt = "professional product photography of a watch"
variations = []
for i in range(4):
generator = torch.Generator("cuda").manual_seed(i)
variant = pipeline(
prompt=prompt,
image=watch_image,
strength=0.35,
generator=generator
).images[0]
variations.append(variant)
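Because every variant was generated from a known seed, any favorite can be reproduced later by reusing its seed with the same prompt, image, and strength (the seed value below is just an example):
# Suppose the variant from seed 2 looked best; rebuild it deterministically
generator = torch.Generator("cuda").manual_seed(2)
favorite = pipeline(
    prompt=prompt,
    image=watch_image,
    strength=0.35,
    generator=generator,
).images[0]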
Composition Preservation
Prompts that mention objects and their spatial relations help maintain layout. For strict structure preservation, combine image-to-image with control methods such as ControlNet when available (see the sketch after the example below):
# Include spatial information in prompt
spatial_prompt = (
"Mountain in the background, lake in the middle ground, "
"trees in the foreground, sunset lighting"
)
# Use negative prompts to prevent unwanted changes
result = pipeline(
prompt=spatial_prompt,
image=landscape,
strength=0.5,
negative_prompt="cluttered, distorted perspective, warped"
).images[0]
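When prompts alone are not enough, a ControlNet can enforce structure explicitly. A minimal sketch using a Canny-edge ControlNet with the ControlNet image-to-image pipeline, reusing landscape and spatial_prompt from above (the model IDs and edge thresholds are typical examples, and opencv-python must be installed):
import cv2
import numpy as np
from diffusers import ControlNetModel, StableDiffusionControlNetImg2ImgPipeline
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
controlnet_pipe = StableDiffusionControlNetImg2ImgPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")
# An edge map of the source image acts as the structural constraint
edges = cv2.Canny(np.array(landscape.convert("L")), 100, 200)
control_image = Image.fromarray(np.stack([edges] * 3, axis=-1))
result = controlnet_pipe(
    prompt=spatial_prompt,
    image=landscape,
    control_image=control_image,
    strength=0.5,
).images[0]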
Progressive Refinement
Apply multiple passes with decreasing strength for controlled editing:
def progressive_edit(image, prompt, strengths=(0.6, 0.4, 0.2)):
current_image = image
for strength in strengths:
current_image = pipeline(
prompt=prompt,
image=current_image,
strength=strength,
guidance_scale=7.5
).images[0]
return current_image
# Gradually transform the image
final = progressive_edit(
init_image,
"ethereal fantasy landscape with glowing crystals"
)
Masked Editing (Inpainting)
For precise local edits, use inpainting pipelines:
from diffusers import AutoPipelineForInpainting
inpaint_pipe = AutoPipelineForInpainting.from_pretrained(
"runwayml/stable-diffusion-inpainting",
torch_dtype=torch.float16
).to("cuda")
# Create a mask (white = edit, black = preserve)
mask = Image.new("L", (512, 512), 0)
# Draw white areas where you want changes
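# For example, mark a rectangular region with ImageDraw (the coordinates below
# are illustrative; adjust them to the area you want to change)
from PIL import ImageDraw
draw = ImageDraw.Draw(mask)
draw.rectangle([180, 40, 330, 160], fill=255)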
result = inpaint_pipe(
prompt="a golden crown",
image=portrait,
mask_image=mask,
strength=0.8
).images[0]
Resolution Matching
Always match the model’s native resolution for best results:
def prepare_image(image, target_size=512):
# Calculate aspect ratio
aspect = image.width / image.height
if aspect > 1: # Landscape
new_width = target_size
new_height = int(target_size / aspect)
else: # Portrait or square
new_height = target_size
new_width = int(target_size * aspect)
# Resize maintaining aspect ratio
resized = image.resize((new_width, new_height), Image.LANCZOS)
# Pad to square if needed
if new_width != new_height:
padded = Image.new("RGB", (target_size, target_size))
paste_x = (target_size - new_width) // 2
paste_y = (target_size - new_height) // 2
padded.paste(resized, (paste_x, paste_y))
return padded
return resized
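A quick usage example, assuming a local file named input.jpg:
source = Image.open("input.jpg").convert("RGB")
model_ready = prepare_image(source, target_size=512)
print(model_ready.size)  # (512, 512)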
Strength vs Steps Visualization
Understanding how strength affects the denoising process:
import matplotlib.pyplot as plt
import numpy as np
# Visualize the relationship
strengths = np.linspace(0, 1, 11)
total_steps = 50
fig, ax = plt.subplots(figsize=(10, 6))
for strength in strengths:
start_step = int(total_steps * (1 - strength))
remaining_steps = total_steps - start_step
ax.barh(
strength,
remaining_steps,
left=start_step,
height=0.08,
label=f"Strength {strength:.1f}"
)
ax.set_xlabel("Denoising Steps")
ax.set_ylabel("Strength Parameter")
ax.set_title("How Strength Controls the Denoising Process")
ax.set_xlim(0, total_steps)
plt.show()
Lower strength means fewer denoising steps, preserving more of the original. Higher strength means more denoising steps, allowing greater transformation.
Upscaling with Image-to-Image
A dedicated upscaling pipeline applies the same image-conditioned idea to intelligent upscaling:
from diffusers import StableDiffusionUpscalePipeline
upscale_pipe = StableDiffusionUpscalePipeline.from_pretrained(
"stabilityai/stable-diffusion-x4-upscaler",
torch_dtype=torch.float16
).to("cuda")
# Upscale 4x with enhancement
upscaled = upscale_pipe(
prompt="high quality, detailed, sharp",
image=low_res_image,
num_inference_steps=20
).images[0]
Troubleshooting
- Losing too much detail: reduce strength or increase the guidance scale, and consider models fine-tuned for img2img editing.
- Colors drifting from the original: include color descriptions in your prompt or use lower strength values (0.2-0.4).
- Blurry or inconsistent results: always resize images to the model's native resolution (512×512 for SD 1.5, 768×768 or 1024×1024 for SDXL).
Color and Style Consistency
To keep the color palette, use lower strength and fewer steps, or apply a color-transfer post-process (see the sketch after the example below). To change only the style, keep the semantic content in the prompt and emphasize stylistic descriptors:
# Preserve colors while changing style
color_preserving_prompt = (
"impressionist painting with vibrant blues and warm oranges, "
"maintaining original color palette"
)
result = pipeline(
prompt=color_preserving_prompt,
image=original,
strength=0.4,
negative_prompt="desaturated, monochrome, color shift"
).images[0]
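The color-transfer post-process mentioned above can be as simple as histogram matching against the original. A minimal sketch using scikit-image's match_histograms, reusing result and original from the example above (assumes a recent scikit-image release that accepts channel_axis):
import numpy as np
from skimage.exposure import match_histograms
# Match the stylized result's color distribution to the original image
result_arr = np.array(result)
original_arr = np.array(original.resize(result.size))
matched = match_histograms(result_arr, original_arr, channel_axis=-1)
color_corrected = Image.fromarray(matched.astype(np.uint8))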
Multi-pass Workflows
Iterate with small strength adjustments for complex transformations:
# Stage 1: Style transfer
stage1 = pipeline(
"oil painting style",
image=photo,
strength=0.5
).images[0]
# Stage 2: Lighting adjustment
stage2 = pipeline(
"dramatic sunset lighting",
image=stage1,
strength=0.3
).images[0]
# Stage 3: Final touches
final = pipeline(
"highly detailed, masterpiece",
image=stage2,
strength=0.2
).images[0]
Save seeds for each pass to re-run successful chains deterministically.
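A minimal sketch of that bookkeeping, using the stages above (the seed values are arbitrary):
import json
history = []
current = photo
for stage_prompt, strength, seed in [
    ("oil painting style", 0.5, 101),
    ("dramatic sunset lighting", 0.3, 102),
    ("highly detailed, masterpiece", 0.2, 103),
]:
    generator = torch.Generator("cuda").manual_seed(seed)
    current = pipeline(
        prompt=stage_prompt,
        image=current,
        strength=strength,
        generator=generator,
    ).images[0]
    history.append({"prompt": stage_prompt, "strength": strength, "seed": seed})
# Persist the recipe so the exact chain can be re-run later
with open("edit_history.json", "w") as f:
    json.dump(history, f, indent=2)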
Conclusion
Image-to-image generation offers precise control over AI image transformation. The strength parameter balances preservation versus transformation—start with 0.3-0.5 for most tasks. Lower values preserve structure while higher values allow creative freedom. Combine with progressive refinement, inpainting, and proper resolution handling for professional results. Remember that this technique excels at style transfer, concept variation, and iterative refinement tasks.
Further Reading
- Diffusers — Image-to-Image: https://huggingface.co/docs/diffusers/main/en/using-diffusers/img2img
- Diffusers — Inpainting: https://huggingface.co/docs/diffusers/main/en/using-diffusers/inpaint
- ControlNet (Zhang and Agrawala): https://arxiv.org/abs/2302.05543