imaginaire.generators package

Submodules

imaginaire.generators.coco_funit module

class imaginaire.generators.coco_funit.COCOFUNITTranslator(num_filters=64, num_filters_mlp=256, style_dims=64, usb_dims=1024, num_res_blocks=2, num_mlp_blocks=3, num_downsamples_style=4, num_downsamples_content=2, num_image_channels=3, weight_norm_type='', **kwargs)[source]

Bases: torch.nn.modules.module.Module

COCO-FUNIT Generator architecture.

Parameters
  • num_filters (int) – Base filter numbers.

  • num_filters_mlp (int) – Base filter number in the MLP module.

  • style_dims (int) – Dimension of the style code.

  • usb_dims (int) – Dimension of the universal style bias code.

  • num_res_blocks (int) – Number of residual blocks at the end of the content encoder.

  • num_mlp_blocks (int) – Number of layers in the MLP module.

  • num_downsamples_content (int) – Number of times we reduce resolution by 2x2 for the content image.

  • num_downsamples_style (int) – Number of times we reduce resolution by 2x2 for the style image.

  • num_image_channels (int) – Number of input image channels.

  • weight_norm_type (str) – Type of weight normalization. 'none', 'spectral', or 'weight'.

decode(content, style)[source]

Generate images by combining their content and style codes.

Parameters
  • content (tensor) – Content code tensor.

  • style (tensor) – Style code tensor.

encode(images)[source]

Encode images to get their content and style codes.

Parameters

images (tensor) – Input image tensor.

forward(images)[source]

Reconstruct the input image by combining the computed content and style codes.

Parameters

images (tensor) – Input image tensor.
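
The following is a hypothetical usage sketch (not part of the library documentation): it instantiates the translator with its default arguments, encodes a content and a style image, and decodes the mixed codes into a translated image. The (content, style) ordering of the values returned by encode and the 256x256 input size are assumptions.

    import torch
    from imaginaire.generators.coco_funit import COCOFUNITTranslator

    net = COCOFUNITTranslator()                  # default hyper-parameters listed above
    content_image = torch.randn(1, 3, 256, 256)  # N x C x H x W content image
    style_image = torch.randn(1, 3, 256, 256)    # N x C x H x W style image

    with torch.no_grad():
        content, _ = net.encode(content_image)   # keep the content code
        _, style = net.encode(style_image)       # keep the style code
        translated = net.decode(content, style)  # content of one image, style of the other
        reconstructed = net(content_image)       # forward(): reconstruct the input image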

training = None
class imaginaire.generators.coco_funit.Generator(gen_cfg, data_cfg)[source]

Bases: torch.nn.modules.module.Module

COCO-FUNIT Generator.

forward(data)[source]

In FUNIT’s forward pass, the network generates a content embedding and a style code from the content image, and a style code from the style image. Mixing the content code with the style code from the content image reconstructs the input image. Mixing the content code with the style code from the style image produces the translation output.

Parameters

data (dict) – Training data at the current iteration.

inference(data, keep_original_size=True)[source]

COCO-FUNIT inference.

Parameters
  • data (dict) – Training data at the current iteration. - images_content (tensor): Content images. - images_style (tensor): Style images.

  • a2b (bool) – If True, translates images from domain A to B, otherwise from B to A.

  • keep_original_size (bool) – If True, the output image is resized to the input content image size.
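
A hypothetical end-to-end inference sketch follows. It assumes cfg is an already-parsed yaml config exposing gen and data sections and that pretrained weights are loaded elsewhere; neither step is covered by this reference, and the structure of the returned value is not asserted here.

    import torch
    from imaginaire.generators.coco_funit import Generator

    net_G = Generator(cfg.gen, cfg.data).eval()  # cfg: parsed yaml config (assumed)
    data = {
        'images_content': torch.randn(1, 3, 256, 256),  # content images
        'images_style': torch.randn(1, 3, 256, 256),    # style images
    }
    with torch.no_grad():
        outputs = net_G.inference(data, keep_original_size=True)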

training = None

imaginaire.generators.dummy module

class imaginaire.generators.dummy.Generator(gen_cfg, data_cfg)[source]

Bases: torch.nn.modules.module.Module

Dummy generator.

Parameters
  • gen_cfg (obj) – Generator definition part of the yaml config file.

  • data_cfg (obj) – Data definition part of the yaml config file.

forward(data)[source]

Dummy Generator forward.

Parameters

data (dict) –

training = None

imaginaire.generators.fs_vid2vid module

class imaginaire.generators.fs_vid2vid.AttentionModule(atn_cfg, data_cfg, conv_2d_block, num_filters_each_layer)[source]

Bases: torch.nn.modules.module.Module

Attention module constructor.

Parameters
  • atn_cfg (obj) – Generator definition part of the yaml config file.

  • data_cfg (obj) – Data definition part of the yaml config file

  • conv_2d_block – Conv2DBlock constructor.

  • num_filters_each_layer (int) – The number of filters in each layer.

attention_encode(img, net_name)[source]

Encode the input image to get the attention map.

Parameters
  • img (NxCxHxW tensor) – Input image.

  • net_name (str) – Name for attention network.

Returns

Encoded feature.

Return type

x (NxC2xH2xW2 tensor)

forward(in_features, label, ref_label, attention=None)[source]

Get the attention map to combine multiple image features in the case of multiple reference images.

Parameters
  • in_features ((NxK)xC1xH1xW1 tensor) – Input features.

  • label (NxC2xH2xW2 tensor) – Target label.

  • ref_label (NxC2xH2xW2 tensor) – Reference label.

  • attention (Nx(KxH1xW1)x(H1xW1) tensor) – Attention maps.

Returns

  • out_features (NxC1xH1xW1 tensor): Attention-combined features.

  • attention (Nx(KxH1xW1)x(H1xW1) tensor): Attention maps.

  • atn_vis (1x1xH1xW1 tensor): Visualization for attention scores.

Return type

(tuple)

training = None
class imaginaire.generators.fs_vid2vid.FlowGenerator(flow_cfg, data_cfg, num_frames)[source]

Bases: torch.nn.modules.module.Module

Flow generator constructor.

Parameters
  • flow_cfg (obj) – Flow definition part of the yaml config file.

  • data_cfg (obj) – Data definition part of the yaml config file.

  • num_frames (int) – Number of input frames.

forward(label, ref_label, ref_image)[source]

Flow generator forward.

Parameters
  • label (4D tensor) – Input label tensor.

  • ref_label (4D tensor) – Reference label tensors.

  • ref_image (4D tensor) – Reference image tensors.

Returns

  • flow (4D tensor) : Generated flow map.

  • mask (4D tensor) : Generated occlusion mask.

Return type

(tuple)

training = None
class imaginaire.generators.fs_vid2vid.Generator(gen_cfg, data_cfg)[source]

Bases: torch.nn.modules.module.Module

Few-shot vid2vid generator constructor.

Parameters
  • gen_cfg (obj) – Generator definition part of the yaml config file.

  • data_cfg (obj) – Data definition part of the yaml config file.

SPADE_combine(encoded_label, cond_inputs)[source]

Use Multi-SPADE to combine the raw synthesized image with the warped images.

Parameters
  • encoded_label (list of tensors) – Original label map embeddings.

  • cond_inputs (list of tensors) – New SPADE conditional inputs from the warped images.

Returns

Combined conditional inputs.

Return type

encoded_label (list of tensors)

custom_init()[source]

This function is for dealing with the numerical issue that might occur when doing mixed precision training.

flow_generation(label, ref_labels, ref_images, prev_labels, prev_images, ref_idx)[source]

Generates flows and masks for warping reference / previous images.

Parameters
  • label (NxCxHxW tensor) – Target label map.

  • ref_labels (NxKxCxHxW tensor) – Reference label maps.

  • ref_images (NxKx3xHxW tensor) – Reference images.

  • prev_labels (NxTxCxHxW tensor) – Previous label maps.

  • prev_images (NxTx3xHxW tensor) – Previous images.

  • ref_idx (Nx1 tensor) – Index for which image to use from the reference images.

Returns

  • flow (list of Nx2xHxW tensor): Optical flows.

  • occ_mask (list of Nx1xHxW tensor): Occlusion masks.

  • img_warp (list of Nx3xHxW tensor): Warped reference / previous images.

  • cond_inputs (list of Nx4xHxW tensor): Conditional inputs for SPADE combination.

Return type

(tuple)

forward(data)[source]

Few-shot vid2vid generator forward.

Parameters

data (dict) – Dictionary of input data.

Returns

Dictionary of output data.

Return type

output (dict)

init_network_weights(net_src, net_dst)[source]

Initialize weights in net_dst with those in net_src.

init_temporal_network(cfg_init=None)[source]

When starting to train with multiple frames, initialize the flow network.

Parameters

cfg_init (dict) – Weight initialization config.

load_pretrained_network(pretrained_dict, prefix='module.')[source]

Load the pretrained weights into this network.

Parameters
  • pretrained_dict (dict) – Pretrained network weights.

  • prefix (str) – Prefix to the network weights name.

one_up_conv_layer(x, encoded_label, conv_weight, norm_weight, i)[source]

One residual block layer in the main branch.

Parameters
  • x (4D tensor) – Current feature map.

  • encoded_label (list of tensors) – Encoded input label maps.

  • conv_weight (list of tensors) – Hyper conv weights.

  • norm_weight (list of tensors) – Hyper norm weights.

  • i (int) – Layer index.

Returns

Output feature map.

Return type

x (4D tensor)

reset()[source]

Reset the network at the beginning of a sequence.

training = None
class imaginaire.generators.fs_vid2vid.LabelEmbedder(emb_cfg, num_input_channels, num_hyper_layers=0)[source]

Bases: torch.nn.modules.module.Module

Embed the input label map to get embedded features.

Parameters
  • emb_cfg (obj) – Embed network configuration.

  • num_input_channels (int) – Number of input channels.

  • num_hyper_layers (int) – Number of hyper layers.

forward(input, weights=None)[source]

Embedding network forward.

Parameters
  • input (NxCxHxW tensor) – Network input.

  • weights (list of tensors) – Conv weights if using hyper network.

Returns

Network outputs at different layers.

Return type

output (list of tensors)

training = None
class imaginaire.generators.fs_vid2vid.WeightGenerator(gen_cfg, data_cfg)[source]

Bases: torch.nn.modules.module.Module

Weight generator constructor.

Parameters
  • gen_cfg (obj) – Generator definition part of the yaml config file.

  • data_cfg (obj) – Data definition part of the yaml config file

encode_reference(ref_image, ref_label, label, k)[source]

Encode the reference image to get features for weight generation.

Parameters
  • ref_image ((NxK)x3xHxW tensor) – Reference images.

  • ref_label ((NxK)xCxHxW tensor) – Reference labels.

  • label (NxCxHxW tensor) – Target label.

  • k (int) – Number of reference images.

Returns

  • x (NxC2xH2xW2 tensor): Encoded features from reference images for the main branch (as input to the decoder).

  • encoded_ref (list of tensors): Encoded features from reference images for the weight generation branch.

  • attention (Nx(KxH1xW1)x(H1xW1) tensor): Attention maps.

  • atn_vis (1x1xH1xW1 tensor): Visualization for attention scores.

  • ref_idx (Nx1 tensor): Index for which image to use from the reference images.

Return type

(tuple)

forward(ref_image, ref_label, label, is_first_frame)[source]

Generate network weights based on the reference images.

Parameters
  • ref_image (NxKx3xHxW tensor) – Reference images.

  • ref_label (NxKxCxHxW tensor) – Reference labels.

  • label (NxCxHxW tensor) – Target label.

  • is_first_frame (bool) – Whether the current frame is the first frame.

Returns

  • x (NxC2xH2xW2 tensor): Encoded features from reference images for the main branch (as input to the decoder).

  • encoded_label (list of tensors): Encoded target label map for SPADE.

  • conv_weights (list of tensors): Network weights for conv layers in the main network.

  • norm_weights (list of tensors): Network weights for SPADE layers in the main network.

  • attention (Nx(KxH1xW1)x(H1xW1) tensor): Attention maps.

  • atn_vis (1x1xH1xW1 tensor): Visualization for attention scores.

  • ref_idx (Nx1 tensor): Index for which image to use from the reference images.

Return type

(tuple)

get_conv_weights(x, i)[source]

Adaptively generate weights for layer i in main branch convolutions.

Parameters
  • x (NxCxHxW tensor) – Input features.

  • i (int) – Layer index.

Returns

  • conv_weights (list of tensors): Weights for the conv layers in the main branch.

Return type

(tuple)

get_norm_weights(x, i)[source]

Adaptively generate weights for SPADE in layer i of generator.

Parameters
  • x (NxCxHxW tensor) – Input features.

  • i (int) – Layer index.

Returns

  • embedding_weights (list of tensors): Weights for the label embedding network.

  • norm_weights (list of tensors): Weights for the SPADE layers.

Return type

(tuple)

reset()[source]

Reset the network at the beginning of a sequence.

training = None
class imaginaire.generators.fs_vid2vid.WeightReshaper[source]

Bases: object

Handles all weight reshape related tasks.

reshape_embed_input(x)[source]

Reshape the input to be (B x C) x H x W.

Parameters

x (tensor or list of tensors) – Input features.

Returns

Reshaped features.

Return type

x (tensor or list of tensors)

reshape_weight(x, weight_shape)[source]

Reshape input x to the desired weight shape.

Parameters
  • x (tensor or list of tensors) – Input features.

  • weight_shape (list of int) – Desired shape of the weight.

Returns

  • weight (tensor): Network weights

  • bias (tensor): Network bias.

Return type

(tuple)

split_weights(weight, sizes)[source]

When the desired shape is a list, first divide the input to each corresponding weight shape in the list.

Parameters
  • weight (tensor) – Input weight.

  • sizes (int or list of int) – Target sizes.

Returns

Divided weights.

Return type

weight (list of tensors)

sum(x)[source]

Sum all elements recursively in a nested list.

Parameters

x (nested list of int) – Input list of elements.

Returns

Sum of all elements.

Return type

out (int)

sum_mul(x)[source]

Given a weight shape, compute the number of elements needed for weight + bias. If input is a list of shapes, sum all the elements.

Parameters

x (list of int) – Input list of elements.

Returns

Summed number of elements.

Return type

out (int or list of int)
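
A small hypothetical sketch of the bookkeeping helpers above; the conv-style weight shape is an illustrative assumption.

    from imaginaire.generators.fs_vid2vid import WeightReshaper

    reshaper = WeightReshaper()
    total = reshaper.sum([[1, 2], [3, [4]]])   # 10: recursive sum over a nested list

    weight_shape = [64, 32, 3, 3]              # hypothetical conv weight shape
    n_params = reshaper.sum_mul(weight_shape)  # number of elements for weight + bias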

imaginaire.generators.funit module

class imaginaire.generators.funit.ContentEncoder(num_downsamples, num_res_blocks, image_channels, num_filters, padding_mode, activation_norm_type, weight_norm_type, nonlinearity)[source]

Bases: torch.nn.modules.module.Module

Improved FUNIT Content Encoder. This is basically the same as the original FUNIT content encoder.

Parameters
  • num_downsamples (int) – Number of times we reduce resolution by 2x2.

  • num_res_blocks (int) – Number of times we append residual block after all the downsampling modules.

  • image_channels (int) – Number of input image channels.

  • num_filters (int) – Base filter number.

  • padding_mode (str) – Padding mode

  • activation_norm_type (str) – Type of activation normalization.

  • weight_norm_type (str) – Type of weight normalization. 'none', 'spectral', or 'weight'.

  • nonlinearity (str) – Nonlinearity.

forward(x)[source]
Parameters

x (tensor) – Input image.

training = None
class imaginaire.generators.funit.Decoder(num_enc_output_channels, style_channels, num_image_channels=3, num_upsamples=4, padding_type='reflect', weight_norm_type='none', nonlinearity='relu')[source]

Bases: torch.nn.modules.module.Module

Improved FUNIT decoder.

Parameters
  • num_enc_output_channels (int) – Number of content feature channels.

  • style_channels (int) – Dimension of the style code.

  • num_image_channels (int) – Number of image channels.

  • num_upsamples (int) – How many times we are going to apply upsample residual block.

forward(x, style)[source]
Parameters
  • x (tensor) – Content embedding of the content image.

  • style (tensor) – Style embedding of the style image.

training = None
class imaginaire.generators.funit.FUNITTranslator(num_filters=64, num_filters_mlp=256, style_dims=64, num_res_blocks=2, num_mlp_blocks=3, num_downsamples_style=4, num_downsamples_content=2, num_image_channels=3, weight_norm_type='', **kwargs)[source]

Bases: torch.nn.modules.module.Module

Parameters
  • num_filters (int) – Base filter numbers.

  • num_filters_mlp (int) – Base filter number in the MLP module.

  • style_dims (int) – Dimension of the style code.

  • num_res_blocks (int) – Number of residual blocks at the end of the content encoder.

  • num_mlp_blocks (int) – Number of layers in the MLP module.

  • num_downsamples_content (int) – Number of times we reduce resolution by 2x2 for the content image.

  • num_downsamples_style (int) – Number of times we reduce resolution by 2x2 for the style image.

  • num_image_channels (int) – Number of input image channels.

  • weight_norm_type (str) – Type of weight normalization. 'none', 'spectral', or 'weight'.

decode(content, style)[source]

Generate images by combining their content and style codes.

Parameters
  • content (tensor) – Content code tensor.

  • style (tensor) – Style code tensor.

encode(images)[source]

Encode images to get their content and style codes.

Parameters

images (tensor) – Input image tensor.

forward(images)[source]

Reconstruct the input image by combining the computed content and style codes.

Parameters

images (tensor) – Input image tensor.

training = None
class imaginaire.generators.funit.Generator(gen_cfg, data_cfg)[source]

Bases: torch.nn.modules.module.Module

Generator of the improved FUNIT baseline in the COCO-FUNIT paper.

forward(data)[source]

In FUNIT’s forward pass, the network generates a content embedding and a style code from the content image, and a style code from the style image. Mixing the content code with the style code from the content image reconstructs the input image. Mixing the content code with the style code from the style image produces the translation output.

Parameters

data (dict) – Training data at the current iteration.

inference(data, keep_original_size=True)[source]

FUNIT inference.

Parameters
  • data (dict) – Training data at the current iteration. - images_content (tensor): Content images. - images_style (tensor): Style images.

  • a2b (bool) – If True, translates images from domain A to B, otherwise from B to A.

  • keep_original_size (bool) – If True, the output image is resized to the input content image size.

training = None
class imaginaire.generators.funit.MLP(input_dim, output_dim, latent_dim, num_layers, activation_norm_type, nonlinearity)[source]

Bases: torch.nn.modules.module.Module

Improved FUNIT style decoder.

Parameters
  • input_dim (int) – Input dimension (style code dimension).

  • output_dim (int) – Output dimension (to be fed into the AdaIN layer).

  • latent_dim (int) – Latent dimension.

  • num_layers (int) – Number of layers in the MLP.

  • activation_norm_type (str) – Type of activation normalization.

  • nonlinearity (str) – Nonlinearity type.

forward(x)[source]
Parameters

x (tensor) – Input tensor.

training = None
class imaginaire.generators.funit.StyleEncoder(num_downsamples, image_channels, num_filters, style_channels, padding_mode, activation_norm_type, weight_norm_type, nonlinearity)[source]

Bases: torch.nn.modules.module.Module

Improved FUNIT Style Encoder. This is basically the same as the original FUNIT Style Encoder.

Parameters
  • num_downsamples (int) – Number of times we reduce resolution by 2x2.

  • image_channels (int) – Number of input image channels.

  • num_filters (int) – Base filter number.

  • style_channels (int) – Style code dimension.

  • padding_mode (str) – Padding mode.

  • activation_norm_type (str) – Type of activation normalization.

  • weight_norm_type (str) – Type of weight normalization. 'none', 'spectral', or 'weight'.

  • nonlinearity (str) – Nonlinearity.

forward(x)[source]
Parameters

x (tensor) – Input image.

training = None

imaginaire.generators.gancraft module

class imaginaire.generators.gancraft.Generator(gen_cfg, data_cfg)[source]

Bases: imaginaire.generators.gancraft_base.Base3DGenerator

GANcraft generator constructor.

Parameters
  • gen_cfg (obj) – Generator definition part of the yaml config file.

  • data_cfg (obj) – Data definition part of the yaml config file.

custom_init()[source]

Weight initialization of GANcraft components.

forward(data, random_style=False)[source]

GANcraft Generator forward.

Parameters
  • data (dict) –

    • images (N x 3 x H x W tensor): Real images.

    • voxel_id (N x H x W x max_samples x 1 tensor): IDs of intersected tensors along each ray.

    • depth2 (N x 2 x H x W x max_samples x 1 tensor): Depths of entrance and exit points for each ray-voxel intersection.

    • raydirs (N x H x W x 1 x 3 tensor): The direction of each ray.

    • cam_ori_t (N x 3 tensor): Camera origins.

  • random_style (bool) – Whether to sample a random style vector.

Returns

  • fake_images (N x 3 x H x W tensor): Fake images.

  • mu (N x C1 tensor): Mean vectors.

  • logvar (N x C1 tensor): Log-variance vectors.

Return type

output (dict)

get_pseudo_gt(pseudo_gen, voxel_id, z=None, style_img=None, resize_512=True, deterministic=False)[source]

Evaluate the img2img network to obtain pseudo-ground truth images.

Parameters
  • pseudo_gen (callable) – Function converting mask to image using img2img network.

  • voxel_id (N x img_dims[0] x img_dims[1] x max_samples x 1 tensor) – IDs of intersected tensors along each ray.

  • z (N x C tensor) – Optional style code passed to pseudo_gen.

  • style_img (N x 3 x H x W tensor) – Optional style image passed to pseudo_gen.

  • resize_512 (bool) – If True, evaluate pseudo_gen at 512x512 regardless of input resolution.

  • deterministic (bool) – If True, disable stochastic label mapping.

inference(output_dir, camera_mode, style_img_path=None, seed=1, pad=30, num_samples=40, num_blocks_early_stop=6, sample_depth=3, tile_size=128, resolution_hw=[540, 960], cam_ang=72, cam_maxstep=10)[source]

Compute result images according to the provided camera trajectory and save the results in the specified folder. The full image is evaluated in multiple tiles to save memory.

Parameters
  • output_dir (str) – Where should the results be stored.

  • camera_mode (int) – Which camera trajectory to use.

  • style_img_path (str) – Path to the style-conditioning image.

  • seed (int) – Random seed (controls the style when style_img_path is not specified).

  • pad (int) – Pixels to remove from the image tiles before stitching. Should be equal to or larger than the receptive field of the CNN to avoid border artifacts.

  • num_samples (int) – Number of samples per ray (different from training).

  • num_blocks_early_stop (int) – Max number of intersected boxes per ray before stopping (different from training).

  • sample_depth (float) – Max distance traveled through boxes before stopping (different from training).

  • tile_size (int) – Max size of a tile in pixels.

  • resolution_hw (list [H, W]) – Resolution of the output image.

  • cam_ang (float) – Horizontal FOV of the camera (may be adjusted by the camera controller).

  • cam_maxstep (int) – Number of frames sampled from the camera trajectory.
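
A hypothetical offline-rendering sketch using only the arguments documented above. It assumes net_G is an already-constructed, checkpoint-loaded imaginaire.generators.gancraft.Generator; how the voxel world and configs are prepared is outside this reference.

    # Render 10 frames along a built-in camera trajectory and write them to disk.
    net_G.inference(
        output_dir='./gancraft_results',  # where the result images are stored
        camera_mode=0,                    # which camera trajectory to use
        style_img_path=None,              # style is then controlled by `seed`
        seed=1,
        resolution_hw=[540, 960],         # output resolution [H, W]
        tile_size=128,                    # evaluate the full image in tiles
        cam_maxstep=10,                   # number of frames along the trajectory
    )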

sample_camera(data, pseudo_gen)[source]

Sample camera randomly and precompute everything used by both Gen and Dis.

Parameters
  • data (dict) –

    • images (N x 3 x H x W tensor): Real images.

    • label (N x C2 x H x W tensor): Segmentation maps.

  • pseudo_gen (callable) – Function converting mask to image using img2img network.

Returns

  • voxel_id (N x H x W x max_samples x 1 tensor): IDs of intersected tensors along each ray.

  • depth2 (N x 2 x H x W x max_samples x 1 tensor): Depths of entrance and exit points for each ray-voxel intersection.

  • raydirs (N x H x W x 1 x 3 tensor): The direction of each ray.

  • cam_ori_t (N x 3 tensor): Camera origins.

  • pseudo_real_img (N x 3 x H x W tensor): Pseudo-ground truth images.

  • real_masks (N x C3 x H x W tensor): One-hot segmentation maps for real images, with translated labels.

  • fake_masks (N x C3 x H x W tensor): One-hot segmentation maps for sampled camera views.

Return type

ret (dict)

training = None

imaginaire.generators.gancraft_base module

class imaginaire.generators.gancraft_base.Base3DGenerator(gen_cfg, data_cfg)[source]

Bases: torch.nn.modules.module.Module

Minecraft 3D generator constructor.

Parameters
  • gen_cfg (obj) – Generator definition part of the yaml config file.

  • data_cfg (obj) – Data definition part of the yaml config file.

get_param_groups(cfg_opt)[source]

Return the parameter groups used to construct the optimizer.

Parameters

cfg_opt (obj) – Optimizer definition part of the yaml config file.
training = None
class imaginaire.generators.gancraft_base.RenderCNN(in_channels, style_dim, hidden_channels=256, leaky_relu=True)[source]

Bases: torch.nn.modules.module.Module

CNN converting intermediate feature map to final image.

forward(x, z)[source]

Forward network.

Parameters
  • x (N x in_channels x H x W tensor) – Intermediate feature map

  • z (N x style_dim tensor) – Style codes.

modulate(x, w, b)[source]

Apply affine modulation: scale the features x by w and shift them by b.
training = None
class imaginaire.generators.gancraft_base.RenderMLP(in_channels, style_dim, viewdir_dim, mask_dim=680, out_channels_s=1, out_channels_c=3, hidden_channels=256, use_seg=True)[source]

Bases: torch.nn.modules.module.Module

MLP with affine modulation.

forward(x, raydir, z, m)[source]

Forward network.

Parameters
  • x (N x H x W x M x in_channels tensor) – Projected features.

  • raydir (N x H x W x 1 x viewdir_dim tensor) – Ray directions.

  • z (N x style_dim tensor) – Style codes.

  • m (N x H x W x M x mask_dim tensor) – One-hot segmentation maps.

training = None
class imaginaire.generators.gancraft_base.SKYMLP(in_channels, style_dim, out_channels_c=3, hidden_channels=256, leaky_relu=True)[source]

Bases: torch.nn.modules.module.Module

MLP converting ray directions to sky features.

forward(x, z)[source]

Forward network.

Parameters
  • x (... x in_channels tensor) – Ray direction embeddings.

  • z (... x style_dim tensor) – Style codes.

training = None
class imaginaire.generators.gancraft_base.StyleEncoder(style_enc_cfg)[source]

Bases: torch.nn.modules.module.Module

Style Encoder constructor.

Parameters

style_enc_cfg (obj) – Style encoder definition file.

forward(input_x)[source]

SPADE Style Encoder forward.

Parameters

input_x (N x 3 x H x W tensor) – Input images.

Returns

  • mu (N x C tensor): Mean vectors.

  • logvar (N x C tensor): Log-variance vectors.

  • z (N x C tensor): Style code vectors.

Return type

(tuple)

training = None
class imaginaire.generators.gancraft_base.StyleMLP(style_dim, out_dim, hidden_channels=256, leaky_relu=True, num_layers=5, normalize_input=True, output_act=True)[source]

Bases: torch.nn.modules.module.Module

MLP converting style code to intermediate style representation.

forward(z)[source]

Forward network.

Parameters

z (N x style_dim tensor) – Style codes.

training = None

imaginaire.generators.munit module

class imaginaire.generators.munit.AutoEncoder(num_filters=64, max_num_filters=256, num_filters_mlp=256, latent_dim=8, num_res_blocks=4, num_mlp_blocks=2, num_downsamples_style=4, num_downsamples_content=2, num_image_channels=3, content_norm_type='instance', style_norm_type='', decoder_norm_type='instance', weight_norm_type='', decoder_norm_params=namespace(affine=False), output_nonlinearity='', pre_act=False, apply_noise=False, **kwargs)[source]

Bases: torch.nn.modules.module.Module

Improved MUNIT autoencoder.

Parameters
  • num_filters (int) – Base filter numbers.

  • max_num_filters (int) – Maximum number of filters in the encoder.

  • num_filters_mlp (int) – Base filter number in the MLP module.

  • latent_dim (int) – Dimension of the style code.

  • num_res_blocks (int) – Number of residual blocks at the end of the content encoder.

  • num_mlp_blocks (int) – Number of layers in the MLP module.

  • num_downsamples_style (int) – Number of times we reduce resolution by 2x2 for the style image.

  • num_downsamples_content (int) – Number of times we reduce resolution by 2x2 for the content image.

  • num_image_channels (int) – Number of input image channels.

  • content_norm_type (str) – Type of activation normalization in the content encoder.

  • style_norm_type (str) – Type of activation normalization in the style encoder.

  • decoder_norm_type (str) – Type of activation normalization in the decoder.

  • weight_norm_type (str) – Type of weight normalization.

  • decoder_norm_params (obj) – Parameters of activation normalization in the decoder. If not None, decoder_norm_params.__dict__ will be used as keyword arguments when initializing activation normalization.

  • output_nonlinearity (str) – Type of nonlinearity before final output, 'tanh' or 'none'.

  • pre_act (bool) – If True, uses pre-activation residual blocks.

  • apply_noise (bool) – If True, injects Gaussian noise in the decoder.

decode(content, style)[source]

Decode content and style code to an image.

Parameters
  • content (Tensor) – Content code.

  • style (Tensor) – Style code.

Returns

Output images.

Return type

images (Tensor)

encode(images)[source]

Encode an image to content and style code.

Parameters

images (Tensor) – Input images.

Returns

  • content (Tensor): Content code.

  • style (Tensor): Style code.

Return type

(tuple)

forward(images)[source]

Reconstruct an image.

Parameters

images (Tensor) – Input images.

Returns

Reconstructed images.

Return type

images_recon (Tensor)

training = None
class imaginaire.generators.munit.Decoder(num_upsamples, num_res_blocks, num_filters, num_image_channels, style_channels, padding_mode, activation_norm_type, activation_norm_params, weight_norm_type, nonlinearity, output_nonlinearity, pre_act=False, apply_noise=False)[source]

Bases: torch.nn.modules.module.Module

Improved MUNIT decoder. The network consists of

  • $(num_res_blocks) residual blocks.

  • $(num_upsamples) residual blocks or convolutional blocks

  • output layer.

Parameters
  • num_upsamples (int) – Number of times we increase resolution by 2x2.

  • num_res_blocks (int) – Number of residual blocks.

  • num_filters (int) – Base filter numbers.

  • num_image_channels (int) – Number of input image channels.

  • style_channels (int) – Dimension of the style code.

  • padding_mode (string) – Type of padding.

  • activation_norm_type (str) – Type of activation normalization.

  • activation_norm_params (obj) – Parameters of activation normalization. If not None, decoder_norm_params.__dict__ will be used as keyword arguments when initializing activation normalization.

  • weight_norm_type (str) – Type of weight normalization.

  • nonlinearity (str) – Type of nonlinear activation function.

  • output_nonlinearity (str) – Type of nonlinearity before final output, 'tanh' or 'none'.

  • pre_act (bool) – If True, uses pre-activation residual blocks.

  • apply_noise (bool) – If True, injects Gaussian noise.

forward(x, style)[source]
Parameters
  • x (tensor) – Content embedding of the content image.

  • style (tensor) – Style embedding of the style image.

training = None
class imaginaire.generators.munit.Generator(gen_cfg, data_cfg)[source]

Bases: torch.nn.modules.module.Module

Improved MUNIT generator.

Parameters
  • gen_cfg (obj) – Generator definition part of the yaml config file.

  • data_cfg (obj) – Data definition part of the yaml config file.

forward(data, random_style=True, image_recon=True, latent_recon=True, cycle_recon=True, within_latent_recon=False)[source]

In MUNIT’s forward pass, the network generates a content code and a style code from images in both domains. It then performs a within-domain reconstruction step and a cross-domain translation step. In within-domain reconstruction, it reconstructs an image using the content and style from the same image and optionally encodes the image back to the latent space. In cross-domain translation, it generates a translated image by mixing the content and style from images in different domains, and optionally encodes the image back to the latent space.

Parameters
  • data (dict) – Training data at the current iteration. - images_a (tensor): Images from domain A. - images_b (tensor): Images from domain B.

  • random_style (bool) – If True, samples the style code from the prior distribution, otherwise uses the style code encoded from the input images in the other domain.

  • image_recon (bool) – If True, also returns reconstructed images.

  • latent_recon (bool) – If True, also returns reconstructed latent code during cross-domain translation.

  • cycle_recon (bool) – If True, also returns cycle reconstructed images.

  • within_latent_recon (bool) – If True, also returns reconstructed latent code during within-domain reconstruction.

inference(data, a2b=True, random_style=True)[source]

MUNIT inference.

Parameters
  • data (dict) – Training data at the current iteration. - images_a (tensor): Images from domain A. - images_b (tensor): Images from domain B.

  • a2b (bool) – If True, translates images from domain A to B, otherwise from B to A.

  • random_style (bool) – If True, samples the style code from the prior distribution, otherwise uses the style code encoded from the input images in the other domain.
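
A hypothetical translation sketch. It assumes cfg is an already-parsed yaml config with gen and data sections and that weights have been loaded elsewhere; the structure of the returned value is not asserted.

    import torch
    from imaginaire.generators.munit import Generator

    net_G = Generator(cfg.gen, cfg.data).eval()  # cfg: parsed yaml config (assumed)
    data = {
        'images_a': torch.randn(1, 3, 256, 256),  # domain-A images
        'images_b': torch.randn(1, 3, 256, 256),  # domain-B images
    }
    with torch.no_grad():
        # Translate A -> B, sampling the style code from the prior distribution.
        output = net_G.inference(data, a2b=True, random_style=True)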

training = None
class imaginaire.generators.munit.MLP(input_dim, output_dim, latent_dim, num_layers, norm, nonlinearity)[source]

Bases: torch.nn.modules.module.Module

The multi-layer perceptron (MLP) that maps Gaussian style code to a feature vector that is given as the conditional input to AdaIN.

Parameters
  • input_dim (int) – Number of channels in the input tensor.

  • output_dim (int) – Number of channels in the output tensor.

  • latent_dim (int) – Number of channels in the latent features.

  • num_layers (int) – Number of layers in the MLP.

  • norm (str) – Type of activation normalization.

  • nonlinearity (str) – Type of nonlinear activation function.

forward(x)[source]
Parameters

x (tensor) – Input image.

training = None
class imaginaire.generators.munit.StyleEncoder(num_downsamples, num_image_channels, num_filters, style_channels, padding_mode, activation_norm_type, weight_norm_type, nonlinearity)[source]

Bases: torch.nn.modules.module.Module

MUNIT style encoder.

Parameters
  • num_downsamples (int) – Number of times we reduce resolution by 2x2.

  • num_image_channels (int) – Number of input image channels.

  • num_filters (int) – Base filter numbers.

  • style_channels (int) – Dimension of the style code.

  • padding_mode (string) – Type of padding.

  • activation_norm_type (str) – Type of activation normalization.

  • weight_norm_type (str) – Type of weight normalization.

  • nonlinearity (str) – Type of nonlinear activation function.

forward(x)[source]
Parameters

x (tensor) – Input image.

training = None

imaginaire.generators.pix2pixHD module

class imaginaire.generators.pix2pixHD.Encoder(enc_cfg, data_cfg)[source]

Bases: torch.nn.modules.module.Module

Encoder for getting region-wise features for style control.

Parameters
  • enc_cfg (obj) – Encoder definition part of the yaml config file.

  • data_cfg (obj) – Data definition part of the yaml config file

forward(input, instance_map)[source]

Extract region-wise features.

Parameters
  • input (4D tensor) – Real RGB images.

  • instance_map (4D tensor) – Instance label mask.

Returns

Instance-wise average-pooled feature maps.

Return type

outputs_mean (4D tensor)

training = None
class imaginaire.generators.pix2pixHD.Generator(gen_cfg, data_cfg)[source]

Bases: torch.nn.modules.module.Module

Pix2pixHD coarse-to-fine generator constructor.

Parameters
  • gen_cfg (obj) – Generator definition part of the yaml config file.

  • data_cfg (obj) – Data definition part of the yaml config file.

forward(data, random_style=False)[source]

Coarse-to-fine generator forward.

Parameters
  • data (dict) – Dictionary of input data.

  • random_style (bool) – Always set to false for the pix2pixHD model.

Returns

Dictionary of output data.

Return type

output (dict)

inference(data, **kwargs)[source]

Generator inference.

Parameters

data (dict) – Dictionary of input data.

Returns

  • fake_images (tensor): Output fake images.

  • file_names (str): Data file names.

load_pretrained_network(pretrained_dict)[source]

Load a pretrained network.

training = None
class imaginaire.generators.pix2pixHD.GlobalGenerator(gen_cfg, data_cfg, num_input_channels, padding_mode, base_conv_block, base_res_block)[source]

Bases: torch.nn.modules.module.Module

Coarse generator constructor. This is the main generator in the pix2pixHD architecture.

Parameters
  • gen_cfg (obj) – Generator definition part of the yaml config file.

  • data_cfg (obj) – Data definition part of the yaml config file.

  • num_input_channels (int) – Number of segmentation labels.

  • padding_mode (str) – zero | reflect | …

  • base_conv_block (obj) – Conv block with preset attributes.

  • base_res_block (obj) – Residual block with preset attributes.

forward(input)[source]

Coarse-to-fine generator forward.

Parameters

input (4D tensor) – Input semantic representations.

Returns

Synthesized image by generator.

Return type

output (4D tensor)

training = None
class imaginaire.generators.pix2pixHD.LocalEnhancer(gen_cfg, data_cfg, num_input_channels, num_filters, padding_mode, base_conv_block, base_res_block, output_img=False)[source]

Bases: torch.nn.modules.module.Module

Local enhancer constructor. These are sub-networks that are useful when aiming to produce high-resolution outputs.

Parameters
  • gen_cfg (obj) – Local generator definition part of the yaml config file.

  • data_cfg (obj) – Data definition part of the yaml config file.

  • num_input_channels (int) – Number of segmentation labels.

  • num_filters (int) – Number of filters for the first layer.

  • padding_mode (str) – zero | reflect | …

  • base_conv_block (obj) – Conv block with preset attributes.

  • base_res_block (obj) – Residual block with preset attributes.

  • output_img (bool) – If True, the output is an image; otherwise, a feature map.

forward(output_coarse, input_fine)[source]

Local enhancer forward.

Parameters
  • output_coarse (4D tensor) – Coarse output from previous layer.

  • input_fine (4D tensor) – Fine input from current layer.

Returns

Refined output.

Return type

output (4D tensor)

training = None

imaginaire.generators.spade module

class imaginaire.generators.spade.Generator(gen_cfg, data_cfg)[source]

Bases: torch.nn.modules.module.Module

SPADE generator constructor.

Parameters
  • gen_cfg (obj) – Generator definition part of the yaml config file.

  • data_cfg (obj) – Data definition part of the yaml config file.

forward(data, random_style=False)[source]

SPADE Generator forward.

Parameters
  • data (dict) –

    • images (N x C1 x H x W tensor) : Ground truth images.

    • label (N x C2 x H x W tensor) : Semantic representations.

    • z (N x style_dims tensor): Gaussian random noise.

  • random_style (bool) – Whether to sample a random style vector.

Returns

  • fake_images (N x 3 x H x W tensor): Fake images.

  • mu (N x C1 tensor): Mean vectors.

  • logvar (N x C1 tensor): Log-variance vectors.

Return type

(dict)

inference(data, random_style=False, use_fixed_random_style=False, keep_original_size=False)[source]

Compute result images for a batch of input data and save the results in the specified folder.

Parameters
  • data (dict) –

    • images (N x C1 x H x W tensor) : Ground truth images

    • label (N x C2 x H x W tensor) : Semantic representations

    • z (N x style_dims tensor): Gaussian random noise

  • random_style (bool) – Whether to sample a random style vector.

  • use_fixed_random_style (bool) – Sample random style once and use it for all the remaining inference.

  • keep_original_size (bool) – Keep original size of the input.

Returns

  • fake_images (N x 3 x H x W tensor): Fake images.

  • mu (N x C1 tensor): Mean vectors.

  • logvar (N x C1 tensor): Log-variance vectors.

Return type

(dict)
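
A hypothetical inference sketch based on the inputs and outputs documented above. The parsed yaml config cfg, the number of semantic classes, and the 256x256 resolution are assumptions.

    import torch
    from imaginaire.generators.spade import Generator

    net_G = Generator(cfg.gen, cfg.data).eval()         # cfg: parsed yaml config (assumed)
    num_labels = 35                                     # assumed number of semantic classes
    data = {
        'label': torch.zeros(1, num_labels, 256, 256),  # one-hot semantic maps
    }
    with torch.no_grad():
        output = net_G.inference(data, random_style=True, keep_original_size=True)
    fake_images = output['fake_images']                 # per the Returns section above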

training = None
class imaginaire.generators.spade.SPADEGenerator(num_labels, out_image_small_side_size, image_channels, num_filters, kernel_size, style_dims, activation_norm_params, weight_norm_type, global_adaptive_norm_type, skip_activation_norm, use_posenc_in_input_layer, use_style_encoder, output_multiplier)[source]

Bases: torch.nn.modules.module.Module

SPADE Image Generator constructor.

Parameters
  • num_labels (int) – Number of different labels.

  • out_image_small_side_size (int) – Size of the smaller side of the output image, i.e. min(height, width).

  • image_channels (int) – Num. of channels of the output image.

  • num_filters (int) – Base filter numbers.

  • kernel_size (int) – Convolution kernel size.

  • style_dims (int) – Dimensions of the style code.

  • activation_norm_params (obj) – Spatially adaptive normalization param.

  • weight_norm_type (str) – Type of weight normalization. 'none', 'spectral', or 'weight'.

  • global_adaptive_norm_type (str) – Type of normalization in SPADE.

  • skip_activation_norm (bool) – If True, applies activation norm to the shortcut connection in residual blocks.

  • use_style_encoder (bool) – Whether to use global adaptive norm like conditional batch norm or adaptive instance norm.

  • output_multiplier (float) – A positive number multiplied with the output.

forward(data)[source]

SPADE Generator forward.

Parameters

data (dict) –

  • data (N x C1 x H x W tensor) : Ground truth images.

  • label (N x C2 x H x W tensor) : Semantic representations.

  • z (N x style_dims tensor): Gaussian random noise.

Returns

  • fake_images (N x 3 x H x W tensor): Fake images.

Return type

output (dict)

training = None
class imaginaire.generators.spade.StyleEncoder(style_enc_cfg)[source]

Bases: torch.nn.modules.module.Module

Style Encoder constructor.

Parameters

style_enc_cfg (obj) – Style encoder definition file.

forward(input_x)[source]

SPADE Style Encoder forward.

Parameters

input_x (N x 3 x H x W tensor) – Input images.

Returns

  • mu (N x C tensor): Mean vectors.

  • logvar (N x C tensor): Log-variance vectors.

  • z (N x C tensor): Style code vectors.

Return type

(tuple)
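
A minimal hypothetical sketch: encode a reference image into a style code that can condition the SPADE generator. style_enc_cfg is assumed to come from the parsed yaml config, and z is the style code vector listed in the Returns section above (in SPADE-style encoders it is typically reparameterized from mu and logvar).

    import torch
    from imaginaire.generators.spade import StyleEncoder

    enc = StyleEncoder(style_enc_cfg).eval()          # style_enc_cfg: from the config (assumed)
    mu, logvar, z = enc(torch.randn(1, 3, 256, 256))  # per the Returns section above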

training = None

imaginaire.generators.unit module

class imaginaire.generators.unit.AutoEncoder(num_filters=64, max_num_filters=256, num_res_blocks=4, num_downsamples_content=2, num_image_channels=3, content_norm_type='instance', decoder_norm_type='instance', weight_norm_type='', output_nonlinearity='', pre_act=False, apply_noise=False, **kwargs)[source]

Bases: torch.nn.modules.module.Module

Improved UNIT autoencoder.

Parameters
  • num_filters (int) – Base filter numbers.

  • max_num_filters (int) – Maximum number of filters in the encoder.

  • num_res_blocks (int) – Number of residual blocks at the end of the content encoder.

  • num_downsamples_content (int) – Number of times we reduce resolution by 2x2 for the content image.

  • num_image_channels (int) – Number of input image channels.

  • content_norm_type (str) – Type of activation normalization in the content encoder.

  • decoder_norm_type (str) – Type of activation normalization in the decoder.

  • weight_norm_type (str) – Type of weight normalization.

  • output_nonlinearity (str) – Type of nonlinearity before final output, 'tanh' or 'none'.

  • pre_act (bool) – If True, uses pre-activation residual blocks.

  • apply_noise (bool) – If True, injects Gaussian noise in the decoder.

forward(images)[source]

Reconstruct an image.

Parameters

images (Tensor) – Input images.

Returns

Reconstructed images.

Return type

images_recon (Tensor)

training = None
class imaginaire.generators.unit.ContentEncoder(num_downsamples, num_res_blocks, num_image_channels, num_filters, max_num_filters, padding_mode, activation_norm_type, weight_norm_type, nonlinearity, pre_act=False)[source]

Bases: torch.nn.modules.module.Module

Improved UNIT encoder. The network consists of:

  • input layers

  • $(num_downsamples) convolutional blocks

  • $(num_res_blocks) residual blocks.

  • output layer.

Parameters
  • num_downsamples (int) – Number of times we reduce resolution by 2x2.

  • num_res_blocks (int) – Number of residual blocks at the end of the content encoder.

  • num_image_channels (int) – Number of input image channels.

  • num_filters (int) – Base filter numbers.

  • max_num_filters (int) – Maximum number of filters in the encoder.

  • padding_mode (string) – Type of padding.

  • activation_norm_type (str) – Type of activation normalization.

  • weight_norm_type (str) – Type of weight normalization.

  • nonlinearity (str) – Type of nonlinear activation function.

  • pre_act (bool) – If True, uses pre-activation residual blocks.

forward(x)[source]
Parameters

x (tensor) – Input image.

training = None
class imaginaire.generators.unit.Decoder(num_upsamples, num_res_blocks, num_filters, num_image_channels, padding_mode, activation_norm_type, weight_norm_type, nonlinearity, output_nonlinearity, pre_act=False, apply_noise=False)[source]

Bases: torch.nn.modules.module.Module

Improved UNIT decoder. The network consists of:

  • $(num_res_blocks) residual blocks.

  • $(num_upsamples) residual blocks or convolutional blocks

  • output layer.

Parameters
  • num_upsamples (int) – Number of times we increase resolution by 2x2.

  • num_res_blocks (int) – Number of residual blocks.

  • num_filters (int) – Base filter numbers.

  • num_image_channels (int) – Number of input image channels.

  • padding_mode (string) – Type of padding.

  • activation_norm_type (str) – Type of activation normalization.

  • weight_norm_type (str) – Type of weight normalization.

  • nonlinearity (str) – Type of nonlinear activation function.

  • output_nonlinearity (str) – Type of nonlinearity before final output, 'tanh' or 'none'.

  • pre_act (bool) – If True, uses pre-activation residual blocks.

  • apply_noise (bool) – If True, injects Gaussian noise.

forward(x)[source]
Parameters

x (tensor) – Content embedding of the content image.

training = None
class imaginaire.generators.unit.Generator(gen_cfg, data_cfg)[source]

Bases: torch.nn.modules.module.Module

Improved UNIT generator.

Parameters
  • gen_cfg (obj) – Generator definition part of the yaml config file.

  • data_cfg (obj) – Data definition part of the yaml config file.

forward(data, image_recon=True, cycle_recon=True)[source]

UNIT forward function.

inference(data, a2b=True)[source]

UNIT inference.

Parameters
  • data (dict) – Training data at the current iteration. - images_a (tensor): Images from domain A. - images_b (tensor): Images from domain B.

  • a2b (bool) – If True, translates images from domain A to B, otherwise from B to A.
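
A hypothetical UNIT translation sketch, analogous to the MUNIT one above. cfg is assumed to be an already-parsed yaml config with gen and data sections; weights are assumed to be loaded elsewhere.

    import torch
    from imaginaire.generators.unit import Generator

    net_G = Generator(cfg.gen, cfg.data).eval()   # cfg: parsed yaml config (assumed)
    data = {
        'images_a': torch.randn(1, 3, 256, 256),  # domain-A images
        'images_b': torch.randn(1, 3, 256, 256),  # domain-B images
    }
    with torch.no_grad():
        output = net_G.inference(data, a2b=True)  # translate A -> B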

training = None

imaginaire.generators.vid2vid module

class imaginaire.generators.vid2vid.BaseNetwork[source]

Bases: torch.nn.modules.module.Module

Base network for the vid2vid generator.

get_num_filters(num_downsamples)[source]

Get the number of filters at current layer.

Parameters

num_downsamples (int) – How many downsamples at current layer.

Returns

Number of filters.

Return type

output (int)

training = None
class imaginaire.generators.vid2vid.FlowGenerator(flow_cfg, data_cfg)[source]

Bases: imaginaire.generators.vid2vid.BaseNetwork

Flow generator constructor.

Parameters
  • flow_cfg (obj) – Flow definition part of the yaml config file.

  • data_cfg (obj) – Data definition part of the yaml config file.

forward(label, img_prev)[source]

Flow generator forward.

Parameters
  • label (4D tensor) – Input label tensor.

  • img_prev (4D tensor) – Previously generated image tensors.

Returns

  • flow (4D tensor) : Generated flow map.

  • mask (4D tensor) : Generated occlusion mask.

Return type

(tuple)

training = None
class imaginaire.generators.vid2vid.Generator(gen_cfg, data_cfg)[source]

Bases: imaginaire.generators.vid2vid.BaseNetwork

vid2vid generator constructor.

Parameters
  • gen_cfg (obj) – Generator definition part of the yaml config file.

  • data_cfg (obj) – Data definition part of the yaml config file.

forward(data)[source]

vid2vid generator forward.

Parameters

data (dict) – Dictionary of input data.

Returns

Dictionary of output data.

Return type

output (dict)

get_cond_dims(num_downs=0)[source]

Get the dimensions of conditional inputs.

Parameters

num_downs (int) – How many downsamples at current layer.

Returns

List of dimensions.

Return type

ch (list)

get_cond_maps(label, embedder)[source]

Get the conditional inputs.

Parameters
  • label (4D tensor) – Input label tensor.

  • embedder (obj) – Embedding network.

Returns

List of conditional inputs.

Return type

cond_maps (list)

init_temporal_network(cfg_init=None)[source]

When starting to train with multiple frames, initialize the downsampling and flow networks.

Parameters

cfg_init (dict) – Weight initialization config.

one_up_conv_layer(x, encoded_label, i)[source]

One residual block layer in the main branch.

Parameters
  • x (4D tensor) – Current feature map.

  • encoded_label (list of tensors) – Encoded input label maps.

  • i (int) – Layer index.

Returns

Output feature map.

Return type

x (4D tensor)

training = None

imaginaire.generators.wc_vid2vid module

class imaginaire.generators.wc_vid2vid.Generator(gen_cfg, data_cfg)[source]

Bases: imaginaire.generators.vid2vid.Generator

World-consistent vid2vid generator constructor.

Parameters
  • gen_cfg (obj) – Generator definition part of the yaml config file.

  • data_cfg (obj) – Data definition part of the yaml config file

forward(data)[source]

Vid2vid generator forward.

Parameters

data (dict) – Dictionary of input data.

Returns

Dictionary of output data.

Return type

output (dict)

get_cond_dims(num_downs=0)[source]

Get the dimensions of conditional inputs.

Parameters

num_downs (int) – How many downsamples at current layer.

Returns

List of dimensions.

Return type

ch (list)

get_cond_maps(label, embedder)[source]

Get the conditional inputs.

Parameters
  • label (4D tensor) – Input label tensor.

  • embedder (obj) – Embedding network.

Returns

List of conditional inputs.

Return type

cond_maps (list)

get_guidance_images_and_masks(unprojection)[source]

Compute guidance images and masks from the unprojection data.

get_partial(num_downs=0)[source]

Get whether the convolutions should be partial or not.

Parameters

num_downs (int) – How many downsamples at current layer.

Returns

List of boolean partial or not.

Return type

partial (list)

renderer_update_point_cloud(image, point_info)[source]

Update the renderer’s color dictionary.

reset_renderer(is_flipped_input=False)[source]

Reset the renderer.

Parameters

is_flipped_input (bool) – Whether the input sequence is left-right flipped.

training = None

Module contents