imaginaire.generators package

Submodules

imaginaire.generators.coco_funit module

class imaginaire.generators.coco_funit.COCOFUNITTranslator(num_filters=64, num_filters_mlp=256, style_dims=64, usb_dims=1024, num_res_blocks=2, num_mlp_blocks=3, num_downsamples_style=4, num_downsamples_content=2, num_image_channels=3, weight_norm_type='', **kwargs)[source]

Bases: torch.nn.modules.module.Module

COCO-FUNIT Generator architecture.

Parameters
  • num_filters (int) – Base filter numbers.

  • num_filters_mlp (int) – Base filter number in the MLP module.

  • style_dims (int) – Dimension of the style code.

  • usb_dims (int) – Dimension of the universal style bias code.

  • num_res_blocks (int) – Number of residual blocks at the end of the content encoder.

  • num_mlp_blocks (int) – Number of layers in the MLP module.

  • num_downsamples_content (int) – Number of times we reduce resolution by 2x2 for the content image.

  • num_downsamples_style (int) – Number of times we reduce resolution by 2x2 for the style image.

  • num_image_channels (int) – Number of input image channels.

  • weight_norm_type (str) – Type of weight normalization. 'none', 'spectral', or 'weight'.

decode(content, style)[source]

Generate images by combining their content and style codes.

Parameters
  • content (tensor) – Content code tensor.

  • style (tensor) – Style code tensor.

encode(images)[source]

Encode images to get their content and style codes.

Parameters

images (tensor) – Input image tensor.

forward(images)[source]

Reconstruct the input image by combining the computed content and style codes.

Parameters

images (tensor) – Input image tensor.
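
The following is a hypothetical usage sketch (not part of the library documentation): it instantiates the translator with its default arguments, encodes a content and a style image, and decodes the mixed codes into a translated image. The (content, style) ordering of the values returned by encode and the 256x256 input size are assumptions.

    import torch
    from imaginaire.generators.coco_funit import COCOFUNITTranslator

    net = COCOFUNITTranslator()                  # default hyper-parameters listed above
    content_image = torch.randn(1, 3, 256, 256)  # N x C x H x W content image
    style_image = torch.randn(1, 3, 256, 256)    # N x C x H x W style image

    with torch.no_grad():
        content, _ = net.encode(content_image)   # keep the content code
        _, style = net.encode(style_image)       # keep the style code
        translated = net.decode(content, style)  # content of one image, style of the other
        reconstructed = net(content_image)       # forward(): reconstruct the input image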

training = None
class imaginaire.generators.coco_funit.Generator(gen_cfg, data_cfg)[source]

Bases: torch.nn.modules.module.Module

COCO-FUNIT Generator.

forward(data)[source]

In FUNIT’s forward pass, the network generates a content embedding and a style code from the content image, and a style code from the style image. Mixing the content code with the style code from the content image reconstructs the input image. Mixing the content code with the style code from the style image produces the translation output.

Parameters

data (dict) – Training data at the current iteration.

inference(data, keep_original_size=True)[source]

COCO-FUNIT inference.

Parameters
  • data (dict) – Training data at the current iteration. - images_content (tensor): Content images. - images_style (tensor): Style images.

  • a2b (bool) – If True, translates images from domain A to B, otherwise from B to A.

  • keep_original_size (bool) – If True, the output image is resized to the input content image size.
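
A hypothetical end-to-end inference sketch follows. It assumes cfg is an already-parsed yaml config exposing gen and data sections and that pretrained weights are loaded elsewhere; neither step is covered by this reference, and the structure of the returned value is not asserted here.

    import torch
    from imaginaire.generators.coco_funit import Generator

    net_G = Generator(cfg.gen, cfg.data).eval()  # cfg: parsed yaml config (assumed)
    data = {
        'images_content': torch.randn(1, 3, 256, 256),  # content images
        'images_style': torch.randn(1, 3, 256, 256),    # style images
    }
    with torch.no_grad():
        outputs = net_G.inference(data, keep_original_size=True)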

training = None

imaginaire.generators.dummy module

class imaginaire.generators.dummy.Generator(gen_cfg, data_cfg)[source]

Bases: torch.nn.modules.module.Module

Dummy generator.

Parameters
  • gen_cfg (obj) – Generator definition part of the yaml config file.

  • data_cfg (obj) – Data definition part of the yaml config file.

forward(data)[source]

Dummy Generator forward.

Parameters

data (dict) –

training = None

imaginaire.generators.fs_vid2vid module

class imaginaire.generators.fs_vid2vid.AttentionModule(atn_cfg, data_cfg, conv_2d_block, num_filters_each_layer)[source]

Bases: torch.nn.modules.module.Module

Attention module constructor.

Parameters
  • atn_cfg (obj) – Generator definition part of the yaml config file.

  • data_cfg (obj) – Data definition part of the yaml config file

  • conv_2d_block – Conv2DBlock constructor.

  • num_filters_each_layer (int) – The number of filters in each layer.

attention_encode(img, net_name)[source]

Encode the input image to get the attention map.

Parameters
  • img (NxCxHxW tensor) – Input image.

  • net_name (str) – Name for attention network.

Returns

Encoded feature.

Return type

x (NxC2xH2xW2 tensor)

forward(in_features, label, ref_label, attention=None)[source]

Get the attention map to combine multiple image features in the case of multiple reference images.

Parameters
  • in_features ((NxK)xC1xH1xW1 tensor) – Input features.

  • label (NxC2xH2xW2 tensor) – Target label.

  • ref_label (NxC2xH2xW2 tensor) – Reference label.

  • attention (Nx(KxH1xW1)x(H1xW1) tensor) – Attention maps.

Returns

  • out_features (NxC1xH1xW1 tensor): Attention-combined features.

  • attention (Nx(KxH1xW1)x(H1xW1) tensor): Attention maps.

  • atn_vis (1x1xH1xW1 tensor): Visualization for attention scores.

Return type

(tuple)

training = None
class imaginaire.generators.fs_vid2vid.FlowGenerator(flow_cfg, data_cfg, num_frames)[source]

Bases: torch.nn.modules.module.Module

Flow generator constructor.

Parameters
  • flow_cfg (obj) – Flow definition part of the yaml config file.

  • data_cfg (obj) – Data definition part of the yaml config file.

  • num_frames (int) – Number of input frames.

forward(label, ref_label, ref_image)[source]

Flow generator forward.

Parameters
  • label (4D tensor) – Input label tensor.

  • ref_label (4D tensor) – Reference label tensors.

  • ref_image (4D tensor) – Reference image tensors.

Returns

  • flow (4D tensor) : Generated flow map.

  • mask (4D tensor) : Generated occlusion mask.

Return type

(tuple)

training = None
class imaginaire.generators.fs_vid2vid.Generator(gen_cfg, data_cfg)[source]

Bases: torch.nn.modules.module.Module

Few-shot vid2vid generator constructor.

Parameters
  • gen_cfg (obj) – Generator definition part of the yaml config file.

  • data_cfg (obj) – Data definition part of the yaml config file.

SPADE_combine(encoded_label, cond_inputs)[source]

Use Multi-SPADE to combine the raw synthesized image with the warped images.

Parameters
  • encoded_label (list of tensors) – Original label map embeddings.

  • cond_inputs (list of tensors) – New SPADE conditional inputs from the warped images.

Returns

Combined conditional inputs.

Return type

encoded_label (list of tensors)

custom_init()[source]

This function is for dealing with the numerical issue that might occur when doing mixed precision training.

flow_generation(label, ref_labels, ref_images, prev_labels, prev_images, ref_idx)[source]

Generates flows and masks for warping reference / previous images.

Parameters
  • label (NxCxHxW tensor) – Target label map.

  • ref_labels (NxKxCxHxW tensor) – Reference label maps.

  • ref_images (NxKx3xHxW tensor) – Reference images.

  • prev_labels (NxTxCxHxW tensor) – Previous label maps.

  • prev_images (NxTx3xHxW tensor) – Previous images.

  • ref_idx (Nx1 tensor) – Index for which image to use from the reference images.

Returns

  • flow (list of Nx2xHxW tensor): Optical flows.

  • occ_mask (list of Nx1xHxW tensor): Occlusion masks.

  • img_warp (list of Nx3xHxW tensor): Warped reference / previous images.

  • cond_inputs (list of Nx4xHxW tensor): Conditional inputs for SPADE combination.

Return type

(tuple)

forward(data)[source]

Few-shot vid2vid generator forward.

Parameters

data (dict) – Dictionary of input data.

Returns

Dictionary of output data.

Return type

output (dict)

init_network_weights(net_src, net_dst)[source]

Initialize weights in net_dst with those in net_src.

init_temporal_network(cfg_init=None)[source]

When starting to train with multiple frames, initialize the flow network.

Parameters

cfg_init (dict) – Weight initialization config.

load_pretrained_network(pretrained_dict, prefix='module.')[source]

Load the pretrained weights into this network.

Parameters
  • pretrained_dict (dict) – Pretrained network weights.

  • prefix (str) – Prefix to the network weights name.

one_up_conv_layer(x, encoded_label, conv_weight, norm_weight, i)[source]

One residual block layer in the main branch.

Parameters
  • x (4D tensor) – Current feature map.

  • encoded_label (list of tensors) – Encoded input label maps.

  • conv_weight (list of tensors) – Hyper conv weights.

  • norm_weight (list of tensors) – Hyper norm weights.

  • i (int) – Layer index.

Returns

Output feature map.

Return type

x (4D tensor)

reset()[source]

Reset the network at the beginning of a sequence.

training = None
class imaginaire.generators.fs_vid2vid.LabelEmbedder(emb_cfg, num_input_channels, num_hyper_layers=0)[source]

Bases: torch.nn.modules.module.Module

Embed the input label map to get embedded features.

Parameters
  • emb_cfg (obj) – Embed network configuration.

  • num_input_channels (int) – Number of input channels.

  • num_hyper_layers (int) – Number of hyper layers.

forward(input, weights=None)[source]

Embedding network forward.

Parameters
  • input (NxCxHxW tensor) – Network input.

  • weights (list of tensors) – Conv weights if using hyper network.

Returns

Network outputs at different layers.

Return type

output (list of tensors)

training = None
class imaginaire.generators.fs_vid2vid.WeightGenerator(gen_cfg, data_cfg)[source]

Bases: torch.nn.modules.module.Module

Weight generator constructor.

Parameters
  • gen_cfg (obj) – Generator definition part of the yaml config file.

  • data_cfg (obj) – Data definition part of the yaml config file

encode_reference(ref_image, ref_label, label, k)[source]

Encode the reference image to get features for weight generation.

Parameters
  • ref_image ((NxK)x3xHxW tensor) – Reference images.

  • ref_label ((NxK)xCxHxW tensor) – Reference labels.

  • label (NxCxHxW tensor) – Target label.

  • k (int) – Number of reference images.

Returns

  • x (NxC2xH2xW2 tensor): Encoded features from reference images for the main branch (as input to the decoder).

  • encoded_ref (list of tensors): Encoded features from reference images for the weight generation branch.

  • attention (Nx(KxH1xW1)x(H1xW1) tensor): Attention maps.

  • atn_vis (1x1xH1xW1 tensor): Visualization for attention scores.

  • ref_idx (Nx1 tensor): Index for which image to use from the reference images.

Return type

(tuple)

forward(ref_image, ref_label, label, is_first_frame)[source]

Generate network weights based on the reference images.

Parameters
  • ref_image (NxKx3xHxW tensor) – Reference images.

  • ref_label (NxKxCxHxW tensor) – Reference labels.

  • label (NxCxHxW tensor) – Target label.

  • is_first_frame (bool) – Whether the current frame is the first frame.

Returns

  • x (NxC2xH2xW2 tensor): Encoded features from reference images for the main branch (as input to the decoder).

  • encoded_label (list of tensors): Encoded target label map for SPADE.

  • conv_weights (list of tensors): Network weights for conv layers in the main network.

  • norm_weights (list of tensors): Network weights for SPADE layers in the main network.

  • attention (Nx(KxH1xW1)x(H1xW1) tensor): Attention maps.

  • atn_vis (1x1xH1xW1 tensor): Visualization for attention scores.

  • ref_idx (Nx1 tensor): Index for which image to use from the reference images.

Return type

(tuple)

get_conv_weights(x, i)[source]

Adaptively generate weights for layer i in main branch convolutions.

Parameters
  • x (NxCxHxW tensor) – Input features.

  • i (int) – Layer index.

Returns

  • conv_weights (list of tensors): Weights for the conv layers in the main branch.

Return type

(tuple)

get_norm_weights(x, i)[source]

Adaptively generate weights for SPADE in layer i of generator.

Parameters
  • x (NxCxHxW tensor) – Input features.

  • i (int) – Layer index.

Returns

  • embedding_weights (list of tensors): Weights for the label embedding network.

  • norm_weights (list of tensors): Weights for the SPADE layers.

Return type

(tuple)

reset()[source]

Reset the network at the beginning of a sequence.

training = None
class imaginaire.generators.fs_vid2vid.WeightReshaper[source]

Bases: object

Handles all weight reshape related tasks.

reshape_embed_input(x)[source]

Reshape the input to be (B x C) x H x W.

Parameters

x (tensor or list of tensors) – Input features.

Returns

Reshaped features.

Return type

x (tensor or list of tensors)

reshape_weight(x, weight_shape)[source]

Reshape input x to the desired weight shape.

Parameters
  • x (tensor or list of tensors) – Input features.

  • weight_shape (list of int) – Desired shape of the weight.

Returns

  • weight (tensor): Network weights

  • bias (tensor): Network bias.

Return type

(tuple)

split_weights(weight, sizes)[source]

When the desired shape is a list, first divide the input to each corresponding weight shape in the list.

Parameters
  • weight (tensor) – Input weight.

  • sizes (int or list of int) – Target sizes.

Returns

Divided weights.

Return type

weight (list of tensors)

sum(x)[source]

Sum all elements recursively in a nested list.

Parameters

x (nested list of int) – Input list of elements.

Returns

Sum of all elements.

Return type

out (int)

sum_mul(x)[source]

Given a weight shape, compute the number of elements needed for weight + bias. If input is a list of shapes, sum all the elements.

Parameters

x (list of int) – Input list of elements.

Returns

Summed number of elements.

Return type

out (int or list of int)
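
A small hypothetical sketch of the bookkeeping helpers above; the conv-style weight shape is an illustrative assumption.

    from imaginaire.generators.fs_vid2vid import WeightReshaper

    reshaper = WeightReshaper()
    total = reshaper.sum([[1, 2], [3, [4]]])   # 10: recursive sum over a nested list

    weight_shape = [64, 32, 3, 3]              # hypothetical conv weight shape
    n_params = reshaper.sum_mul(weight_shape)  # number of elements for weight + bias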

imaginaire.generators.funit module

class imaginaire.generators.funit.ContentEncoder(num_downsamples, num_res_blocks, image_channels, num_filters, padding_mode, activation_norm_type, weight_norm_type, nonlinearity)[source]

Bases: torch.nn.modules.module.Module

Improved FUNIT Content Encoder. This is basically the same as the original FUNIT content encoder.

Parameters
  • num_downsamples (int) – Number of times we reduce resolution by 2x2.

  • num_res_blocks (int) – Number of times we append residual block after all the downsampling modules.

  • image_channels (int) – Number of input image channels.

  • num_filters (int) – Base filter number.

  • padding_mode (str) – Padding mode

  • activation_norm_type (str) – Type of activation normalization.

  • weight_norm_type (str) – Type of weight normalization. 'none', 'spectral', or 'weight'.

  • nonlinearity (str) – Nonlinearity.

forward(x)[source]
Parameters

x (tensor) – Input image.

training = None
class imaginaire.generators.funit.Decoder(num_enc_output_channels, style_channels, num_image_channels=3, num_upsamples=4, padding_type='reflect', weight_norm_type='none', nonlinearity='relu')[source]

Bases: torch.nn.modules.module.Module

Improved FUNIT decoder.

Parameters
  • num_enc_output_channels (int) – Number of content feature channels.

  • style_channels (int) – Dimension of the style code.

  • num_image_channels (int) – Number of image channels.

  • num_upsamples (int) – How many times we are going to apply upsample residual block.

forward(x, style)[source]
Parameters
  • x (tensor) – Content embedding of the content image.

  • style (tensor) – Style embedding of the style image.

training = None
class imaginaire.generators.funit.FUNITTranslator(num_filters=64, num_filters_mlp=256, style_dims=64, num_res_blocks=2, num_mlp_blocks=3, num_downsamples_style=4, num_downsamples_content=2, num_image_channels=3, weight_norm_type='', **kwargs)[source]

Bases: torch.nn.modules.module.Module

Parameters
  • num_filters (int) – Base filter numbers.

  • num_filters_mlp (int) – Base filter number in the MLP module.

  • style_dims (int) – Dimension of the style code.

  • num_res_blocks (int) – Number of residual blocks at the end of the content encoder.

  • num_mlp_blocks (int) – Number of layers in the MLP module.

  • num_downsamples_content (int) – Number of times we reduce resolution by 2x2 for the content image.

  • num_downsamples_style (int) – Number of times we reduce resolution by 2x2 for the style image.

  • num_image_channels (int) – Number of input image channels.

  • weight_norm_type (str) – Type of weight normalization. 'none', 'spectral', or 'weight'.

decode(content, style)[source]

Generate images by combining their content and style codes.

Parameters
  • content (tensor) – Content code tensor.

  • style (tensor) – Style code tensor.

encode(images)[source]

Encode images to get their content and style codes.

Parameters

images (tensor) – Input image tensor.

forward(images)[source]

Reconstruct the input image by combining the computed content and style codes.

Parameters

images (tensor) – Input image tensor.

training = None
class imaginaire.generators.funit.Generator(gen_cfg, data_cfg)[source]

Bases: torch.nn.modules.module.Module

Generator of the improved FUNIT baseline in the COCO-FUNIT paper.

forward(data)[source]

In FUNIT’s forward pass, the network generates a content embedding and a style code from the content image, and a style code from the style image. Mixing the content code with the style code from the content image reconstructs the input image. Mixing the content code with the style code from the style image produces the translation output.

Parameters

data (dict) – Training data at the current iteration.

inference(data, keep_original_size=True)[source]

FUNIT inference.

Parameters
  • data (dict) – Training data at the current iteration. - images_content (tensor): Content images. - images_style (tensor): Style images.

  • a2b (bool) – If True, translates images from domain A to B, otherwise from B to A.

  • keep_original_size (bool) – If True, the output image is resized to the input content image size.

training = None
class imaginaire.generators.funit.MLP(input_dim, output_dim, latent_dim, num_layers, activation_norm_type, nonlinearity)[source]

Bases: torch.nn.modules.module.Module

Improved FUNIT style decoder.

Parameters
  • input_dim (int) – Input dimension (style code dimension).

  • output_dim (int) – Output dimension (to be fed into the AdaIN layer).

  • latent_dim (int) – Latent dimension.

  • num_layers (int) – Number of layers in the MLP.

  • activation_norm_type (str) – Type of activation normalization.

  • nonlinearity (str) – Nonlinearity type.

forward(x)[source]
Parameters

x (tensor) – Input tensor.

training = None
class imaginaire.generators.funit.StyleEncoder(num_downsamples, image_channels, num_filters, style_channels, padding_mode, activation_norm_type, weight_norm_type, nonlinearity)[source]

Bases: torch.nn.modules.module.Module

Improved FUNIT Style Encoder. This is basically the same as the original FUNIT Style Encoder.

Parameters
  • num_downsamples (int) – Number of times we reduce resolution by 2x2.

  • image_channels (int) – Number of input image channels.

  • num_filters (int) – Base filter number.

  • style_channels (int) – Style code dimension.

  • padding_mode (str) – Padding mode.

  • activation_norm_type (str) – Type of activation normalization.

  • weight_norm_type (str) – Type of weight normalization. 'none', 'spectral', or 'weight'.

  • nonlinearity (str) – Nonlinearity.

forward(x)[source]
Parameters

x (tensor) – Input image.

training = None

imaginaire.generators.gancraft module

class imaginaire.generators.gancraft.Generator(gen_cfg, data_cfg)[source]

Bases: imaginaire.generators.gancraft_base.Base3DGenerator

GANcraft generator constructor.

Parameters
  • gen_cfg (obj) – Generator definition part of the yaml config file.

  • data_cfg (obj) – Data definition part of the yaml config file.

custom_init()[source]

Weight initialization of GANcraft components.

forward(data, random_style=False)[source]

GANcraft Generator forward.

Parameters
  • data (dict) –

    • images (N x 3 x H x W tensor): Real images.

    • voxel_id (N x H x W x max_samples x 1 tensor): IDs of intersected tensors along each ray.

    • depth2 (N x 2 x H x W x max_samples x 1 tensor): Depths of entrance and exit points for each ray-voxel intersection.

    • raydirs (N x H x W x 1 x 3 tensor): The direction of each ray.

    • cam_ori_t (N x 3 tensor): Camera origins.

  • random_style (bool) – Whether to sample a random style vector.

Returns

  • fake_images (N x 3 x H x W tensor): Fake images.

  • mu (N x C1 tensor): Mean vectors.

  • logvar (N x C1 tensor): Log-variance vectors.

Return type

output (dict)

get_pseudo_gt(pseudo_gen, voxel_id, z=None, style_img=None, resize_512=True, deterministic=False)[source]

Evaluate the img2img network to obtain pseudo-ground truth images.

Parameters
  • pseudo_gen (callable) – Function converting mask to image using img2img network.

  • voxel_id (N x img_dims[0] x img_dims[1] x max_samples x 1 tensor) – IDs of intersected tensors along each ray.

  • z (N x C tensor) – Optional style code passed to pseudo_gen.

  • style_img (N x 3 x H x W tensor) – Optional style image passed to pseudo_gen.

  • resize_512 (bool) – If True, evaluate pseudo_gen at 512x512 regardless of input resolution.

  • deterministic (bool) – If True, disable stochastic label mapping.

inference(output_dir, camera_mode, style_img_path=None, seed=1, pad=30, num_samples=40, num_blocks_early_stop=6, sample_depth=3, tile_size=128, resolution_hw=[540, 960], cam_ang=72, cam_maxstep=10)[source]

Compute result images according to the provided camera trajectory and save the results in the specified folder. The full image is evaluated in multiple tiles to save memory.

Parameters
  • output_dir (str) – Where should the results be stored.

  • camera_mode (int) – Which camera trajectory to use.

  • style_img_path (str) – Path to the style-conditioning image.

  • seed (int) – Random seed (controls the style when style_img_path is not specified).

  • pad (int) – Pixels to remove from the image tiles before stitching. Should be equal to or larger than the receptive field of the CNN to avoid border artifacts.

  • num_samples (int) – Number of samples per ray (different from training).

  • num_blocks_early_stop (int) – Max number of intersected boxes per ray before stopping (different from training).

  • sample_depth (float) – Max distance traveled through boxes before stopping (different from training).

  • tile_size (int) – Max size of a tile in pixels.

  • resolution_hw (list [H, W]) – Resolution of the output image.

  • cam_ang (float) – Horizontal FOV of the camera (may be adjusted by the camera controller).

  • cam_maxstep (int) – Number of frames sampled from the camera trajectory.
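
A hypothetical offline-rendering sketch using only the arguments documented above. It assumes net_G is an already-constructed, checkpoint-loaded imaginaire.generators.gancraft.Generator; how the voxel world and configs are prepared is outside this reference.

    # Render 10 frames along a built-in camera trajectory and write them to disk.
    net_G.inference(
        output_dir='./gancraft_results',  # where the result images are stored
        camera_mode=0,                    # which camera trajectory to use
        style_img_path=None,              # style is then controlled by `seed`
        seed=1,
        resolution_hw=[540, 960],         # output resolution [H, W]
        tile_size=128,                    # evaluate the full image in tiles
        cam_maxstep=10,                   # number of frames along the trajectory
    )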

sample_camera(data, pseudo_gen)[source]

Sample camera randomly and precompute everything used by both Gen and Dis.

Parameters
  • data (dict) –

    • images (N x 3 x H x W tensor): Real images.

    • label (N x C2 x H x W tensor): Segmentation maps.

  • pseudo_gen (callable) – Function converting mask to image using img2img network.

Returns

  • voxel_id (N x H x W x max_samples x 1 tensor): IDs of intersected tensors along each ray.

  • depth2 (N x 2 x H x W x max_samples x 1 tensor): Depths of entrance and exit points for each ray-voxel intersection.

  • raydirs (N x H x W x 1 x 3 tensor): The direction of each ray.

  • cam_ori_t (N x 3 tensor): Camera origins.

  • pseudo_real_img (N x 3 x H x W tensor): Pseudo-ground truth images.

  • real_masks (N x C3 x H x W tensor): One-hot segmentation maps for real images, with translated labels.

  • fake_masks (N x C3 x H x W tensor): One-hot segmentation maps for sampled camera views.

Return type

ret (dict)

training = None

imaginaire.generators.gancraft_base module

class imaginaire.generators.gancraft_base.Base3DGenerator(gen_cfg, data_cfg)[source]

Bases: torch.nn.modules.module.Module

Minecraft 3D generator constructor.

Parameters
  • gen_cfg (obj) – Generator definition part of the yaml config file.

  • data_cfg (obj) – Data definition part of the yaml config file.

get_param_groups(cfg_opt)[source]

Return the parameter groups used to construct the optimizer.

Parameters

cfg_opt (obj) – Optimizer definition part of the yaml config file.
training = None
class imaginaire.generators.gancraft_base.RenderCNN(in_channels, style_dim, hidden_channels=256, leaky_relu=True)[source]

Bases: torch.nn.modules.module.Module

CNN converting intermediate feature map to final image.

forward(x, z)[source]

Forward network.

Parameters
  • x (N x in_channels x H x W tensor) – Intermediate feature map

  • z (N x style_dim tensor) – Style codes.

modulate(x, w, b)[source]

Apply affine modulation: scale the features x by w and shift them by b.
training = None
class imaginaire.generators.gancraft_base.RenderMLP(in_channels, style_dim, viewdir_dim, mask_dim=680, out_channels_s=1, out_channels_c=3, hidden_channels=256, use_seg=True)[source]

Bases: torch.nn.modules.module.Module

MLP with affine modulation.

forward(x, raydir, z, m)[source]

Forward network.

Parameters
  • x (N x H x W x M x in_channels tensor) – Projected features.

  • raydir (N x H x W x 1 x viewdir_dim tensor) – Ray directions.

  • z (N x style_dim tensor) – Style codes.

  • m (N x H x W x M x mask_dim tensor) – One-hot segmentation maps.

training = None
class imaginaire.generators.gancraft_base.SKYMLP(in_channels, style_dim, out_channels_c=3, hidden_channels=256, leaky_relu=True)[source]

Bases: torch.nn.modules.module.Module

MLP converting ray directions to sky features.

forward(x, z)[source]

Forward network.

Parameters
  • x (... x in_channels tensor) – Ray direction embeddings.

  • z (... x style_dim tensor) – Style codes.

training = None
class imaginaire.generators.gancraft_base.StyleEncoder(style_enc_cfg)[source]

Bases: torch.nn.modules.module.Module

Style Encoder constructor.

Parameters

style_enc_cfg (obj) – Style encoder definition file.

forward(input_x)[source]

SPADE Style Encoder forward.

Parameters

input_x (N x 3 x H x W tensor) – Input images.

Returns

  • mu (N x C tensor): Mean vectors.

  • logvar (N x C tensor): Log-variance vectors.

  • z (N x C tensor): Style code vectors.

Return type

(tuple)

training = None
class imaginaire.generators.gancraft_base.StyleMLP(style_dim, out_dim, hidden_channels=256, leaky_relu=True, num_layers=5, normalize_input=True, output_act=True)[source]

Bases: torch.nn.modules.module.Module

MLP converting style code to intermediate style representation.

forward(z)[source]

Forward network.

Parameters

z (N x style_dim tensor) – Style codes.

training = None

imaginaire.generators.munit module

class imaginaire.generators.munit.AutoEncoder(num_filters=64, max_num_filters=256, num_filters_mlp=256, latent_dim=8, num_res_blocks=4, num_mlp_blocks=2, num_downsamples_style=4, num_downsamples_content=2, num_image_channels=3, content_norm_type='instance', style_norm_type='', decoder_norm_type='instance', weight_norm_type='', decoder_norm_params=namespace(affine=False), output_nonlinearity='', pre_act=False, apply_noise=False, **kwargs)[source]

Bases: torch.nn.modules.module.Module

Improved MUNIT autoencoder.

Parameters
  • num_filters (int) – Base filter numbers.

  • max_num_filters (int) – Maximum number of filters in the encoder.

  • num_filters_mlp (int) – Base filter number in the MLP module.

  • latent_dim (int) – Dimension of the style code.

  • num_res_blocks (int) – Number of residual blocks at the end of the content encoder.

  • num_mlp_blocks (int) – Number of layers in the MLP module.

  • num_downsamples_style (int) – Number of times we reduce resolution by 2x2 for the style image.

  • num_downsamples_content (int) – Number of times we reduce resolution by 2x2 for the content image.

  • num_image_channels (int) – Number of input image channels.

  • content_norm_type (str) – Type of activation normalization in the content encoder.

  • style_norm_type (str) – Type of activation normalization in the style encoder.

  • decoder_norm_type (str) – Type of activation normalization in the decoder.

  • weight_norm_type (str) – Type of weight normalization.

  • decoder_norm_params (obj) – Parameters of activation normalization in the decoder. If not None, decoder_norm_params.__dict__ will be used as keyword arguments when initializing activation normalization.

  • output_nonlinearity (str) – Type of nonlinearity before final output, 'tanh' or 'none'.

  • pre_act (bool) – If True, uses pre-activation residual blocks.

  • apply_noise (bool) – If True, injects Gaussian noise in the decoder.

decode(content, style)[source]

Decode content and style code to an image.

Parameters
  • content (Tensor) – Content code.

  • style (Tensor) – Style code.

Returns

Output images.

Return type

images (Tensor)

encode(images)[source]

Encode an image to content and style code.

Parameters

images (Tensor) – Input images.

Returns

  • content (Tensor): Content code.

  • style (Tensor): Style code.

Return type

(tuple)

forward(images)[source]

Reconstruct an image.

Parameters

images (Tensor) – Input images.

Returns

Reconstructed images.

Return type

images_recon (Tensor)

training = None
class imaginaire.generators.munit.Decoder(num_upsamples, num_res_blocks, num_filters, num_image_channels, style_channels, padding_mode, activation_norm_type, activation_norm_params, weight_norm_type, nonlinearity, output_nonlinearity, pre_act=False, apply_noise=False)[source]

Bases: torch.nn.modules.module.Module

Improved MUNIT decoder. The network consists of

  • $(num_res_blocks) residual blocks.

  • $(num_upsamples) residual blocks or convolutional blocks

  • output layer.

Parameters
  • num_upsamples (int) – Number of times we increase resolution by 2x2.

  • num_res_blocks (int) – Number of residual blocks.

  • num_filters (int) – Base filter numbers.

  • num_image_channels (int) – Number of input image channels.

  • style_channels (int) – Dimension of the style code.

  • padding_mode (string) – Type of padding.

  • activation_norm_type (str) – Type of activation normalization.

  • activation_norm_params (obj) – Parameters of activation normalization. If not None, decoder_norm_params.__dict__ will be used as keyword arguments when initializing activation normalization.

  • weight_norm_type (str) – Type of weight normalization.

  • nonlinearity (str) – Type of nonlinear activation function.

  • output_nonlinearity (str) – Type of nonlinearity before final output, 'tanh' or 'none'.

  • pre_act (bool) – If True, uses pre-activation residual blocks.

  • apply_noise (bool) – If True, injects Gaussian noise.

forward(x, style)[source]
Parameters
  • x (tensor) – Content embedding of the content image.

  • style (tensor) – Style embedding of the style image.

training = None
class imaginaire.generators.munit.Generator(gen_cfg, data_cfg)[source]

Bases: torch.nn.modules.module.Module

Improved MUNIT generator.

Parameters
  • gen_cfg (obj) – Generator definition part of the yaml config file.

  • data_cfg (obj) – Data definition part of the yaml config file.

forward(data, random_style=True, image_recon=True, latent_recon=True, cycle_recon=True, within_latent_recon=False)[source]

In MUNIT’s forward pass, the network generates a content code and a style code from images in both domains. It then performs a within-domain reconstruction step and a cross-domain translation step. In within-domain reconstruction, it reconstructs an image using the content and style from the same image and optionally encodes the image back to the latent space. In cross-domain translation, it generates a translated image by mixing the content and style from images in different domains, and optionally encodes the image back to the latent space.

Parameters
  • data (dict) – Training data at the current iteration. - images_a (tensor): Images from domain A. - images_b (tensor): Images from domain B.

  • random_style (bool) – If True, samples the style code from the prior distribution, otherwise uses the style code encoded from the input images in the other domain.

  • image_recon (bool) – If True, also returns reconstructed images.

  • latent_recon (bool) – If True, also returns reconstructed latent code during cross-domain translation.

  • cycle_recon (bool) – If True, also returns cycle reconstructed images.

  • within_latent_recon (bool) – If True, also returns reconstructed latent code during within-domain reconstruction.

inference(data, a2b=True, random_style=True)[source]

MUNIT inference.

Parameters
  • data (dict) – Training data at the current iteration. - images_a (tensor): Images from domain A. - images_b (tensor): Images from domain B.

  • a2b (bool) – If True, translates images from domain A to B, otherwise from B to A.

  • random_style (bool) – If True, samples the style code from the prior distribution, otherwise uses the style code encoded from the input images in the other domain.
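
A hypothetical translation sketch. It assumes cfg is an already-parsed yaml config with gen and data sections and that weights have been loaded elsewhere; the structure of the returned value is not asserted.

    import torch
    from imaginaire.generators.munit import Generator

    net_G = Generator(cfg.gen, cfg.data).eval()  # cfg: parsed yaml config (assumed)
    data = {
        'images_a': torch.randn(1, 3, 256, 256),  # domain-A images
        'images_b': torch.randn(1, 3, 256, 256),  # domain-B images
    }
    with torch.no_grad():
        # Translate A -> B, sampling the style code from the prior distribution.
        output = net_G.inference(data, a2b=True, random_style=True)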

training = None
class imaginaire.generators.munit.MLP(input_dim, output_dim, latent_dim, num_layers, norm, nonlinearity)[source]

Bases: torch.nn.modules.module.Module

The multi-layer perceptron (MLP) that maps Gaussian style code to a feature vector that is given as the conditional input to AdaIN.

Parameters
  • input_dim (int) – Number of channels in the input tensor.

  • output_dim (int) – Number of channels in the output tensor.

  • latent_dim (int) – Number of channels in the latent features.

  • num_layers (int) – Number of layers in the MLP.

  • norm (str) – Type of activation normalization.

  • nonlinearity (str) – Type of nonlinear activation function.

forward(x)[source]
Parameters

x (tensor) – Input image.

training = None
class imaginaire.generators.munit.StyleEncoder(num_downsamples, num_image_channels, num_filters, style_channels, padding_mode, activation_norm_type, weight_norm_type, nonlinearity)[source]

Bases: torch.nn.modules.module.Module

MUNIT style encoder.

Parameters
  • num_downsamples (int) – Number of times we reduce resolution by 2x2.

  • num_image_channels (int) – Number of input image channels.

  • num_filters (int) – Base filter numbers.

  • style_channels (int) – Dimension of the style code.

  • padding_mode (string) – Type of padding.

  • activation_norm_type (str) – Type of activation normalization.

  • weight_norm_type (str) – Type of weight normalization.

  • nonlinearity (str) – Type of nonlinear activation function.

forward(x)[source]
Parameters

x (tensor) – Input image.

training = None

imaginaire.generators.pix2pixHD module

class imaginaire.generators.pix2pixHD.Encoder(enc_cfg, data_cfg)[source]

Bases: torch.nn.modules.module.Module

Encoder for getting region-wise features for style control.

Parameters
  • enc_cfg (obj) – Encoder definition part of the yaml config file.

  • data_cfg (obj) – Data definition part of the yaml config file

forward(input, instance_map)[source]

Extract region-wise features.

Parameters
  • input (4D tensor) – Real RGB images.

  • instance_map (4D tensor) – Instance label mask.

Returns

Instance-wise average-pooled feature maps.

Return type

outputs_mean (4D tensor)

training = None
class imaginaire.generators.pix2pixHD.Generator(gen_cfg, data_cfg)[source]

Bases: torch.nn.modules.module.Module

Pix2pixHD coarse-to-fine generator constructor.

Parameters
  • gen_cfg (obj) – Generator definition part of the yaml config file.

  • data_cfg (obj) – Data definition part of the yaml config file.

forward(data, random_style=False)[source]

Coarse-to-fine generator forward.

Parameters
  • data (dict) – Dictionary of input data.

  • random_style (bool) – Always set to false for the pix2pixHD model.

Returns

Dictionary of output data.

Return type

output (dict)

inference(data, **kwargs)[source]

Generator inference.

Parameters

data (dict) – Dictionary of input data.

Returns

  • fake_images (tensor): Output fake images.

  • file_names (str): Data file names.

load_pretrained_network(pretrained_dict)[source]

Load a pretrained network.

training = None
class imaginaire.generators.pix2pixHD.GlobalGenerator(gen_cfg, data_cfg, num_input_channels, padding_mode, base_conv_block, base_res_block)[source]

Bases: torch.nn.modules.module.Module

Coarse generator constructor. This is the main generator in the pix2pixHD architecture.

Parameters
  • gen_cfg (obj) – Generator definition part of the yaml config file.

  • data_cfg (obj) – Data definition part of the yaml config file.

  • num_input_channels (int) – Number of segmentation labels.

  • padding_mode (str) – zero | reflect | …

  • base_conv_block (obj) – Conv block with preset attributes.

  • base_res_block (obj) – Residual block with preset attributes.

forward(input)[source]

Coarse-to-fine generator forward.

Parameters

input (4D tensor) – Input semantic representations.

Returns

Synthesized image by generator.

Return type

output (4D tensor)

training = None
class imaginaire.generators.pix2pixHD.LocalEnhancer(gen_cfg, data_cfg, num_input_channels, num_filters, padding_mode, base_conv_block, base_res_block, output_img=False)[source]

Bases: torch.nn.modules.module.Module

Local enhancer constructor. These are sub-networks that are useful when aiming to produce high-resolution outputs.

Parameters
  • gen_cfg (obj) – Local generator definition part of the yaml config file.

  • data_cfg (obj) – Data definition part of the yaml config file.

  • num_input_channels (int) – Number of segmentation labels.

  • num_filters (int) – Number of filters for the first layer.

  • padding_mode (str) – zero | reflect | …

  • base_conv_block (obj) – Conv block with preset attributes.

  • base_res_block (obj) – Residual block with preset attributes.

  • output_img (bool) – If True, the output is an image; otherwise, a feature map.

forward(output_coarse, input_fine)[source]

Local enhancer forward.

Parameters
  • output_coarse (4D tensor) – Coarse output from previous layer.

  • input_fine (4D tensor) – Fine input from current layer.

Returns

Refined output.

Return type

output (4D tensor)

training = None

imaginaire.generators.spade module

class imaginaire.generators.spade.Generator(gen_cfg, data_cfg)[source]

Bases: torch.nn.modules.module.Module

SPADE generator constructor.

Parameters
  • gen_cfg (obj) – Generator definition part of the yaml config file.

  • data_cfg (obj) – Data definition part of the yaml config file.

forward(data, random_style=False)[source]

SPADE Generator forward.

Parameters
  • data (dict) –

    • images (N x C1 x H x W tensor) : Ground truth images.

    • label (N x C2 x H x W tensor) : Semantic representations.

    • z (N x style_dims tensor): Gaussian random noise.

  • random_style (bool) – Whether to sample a random style vector.

Returns

  • fake_images (N x 3 x H x W tensor): Fake images.

  • mu (N x C1 tensor): Mean vectors.

  • logvar (N x C1 tensor): Log-variance vectors.

Return type

(dict)

inference(data, random_style=False, use_fixed_random_style=False, keep_original_size=False)[source]

Compute result images for a batch of input data and save the results in the specified folder.

Parameters
  • data (dict) –

    • images (N x C1 x H x W tensor) : Ground truth images

    • label (N x C2 x H x W tensor) : Semantic representations

    • z (N x style_dims tensor): Gaussian random noise

  • random_style (bool) – Whether to sample a random style vector.

  • use_fixed_random_style (bool) – Sample random style once and use it for all the remaining inference.

  • keep_original_size (bool) – Keep original size of the input.

Returns

  • fake_images (N x 3 x H x W tensor): Fake images.

  • mu (N x C1 tensor): Mean vectors.

  • logvar (N x C1 tensor): Log-variance vectors.

Return type

(dict)
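
A hypothetical inference sketch based on the inputs and outputs documented above. The parsed yaml config cfg, the number of semantic classes, and the 256x256 resolution are assumptions.

    import torch
    from imaginaire.generators.spade import Generator

    net_G = Generator(cfg.gen, cfg.data).eval()         # cfg: parsed yaml config (assumed)
    num_labels = 35                                     # assumed number of semantic classes
    data = {
        'label': torch.zeros(1, num_labels, 256, 256),  # one-hot semantic maps
    }
    with torch.no_grad():
        output = net_G.inference(data, random_style=True, keep_original_size=True)
    fake_images = output['fake_images']                 # per the Returns section above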

training = None
class imaginaire.generators.spade.SPADEGenerator(num_labels, out_image_small_side_size, image_channels, num_filters, kernel_size, style_dims, activation_norm_params, weight_norm_type, global_adaptive_norm_type, skip_activation_norm, use_posenc_in_input_layer, use_style_encoder, output_multiplier)[source]

Bases: torch.nn.modules.module.Module

SPADE Image Generator constructor.

Parameters
  • num_labels (int) – Number of different labels.

  • out_image_small_side_size (int) – Size of the smaller side of the output image, i.e. min(height, width).

  • image_channels (int) – Num. of channels of the output image.

  • num_filters (int) – Base filter numbers.

  • kernel_size (int) – Convolution kernel size.

  • style_dims (int) – Dimensions of the style code.

  • activation_norm_params (obj) – Spatially adaptive normalization param.

  • weight_norm_type (str) – Type of weight normalization. 'none', 'spectral', or 'weight'.

  • global_adaptive_norm_type (str) – Type of normalization in SPADE.

  • skip_activation_norm (bool) – If True, applies activation norm to the shortcut connection in residual blocks.

  • use_style_encoder (bool) – Whether to use global adaptive norm like conditional batch norm or adaptive instance norm.

  • output_multiplier (float) – A positive number multiplied with the output.

forward(data)[source]

SPADE Generator forward.

Parameters

data (dict) –

  • data (N x C1 x H x W tensor) : Ground truth images.

  • label (N x C2 x H x W tensor) : Semantic representations.

  • z (N x style_dims tensor): Gaussian random noise.

Returns

  • fake_images (N x 3 x H x W tensor): Fake images.

Return type

output (dict)

training = None
class imaginaire.generators.spade.StyleEncoder(style_enc_cfg)[source]

Bases: torch.nn.modules.module.Module

Style Encoder constructor.

Parameters

style_enc_cfg (obj) – Style encoder definition file.

forward(input_x)[source]

SPADE Style Encoder forward.

Parameters

input_x (N x 3 x H x W tensor) – Input images.

Returns

  • mu (N x C tensor): Mean vectors.

  • logvar (N x C tensor): Log-variance vectors.

  • z (N x C tensor): Style code vectors.

Return type

(tuple)
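
A minimal hypothetical sketch: encode a reference image into a style code that can condition the SPADE generator. style_enc_cfg is assumed to come from the parsed yaml config, and z is the style code vector listed in the Returns section above (in SPADE-style encoders it is typically reparameterized from mu and logvar).

    import torch
    from imaginaire.generators.spade import StyleEncoder

    enc = StyleEncoder(style_enc_cfg).eval()          # style_enc_cfg: from the config (assumed)
    mu, logvar, z = enc(torch.randn(1, 3, 256, 256))  # per the Returns section above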

training = None

imaginaire.generators.unit module

class imaginaire.generators.unit.AutoEncoder(num_filters=64, max_num_filters=256, num_res_blocks=4, num_downsamples_content=2, num_image_channels=3, content_norm_type='instance', decoder_norm_type='instance', weight_norm_type='', output_nonlinearity='', pre_act=False, apply_noise=False, **kwargs)[source]

Bases: torch.nn.modules.module.Module

Improved UNIT autoencoder.

Parameters
  • num_filters (int) – Base filter numbers.

  • max_num_filters (int) – Maximum number of filters in the encoder.

  • num_res_blocks (int) – Number of residual blocks at the end of the content encoder.

  • num_downsamples_content (int) – Number of times we reduce resolution by 2x2 for the content image.

  • num_image_channels (int) – Number of input image channels.

  • content_norm_type (str) – Type of activation normalization in the content encoder.

  • decoder_norm_type (str) – Type of activation normalization in the decoder.

  • weight_norm_type (str) – Type of weight normalization.

  • output_nonlinearity (str) – Type of nonlinearity before final output, 'tanh' or 'none'.

  • pre_act (bool) – If True, uses pre-activation residual blocks.

  • apply_noise (bool) – If True, injects Gaussian noise in the decoder.

forward(images)[source]

Reconstruct an image.

Parameters

images (Tensor) – Input images.

Returns

Reconstructed images.

Return type

images_recon (Tensor)

training = None
class imaginaire.generators.unit.ContentEncoder(num_downsamples, num_res_blocks, num_image_channels, num_filters, max_num_filters, padding_mode, activation_norm_type, weight_norm_type, nonlinearity, pre_act=False)[source]

Bases: torch.nn.modules.module.Module

Improved UNIT encoder. The network consists of:

  • input layers

  • $(num_downsamples) convolutional blocks

  • $(num_res_blocks) residual blocks.

  • output layer.

Parameters
  • num_downsamples (int) – Number of times we reduce resolution by 2x2.

  • num_res_blocks (int) – Number of residual blocks at the end of the content encoder.

  • num_image_channels (int) – Number of input image channels.

  • num_filters (int) – Base filter numbers.

  • max_num_filters (int) – Maximum number of filters in the encoder.

  • padding_mode (string) – Type of padding.

  • activation_norm_type (str) – Type of activation normalization.

  • weight_norm_type (str) – Type of weight normalization.

  • nonlinearity (str) – Type of nonlinear activation function.

  • pre_act (bool) – If True, uses pre-activation residual blocks.

forward(x)[source]
Parameters

x (tensor) – Input image.

training = None
class imaginaire.generators.unit.Decoder(num_upsamples, num_res_blocks, num_filters, num_image_channels, padding_mode, activation_norm_type, weight_norm_type, nonlinearity, output_nonlinearity, pre_act=False, apply_noise=False)[source]

Bases: torch.nn.modules.module.Module

Improved UNIT decoder. The network consists of:

  • $(num_res_blocks) residual blocks.

  • $(num_upsamples) residual blocks or convolutional blocks

  • output layer.

Parameters
  • num_upsamples (int) – Number of times we increase resolution by 2x2.

  • num_res_blocks (int) – Number of residual blocks.

  • num_filters (int) – Base filter numbers.

  • num_image_channels (int) – Number of input image channels.

  • padding_mode (string) – Type of padding.

  • activation_norm_type (str) – Type of activation normalization.

  • weight_norm_type (str) – Type of weight normalization.

  • nonlinearity (str) – Type of nonlinear activation function.

  • output_nonlinearity (str) – Type of nonlinearity before final output, 'tanh' or 'none'.

  • pre_act (bool) – If True, uses pre-activation residual blocks.

  • apply_noise (bool) – If True, injects Gaussian noise.

forward(x)[source]
Parameters

x (tensor) – Content embedding of the content image.

training = None
class imaginaire.generators.unit.Generator(gen_cfg, data_cfg)[source]

Bases: torch.nn.modules.module.Module

Improved UNIT generator.

Parameters
  • gen_cfg (obj) – Generator definition part of the yaml config file.

  • data_cfg (obj) – Data definition part of the yaml config file.

forward(data, image_recon=True, cycle_recon=True)[source]

UNIT forward function.

inference(data, a2b=True)[source]

UNIT inference.

Parameters
  • data (dict) – Training data at the current iteration. - images_a (tensor): Images from domain A. - images_b (tensor): Images from domain B.

  • a2b (bool) – If True, translates images from domain A to B, otherwise from B to A.
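
A hypothetical UNIT translation sketch, analogous to the MUNIT one above. cfg is assumed to be an already-parsed yaml config with gen and data sections; weights are assumed to be loaded elsewhere.

    import torch
    from imaginaire.generators.unit import Generator

    net_G = Generator(cfg.gen, cfg.data).eval()   # cfg: parsed yaml config (assumed)
    data = {
        'images_a': torch.randn(1, 3, 256, 256),  # domain-A images
        'images_b': torch.randn(1, 3, 256, 256),  # domain-B images
    }
    with torch.no_grad():
        output = net_G.inference(data, a2b=True)  # translate A -> B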

training = None

imaginaire.generators.vid2vid module

class imaginaire.generators.vid2vid.BaseNetwork[source]

Bases: torch.nn.modules.module.Module

Base network for the vid2vid generator.

get_num_filters(num_downsamples)[source]

Get the number of filters at current layer.

Parameters

num_downsamples (int) – How many downsamples at current layer.

Returns

Number of filters.

Return type

output (int)

training = None
class imaginaire.generators.vid2vid.FlowGenerator(flow_cfg, data_cfg)[source]

Bases: imaginaire.generators.vid2vid.BaseNetwork

Flow generator constructor.

Parameters
  • flow_cfg (obj) – Flow definition part of the yaml config file.

  • data_cfg (obj) – Data definition part of the yaml config file.

forward(label, img_prev)[source]

Flow generator forward.

Parameters
  • label (4D tensor) – Input label tensor.

  • img_prev (4D tensor) – Previously generated image tensors.

Returns

  • flow (4D tensor) : Generated flow map.

  • mask (4D tensor) : Generated occlusion mask.

Return type

(tuple)

training = None
class imaginaire.generators.vid2vid.Generator(gen_cfg, data_cfg)[source]

Bases: imaginaire.generators.vid2vid.BaseNetwork

vid2vid generator constructor.

Parameters
  • gen_cfg (obj) – Generator definition part of the yaml config file.

  • data_cfg (obj) – Data definition part of the yaml config file.

forward(data)[source]

vid2vid generator forward.

Parameters

data (dict) – Dictionary of input data.

Returns

Dictionary of output data.

Return type

output (dict)

get_cond_dims(num_downs=0)[source]

Get the dimensions of conditional inputs.

Parameters

num_downs (int) – How many downsamples at current layer.

Returns

List of dimensions.

Return type

ch (list)

get_cond_maps(label, embedder)[source]

Get the conditional inputs.

Parameters
  • label (4D tensor) – Input label tensor.

  • embedder (obj) – Embedding network.

Returns

List of conditional inputs.

Return type

cond_maps (list)

init_temporal_network(cfg_init=None)[source]

When starting to train with multiple frames, initialize the downsampling and flow networks.

Parameters

cfg_init (dict) – Weight initialization config.

one_up_conv_layer(x, encoded_label, i)[source]

One residual block layer in the main branch.

Parameters
  • x (4D tensor) – Current feature map.

  • encoded_label (list of tensors) – Encoded input label maps.

  • i (int) – Layer index.

Returns

Output feature map.

Return type

x (4D tensor)

training = None

imaginaire.generators.wc_vid2vid module

class imaginaire.generators.wc_vid2vid.Generator(gen_cfg, data_cfg)[source]

Bases: imaginaire.generators.vid2vid.Generator

World-consistent vid2vid generator constructor.

Parameters
  • gen_cfg (obj) – Generator definition part of the yaml config file.

  • data_cfg (obj) – Data definition part of the yaml config file

forward(data)[source]

Vid2vid generator forward.

Parameters

data (dict) – Dictionary of input data.

Returns

Dictionary of output data.

Return type

output (dict)

get_cond_dims(num_downs=0)[source]

Get the dimensions of conditional inputs.

Parameters

num_downs (int) – How many downsamples at current layer.

Returns

List of dimensions.

Return type

ch (list)

get_cond_maps(label, embedder)[source]

Get the conditional inputs.

Parameters
  • label (4D tensor) – Input label tensor.

  • embedder (obj) – Embedding network.

Returns

List of conditional inputs.

Return type

cond_maps (list)

get_guidance_images_and_masks(unprojection)[source]

Compute guidance images and masks from the unprojection data.

get_partial(num_downs=0)[source]

Get whether the convolutions should be partial or not.

Parameters

num_downs (int) – How many downsamples at current layer.

Returns

List of boolean partial or not.

Return type

partial (list)

renderer_update_point_cloud(image, point_info)[source]

Update the renderer’s color dictionary.

reset_renderer(is_flipped_input=False)[source]

Reset the renderer.

Parameters

is_flipped_input (bool) – Whether the input sequence is left-right flipped.

training = None

Module contents