imaginaire.generators package¶
Submodules¶
imaginaire.generators.coco_funit module¶
-
class
imaginaire.generators.coco_funit.
COCOFUNITTranslator
(num_filters=64, num_filters_mlp=256, style_dims=64, usb_dims=1024, num_res_blocks=2, num_mlp_blocks=3, num_downsamples_style=4, num_downsamples_content=2, num_image_channels=3, weight_norm_type='', **kwargs)[source]¶ Bases:
torch.nn.modules.module.Module
COCO-FUNIT Generator architecture.
- Parameters
num_filters (int) – Base filter numbers.
num_filters_mlp (int) – Base filter number in the MLP module.
style_dims (int) – Dimension of the style code.
usb_dims (int) – Dimension of the universal style bias code.
num_res_blocks (int) – Number of residual blocks at the end of the content encoder.
num_mlp_blocks (int) – Number of layers in the MLP module.
num_downsamples_content (int) – Number of times we reduce resolution by 2x2 for the content image.
num_downsamples_style (int) – Number of times we reduce resolution by 2x2 for the style image.
num_image_channels (int) – Number of input image channels.
weight_norm_type (str) – Type of weight normalization: 'none', 'spectral', or 'weight'.
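As a rough illustration of what the two downsampling counts imply, the sketch below computes feature-map sizes under the assumption that every downsampling step halves height and width exactly. The helper downsampled_size is hypothetical and not part of imaginaire; the real layers may pad or round differently.

```python
def downsampled_size(height, width, num_downsamples):
    """Spatial size after halving the resolution num_downsamples times.

    Assumption: every step divides height and width by exactly 2.
    """
    factor = 2 ** num_downsamples
    if height % factor or width % factor:
        raise ValueError("height and width must be divisible by 2**num_downsamples")
    return height // factor, width // factor


# Defaults from the signature above, applied to a hypothetical 256x256 input.
content_hw = downsampled_size(256, 256, num_downsamples=2)
style_hw = downsampled_size(256, 256, num_downsamples=4)
```

Under these assumptions the content branch keeps a 64x64 grid while the style branch shrinks to 16x16, which matches the intuition that style is summarized more aggressively than content.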
-
decode
(content, style)[source]¶ Generate images by combining their content and style codes.
- Parameters
content (tensor) – Content code tensor.
style (tensor) – Style code tensor.
-
encode
(images)[source]¶ Encode images to get their content and style codes.
- Parameters
images (tensor) – Input image tensor.
-
forward
(images)[source]¶ Reconstruct the input image by combining the computed content and style codes.
- Parameters
images (tensor) – Input image tensor.
-
training
= None¶
-
class
imaginaire.generators.coco_funit.
Generator
(gen_cfg, data_cfg)[source]¶ Bases:
torch.nn.modules.module.Module
COCO-FUNIT Generator.
-
forward
(data)[source]¶ In the FUNIT’s forward pass, it generates a content embedding and a style code from the content image, and a style code from the style image. By mixing the content code and the style code from the content image, we reconstruct the input image. By mixing the content code and the style code from the style image, we have a translation output.
- Parameters
data (dict) – Training data at the current iteration.
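The mixing of codes described for this forward pass can be sketched with toy stand-ins. Everything here is hypothetical: an "image" is just a (content, style) pair of strings, and encode, decode and funit_forward are illustrative names, not the library's API. The real model works on tensors with learned networks.

```python
def encode(image):
    """Toy encoder: split an image into its content and style parts."""
    content, style = image
    return content, style


def decode(content, style):
    """Toy decoder: put a content code and a style code back together."""
    return (content, style)


def funit_forward(content_image, style_image):
    # Content and style codes from the content image.
    content_a, style_a = encode(content_image)
    # Only the style code of the style image is used.
    _, style_b = encode(style_image)
    images_recon = decode(content_a, style_a)  # reconstruction
    images_trans = decode(content_a, style_b)  # translation
    return images_recon, images_trans


recon, trans = funit_forward(("dog pose", "husky fur"), ("cat pose", "tabby fur"))
```

The reconstruction reuses both codes of the content image, while the translation keeps its content and swaps in the style of the other image.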
-
inference
(data, keep_original_size=True)[source]¶ COCO-FUNIT inference.
- Parameters
data (dict) – Training data at the current iteration.
- images_content (tensor): Content images.
- images_style (tensor): Style images.
a2b (bool) – If True, translates images from domain A to B, otherwise from B to A.
keep_original_size (bool) – If True, the output image is resized to the input content image size.
-
training
= None¶
-
imaginaire.generators.dummy module¶
imaginaire.generators.fs_vid2vid module¶
-
class
imaginaire.generators.fs_vid2vid.
AttentionModule
(atn_cfg, data_cfg, conv_2d_block, num_filters_each_layer)[source]¶ Bases:
torch.nn.modules.module.Module
Attention module constructor.
- Parameters
atn_cfg (obj) – Generator definition part of the yaml config file.
data_cfg (obj) – Data definition part of the yaml config file.
conv_2d_block – Conv2DBlock constructor.
num_filters_each_layer (int) – The number of filters in each layer.
-
attention_encode
(img, net_name)[source]¶ Encode the input image to get the attention map.
- Parameters
img (NxCxHxW tensor) – Input image.
net_name (str) – Name for attention network.
- Returns
Encoded feature.
- Return type
x (NxC2xH2xW2 tensor)
-
forward
(in_features, label, ref_label, attention=None)[source]¶ Get the attention map to combine multiple image features in the case of multiple reference images.
- Parameters
in_features ((NxK)xC1xH1xW1 tensor) – Input features.
label (NxC2xH2xW2 tensor) – Target label.
ref_label (NxC2xH2xW2 tensor) – Reference label.
attention (Nx(KxH1xW1)x(H1xW1) tensor) – Attention maps.
- Returns
out_features (NxC1xH1xW1 tensor): Attention-combined features.
attention (Nx(KxH1xW1)x(H1xW1) tensor): Attention maps.
atn_vis (1x1xH1xW1 tensor): Visualization for attention scores.
- Return type
(tuple)
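The shapes above suggest that, for one sample, the attention map acts as a (K*H1*W1) x (H1*W1) weight matrix over reference locations. A minimal pure-Python sketch of that weighted combination follows; combine_with_attention is a hypothetical helper, and the assumption that each output location's weights sum to 1 (for example after a softmax) is not stated in the documentation.

```python
def combine_with_attention(features, attention):
    """Weighted combination of reference features for a single sample.

    features:  P rows of C values, one row per reference location (P = K*H1*W1).
    attention: P rows of Q weights (Q = H1*W1); column q holds the weights
               that output location q assigns to every reference location.
    Returns Q rows of C values.
    """
    num_inputs = len(features)
    num_channels = len(features[0])
    num_outputs = len(attention[0])
    out = []
    for q in range(num_outputs):
        row = [
            sum(attention[p][q] * features[p][c] for p in range(num_inputs))
            for c in range(num_channels)
        ]
        out.append(row)
    return out


# Two reference locations, one output location, two channels.
features = [[1.0, 0.0], [0.0, 1.0]]
attention = [[0.75], [0.25]]
combined = combine_with_attention(features, attention)
```

With tensors, the same operation is a batched matrix product between the flattened features and the attention map.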
-
training
= None¶
-
class
imaginaire.generators.fs_vid2vid.
FlowGenerator
(flow_cfg, data_cfg, num_frames)[source]¶ Bases:
torch.nn.modules.module.Module
Flow generator constructor.
- Parameters
flow_cfg (obj) – Flow definition part of the yaml config file.
data_cfg (obj) – Data definition part of the yaml config file.
num_frames (int) – Number of input frames.
-
forward
(label, ref_label, ref_image)[source]¶ Flow generator forward.
- Parameters
label (4D tensor) – Input label tensor.
ref_label (4D tensor) – Reference label tensors.
ref_image (4D tensor) – Reference image tensors.
- Returns
flow (4D tensor) : Generated flow map.
mask (4D tensor) : Generated occlusion mask.
- Return type
(tuple)
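To make the role of the flow map concrete, here is a small sketch of backward warping on nested lists. It is an illustration only: warp_nearest is a hypothetical helper, it uses integer offsets with nearest-neighbour lookup and border clamping, and the flow convention (dx, dy per output pixel) is an assumption. The library itself operates on tensors and most likely samples with interpolation.

```python
def warp_nearest(image, flow):
    """Backward-warp a 2D image (list of rows) with an integer flow field.

    flow[y][x] = (dx, dy) means output pixel (x, y) is read from source
    pixel (x + dx, y + dy); out-of-range coordinates are clamped to the border.
    """
    height, width = len(image), len(image[0])
    warped = []
    for y in range(height):
        row = []
        for x in range(width):
            dx, dy = flow[y][x]
            src_x = min(max(x + dx, 0), width - 1)
            src_y = min(max(y + dy, 0), height - 1)
            row.append(image[src_y][src_x])
        warped.append(row)
    return warped


image = [[1, 2], [3, 4]]
# Every output pixel reads from one pixel to its right.
flow = [[(1, 0), (1, 0)], [(1, 0), (1, 0)]]
warped = warp_nearest(image, flow)
```

The occlusion mask returned alongside the flow indicates where such warped pixels cannot be trusted; how it is applied is not described in this section.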
-
training
= None¶
-
class
imaginaire.generators.fs_vid2vid.
Generator
(gen_cfg, data_cfg)[source]¶ Bases:
torch.nn.modules.module.Module
Few-shot vid2vid generator constructor.
- Parameters
gen_cfg (obj) – Generator definition part of the yaml config file.
data_cfg (obj) – Data definition part of the yaml config file.
-
SPADE_combine
(encoded_label, cond_inputs)[source]¶ Using Multi-SPADE to combine raw synthesized image with warped images.
- Parameters
encoded_label (list of tensors) – Original label map embeddings.
cond_inputs (list of tensors) – New SPADE conditional inputs from the warped images.
- Returns
Combined conditional inputs.
- Return type
encoded_label (list of tensors)
-
custom_init
()[source]¶ This function is for dealing with the numerical issue that might occur when doing mixed precision training.
-
flow_generation
(label, ref_labels, ref_images, prev_labels, prev_images, ref_idx)[source]¶ Generates flows and masks for warping reference / previous images.
- Parameters
label (NxCxHxW tensor) – Target label map.
ref_labels (NxKxCxHxW tensor) – Reference label maps.
ref_images (NxKx3xHxW tensor) – Reference images.
prev_labels (NxTxCxHxW tensor) – Previous label maps.
prev_images (NxTx3xHxW tensor) – Previous images.
ref_idx (Nx1 tensor) – Index for which image to use from the reference images.
- Returns
flow (list of Nx2xHxW tensor): Optical flows.
occ_mask (list of Nx1xHxW tensor): Occlusion masks.
img_warp (list of Nx3xHxW tensor): Warped reference / previous images.
cond_inputs (list of Nx4xHxW tensor): Conditional inputs for SPADE combination.
- Return type
(tuple)
-
forward
(data)[source]¶ Few-shot vid2vid generator forward.
- Parameters
data (dict) – Dictionary of input data.
- Returns
Dictionary of output data.
- Return type
output (dict)
-
init_network_weights
(net_src, net_dst)[source]¶ Initialize weights in net_dst with those in net_src.
-
init_temporal_network
(cfg_init=None)[source]¶ When starting training multiple frames, initialize the flow network.
- Parameters
cfg_init (dict) – Weight initialization config.
-
load_pretrained_network
(pretrained_dict, prefix='module.')[source]¶ Load the pretrained network into self network.
- Parameters
pretrained_dict (dict) – Pretrained network weights.
prefix (str) – Prefix to the network weights name.
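Checkpoints saved from a model wrapped in torch.nn.DataParallel or DistributedDataParallel carry a module. prefix on every parameter name. The documentation does not say how this method uses prefix, so the sketch below only shows one common way of handling it: stripping the prefix before matching names. strip_prefix is a hypothetical helper, and plain lists stand in for weight tensors.

```python
def strip_prefix(pretrained_dict, prefix="module."):
    """Return a copy of pretrained_dict with prefix removed from matching keys."""
    stripped = {}
    for name, value in pretrained_dict.items():
        if name.startswith(prefix):
            name = name[len(prefix):]
        stripped[name] = value
    return stripped


# Hypothetical checkpoint entries.
checkpoint = {"module.conv.weight": [0.1], "module.conv.bias": [0.0]}
renamed = strip_prefix(checkpoint)
```

Keys without the prefix pass through unchanged, so the helper is safe to apply to checkpoints saved from an unwrapped model as well.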
-
one_up_conv_layer
(x, encoded_label, conv_weight, norm_weight, i)[source]¶ One residual block layer in the main branch.
- Parameters
x (4D tensor) – Current feature map.
encoded_label (list of tensors) – Encoded input label maps.
conv_weight (list of tensors) – Hyper conv weights.
norm_weight (list of tensors) – Hyper norm weights.
i (int) – Layer index.
- Returns
Output feature map.
- Return type
x (4D tensor)
-
training
= None¶
-
class
imaginaire.generators.fs_vid2vid.
LabelEmbedder
(emb_cfg, num_input_channels, num_hyper_layers=0)[source]¶ Bases:
torch.nn.modules.module.Module
Embed the input label map to get embedded features.
- Parameters
emb_cfg (obj) – Embed network configuration.
num_input_channels (int) – Number of input channels.
num_hyper_layers (int) – Number of hyper layers.
-
forward
(input, weights=None)[source]¶ Embedding network forward.
- Parameters
input (NxCxHxW tensor) – Network input.
weights (list of tensors) – Conv weights if using hyper network.
- Returns
Network outputs at different layers.
- Return type
output (list of tensors)
-
training
= None¶
-
class
imaginaire.generators.fs_vid2vid.
WeightGenerator
(gen_cfg, data_cfg)[source]¶ Bases:
torch.nn.modules.module.Module
Weight generator constructor.
- Parameters
gen_cfg (obj) – Generator definition part of the yaml config file.
data_cfg (obj) – Data definition part of the yaml config file.
-
encode_reference
(ref_image, ref_label, label, k)[source]¶ Encode the reference image to get features for weight generation.
- Parameters
ref_image ((NxK)x3xHxW tensor) – Reference images.
ref_label ((NxK)xCxHxW tensor) – Reference labels.
label (NxCxHxW tensor) – Target label.
k (int) – Number of reference images.
- Returns
x (NxC2xH2xW2 tensor): Encoded features from reference images for the main branch (as input to the decoder).
encoded_ref (list of tensors): Encoded features from reference images for the weight generation branch.
attention (Nx(KxH1xW1)x(H1xW1) tensor): Attention maps.
atn_vis (1x1xH1xW1 tensor): Visualization for attention scores.
ref_idx (Nx1 tensor): Index for which image to use from the reference images.
- Return type
(tuple)
-
forward
(ref_image, ref_label, label, is_first_frame)[source]¶ Generate network weights based on the reference images.
- Parameters
ref_image (NxKx3xHxW tensor) – Reference images.
ref_label (NxKxCxHxW tensor) – Reference labels.
label (NxCxHxW tensor) – Target label.
is_first_frame (bool) – Whether the current frame is the first frame.
- Returns
x (NxC2xH2xW2 tensor): Encoded features from reference images for the main branch (as input to the decoder).
encoded_label (list of tensors): Encoded target label map for SPADE.
conv_weights (list of tensors): Network weights for conv layers in the main network.
norm_weights (list of tensors): Network weights for SPADE layers in the main network.
attention (Nx(KxH1xW1)x(H1xW1) tensor): Attention maps.
atn_vis (1x1xH1xW1 tensor): Visualization for attention scores.
ref_idx (Nx1 tensor): Index for which image to use from the reference images.
- Return type
(tuple)
-
get_conv_weights
(x, i)[source]¶ Adaptively generate weights for layer i in main branch convolutions.
- Parameters
x (NxCxHxW tensor) – Input features.
i (int) – Layer index.
- Returns
conv_weights (list of tensors): Weights for the conv layers in the main branch.
- Return type
(tuple)
-
get_norm_weights
(x, i)[source]¶ Adaptively generate weights for SPADE in layer i of generator.
- Parameters
x (NxCxHxW tensor) – Input features.
i (int) – Layer index.
- Returns
embedding_weights (list of tensors): Weights for the label embedding network.
norm_weights (list of tensors): Weights for the SPADE layers.
- Return type
(tuple)
-
training
= None¶
-
class
imaginaire.generators.fs_vid2vid.
WeightReshaper
[source]¶ Bases:
object
Handles all weight reshape related tasks.
-
reshape_embed_input
(x)[source]¶ Reshape input to be (B x C) x H x W.
- Parameters
x (tensor or list of tensors) – Input features.
- Returns
Reshaped features.
- Return type
x (tensor or list of tensors)
-
reshape_weight
(x, weight_shape)[source]¶ Reshape input x to the desired weight shape.
- Parameters
x (tensor or list of tensors) – Input features.
weight_shape (list of int) – Desired shape of the weight.
- Returns
weight (tensor): Network weights
bias (tensor): Network bias.
- Return type
(tuple)
-
split_weights
(weight, sizes)[source]¶ When the desired shape is a list, first divide the input to each corresponding weight shape in the list.
- Parameters
weight (tensor) – Input weight.
sizes (int or list of int) – Target sizes.
- Returns
Divided weights.
- Return type
weight (list of tensors)
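The splitting step can be pictured on a flat list of numbers. The sketch below is an assumption-laden illustration, not the library's code: split_flat and weight_and_bias_sizes are hypothetical helpers, and the layout (all weight elements first, then one bias value per output channel) is assumed for the example.

```python
from math import prod


def split_flat(values, sizes):
    """Split a flat list into consecutive chunks of the given sizes."""
    if isinstance(sizes, int):
        sizes = [sizes]
    if sum(sizes) != len(values):
        raise ValueError("sizes must add up to len(values)")
    chunks, start = [], 0
    for size in sizes:
        chunks.append(values[start:start + size])
        start += size
    return chunks


def weight_and_bias_sizes(weight_shape):
    """Element counts for a conv weight of shape [out, in, kh, kw] and its bias.

    Assumption: the bias has one element per output channel.
    """
    return prod(weight_shape), weight_shape[0]


sizes = weight_and_bias_sizes([2, 1, 1, 1])
weight, bias = split_flat([0.5, -0.5, 0.1, 0.2], list(sizes))
```

After the split, the weight chunk would still need to be reshaped to weight_shape, which is the part reshape_weight is described as handling.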
-
imaginaire.generators.funit module¶
-
class
imaginaire.generators.funit.
ContentEncoder
(num_downsamples, num_res_blocks, image_channels, num_filters, padding_mode, activation_norm_type, weight_norm_type, nonlinearity)[source]¶ Bases:
torch.nn.modules.module.Module
Improved FUNIT Content Encoder. This is basically the same as the original FUNIT content encoder.
- Parameters
num_downsamples (int) – Number of times we reduce resolution by 2x2.
num_res_blocks (int) – Number of residual blocks appended after all the downsampling modules.
image_channels (int) – Number of input image channels.
num_filters (int) – Base filter number.
padding_mode (str) – Padding mode.
activation_norm_type (str) – Type of activation normalization.
weight_norm_type (str) – Type of weight normalization: 'none', 'spectral', or 'weight'.
nonlinearity (str) – Nonlinearity.
-
training
= None¶
-
class
imaginaire.generators.funit.
Decoder
(num_enc_output_channels, style_channels, num_image_channels=3, num_upsamples=4, padding_type='reflect', weight_norm_type='none', nonlinearity='relu')[source]¶ Bases:
torch.nn.modules.module.Module
Improved FUNIT decoder.
- Parameters
num_enc_output_channels (int) – Number of content feature channels.
style_channels (int) – Dimension of the style code.
num_image_channels (int) – Number of image channels.
num_upsamples (int) – Number of upsampling residual blocks to apply.
-
forward
(x, style)[source]¶ - Parameters
x (tensor) – Content embedding of the content image.
style (tensor) – Style embedding of the style image.
-
training
= None¶
-
class
imaginaire.generators.funit.
FUNITTranslator
(num_filters=64, num_filters_mlp=256, style_dims=64, num_res_blocks=2, num_mlp_blocks=3, num_downsamples_style=4, num_downsamples_content=2, num_image_channels=3, weight_norm_type='', **kwargs)[source]¶ Bases:
torch.nn.modules.module.Module
- Parameters
num_filters (int) – Base filter numbers.
num_filters_mlp (int) – Base filter number in the MLP module.
style_dims (int) – Dimension of the style code.
num_res_blocks (int) – Number of residual blocks at the end of the content encoder.
num_mlp_blocks (int) – Number of layers in the MLP module.
num_downsamples_content (int) – Number of times we reduce resolution by 2x2 for the content image.
num_downsamples_style (int) – Number of times we reduce resolution by 2x2 for the style image.
num_image_channels (int) – Number of input image channels.
weight_norm_type (str) – Type of weight normalization: 'none', 'spectral', or 'weight'.
-
decode
(content, style)[source]¶ Generate images by combining their content and style codes.
- Parameters
content (tensor) – Content code tensor.
style (tensor) – Style code tensor.
-
encode
(images)[source]¶ Encode images to get their content and style codes.
- Parameters
images (tensor) – Input image tensor.
-
forward
(images)[source]¶ Reconstruct the input image by combining the computed content and style codes.
- Parameters
images (tensor) – Input image tensor.
-
training
= None¶
-
class
imaginaire.generators.funit.
Generator
(gen_cfg, data_cfg)[source]¶ Bases:
torch.nn.modules.module.Module
Generator of the improved FUNIT baseline in the COCO-FUNIT paper.
-
forward
(data)[source]¶ In the FUNIT’s forward pass, it generates a content embedding and a style code from the content image, and a style code from the style image. By mixing the content code and the style code from the content image, we reconstruct the input image. By mixing the content code and the style code from the style image, we have a translation output.
- Parameters
data (dict) – Training data at the current iteration.
-
inference
(data, keep_original_size=True)[source]¶ COCO-FUNIT inference.
- Parameters
data (dict) – Training data at the current iteration.
- images_content (tensor): Content images.
- images_style (tensor): Style images.
a2b (bool) – If True, translates images from domain A to B, otherwise from B to A.
keep_original_size (bool) – If True, the output image is resized to the input content image size.
-
training
= None¶
-
class
imaginaire.generators.funit.
MLP
(input_dim, output_dim, latent_dim, num_layers, activation_norm_type, nonlinearity)[source]¶ Bases:
torch.nn.modules.module.Module
Improved FUNIT style decoder.
- Parameters
input_dim (int) – Input dimension (style code dimension).
output_dim (int) – Output dimension (to be fed into the AdaIN layer).
latent_dim (int) – Latent dimension.
num_layers (int) – Number of layers in the MLP.
activation_norm_type (str) – Activation type.
nonlinearity (str) – Nonlinearity type.
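Since the MLP output feeds the AdaIN layer, a short sketch of adaptive instance normalization on a single channel may help. AdaIN normalizes the channel to zero mean and unit variance, then applies a scale and shift derived from the style code. How imaginaire slices the MLP output into scale and shift is not described here, so gamma and beta are passed in directly, and adain is a hypothetical helper.

```python
import math


def adain(channel, gamma, beta, eps=1e-5):
    """Normalize one feature channel, then apply a style-dependent scale and shift."""
    mean = sum(channel) / len(channel)
    var = sum((v - mean) ** 2 for v in channel) / len(channel)
    std = math.sqrt(var + eps)
    return [gamma * (v - mean) / std + beta for v in channel]


# gamma and beta would come from the style MLP; fixed values are used here.
out = adain([1.0, 2.0, 3.0], gamma=2.0, beta=1.0)
```

After the operation the channel mean equals beta and its spread is controlled by gamma, independent of the statistics of the incoming content features.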
-
training
= None¶
-
class
imaginaire.generators.funit.
StyleEncoder
(num_downsamples, image_channels, num_filters, style_channels, padding_mode, activation_norm_type, weight_norm_type, nonlinearity)[source]¶ Bases:
torch.nn.modules.module.Module
Improved FUNIT Style Encoder. This is basically the same as the original FUNIT Style Encoder.
- Parameters
num_downsamples (int) – Number of times we reduce resolution by 2x2.
image_channels (int) – Number of input image channels.
num_filters (int) – Base filter number.
style_channels (int) – Style code dimension.
padding_mode (str) – Padding mode.
activation_norm_type (str) – Type of activation normalization.
weight_norm_type (str) – Type of weight normalization: 'none', 'spectral', or 'weight'.
nonlinearity (str) – Nonlinearity.
-
training
= None¶
imaginaire.generators.gancraft module¶
-
class
imaginaire.generators.gancraft.
Generator
(gen_cfg, data_cfg)[source]¶ Bases:
imaginaire.generators.gancraft_base.Base3DGenerator
GANcraft generator constructor.
- Parameters
gen_cfg (obj) – Generator definition part of the yaml config file.
data_cfg (obj) – Data definition part of the yaml config file.
-
forward
(data, random_style=False)[source]¶ GANcraft Generator forward.
- Parameters
data (dict) – Dictionary with the following entries.
- images (N x 3 x H x W tensor): Real images.
- voxel_id (N x H x W x max_samples x 1 tensor): IDs of intersected tensors along each ray.
- depth2 (N x 2 x H x W x max_samples x 1 tensor): Depths of entrance and exit points for each ray-voxel intersection.
- raydirs (N x H x W x 1 x 3 tensor): The direction of each ray.
- cam_ori_t (N x 3 tensor): Camera origins.
random_style (bool) – Whether to sample a random style vector.
- Returns
fake_images (N x 3 x H x W tensor): Fake images.
mu (N x C1 tensor): Mean vectors.
logvar (N x C1 tensor): Log-variance vectors.
- Return type
output (dict)
-
get_pseudo_gt
(pseudo_gen, voxel_id, z=None, style_img=None, resize_512=True, deterministic=False)[source]¶ Evaluating img2img network to obtain pseudo-ground truth images.
- Parameters
pseudo_gen (callable) – Function converting mask to image using img2img network.
voxel_id (N x img_dims[0] x img_dims[1] x max_samples x 1 tensor) – IDs of intersected tensors along each ray.
z (N x C tensor) – Optional style code passed to pseudo_gen.
style_img (N x 3 x H x W tensor) – Optional style image passed to pseudo_gen.
resize_512 (bool) – If True, evaluate pseudo_gen at 512x512 regardless of input resolution.
deterministic (bool) – If True, disable stochastic label mapping.
-
inference
(output_dir, camera_mode, style_img_path=None, seed=1, pad=30, num_samples=40, num_blocks_early_stop=6, sample_depth=3, tile_size=128, resolution_hw=[540, 960], cam_ang=72, cam_maxstep=10)[source]¶ Compute result images according to the provided camera trajectory and save the results in the specified folder. The full image is evaluated in multiple tiles to save memory.
- Parameters
output_dir (str) – Directory where the results are stored.
camera_mode (int) – Which camera trajectory to use.
style_img_path (str) – Path to the style-conditioning image.
seed (int) – Random seed (controls style when style_img_path is not specified).
pad (int) – Pixels to remove from the image tiles before stitching. Should be equal to or larger than the receptive field of the CNN to avoid border artifacts.
num_samples (int) – Number of samples per ray (different from training).
num_blocks_early_stop (int) – Max number of intersected boxes per ray before stopping (different from training).
sample_depth (float) – Max distance traveled through boxes before stopping (different from training).
tile_size (int) – Max size of a tile in pixels.
resolution_hw (list [H, W]) – Resolution of the output image.
cam_ang (float) – Horizontal FOV of the camera (may be adjusted by the camera controller).
cam_maxstep (int) – Number of frames sampled from the camera trajectory.
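The tiled evaluation described for this method can be sketched as a box computation. The helper tile_boxes below is hypothetical and makes an assumption the documentation does not confirm: each tile has a core region of at most tile_size pixels per side, is rendered with pad extra pixels on every side (clipped at the image border), and only the core is kept when stitching.

```python
def tile_boxes(height, width, tile_size, pad):
    """Return a list of (core, padded) boxes as (top, left, bottom, right).

    Core boxes partition the image; padded boxes extend each core by pad
    pixels on every side, clipped to the image bounds.
    """
    boxes = []
    for top in range(0, height, tile_size):
        for left in range(0, width, tile_size):
            bottom = min(top + tile_size, height)
            right = min(left + tile_size, width)
            core = (top, left, bottom, right)
            padded = (max(top - pad, 0), max(left - pad, 0),
                      min(bottom + pad, height), min(right + pad, width))
            boxes.append((core, padded))
    return boxes


# Default values from the signature above.
boxes = tile_boxes(540, 960, tile_size=128, pad=30)
core_area = sum((b - t) * (r - l) for (t, l, b, r), _ in boxes)
```

Rendering the padded box and cropping back to the core is what keeps CNN border effects out of the stitched image, which is why pad should cover the receptive field.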
-
sample_camera
(data, pseudo_gen)[source]¶ Sample camera randomly and precompute everything used by both Gen and Dis.
- Parameters
data (dict) – Dictionary with the following entries.
- images (N x 3 x H x W tensor): Real images.
- label (N x C2 x H x W tensor): Segmentation map.
pseudo_gen (callable) – Function converting mask to image using img2img network.
- Returns
voxel_id (N x H x W x max_samples x 1 tensor): IDs of intersected tensors along each ray.
depth2 (N x 2 x H x W x max_samples x 1 tensor): Depths of entrance and exit points for each ray-voxel intersection.
raydirs (N x H x W x 1 x 3 tensor): The direction of each ray.
cam_ori_t (N x 3 tensor): Camera origins.
pseudo_real_img (N x 3 x H x W tensor): Pseudo-ground truth image.
real_masks (N x C3 x H x W tensor): One-hot segmentation map for real images, with translated labels.
fake_masks (N x C3 x H x W tensor): One-hot segmentation map for sampled camera views.
- Return type
ret (dict)
-
training
= None¶
imaginaire.generators.gancraft_base module¶
-
class
imaginaire.generators.gancraft_base.
Base3DGenerator
(gen_cfg, data_cfg)[source]¶ Bases:
torch.nn.modules.module.Module
Minecraft 3D generator constructor.
- Parameters
gen_cfg (obj) – Generator definition part of the yaml config file.
data_cfg (obj) – Data definition part of the yaml config file.
-
training
= None¶
-
class
imaginaire.generators.gancraft_base.
RenderCNN
(in_channels, style_dim, hidden_channels=256, leaky_relu=True)[source]¶ Bases:
torch.nn.modules.module.Module
CNN converting intermediate feature map to final image.
-
forward
(x, z)[source]¶ Forward network.
- Parameters
x (N x in_channels x H x W tensor) – Intermediate feature map.
z (N x style_dim tensor) – Style codes.
-
training
= None¶
-
class
imaginaire.generators.gancraft_base.
RenderMLP
(in_channels, style_dim, viewdir_dim, mask_dim=680, out_channels_s=1, out_channels_c=3, hidden_channels=256, use_seg=True)[source]¶ Bases:
torch.nn.modules.module.Module
MLP with affine modulation.
-
forward
(x, raydir, z, m)[source]¶ Forward network.
- Parameters
x (N x H x W x M x in_channels tensor) – Projected features.
raydir (N x H x W x 1 x viewdir_dim tensor) – Ray directions.
z (N x style_dim tensor) – Style codes.
m (N x H x W x M x mask_dim tensor) – One-hot segmentation maps.
-
training
= None¶
-
class
imaginaire.generators.gancraft_base.
SKYMLP
(in_channels, style_dim, out_channels_c=3, hidden_channels=256, leaky_relu=True)[source]¶ Bases:
torch.nn.modules.module.Module
MLP converting ray directions to sky features.
-
forward
(x, z)[source]¶ Forward network.
- Parameters
x (.. x in_channels tensor) – Ray direction embeddings.
z (.. x style_dim tensor) – Style codes.
-
training
= None¶
-
class
imaginaire.generators.gancraft_base.
StyleEncoder
(style_enc_cfg)[source]¶ Bases:
torch.nn.modules.module.Module
Style Encoder constructor.
- Parameters
style_enc_cfg (obj) – Style encoder definition file.
-
forward
(input_x)[source]¶ SPADE Style Encoder forward.
- Parameters
input_x (N x 3 x H x W tensor) – input images.
- Returns
mu (N x C tensor): Mean vectors.
logvar (N x C tensor): Log-variance vectors.
z (N x C tensor): Style code vectors.
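The three outputs fit the usual variational pattern in which a style code is drawn around the mean with a spread given by the log-variance. The sketch below assumes the standard reparameterization z = mu + eps * exp(0.5 * logvar); the documentation only lists the outputs, so this relationship is an assumption, and sample_style is a hypothetical helper working on plain lists.

```python
import math
import random


def sample_style(mu, logvar, eps=None, rng=random):
    """Draw a style code from N(mu, exp(logvar)) via reparameterization.

    eps can be supplied to make the result deterministic.
    """
    if eps is None:
        eps = [rng.gauss(0.0, 1.0) for _ in mu]
    return [m + e * math.exp(0.5 * lv) for m, lv, e in zip(mu, logvar, eps)]


# With logvar = 0 the standard deviation is 1, so z = mu + eps.
z = sample_style([1.0, 2.0], [0.0, 0.0], eps=[1.0, -1.0])
```

Writing the sample as a deterministic function of mu, logvar and external noise is what lets gradients flow back into the encoder during training.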
-
training
= None¶
imaginaire.generators.munit module¶
-
class
imaginaire.generators.munit.
AutoEncoder
(num_filters=64, max_num_filters=256, num_filters_mlp=256, latent_dim=8, num_res_blocks=4, num_mlp_blocks=2, num_downsamples_style=4, num_downsamples_content=2, num_image_channels=3, content_norm_type='instance', style_norm_type='', decoder_norm_type='instance', weight_norm_type='', decoder_norm_params=namespace(affine=False), output_nonlinearity='', pre_act=False, apply_noise=False, **kwargs)[source]¶ Bases:
torch.nn.modules.module.Module
Improved MUNIT autoencoder.
- Parameters
num_filters (int) – Base filter numbers.
max_num_filters (int) – Maximum number of filters in the encoder.
num_filters_mlp (int) – Base filter number in the MLP module.
latent_dim (int) – Dimension of the style code.
num_res_blocks (int) – Number of residual blocks at the end of the content encoder.
num_mlp_blocks (int) – Number of layers in the MLP module.
num_downsamples_style (int) – Number of times we reduce resolution by 2x2 for the style image.
num_downsamples_content (int) – Number of times we reduce resolution by 2x2 for the content image.
num_image_channels (int) – Number of input image channels.
content_norm_type (str) – Type of activation normalization in the content encoder.
style_norm_type (str) – Type of activation normalization in the style encoder.
decoder_norm_type (str) – Type of activation normalization in the decoder.
weight_norm_type (str) – Type of weight normalization.
decoder_norm_params (obj) – Parameters of activation normalization in the decoder. If not None, decoder_norm_params.__dict__ will be used as keyword arguments when initializing activation normalization.
output_nonlinearity (str) – Type of nonlinearity before final output, 'tanh' or 'none'.
pre_act (bool) – If True, uses pre-activation residual blocks.
apply_noise (bool) – If True, injects Gaussian noise in the decoder.
-
decode
(content, style)[source]¶ Decode content and style code to an image.
- Parameters
content (Tensor) – Content code.
style (Tensor) – Style code.
- Returns
Output images.
- Return type
images (Tensor)
-
encode
(images)[source]¶ Encode an image to content and style code.
- Parameters
images (Tensor) – Input images.
- Returns
content (Tensor): Content code.
style (Tensor): Style code.
- Return type
(tuple)
-
forward
(images)[source]¶ Reconstruct an image.
- Parameters
images (Tensor) – Input images.
- Returns
Reconstructed images.
- Return type
images_recon (Tensor)
-
training
= None¶
-
class
imaginaire.generators.munit.
Decoder
(num_upsamples, num_res_blocks, num_filters, num_image_channels, style_channels, padding_mode, activation_norm_type, activation_norm_params, weight_norm_type, nonlinearity, output_nonlinearity, pre_act=False, apply_noise=False)[source]¶ Bases:
torch.nn.modules.module.Module
Improved MUNIT decoder. The network consists of:
- $(num_res_blocks) residual blocks.
- $(num_upsamples) residual blocks or convolutional blocks.
- output layer.
- Parameters
num_upsamples (int) – Number of times we increase resolution by 2x2.
num_res_blocks (int) – Number of residual blocks.
num_filters (int) – Base filter numbers.
num_image_channels (int) – Number of input image channels.
style_channels (int) – Dimension of the style code.
padding_mode (string) – Type of padding.
activation_norm_type (str) – Type of activation normalization.
activation_norm_params (obj) – Parameters of activation normalization. If not None, activation_norm_params.__dict__ will be used as keyword arguments when initializing activation normalization.
weight_norm_type (str) – Type of weight normalization.
nonlinearity (str) – Type of nonlinear activation function.
output_nonlinearity (str) – Type of nonlinearity before final output, 'tanh' or 'none'.
pre_act (bool) – If True, uses pre-activation residual blocks.
apply_noise (bool) – If True, injects Gaussian noise.
-
forward
(x, style)[source]¶ - Parameters
x (tensor) – Content embedding of the content image.
style (tensor) – Style embedding of the style image.
-
training
= None¶
-
class
imaginaire.generators.munit.
Generator
(gen_cfg, data_cfg)[source]¶ Bases:
torch.nn.modules.module.Module
Improved MUNIT generator.
- Parameters
gen_cfg (obj) – Generator definition part of the yaml config file.
data_cfg (obj) – Data definition part of the yaml config file.
-
forward
(data, random_style=True, image_recon=True, latent_recon=True, cycle_recon=True, within_latent_recon=False)[source]¶ In MUNIT’s forward pass, it generates a content code and a style code from images in both domains. It then performs a within-domain reconstruction step and a cross-domain translation step. In within-domain reconstruction, it reconstructs an image using the content and style from the same image and optionally encodes the image back to the latent space. In cross-domain translation, it generates a translated image by mixing the content and style from images in different domains, and optionally encodes the image back to the latent space.
- Parameters
data (dict) – Training data at the current iteration.
- images_a (tensor): Images from domain A.
- images_b (tensor): Images from domain B.
random_style (bool) – If True, samples the style code from the prior distribution, otherwise uses the style code encoded from the input images in the other domain.
image_recon (bool) – If True, also returns reconstructed images.
latent_recon (bool) – If True, also returns reconstructed latent code during cross-domain translation.
cycle_recon (bool) – If True, also returns cycle reconstructed images.
within_latent_recon (bool) – If True, also returns reconstructed latent code during within-domain reconstruction.
-
inference
(data, a2b=True, random_style=True)[source]¶ MUNIT inference.
- Parameters
data (dict) – Training data at the current iteration.
- images_a (tensor): Images from domain A.
- images_b (tensor): Images from domain B.
a2b (bool) – If True, translates images from domain A to B, otherwise from B to A.
random_style (bool) – If True, samples the style code from the prior distribution, otherwise uses the style code encoded from the input images in the other domain.
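The effect of random_style can be sketched as a choice between two sources of style code. The helper choose_style is hypothetical, style codes are plain lists rather than tensors, and the prior is assumed to be a standard normal; the value 8 mirrors the latent_dim default of AutoEncoder above.

```python
import random


def choose_style(random_style, encoded_style, latent_dim=8, rng=random):
    """Pick the style code used for cross-domain translation.

    random_style=True:  sample from the (assumed standard normal) prior.
    random_style=False: reuse the style encoded from the other domain's image.
    """
    if random_style:
        return [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
    return list(encoded_style)


# Hypothetical style code encoded from an image in domain B.
style_from_b = [0.1] * 8
fixed = choose_style(False, style_from_b)
sampled = choose_style(True, style_from_b, rng=random.Random(0))
```

Sampling gives a different plausible translation on every call, whereas reusing the encoded style ties the output appearance to a specific example image.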
-
training
= None¶
-
class
imaginaire.generators.munit.
MLP
(input_dim, output_dim, latent_dim, num_layers, norm, nonlinearity)[source]¶ Bases:
torch.nn.modules.module.Module
The multi-layer perceptron (MLP) that maps Gaussian style code to a feature vector that is given as the conditional input to AdaIN.
- Parameters
input_dim (int) – Number of channels in the input tensor.
output_dim (int) – Number of channels in the output tensor.
latent_dim (int) – Number of channels in the latent features.
num_layers (int) – Number of layers in the MLP.
norm (str) – Type of activation normalization.
nonlinearity (str) – Type of nonlinear activation function.
-
training
= None¶
-
class
imaginaire.generators.munit.
StyleEncoder
(num_downsamples, num_image_channels, num_filters, style_channels, padding_mode, activation_norm_type, weight_norm_type, nonlinearity)[source]¶ Bases:
torch.nn.modules.module.Module
MUNIT style encoder.
- Parameters
num_downsamples (int) – Number of times we reduce resolution by 2x2.
num_image_channels (int) – Number of input image channels.
num_filters (int) – Base filter numbers.
style_channels (int) – Dimension of the style code.
padding_mode (string) – Type of padding.
activation_norm_type (str) – Type of activation normalization.
weight_norm_type (str) – Type of weight normalization.
nonlinearity (str) – Type of nonlinear activation function.
-
training
= None¶
imaginaire.generators.pix2pixHD module¶
-
class
imaginaire.generators.pix2pixHD.
Encoder
(enc_cfg, data_cfg)[source]¶ Bases:
torch.nn.modules.module.Module
Encoder for getting region-wise features for style control.
- Parameters
enc_cfg (obj) – Encoder definition part of the yaml config file.
data_cfg (obj) – Data definition part of the yaml config file.
-
forward
(input, instance_map)[source]¶ Extracting region-wise features.
- Parameters
input (4D tensor) – Real RGB images.
instance_map (4D tensor) – Instance label mask.
- Returns
Instance-wise average-pooled feature maps.
- Return type
outputs_mean (4D tensor)
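Instance-wise average pooling replaces every feature value with the mean over the instance it belongs to. The sketch below shows the idea on a single-channel map stored as nested lists; instance_average_pool is a hypothetical helper, and the real encoder works per channel on 4D tensors.

```python
def instance_average_pool(features, instance_map):
    """Replace each feature value by the mean over its instance region.

    features and instance_map are 2D lists with the same shape; instance_map
    holds one integer instance ID per pixel.
    """
    sums, counts = {}, {}
    for feat_row, id_row in zip(features, instance_map):
        for value, inst in zip(feat_row, id_row):
            sums[inst] = sums.get(inst, 0.0) + value
            counts[inst] = counts.get(inst, 0) + 1
    return [[sums[inst] / counts[inst] for inst in id_row] for id_row in instance_map]


# Top row belongs to instance 0, bottom row to instance 1.
pooled = instance_average_pool([[1.0, 3.0], [5.0, 7.0]], [[0, 0], [1, 1]])
```

Because every pixel of an instance ends up with the same feature vector, editing that vector changes the style of the whole region at once.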
-
training
= None¶
-
class
imaginaire.generators.pix2pixHD.
Generator
(gen_cfg, data_cfg)[source]¶ Bases:
torch.nn.modules.module.Module
Pix2pixHD coarse-to-fine generator constructor.
- Parameters
gen_cfg (obj) – Generator definition part of the yaml config file.
data_cfg (obj) – Data definition part of the yaml config file.
-
forward
(data, random_style=False)[source]¶ Coarse-to-fine generator forward.
- Parameters
data (dict) – Dictionary of input data.
random_style (bool) – Always set to false for the pix2pixHD model.
- Returns
Dictionary of output data.
- Return type
output (dict)
-
inference
(data, **kwargs)[source]¶ Generator inference.
- Parameters
data (dict) – Dictionary of input data.
- Returns
fake_images (tensor): Output fake images.
file_names (str): Data file name.
-
training
= None¶
-
class
imaginaire.generators.pix2pixHD.
GlobalGenerator
(gen_cfg, data_cfg, num_input_channels, padding_mode, base_conv_block, base_res_block)[source]¶ Bases:
torch.nn.modules.module.Module
Coarse generator constructor. This is the main generator in the pix2pixHD architecture.
- Parameters
gen_cfg (obj) – Generator definition part of the yaml config file.
data_cfg (obj) – Data definition part of the yaml config file.
num_input_channels (int) – Number of segmentation labels.
padding_mode (str) – zero | reflect | …
base_conv_block (obj) – Conv block with preset attributes.
base_res_block (obj) – Residual block with preset attributes.
-
forward
(input)[source]¶ Coarse-to-fine generator forward.
- Parameters
input (4D tensor) – Input semantic representations.
- Returns
Synthesized image by generator.
- Return type
output (4D tensor)
-
training
= None¶
-
class
imaginaire.generators.pix2pixHD.
LocalEnhancer
(gen_cfg, data_cfg, num_input_channels, num_filters, padding_mode, base_conv_block, base_res_block, output_img=False)[source]¶ Bases:
torch.nn.modules.module.Module
Local enhancer constructor. These are sub-networks that are useful when aiming to produce high-resolution outputs.
- Parameters
gen_cfg (obj) – Local generator definition part of the yaml config file.
data_cfg (obj) – Data definition part of the yaml config file.
num_input_channels (int) – Number of segmentation labels.
num_filters (int) – Number of filters for the first layer.
padding_mode (str) – zero | reflect | …
base_conv_block (obj) – Conv block with preset attributes.
base_res_block (obj) – Residual block with preset attributes.
output_img (bool) – Output is image or feature map.
-
forward
(output_coarse, input_fine)[source]¶ Local enhancer forward.
- Parameters
output_coarse (4D tensor) – Coarse output from previous layer.
input_fine (4D tensor) – Fine input from current layer.
- Returns
Refined output.
- Return type
output (4D tensor)
-
training
= None¶
imaginaire.generators.spade module¶
-
class
imaginaire.generators.spade.
Generator
(gen_cfg, data_cfg)[source]¶ Bases:
torch.nn.modules.module.Module
SPADE generator constructor.
- Parameters
gen_cfg (obj) – Generator definition part of the yaml config file.
data_cfg (obj) – Data definition part of the yaml config file.
-
forward
(data, random_style=False)[source]¶ SPADE Generator forward.
- Parameters
data (dict) –
images (N x C1 x H x W tensor) : Ground truth images
label (N x C2 x H x W tensor) : Semantic representations
z (N x style_dims tensor): Gaussian random noise
random_style (bool) – Whether to sample a random style vector.
- Returns
fake_images (N x 3 x H x W tensor): fake images
mu (N x C1 tensor): mean vectors
logvar (N x C1 tensor): log-variance vectors
- Return type
(dict)
-
inference
(data, random_style=False, use_fixed_random_style=False, keep_original_size=False)[source]¶ Compute result images for a batch of input data and save the results in the specified folder.
- Parameters
data (dict) –
images (N x C1 x H x W tensor) : Ground truth images
label (N x C2 x H x W tensor) : Semantic representations
z (N x style_dims tensor): Gaussian random noise
random_style (bool) – Whether to sample a random style vector.
use_fixed_random_style (bool) – Sample random style once and use it for all the remaining inference.
keep_original_size (bool) – Keep original size of the input.
- Returns
fake_images (N x 3 x H x W tensor): fake images
mu (N x C1 tensor): mean vectors
logvar (N x C1 tensor): log-variance vectors
- Return type
(dict)
-
training
= None¶
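The shape conventions listed for the `data` dictionary can be checked before calling the generator. The sketch below works on shape tuples only, so it needs no tensor library. `check_spade_batch` is a hypothetical helper, and treating `z` as optional is an assumption of this sketch rather than documented behaviour.

```python
def check_spade_batch(shapes, style_dims):
    """Return a list of shape problems for a SPADE-style input dictionary.

    shapes: maps 'images', 'label' and optionally 'z' to shape tuples.
    style_dims: expected second dimension of 'z'.
    """
    problems = []
    n, _, h, w = shapes["images"]          # N x C1 x H x W
    label_n, _, label_h, label_w = shapes["label"]  # N x C2 x H x W
    if (label_n, label_h, label_w) != (n, h, w):
        problems.append("label must share N, H, W with images")
    # Assumption: z may be absent, e.g. when a random style is sampled.
    if "z" in shapes and tuple(shapes["z"]) != (n, style_dims):
        problems.append("z must have shape (N, style_dims)")
    return problems


good = {"images": (4, 3, 256, 512), "label": (4, 35, 256, 512), "z": (4, 256)}
bad = {"images": (4, 3, 256, 512), "label": (4, 35, 128, 256), "z": (4, 64)}
print(check_spade_batch(good, style_dims=256))  # []
print(check_spade_batch(bad, style_dims=256))
```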
-
class
imaginaire.generators.spade.
SPADEGenerator
(num_labels, out_image_small_side_size, image_channels, num_filters, kernel_size, style_dims, activation_norm_params, weight_norm_type, global_adaptive_norm_type, skip_activation_norm, use_posenc_in_input_layer, use_style_encoder, output_multiplier)[source]¶ Bases:
torch.nn.modules.module.Module
SPADE Image Generator constructor.
- Parameters
num_labels (int) – Number of different labels.
out_image_small_side_size (int) – min(width, height)
image_channels (int) – Num. of channels of the output image.
num_filters (int) – Base filter numbers.
kernel_size (int) – Convolution kernel size.
style_dims (int) – Dimensions of the style code.
activation_norm_params (obj) – Spatially adaptive normalization param.
weight_norm_type (str) – Type of weight normalization. 'none', 'spectral', or 'weight'.
global_adaptive_norm_type (str) – Type of normalization in SPADE.
skip_activation_norm (bool) – If True, applies activation norm to the shortcut connection in residual blocks.
use_style_encoder (bool) – Whether to use global adaptive norm like conditional batch norm or adaptive instance norm.
output_multiplier (float) – A positive number multiplied to the output.
-
forward
(data)[source]¶ SPADE Generator forward.
- Parameters
data (dict) –
data (N x C1 x H x W tensor) : Ground truth images.
label (N x C2 x H x W tensor) : Semantic representations.
z (N x style_dims tensor): Gaussian random noise.
- Returns
fake_images (N x 3 x H x W tensor): Fake images.
- Return type
output (dict)
-
training
= None¶
-
class
imaginaire.generators.spade.
StyleEncoder
(style_enc_cfg)[source]¶ Bases:
torch.nn.modules.module.Module
Style Encoder constructor.
- Parameters
style_enc_cfg (obj) – Style encoder definition file.
-
forward
(input_x)[source]¶ SPADE Style Encoder forward.
- Parameters
input_x (N x 3 x H x W tensor) – input images.
- Returns
mu (N x C tensor): Mean vectors.
logvar (N x C tensor): Log-variance vectors.
z (N x C tensor): Style code vectors.
- Return type
(tuple)
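The three return values are related in the way a variational encoder usually relates them. Assuming the standard reparameterization trick (an assumption here, since the entry above does not state how `z` is computed), `z` can be sketched as `mu + eps * exp(0.5 * logvar)` with `eps` drawn from a standard normal. `sample_style` is a hypothetical name, and plain lists stand in for the N x C tensors.

```python
import math
import random


def sample_style(mu, logvar, rng=random):
    """Sample z = mu + eps * exp(0.5 * logvar) element-wise, eps ~ N(0, 1)."""
    return [
        m + rng.gauss(0.0, 1.0) * math.exp(0.5 * lv)
        for m, lv in zip(mu, logvar)
    ]


mu = [0.5, -1.0]
logvar = [0.0, 0.0]  # unit variance
z = sample_style(mu, logvar, random.Random(0))  # seeded for repeatability
print(z)
```

Sampling through `mu` and `logvar` in this way keeps the draw differentiable with respect to both, which is the usual reason for returning all three values.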
-
training
= None¶
imaginaire.generators.unit module¶
-
class
imaginaire.generators.unit.
AutoEncoder
(num_filters=64, max_num_filters=256, num_res_blocks=4, num_downsamples_content=2, num_image_channels=3, content_norm_type='instance', decoder_norm_type='instance', weight_norm_type='', output_nonlinearity='', pre_act=False, apply_noise=False, **kwargs)[source]¶ Bases:
torch.nn.modules.module.Module
Improved UNIT autoencoder.
- Parameters
num_filters (int) – Base filter numbers.
max_num_filters (int) – Maximum number of filters in the encoder.
num_res_blocks (int) – Number of residual blocks at the end of the content encoder.
num_downsamples_content (int) – Number of times we reduce resolution by 2x2 for the content image.
num_image_channels (int) – Number of input image channels.
content_norm_type (str) – Type of activation normalization in the content encoder.
decoder_norm_type (str) – Type of activation normalization in the decoder.
weight_norm_type (str) – Type of weight normalization.
output_nonlinearity (str) – Type of nonlinearity before final output, 'tanh' or 'none'.
pre_act (bool) – If True, uses pre-activation residual blocks.
apply_noise (bool) – If True, injects Gaussian noise in the decoder.
-
forward
(images)[source]¶ Reconstruct an image.
- Parameters
images (Tensor) – Input images.
- Returns
Reconstructed images.
- Return type
images_recon (Tensor)
-
training
= None¶
-
class
imaginaire.generators.unit.
ContentEncoder
(num_downsamples, num_res_blocks, num_image_channels, num_filters, max_num_filters, padding_mode, activation_norm_type, weight_norm_type, nonlinearity, pre_act=False)[source]¶ Bases:
torch.nn.modules.module.Module
Improved UNIT encoder. The network consists of:
input layers
$(num_downsamples) convolutional blocks
$(num_res_blocks) residual blocks.
output layer.
- Parameters
num_downsamples (int) – Number of times we reduce resolution by 2x2.
num_res_blocks (int) – Number of residual blocks at the end of the content encoder.
num_image_channels (int) – Number of input image channels.
num_filters (int) – Base filter numbers.
max_num_filters (int) – Maximum number of filters in the encoder.
padding_mode (str) – Type of padding.
activation_norm_type (str) – Type of activation normalization.
weight_norm_type (str) – Type of weight normalization.
nonlinearity (str) – Type of nonlinear activation function.
pre_act (bool) – If True, uses pre-activation residual blocks.
-
training
= None¶
-
class
imaginaire.generators.unit.
Decoder
(num_upsamples, num_res_blocks, num_filters, num_image_channels, padding_mode, activation_norm_type, weight_norm_type, nonlinearity, output_nonlinearity, pre_act=False, apply_noise=False)[source]¶ Bases:
torch.nn.modules.module.Module
Improved UNIT decoder. The network consists of:
$(num_res_blocks) residual blocks.
$(num_upsamples) residual blocks or convolutional blocks
output layer.
- Parameters
num_upsamples (int) – Number of times we increase resolution by 2x2.
num_res_blocks (int) – Number of residual blocks.
num_filters (int) – Base filter numbers.
num_image_channels (int) – Number of input image channels.
padding_mode (str) – Type of padding.
activation_norm_type (str) – Type of activation normalization.
weight_norm_type (str) – Type of weight normalization.
nonlinearity (str) – Type of nonlinear activation function.
output_nonlinearity (str) – Type of nonlinearity before final output, 'tanh' or 'none'.
pre_act (bool) – If True, uses pre-activation residual blocks.
apply_noise (bool) – If True, injects Gaussian noise.
-
training
= None¶
-
class
imaginaire.generators.unit.
Generator
(gen_cfg, data_cfg)[source]¶ Bases:
torch.nn.modules.module.Module
Improved UNIT generator.
- Parameters
gen_cfg (obj) – Generator definition part of the yaml config file.
data_cfg (obj) – Data definition part of the yaml config file.
-
inference
(data, a2b=True)[source]¶ UNIT inference.
- Parameters
data (dict) – Training data at the current iteration.
images_a (tensor): Images from domain A.
images_b (tensor): Images from domain B.
a2b (bool) – If True, translates images from domain A to B, otherwise from B to A.
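The `a2b` flag can be read as choosing which domain supplies the encoder and which supplies the decoder. The sketch below illustrates that dispatch under the assumption that each domain has its own encode/decode pair sharing a content space, as in the UNIT formulation. The `translate` function and the `autoencoders` layout are hypothetical and do not mirror imaginaire's internals.

```python
def translate(data, autoencoders, a2b=True):
    """Encode with the source-domain autoencoder, decode with the target's."""
    source, target = ("a", "b") if a2b else ("b", "a")
    content = autoencoders[source]["encode"](data["images_" + source])
    return autoencoders[target]["decode"](content)


# Toy stand-ins: encoders pass the input through, decoders tag the domain.
autoencoders = {
    "a": {"encode": lambda x: x, "decode": lambda c: ("domain_a", c)},
    "b": {"encode": lambda x: x, "decode": lambda c: ("domain_b", c)},
}
data = {"images_a": "cat", "images_b": "dog"}
print(translate(data, autoencoders, a2b=True))   # ('domain_b', 'cat')
print(translate(data, autoencoders, a2b=False))  # ('domain_a', 'dog')
```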
-
training
= None¶
imaginaire.generators.vid2vid module¶
-
class
imaginaire.generators.vid2vid.
BaseNetwork
[source]¶ Bases:
torch.nn.modules.module.Module
vid2vid generator.
-
get_num_filters
(num_downsamples)[source]¶ Get the number of filters at current layer.
- Parameters
num_downsamples (int) – How many downsamples at current layer.
- Returns
Number of filters.
- Return type
output (int)
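A common rule for this kind of helper is to double the base filter count at each downsample and cap it at a maximum. The sketch below assumes that rule. The default values for `num_filters` and `max_num_filters` are illustrative only, and the standalone function signature is hypothetical (the documented method takes only `num_downsamples`).

```python
def get_num_filters(num_downsamples, num_filters=32, max_num_filters=1024):
    """Filter count after num_downsamples doublings, capped at max_num_filters."""
    return min(max_num_filters, num_filters * 2 ** num_downsamples)


print([get_num_filters(i) for i in range(7)])
# [32, 64, 128, 256, 512, 1024, 1024]
```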
-
training
= None¶
-
class
imaginaire.generators.vid2vid.
FlowGenerator
(flow_cfg, data_cfg)[source]¶ Bases:
imaginaire.generators.vid2vid.BaseNetwork
Flow generator constructor.
- Parameters
flow_cfg (obj) – Flow definition part of the yaml config file.
data_cfg (obj) – Data definition part of the yaml config file.
-
forward
(label, img_prev)[source]¶ Flow generator forward.
- Parameters
label (4D tensor) – Input label tensor.
img_prev (4D tensor) – Previously generated image tensors.
- Returns
flow (4D tensor) : Generated flow map.
mask (4D tensor) : Generated occlusion mask.
- Return type
(tuple)
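A flow map and an occlusion mask are typically used to combine a warped previous frame with newly synthesized content. The sketch below shows only the blending step on flat lists of pixel values. It assumes that a mask value of 1 selects the warped pixel and 0 selects the synthesized one. That convention, and the name `blend_with_mask`, are assumptions of this sketch, so check the model code for the actual convention.

```python
def blend_with_mask(warped, hallucinated, mask):
    """Per-pixel blend: mask * warped + (1 - mask) * hallucinated."""
    return [
        m * w + (1.0 - m) * h
        for w, h, m in zip(warped, hallucinated, mask)
    ]


warped = [10.0, 20.0, 30.0]        # previous frame warped by the flow
hallucinated = [0.0, 0.0, 10.0]    # newly synthesized pixels
mask = [1.0, 0.0, 0.5]             # assumed: 1 keeps warped, 0 keeps new
blended = blend_with_mask(warped, hallucinated, mask)
print(blended)  # [10.0, 0.0, 20.0]
```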
-
training
= None¶
-
class
imaginaire.generators.vid2vid.
Generator
(gen_cfg, data_cfg)[source]¶ Bases:
imaginaire.generators.vid2vid.BaseNetwork
vid2vid generator constructor.
- Parameters
gen_cfg (obj) – Generator definition part of the yaml config file.
data_cfg (obj) – Data definition part of the yaml config file.
-
forward
(data)[source]¶ vid2vid generator forward.
- Parameters
data (dict) – Dictionary of input data.
- Returns
Dictionary of output data.
- Return type
output (dict)
-
get_cond_dims
(num_downs=0)[source]¶ Get the dimensions of conditional inputs.
- Parameters
num_downs (int) – How many downsamples at current layer.
- Returns
List of dimensions.
- Return type
ch (list)
-
get_cond_maps
(label, embedder)[source]¶ Get the conditional inputs.
- Parameters
label (4D tensor) – Input label tensor.
embedder (obj) – Embedding network.
- Returns
List of conditional inputs.
- Return type
cond_maps (list)
-
init_temporal_network
(cfg_init=None)[source]¶ When training on multiple frames begins, initialize the downsampling network and flow network.
- Parameters
cfg_init (dict) – Weight initialization config.
-
one_up_conv_layer
(x, encoded_label, i)[source]¶ One residual block layer in the main branch.
- Parameters
x (4D tensor) – Current feature map.
encoded_label (list of tensors) – Encoded input label maps.
i (int) – Layer index.
- Returns
Output feature map.
- Return type
x (4D tensor)
-
training
= None¶
imaginaire.generators.wc_vid2vid module¶
-
class
imaginaire.generators.wc_vid2vid.
Generator
(gen_cfg, data_cfg)[source]¶ Bases:
imaginaire.generators.vid2vid.Generator
World-consistent vid2vid generator constructor.
- Parameters
gen_cfg (obj) – Generator definition part of the yaml config file.
data_cfg (obj) – Data definition part of the yaml config file.
-
forward
(data)[source]¶ vid2vid generator forward.
- Parameters
data (dict) – Dictionary of input data.
- Returns
Dictionary of output data.
- Return type
output (dict)
-
get_cond_dims
(num_downs=0)[source]¶ Get the dimensions of conditional inputs.
- Parameters
num_downs (int) – How many downsamples at current layer.
- Returns
List of dimensions.
- Return type
ch (list)
-
get_cond_maps
(label, embedder)[source]¶ Get the conditional inputs.
- Parameters
label (4D tensor) – Input label tensor.
embedder (obj) – Embedding network.
- Returns
List of conditional inputs.
- Return type
cond_maps (list)
-
get_partial
(num_downs=0)[source]¶ Get whether convolutions should be partial or not.
- Parameters
num_downs (int) – How many downsamples at current layer.
- Returns
List of boolean partial or not.
- Return type
partial (list)
-
reset_renderer
(is_flipped_input=False)[source]¶ Reset the renderer.
- Parameters
is_flipped_input (bool) – Whether the input sequence is left-right flipped.
-
training
= None¶