imaginaire.model_utils package

Submodules

imaginaire.model_utils.fs_vid2vid module

Utilities for the few-shot vid2vid model.

imaginaire.model_utils.fs_vid2vid.combine_fg_mask(fg_mask, ref_fg_mask, has_fg)[source]

Get the union of the target and reference foreground masks.

Parameters
  • fg_mask (tensor) – Foreground mask for the target image.

  • ref_fg_mask (tensor) – Foreground mask for the reference image.

  • has_fg (bool) – Whether the image can be classified into fg/bg.

Returns

Combined foreground mask.

Return type

output (tensor or int)
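
A minimal sketch of the union semantics (the real implementation may differ, e.g. it could take an element-wise maximum of soft masks instead):

    import torch

    def combine_fg_mask(fg_mask, ref_fg_mask, has_fg):
        # When the data has no fg/bg split, return the constant 1 so the
        # mask becomes a no-op when multiplied in downstream.
        if not has_fg:
            return 1
        # Union: a pixel is foreground if it is foreground in either mask.
        return ((fg_mask > 0) | (ref_fg_mask > 0)).float()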

imaginaire.model_utils.fs_vid2vid.concat_frames(prev, now, n_frames)[source]

Concatenate the previous and current frames and keep only the latest $(n_frames). If the concatenated sequence is longer than $(n_frames), drop the oldest frame.

Parameters
  • prev (NxTxCxHxW tensor) – Tensor for previous frames.

  • now (NxCxHxW tensor) – Tensor for current frame.

  • n_frames (int) – Max number of frames to store.

Returns

Updated tensor.

Return type

result (NxTxCxHxW tensor)
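
One way the sliding window could be implemented; a sketch, not necessarily the actual code (e.g. the first call might instead repeat the current frame $(n_frames) times):

    import torch

    def concat_frames(prev, now, n_frames):
        # Give the current frame a time dimension: NxCxHxW -> Nx1xCxHxW.
        now = now.unsqueeze(1)
        if prev is None:  # first call: no history yet
            return now
        # Append the new frame and keep only the latest n_frames entries.
        return torch.cat([prev, now], dim=1)[:, -n_frames:]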

imaginaire.model_utils.fs_vid2vid.crop_and_resize(img, coords, size=None, method='bilinear')[source]

Crop the image using the given coordinates and resize to target size.

Parameters
  • img (tensor or list of tensors) – Input image.

  • coords (list of int) – Pixel coordinates to crop.

  • size (list of int) – Output size.

  • method (str) – Interpolation method.

Returns

Output image.

Return type

img (tensor or list of tensors)
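
A sketch for the single-tensor case; the coordinate layout [y_start, y_end, x_start, x_end] is an assumption, and the list-of-tensors case is omitted:

    import torch.nn.functional as F

    def crop_and_resize(img, coords, size=None, method='bilinear'):
        ys, ye, xs, xe = coords        # assumed [y0, y1, x0, x1] layout
        img = img[:, :, ys:ye, xs:xe]  # crop the NxCxHxW tensor
        if size is not None:
            # align_corners is only accepted by interpolating modes.
            kwargs = {'align_corners': False} if method in ('bilinear', 'bicubic') else {}
            img = F.interpolate(img, size=size, mode=method, **kwargs)
        return img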

imaginaire.model_utils.fs_vid2vid.crop_face_from_data(cfg, is_inference, data)[source]

Crop the face regions in the input data and resize to the target size. This is for training on face datasets.

Parameters
  • cfg (obj) – Data configuration.

  • is_inference (bool) – Whether we are doing inference.

  • data (dict) – Input data.

Returns

Cropped data.

Return type

data (dict)

imaginaire.model_utils.fs_vid2vid.crop_face_from_output(data_cfg, image, input_label, crop_smaller=0)[source]

Crop out the face region of the image (and resize if necessary to feed into generator/discriminator).

Parameters
  • data_cfg (obj) – Data configuration.

  • image (NxC1xHxW tensor or list of tensors) – Image to crop.

  • input_label (NxC2xHxW tensor) – Input label map.

  • crop_smaller (int) – Number of pixels by which to shrink the cropped region.

Returns

Cropped image.

Return type

output (NxC1xHxW tensor or list of tensors)

imaginaire.model_utils.fs_vid2vid.crop_hand_from_output(data_cfg, image, input_label)[source]

Crop out the hand region of the image.

Parameters
  • data_cfg (obj) – Data configuration.

  • image (NxC1xHxW tensor or list of tensors) – Image to crop.

  • input_label (NxC2xHxW tensor) – Input label map.

Returns

Cropped image.

Return type

output (NxC1xHxW tensor or list of tensors)

imaginaire.model_utils.fs_vid2vid.crop_person_from_data(cfg, is_inference, data)[source]

Crop the person regions in the data and resize to the target size. This is for training on full-body datasets.

Parameters
  • cfg (obj) – Data configuration.

  • is_inference (bool) – Whether we are doing inference.

  • data (dict) – Input data.

Returns

Cropped data.

Return type

data (dict)

imaginaire.model_utils.fs_vid2vid.detach(output)[source]

Detach tensors in the dict.

Parameters

output (dict) – Output dict.

Returns

Detached output dict.

Return type

output (dict)
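
A non-recursive sketch of the detach pass (the real function may also descend into nested containers):

    import torch

    def detach(output):
        # Detach every tensor from the autograd graph; leave other values as-is.
        return {k: v.detach() if isinstance(v, torch.Tensor) else v
                for k, v in output.items()}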

imaginaire.model_utils.fs_vid2vid.extract_valid_pose_labels(pose_map, pose_type, remove_face_labels, do_remove=True)[source]

Remove some labels (e.g. face regions) in the pose map if necessary.

Parameters
  • pose_map (3D, 4D or 5D tensor) – Input pose map.

  • pose_type (str) – ‘both’ or ‘open’.

  • remove_face_labels (bool) – Whether to remove labels for the face region.

  • do_remove (bool) – Whether to actually remove the face labels.

Returns

Output pose map.

Return type

pose_map (3D, 4D or 5D tensor)

imaginaire.model_utils.fs_vid2vid.get_face_bbox_for_data(keypoints, orig_img_size, scale, is_inference)[source]

Get the bbox coordinates for face region.

Parameters
  • keypoints (Nx2 tensor) – Facial landmarks.

  • orig_img_size (int tuple) – Height and width of the input image.

  • scale (float) – When training, randomly scale the crop size for augmentation.

  • is_inference (bool) – Whether we are doing inference.

Returns

Bbox for the face region. scale (float): Also returns the scale, to ensure the reference and target frames are cropped using the same scale.

Return type

crop_coords (list of int)

imaginaire.model_utils.fs_vid2vid.get_face_bbox_for_output(data_cfg, pose, crop_smaller=0)[source]

Get pixel coordinates of the face bounding box.

Parameters
  • data_cfg (obj) – Data configuration.

  • pose (NxCxHxW tensor) – Pose label map.

  • crop_smaller (int) – Number of pixels by which to shrink the cropped region.

Returns

Face bbox.

Return type

output (list of int)

imaginaire.model_utils.fs_vid2vid.get_face_mask(densepose_map)[source]

Obtain the mask of faces.

Parameters

densepose_map (3D or 4D tensor) – DensePose map.

Returns

Face mask.

Return type

mask (3D or 4D tensor)

imaginaire.model_utils.fs_vid2vid.get_fg_mask(densepose_map, has_fg)[source]

Obtain the foreground mask for pose sequences, which only includes the human. This is done by looking at the body part map from DensePose.

Parameters
  • densepose_map (NxCxHxW tensor) – DensePose map.

  • has_fg (bool) – Whether data has foreground or not.

Returns

fg mask.

Return type

mask (Nx1xHxW tensor)

imaginaire.model_utils.fs_vid2vid.get_grid(batchsize, size, minval=-1.0, maxval=1.0)[source]

Get a grid of 2D/3D coordinates ranging from minval to maxval (by default [-1, 1]).

Parameters
  • batchsize (int) – Batch size.

  • size (tuple) – (height, width) or (depth, height, width).

  • minval (float) – minimum value in returned grid.

  • maxval (float) – maximum value in returned grid.

Returns

Grid of coordinates.

Return type

t_grid (4D tensor)
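
For the 2D case, the grid construction can be sketched as follows; the (x, y) channel order matches what torch.nn.functional.grid_sample expects, but is an assumption about this function:

    import torch

    def get_grid(batchsize, size, minval=-1.0, maxval=1.0):
        h, w = size  # 2D case only
        ys = torch.linspace(minval, maxval, h)
        xs = torch.linspace(minval, maxval, w)
        gy, gx = torch.meshgrid(ys, xs, indexing='ij')
        # Stack as (x, y) and expand to the batch: N x H x W x 2.
        grid = torch.stack([gx, gy], dim=-1)
        return grid.unsqueeze(0).expand(batchsize, -1, -1, -1)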

imaginaire.model_utils.fs_vid2vid.get_hand_bbox_for_output(data_cfg, pose)[source]

Get coordinates of the hand bounding box.

Parameters
  • data_cfg (obj) – Data configuration.

  • pose (NxCxHxW tensor) – Pose label map.

Returns

Hand bbox.

Return type

output (list of int)

imaginaire.model_utils.fs_vid2vid.get_part_mask(densepose_map)[source]

Obtain mask of different body parts of humans. This is done by looking at the body part map from DensePose.

Parameters

densepose_map (NxCxHxW tensor) – DensePose map.

Returns

Body part mask, where K is the number of parts.

Return type

mask (NxKxHxW tensor)

imaginaire.model_utils.fs_vid2vid.get_person_bbox_for_data(pose_map, orig_img_size, scale=1.5, crop_aspect_ratio=1, offset=None)[source]

Get the bbox (pixel coordinates) to crop for person body region.

Parameters
  • pose_map (NxCxHxW tensor) – Input pose map.

  • orig_img_size (int tuple) – Height and width of the input image.

  • scale (float) – When training, randomly scale the crop size for augmentation.

  • crop_aspect_ratio (float) – Output aspect ratio.

  • offset (list of float) – Offset for crop position.

Returns

bbox for body region.

Return type

crop_coords (list of int)

imaginaire.model_utils.fs_vid2vid.normalize_faces(keypoints, ref_keypoints, dist_scale_x=None, dist_scale_y=None)[source]

Normalize face keypoints w.r.t. the reference face keypoints.

Parameters
  • keypoints (Kx2 numpy array) – target facial keypoints.

  • ref_keypoints (Kx2 numpy array) – reference facial keypoints.

Returns

normalized facial keypoints.

Return type

keypoints (Kx2 numpy array)

imaginaire.model_utils.fs_vid2vid.pick_image(images, idx)[source]

Pick one image per batch element from images according to idx.

Parameters
  • images (B x N x C x H x W tensor or list of tensors) – N images.

  • idx (B tensor) – indices to select.

Returns

Selected images.

Return type

image (B x C x H x W tensor)
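
For the tensor case, the selection amounts to advanced indexing over the batch dimension (a sketch; the list-of-tensors case is omitted):

    import torch

    def pick_image(images, idx):
        # For each batch element b, select images[b, idx[b]]:
        # B x N x C x H x W -> B x C x H x W.
        return images[torch.arange(images.size(0)), idx]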

imaginaire.model_utils.fs_vid2vid.pre_process_densepose(pose_cfg, pose_map, is_infer=False)[source]

Pre-process the DensePose part of input label map.

Parameters
  • pose_cfg (obj) – Pose data configuration.

  • pose_map (NxCxHxW tensor) – Pose label map.

  • is_infer (bool) – Whether we are doing inference.

Returns

Processed pose label map.

Return type

pose_map (NxCxHxW tensor)

imaginaire.model_utils.fs_vid2vid.random_roll(tensors)[source]

Randomly roll the input tensors along x and y dimensions. Also randomly flip the tensors.

Parameters

tensors (list of 4D tensors) – Input tensors.

Returns

Rolled tensors.

Return type

output (list of 4D tensors)
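
A sketch using the roll() utility documented below; the shift ranges and the flip probability are assumptions:

    import random

    def random_roll(tensors):
        # Draw one shared shift and flip so all tensors stay aligned.
        h, w = tensors[0].shape[-2:]
        ny, nx = random.randrange(h), random.randrange(w)
        flip = random.random() < 0.5
        return [roll(t, ny, nx, flip=flip) for t in tensors]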

imaginaire.model_utils.fs_vid2vid.remove_other_ppl(labels, densemasks)[source]

Remove all people in the label map except the current target, as determined by the IDs in the densemask map.

Parameters
  • labels (NxCxHxW tensor) – Input labels.

  • densemasks (Nx1xHxW tensor) – Densemask maps.

Returns

Output labels.

Return type

labels (NxCxHxW tensor)

imaginaire.model_utils.fs_vid2vid.resample(image, flow)[source]

Resample an image using the provided optical flow.

Parameters
  • image (NxCxHxW tensor) – Image to resample.

  • flow (Nx2xHxW tensor) – Optical flow to resample the image.

Returns

Resampled image.

Return type

output (NxCxHxW tensor)
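
A sketch of flow-based warping with grid_sample; it assumes the flow is already in normalized [-1, 1] coordinates (pixel-space flow would need rescaling first):

    import torch
    import torch.nn.functional as F

    def resample(image, flow):
        n, _, h, w = image.size()
        # Identity sampling grid in [-1, 1], shaped N x H x W x 2, (x, y) order.
        ys = torch.linspace(-1.0, 1.0, h, device=image.device)
        xs = torch.linspace(-1.0, 1.0, w, device=image.device)
        gy, gx = torch.meshgrid(ys, xs, indexing='ij')
        base = torch.stack([gx, gy], dim=-1).unsqueeze(0).expand(n, -1, -1, -1)
        # Offset the grid by the flow (Nx2xHxW -> NxHxWx2) and sample.
        grid = base + flow.permute(0, 2, 3, 1)
        return F.grid_sample(image, grid, mode='bilinear',
                             padding_mode='border', align_corners=True)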

imaginaire.model_utils.fs_vid2vid.roll(t, ny, nx, flip=False)[source]

Roll and flip the tensor by specified amounts.

Parameters
  • t (4D tensor) – Input tensor.

  • ny (int) – Amount to roll along y dimension.

  • nx (int) – Amount to roll along x dimension.

  • flip (bool) – Whether to flip input.

Returns

Output tensor.

Return type

t (4D tensor)
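
This maps directly onto torch.roll and torch.flip; the flip axis (horizontal) is an assumption:

    import torch

    def roll(t, ny, nx, flip=False):
        # Circularly shift along height (dim 2) and width (dim 3).
        t = torch.roll(t, shifts=(ny, nx), dims=(2, 3))
        if flip:
            t = torch.flip(t, dims=[3])  # assumed horizontal flip
        return t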

imaginaire.model_utils.fs_vid2vid.select_object(data, obj_indices=None)[source]

Select the object/person in the dict according to the object index. Currently it’s used to select the target person in the OpenPose dict.

Parameters
  • data (dict) – Input data.

  • obj_indices (list of int) – Indices for the objects to select.

Returns

Output data.

Return type

data (dict)

imaginaire.model_utils.label module

imaginaire.model_utils.label.concat_few_shot_labels(cfg, is_inference, data)[source]
imaginaire.model_utils.label.concat_labels(cfg, is_inference, data)[source]
imaginaire.model_utils.label.make_one_hot(cfg, is_inference, data)[source]

Convert appropriate image data types to one-hot representation.

Parameters

data (dict) – Dict containing data_type as key, with each value as a list of torch.Tensors.

Returns

Same as the input data, but with one-hot representations for the selected types.

Return type

data (dict)
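
The real function is driven by cfg and operates on the whole data dict; the per-tensor conversion it performs can be sketched with a hypothetical helper (to_one_hot is not part of this module):

    import torch.nn.functional as F

    def to_one_hot(label, num_classes):
        # N x 1 x H x W integer label map -> N x num_classes x H x W one-hot.
        one_hot = F.one_hot(label.squeeze(1).long(), num_classes)  # N,H,W,C
        return one_hot.permute(0, 3, 1, 2).float()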

imaginaire.model_utils.label.move_dont_care(cfg, is_inference, data)[source]

imaginaire.model_utils.pix2pixHD module

Utilities for the pix2pixHD model.

imaginaire.model_utils.pix2pixHD.cluster_features(cfg, train_data_loader, net_E, preprocess=None, small_ratio=0.0625, is_cityscapes=True)[source]

Use clustering to compute the features.

Parameters
  • cfg (obj) – Global configuration file.

  • train_data_loader (obj) – Dataloader for iterating through the training set.

  • net_E (nn.Module) – Pytorch network.

  • preprocess (function) – Pre-processing function.

  • small_ratio (float) – Only consider instances that occupy at least $(small_ratio) of the image area.

  • is_cityscapes (bool) – Whether this is the Cityscapes dataset. In the Cityscapes dataset, the instance labels for cars start with 26001, 26002, …

Returns

cluster centers.

Return type

(num_labels x num_cluster_centers x feature_dims)

imaginaire.model_utils.pix2pixHD.encode_features(net_E, feat_nc, label_nc, image, inst, is_cityscapes=True)[source]

Compute feature embeddings for an image. TODO(Ting-Chun): Make this function dataset independent.

Parameters
  • net_E (nn.Module) – The encoder network.

  • feat_nc (int) – Feature dimensions.

  • label_nc (int) – Number of segmentation labels.

  • image (tensor) – Input image tensor.

  • inst (tensor) – Input instance map.

  • is_cityscapes (bool) – Whether this is the Cityscapes dataset. In the Cityscapes dataset, the instance labels for cars start with 26001, 26002, …

Returns

We will have $(label_nc) lists. Each list records feature vectors of dimension $(feat_nc+1), where the first $(feat_nc) dimensions are the representative feature of an instance and the last dimension is the proportion of the image that the instance occupies.

Return type

(list of list of numpy vectors)

imaginaire.model_utils.pix2pixHD.get_edges(t)[source]

Compute edge maps for a given input instance map.

Parameters

t (4D tensor) – Input instance map.

Returns

Output edge map.

Return type

(4D tensor)
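
One standard way to compute such an edge map is to mark a pixel as an edge when its instance id differs from any horizontal or vertical neighbour; a sketch:

    import torch

    def get_edges(t):
        edge = torch.zeros_like(t, dtype=torch.bool)
        # Compare each pixel with its left/right and up/down neighbours.
        edge[:, :, :, 1:]  |= t[:, :, :, 1:] != t[:, :, :, :-1]
        edge[:, :, :, :-1] |= t[:, :, :, 1:] != t[:, :, :, :-1]
        edge[:, :, 1:, :]  |= t[:, :, 1:, :] != t[:, :, :-1, :]
        edge[:, :, :-1, :] |= t[:, :, 1:, :] != t[:, :, :-1, :]
        return edge.float()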

imaginaire.model_utils.pix2pixHD.get_optimizer_with_params(cfg, net_G, net_D, param_names_start_with=[], param_names_include=[])[source]

Return the optimizer object.

Parameters
  • cfg (obj) – Global config.

  • net_G (obj) – Generator network.

  • net_D (obj) – Discriminator network.

  • param_names_start_with (list of strings) – Params whose names start with any of the strings will be trained.

  • param_names_include (list of strings) – Params whose names include any of the strings will be trained.

imaginaire.model_utils.pix2pixHD.get_train_params(net, param_names_start_with=[], param_names_include=[])[source]

Get train parameters.

Parameters
  • net (obj) – Network object.

  • param_names_start_with (list of strings) – Params whose names start with any of the strings will be trained.

  • param_names_include (list of strings) – Params whose names include any of the strings will be trained.
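
The name filtering can be sketched as follows, together with a hypothetical usage that feeds the result to an optimizer:

    import torch

    def get_train_params(net, param_names_start_with=[], param_names_include=[]):
        # Keep a parameter if its name starts with any listed prefix or
        # contains any listed substring.
        params = []
        for name, param in net.named_parameters():
            if any(name.startswith(p) for p in param_names_start_with) or \
               any(s in name for s in param_names_include):
                params.append(param)
        return params

    # Hypothetical usage: fine-tune only parameters whose names contain 'emb'.
    # opt = torch.optim.Adam(get_train_params(net_G, param_names_include=['emb']),
    #                        lr=1e-4)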

imaginaire.model_utils.rename_inputs module

imaginaire.model_utils.rename_inputs.rename_inputs(cfg, is_inference, data)[source]

Module contents