imaginaire.model_utils package

Submodules

imaginaire.model_utils.fs_vid2vid module

Utilities for the few-shot vid2vid model.

imaginaire.model_utils.fs_vid2vid.combine_fg_mask(fg_mask, ref_fg_mask, has_fg)[source]

Get the union of the target and reference foreground masks.

Parameters
  • fg_mask (tensor) – Foreground mask for the target image.

  • ref_fg_mask (tensor) – Foreground mask for the reference image.

  • has_fg (bool) – Whether the image can be classified into fg/bg.

Returns

Combined foreground mask.

Return type

output (tensor or int)
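
A minimal sketch of the union semantics (the real implementation may differ, e.g. it could take an element-wise maximum of soft masks instead):

    import torch

    def combine_fg_mask(fg_mask, ref_fg_mask, has_fg):
        # When the data has no fg/bg split, return the constant 1 so the
        # mask becomes a no-op when multiplied in downstream.
        if not has_fg:
            return 1
        # Union: a pixel is foreground if it is foreground in either mask.
        return ((fg_mask > 0) | (ref_fg_mask > 0)).float()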

imaginaire.model_utils.fs_vid2vid.concat_frames(prev, now, n_frames)[source]

Concatenate the previous and current frames and keep only the latest $(n_frames). If the concatenated sequence is longer than $(n_frames), drop the oldest frame.

Parameters
  • prev (NxTxCxHxW tensor) – Tensor for previous frames.

  • now (NxCxHxW tensor) – Tensor for current frame.

  • n_frames (int) – Max number of frames to store.

Returns

Updated tensor.

Return type

result (NxTxCxHxW tensor)
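
One way the sliding window could be implemented; a sketch, not necessarily the actual code (e.g. the first call might instead repeat the current frame $(n_frames) times):

    import torch

    def concat_frames(prev, now, n_frames):
        # Give the current frame a time dimension: NxCxHxW -> Nx1xCxHxW.
        now = now.unsqueeze(1)
        if prev is None:  # first call: no history yet
            return now
        # Append the new frame and keep only the latest n_frames entries.
        return torch.cat([prev, now], dim=1)[:, -n_frames:]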

imaginaire.model_utils.fs_vid2vid.crop_and_resize(img, coords, size=None, method='bilinear')[source]

Crop the image using the given coordinates and resize to target size.

Parameters
  • img (tensor or list of tensors) – Input image.

  • coords (list of int) – Pixel coordinates to crop.

  • size (list of int) – Output size.

  • method (str) – Interpolation method.

Returns

Output image.

Return type

img (tensor or list of tensors)
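
A sketch for the single-tensor case; the coordinate layout [y_start, y_end, x_start, x_end] is an assumption, and the list-of-tensors case is omitted:

    import torch.nn.functional as F

    def crop_and_resize(img, coords, size=None, method='bilinear'):
        ys, ye, xs, xe = coords        # assumed [y0, y1, x0, x1] layout
        img = img[:, :, ys:ye, xs:xe]  # crop the NxCxHxW tensor
        if size is not None:
            # align_corners is only accepted by interpolating modes.
            kwargs = {'align_corners': False} if method in ('bilinear', 'bicubic') else {}
            img = F.interpolate(img, size=size, mode=method, **kwargs)
        return img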

imaginaire.model_utils.fs_vid2vid.crop_face_from_data(cfg, is_inference, data)[source]

Crop the face regions in the input data and resize to the target size. This is for training on face datasets.

Parameters
  • cfg (obj) – Data configuration.

  • is_inference (bool) – Whether we are doing inference.

  • data (dict) – Input data.

Returns

Cropped data.

Return type

data (dict)

imaginaire.model_utils.fs_vid2vid.crop_face_from_output(data_cfg, image, input_label, crop_smaller=0)[source]

Crop out the face region of the image (and resize if necessary to feed into generator/discriminator).

Parameters
  • data_cfg (obj) – Data configuration.

  • image (NxC1xHxW tensor or list of tensors) – Image to crop.

  • input_label (NxC2xHxW tensor) – Input label map.

  • crop_smaller (int) – Number of pixels by which to shrink the cropped region.

Returns

Cropped image.

Return type

output (NxC1xHxW tensor or list of tensors)

imaginaire.model_utils.fs_vid2vid.crop_hand_from_output(data_cfg, image, input_label)[source]

Crop out the hand region of the image.

Parameters
  • data_cfg (obj) – Data configuration.

  • image (NxC1xHxW tensor or list of tensors) – Image to crop.

  • input_label (NxC2xHxW tensor) – Input label map.

Returns

Cropped image.

Return type

output (NxC1xHxW tensor or list of tensors)

imaginaire.model_utils.fs_vid2vid.crop_person_from_data(cfg, is_inference, data)[source]

Crop the person regions in the data and resize to the target size. This is for training on full-body datasets.

Parameters
  • cfg (obj) – Data configuration.

  • is_inference (bool) – Whether we are doing inference.

  • data (dict) – Input data.

Returns

Cropped data.

Return type

data (dict)

imaginaire.model_utils.fs_vid2vid.detach(output)[source]

Detach tensors in the dict.

Parameters

output (dict) – Output dict.

Returns

Detached output dict.

Return type

output (dict)
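
A non-recursive sketch of the detach pass (the real function may also descend into nested containers):

    import torch

    def detach(output):
        # Detach every tensor from the autograd graph; leave other values as-is.
        return {k: v.detach() if isinstance(v, torch.Tensor) else v
                for k, v in output.items()}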

imaginaire.model_utils.fs_vid2vid.extract_valid_pose_labels(pose_map, pose_type, remove_face_labels, do_remove=True)[source]

Remove some labels (e.g. face regions) in the pose map if necessary.

Parameters
  • pose_map (3D, 4D or 5D tensor) – Input pose map.

  • pose_type (str) – ‘both’ or ‘open’.

  • remove_face_labels (bool) – Whether to remove labels for the face region.

  • do_remove (bool) – Whether to actually remove the face labels.

Returns

Output pose map.

Return type

pose_map (3D, 4D or 5D tensor)

imaginaire.model_utils.fs_vid2vid.get_face_bbox_for_data(keypoints, orig_img_size, scale, is_inference)[source]

Get the bbox coordinates for face region.

Parameters
  • keypoints (Nx2 tensor) – Facial landmarks.

  • orig_img_size (int tuple) – Height and width of the input image.

  • scale (float) – When training, randomly scale the crop size for augmentation.

  • is_inference (bool) – Whether we are doing inference.

Returns

Bbox for the face region. scale (float): Also returns the scale, to ensure the reference and target frames are cropped using the same scale.

Return type

crop_coords (list of int)

imaginaire.model_utils.fs_vid2vid.get_face_bbox_for_output(data_cfg, pose, crop_smaller=0)[source]

Get pixel coordinates of the face bounding box.

Parameters
  • data_cfg (obj) – Data configuration.

  • pose (NxCxHxW tensor) – Pose label map.

  • crop_smaller (int) – Number of pixels by which to shrink the cropped region.

Returns

Face bbox.

Return type

output (list of int)

imaginaire.model_utils.fs_vid2vid.get_face_mask(densepose_map)[source]

Obtain the mask of faces.

Parameters

densepose_map (3D or 4D tensor) – DensePose map.

Returns

Face mask.

Return type

mask (3D or 4D tensor)

imaginaire.model_utils.fs_vid2vid.get_fg_mask(densepose_map, has_fg)[source]

Obtain the foreground mask for pose sequences, which only includes the human. This is done by looking at the body part map from DensePose.

Parameters
  • densepose_map (NxCxHxW tensor) – DensePose map.

  • has_fg (bool) – Whether data has foreground or not.

Returns

fg mask.

Return type

mask (Nx1xHxW tensor)

imaginaire.model_utils.fs_vid2vid.get_grid(batchsize, size, minval=-1.0, maxval=1.0)[source]

Get a grid of 2D/3D coordinates ranging from minval to maxval (by default [-1, 1]).

Parameters
  • batchsize (int) – Batch size.

  • size (tuple) – (height, width) or (depth, height, width).

  • minval (float) – minimum value in returned grid.

  • maxval (float) – maximum value in returned grid.

Returns

Grid of coordinates.

Return type

t_grid (4D tensor)
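
For the 2D case, the grid construction can be sketched as follows; the (x, y) channel order matches what torch.nn.functional.grid_sample expects, but is an assumption about this function:

    import torch

    def get_grid(batchsize, size, minval=-1.0, maxval=1.0):
        h, w = size  # 2D case only
        ys = torch.linspace(minval, maxval, h)
        xs = torch.linspace(minval, maxval, w)
        gy, gx = torch.meshgrid(ys, xs, indexing='ij')
        # Stack as (x, y) and expand to the batch: N x H x W x 2.
        grid = torch.stack([gx, gy], dim=-1)
        return grid.unsqueeze(0).expand(batchsize, -1, -1, -1)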

imaginaire.model_utils.fs_vid2vid.get_hand_bbox_for_output(data_cfg, pose)[source]

Get coordinates of the hand bounding box.

Parameters
  • data_cfg (obj) – Data configuration.

  • pose (NxCxHxW tensor) – Pose label map.

Returns

Hand bbox.

Return type

output (list of int)

imaginaire.model_utils.fs_vid2vid.get_part_mask(densepose_map)[source]

Obtain mask of different body parts of humans. This is done by looking at the body part map from DensePose.

Parameters

densepose_map (NxCxHxW tensor) – DensePose map.

Returns

Body part mask, where K is the number of parts.

Return type

mask (NxKxHxW tensor)

imaginaire.model_utils.fs_vid2vid.get_person_bbox_for_data(pose_map, orig_img_size, scale=1.5, crop_aspect_ratio=1, offset=None)[source]

Get the bbox (pixel coordinates) to crop for person body region.

Parameters
  • pose_map (NxCxHxW tensor) – Input pose map.

  • orig_img_size (int tuple) – Height and width of the input image.

  • scale (float) – When training, randomly scale the crop size for augmentation.

  • crop_aspect_ratio (float) – Output aspect ratio.

  • offset (list of float) – Offset for crop position.

Returns

bbox for body region.

Return type

crop_coords (list of int)

imaginaire.model_utils.fs_vid2vid.normalize_faces(keypoints, ref_keypoints, dist_scale_x=None, dist_scale_y=None)[source]

Normalize face keypoints w.r.t. the reference face keypoints.

Parameters
  • keypoints (Kx2 numpy array) – target facial keypoints.

  • ref_keypoints (Kx2 numpy array) – reference facial keypoints.

Returns

normalized facial keypoints.

Return type

keypoints (Kx2 numpy array)

imaginaire.model_utils.fs_vid2vid.pick_image(images, idx)[source]

Pick one image per batch element from images according to idx.

Parameters
  • images (B x N x C x H x W tensor or list of tensors) – N images.

  • idx (B tensor) – indices to select.

Returns

Selected images.

Return type

image (B x C x H x W tensor)
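
For the tensor case, the selection amounts to advanced indexing over the batch dimension (a sketch; the list-of-tensors case is omitted):

    import torch

    def pick_image(images, idx):
        # For each batch element b, select images[b, idx[b]]:
        # B x N x C x H x W -> B x C x H x W.
        return images[torch.arange(images.size(0)), idx]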

imaginaire.model_utils.fs_vid2vid.pre_process_densepose(pose_cfg, pose_map, is_infer=False)[source]

Pre-process the DensePose part of input label map.

Parameters
  • pose_cfg (obj) – Pose data configuration.

  • pose_map (NxCxHxW tensor) – Pose label map.

  • is_infer (bool) – Whether we are doing inference.

Returns

Processed pose label map.

Return type

pose_map (NxCxHxW tensor)

imaginaire.model_utils.fs_vid2vid.random_roll(tensors)[source]

Randomly roll the input tensors along x and y dimensions. Also randomly flip the tensors.

Parameters

tensors (list of 4D tensors) – Input tensors.

Returns

Rolled tensors.

Return type

output (list of 4D tensors)
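
A sketch using the roll() utility documented below; the shift ranges and the flip probability are assumptions:

    import random

    def random_roll(tensors):
        # Draw one shared shift and flip so all tensors stay aligned.
        h, w = tensors[0].shape[-2:]
        ny, nx = random.randrange(h), random.randrange(w)
        flip = random.random() < 0.5
        return [roll(t, ny, nx, flip=flip) for t in tensors]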

imaginaire.model_utils.fs_vid2vid.remove_other_ppl(labels, densemasks)[source]

Remove all people in the label map except the current target, as determined by the IDs in the densemask map.

Parameters
  • labels (NxCxHxW tensor) – Input labels.

  • densemasks (Nx1xHxW tensor) – Densemask maps.

Returns

Output labels.

Return type

labels (NxCxHxW tensor)

imaginaire.model_utils.fs_vid2vid.resample(image, flow)[source]

Resample an image using the provided optical flow.

Parameters
  • image (NxCxHxW tensor) – Image to resample.

  • flow (Nx2xHxW tensor) – Optical flow to resample the image.

Returns

Resampled image.

Return type

output (NxCxHxW tensor)
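
A sketch of flow-based warping with grid_sample; it assumes the flow is already in normalized [-1, 1] coordinates (pixel-space flow would need rescaling first):

    import torch
    import torch.nn.functional as F

    def resample(image, flow):
        n, _, h, w = image.size()
        # Identity sampling grid in [-1, 1], shaped N x H x W x 2, (x, y) order.
        ys = torch.linspace(-1.0, 1.0, h, device=image.device)
        xs = torch.linspace(-1.0, 1.0, w, device=image.device)
        gy, gx = torch.meshgrid(ys, xs, indexing='ij')
        base = torch.stack([gx, gy], dim=-1).unsqueeze(0).expand(n, -1, -1, -1)
        # Offset the grid by the flow (Nx2xHxW -> NxHxWx2) and sample.
        grid = base + flow.permute(0, 2, 3, 1)
        return F.grid_sample(image, grid, mode='bilinear',
                             padding_mode='border', align_corners=True)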

imaginaire.model_utils.fs_vid2vid.roll(t, ny, nx, flip=False)[source]

Roll and flip the tensor by specified amounts.

Parameters
  • t (4D tensor) – Input tensor.

  • ny (int) – Amount to roll along y dimension.

  • nx (int) – Amount to roll along x dimension.

  • flip (bool) – Whether to flip input.

Returns

Output tensor.

Return type

t (4D tensor)
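
This maps directly onto torch.roll and torch.flip; the flip axis (horizontal) is an assumption:

    import torch

    def roll(t, ny, nx, flip=False):
        # Circularly shift along height (dim 2) and width (dim 3).
        t = torch.roll(t, shifts=(ny, nx), dims=(2, 3))
        if flip:
            t = torch.flip(t, dims=[3])  # assumed horizontal flip
        return t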

imaginaire.model_utils.fs_vid2vid.select_object(data, obj_indices=None)[source]

Select the object/person in the dict according to the object index. Currently it’s used to select the target person in the OpenPose dict.

Parameters
  • data (dict) – Input data.

  • obj_indices (list of int) – Indices for the objects to select.

Returns

Output data.

Return type

data (dict)

imaginaire.model_utils.label module

imaginaire.model_utils.label.concat_few_shot_labels(cfg, is_inference, data)[source]
imaginaire.model_utils.label.concat_labels(cfg, is_inference, data)[source]
imaginaire.model_utils.label.make_one_hot(cfg, is_inference, data)[source]

Convert appropriate image data types to one-hot representation.

Parameters

data (dict) – Dict containing data_type as key, with each value as a list of torch.Tensors.

Returns

Same as the input data, but with one-hot representations for the selected types.

Return type

data (dict)
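
The real function is driven by cfg and operates on the whole data dict; the per-tensor conversion it performs can be sketched with a hypothetical helper (to_one_hot is not part of this module):

    import torch.nn.functional as F

    def to_one_hot(label, num_classes):
        # N x 1 x H x W integer label map -> N x num_classes x H x W one-hot.
        one_hot = F.one_hot(label.squeeze(1).long(), num_classes)  # N,H,W,C
        return one_hot.permute(0, 3, 1, 2).float()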

imaginaire.model_utils.label.move_dont_care(cfg, is_inference, data)[source]

imaginaire.model_utils.pix2pixHD module

Utilities for the pix2pixHD model.

imaginaire.model_utils.pix2pixHD.cluster_features(cfg, train_data_loader, net_E, preprocess=None, small_ratio=0.0625, is_cityscapes=True)[source]

Use clustering to compute the features.

Parameters
  • cfg (obj) – Global configuration file.

  • train_data_loader (obj) – Dataloader for iterating through the training set.

  • net_E (nn.Module) – Pytorch network.

  • preprocess (function) – Pre-processing function.

  • small_ratio (float) – Only consider instances that occupy at least $(small_ratio) of the image area.

  • is_cityscapes (bool) – Whether this is the Cityscapes dataset. In the Cityscapes dataset, the instance labels for cars start with 26001, 26002, …

Returns

cluster centers.

Return type

(num_labels x num_cluster_centers x feature_dims)

imaginaire.model_utils.pix2pixHD.encode_features(net_E, feat_nc, label_nc, image, inst, is_cityscapes=True)[source]

Compute feature embeddings for an image. TODO(Ting-Chun): Make this function dataset independent.

Parameters
  • net_E (nn.Module) – The encoder network.

  • feat_nc (int) – Feature dimensions.

  • label_nc (int) – Number of segmentation labels.

  • image (tensor) – Input image tensor.

  • inst (tensor) – Input instance map.

  • is_cityscapes (bool) – Whether this is the Cityscapes dataset. In the Cityscapes dataset, the instance labels for cars start with 26001, 26002, …

Returns

We will have $(label_nc) lists. Each list records feature vectors of dimension $(feat_nc+1), where the first $(feat_nc) dimensions are the representative feature of an instance and the last dimension is the proportion of the image that the instance occupies.

Return type

(list of list of numpy vectors)

imaginaire.model_utils.pix2pixHD.get_edges(t)[source]

Compute edge maps for a given input instance map.

Parameters

t (4D tensor) – Input instance map.

Returns

Output edge map.

Return type

(4D tensor)
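
One standard way to compute such an edge map is to mark a pixel as an edge when its instance id differs from any horizontal or vertical neighbour; a sketch:

    import torch

    def get_edges(t):
        edge = torch.zeros_like(t, dtype=torch.bool)
        # Compare each pixel with its left/right and up/down neighbours.
        edge[:, :, :, 1:]  |= t[:, :, :, 1:] != t[:, :, :, :-1]
        edge[:, :, :, :-1] |= t[:, :, :, 1:] != t[:, :, :, :-1]
        edge[:, :, 1:, :]  |= t[:, :, 1:, :] != t[:, :, :-1, :]
        edge[:, :, :-1, :] |= t[:, :, 1:, :] != t[:, :, :-1, :]
        return edge.float()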

imaginaire.model_utils.pix2pixHD.get_optimizer_with_params(cfg, net_G, net_D, param_names_start_with=[], param_names_include=[])[source]

Return the optimizer object.

Parameters
  • cfg (obj) – Global config.

  • net_G (obj) – Generator network.

  • net_D (obj) – Discriminator network.

  • param_names_start_with (list of strings) – Params whose names start with any of the strings will be trained.

  • param_names_include (list of strings) – Params whose names include any of the strings will be trained.

imaginaire.model_utils.pix2pixHD.get_train_params(net, param_names_start_with=[], param_names_include=[])[source]

Get train parameters.

Parameters
  • net (obj) – Network object.

  • param_names_start_with (list of strings) – Params whose names start with any of the strings will be trained.

  • param_names_include (list of strings) – Params whose names include any of the strings will be trained.
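
The name filtering can be sketched as follows, together with a hypothetical usage that feeds the result to an optimizer:

    import torch

    def get_train_params(net, param_names_start_with=[], param_names_include=[]):
        # Keep a parameter if its name starts with any listed prefix or
        # contains any listed substring.
        params = []
        for name, param in net.named_parameters():
            if any(name.startswith(p) for p in param_names_start_with) or \
               any(s in name for s in param_names_include):
                params.append(param)
        return params

    # Hypothetical usage: fine-tune only parameters whose names contain 'emb'.
    # opt = torch.optim.Adam(get_train_params(net_G, param_names_include=['emb']),
    #                        lr=1e-4)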

imaginaire.model_utils.rename_inputs module

imaginaire.model_utils.rename_inputs.rename_inputs(cfg, is_inference, data)[source]

Module contents