imaginaire.model_utils package¶
Subpackages¶
- imaginaire.model_utils.gancraft package
- Subpackages
- Submodules
- imaginaire.model_utils.gancraft.camctl module
- imaginaire.model_utils.gancraft.layers module
- imaginaire.model_utils.gancraft.loss module
- imaginaire.model_utils.gancraft.mc_lbl_reduction module
- imaginaire.model_utils.gancraft.mc_utils module
- Module contents
- imaginaire.model_utils.wc_vid2vid package
Submodules¶
imaginaire.model_utils.fs_vid2vid module¶
Utils for the few shot vid2vid model.
-
imaginaire.model_utils.fs_vid2vid.
combine_fg_mask
(fg_mask, ref_fg_mask, has_fg)[source]¶ Get the union of target and reference foreground masks.
- Parameters
fg_mask (tensor) – Foreground mask for target image.
ref_fg_mask (tensor) – Foreground mask for reference image.
has_fg (bool) – Whether the image can be classified into fg/bg.
- Returns
Combined foreground mask.
- Return type
output (tensor or int)
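The union described above can be sketched with NumPy arrays standing in for tensors. This is an illustration, not the library's code: the name `combine_fg_mask_sketch` is hypothetical, and returning the constant `1` when `has_fg` is false is an assumption inferred from the "tensor or int" return type.

```python
import numpy as np

def combine_fg_mask_sketch(fg_mask, ref_fg_mask, has_fg):
    """Union of two foreground masks (hypothetical NumPy stand-in)."""
    if not has_fg:
        # Assumption: without a fg/bg split, every pixel counts as foreground,
        # so a scalar 1 can be broadcast against any image.
        return 1
    # A pixel is foreground if it is foreground in either mask.
    return ((fg_mask > 0) | (ref_fg_mask > 0)).astype(np.float32)

target = np.array([[1, 0], [0, 0]], dtype=np.float32)
reference = np.array([[0, 0], [0, 1]], dtype=np.float32)
combined = combine_fg_mask_sketch(target, reference, has_fg=True)
```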
-
imaginaire.model_utils.fs_vid2vid.
concat_frames
(prev, now, n_frames)[source]¶ Concatenate the previous and current frames and keep only the latest $(n_frames). If the concatenated frames exceed $(n_frames), drop the oldest ones.
- Parameters
prev (NxTxCxHxW tensor) – Tensor for previous frames.
now (NxCxHxW tensor) – Tensor for current frame.
n_frames (int) – Max number of frames to store.
- Returns
Updated tensor.
- Return type
result (NxTxCxHxW tensor)
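The rolling-buffer behaviour described above can be sketched as follows, using NumPy arrays in place of tensors. The name `concat_frames_sketch` is hypothetical, and treating `prev=None` as an empty buffer is an assumption made so the example can start from nothing.

```python
import numpy as np

def concat_frames_sketch(prev, now, n_frames):
    """Append `now` to `prev` along the time axis, keeping at most n_frames."""
    now = now[:, np.newaxis]  # NxCxHxW -> Nx1xCxHxW
    if prev is None:
        # Assumption: an empty buffer is represented by None.
        return now
    # If the buffer is already full, drop the oldest frames before appending.
    if prev.shape[1] >= n_frames:
        prev = prev[:, prev.shape[1] - n_frames + 1:]
    return np.concatenate([prev, now], axis=1)

# Feed five frames into a buffer that holds three; frame t is filled with t.
buf = None
for t in range(5):
    frame = np.full((1, 3, 2, 2), t, dtype=np.float32)
    buf = concat_frames_sketch(buf, frame, n_frames=3)
```

After the loop the buffer holds frames 2, 3 and 4, in that order, along the T axis.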
-
imaginaire.model_utils.fs_vid2vid.
crop_and_resize
(img, coords, size=None, method='bilinear')[source]¶ Crop the image using the given coordinates and resize to target size.
- Parameters
img (tensor or list of tensors) – Input image.
coords (list of int) – Pixel coordinates to crop.
size (list of int) – Output size.
method (str) – Interpolation method.
- Returns
Output image.
- Return type
img (tensor or list of tensors)
-
imaginaire.model_utils.fs_vid2vid.
crop_face_from_data
(cfg, is_inference, data)[source]¶ Crop the face regions in input data and resize to the target size. This is for training face datasets.
- Parameters
cfg (obj) – Data configuration.
is_inference (bool) – Is doing inference or not.
data (dict) – Input data.
- Returns
Cropped data.
- Return type
data (dict)
-
imaginaire.model_utils.fs_vid2vid.
crop_face_from_output
(data_cfg, image, input_label, crop_smaller=0)[source]¶ Crop out the face region of the image (and resize if necessary to feed into generator/discriminator).
- Parameters
data_cfg (obj) – Data configuration.
image (NxC1xHxW tensor or list of tensors) – Image to crop.
input_label (NxC2xHxW tensor) – Input label map.
crop_smaller (int) – Number of pixels to crop slightly smaller region.
- Returns
Cropped image.
- Return type
output (NxC1xHxW tensor or list of tensors)
-
imaginaire.model_utils.fs_vid2vid.
crop_hand_from_output
(data_cfg, image, input_label)[source]¶ Crop out the hand region of the image.
- Parameters
data_cfg (obj) – Data configuration.
image (NxC1xHxW tensor or list of tensors) – Image to crop.
input_label (NxC2xHxW tensor) – Input label map.
- Returns
Cropped image.
- Return type
output (NxC1xHxW tensor or list of tensors)
-
imaginaire.model_utils.fs_vid2vid.
crop_person_from_data
(cfg, is_inference, data)[source]¶ Crop the person regions in data and resize to the target size. This is for training full body datasets.
- Parameters
cfg (obj) – Data configuration.
is_inference (bool) – Is doing inference or not.
data (dict) – Input data.
- Returns
Cropped data.
- Return type
data (dict)
-
imaginaire.model_utils.fs_vid2vid.
detach
(output)[source]¶ Detach tensors in the dict.
- Parameters
output (dict) – Output dict.
- Returns
Detached output dict.
- Return type
output (dict)
-
imaginaire.model_utils.fs_vid2vid.
extract_valid_pose_labels
(pose_map, pose_type, remove_face_labels, do_remove=True)[source]¶ Remove some labels (e.g. face regions) in the pose map if necessary.
- Parameters
pose_map (3D, 4D or 5D tensor) – Input pose map.
pose_type (str) – ‘both’ or ‘open’.
remove_face_labels (bool) – Whether to remove labels for the face region.
do_remove (bool) – Do remove face labels.
- Returns
Output pose map.
- Return type
pose_map (3D, 4D or 5D tensor)
-
imaginaire.model_utils.fs_vid2vid.
get_face_bbox_for_data
(keypoints, orig_img_size, scale, is_inference)[source]¶ Get the bbox coordinates for face region.
- Parameters
keypoints (Nx2 tensor) – Facial landmarks.
orig_img_size (int tuple) – Height and width of the input image size.
scale (float) – When training, randomly scale the crop size for augmentation.
is_inference (bool) – Is doing inference or not.
- Returns
bbox for face region. scale (float): Also returns the scale, to ensure reference and target frames are cropped using the same scale.
- Return type
crop_coords (list of int)
-
imaginaire.model_utils.fs_vid2vid.
get_face_bbox_for_output
(data_cfg, pose, crop_smaller=0)[source]¶ Get pixel coordinates of the face bounding box.
- Parameters
data_cfg (obj) – Data configuration.
pose (NxCxHxW tensor) – Pose label map.
crop_smaller (int) – Number of pixels to crop slightly smaller region.
- Returns
Face bbox.
- Return type
output (list of int)
-
imaginaire.model_utils.fs_vid2vid.
get_face_mask
(densepose_map)[source]¶ Obtain mask of faces.
- Parameters
densepose_map (3D or 4D tensor) – DensePose map.
- Returns
Face mask.
- Return type
mask (3D or 4D tensor)
-
imaginaire.model_utils.fs_vid2vid.
get_fg_mask
(densepose_map, has_fg)[source]¶ Obtain the foreground mask for pose sequences, which only includes the human. This is done by looking at the body part map from DensePose.
- Parameters
densepose_map (NxCxHxW tensor) – DensePose map.
has_fg (bool) – Whether data has foreground or not.
- Returns
fg mask.
- Return type
mask (Nx1xHxW tensor)
-
imaginaire.model_utils.fs_vid2vid.
get_grid
(batchsize, size, minval=-1.0, maxval=1.0)[source]¶ Get a grid of 2D/3D coordinates ranging over [minval, maxval] ([-1, 1] by default).
- Parameters
batchsize (int) – Batch size.
size (tuple) – (height, width) or (depth, height, width).
minval (float) – minimum value in returned grid.
maxval (float) – maximum value in returned grid.
- Returns
Grid of coordinates.
- Return type
t_grid (4D tensor)
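For the 2D case, a coordinate grid of this kind can be sketched in NumPy as below. The name `get_grid_sketch`, the N x 2 x H x W output layout, and the choice to put x in channel 0 and y in channel 1 are assumptions for illustration; the 3D (depth, height, width) case is not covered.

```python
import numpy as np

def get_grid_sketch(batchsize, size, minval=-1.0, maxval=1.0):
    """Build an N x 2 x H x W grid of (x, y) coordinates (2D case only)."""
    h, w = size
    xs = np.linspace(minval, maxval, w, dtype=np.float32)
    ys = np.linspace(minval, maxval, h, dtype=np.float32)
    # Default "xy" indexing yields two H x W arrays: x varies along columns,
    # y varies along rows.
    x, y = np.meshgrid(xs, ys)
    grid = np.stack([x, y], axis=0)  # 2 x H x W
    # Repeat the same grid for every item in the batch.
    return np.broadcast_to(grid, (batchsize, 2, h, w)).copy()

grid = get_grid_sketch(batchsize=2, size=(3, 4))
```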
-
imaginaire.model_utils.fs_vid2vid.
get_hand_bbox_for_output
(data_cfg, pose)[source]¶ Get coordinates of the hand bounding box.
- Parameters
data_cfg (obj) – Data configuration.
pose (NxCxHxW tensor) – Pose label map.
- Returns
Hand bbox.
- Return type
output (list of int)
-
imaginaire.model_utils.fs_vid2vid.
get_part_mask
(densepose_map)[source]¶ Obtain mask of different body parts of humans. This is done by looking at the body part map from DensePose.
- Parameters
densepose_map (NxCxHxW tensor) – DensePose map.
- Returns
Body part mask, where K is the number of parts.
- Return type
mask (NxKxHxW tensor)
-
imaginaire.model_utils.fs_vid2vid.
get_person_bbox_for_data
(pose_map, orig_img_size, scale=1.5, crop_aspect_ratio=1, offset=None)[source]¶ Get the bbox (pixel coordinates) to crop for person body region.
- Parameters
pose_map (NxCxHxW tensor) – Input pose map.
orig_img_size (int tuple) – Height and width of the input image size.
scale (float) – When training, randomly scale the crop size for augmentation.
crop_aspect_ratio (float) – Output aspect ratio.
offset (list of float) – Offset for crop position.
- Returns
bbox for body region.
- Return type
crop_coords (list of int)
-
imaginaire.model_utils.fs_vid2vid.
normalize_faces
(keypoints, ref_keypoints, dist_scale_x=None, dist_scale_y=None)[source]¶ Normalize face keypoints w.r.t. the reference face keypoints.
- Parameters
keypoints (Kx2 numpy array) – target facial keypoints.
ref_keypoints (Kx2 numpy array) – reference facial keypoints.
- Returns
normalized facial keypoints.
- Return type
keypoints (Kx2 numpy array)
-
imaginaire.model_utils.fs_vid2vid.
pick_image
(images, idx)[source]¶ Pick the image among images according to idx.
- Parameters
images (B x N x C x H x W tensor or list of tensors) – N images.
idx (B tensor) – indices to select.
- Returns
Selected images.
- Return type
image (B x C x H x W)
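Per-sample selection of this kind can be sketched with NumPy advanced indexing. The name `pick_image_sketch` is hypothetical, and only the single-array input is handled; the "list of tensors" input mentioned above is left out.

```python
import numpy as np

def pick_image_sketch(images, idx):
    """Select images[b, idx[b]] for every batch item b."""
    batch = np.arange(images.shape[0])
    # Pairing batch index b with idx[b] picks one of the N images per sample.
    return images[batch, idx]

# Two samples with three 1x1x1 "images" each, valued 0..5.
images = np.arange(6).reshape(2, 3, 1, 1, 1)
picked = pick_image_sketch(images, np.array([2, 0]))
```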
-
imaginaire.model_utils.fs_vid2vid.
pre_process_densepose
(pose_cfg, pose_map, is_infer=False)[source]¶ Pre-process the DensePose part of input label map.
- Parameters
pose_cfg (obj) – Pose data configuration.
pose_map (NxCxHxW tensor) – Pose label map.
is_infer (bool) – Is doing inference.
- Returns
Processed pose label map.
- Return type
pose_map (NxCxHxW tensor)
-
imaginaire.model_utils.fs_vid2vid.
random_roll
(tensors)[source]¶ Randomly roll the input tensors along x and y dimensions. Also randomly flip the tensors.
- Parameters
tensors (list of 4D tensors) – Input tensors.
- Returns
Rolled tensors.
- Return type
output (list of 4D tensors)
-
imaginaire.model_utils.fs_vid2vid.
remove_other_ppl
(labels, densemasks)[source]¶ Remove other people in the label map except for the current target by looking at the id in the densemask map.
- Parameters
labels (NxCxHxW tensor) – Input labels.
densemasks (Nx1xHxW tensor) – Densemask maps.
- Returns
Output labels.
- Return type
labels (NxCxHxW tensor)
-
imaginaire.model_utils.fs_vid2vid.
resample
(image, flow)[source]¶ Resamples an image using the provided flow.
- Parameters
image (NxCxHxW tensor) – Image to resample.
flow (Nx2xHxW tensor) – Optical flow to resample the image.
- Returns
Resampled image.
- Return type
output (NxCxHxW tensor)
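To make the idea of flow-based resampling concrete, here is a simplified NumPy sketch. It deliberately swaps in nearest-neighbour lookup rather than interpolation, so it is not equivalent to the function documented above. The name `resample_nearest_sketch`, the convention that flow channel 0 is the x offset and channel 1 the y offset in pixels, and clamping at the image border are all assumptions.

```python
import numpy as np

def resample_nearest_sketch(image, flow):
    """Warp image so that out[y, x] = image[y + flow_y, x + flow_x]."""
    n, c, h, w = image.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Source coordinates, rounded to the nearest pixel and clamped to the
    # image border (assumed out-of-range behaviour).
    src_x = np.clip(np.rint(xs + flow[:, 0]).astype(int), 0, w - 1)  # N x H x W
    src_y = np.clip(np.rint(ys + flow[:, 1]).astype(int), 0, h - 1)  # N x H x W
    batch = np.arange(n)[:, None, None]
    # Advanced indexing around the channel slice gives N x H x W x C.
    gathered = image[batch, :, src_y, src_x]
    return gathered.transpose(0, 3, 1, 2)  # back to N x C x H x W

image = np.arange(4, dtype=np.float32).reshape(1, 1, 1, 4)
flow = np.zeros((1, 2, 1, 4), dtype=np.float32)
flow[:, 0] = 1.0  # every output pixel reads from one pixel to its right
warped = resample_nearest_sketch(image, flow)
```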
-
imaginaire.model_utils.fs_vid2vid.
roll
(t, ny, nx, flip=False)[source]¶ Roll and flip the tensor by specified amounts.
- Parameters
t (4D tensor) – Input tensor.
ny (int) – Amount to roll along y dimension.
nx (int) – Amount to roll along x dimension.
flip (bool) – Whether to flip input.
- Returns
Output tensor.
- Return type
t (4D tensor)
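A roll-and-flip on a 4D array can be sketched with `np.roll`. The name `roll_sketch` is hypothetical; the roll direction (content moves toward higher indices and wraps around) and the flip being horizontal (along the last axis) are assumptions, not confirmed by the description above.

```python
import numpy as np

def roll_sketch(t, ny, nx, flip=False):
    """Circularly shift an N x C x H x W array by (ny, nx), then optionally flip."""
    # Shift rows by ny and columns by nx, wrapping around the edges.
    t = np.roll(t, shift=(ny, nx), axis=(2, 3))
    if flip:
        # Assumption: "flip" means mirroring along the x (width) axis.
        t = t[..., ::-1]
    return t

t = np.arange(6).reshape(1, 1, 2, 3)
rolled = roll_sketch(t, ny=1, nx=1)
rolled_flipped = roll_sketch(t, ny=1, nx=1, flip=True)
```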
-
imaginaire.model_utils.fs_vid2vid.
select_object
(data, obj_indices=None)[source]¶ Select the object/person in the dict according to the object index. Currently it’s used to select the target person in OpenPose dict.
- Parameters
data (dict) – Input data.
obj_indices (list of int) – Indices for the objects to select.
- Returns
Output data.
- Return type
data (dict)
imaginaire.model_utils.label module¶
-
imaginaire.model_utils.label.
make_one_hot
(cfg, is_inference, data)[source]¶ Convert appropriate image data types to one-hot representation.
- Parameters
data (dict) – Dict containing data_type as key, with each value as a list of torch.Tensors.
- Returns
same as input data, but with one-hot for selected types.
- Return type
data (dict)
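The core conversion, turning an integer label map into a one-hot representation, can be sketched in NumPy as below. This covers only a single H x W map; the name `one_hot_sketch` and its `num_labels` argument are hypothetical, and how the documented function decides which data types to convert from `cfg` is not shown.

```python
import numpy as np

def one_hot_sketch(label_map, num_labels):
    """Convert an H x W integer label map to a num_labels x H x W one-hot array."""
    # Row k of the identity matrix is the one-hot vector for label k.
    one_hot = np.eye(num_labels, dtype=np.float32)[label_map]  # H x W x K
    return one_hot.transpose(2, 0, 1)  # K x H x W

labels = np.array([[0, 2], [1, 1]])
encoded = one_hot_sketch(labels, num_labels=3)
```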
imaginaire.model_utils.pix2pixHD module¶
Utils for the pix2pixHD model.
-
imaginaire.model_utils.pix2pixHD.
cluster_features
(cfg, train_data_loader, net_E, preprocess=None, small_ratio=0.0625, is_cityscapes=True)[source]¶ Use clustering to compute the features.
- Parameters
cfg (obj) – Global configuration file.
train_data_loader (obj) – Dataloader for iterating through the training set.
net_E (nn.Module) – PyTorch network.
preprocess (function) – Pre-processing function.
small_ratio (float) – Only instances that occupy at least $(small_ratio) of the image area are considered.
is_cityscapes (bool) – Whether this is the Cityscapes dataset. In the Cityscapes dataset, the instance labels for car start with 26001, 26002, …
- Returns
cluster centers.
- Return type
(num_labels x num_cluster_centers x feature_dims)
-
imaginaire.model_utils.pix2pixHD.
encode_features
(net_E, feat_nc, label_nc, image, inst, is_cityscapes=True)[source]¶ Compute feature embeddings for an image. TODO(Ting-Chun): Make this function dataset independent.
- Parameters
net_E (nn.Module) – The encoder network.
feat_nc (int) – Feature dimensions.
label_nc (int) – Number of segmentation labels.
image (tensor) – Input image tensor.
inst (tensor) – Input instance map.
is_cityscapes (bool) – Whether this is the Cityscapes dataset. In the Cityscapes dataset, the instance labels for car start with 26001, 26002, …
- Returns
A list of $(label_nc) lists. Each inner list records feature vectors of dimension $(feat_nc+1), where the first $(feat_nc) dimensions are the representative feature of an instance and the last dimension is the proportion.
- Return type
(list of list of numpy vectors)
-
imaginaire.model_utils.pix2pixHD.
get_edges
(t)[source]¶ Compute edge maps for a given input instance map.
- Parameters
t (4D tensor) – Input instance map.
- Returns
Output edge map.
- Return type
(4D tensor)
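One common way to derive an edge map from an instance map is to mark every pixel whose horizontal or vertical neighbour carries a different instance id. The NumPy sketch below follows that approach; the name `get_edges_sketch` is hypothetical, and whether the documented function uses exactly this neighbourhood rule is an assumption.

```python
import numpy as np

def get_edges_sketch(t):
    """Mark pixels of an N x C x H x W instance map that border another id."""
    edge = np.zeros(t.shape, dtype=bool)
    # True wherever two horizontally / vertically adjacent pixels differ.
    diff_x = t[:, :, :, 1:] != t[:, :, :, :-1]
    diff_y = t[:, :, 1:, :] != t[:, :, :-1, :]
    # Flag both pixels on either side of each boundary.
    edge[:, :, :, 1:] |= diff_x
    edge[:, :, :, :-1] |= diff_x
    edge[:, :, 1:, :] |= diff_y
    edge[:, :, :-1, :] |= diff_y
    return edge.astype(np.float32)

# Two instances side by side: ids 1 (left half) and 2 (right half).
inst = np.array([[[[1, 1, 2, 2],
                   [1, 1, 2, 2]]]])
edges = get_edges_sketch(inst)
```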
-
imaginaire.model_utils.pix2pixHD.
get_optimizer_with_params
(cfg, net_G, net_D, param_names_start_with=[], param_names_include=[])[source]¶ Return the optimizer object.
- Parameters
cfg (obj) – Global config.
net_G (obj) – Generator network.
net_D (obj) – Discriminator network.
param_names_start_with (list of strings) – Params whose names start with any of the strings will be trained.
param_names_include (list of strings) – Params whose names include any of the strings will be trained.
-
imaginaire.model_utils.pix2pixHD.
get_train_params
(net, param_names_start_with=[], param_names_include=[])[source]¶ Get train parameters.
- Parameters
net (obj) – Network object.
param_names_start_with (list of strings) – Params whose names start with any of the strings will be trained.
param_names_include (list of strings) – Params whose names include any of the strings will be trained.
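The two name filters described above can be illustrated with plain Python over (name, parameter) pairs. The helper `select_params_sketch` is hypothetical and does not touch a real network; returning nothing when both filters are empty is an assumption that follows from "will be trained" applying only to matching names.

```python
def select_params_sketch(named_params, param_names_start_with=(),
                         param_names_include=()):
    """Keep (name, param) pairs whose name matches either filter."""
    prefixes = tuple(param_names_start_with)
    selected = []
    for name, param in named_params:
        # str.startswith accepts a tuple; an empty tuple never matches.
        starts = name.startswith(prefixes)
        includes = any(s in name for s in param_names_include)
        if starts or includes:
            selected.append((name, param))
    return selected

# Stand-in for the (name, parameter) pairs of a network.
named = [("conv1.weight", 0), ("conv1.bias", 1), ("fc.weight", 2), ("fc.bias", 3)]
by_prefix = select_params_sketch(named, param_names_start_with=["conv1"])
by_substring = select_params_sketch(named, param_names_include=["bias"])
```

Tuples are used as defaults here instead of the `[]` defaults in the signature above, simply to avoid mutable default arguments in the sketch.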