Preprocess images for deep learning

MIRP can be used to preprocess images for deep learning. Images are processed using the standard image processing workflow, which is compliant with the Image Biomarker Standardisation Initiative (IBSI), followed by an optional final cropping step.

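The deep learning preprocessing function comes in two versions:

  • deep_learning_preprocessing(): processes images and returns (or writes) all results at once.
  • deep_learning_preprocessing_generator(): a generator that yields processed images and masks as they become available.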

Example

MIRP can be used to crop images, e.g. to make them conform to the input of convolutional neural networks:

from mirp import deep_learning_preprocessing

processed_data = deep_learning_preprocessing(
    image="path to image",
    mask="path to mask",
    crop_size=[50, 224, 224]
)
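The API documentation states that images and masks are cropped around the center of the mask(s), with crop_size given in (z, y, x) order. The following numpy sketch illustrates that idea; it is not MIRP's implementation. The helper crop_around_mask_center is hypothetical, and it assumes that the "center" is the middle of the mask's bounding box, that crop_size is given in voxels, and that regions falling outside the image are zero-padded.

```python
import numpy as np


def crop_around_mask_center(image, mask, crop_size):
    """Hypothetical helper: crop image and mask to crop_size (z, y, x).

    Assumptions (not taken from MIRP): the centre is the middle of the
    mask's bounding box, crop_size is in voxels, and any part of the
    crop that falls outside the image is filled with zeros.
    """
    image = np.asarray(image)
    mask = np.asarray(mask, dtype=bool)
    if len(crop_size) != image.ndim:
        raise ValueError("crop_size must have one entry per image axis")
    if not mask.any():
        raise ValueError("mask is empty")

    # Bounding-box centre of the mask, in (z, y, x) index order.
    coords = np.argwhere(mask)
    center = (coords.min(axis=0) + coords.max(axis=0)) // 2

    cropped_image = np.zeros(tuple(crop_size), dtype=image.dtype)
    cropped_mask = np.zeros(tuple(crop_size), dtype=bool)

    src, dst = [], []
    for c, size, dim in zip(center, crop_size, image.shape):
        start = int(c) - size // 2
        stop = start + size
        # Clip the crop window to the image; the clipped part stays zero.
        src_start, src_stop = max(start, 0), min(stop, dim)
        src.append(slice(src_start, src_stop))
        dst.append(slice(src_start - start, src_stop - start))

    cropped_image[tuple(dst)] = image[tuple(src)]
    cropped_mask[tuple(dst)] = mask[tuple(src)]
    return cropped_image, cropped_mask
```

For a mask located near the border of the image, the crop window extends past the image edge and the sketch pads with zeros, so the output always has exactly the requested shape.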

Parallel processing example

MIRP supports parallel processing using ray and joblib, which allows multiple images to be processed at the same time. There are two relevant parameters: num_cpus and parallel_backend. num_cpus determines the number of workers that will be spawned. parallel_backend determines the backend used for parallel processing, i.e. "ray" or "joblib".

In the example below, we preprocess images using 2 workers on a joblib backend.

from mirp import deep_learning_preprocessing

processed_data = deep_learning_preprocessing(
    image="path to image",
    mask="path to mask",
    crop_size=[50, 224, 224],
    num_cpus=2,
    parallel_backend="joblib"
)

joblib can also be used within a generator context, i.e. with deep_learning_preprocessing_generator, but ray cannot. Both ray and joblib are optional dependencies of MIRP and need to be installed separately.
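The difference between the two versions, and the role of num_cpus, can be illustrated with a small standard-library sketch. It swaps in concurrent.futures.ThreadPoolExecutor in place of ray or joblib, and the functions process_all and process_lazily are hypothetical stand-ins for deep_learning_preprocessing and deep_learning_preprocessing_generator respectively. As in MIRP, leaving num_cpus unset means sequential processing.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator, List, Optional


def process_all(func: Callable, items: Iterable,
                num_cpus: Optional[int] = None) -> List:
    """Hypothetical eager version: returns every result at once."""
    if num_cpus is None or num_cpus <= 1:
        return [func(item) for item in items]  # sequential
    with ThreadPoolExecutor(max_workers=num_cpus) as executor:
        # executor.map preserves the input order of the results.
        return list(executor.map(func, items))


def process_lazily(func: Callable, items: Iterable,
                   num_cpus: Optional[int] = None) -> Iterator:
    """Hypothetical generator version: yields results one by one."""
    if num_cpus is None or num_cpus <= 1:
        # Sequential: each item is only processed when it is requested.
        for item in items:
            yield func(item)
        return
    with ThreadPoolExecutor(max_workers=num_cpus) as executor:
        # Workers run in the background; results are yielded in order.
        yield from executor.map(func, items)
```

The generator form is useful when processed images are consumed one at a time, for example inside a training loop, so that not all results need to be held in memory at once.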

API documentation

mirp.deep_learning_preprocessing.deep_learning_preprocessing(output_slices: bool = False, crop_size: None | list[float] | list[int] = None, image_export_format: str = 'dict', write_file_format: str = 'numpy', export_images: None | bool = None, write_images: None | bool = None, write_dir: None | str = None, num_cpus: None | int = None, parallel_backend: None | str = None, **kwargs) → None | list[Any]

Pre-processes images for deep learning.

Parameters:
  • output_slices (bool, optional, default: False) – Determines whether separate slices should be extracted.

  • crop_size (list of float or list of int, optional, default: None) –

    Size to which the images and masks should be cropped. Images and masks are cropped around the center of the mask(s).

    Note

    MIRP follows the numpy convention for indexing (z, y, x). The final element always corresponds to the x dimension.

  • image_export_format ({"dict", "native", "numpy"}, default: "dict") – Return format for processed images and masks. "dict" returns dictionaries of images and masks as numpy arrays and associated characteristics. "native" returns images and masks in their internal format. "numpy" returns images and masks in numpy format. This argument is only used if export_images=True.

  • write_file_format ({"nifti", "numpy"}, default: "numpy") – File format for processed images and masks. "nifti" writes images and masks in the NIfTI file format, and "numpy" writes images and masks as numpy files. This argument is only used if write_images=True.

  • export_images (bool, optional) – Determines whether processed images and masks should be returned by the function.

  • write_images (bool, optional) – Determines whether processed images and masks should be written to the directory indicated by the write_dir keyword argument.

  • write_dir (str, optional) – Path to directory where processed images and masks should be written. If not set, processed images and masks are returned by this function. Required if write_images=True.

  • num_cpus (int, optional, default: None) – Number of CPU nodes that should be used for parallel processing. Image and mask processing can be parallelized using the ray or joblib packages. If a ray cluster is defined by the user, this cluster will be used instead. By default, images and masks are processed sequentially.

  • parallel_backend ({"none", "ray", "joblib"}, optional, default: "none") – Type of backend to use. Default is the sequential backend ("none"). Alternative backends are "ray" and "joblib", which rely on the ray and joblib libraries respectively.

  • **kwargs – Keyword arguments passed for importing images and masks (import_image_and_mask()) and configuring settings (notably ImagePostProcessingClass, ImagePerturbationSettingsClass), among others.
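Because MIRP indexes arrays as (z, y, x), a 3D volume can be viewed as a stack of 2D (y, x) slices along its first axis. The sketch below assumes, without confirmation from this page, that output_slices=True splits each processed volume in this way; the helper to_slices is hypothetical and not part of MIRP.

```python
import numpy as np


def to_slices(volume):
    """Hypothetical helper: split a (z, y, x) volume into 2D (y, x) slices."""
    volume = np.asarray(volume)
    if volume.ndim != 3:
        raise ValueError("expected a 3D (z, y, x) array")
    # Index the first (z) axis; each slice keeps the (y, x) dimensions.
    return [volume[k] for k in range(volume.shape[0])]
```

Under this assumption, a volume cropped with crop_size=[50, 224, 224] would be split into 50 slices of 224 × 224 voxels each.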

Returns:

List of images and masks in the format indicated by image_export_format, if export_images=True.

Return type:

None | list[Any]

See also

Keyword arguments can be provided to configure the following:

mirp.deep_learning_preprocessing.deep_learning_preprocessing_generator(output_slices: bool = False, crop_size: None | list[float] | list[int] = None, image_export_format: str = 'dict', write_file_format: str = 'numpy', export_images: None | bool = None, write_images: None | bool = None, write_dir: None | str = None, num_cpus: None | int = None, parallel_backend: None | str = None, **kwargs) → Generator[Any, None, None]

Generator for pre-processing images for deep learning.

Parameters:
  • output_slices (bool, optional, default: False) – Determines whether separate slices should be extracted.

  • crop_size (list of float or list of int, optional, default: None) –

    Size to which the images and masks should be cropped. Images and masks are cropped around the center of the mask(s).

    Note

    MIRP follows the numpy convention for indexing (z, y, x). The final element always corresponds to the x dimension.

  • image_export_format ({"dict", "native", "numpy"}, default: "dict") – Return format for processed images and masks. "dict" returns dictionaries of images and masks as numpy arrays and associated characteristics. "native" returns images and masks in their internal format. "numpy" returns images and masks in numpy format. This argument is only used if export_images=True.

  • write_file_format ({"nifti", "numpy"}, default: "numpy") – File format for processed images and masks. "nifti" writes images and masks in the NIfTI file format, and "numpy" writes images and masks as numpy files. This argument is only used if write_images=True.

  • export_images (bool, optional) – Determines whether processed images and masks should be returned by the function.

  • write_images (bool, optional) – Determines whether processed images and masks should be written to the directory indicated by the write_dir keyword argument.

  • write_dir (str, optional) – Path to directory where processed images and masks should be written. If not set, processed images and masks are returned by this function. Required if write_images=True.

  • num_cpus (int, optional, default: None) – Number of CPU nodes that should be used for parallel processing. Image and mask processing can be parallelized using the joblib package. By default, images and masks are processed sequentially.

  • parallel_backend ({"none", "joblib"}, optional, default: "none") – Type of backend to use. Default is the sequential backend ("none"). "joblib" can be used as an alternative backend. "ray" cannot be used in a generator context, because only a single worker will be used.

  • **kwargs – Keyword arguments passed for importing images and masks (import_image_and_mask()) and configuring settings (notably ImagePostProcessingClass, ImagePerturbationSettingsClass), among others.

Yields:

None | list[Any] – List of images and masks in the format indicated by image_export_format, if export_images=True.

See also

Keyword arguments can be provided to configure the following: