Preprocess images for deep learning
MIRP can be used to preprocess images for deep learning. Images are processed using the standard image processing workflow, which is compliant with the Image Biomarker Standardisation Initiative (IBSI), followed by an optional final cropping step.
The deep learning preprocessing function comes in two versions:
deep_learning_preprocessing()
: conventional function that processes images.

deep_learning_preprocessing_generator()
: generator that yields processed images.
Example
MIRP can be used to crop images, e.g. to make them conform to the input of convolutional neural networks:
from mirp import deep_learning_preprocessing

processed_data = deep_learning_preprocessing(
    image="path to image",
    mask="path to mask",
    crop_size=[50, 224, 224]
)
Parallel processing example
MIRP supports parallel processing using ray and joblib. Using parallel processing, multiple images can be processed at the same time. There are two relevant parameters: num_cpus and parallel_backend. num_cpus determines the number of workers that will be spawned. parallel_backend determines the backend used for parallel processing, i.e. "ray" or "joblib".
In the example below, we process images using 2 workers on the joblib backend.

from mirp import deep_learning_preprocessing

processed_data = deep_learning_preprocessing(
    image="path to image",
    mask="path to mask",
    crop_size=[50, 224, 224],
    num_cpus=2,
    parallel_backend="joblib"
)
joblib can also be used within a generator context, i.e. with deep_learning_preprocessing_generator(), but ray cannot. Both ray and joblib are optional dependencies of MIRP and need to be installed separately.
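Because the parallel backends are optional dependencies, it can be convenient to fall back to sequential processing when the preferred backend is not installed. A minimal sketch, assuming a hypothetical helper (pick_parallel_backend is not part of MIRP):

```python
import importlib.util

def pick_parallel_backend(preferred="joblib"):
    """Hypothetical helper: return the preferred backend name if the
    corresponding package is installed, otherwise fall back to the
    sequential backend ("none")."""
    if importlib.util.find_spec(preferred) is not None:
        return preferred
    return "none"

# Pass the result to deep_learning_preprocessing(parallel_backend=...).
backend = pick_parallel_backend("joblib")
```

The returned value can be handed directly to the parallel_backend argument, so the same script runs with or without the optional packages installed.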
API documentation
- mirp.deep_learning_preprocessing.deep_learning_preprocessing(output_slices: bool = False, crop_size: None | list[float] | list[int] = None, image_export_format: str = 'dict', write_file_format: str = 'numpy', export_images: None | bool = None, write_images: None | bool = None, write_dir: None | str = None, num_cpus: None | int = None, parallel_backend: None | str = None, **kwargs) → None | list[Any]
Pre-processes images for deep learning.
- Parameters:
output_slices (bool, optional, default: False) – Determines whether separate slices should be extracted.

crop_size (list of float or list of int, optional, default: None) – Size to which the images and masks should be cropped. Images and masks are cropped around the center of the mask(s).

Note
MIRP follows the numpy convention for indexing (z, y, x). The final element always corresponds to the x dimension.

image_export_format ({"dict", "native", "numpy"}, default: "dict") – Return format for processed images and masks. "dict" returns dictionaries of images and masks as numpy arrays and associated characteristics. "native" returns images and masks in their internal format. "numpy" returns images and masks in numpy format. This argument is only used if export_images=True.

write_file_format ({"nifti", "numpy"}, default: "numpy") – File format for processed images and masks. "nifti" writes images and masks in the NIfTI file format, and "numpy" writes images and masks as numpy files. This argument is only used if write_images=True.

export_images (bool, optional) – Determines whether processed images and masks should be returned by the function.

write_images (bool, optional) – Determines whether processed images and masks should be written to the directory indicated by the write_dir keyword argument.

write_dir (str, optional) – Path to directory where processed images and masks should be written. If not set, processed images and masks are returned by this function. Required if write_images=True.

num_cpus (int, optional, default: None) – Number of CPU nodes that should be used for parallel processing. Image and mask processing can be parallelized using the ray or joblib packages. If a ray cluster is defined by the user, this cluster will be used instead. By default, images and masks are processed sequentially.

parallel_backend ({"none", "ray", "joblib"}, optional, default: "none") – Type of backend to use. Default is the sequential backend ("none"). Alternative backends are "ray" and "joblib", which rely on the ray and joblib libraries respectively.

**kwargs – Keyword arguments passed for importing images and masks (import_image_and_mask()) and configuring settings (notably ImagePostProcessingClass, ImagePerturbationSettingsClass), among others.
- Returns:
List of images and masks in the format indicated by image_export_format, if export_images=True.
- Return type:
None | list[Any]
See also
Keyword arguments can be provided to configure the following:
- image and mask import (import_image_and_mask())
- image post-processing (ImagePostProcessingClass)
- image perturbation / augmentation (ImagePerturbationSettingsClass)
- image interpolation / resampling (ImageInterpolationSettingsClass and MaskInterpolationSettingsClass)
- mask resegmentation (ResegmentationSettingsClass)
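The cropping behaviour described for crop_size (cropping around the mask center, with numpy (z, y, x) indexing) can be illustrated with a plain numpy sketch. This is an illustration of the idea, not MIRP's actual implementation, and crop_about_mask_center is a hypothetical helper:

```python
import numpy as np

def crop_about_mask_center(image, mask, crop_size):
    """Illustrative sketch: crop an image around the center of a mask,
    using the numpy (z, y, x) indexing convention."""
    # Center of the mask as the mean of its voxel coordinates.
    center = np.round(np.mean(np.argwhere(mask > 0), axis=0)).astype(int)
    slices = []
    for dim, size in enumerate(crop_size):
        start = max(center[dim] - size // 2, 0)
        stop = min(start + size, image.shape[dim])
        start = max(stop - size, 0)  # shift back if clipped at the far edge
        slices.append(slice(start, stop))
    return image[tuple(slices)]

# Example: crop a 64 x 256 x 256 volume to 50 x 224 x 224 around the mask.
image = np.zeros((64, 256, 256))
mask = np.zeros_like(image)
mask[30:34, 120:136, 120:136] = 1
crop = crop_about_mask_center(image, mask, [50, 224, 224])
print(crop.shape)  # (50, 224, 224)
```

Note that if the requested crop is larger than the image along a dimension, this sketch simply returns the full extent along that dimension rather than padding.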
- mirp.deep_learning_preprocessing.deep_learning_preprocessing_generator(output_slices: bool = False, crop_size: None | list[float] | list[int] = None, image_export_format: str = 'dict', write_file_format: str = 'numpy', export_images: None | bool = None, write_images: None | bool = None, write_dir: None | str = None, num_cpus: None | int = None, parallel_backend: None | str = None, **kwargs) → Generator[Any, None, None]
Generator for pre-processing images for deep learning.
- Parameters:
output_slices (bool, optional, default: False) – Determines whether separate slices should be extracted.

crop_size (list of float or list of int, optional, default: None) – Size to which the images and masks should be cropped. Images and masks are cropped around the center of the mask(s).

Note
MIRP follows the numpy convention for indexing (z, y, x). The final element always corresponds to the x dimension.

image_export_format ({"dict", "native", "numpy"}, default: "dict") – Return format for processed images and masks. "dict" returns dictionaries of images and masks as numpy arrays and associated characteristics. "native" returns images and masks in their internal format. "numpy" returns images and masks in numpy format. This argument is only used if export_images=True.

write_file_format ({"nifti", "numpy"}, default: "numpy") – File format for processed images and masks. "nifti" writes images and masks in the NIfTI file format, and "numpy" writes images and masks as numpy files. This argument is only used if write_images=True.

export_images (bool, optional) – Determines whether processed images and masks should be returned by the function.

write_images (bool, optional) – Determines whether processed images and masks should be written to the directory indicated by the write_dir keyword argument.

write_dir (str, optional) – Path to directory where processed images and masks should be written. If not set, processed images and masks are returned by this function. Required if write_images=True.

num_cpus (int, optional, default: None) – Number of CPU nodes that should be used for parallel processing. Image and mask processing can be parallelized using the joblib package. By default, images and masks are processed sequentially.

parallel_backend ({"none", "joblib"}, optional, default: "none") – Type of backend to use. Default is the sequential backend ("none"). "joblib" can be used as an alternative backend. "ray" cannot be used in a generator context, because only a single worker will be used.

**kwargs – Keyword arguments passed for importing images and masks (import_image_and_mask()) and configuring settings (notably ImagePostProcessingClass, ImagePerturbationSettingsClass), among others.
- Yields:
None | list[Any] – List of images and masks in the format indicated by image_export_format, if export_images=True.
See also
Keyword arguments can be provided to configure the following:
- image and mask import (import_image_and_mask())
- image post-processing (ImagePostProcessingClass)
- image perturbation / augmentation (ImagePerturbationSettingsClass)
- image interpolation / resampling (ImageInterpolationSettingsClass and MaskInterpolationSettingsClass)
- mask resegmentation (ResegmentationSettingsClass)
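The generator variant is consumed like any Python iterator: each step yields one processed sample, so the full dataset never needs to be held in memory at once. The sketch below uses a hypothetical stand-in generator so it runs without MIRP installed; in practice, deep_learning_preprocessing_generator(...) takes its place:

```python
import numpy as np

def stand_in_preprocessing_generator(n_samples, crop_size=(50, 224, 224)):
    """Hypothetical stand-in for deep_learning_preprocessing_generator():
    yields one processed sample at a time instead of returning a list."""
    for _ in range(n_samples):
        # Placeholder payload shaped like a cropped (z, y, x) volume.
        yield {"image": np.zeros(crop_size, dtype=np.float32)}

# Samples are produced lazily; each one can be fed directly into a
# training step without materialising the whole dataset.
n_processed = 0
for sample in stand_in_preprocessing_generator(3):
    assert sample["image"].shape == (50, 224, 224)
    n_processed += 1
print(n_processed)  # 3
```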