Configure image and mask import
Many relevant MIRP functions require images, masks or both as input. This section provides details on how image and mask import is configured.
Specifying input
MIRP processes and analyses images and masks. There are multiple ways to provide images and masks:
- By specifying the directory where images and/or masks are found:
Nested flat layout: In a nested flat layout all images and masks are separated for each sample. For example, an image dataset of 128 samples may be organised as follows:
image_root_directory ├─ sample_001 │ └─ ... ├─ ... └─ sample_127 └─ image_sub_folder ├─ CT_dicom_000.dcm ├─ ... └─ CT_dicom_255.dcm └─ mask.dcm
Images and mask files are directly under the sample directory. Only one keyword argument is required:
some_function( ..., image = "image_root_directory", ... )
MIRP is generally able to determine which files are images and which files are masks. However, there may be cases where MIRP is unable to determine if a file is an image or a mask. In those cases, additional keyword arguments may be provided:
some_function( ..., image = "image_root_directory", image_name = "CT_dicom_*", mask_name = "mask" ... )
Here,
image_name
andmask_name
contain patterns for image and mask files, respectively."CT_dicom_*"
contains a wildcard character (*
) that matches any pattern starting with"CT_dicom_"
. File extensions are never of the pattern.Fully nested structure: In a nested structure all images and masks are separated for each. Unlike the above example, image and mask data may be organised into different subdirectory structures:
image_root_directory ├─ sample_001 │ └─ image_sub_folder │ │ └─ ... │ └─ mask_sub_folder │ └─ ... ├─ ... └─ sample_127 └─ image_sub_folder │ ├─ CT_dicom_000.dcm │ ├─ ... │ └─ CT_dicom_255.dcm └─ mask_sub_folder └─ mask.dcm
Here the directory for each sample contains consistently named subdirectory structures (
image_sub_folder
andmask_sub_folder
), that contains the set of DICOM images and a mask, respectively. Then the following keyword arguments may be specified:some_function( ..., image = "image_root_directory", image_sub_folder = "image_sub_folder", mask_sub_folder = "mask_sub_folder", ... )
The
mask
keyword argument is automatically assumed to be equal toimage
, i.e. images and masks are under the same root directory. If this is not the case,mask
should be specified as well.Note
MIRP will interpret the name of the directory that is neither part of the root directory or the subdirectory structures as the sample name, unless the sample name can be determined from metadata (i.e. DICOM files). In the example above, sample names based on the directory structure would be
"sample_001"
to"sample_127"
.Flat layout: In a flat layout, all image and mask files are contained in the same directory:
image_root_directory ├─ sample_001_CT_dicom_000.dcm ├─ ... ├─ sample_001_CT_dicom_319.dcm ├─ sample_127_CT_dicom_000.dcm ├─ ... ├─ sample_127_CT_dicom_255.dcm ├─ sample_001_mask.dcm ├─ ... └─ sample_127_mask.dcm
Flat layouts are somewhat more challenging for MIRP, as sample identifiers have to be inferred, and images and masks may be hard to associate. For DICOM images sample names and other association data typically can be obtained from the DICOM metadata. For other types of images, e.g. NIfTI or numpy, in a flat layout,
image_name
andmask_name
should be provided:some_function( ..., image = "image_root_directory", image_name = "#_CT_dicom_*", mask_name = "#_mask", ... )
The above example contain two wildcards:
#
and*
that fulfill different roles. While*
matches any pattern,#
matches any pattern and uses that pattern as the sample name. This way, sample identifiers can be determined for flat layouts.
- By providing a direct path to image and mask files:
Single image and mask: A path to an image and mask may be provided as follows:
some_function( ..., image = "image_directory/image.nii.gz", mask = "mask_directory/mask.nii.gz", ... )
Here
"image.nii.gz"
is an image file in NIfTI format, located in the"image_directory"
directory. Similarly,"mask.nii.gz"
is a mask file (containing integer-value labels) that is located in the"mask_directory"
directory.Multiple images and masks: Multiple images and masks can be provided as lists.
some_function( ..., image = ["image_directory/image_001.nii.gz", "image_directory/image_002.nii.gz"], mask = ["mask_directory_001/mask.nii.gz", "mask_directory_002/mask.nii.gz"], ... )
Note
It is possible to provide multiple masks for each image as long as their is some way to associate the image with its masks, e.g. on sample name or frame of reference.
Note
In absence of any further identifiers for associating images and masks, MIRP will treat image and mask lists of equal length as being sorted by element, and associate the first mask with the first image, the second mask with the second image, and so forth.
- By providing the image and mask directly:
Images and masks can be provided directly using
numpy.ndarray
objects.Warning
Even though images can be directly provided as
numpy
arrays, this should only be done if all data has the same (physical) resolution, or if physical resolution does not matter. This is becausenumpy
arrays only contain values, and no metadata concerning pixel or voxel spacing. Internally, MIRP will use a default value of 1.0 × 1.0 × 1.0.Single image and mask: Let
numpy_image
andnumpy_mask
be twonumpy
arrays with the same dimension. Then, these objects can be provided as follows:some_function( ..., image = numpy_image, mask = numpy_mask, ... )
Multiple images and masks: Multiple images and masks can be provided as lists of
numpy
arrays:some_function( ..., image = [numpy_image_001, numpy_image_002] mask = [numpy_mask_001, numpy_mask_002], ... )
Warning
While it is possible to provide multiple masks for each image, in practice there is no safe way to do so. The only way to associate image and masks is by their image dimension, which may be the same for different images. Hence, providing one mask per image is recommended. MIRP will treat image and mask lists of equal length as being sorted by element, and associate the first mask with the first image, the second mask with the second image, and so forth.
By specifying the configuration in a stand-alone data
xml
file. An empty copy of thexml
file can be created usingmirp.utilities.config_utilities.get_data_xml()
. The tags of the``xml`` file are the same as the arguments ofimport_image_and_mask()
, that are listed below.
Selecting specific images and masks
On occasion, input should be more selective. This can be done by specifying additional arguments:
- Select specific samples using
sample_name
: Sample names can be provided as a list of strings to filter images and masks and exclude those that do not appear in the provided list.
Note
If sample names cannot be determined from metadata, directory structure or file names, MIRP cannot filter image and mask files using the provided sample names. In this case, should the list of provided sample names equal that of the images, the provided sample names will be associated one-to-one with images. Otherwise, MIRP will randomly generate sample names.
- Select specific samples using
- Select specific image and mask files based on their file names using
image_name
andmask_name
: MIRP can filter image and mask files based on file names.
image_name
andmask_name
arguments each take a single string as argument. This string is matched exactly, and only file names that match that string are selected. File extensions are ignored.To allow for some flexibility, wildcard characters can be used. MIRP recognises two types of wildcard characters:
*
and#
.*
denotes any character. For example, if files are namedimage_001.nii.gz
,image_002.nii.gz
andanother_image_001.nii.gz
, usingimage_name="image_*"
will selectimage_001.nii.gz
,image_002.nii.gz
. Usingimage_name="*image_*"
will select all three.The other wildcard character (
#
) denotes the part of the file name that is the sample name. For example, if files are namedsample_001_image_001.nii.gz
,sample_001_image_002.nii.gz
andsample_002_image_001.nii.gz
, usingimage_name="#_image_*"
will select all three files, and assignsample_001
,sample_001
andsample_002
as sample names, respectively.The
mask_name
argument functions exactly the same asimage_name
.
- Select specific image and mask files based on their file names using
- Select the image and mask file types using
image_file_type
andmask_file_type
: MIRP can filter image and mask files based on the file type. MIRP currently supports DICOM (
"dicom"
), NIfTI ("nifti"
), NRRD ("nrrd"
) and numpy ("numpy"
) files as file format.
- Select the image and mask file types using
- Select image files based on image modality using
image_modality
: MIRP can filter image files based on the image modality. Aside from generic image modality, MIRP specifically checks for the following modalities:
Computed tomography (CT):
"ct"
Positron emission tomography (PET):
"pet"
or"pt"
Magnetic resonance imaging (MRI):
"mri"
or"mr"
Apparent diffusion coefficient MR map:
"adc"
Diffusion contrast-enhanced MR map:
"dce"
Radiotherapy dose (RTDOSE):
"rtdose"
Computed radiography (CR):
"cr"
or"computed_radiography"
Digital x-ray (DX):
"dx"
or"digital_xray"
Digital mammography (MG):
"mg"
,"mammography"
or"digital_mammography"
Images from other modalities are currently not fully supported, and a default
"generic"
image modality will be assigned.Note
Image modality is important because it adapts the image processing workflow to the requirements and possibilities of each modality. For example, bias-field correction can only be performed on MR imaging, and Hounsfield units are automatically rounded for CT imaging.
Warning
Only DICOM images contain metadata concerning image modality. Images from other file types are interpreted as
"generic"
by default and cannot be filtered usingimage_modality
. For these image, theimage_modality
argument sets the actual image modality.
- Select image files based on image modality using
- Select mask files based on mask modality using
mask_modality
: MIRP can filter mask files based on the modality of the mask. Aside form generic masks, MIRP specifically checks for DICOM radiotherapy structure (RTSTRUCT) and DICOM segmentation (SEG) files.
Note
Only DICOM images contain metadata concerning mask modality. Masks from other file types are interpreted as
"generic_mask"
by default and cannot be filtered usingmask_modality
.Note
Since version
2.1.0
MIRP does not require that images and masks have the exact same dimensions, origin, spacing and orientation, with the exception of numpy images and masks. This is explicitly true for DICOM radiotherapy structure (RTSTRUCT) sets. These are either mapped to the corresponding image if image slices are referenced in the structure set, or use internal data to generate a voxel-based mask. However, images and their masks should share the same frame of reference.
- Select mask files based on mask modality using
- Select the specific regions of interest using
roi_name
: A mask file may contain multiple masks. By default, MIRP will assess all masks in a file. The
roi_name
argument can be used to specify the list of regions of interest that should be assessed. For DICOM mask files, names of regions of interest are provided in the metadata. For other mask file types, masks are either boolean, or non-negative integers. For these,False
or0
are interpreted as background, and not assessed. If, for example, regions of interest are labelled with1
,2
and3
, MIRP will recognize bothroi_name=["1", "2", "3"]
androi_name=["region_1", "region_2", "region_3"]
.You can use the
extract_mask_labels()
function to identify the names of the regions of interest in mask files.
- Select the specific regions of interest using
API documentation
Note
The import_image_and_mask()
function is called internally by other
functions. These function pass through keyword arguments to
import_image_and_mask()
.
- mirp.data_import.import_image_and_mask.import_image_and_mask(image, mask=None, sample_name: None | str | list[str] = None, image_name: None | str | list[str] = None, image_file_type: None | str = None, image_modality: None | str | list[str] = None, image_sub_folder: None | str = None, mask_name: None | str | list[str] = None, mask_file_type: None | str = None, mask_modality: None | str | list[str] = None, mask_sub_folder: None | str = None, roi_name: None | str | list[str] | dict[str, str] = None, association_strategy: None | str | list[str] = None, stack_images: str = 'auto', stack_masks: str = 'auto') list[ImageFile] [source]
Creates and curates references to image and mask files. This function is usually called internally by other functions such as
extract_features()
.- Parameters:
image (Any) – A path to an image file, a path to a directory containing image files, a path to a config_data.xml file, a path to a csv file containing references to image files, a pandas.DataFrame containing references to image files, or a numpy.ndarray.
mask (Any) – A path to a mask file, a path to a directory containing mask files, a path to a config_data.xml file, a path to a csv file containing references to mask files, a pandas.DataFrame containing references to mask files, or a numpy.ndarray.
sample_name (str or list of str, default: None) – Name of expected sample names. This is used to select specific image files. If None, no image files are filtered based on the corresponding sample name (if known).
image_name (str, optional, default: None) – Pattern to match image files against. The matches are exact. Use wildcard symbols (“*”) to match varying structures. The sample name (if part of the file name) can also be specified using “#”. For example, image_name = ‘#_*_image’ would find John_Doe in John_Doe_CT_image.nii or John_Doe_001_image.nii. File extensions do not need to be specified. If None, file names are not used for filtering files and setting sample names.
image_file_type ({"dicom", "nifti", "nrrd", "numpy", "itk"}, optional, default: None) – The type of file that is expected. If None, the file type is not used for filtering files. “itk” comprises “nifti” and “nrrd” file types.
image_modality ({"ct", "pet", "pt", "mri", "mr", "rtdose", "generic"}, optional, default: None) – The type of modality that is expected. If None, modality is not used for filtering files. Note that only DICOM files contain metadata concerning modality.
image_sub_folder (str, optional, default: None) – Fixed directory substructure where image files are located. If None, the directory substructure is not used for filtering files.
mask_name (str or list of str, optional, default: None) – Pattern to match mask files against. The matches are exact. Use wildcard symbols (“*”) to match varying structures. The sample name (if part of the file name) can also be specified using “#”. For example, mask_name = ‘#_*_mask’ would find John_Doe in John_Doe_CT_mask.nii or John_Doe_001_mask.nii. File extensions do not need to be specified. If None, file names are not used for filtering files and setting sample names.
mask_file_type ({"dicom", "nifti", "nrrd", "numpy", "itk"}, optional, default: None) – The type of file that is expected. If None, the file type is not used for filtering files. “itk” comprises “nifti” and “nrrd” file types.
mask_modality ({"rtstruct", "seg", "generic_mask"}, optional, default: None) – The type of modality that is expected. If None, modality is not used for filtering files. Note that only DICOM files contain metadata concerning modality. Masks from non-DICOM files are considered to be “generic_mask”.
mask_sub_folder (str, optional, default: None) – Fixed directory substructure where mask files are located. If None, the directory substructure is not used for filtering files.
roi_name (str or list of str or dict, optional, default: None) – Name of the regions of interest that should be assessed.
association_strategy ({"frame_of_reference", "sample_name", "file_distance", "file_name_similarity", "list_order", "position", "single_image"}) – The preferred strategy for associating images and masks. File association is preferably done using frame of reference UIDs (DICOM), or sample name (NIfTI, numpy). Other options are relatively frail, except for list_order which may be applicable when a list with images and a list with masks is provided and both lists are of equal length.
stack_images ({"auto", "yes", "no"}, optional, default: "str") – If image files in the same directory cannot be assigned to different samples, and are 2D (slices) of the same size, they might belong to the same 3D image stack. “auto” will stack 2D numpy arrays, but not other file types. “yes” will stack all files that contain 2D images, that have the same dimensions, orientation and spacing, except for DICOM files. “no” will not stack any files. DICOM files ignore this argument, because their stacking can be determined from metadata.
stack_masks ({"auto", "yes", "no"}, optional, default: "str") – If mask files in the same directory cannot be assigned to different samples, and are 2D (slices) of the same size, they might belong to the same 3D mask stack. “auto” will stack 2D numpy arrays, but not other file types. “yes” will stack all files that contain 2D images, that have the same dimensions, orientation and spacing, except for DICOM files. “no” will not stack any files. DICOM files ignore this argument, because their stacking can be determined from metadata.
- Returns:
The functions returns a list of ImageFile objects, if any were found with the specified filters.
- Return type:
list[ImageFile]
- mirp.utilities.config_utilities.get_data_xml(target_dir: str | Path)[source]
Creates a local copy of the data
xml
file. This file can be used to configure import of images and masks.- Parameters:
target_dir (str or Path) – Path where the data
xml
file should be copied to.- Returns:
No return values. The data
xml
is copied to the intended directory.- Return type:
None