Tutorial: Applying image filters

This tutorial describes how to apply image filters using MIRP.

Download example data

Here we use a publicly available chest CT dataset, that consist of a single image dataset and a mask.

[2]:

from urllib.request import urlopen
from io import BytesIO
from zipfile import ZipFile

url = r"https://github.com/oncoray/mirp/raw/598293f7afb179b525b49f9b8300a9914fbdebd4/data/tutorial_radiomics_chest_ct_data.zip"

# Specify location where the data is stored.
save_dir = "."

with urlopen(url) as zip_url_pointer:
    with ZipFile(BytesIO(zip_url_pointer.read())) as example_data:
        example_data.extractall(save_dir)

This creates a folder with the following structure:

chest_ct
 ├─ image
 |  ├─ DCM_IMG_00000.dcm
 |  ├─ ...
 |  └─ DCM_IMG_00059.dcm
 └─ mask
    └─ DCM_RS_00060.dcm

In this example dataset, the CT image slices are stored in DICOM format in the image directory. A segmentation mask is stored in DICOM format in the mask subdirectory.

Finding mask labels

Radiomics features are typically computed from regions of interest, such as a tumour. These regions are delineated by experts or auto-segmentation AI, and stored as segmentation masks. MIRP needs to know which mask label (region of interest) should be used for computing features. A first step is to identify which mask labels exist. This can be done using the extract_mask_labels function. In this example, we directly provide a path to the mask file (.../chest_ct/mask/DCM_RS_00060.dcm).

[3]:

import os
from mirp import extract_mask_labels

extract_mask_labels(
    mask=os.path.join(save_dir, "chest_ct", "mask", "DCM_RS_00060.dcm"),
)

[3]:

	sample_name	modality	dir_path	file_name	series_instance_uid	frame_of_reference_uid	roi_label	mask_index
0	1	rtstruct	.\chest_ct\mask	DCM_RS_00060.dcm	1.3.6.1.4.1.9590.100.1.2.258301620411152643708...	1.3.6.1.4.1.9590.100.1.2.437537500115184941017...	GTV-1	0

The mask file contains only a single mask, called GTV-1.

Visualising images

It is often useful to inspect images before computing radiomics features. External viewers for DICOM and many other image types exist, but MIRP also has a simple visualisation tool. You can visualise images by exporting them in MIRP internal formats using extract_images:

[4]:

from mirp import extract_images

images = extract_images(
    image=os.path.join(save_dir, "chest_ct", "image"),
    mask=os.path.join(save_dir, "chest_ct", "mask", "DCM_RS_00060.dcm"),
    roi_name="GTV-1",
    image_export_format="native"
)

INFO    : MainProcess    2024-06-18 08:26:19,447         Initialising image extraction using ct images for 1.

By default, extract_images will export dictionaries containing image and mask data (as numpy.ndarray) and associated metadata. That way extract_images can be used to read and process images as part of an external workflow. The default output can be visualised using matplotlib and other tools. Here we use image_export_format="native" to export images and masks in the native MIRP format. The output of extract_images is a list of images and masks, with one entry per image dataset. We only assess a single image here, which means that images only has one element. The nested list always consists of the image – and any derivatives, such as filtered images – and masks associated with the image. We can visualise an exported image using its show method as follows:

[5]:

image, mask = images[0]
image[0].show(mask=mask[0], slice_id=25)

_images/tutorial_apply_image_filter_8_0.png

Though just an image is shown here, executing this code outside of a Jupyter Notebook will start an interactive plotter that can be scrolled through.

The CT image appears as expected: a large solid tumour is located in the right lung lobe.

Assessing image metadata

Image metadata are important for understanding the image and how it was acquired and reconstructed. MIRP allows for exporting image metadata from DICOM and other image formats, though for non-DICOM formats metadata will be considerably more limited.

[6]:

from mirp import extract_image_parameters

extract_image_parameters(
    image=os.path.join(save_dir, "chest_ct", "image")
)

[6]:

	sample_name	modality	dir_path	spacing_z	spacing_y	spacing_x	file_name	series_instance_uid	frame_of_reference_uid	scanner_type	manufacturer	image_type	image_index
0	1	ct	chest_ct\image	3.0	0.977	0.977	DCM_IMG_00059.dcm	1.3.6.1.4.1.9590.100.1.2.296658988911737913102...	1.3.6.1.4.1.9590.100.1.2.437537500115184941017...	CERR	CMS, Inc.	['ORIGINAL', 'PRIMARY', 'AXIAL']	0

Only known metadata are shown. For example, tube voltage was not present in the image metadata in this example.

The metadata, and our use case, have important implications for the image processing:

Since we want to apply filters, we should ensure that pixel or voxel spacing should be isotropic.
The in-plane resolution is a bit higher than the distance between slices, but can be resampled to isotropic voxels, e.g. 1.0 by 1.0 mm by 1.0 mm. We can apply a filter in 3D.

Applying filters

Image filters can be used to enhance specific characteristics of an image. Here we will apply to three filters: a mean filter, a Laplacian-of-Gaussian filter, and a non-separable wavelet filter. To do so, we specify the following parameters:

Voxels are resampled to 1.0 by 1.0 by 1.0 mm (new_spacing=1.0).
Select filters (filter_kernels=["mean", "laplacian_of_gaussian", "nonseparable_wavelet"])
Set width of the mean filter (in voxels) (mean_filter_kernel_size=5).
Set width of the Laplacian-of-Gaussian filter (in mm) (laplacian_of_gaussian_sigma=[1.0, 3.0]). Here we apply two filters: one with σ = 1.0 mm, and one with σ = 3.0.
Use the Simoncelli wavelet (nonseparable_wavelet_families="simoncelli").
Find the first three decompositions of the Simoncelli wavelet (nonseparable_wavelet_decomposition_level=[1, 2, 3]).

Many more parameters can be specified, see Configure the image processing and feature extraction workflow.

[7]:

from mirp import extract_images

images = extract_images(
    image=os.path.join(save_dir, "chest_ct", "image"),
    mask=os.path.join(save_dir, "chest_ct", "mask", "DCM_RS_00060.dcm"),
    roi_name="GTV-1",
    image_export_format="native",
    new_spacing=1.0,
    filter_kernels=["mean", "laplacian_of_gaussian", "nonseparable_wavelet"],
    mean_filter_kernel_size=5,
    laplacian_of_gaussian_sigma=[1.0, 3.0],
    nonseparable_wavelet_families="simoncelli",
    nonseparable_wavelet_decomposition_level=[1, 2, 3]
)
image, mask = images[0]

INFO    : MainProcess    2024-06-18 08:26:22,640         Initialising image extraction using ct images for 1.

Now we can first plot the base image. The base image is always part of the export. Because we resampled the image slices from 3.0 to 1.0 mm, the number of slices increased compared to the original image.

[8]:

image[0].show(slice_id=75)

_images/tutorial_apply_image_filter_14_0.png

Derivative (i.e. filtered) images are stored in the same list as the base image (image[0]).

Mean filter

[9]:

image[1].show(slice_id=75)

_images/tutorial_apply_image_filter_16_0.png

Laplacian-of-Gaussian filter

With σ = 1.0 mm:

[10]:

image[2].show(slice_id=75)

_images/tutorial_apply_image_filter_18_0.png

With σ = 3.0 mm:

[11]:

image[3].show(slice_id=75)

_images/tutorial_apply_image_filter_20_0.png

Nonseparable Simoncelli wavelet filter

First decomposition level:

[12]:

image[4].show(slice_id=75)

_images/tutorial_apply_image_filter_22_0.png

Second decomposition level:

[13]:

image[5].show(slice_id=75)

_images/tutorial_apply_image_filter_24_0.png

Third decomposition level:

[14]:

image[6].show(slice_id=75)

_images/tutorial_apply_image_filter_26_0.png

Computing features

Features can also be computed from filtered images. By default, only statistical features are computed, in addition to features computed from the base image.

For the base image, we need to define parameters related to intensity discretisation for computing histogram-based and texture features. Since a CT image consists of Hounsfield units (HU), which are typically calibrated, we will use a fixed bin size algorithm (base_discretisation_method="fixed_bin_size") with a bin size of 20 HU (base_discretisation_bin_size=20.0). This method requires setting the lowest intensity of the first bin. By default, MIRP uses -1000 HU (corresponding to air). However, this is not always the most useful setting. Here, we will instead specify a soft-tissue window (resegmentation_intensity_range=[-180.0, 200.0]), which sets the lowest intensity of the first bin to -180 HU.

[15]:

import pandas as pd
from mirp import extract_features

features = extract_features(
    image=os.path.join(save_dir, "chest_ct", "image"),
    mask=os.path.join(save_dir, "chest_ct", "mask", "DCM_RS_00060.dcm"),
    roi_name="GTV-1",
    image_export_format="native",
    new_spacing=1.0,
    resegmentation_intensity_range=[-180.0, 200.0],
    base_discretisation_method="fixed_bin_size",
    base_discretisation_bin_width=20.0,
    filter_kernels=["mean", "laplacian_of_gaussian", "nonseparable_wavelet"],
    mean_filter_kernel_size=5,
    laplacian_of_gaussian_sigma=[1.0, 3.0],
    nonseparable_wavelet_families="simoncelli",
    nonseparable_wavelet_decomposition_level=[1, 2, 3]
)

pd.concat(features)

INFO    : MainProcess    2024-06-18 08:26:38,268         Initialising feature computation using ct images for 1.

[15]:

	sample_name	image_file_name	image_directory	image_study_date	image_study_description	image_series_description	image_series_instance_uid	image_modality	image_pet_suv_type	image_mask_label	...	wavelet_simoncelli_level_3_stat_max	wavelet_simoncelli_level_3_stat_iqr	wavelet_simoncelli_level_3_stat_range	wavelet_simoncelli_level_3_stat_mad	wavelet_simoncelli_level_3_stat_rmad	wavelet_simoncelli_level_3_stat_medad	wavelet_simoncelli_level_3_stat_cov	wavelet_simoncelli_level_3_stat_qcod	wavelet_simoncelli_level_3_stat_energy	wavelet_simoncelli_level_3_stat_rms
0	1	None	chest_ct\image	None	None	None	1.3.6.1.4.1.9590.100.1.2.296658988911737913102...	ct	None	GTV-1	...	531.404254	77.65685	863.020984	61.349115	35.8172	58.114405	3.115791	2.045865	2.349373e+09	88.482671

1 rows × 311 columns

This results in a pandas.DataFrame that has a row per image and mask. The first several columns contain parameters related to that image and mask, and how these were processed. The features computed from filtered images are appended after the features from the base image. These features can be used for, e.g., machine learning using scikit-learn or familiar.