Extracting patches from Whole Slide Images (WSI)

Jul 9, 2021

Digital pathology has emerged with the digitization of patient tissue samples, in particular through the use of digital whole slide images (WSIs). Whole slide imaging, which refers to scanning conventional glass slides to produce digital slides, is the most recent imaging modality being adopted by pathology departments worldwide.

Because a WSI has very large dimensions, analyzing it at its original size creates two problems:

Large input to the Deep Learning model
Data scarcity

Why a patch-based method?

Using a CNN directly for WSI classification has several drawbacks. First, it requires extensive image downsampling, which can destroy most of the discriminative details. Second, a CNN might learn from only one of the multiple discriminative patterns in an image, which makes inefficient use of the data. Discriminative information is encoded in high-resolution image patches. Therefore, one solution is to train a CNN on high-resolution image patches and predict the label of a WSI based on the patch-level predictions.

WSI of Human Kidney tissue with glomerulus

The image above has a shape of 16000 x 18500 pixels and is stored in ‘tiff’ format. The annotations for each glomerulus are available in an ‘xml’ file, which records the type of each glomerulus (Normal or Abnormal) and its coordinates (as a polygon).

The coordinates of one of the glomeruli and its type in the annotation file

I have used the OpenSlide library, which allows us to read the WSI at different levels of magnification. However, there might be challenges in importing the library because of its dependencies.
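If the import fails, it is usually because the Python package cannot find the native OpenSlide binaries. Here is a minimal sketch of a more forgiving import. The helper name import_openslide and the dll_dir argument are my own and hypothetical; on Windows with Python 3.8+, dll_dir would point at the folder holding the OpenSlide DLLs.

```python
import os


def import_openslide(dll_dir=None):
    """Try to import openslide; return the module, or None if it is unavailable.

    dll_dir is a hypothetical path to the folder holding the OpenSlide
    binaries (only relevant on Windows with Python 3.8+).
    """
    try:
        if dll_dir is not None and hasattr(os, "add_dll_directory"):
            # Windows: make the OpenSlide DLLs visible while importing.
            with os.add_dll_directory(dll_dir):
                import openslide
        else:
            import openslide
        return openslide
    except (ImportError, OSError) as err:
        # ImportError: package missing; OSError: native library not found.
        print(f"Could not import openslide: {err}")
        return None
```

A return value of None means the package or its native library still needs to be installed.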

import os
import glob
import random
import xml.etree.ElementTree as ET  # cElementTree was removed in Python 3.9
import cv2 as cv
import matplotlib.pyplot as plt
import numpy as np
import openslide

Define the level of magnification at which you want to read the WSI. When mag_level = 0, you are reading the WSI at the original level without any downsampling.

mag_level = 2               # here I am reading the WSI at level 2
factor = 2**mag_level
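Note that factor = 2**mag_level assumes every level halves the resolution of the previous one. That is not guaranteed: some files step by a factor of 4 between levels. OpenSlide exposes the real factors through the level_downsamples property, so a safer sketch looks like this. The helper name level_factor is hypothetical, and slide stands for an already opened openslide.OpenSlide object.

```python
def level_factor(slide, level):
    """Return the downsample factor that the slide itself reports for `level`.

    `slide` is expected to expose `level_downsamples`, as an
    openslide.OpenSlide object does (one float per level, level 0 first).
    """
    downsamples = slide.level_downsamples
    if not 0 <= level < len(downsamples):
        raise ValueError(
            f"level {level} not available; slide has {len(downsamples)} levels"
        )
    return downsamples[level]
```

If the value returned for your mag_level differs from 2**mag_level, use the returned value as factor instead.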

Path to WSI folder and Annotation folder and their name:

slidepath = "F:/Renal_Vasculitis/Data/WSI"  #path to folder of WSIs
annotpath = "F:/Renal_Vasculitis/Data/Annotations" 
slidename = "EE 833604 nr17clean.tiff"
annotname = "EE 833604 nr17clean.xml"

We get the annotation list of each glomerulus present in the “EE 833604 nr17clean” WSI by parsing the xml file, reading the coordinates under the tag <Coordinates>, and dividing them by the factor by which the WSI is downsampled.
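A minimal sketch of such a parser is given here, under some assumptions about the file layout that you should check against your own annotations: each glomerulus is an <Annotation> element whose type is stored in a PartOfGroup attribute, and the polygon vertices are <Coordinate X="..." Y="..."/> elements under <Coordinates>. The returned structure (a list of dicts with "label" and "coords") is also my own choice for illustration, and factor is passed explicitly rather than read from a global variable.

```python
import xml.etree.ElementTree as ET


def parse_xml(xml_path, factor=1):
    """Parse polygon annotations and scale them to the chosen level.

    Assumed layout (verify against your own files):
      <Annotation PartOfGroup="Normal">
        <Coordinates>
          <Coordinate X="..." Y="..."/>
    Returns a list of {"label": str, "coords": [(x, y), ...]}.
    """
    root = ET.parse(xml_path).getroot()
    annolist = []
    for annotation in root.iter("Annotation"):
        # Divide by the downsample factor to get level coordinates.
        coords = [
            (float(c.get("X")) / factor, float(c.get("Y")) / factor)
            for c in annotation.iter("Coordinate")
        ]
        if coords:  # skip annotations without any vertices
            annolist.append(
                {"label": annotation.get("PartOfGroup"), "coords": coords}
            )
    return annolist
```

With this signature, the downsample factor has to be passed in, for example parse_xml(path, factor=factor); called with only the path, it returns the coordinates unscaled.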

You can get the list of annotations present in the WSI by:

annolist = parse_xml(os.path.join(annotpath,annotname))

We extract patches only around the glomeruli in the annotation list above, which discards the background and other non-significant parts of the WSI:

Lines 3-4: Read the WSI using OpenSlide and get the annotation list as annolist.

Lines 5-7: For each glomerulus, i.e. annolist[i], take the coordinates as coords and generate a bounding box, which gives the top-left corner (x and y). We will use this x and y to read the corresponding portion of the WSI.

Lines 8-11: This for loop generates 30 patches for each glomerulus by adding a random offset to the original x and y, giving spointx and spointy. The range of k is chosen by the user as required, and the range of the offset can be altered to get more or fewer patches for the same glomerulus.

Line 12: Here we multiply by the factor to get the coordinates in the original WSI rather than at the chosen level, because the coordinates in annolist are relative to the level and not to the original WSI.

Lines 13 & 14: Use the read_region function from OpenSlide, which takes the following inputs:

i. The top-left corner (spointx0, spointy0)

ii. The magnification level at which to read the WSI

iii. The patch size (e.g. 256x256)

Line 13: Store the image using cv.imwrite.
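The line numbers above refer to my original listing, which is not reproduced here. The steps can be sketched as follows. This is a minimal sketch under a few assumptions: each entry of annolist is a dict with "coords" (a list of (x, y) pairs at the chosen level) and "label", the offset range max_offset is a hypothetical parameter, and slide is any object with OpenSlide's read_region(location, level, size) method, where location is given in level 0 coordinates.

```python
import random


def bounding_box(coords):
    """Return (x_min, y_min, x_max, y_max) of a polygon given as (x, y) pairs."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


def extract_patches(slide, annolist, mag_level, factor, patch_size=256,
                    patches_per_annotation=30, max_offset=50, seed=None):
    """Yield (i, k, label, region) for randomly shifted patches per annotation."""
    rng = random.Random(seed)
    for i, annotation in enumerate(annolist):
        # Top-left corner of the glomerulus bounding box, in level coordinates.
        x, y, _, _ = bounding_box(annotation["coords"])
        for k in range(patches_per_annotation):
            # Random offset so each patch shows the glomerulus slightly shifted.
            spointx = x + rng.randint(-max_offset, max_offset)
            spointy = y + rng.randint(-max_offset, max_offset)
            # read_region expects the location in level 0 coordinates.
            spointx0 = max(0, int(spointx * factor))
            spointy0 = max(0, int(spointy * factor))
            region = slide.read_region(
                (spointx0, spointy0), mag_level, (patch_size, patch_size)
            )
            yield i, k, annotation["label"], region


# Usage sketch (needs openslide, numpy and OpenCV; file names are hypothetical):
# slide = openslide.OpenSlide(os.path.join(slidepath, slidename))
# for i, k, label, region in extract_patches(slide, annolist, mag_level, factor):
#     bgr = cv.cvtColor(np.array(region), cv.COLOR_RGBA2BGR)  # PIL RGBA -> BGR
#     cv.imwrite(f"{label}_{i}_{k}.png", bgr)
```

Writing the loop as a generator keeps the reading and the saving separate, so the same code can feed patches straight into a model instead of writing them to disk.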

Patch extracted at mag_level = 2 and patch size = 256

Patch extracted at mag_level = 1 and patch size = 512

Happy Coding !!!!

Written by Shivam Singh