GemIdent


GemIdent is an interactive image recognition program that identifies regions of interest in images and photographs. It is specifically designed for images with few colors, where the objects of interest look alike with small variation. Examples include color image segmentation of oranges in a photograph of an orange grove and of stained cells in microscopic images.

GemIdent also packages data analysis tools to investigate spatial relationships among the objects identified.

History

GemIdent was developed at Stanford University by Adam Kapelner from June 2006 until January 2007 in the lab of Dr. Peter Lee under the tutelage of Professor Susan Holmes. [1] The concept was inspired by data from Kohrt et al., [2] who analyzed immune profiles of lymph nodes in breast cancer patients. Hence, GemIdent works well when identifying cells in IHC-stained tissue imaged via automated light microscopy, provided the nuclear background stain and the membrane/cytoplasmic stain are well-defined. In 2008, it was adapted to support multispectral imaging techniques. [3]

Methodology

GemIdent uses supervised learning to perform automated identification of regions of interest in the images. The user must therefore do a substantial amount of work up front: first supplying the relevant colors, then pointing out examples of the objects or regions themselves, as well as negative examples (training set creation).

When a user clicks on a pixel, many scores are generated from the surrounding color information via Mahalanobis Ring Score attribute generation (see the JSS paper [3] for a detailed exposition). These scores are then used to build a random forest machine-learning classifier, which then classifies the pixels in any given image.
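
The attribute generation step can be illustrated with a minimal Java sketch. It is not GemIdent's actual code: the class name `RingScoreSketch`, its method names, and the exact ring definition are assumptions made for illustration. Here a ring score at radius r is assumed to be the sum, over all pixels whose rounded Euclidean distance from the clicked pixel equals r, of each pixel's Mahalanobis distance to a user-supplied color (described by a mean RGB vector and an inverse covariance matrix). The JSS paper [3] gives the authoritative definition.

```java
import java.util.Arrays;

/** Hypothetical sketch of ring-score features; not taken from the GemIdent source. */
class RingScoreSketch {

    /** Squared Mahalanobis distance of an RGB pixel from a color (mean, inverse covariance). */
    static double mahalanobisSq(double[] rgb, double[] mean, double[][] invCov) {
        double[] d = new double[3];
        for (int i = 0; i < 3; i++) {
            d[i] = rgb[i] - mean[i];
        }
        double sum = 0.0;
        for (int i = 0; i < 3; i++) {
            for (int j = 0; j < 3; j++) {
                sum += d[i] * invCov[i][j] * d[j];
            }
        }
        return sum;
    }

    /**
     * Assumed ring-score definition: element r is the sum of Mahalanobis distances
     * over pixels whose rounded Euclidean distance from (cx, cy) equals r.
     * The image is indexed as image[y][x][channel]; off-image pixels are skipped.
     */
    static double[] ringScores(double[][][] image, int cx, int cy, int maxRadius,
                               double[] mean, double[][] invCov) {
        double[] scores = new double[maxRadius + 1];
        for (int dy = -maxRadius; dy <= maxRadius; dy++) {
            for (int dx = -maxRadius; dx <= maxRadius; dx++) {
                int r = (int) Math.round(Math.sqrt(dx * dx + dy * dy));
                int x = cx + dx;
                int y = cy + dy;
                boolean outside = y < 0 || y >= image.length || x < 0 || x >= image[0].length;
                if (r > maxRadius || outside) {
                    continue;
                }
                scores[r] += Math.sqrt(mahalanobisSq(image[y][x], mean, invCov));
            }
        }
        return scores;
    }

    public static void main(String[] args) {
        // A 5x5 black image with one pixel that exactly matches the target color.
        double[][][] image = new double[5][5][3];
        image[2][2] = new double[] {200, 120, 30};
        double[] mean = {200, 120, 30};
        double[][] invCov = {{0.01, 0, 0}, {0, 0.01, 0}, {0, 0, 0.01}};
        System.out.println(Arrays.toString(ringScores(image, 2, 2, 2, mean, invCov)));
    }
}
```

Under these assumptions, the ring scores for every supplied color would be concatenated into one feature vector per training pixel, and those vectors, together with the user's labels, would form the training set for the random forest.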

After classification, there may be mistakes. The user can return to training and point out the specific mistakes and then reclassify. These training-classifying-retraining-reclassifying iterations (considered interactive boosting) can result in a highly accurate segmentation.

Recent applications

In 2010, Setiadi et al. [4] analyzed histological sections of lymph nodes, examining the spatial densities of B and T cells, and noted that "cell numbers do not capture the full range of information encoded within tissues".

Source code

The Java source code is now open source under GPL2. [5]

Examples

GemIdent identifying oranges in an orange grove

The raw photograph (left), a superimposed mask showing the pixel classification results (center), and the photograph marked with the centroids of the objects of interest, the oranges (right).

GemIdent identifying cancer cells in a microscopic image

The raw microscopic image of a stained lymph node from the Kohrt study [2] (left), a superimposed mask showing the pixel classification results (center), and the image marked with the centroids of the objects of interest, the cancer nuclei (right).

GemIdent identifying cancer cells, T-cells, and background nuclei in a microscopic image

This example illustrates GemIdent's ability to find multiple phenotypes in the same image: the raw microscopic image of a stained lymph node (top left) from the Kohrt study, [2] a superimposed mask showing the pixel classification results (top right), and finally the image marked with the centroids of the objects of interest - the cancer nuclei (in green stars), the T-cells (in yellow stars), and non-specific background nuclei (in cyan stars).

GemIdent analyzing results using data analysis and visualization tools

The command-line data analysis and visualization interface in action, analyzing the results of a classification of a lymph node from the Kohrt study. [2] The histogram displays the distribution of distances from T-cells to neighboring cancer cells. The binary image of cancer membrane is the result of a pixel-only classification. The open PDF document is the autogenerated report of the analysis, which includes a thumbnail view of the entire lymph node, counts and Type I error rates for all phenotypes, and a transcript of the analyses performed.
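
A distance histogram of this kind can be sketched in a few lines of Java. The sketch below is illustrative only and is not GemIdent's analysis code: the class `NeighborDistanceSketch` and its methods are hypothetical, and "neighboring" is assumed to mean the single nearest cancer-cell centroid to each T-cell centroid, found by brute force.

```java
import java.util.Arrays;

/** Hypothetical sketch of a nearest-neighbor distance histogram; not GemIdent's code. */
class NeighborDistanceSketch {

    /** Distance from each source centroid {x, y} to its nearest target centroid. */
    static double[] nearestDistances(double[][] sources, double[][] targets) {
        double[] out = new double[sources.length];
        for (int i = 0; i < sources.length; i++) {
            double best = Double.POSITIVE_INFINITY;
            for (double[] t : targets) {
                double d = Math.hypot(sources[i][0] - t[0], sources[i][1] - t[1]);
                if (d < best) {
                    best = d;
                }
            }
            out[i] = best;
        }
        return out;
    }

    /**
     * Counts values in fixed-width bins [0, w), [w, 2w), and so on.
     * Values beyond the last bin are counted in the last bin.
     */
    static int[] histogram(double[] values, double binWidth, int numBins) {
        int[] counts = new int[numBins];
        for (double v : values) {
            int bin = (int) Math.min(numBins - 1, Math.floor(v / binWidth));
            counts[bin]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up centroid coordinates in pixels, for illustration only.
        double[][] tCells = {{0, 0}, {10, 0}, {4, 4}};
        double[][] cancerCells = {{3, 4}, {10, 1}};
        double[] distances = nearestDistances(tCells, cancerCells);
        System.out.println(Arrays.toString(distances));
        System.out.println(Arrays.toString(histogram(distances, 2.0, 5)));
    }
}
```

The brute-force search costs time proportional to the product of the two centroid counts; for whole-slide images with many cells, a spatial index such as a grid or k-d tree would be the usual replacement.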

References

  1. Kapelner, Adam; Peter P. Lee; Susan Holmes (July 2007). "An Interactive Statistical Image Segmentation and Visualization System". International Conference on Medical Information Visualisation - BioMedical Visualisation (MediVis 2007). pp. 81–86. doi:10.1109/MEDIVIS.2007.5. ISBN 978-0-7695-2904-2. S2CID 16260264. Archived from the original on 2013-04-15.
  2. Kohrt, Holbrook E; Navid Nouri; Kent Nowels; Denise Johnson; Susan Holmes; Peter P Lee (September 2005). "Profile of Immune Cells in Axillary Lymph Nodes Predicts Disease-Free Survival in Breast Cancer". PLOS Medicine. 2 (9): e284. doi:10.1371/journal.pmed.0020284. ISSN 1549-1676. PMC 1198041. PMID 16124834.
  3. Holmes, Susan; Adam Kapelner; Peter P. Lee (January 15, 2009). "An Interactive Java Statistical Image Segmentation System: GemIdent". Journal of Statistical Software. 30 (10): 1–20. doi:10.18637/jss.v030.i10. ISSN 1548-7660. PMC 3100170. PMID 21614138.
  4. Setiadi, Francesca; Nelson C. Ray; Holbrook E. Kohrt; Adam Kapelner; Valeria Carcamo-Cavazos; Edina B. Levic; Sina Yadegarynia; Chris M. van der Loos; Erich J. Schwartz; Susan Holmes; Peter P. Lee (Aug 25, 2010). "Quantitative, Architectural Analysis of Immune Cell Subsets in Tumor-Draining Lymph Nodes from Breast Cancer Patients and Healthy Lymph Nodes". PLOS ONE. 5 (8): e12420. Bibcode:2010PLoSO...512420S. doi:10.1371/journal.pone.0012420. PMC 2928294. PMID 20811638.
  5. "Kapelner/GemIdent". GitHub. April 2019.