Flow Cytometry Standard

Flow Cytometry Standard (FCS) is a data file standard for the reading and writing of data from flow cytometry experiments. The FCS specification has traditionally been developed and maintained by the International Society for Advancement of Cytometry (ISAC). [1] FCS used to be the only widely adopted file format in flow cytometry. Recently, additional standard file formats have been developed by ISAC.

File Format

The FCS file format describes a file that is a combination of textual data followed by binary data. The order of the file layout is as follows:

  1. HEADER segment
  2. TEXT segment
  3. DATA segment
  4. Optional ANALYSIS segment
  5. CRC value
  6. Optional OTHER segments

The HEADER segment is an ASCII text string that begins by identifying the version of the FCS standard used, followed by three pairs of byte offsets that give the start and end positions of the TEXT, DATA, and ANALYSIS segments. An example HEADER segment is given below:

FCS3.0          58    4380    4381    5586       0       0

Because each byte-offset field in the HEADER segment is limited to 8 characters, the largest offset it can store is 99,999,999. When a segment extends beyond that, both its start and end positions are written as 0 in the HEADER segment, and the corresponding TEXT segment keywords (for example, $BEGINDATA and $ENDDATA) are used instead.
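The HEADER layout described above can be read with a few lines of Python. This is a minimal sketch, not a reference implementation: it assumes the layout seen in the example (a six-character version string, four spaces, then six right-justified 8-character offset fields, 58 bytes in total). The names `FcsHeader`, `parse_header`, and `resolve_data_range` are hypothetical, and treating a blank offset field as 0 is an assumption made for this illustration.

```python
from typing import NamedTuple


class FcsHeader(NamedTuple):
    version: str
    text_start: int
    text_end: int
    data_start: int
    data_end: int
    analysis_start: int
    analysis_end: int


def parse_header(raw: bytes) -> FcsHeader:
    """Parse the first 58 bytes of an FCS file (layout assumed from the example)."""
    if len(raw) < 58:
        raise ValueError("HEADER segment is shorter than 58 bytes")
    version = raw[0:6].decode("ascii")
    if not version.startswith("FCS"):
        raise ValueError(f"unexpected version string: {version!r}")
    offsets = []
    for i in range(6):
        # Each offset occupies an 8-character field, starting at byte 10.
        field = raw[10 + 8 * i : 18 + 8 * i].decode("ascii").strip()
        # Assumption: a blank field is treated like 0.
        offsets.append(int(field) if field else 0)
    return FcsHeader(version, *offsets)


def resolve_data_range(header: FcsHeader, keywords: dict) -> tuple:
    """Fall back to the TEXT keywords when the HEADER stores 0 for both offsets."""
    if header.data_start == 0 and header.data_end == 0:
        return int(keywords["$BEGINDATA"]), int(keywords["$ENDDATA"])
    return header.data_start, header.data_end


# Rebuild the example header programmatically so the field widths are exact.
example = ("FCS3.0" + " " * 4
           + "".join(f"{n:>8}" for n in (58, 4380, 4381, 5586, 0, 0))).encode("ascii")
header = parse_header(example)
print(header.version, header.text_start, header.text_end)
```

Keeping the fallback in a separate function makes the large-file case explicit: the HEADER is parsed first, and the TEXT keywords are consulted only when both DATA offsets are 0.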

The TEXT segment is an ASCII text string that is divided into a series of key-value pairs delimited by a chosen character, e.g. '|'. The first character of the TEXT segment, which in the example below immediately follows the HEADER segment, is the delimiter. An example of a HEADER and TEXT segment is given below:

FCS3.0          58    4380    4381    5586       0       0|$BEGINANALYSIS|0|$BEGINDATA|4381|$BEGINSTEXT|0|$BTIM|08:24:37.64|$BYTEORD|1,2,3,4|$CELLS|RBC|...|
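The delimiter convention can be illustrated with a short Python sketch. It assumes a simplified TEXT segment in which the delimiter character never appears inside a keyword or value and no value is empty; real files may need additional handling. The function name `parse_text_segment` is hypothetical, and the sample string is a shortened version of the example above.

```python
def parse_text_segment(text: str) -> dict:
    """Split a TEXT segment into keyword/value pairs.

    Simplifying assumptions for this sketch: the delimiter does not occur
    inside keywords or values, and no value is empty.
    """
    if not text:
        raise ValueError("empty TEXT segment")
    delimiter = text[0]  # the first character of the segment is the delimiter
    tokens = text[1:].split(delimiter)
    if tokens and tokens[-1] == "":
        tokens.pop()  # drop the empty token after the closing delimiter
    if len(tokens) % 2 != 0:
        raise ValueError("keyword without a matching value")
    # Even positions hold keywords, odd positions hold their values.
    return dict(zip(tokens[0::2], tokens[1::2]))


sample = ("|$BEGINANALYSIS|0|$BEGINDATA|4381|$BEGINSTEXT|0"
          "|$BTIM|08:24:37.64|$BYTEORD|1,2,3,4|$CELLS|RBC|")
keywords = parse_text_segment(sample)
print(keywords["$BYTEORD"])
```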

To be a valid FCS file, the text segment must contain all required keywords, which describe the DATA segment format and encoding. For FCS version 3.1, the required FCS primary TEXT segment keywords are as follows:

Keyword          Description
$BEGINANALYSIS   Byte offset to the beginning of the ANALYSIS segment.
$BEGINDATA       Byte offset to the beginning of the DATA segment.
$BEGINSTEXT      Byte offset to the beginning of a supplemental TEXT segment.
$BYTEORD         Byte order for data acquisition computer.
$DATATYPE        Type of data in DATA segment (ASCII, integer, floating point).
$ENDANALYSIS     Byte offset to the last byte of the ANALYSIS segment.
$ENDDATA         Byte offset to the last byte of the DATA segment.
$ENDSTEXT        Byte offset to the last byte of a supplemental TEXT segment.
$MODE            Data mode (list mode: preferred; histogram: deprecated).
$NEXTDATA        Byte offset to the next data set in the file.
$PAR             Number of parameters in an event.
$PnB             Number of bits reserved for parameter number n.
$PnE             Amplification type for parameter n.
$PnN             Short name for parameter n.
$PnR             Range for parameter number n.
$TOT             Total number of events in the data set.
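A reader can check for the required keywords before attempting to decode the DATA segment. The sketch below is one possible approach, assuming the TEXT segment has already been parsed into a dictionary with keyword names spelled as in the table. The names `REQUIRED_FIXED`, `PER_PARAMETER_SUFFIXES`, and `missing_required_keywords` are hypothetical.

```python
# Required keywords that appear once per data set (taken from the table above).
REQUIRED_FIXED = (
    "$BEGINANALYSIS", "$BEGINDATA", "$BEGINSTEXT", "$BYTEORD", "$DATATYPE",
    "$ENDANALYSIS", "$ENDDATA", "$ENDSTEXT", "$MODE", "$NEXTDATA", "$PAR", "$TOT",
)
# Required per-parameter keywords: $PnB, $PnE, $PnN, $PnR for n = 1..$PAR.
PER_PARAMETER_SUFFIXES = ("B", "E", "N", "R")


def missing_required_keywords(keywords: dict) -> list:
    """Return the required keywords that are absent from a parsed TEXT segment."""
    missing = [name for name in REQUIRED_FIXED if name not in keywords]
    if "$PAR" in keywords:
        # The per-parameter check is only possible once $PAR is known.
        for n in range(1, int(keywords["$PAR"]) + 1):
            for suffix in PER_PARAMETER_SUFFIXES:
                name = f"$P{n}{suffix}"
                if name not in keywords:
                    missing.append(name)
    return missing


# An empty dictionary is missing every fixed keyword.
print(missing_required_keywords({}))
```

Returning the list of missing names, rather than a single boolean, lets calling code report exactly which keywords make a file invalid.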

The DATA segment of the FCS file follows the TEXT segment and is laid out event-wise (row-wise), with values in the order of the parameters (a.k.a. channels) $P1N, $P2N, ..., $PnN. An event is either an actual biological cell or some other particle large enough to trigger the data acquisition device of the flow cytometer. The DATA segment has the following layout:

Data Segment [Event1][Event2][Event3]...[Event$TOT]

Each event is laid out according to the number of bytes described by $PnB for each parameter. These bytes are to be interpreted according to the combination specified by $BYTEORD and $DATATYPE.

Event [$P1B][$P2B][$P3B]...[$PnB] 
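The event layout above maps naturally onto Python's `struct` module. The following is a minimal sketch under several assumptions that go beyond the text: $BYTEORD is either "1,2,3,4" (least significant byte first) or "4,3,2,1" (most significant byte first); $DATATYPE is "F" (32-bit float), "D" (64-bit float), or "I" (unsigned integer) with parameter widths of 8, 16, 32, or 64 bits; and integer values are not masked against $PnR. ASCII data and other bit widths are not handled. The names `INT_CODES` and `decode_events` are hypothetical.

```python
import struct

# struct format codes for unsigned integers of the supported widths (in bits).
INT_CODES = {8: "B", 16: "H", 32: "I", 64: "Q"}


def decode_events(data: bytes, keywords: dict) -> list:
    """Decode a list-mode DATA segment into one tuple per event."""
    n_params = int(keywords["$PAR"])
    n_events = int(keywords["$TOT"])

    byteord = keywords["$BYTEORD"]
    if byteord == "1,2,3,4":
        endian = "<"  # little-endian
    elif byteord == "4,3,2,1":
        endian = ">"  # big-endian
    else:
        raise ValueError(f"unsupported $BYTEORD in this sketch: {byteord}")

    datatype = keywords["$DATATYPE"]
    codes = []
    for n in range(1, n_params + 1):
        bits = int(keywords[f"$P{n}B"])
        if datatype == "F" and bits == 32:
            codes.append("f")
        elif datatype == "D" and bits == 64:
            codes.append("d")
        elif datatype == "I" and bits in INT_CODES:
            codes.append(INT_CODES[bits])
        else:
            raise ValueError(f"unsupported $DATATYPE/$P{n}B combination")

    # One Struct describes a single event: [$P1B][$P2B]...[$PnB].
    event = struct.Struct(endian + "".join(codes))
    if len(data) < event.size * n_events:
        raise ValueError("DATA segment is shorter than $TOT events")
    return [event.unpack_from(data, i * event.size) for i in range(n_events)]


# Two events with two 32-bit float parameters each, stored little-endian.
payload = struct.pack("<ffff", 1.0, 2.5, 3.0, 4.5)
meta = {"$PAR": "2", "$TOT": "2", "$BYTEORD": "1,2,3,4", "$DATATYPE": "F",
        "$P1B": "32", "$P2B": "32"}
print(decode_events(payload, meta))
```

Each returned tuple corresponds to one row of the event matrix, with one value per parameter in $P1N, $P2N, ... order.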

Data structure

Flow cytometry data is typically saved for analysis in the form of an array, with fluorescence and scatter channels represented in columns, and individual "events" (most of which are cells) forming the rows. The number of events acquired from each sample usually ranges between the low thousands and the low millions.

Representation of flow cytometry data from an instrument with three scatter channels and 13 fluorescence channels. Only the values for the first 30 cells (of hundreds of thousands) are shown.

History

The first version of the Flow Cytometry Standard (FCS) was developed in 1984. [2] Since then, FCS has become the standard file format supported by all flow cytometry software and hardware vendors. FCS is a binary file format with three main segments: a TEXT segment containing metadata as keyword/value pairs, a DATA segment usually containing a matrix of detected expression values (the so-called list mode format), and a rarely used ANALYSIS segment.

Over the years, updates were incorporated to adapt to technological advancements in both flow cytometry and computing technologies.

In 1990, FCS 2.0 was introduced. [3] [4] Features introduced in FCS 2.0 included the option of multiple data sets within a data file, support for different byte orders to accommodate hardware variations across computing platforms, and basic compensation and scaling information. FCS 2.0 was followed by FCS 3.0 in 1997, which introduced the possibility of storing data sets larger than 100 MB. [5]

The latest version, FCS 3.1, was introduced in 2010. [6] [7] It retains the basic FCS file structure and most features of previous versions of the standard. Changes in FCS 3.1 address potential ambiguities in the previous versions and provide a more robust standard. They include simplified support for international characters and improved support for storing compensation. The major additions are support for a preferred display scale, a standardized way of capturing the sample volume, information about the origins of the data file, and support for plate and well identification in high-throughput, plate-based experiments.

References

  1. "International Society for Advancement of Cytometry". Retrieved 15 January 2015.
  2. Murphy, R. F.; Chused, T. M. (1984). "A proposal for a flow cytometric data file standard". Cytometry. 5 (5): 553–555. doi:10.1002/cyto.990050521. PMID 6489069.
  3. Dean, P. N.; Bagwell, C. B.; Lindmo, T.; Murphy, R. F.; Salzman, G. C. (1990). "Introduction to flow cytometry data file standard". Cytometry. 11 (3): 321–322. doi:10.1002/cyto.990110302. PMID 2340768.
  4. Dean, P. N.; Bagwell, C. B.; Lindmo, T.; Murphy, R. F.; Salzman, G. C. (1990). "Data file standard for flow cytometry. Data File Standards Committee of the Society for Analytical Cytology". Cytometry. 11 (3): 323–332. doi:10.1002/cyto.990110303. PMID 2340769.
  5. Seamer, L. C.; Bagwell, C. B.; Barden, L.; Redelman, D.; Salzman, G. C.; Wood, J. C. S.; Murphy, R. F. (1997). "Proposed new data file standard for flow cytometry, version FCS 3.0". Cytometry. 28 (2): 118–122. doi:10.1002/(SICI)1097-0320(19970601)28:2<118::AID-CYTO3>3.0.CO;2-B. PMID 9181300.
  6. Spidlen, J.; Moore, W.; Parks, D.; Goldberg, M.; Bray, C.; Bierre, P.; Gorombey, P.; Hyun, B.; Hubbard, M.; Lange, S.; Lefebvre, R.; Leif, R.; Novo, D.; Ostruszka, L.; Treister, A.; Wood, J.; Murphy, R. F.; Roederer, M.; Sudar, D.; Zigon, R.; Brinkman, R. R. (2009). "Data File Standard for Flow Cytometry, version FCS 3.1". Cytometry Part A. 77 (1): 97–100. doi:10.1002/cyto.a.20825. PMC 2892967. PMID 19937951.
  7. "Data File Standard for Flow Cytometry, Version FCS 3.1 - Normative Reference" (PDF). International Society for Advancement of Cytometry. Archived from the original (PDF) on 9 February 2015. Retrieved 15 January 2015.