SEG-Y

An image of a seismic line that was originally encoded in the SEG-Y format.
Filename extension: .sgy, .segy
Developed by: SEG Technical Standards Committee
Initial release: 1975
Latest release: rev 2.1 (October 2023)
Type of format: Reflection seismology data
Extended from: SEG "Ex"
Website: https://library.seg.org/pb-assets/technical-standards/seg_y_rev1-1686080991247.pdf

The SEG-Y (sometimes SEG Y) file format is one of several standards developed by the Society of Exploration Geophysicists (SEG) for storing geophysical data. It is an open standard controlled by the Technical Standards Committee of the SEG, a non-profit organization.

History

The format was originally developed in 1973 to store single-line seismic reflection digital data on magnetic tapes. The specification was published in 1975. [1]

The format and its name evolved from the SEG "Ex" or Exchange Tape Format. [1] [2] Since its release, however, there have been significant advancements in geophysical data acquisition, such as 3-dimensional seismic techniques and high-speed, high-capacity recording.

The most recent revision of the SEG-Y format, rev 2.1, was published in 2023. [3] It still retains certain legacies of the original format (referred to as rev 0), such as an optional SEG-Y tape label, the main 3200-byte textual tape header encoded in EBCDIC, and a 400-byte binary header. [4]

Data structure

SEG-Y files are stored in a hierarchical byte-stream format that combines textual and binary data segments. The following chart shows the byte stream structure of revision 1 (2002), [5] with revision 2 (2017) only adding an optional data trailer of one or more 3200-byte records at the end: [6] [7]

[Figure: SEG-Y file byte stream structure]
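The start of this byte stream can be illustrated with a minimal Python sketch that splits off the 3200-byte textual header and the 400-byte binary header. Several things here are assumptions of the sketch rather than statements from the text above: that no optional tape label precedes the headers, that the textual header is EBCDIC (decoded with Python's `cp037` codec) and laid out as 40 lines of 80 characters, that binary values are big-endian two's complement integers, and that the three fields shown sit at binary header bytes 17-18, 21-22 and 25-26. The function name `split_file_headers` and the field names are hypothetical.

```python
import struct

TEXT_HEADER_SIZE = 3200   # textual file header, in bytes
BINARY_HEADER_SIZE = 400  # binary file header, in bytes


def split_file_headers(data: bytes):
    """Return (text_lines, fields) parsed from the start of a SEG-Y byte stream.

    Assumes no tape label, an EBCDIC textual header and big-endian integers.
    """
    if len(data) < TEXT_HEADER_SIZE + BINARY_HEADER_SIZE:
        raise ValueError("byte stream is shorter than the two file headers")

    # Textual header: decode from EBCDIC, then cut into 80-character lines.
    text = data[:TEXT_HEADER_SIZE].decode("cp037")
    lines = [text[i:i + 80] for i in range(0, TEXT_HEADER_SIZE, 80)]

    # Binary header: offsets below are 0-based within the 400-byte block.
    binary = data[TEXT_HEADER_SIZE:TEXT_HEADER_SIZE + BINARY_HEADER_SIZE]
    fields = {
        "sample_interval_us": struct.unpack_from(">h", binary, 16)[0],
        "samples_per_trace": struct.unpack_from(">h", binary, 20)[0],
        "format_code": struct.unpack_from(">h", binary, 24)[0],
    }
    return lines, fields


# Synthetic example: 40 numbered header lines plus three binary header fields.
text = "".join(f"C{i:2d}".ljust(80) for i in range(1, 41)).encode("cp037")
binary = bytearray(BINARY_HEADER_SIZE)
struct.pack_into(">h", binary, 16, 4000)  # 4000 microseconds = 4 ms sampling
struct.pack_into(">h", binary, 20, 1500)  # 1500 samples per trace
struct.pack_into(">h", binary, 24, 1)     # format code 1: IBM floating point
lines, fields = split_file_headers(text + bytes(binary))
print(lines[0].rstrip(), fields)
```

The example builds its own synthetic headers rather than reading a real file, so it only demonstrates the layout; a real reader would also have to handle the trace headers and trace data that follow.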

Since the first SEG-Y standard was published, many companies dealing with seismic data have produced their own variants of it. These variants run contrary to the aim of a standard for universal interchange, and they cause confusion and delay when data expected in SEG-Y format turns out to be in a variant of it. Initially, many variants arose because the format was based on the then de facto standard of IBM computers for digital processing, in which character data was coded in EBCDIC and numeric data in IBM floating point, whereas processing systems quickly moved to ASCII character and IEEE number representations.
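The difference between the two number representations can be shown with a short Python sketch that converts one 4-byte IBM floating point value to a native float. It assumes the usual IBM single-precision layout (1 sign bit, a 7-bit base-16 exponent with a bias of 64, and a 24-bit fraction) stored big-endian; the function name `ibm32_to_float` is hypothetical.

```python
import struct


def ibm32_to_float(raw: bytes) -> float:
    """Convert one big-endian 4-byte IBM hexadecimal float to a Python float."""
    (word,) = struct.unpack(">I", raw)
    sign = -1.0 if word >> 31 else 1.0
    exponent = (word >> 24) & 0x7F                    # power of 16, biased by 64
    fraction = (word & 0x00FFFFFF) / float(1 << 24)   # 24-bit fraction in [0, 1)
    return sign * fraction * 16.0 ** (exponent - 64)


print(ibm32_to_float(bytes.fromhex("42640000")))  # 100.0
print(ibm32_to_float(bytes.fromhex("C276A000")))  # -118.625
```

Reading the same four bytes as an IEEE 754 value would give a different number, which is one reason a reader must know which representation a file uses.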

Even before the SEG-Y standard was agreed upon and published, seismic acquisition companies had modified earlier SEG seismic data format standards such as SEG-A, SEG-B and SEG-C, though not to the extent that seismic acquisition companies, processing companies and oil companies using their own in-house software have since modified SEG-Y.

As magnetic tape technology developed, the original SEG-Y layout, which used a small individual data block for each seismic trace, became very inefficient in terms of tape performance, so the first and subsequent revisions allowed larger tape data blocks containing many traces.

Over the years there have been many suggestions for including different kinds of metadata within the standard, because acquisition and processing were technologically much simpler when the standard was first proposed.

For example, geographical positioning information, whether real-world or relative, was not stored in the trace header at acquisition or final processing, whereas it is routine today.

However, the relative simplicity of the SEG-Y format has meant that it has worked well for the interchange of seismic data: as long as the standard has been adhered to, data recorded in the 1970s on half-inch magnetic tape can be read and understood as easily as data on modern tape media or in a disk file.

While SEG-Y data is still stored on magnetic tape as a permanent archive, it is increasingly stored as disk files for ease of online and near-line access, and later format revisions allow character and numeric data to be stored in native system representations, namely ASCII and IEEE.
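Because a file may use either the legacy or the native representations, a reader typically has to work out which one it is looking at. The Python sketch below shows one possible heuristic and is not a method prescribed by the standard. It assumes that textual header lines conventionally begin with the letter "C" (byte 0xC3 in EBCDIC, 0x43 in ASCII), falls back to comparing the counts of EBCDIC and ASCII space bytes, and assumes the listed meanings of the data sample format codes. The names `detect_text_encoding` and `describe_sample_format` are hypothetical.

```python
def detect_text_encoding(text_header: bytes) -> str:
    """Guess the codec of a textual header: 'cp037' (EBCDIC) or 'ascii'."""
    first = text_header[:1]
    if first == b"\xc3":   # 'C' in EBCDIC
        return "cp037"
    if first == b"\x43":   # 'C' in ASCII
        return "ascii"
    # Fallback: a space is 0x40 in EBCDIC and 0x20 in ASCII.
    ebcdic_spaces = text_header.count(b"\x40")
    ascii_spaces = text_header.count(b"\x20")
    return "cp037" if ebcdic_spaces > ascii_spaces else "ascii"


# Assumed meanings of a few data sample format codes (not exhaustive).
SAMPLE_FORMATS = {
    1: "4-byte IBM floating point",
    2: "4-byte two's complement integer",
    3: "2-byte two's complement integer",
    5: "4-byte IEEE floating point",
    8: "1-byte two's complement integer",
}


def describe_sample_format(code: int) -> str:
    """Return a description of a format code, or a placeholder if unknown."""
    return SAMPLE_FORMATS.get(code, "not covered by this sketch")


header = "C 1 CLIENT".ljust(3200)
print(detect_text_encoding(header.encode("cp037")))  # cp037
print(detect_text_encoding(header.encode("ascii")))  # ascii
print(describe_sample_format(5))                     # 4-byte IEEE floating point
```

A heuristic like this can be fooled by non-conforming headers, so production readers usually let the user override the detected encoding.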

References

  1. Barry, K.M.; Cavers, D.A.; Kneale, C.W. (1975). "Recommended standards for digital tape formats" (PDF). Geophysics. 40 (2): 344–352. Bibcode:1975Geop...40..344B. doi:10.1190/1.1440530.
  2. Northwood, E.J.; Weisinger, R.C.; Bradley, J.J. (1967). "Recommended standards for digital tape formats" (PDF). Geophysics. 32 (6): 1073–1084. doi:10.1190/1.32060004.1.
  3. Hagelund, Rune; Stewart A. Levin & Shawn New, eds. (2023). SEG-Y_r2.1: SEG-Y revision 2.1 Data Exchange format (PDF). Tulsa, OK: Society of Exploration Geophysicists.
  4. Hagelund, Rune; Stewart A. Levin & Shawn New, eds. (2023). SEG-Y_r2.1: seg_y_rev2-1_nov2023-announcement format (PDF). Tulsa, OK: Society of Exploration Geophysicists.
  5. "SEG Y rev 1 Data Exchange format" (PDF). SEG Technical Standards Committee. May 2002.
  6. "SEG-Y_r2.0: SEG-Y revision 2.0 Data Exchange format" (PDF). SEG Technical Standards Committee. January 2017.
  7. "SEG-Y_r2.1: SEG-Y revision 2.1 Data Exchange format" (PDF). SEG Technical Standards Committee. October 2023.