MiTAP


MiTAP, or Mitre Text and Audio Processing, is a computer system that tries to automatically gather, translate, organize, and present information "for monitoring infectious disease outbreaks and other global events." [1] It is also used in the FBI Investigative Data Warehouse.

Sources

"Multiple information sources in multiple languages are automatically captured, filtered, translated, summarized, and categorized" [1]

It uses "web sources, electronic mailing lists, newsgroups, news feeds, and audio-video data." The audio-video data is automatically transcribed into text by the ViTAP system. [1]
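The stages quoted above (capture, filter, translate, summarize, categorize) can be pictured as a chain of small functions. The following is only an illustrative sketch, not MiTAP's actual code: the `Document` class, every function name, the word-for-word glossary "translation", the first-words "summary", and the keyword-based categories are all hypothetical stand-ins for the real components.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """A captured item; all field names are hypothetical."""
    source: str            # e.g. "news feed", "newsgroup"
    language: str          # language code of the text
    text: str
    summary: str = ""
    categories: list = field(default_factory=list)


def filter_docs(docs):
    """Drop empty items and exact duplicates (a stand-in for real filtering)."""
    seen = set()
    for doc in docs:
        key = doc.text.strip().lower()
        if key and key not in seen:
            seen.add(key)
            yield doc


def translate(doc, glossary):
    """Toy word-for-word glossary lookup standing in for machine translation."""
    if doc.language == "en":
        return doc
    words = [glossary.get(w.lower(), w) for w in doc.text.split()]
    return Document(doc.source, "en", " ".join(words))


def summarize(doc, max_words=8):
    """Toy summary: keep only the first few words."""
    doc.summary = " ".join(doc.text.split()[:max_words])
    return doc


def categorize(doc, topics):
    """Assign every topic whose keywords appear in the text."""
    lowered = doc.text.lower()
    doc.categories = sorted(
        topic for topic, keywords in topics.items()
        if any(k in lowered for k in keywords)
    )
    return doc


def run_pipeline(docs, glossary, topics):
    """Apply the stages in the order the quotation lists them."""
    results = []
    for doc in filter_docs(docs):
        doc = translate(doc, glossary)
        results.append(categorize(summarize(doc), topics))
    return results
```

Each stage takes and returns a `Document`, so any one placeholder could in principle be swapped for a real translator, summarizer, or classifier without touching the others.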

Guts

In 2002 it was reported to use CyberTrans, the Alembic natural language analyzer, the WebSumm summarizer, Lucene indexing, NewsBlaster from Columbia, Brill tagging, SOAP, HTML, NNTP, Perl, Unix scripts, and other tools. Upgrades to various components were planned at that time. [2]

Creators

It was created at the Mitre Corporation by Damianos and a team of other researchers, with public release in 2001. [1] [3]

Users

It has been used by the FBI as part of its Investigative Data Warehouse via DARPA's TIDES program. [4] According to a 2004 FBI email, MiTAP was running at San Diego State University, collecting only English-language website news. The email mentioned a plan to have the FBI run its own version of MiTAP. [5]

It has also been used by people in the White House, the Department of Homeland Security, the Pentagon, [1] the American Red Cross, the United Nations, and the European Disaster Center. [6]

Notes

  1. Damianos et al, AI Magazine, Vol 23, No 4
  2. Damianos et al, AAAI Proceedings, 2002
  3. Damianos et al, The MiTAP System..., 2004
  4. FBI, IDW System Security Plan, 2003, and Security Concept of Operations IDW, 2004
  5. EFF FOIA file, 2008 Apr 8, idw02. page 30 of linked pdf
  6. Damianos et al. AI Magazine, Vol 23, No 4
