MiTAP


MiTAP, or Mitre Text and Audio Processing, is a computer system that tries to automatically gather, translate, organize, and present information "for monitoring infectious disease outbreaks and other global events." [1] It is also used in the FBI Investigative Data Warehouse.

Sources

"Multiple information sources in multiple languages are automatically captured, filtered, translated, summarized, and categorized" [1]

It uses "web sources, electronic mailing lists, newsgroups, news feeds, and audio-video data." The audio-video data is automatically transcribed into text by the ViTAP system. [1]

Components

In 2002, it was reported to use CyberTrans, the Alembic natural language analyzer, the WebSumm summarizer, Lucene indexing, NewsBlaster from Columbia, Brill tagging, SOAP, HTML, NNTP, Perl, Unix scripts, and other tools. Upgrades to various components were planned at the time. [2]

Creators

It was created at the Mitre Corporation by Damianos and a team of other researchers, with public release in 2001. [1] [3]

Users

It has been used by the FBI as part of its Investigative Data Warehouse via DARPA's TIDES program. [4] According to a 2004 FBI email, MiTAP was running at San Diego State University, collecting only English-language website news. The email also mentioned a plan for the FBI to run its own version of MiTAP. [5]

It has also been used by people in the White House, the Department of Homeland Security, the Pentagon, [1] the American Red Cross, the United Nations, and the European Disaster Center. [6]

Notes

  1. Damianos et al., AI Magazine, Vol. 23, No. 4
  2. Damianos et al., AAAI Proceedings, 2002
  3. Damianos et al., The MiTAP System..., 2004
  4. FBI, IDW System Security Plan, 2003, and Security Concept of Operations IDW, 2004
  5. EFF FOIA file, 2008 Apr 8, idw02, page 30 of linked PDF
  6. Damianos et al., AI Magazine, Vol. 23, No. 4
