ClearForest

ClearForest Corporation
Private
Industry computer software
Founded 1998
Headquarters Waltham, MA
Key people Barak Pridor (CEO), Dr. Yonatan Aumann (Cofounder and VP of Product Strategy)
Products ClearForest Text Analytics Suite
Number of employees 60+ (2006)
Website www.clearforest.com

ClearForest was an Israeli software company that developed and marketed text analytics and text mining solutions.

History

Founded in 1998, ClearForest had its headquarters just outside Boston and a development center in Or Yehuda, Israel.[1] The company was acquired by Reuters in April 2007, and its services are now marketed under the names Calais, OpenCalais, and OneCalais.

ClearForest was previously venture-backed; its last funding round was led by Greylock Ventures and closed in 2005. Other investors included DB Capital Partners, Pitango, Walden Israel, Booz Allen, JP Morgan Partners and HarbourVest Partners.

On April 30, 2007, Reuters announced that it would acquire ClearForest.[3] Sources estimated the acquisition price at $25 million.

On February 7, 2008, Reuters announced the launch of Open Calais,[2] a named-entity recognition and semantic analysis service that uses ClearForest technology.

Solutions and Products

ClearForest offers several hosted solutions, including:

ClearForest also offers Text Analytics solutions targeted at specific business problems, including:

References

  1. ClearForest has developed text analysis solutions for business intelligence applications.
  2. "Calais". Archived from the original on October 24, 2008. Retrieved September 1, 2010.
  3. Eric Auchard (30 April 2007). "Reuters to acquire text search firm ClearForest". Retrieved September 1, 2010.