Developer(s) | Provalis Research |
---|---|
Initial release | 1998 |
Stable release | 9 |
Operating system | Microsoft Windows |
Available in | Multilingual |
Type | Text mining, Content analysis, Text analytics, Sentiment Analysis |
License | Proprietary software |
Website | www |
WordStat is a content analysis and text mining software package. [1] It was developed by Normand Peladeau of Provalis Research and first released in 1998. The latest version, 9, was released in 2021.
The software is mainly used for business intelligence and competitive analysis of websites, sentiment analysis, content analysis of responses to open-ended questions, and theme extraction from social media data.
Text mining, text data mining (TDM) or text analytics is the process of deriving high-quality information from text. It involves "the discovery by computer of new, previously unknown information, by automatically extracting information from different written resources." Written resources may include websites, books, emails, reviews, and articles. High-quality information is typically obtained by devising patterns and trends by means such as statistical pattern learning. According to Hotho et al. (2005) we can distinguish between three different perspectives of text mining: information extraction, data mining, and a knowledge discovery in databases (KDD) process. Text mining usually involves the process of structuring the input text, deriving patterns within the structured data, and finally evaluation and interpretation of the output. 'High quality' in text mining usually refers to some combination of relevance, novelty, and interest. Typical text mining tasks include text categorization, text clustering, concept/entity extraction, production of granular taxonomies, sentiment analysis, document summarization, and entity relation modeling.
A tag cloud is a visual representation of text data which is often used to depict keyword metadata on websites, or to visualize free form text. Tags are usually single words, and the importance of each tag is shown with font size or color. When used as website navigation aids, the terms are hyperlinked to items associated with the tag.
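The font-size mapping described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `tag_font_sizes`, the linear scaling, and the 10 to 36 point range are hypothetical choices, not part of any particular tag cloud tool.

```python
from collections import Counter
import re

def tag_font_sizes(text, min_size=10, max_size=36, top_n=5):
    """Map the top_n most frequent words to font sizes by linear scaling."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words).most_common(top_n)
    if not counts:
        return {}
    hi = counts[0][1]    # highest frequency among the selected tags
    lo = counts[-1][1]   # lowest frequency among the selected tags
    span = hi - lo
    sizes = {}
    for word, n in counts:
        # When all counts are equal, fall back to the midpoint size.
        scale = (n - lo) / span if span else 0.5
        sizes[word] = round(min_size + scale * (max_size - min_size))
    return sizes

sizes = tag_font_sizes("text mining text analysis text mining tools")
```

Here the most frequent word receives the largest size and the least frequent the smallest; real tag clouds often use logarithmic rather than linear scaling so that a few very common tags do not dwarf the rest.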
Search engine indexing is the collecting, parsing, and storing of data to facilitate fast and accurate information retrieval. Index design incorporates interdisciplinary concepts from linguistics, cognitive psychology, mathematics, informatics, and computer science. An alternate name for the process, in the context of search engines designed to find web pages on the Internet, is web indexing.
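A common data structure behind this kind of indexing is the inverted index, which maps each term to the documents containing it. The sketch below is a minimal illustration, assuming whitespace-and-punctuation tokenization and AND-only queries; the names `build_inverted_index` and `search` are hypothetical.

```python
from collections import defaultdict
import re

def build_inverted_index(docs):
    """Collect and parse documents, storing term -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in re.findall(r"\w+", text.lower()):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents that contain every term in the query."""
    terms = re.findall(r"\w+", query.lower())
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

docs = {1: "Text mining tools", 2: "Mining audio data", 3: "Text analytics"}
index = build_inverted_index(docs)
hits = search(index, "text mining")
```

Lookups touch only the posting sets of the query terms rather than every document, which is what makes retrieval fast. Production indexes add ranking, stemming, and compressed postings, none of which is shown here.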
Knowledge management software is a subset of content management software, which covers a range of software that specializes in the way information is collected, stored, or accessed. The concept of knowledge management is based on a range of practices used by an individual, a business, or a large corporation to identify, create, represent and redistribute information for a range of purposes. Software that enables an information practice, or a range of practices, at any stage of the information management process can be called information management software. The subset of information management software that emphasizes building knowledge out of the information it manages or contains is often called knowledge management software.
GGobi is a free statistical software tool for interactive data visualization. It allows extensive exploration of data with interactive dynamic graphics and is also a tool for looking at multivariate data. R can be used in sync with GGobi. The software can be embedded as a library in other programs and program packages through an application programming interface (API), or used as an add-on to existing languages and scripting environments, e.g., from the R command line or from Perl or Python scripts. GGobi emphasizes its ability to link multiple graphs together.
Audio mining is a technique by which the content of an audio signal can be automatically analyzed and searched. It is most commonly used in the field of automatic speech recognition, where the analysis tries to identify any speech within the audio. The term ‘audio mining’ is sometimes used interchangeably with audio indexing, phonetic searching, phonetic indexing, speech indexing, audio analytics, speech analytics, word spotting, and information retrieval. Audio indexing, however, is mostly used to describe the pre-process of audio mining, in which the audio file is broken down into a searchable index of words.
Document clustering is the application of cluster analysis to textual documents. It has applications in automatic document organization, topic extraction and fast information retrieval or filtering.
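As a rough illustration of the idea, the sketch below groups documents by cosine similarity of their term-frequency vectors. It is a deliberately simplified single-pass scheme, not a standard algorithm such as k-means or hierarchical clustering; the function names and the 0.3 threshold are assumptions chosen for the example.

```python
from collections import Counter
import math
import re

def tf_vector(text):
    """Represent a document as a bag of lowercase word counts."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def cluster(docs, threshold=0.3):
    """Assign each document to the first cluster whose seed is similar enough."""
    vectors = [tf_vector(d) for d in docs]
    clusters = []  # each cluster is a list of document indices
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])  # no match: start a new cluster
    return clusters

docs = ["cats chase mice", "mice fear cats", "stocks rise today", "stocks fall today"]
groups = cluster(docs)
```

With these four short documents, the two about cats and mice end up together and the two about stocks form a second group. The result of a single-pass scheme depends on document order, which is one reason practical systems prefer iterative or hierarchical methods.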
A co-occurrence network, sometimes referred to as a semantic network, is a method of analyzing text that includes a graphic visualization of potential relationships between people, organizations, concepts, biological organisms such as bacteria, or other entities represented within written material. The generation and visualization of co-occurrence networks has become practical with the advent of electronically stored text amenable to text mining.
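The edges of such a network can be derived by counting how often two terms appear in the same unit of text. The sketch below assumes the sentence as the co-occurrence window and treats every word as a candidate entity; both choices, and the name `cooccurrence_edges`, are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations
import re

def cooccurrence_edges(sentences, min_count=1):
    """Count unordered term pairs that appear together in a sentence."""
    edges = Counter()
    for sentence in sentences:
        # Deduplicate and sort so each pair has one canonical form.
        terms = sorted(set(re.findall(r"\w+", sentence.lower())))
        for a, b in combinations(terms, 2):
            edges[(a, b)] += 1
    # Keep only pairs that co-occur at least min_count times.
    return {pair: n for pair, n in edges.items() if n >= min_count}

sentences = ["Alice met Bob", "Bob met Carol", "Alice met Carol"]
edges = cooccurrence_edges(sentences, min_count=2)
```

Each surviving pair is an edge whose weight is the co-occurrence count; a graph library or plotting tool can then lay these out visually. In practice a stop-word list or entity recognizer would be applied first so that function words do not dominate the network.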
A concept search is an automated information retrieval method that is used to search electronically stored unstructured text for information that is conceptually similar to the information provided in a search query. In other words, the ideas expressed in the information retrieved in response to a concept search query are relevant to the ideas contained in the text of the query.
Patent visualisation is an application of information visualisation. The number of patents has been increasing, encouraging companies to consider intellectual property as a part of their strategy. Patent visualisation, like patent mapping, is used to quickly view a patent portfolio.
IBM Cognos Analytics with Watson is a web-based integrated business intelligence suite by IBM. It provides a toolset for reporting, analytics, scorecarding, and monitoring of events and metrics. The software consists of several components designed to meet the different information requirements of a company, such as IBM Cognos Framework Manager, IBM Cognos Cube Designer, and IBM Cognos Transformer.
Catpac is a computer program that analyzes text samples to identify the key concepts they contain. It was conceived chiefly by Richard Holmes, a Michigan State computer programmer, and Dr. Joseph Woelfel, a University at Albany and University at Buffalo sociologist, for the analysis of attitude formation and change in a sociological context. Contributions by Rob Zimmelman, an undergraduate and graduate student at the University at Albany, made from 1981 to 1984 on the Univac 1100 mainframe, included integrating the CATPAC software into the Galileo*Telegal system, text labeling, and porting CATPAC output to the Galileo system of paired-comparison conceptual visualization. Other students at the university also contributed to the software. CATPAC and the Galileo system are still in commercial use today and, with recent data capture and visualization contributions, continue to grow. CATPAC takes text files as input and produces output such as word and alphabetical frequencies as well as various types of cluster analysis.
QDA Miner is mixed methods and qualitative data analysis software developed by Provalis Research. The program was designed to assist researchers in managing, coding and analyzing qualitative data.
Provalis Research is a Canadian company that specializes in developing and marketing text analytics tools combining qualitative analysis through QDA Miner with quantitative content analysis and text-mining through WordStat. Headquartered in Montreal, the company was founded in 1989 by the current president Normand Peladeau.
The following outline is provided as an overview of and topical guide to natural-language processing:
Vaa3D is an Open Source visualization and analysis software suite created mainly by Hanchuan Peng and his team at Janelia Research Campus, HHMI and Allen Institute for Brain Science. The software performs 3D, 4D and 5D rendering and analysis of very large image data sets, especially those generated using various modern microscopy methods, and associated 3D surface objects. This software has been used in several large neuroscience initiatives and a number of applications in other domains. In a recent Nature Methods review article, it has been viewed as one of the leading open-source software suites in the related research fields. In addition, research using this software was awarded the 2012 Cozzarelli Prize from the National Academy of Sciences.
Online content analysis or online textual analysis refers to a collection of research techniques used to describe and make inferences about online material through systematic coding and interpretation. It is a form of content analysis applied to Internet-based communication.
KH Coder is open-source software for computer-assisted qualitative data analysis, particularly quantitative content analysis and text mining. It can also be used for computational linguistics. It supports the processing and linguistic analysis of text in several languages, such as Japanese, English, French, German, Italian, Portuguese and Spanish. Specifically, it provides statistical analyses such as co-occurrence networks, self-organizing maps, multidimensional scaling and similar computations. Its features also include word frequency statistics, part-of-speech analysis, grouping, correlation analysis, and visualization.