| Developer(s) | Megaputer Intelligence |
| --- | --- |
| Initial release | 1994[1] |
| Stable release | 6.5 |
| Type | Data science, artificial intelligence, text mining, predictive analytics |
| Website | www |
PolyAnalyst is a data science software platform developed by Megaputer Intelligence that provides an environment for text mining, data mining, machine learning, and predictive analytics. It is used by Megaputer to build tools with applications in health care, business management, insurance, and other industries. PolyAnalyst has also been used for COVID-19 forecasting and scientific research.
PolyAnalyst's graphical user interface contains nodes that can be linked into a flowchart to perform an analysis. The software provides nodes for data import, data preparation, data visualization, data analysis, and data export.[2][3] PolyAnalyst includes features for text clustering, sentiment analysis, extraction of facts, keywords, and entities, and the creation of taxonomies and ontologies. PolyAnalyst supports a variety of machine learning algorithms, as well as nodes for the analysis of structured data and the ability to execute code in Python and R.[4][5] PolyAnalyst also acts as a report generator, which allows the results of an analysis to be made viewable by non-analysts.[6] It uses a client–server model and is licensed under a software as a service model.[6]
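As an illustration of the kind of workflow such a node flowchart expresses, the sketch below strings together import, preparation, clustering, and export steps in plain Python with pandas and scikit-learn. It does not use PolyAnalyst's own node API; the data, column names, and cluster count are invented for the example.

```python
# Illustrative only: a plain-Python analogue of an import -> prepare -> cluster -> export
# flowchart; it does not use PolyAnalyst's node API. File and column names are hypothetical.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Data import step: load raw customer comments (inlined here instead of reading a file).
comments = pd.DataFrame({"text": [
    "The claims process was fast and painless.",
    "Support never answered my emails.",
    "Quick payout, friendly adjuster.",
    "Still waiting for a response after three weeks.",
]})

# Data preparation step: light cleanup of the free-text column.
comments["text"] = comments["text"].str.strip().str.lower()

# Analysis step: TF-IDF features followed by k-means text clustering.
vectors = TfidfVectorizer(stop_words="english").fit_transform(comments["text"])
comments["cluster"] = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)

# Data export step: write the labeled records for use in a report.
comments.to_csv("clustered_comments.csv", index=False)
print(comments)
```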
PolyAnalyst was used to build a subrogation prediction tool which determines the likelihood that a claim is subrogatable and, if so, the amount expected to be recovered.[citation needed] The tool works by categorizing insurance claims based on whether or not they meet the criteria needed for successful subrogation.[citation needed] PolyAnalyst is also used to detect insurance fraud.[7]
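A minimal sketch of the two steps described above, assuming a text classifier for subrogation likelihood paired with a regressor for the expected recovery; the claims, labels, and amounts are invented toy data, and this is not the actual Megaputer tool.

```python
# Illustrative sketch, not the actual subrogation tool: (1) classify whether a claim looks
# subrogatable, (2) estimate the expected recovery for claims flagged as subrogatable.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.pipeline import make_pipeline

claims = [
    "rear-ended by another driver who ran a red light",
    "hail damage to roof, no third party involved",
    "slip and fall caused by contractor leaving wet floor unmarked",
    "engine failure due to normal wear and tear",
]
subrogatable = [1, 0, 1, 0]              # 1 = another party may be liable (toy labels)
recovery = [12000.0, 0.0, 8500.0, 0.0]   # dollars recovered (toy values)

clf = make_pipeline(TfidfVectorizer(), LogisticRegression()).fit(claims, subrogatable)
reg = make_pipeline(TfidfVectorizer(), LinearRegression()).fit(claims, recovery)

new_claim = "parked car struck by a delivery truck backing up"
if clf.predict([new_claim])[0] == 1:
    print("likely subrogatable, expected recovery ~", round(reg.predict([new_claim])[0], 2))
else:
    print("unlikely to be subrogatable")
```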
PolyAnalyst is used by pharmaceutical companies to assist in pharmacovigilance. The software was used to design a tool that matches descriptions of adverse events to their proper MedDRA codes, determines whether side effects are serious or non-serious, and sets up cases for ongoing monitoring if needed.[8] PolyAnalyst has also been applied to discover new uses for existing drugs by text mining ClinicalTrials.gov,[9] and to forecast the spread of COVID-19 in the United States and Russia.[10][11]
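One common way to approach the code-matching step is nearest-neighbor matching of a report against a dictionary of coded terms; the sketch below illustrates this with TF-IDF cosine similarity. The codes and terms are placeholders rather than real MedDRA entries, and the approach is an assumption for illustration, not a description of the tool's internals.

```python
# Illustrative sketch of matching free-text adverse event reports to coded terms via
# TF-IDF cosine similarity. Codes and terms below are placeholders, not real MedDRA entries.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

dictionary = {            # placeholder code -> preferred term
    "PT0001": "nausea",
    "PT0002": "headache",
    "PT0003": "injection site rash",
}
codes = list(dictionary)
terms = [dictionary[c] for c in codes]

reports = ["severe nausea after the first dose",
           "itchy rash at the injection site"]

# Fit the vectorizer on both the dictionary terms and the reports, then compare them.
vec = TfidfVectorizer().fit(terms + reports)
sims = cosine_similarity(vec.transform(reports), vec.transform(terms))

for report, row in zip(reports, sims):
    best = row.argmax()
    print(f"{report!r} -> {codes[best]} ({terms[best]}), similarity {row[best]:.2f}")
```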
PolyAnalyst is used in business management to analyze written customer feedback, including product review data, warranty claims, and customer comments.[12] In one case, PolyAnalyst was used to build a tool that helped a company monitor its employees' conversations with customers by rating their messages for factors such as professionalism, empathy, and correctness of response. The company reported to Forrester Research that this tool had saved it $11.8 million annually.[13]
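The rating step can be pictured as scoring each message against a few aspect-specific criteria. The sketch below does this with small keyword lexicons; the aspects, word lists, and weights are invented for the example, and the actual tool's scoring method is not public.

```python
# Illustrative aspect scoring with tiny keyword lexicons; not the tool described above.
ASPECT_LEXICONS = {
    "professionalism": {"please": 1, "thank": 1, "whatever": -2, "dude": -1},
    "empathy": {"sorry": 2, "understand": 2, "unfortunately": 1, "policy": -1},
}

def score_message(text: str) -> dict:
    """Return a per-aspect score based on keyword hits in the message."""
    words = text.lower().split()
    return {aspect: sum(weights.get(w, 0) for w in words)
            for aspect, weights in ASPECT_LEXICONS.items()}

msg = "I am sorry for the delay and I understand the frustration, thank you for waiting"
print(score_message(msg))   # {'professionalism': 1, 'empathy': 4}
```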
PolyAnalyst is run on the SKIF Cyberia supercomputer at Tomsk State University, where it is made available to Russian researchers through the Center for Collective Use (CCU). Researchers at the center use PolyAnalyst to perform scientific research and to manage the operations of their universities.[14] In 2020, researchers at Vyatka State University (in collaboration with the CCU) performed a study in which PolyAnalyst was used to identify and reach out to victims of domestic violence through social media analysis. The researchers scraped the web for messages containing descriptions of abuse, and then classified the type of abuse as physical, psychological, economic, or sexual. They also constructed a chatbot to contact the identified victims of abuse and to refer them to specialists based on the type of abuse described in their messages. The data collected in this study was used to create the first-ever Russian-language corpus on domestic violence.[15][16]
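The study's pipeline, as described, classifies a scraped message by abuse type and then routes it to a matching specialist. The toy sketch below mimics that classify-then-refer flow with simple keyword rules; the keywords, category vocabularies, and referral targets are invented, and the researchers' actual models are not reproduced here.

```python
# Toy classify-then-refer sketch; keyword rules and referral targets are invented.
RULES = {
    "physical": {"hit", "bruise", "beaten"},
    "psychological": {"threaten", "humiliate", "control"},
    "economic": {"money", "salary", "allowance"},
    "sexual": {"forced", "assault"},
}
SPECIALISTS = {
    "physical": "crisis medical service",
    "psychological": "counseling psychologist",
    "economic": "legal aid service",
    "sexual": "specialized crisis center",
}

def classify(message: str) -> str:
    """Pick the abuse category whose keyword set overlaps the message the most."""
    words = set(message.lower().split())
    return max(RULES, key=lambda label: len(RULES[label] & words))

msg = "he takes my salary and i have no money of my own"
label = classify(msg)
print(label, "->", SPECIALISTS[label])   # economic -> legal aid service
```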
Data mining is the process of extracting and discovering patterns in large data sets involving methods at the intersection of machine learning, statistics, and database systems. Data mining is an interdisciplinary subfield of computer science and statistics with an overall goal of extracting information from a data set and transforming the information into a comprehensible structure for further use. Data mining is the analysis step of the "knowledge discovery in databases" process, or KDD. Aside from the raw analysis step, it also involves database and data management aspects, data pre-processing, model and inference considerations, interestingness metrics, complexity considerations, post-processing of discovered structures, visualization, and online updating.
Business intelligence consists of strategies and technologies used by enterprises for the data analysis and management of business information. Common functions of business intelligence technologies include reporting, online analytical processing, analytics, dashboard development, data mining, process mining, complex event processing, business performance management, benchmarking, text mining, predictive analytics, and prescriptive analytics.
Text mining, text data mining (TDM) or text analytics is the process of deriving high-quality information from text. It involves "the discovery by computer of new, previously unknown information, by automatically extracting information from different written resources." Written resources may include websites, books, emails, reviews, and articles. High-quality information is typically obtained by devising patterns and trends by means such as statistical pattern learning. According to Hotho et al. (2005), we can distinguish between three different perspectives of text mining: information extraction, data mining, and a knowledge discovery in databases (KDD) process. Text mining usually involves the process of structuring the input text, deriving patterns within the structured data, and finally evaluation and interpretation of the output. 'High quality' in text mining usually refers to some combination of relevance, novelty, and interest. Typical text mining tasks include text categorization, text clustering, concept/entity extraction, production of granular taxonomies, sentiment analysis, document summarization, and entity relation modeling.
JMP is a suite of computer programs for statistical analysis developed by JMP, a subsidiary of SAS Institute. It was launched in 1989 to take advantage of the graphical user interface introduced by the Macintosh operating system. It has since been significantly rewritten and made available for the Windows operating system as well. JMP is used in applications such as Six Sigma, quality control and engineering, and design of experiments, as well as for research in science, engineering, and the social sciences.
Unstructured data is information that either does not have a pre-defined data model or is not organized in a pre-defined manner. Unstructured information is typically text-heavy, but may contain data such as dates, numbers, and facts as well. This results in irregularities and ambiguities that make it difficult to understand using traditional programs as compared to data stored in fielded form in databases or annotated in documents.
The analysis of competing hypotheses (ACH) is a methodology for evaluating multiple competing hypotheses for observed data. It was developed by Richards (Dick) J. Heuer, Jr., a 45-year veteran of the Central Intelligence Agency, in the 1970s for use by the Agency. ACH is used by analysts in various fields who make judgments that entail a high risk of error in reasoning. ACH aims to help an analyst overcome, or at least minimize, some of the cognitive limitations that make prescient intelligence analysis so difficult to achieve.
Angoss Software Corporation, headquartered in Toronto, Ontario, Canada, with offices in the United States and the UK, acquired by Datawatch and now owned by Altair, was a provider of predictive analytics systems through software licensing and services. Angoss' customers represent industries including finance, insurance, mutual funds, retail, health sciences, telecom, and technology. The company was founded in 1984 and was publicly traded on the TSX Venture Exchange from 2008 to 2013 under the ticker symbol ANC.
IBM SPSS Modeler is a data mining and text analytics software application from IBM. It is used to build predictive models and conduct other analytic tasks. It has a visual interface which allows users to leverage statistical and data mining algorithms without programming.
Fraud represents a significant problem for governments and businesses, and specialized analysis techniques are required to discover it. Some of these methods include knowledge discovery in databases (KDD), data mining, machine learning, and statistics. They offer applicable and successful solutions in different areas of electronic fraud crime.
KNIME, the Konstanz Information Miner, is a free and open-source data analytics, reporting and integration platform. KNIME integrates various components for machine learning and data mining through its modular data pipelining "Building Blocks of Analytics" concept. A graphical user interface and use of JDBC allows assembly of nodes blending different data sources, including preprocessing, for modeling, data analysis and visualization without, or with only minimal, programming.
The fields of marketing and artificial intelligence converge in systems which assist in areas such as market forecasting, and automation of processes and decision making, along with increased efficiency of tasks which would usually be performed by humans. The science behind these systems can be explained through neural networks and expert systems, computer programs that process input and provide valuable output for marketers.
Maltego is a link analysis software used for open-source intelligence, forensics, and other investigations, originally developed by Paterva of Pretoria, South Africa. Maltego offers real-time data mining and information gathering, as well as the representation of this information on a node-based graph, making patterns and multiple-order connections between said information easily identifiable. In 2019, the team of Maltego Technologies, headquartered in Munich, Germany, took over responsibility for all global customer-facing operations, and in 2023 it took over complete technology development and management.
NodeXL is a network analysis and visualization software package for Microsoft Excel 2007/2010/2013/2016. The package is similar to other network visualization tools such as Pajek, UCINet, and Gephi. It supports features such as ring layouts, mapping of vertices and edges, and customizable visual attributes and tags. NodeXL enables researchers to compute social network analysis metrics such as centrality, degree, and clustering, as well as to monitor relational data and describe the overall structure of a relational network. When applied to Twitter data analysis, it can show the total network of all users participating in a public discussion and its internal structure through data mining. Social network analysis (SNA) emphasizes relationships rather than isolated individuals or organizations, allowing interested parties to investigate the two-way dialogue between organizations and the public. SNA also provides a flexible measurement system and parameter selection for identifying influential nodes in the network, for example by in-degree and out-degree centrality. The software contains network visualization, social network analysis features, access to social media network data importers, advanced network metrics, and automation.
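NodeXL itself runs inside Excel, but the network metrics it reports, such as in-degree centrality, out-degree centrality, and clustering coefficients, can be illustrated in a few lines with the networkx library; the small reply network below is invented for the example.

```python
# Illustration of the metrics mentioned above on a small invented reply network,
# computed with networkx rather than NodeXL itself.
import networkx as nx

# Directed edges: (source, target) meaning "source replied to target".
g = nx.DiGraph([("ann", "bob"), ("bob", "ann"), ("cat", "ann"),
                ("dan", "ann"), ("dan", "bob"), ("cat", "bob")])

print("in-degree centrality: ", nx.in_degree_centrality(g))
print("out-degree centrality:", nx.out_degree_centrality(g))
print("clustering coefficient:", nx.clustering(g.to_undirected()))
```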
Tanagra is a free suite of machine learning software for research and academic purposes developed by Ricco Rakotomalala at the Lumière University Lyon 2, France. Tanagra supports several standard data mining tasks such as visualization, descriptive statistics, instance selection, feature selection, feature construction, regression, factor analysis, clustering, classification, and association rule learning.
Social media mining is the process of obtaining big data from user-generated content on social media sites and mobile apps in order to extract actionable patterns, form conclusions about users, and act upon the information, often for the purpose of advertising to users or conducting research. The term is an analogy to the resource extraction process of mining for rare minerals. Resource extraction mining requires mining companies to sift through vast quantities of raw ore to find the precious minerals; likewise, social media mining requires human data analysts and automated software programs to sift through massive amounts of raw social media data in order to discern patterns and trends relating to social media usage, online behaviours, sharing of content, connections between individuals, online buying behaviour, and more. These patterns and trends are of interest to companies, governments and not-for-profit organizations, as these organizations can use these patterns and trends to design their strategies or introduce new programs, new products, processes or services.
KH Coder is an open-source software for computer-assisted qualitative data analysis, in particular quantitative content analysis and text mining. It can also be used for computational linguistics. It supports processing and linguistic annotation of text in several languages, such as Japanese, English, French, German, Italian, Portuguese, and Spanish. Specifically, it provides statistical analyses such as co-occurrence network structure, self-organizing maps, multidimensional scaling, and similar calculations. Word frequency statistics, part-of-speech analysis, grouping, correlation analysis, and visualization are among the features offered by KH Coder.
Hancock is a C-based programming language, first developed by researchers at AT&T Labs in 1998, to analyze data streams. The language was intended by its creators to improve the efficiency and scale of data mining. Hancock works by creating profiles of individuals, utilizing data to provide behavioral and social network information.
Vyatka State University is a public university in Kirov, the capital city of Kirov Oblast in Russia. It was created in its current form in 2016 with the merger of the Vyatka State Humanitarian University and the Vyatka State Technological University.