| Developer(s) | Megaputer Intelligence |
| --- | --- |
| Initial release | 1994[1] |
| Stable release | 6.5 |
| Type | Data science, artificial intelligence, text mining, predictive analytics |
| Website | www |
PolyAnalyst is a data science software platform developed by Megaputer Intelligence that provides an environment for text mining, data mining, machine learning, and predictive analytics. It is used by Megaputer to build tools with applications to health care, business management, insurance, and other industries. PolyAnalyst has also been used for COVID-19 forecasting and scientific research.
PolyAnalyst's graphical user interface contains nodes that can be linked into a flowchart to perform an analysis. The software provides nodes for data import, data preparation, data visualization, data analysis, and data export. [2] [3] PolyAnalyst includes features for text clustering, sentiment analysis, extraction of facts, keywords, and entities, and the creation of taxonomies and ontologies. PolyAnalyst supports a variety of machine learning algorithms, as well as nodes for the analysis of structured data and the ability to execute code in Python and R. [4] [5] PolyAnalyst also acts as a report generator, which allows the results of an analysis to be viewed by non-analysts. [6] It uses a client–server architecture and is licensed as software as a service. [6]
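PolyAnalyst's node configurations are proprietary and not documented here; purely as a hedged illustration, the following Python sketch (using scikit-learn and an invented input file) shows the kind of import, preparation, clustering, and export sequence that such a flowchart of nodes represents, of the sort that could be run through the platform's support for executing Python code.

```python
# Hypothetical sketch of an import -> prepare -> cluster -> export pipeline,
# analogous to linking nodes in a flowchart; this is NOT PolyAnalyst's API,
# just scikit-learn run as ordinary Python code.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Data import step: "feedback.csv" is an assumed example file with a 'comment' column.
df = pd.read_csv("feedback.csv")

# Data preparation step: drop empty comments.
texts = df["comment"].dropna().astype(str)

# Text clustering step: TF-IDF features followed by k-means.
features = TfidfVectorizer(max_features=5000, stop_words="english").fit_transform(texts)
labels = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(features)

# Data export step: write cluster assignments back out for reporting.
pd.DataFrame({"comment": texts.values, "cluster": labels}).to_csv("clusters.csv", index=False)
```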
PolyAnalyst was used to build a subrogation prediction tool which determines the likelihood that a claim is subrogatable, and if so, the amount that is expected to be recovered.[ citation needed ] The tool works by categorizing insurance claims based on whether or not they meet the criteria that are needed for successful subrogation.[ citation needed ] PolyAnalyst is also used to detect insurance fraud. [7]
PolyAnalyst is used by pharmaceutical companies to assist in pharmacovigilance. The software was used to design a tool that matches descriptions of adverse events to their proper MedDRA codes, determines whether side effects are serious or non-serious, and sets up cases for ongoing monitoring if needed. [8] PolyAnalyst has also been applied to discover new uses for existing drugs by text mining ClinicalTrials.gov, [9] and to forecast the spread of COVID-19 in the United States and Russia. [10] [11]
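The matching logic of that tool is not public; the sketch below is only a minimal, hypothetical illustration of fuzzy-matching free-text adverse event descriptions against a coded term dictionary, using the Python standard library. The term list and codes are placeholders, not a verified MedDRA subset.

```python
# Minimal sketch of matching free-text adverse event descriptions to coded
# terms, in the spirit of the MedDRA-coding tool described above. The tiny
# term dictionary and the matching rule are illustrative assumptions only;
# real MedDRA coding uses the licensed terminology and far richer logic.
import difflib

# Hypothetical subset of preferred terms mapped to placeholder codes.
TERMS = {
    "headache": "10019211",
    "nausea": "10028813",
    "dizziness": "10013573",
    "rash": "10037844",
}

def code_event(description: str):
    """Return (term, code) for the closest dictionary term, or None."""
    match = difflib.get_close_matches(description.lower(), TERMS, n=1, cutoff=0.6)
    if not match:
        return None
    return match[0], TERMS[match[0]]

print(code_event("naussea"))  # fuzzy match -> ('nausea', '10028813')
```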
PolyAnalyst is used in business management to analyze written customer feedback including product review data, warranty claims, and customer comments. [12] In one case, PolyAnalyst was used to build a tool which helped a company monitor its employees' conversations with customers by rating their messages for factors such as professionalism, empathy, and correctness of response. The company reported to Forrester Research that this tool had saved them $11.8 million annually. [13]
PolyAnalyst is run on the SKIF Cyberia supercomputer at Tomsk State University, where it is made available to Russian researchers through the Center for Collective Use (CCU). Researchers at the center use PolyAnalyst to perform scientific research and to manage the operations of their universities. [14] In 2020, researchers at Vyatka State University (in collaboration with the CCU) performed a study in which PolyAnalyst was used to identify and reach out to victims of domestic violence through social media analysis. The researchers scraped the web for messages containing descriptions of abuse, and then classified the type of abuse as physical, psychological, economic, or sexual. They also built a chatbot to contact the identified victims and refer them to specialists based on the type of abuse described in their messages. The data collected in this study was used to create the first Russian-language corpus on domestic violence. [15] [16]
Data mining is the process of extracting and discovering patterns in large data sets involving methods at the intersection of machine learning, statistics, and database systems. Data mining is an interdisciplinary subfield of computer science and statistics with an overall goal of extracting information from a data set and transforming the information into a comprehensible structure for further use. Data mining is the analysis step of the "knowledge discovery in databases" process, or KDD. Aside from the raw analysis step, it also involves database and data management aspects, data pre-processing, model and inference considerations, interestingness metrics, complexity considerations, post-processing of discovered structures, visualization, and online updating.
Business intelligence (BI) consists of strategies, methodologies, and technologies used by enterprises for data analysis and management of business information. Common functions of BI technologies include reporting, online analytical processing, analytics, dashboard development, data mining, process mining, complex event processing, business performance management, benchmarking, text mining, predictive analytics, and prescriptive analytics.
Text mining, text data mining (TDM) or text analytics is the process of deriving high-quality information from text. It involves "the discovery by computer of new, previously unknown information, by automatically extracting information from different written resources." Written resources may include websites, books, emails, reviews, and articles. High-quality information is typically obtained by devising patterns and trends by means such as statistical pattern learning. According to Hotho et al. (2005), there are three perspectives of text mining: information extraction, data mining, and knowledge discovery in databases (KDD). Text mining usually involves the process of structuring the input text, deriving patterns within the structured data, and finally evaluation and interpretation of the output. 'High quality' in text mining usually refers to some combination of relevance, novelty, and interest. Typical text mining tasks include text categorization, text clustering, concept/entity extraction, production of granular taxonomies, sentiment analysis, document summarization, and entity relation modeling.
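As a brief illustration of that structure-derive-evaluate loop, the following sketch (with an invented labelled dataset) structures raw text into a document-term matrix, fits a simple categorizer, and applies it to unseen text using scikit-learn; it is a generic example of text categorization, not any particular product's method.

```python
# Toy illustration of the structure -> pattern -> evaluate loop described
# above. The tiny labelled dataset is invented for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

train_texts = [
    "refund not received after cancellation",
    "charged twice for the same order",
    "great product, fast delivery",
    "love the new design and colours",
]
train_labels = ["complaint", "complaint", "praise", "praise"]

# Structuring: turn unstructured text into a document-term matrix.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(train_texts)

# Pattern derivation: fit a simple categorizer.
model = MultinomialNB().fit(X, train_labels)

# Evaluation/interpretation: apply it to unseen text.
print(model.predict(vectorizer.transform(["delivery was quick and painless"])))
```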
JMP is a suite of computer programs for statistical analysis and machine learning developed by JMP, a subsidiary of SAS Institute. The program was launched in 1989 to take advantage of the graphical user interface introduced by the Macintosh operating system. It has since been significantly rewritten and made available for the Windows operating system.
Unstructured data is information that either does not have a pre-defined data model or is not organized in a pre-defined manner. Unstructured information is typically text-heavy, but may contain data such as dates, numbers, and facts as well. This results in irregularities and ambiguities that make it difficult to understand using traditional programs as compared to data stored in fielded form in databases or annotated in documents.
Sentiment analysis is the use of natural language processing, text analysis, computational linguistics, and biometrics to systematically identify, extract, quantify, and study affective states and subjective information. Sentiment analysis is widely applied to voice-of-the-customer materials such as reviews and survey responses, online and social media, and healthcare materials for applications that range from marketing to customer service to clinical medicine. With the rise of deep language models such as RoBERTa, more difficult data domains can also be analyzed, e.g., news texts, where authors typically express their opinions less explicitly.
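A minimal sentiment-analysis example can be written with the Hugging Face transformers pipeline, which downloads a default pretrained classifier on first use; the snippet below is a generic illustration of the technique with invented review texts, not a description of any particular product.

```python
# Minimal sentiment-analysis sketch using the Hugging Face transformers
# pipeline; the default model is fetched on first use.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")

reviews = [
    "The battery lasts all week, I'm impressed.",
    "Support never answered my ticket.",
]
for review, result in zip(reviews, classifier(reviews)):
    # Each result is a dict such as {'label': 'POSITIVE', 'score': 0.99}.
    print(f"{result['label']:8s} {result['score']:.2f}  {review}")
```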
Angoss Software Corporation was a provider of predictive analytics systems through software licensing and services. Headquartered in Toronto, Ontario, Canada, with offices in the United States and the UK, it was acquired by Datawatch and is now owned by Altair. Angoss' customers represented industries including finance, insurance, mutual funds, retail, health sciences, telecom, and technology. The company was founded in 1984 and was publicly traded on the TSX Venture Exchange from 2008 to 2013 under the ticker symbol ANC.
IBM SPSS Modeler is a data mining and text analytics software application from IBM. It is used to build predictive models and conduct other analytic tasks. It has a visual interface which allows users to leverage statistical and data mining algorithms without programming.
Fraud represents a significant problem for governments and businesses, and specialized analysis techniques are required to discover it. Some of these methods include knowledge discovery in databases (KDD), data mining, machine learning, and statistics. They offer applicable and successful solutions in different areas of electronic fraud crimes.
KNIME, the Konstanz Information Miner, is a free and open-source data analytics, reporting, and integration platform. KNIME integrates various components for machine learning and data mining through its modular data pipelining "Building Blocks of Analytics" concept. A graphical user interface and the use of JDBC allow assembly of nodes blending different data sources, including preprocessing, for modeling, data analysis, and visualization without, or with minimal, programming.
Patent visualisation is an application of information visualisation. The number of patents has been increasing, encouraging companies to consider intellectual property as a part of their strategy. Patent visualisation, like patent mapping, is used to quickly view a patent portfolio.
The fields of marketing and artificial intelligence converge in systems which assist in areas such as market forecasting, and automation of processes and decision making, along with increased efficiency of tasks which would usually be performed by humans. The science behind these systems can be explained through neural networks and expert systems, computer programs that process input and provide valuable output for marketers.
NodeXL is a network analysis and visualization software package for Microsoft Excel 2007/2010/2013/2016. The package is similar to other network visualization tools such as Pajek, UCINet, and Gephi. It is widely used for ring (circle) layouts, mapping of vertices and edges, and customizable visual attributes and tags. NodeXL enables researchers to compute social network analysis metrics such as centrality, degree, and clustering, as well as to monitor relational data and describe the overall structure of a network. When applied to Twitter data, it shows the total network of all users participating in a public discussion and its internal structure through data mining. Social network analysis (SNA) emphasizes relationships rather than isolated individuals or organizations, allowing interested parties to investigate the two-way dialogue between organizations and the public. SNA also provides a flexible measurement system and parameter selection to identify influential nodes in the network, for example by in-degree and out-degree centrality. The software contains network visualization, social network analysis features, access to social media network data importers, advanced network metrics, and automation.
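NodeXL itself runs inside Excel; as a language-neutral sketch of the metrics mentioned above (in-degree and out-degree centrality, clustering), the following example computes them with the networkx library over a small invented reply network.

```python
# Small networkx sketch of the network metrics discussed above, using a
# hypothetical "who replied to whom" network; this is not NodeXL code.
import networkx as nx

# Directed edges: who replied to whom (invented data).
g = nx.DiGraph([
    ("alice", "brand"), ("bob", "brand"), ("brand", "alice"),
    ("carol", "alice"), ("carol", "brand"),
])

print("in-degree centrality:", nx.in_degree_centrality(g))
print("out-degree centrality:", nx.out_degree_centrality(g))
# Clustering coefficients computed on the undirected projection here.
print("clustering:", nx.clustering(g.to_undirected()))
```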
Tanagra is a free suite of machine learning software for research and academic purposes developed by Ricco Rakotomalala at the Lumière University Lyon 2, France. Tanagra supports several standard data mining tasks such as visualization, descriptive statistics, instance selection, feature selection, feature construction, regression, factor analysis, clustering, classification, and association rule learning.
Social media mining is the process of obtaining data from user-generated content on social media in order to extract actionable patterns, form conclusions about users, and act upon the information. Mining supports targeted advertising as well as academic research. The term is an analogy to the process of mining for minerals. Mining companies sift through raw ore to find valuable minerals; likewise, social media mining sifts through social media data in order to discern patterns and trends about matters such as social media usage, online behaviour, content sharing, connections between individuals, and buying behaviour. These patterns and trends are of interest to companies, governments, and not-for-profit organizations, which can use the analyses for tasks such as designing strategies or introducing programs, products, processes, or services.
Social media analytics or social media monitoring is the process of gathering and analyzing data from social networks such as Facebook, Instagram, LinkedIn, or Twitter. A part of social media analytics is called social media monitoring or social listening. It is commonly used by marketers to track online conversations about products and companies. One author defined it as "the art and science of extracting valuable hidden insights from vast amounts of semi-structured and unstructured social media data to enable informed and insightful decision-making."
KH Coder is open-source software for computer-assisted qualitative data analysis, particularly quantitative content analysis and text mining. It can also be used for computational linguistics. It supports processing and morphological analysis of text in several languages, such as Japanese, English, French, German, Italian, Portuguese, and Spanish. Specifically, it can perform factor analysis, co-occurrence network analysis, self-organizing maps, multidimensional scaling, and similar calculations. Word frequency statistics, part-of-speech analysis, grouping, correlation analysis, and visualization are among the features offered by KH Coder.
Hancock is a C-based programming language, first developed by researchers at AT&T Labs in 1998, to analyze data streams. The language was intended by its creators to improve the efficiency and scale of data mining. Hancock works by creating profiles of individuals, utilizing data to provide behavioral and social network information.
Vyatka State University is a public university in Kirov, the capital city of Kirov Oblast in Russia. It was created in its current form in 2016 with the merger of the Vyatka State Humanitarian University and the Vyatka State Technological University.