PolyAnalyst

Developer(s): Megaputer Intelligence
Initial release: 1994 [1]
Stable release: 6.5
Type: Data science, artificial intelligence, text mining, predictive analytics
Website: www.megaputer.com

PolyAnalyst is a data science software platform developed by Megaputer Intelligence that provides an environment for text mining, data mining, machine learning, and predictive analytics. Megaputer uses it to build tools with applications in health care, business management, insurance, and other industries. PolyAnalyst has also been used for COVID-19 forecasting and scientific research.

Overview

A screenshot of a PolyAnalyst flowchart showing the use of a convolutional neural network node.

PolyAnalyst's graphical user interface contains nodes that can be linked into a flowchart to perform an analysis. The software provides nodes for data import, data preparation, data visualization, data analysis, and data export. [2] [3] PolyAnalyst includes features for text clustering, sentiment analysis, extraction of facts, keywords, and entities, and the creation of taxonomies and ontologies. PolyAnalyst supports a variety of machine learning algorithms, as well as nodes for the analysis of structured data and the ability to execute code in Python and R. [4] [5] PolyAnalyst also acts as a report generator, which allows the results of an analysis to be viewed by non-analysts. [6] It uses a client–server architecture and is licensed under a software as a service model. [6]
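The node pipeline described above can be illustrated with a short sketch. This is not PolyAnalyst's actual API: the node functions, cue-word lists, and sample data below are invented for illustration, mimicking only the import → preparation → analysis flow that a flowchart would wire together graphically.

```python
# Illustrative sketch (not PolyAnalyst's API): three hand-written "nodes"
# chained like a flowchart -- data import, data preparation, and a toy
# sentiment-analysis step based on invented cue-word lists.

def import_node(raw):
    """Data import: parse raw text into one record per non-empty line."""
    return [line.strip() for line in raw.splitlines() if line.strip()]

def prepare_node(records):
    """Data preparation: lowercase records and drop duplicates, keeping order."""
    seen, out = set(), []
    for record in records:
        key = record.lower()
        if key not in seen:
            seen.add(key)
            out.append(key)
    return out

def sentiment_node(records):
    """Toy sentiment analysis: positive cue words minus negative ones."""
    positive = {"good", "great", "excellent"}
    negative = {"bad", "poor", "terrible"}
    def score(text):
        words = text.split()
        return sum(w in positive for w in words) - sum(w in negative for w in words)
    return {record: score(record) for record in records}

raw = "Great service\nterrible delay\nGreat service\n"
results = sentiment_node(prepare_node(import_node(raw)))
# results maps each prepared record to its toy sentiment score
```

In the product these steps are graphical nodes connected on a canvas rather than hand-written functions; the sketch only shows the shape of the data flow from one node to the next.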

Business Applications

Insurance

PolyAnalyst was used to build a subrogation prediction tool, which determines the likelihood that a claim is subrogatable and, if so, the amount expected to be recovered.[citation needed] The tool works by categorizing insurance claims based on whether or not they meet the criteria needed for successful subrogation.[citation needed] PolyAnalyst is also used to detect insurance fraud. [7]
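As a rough illustration of the categorization described above, a toy rule-based scorer might look like the following. The criteria, weights, and recovery formula are invented for this sketch and are not taken from the actual tool.

```python
# Toy subrogation scorer: check a claim against weighted criteria, then
# scale the paid amount by the resulting likelihood. All criteria and
# weights below are hypothetical.

CRITERIA_WEIGHTS = {
    "third_party_at_fault": 0.5,      # hypothetical criterion
    "liable_party_identified": 0.3,   # hypothetical criterion
    "damages_documented": 0.2,        # hypothetical criterion
}

def subrogation_score(claim):
    """Return a 0..1 likelihood that the claim is subrogatable."""
    return sum(w for c, w in CRITERIA_WEIGHTS.items() if claim.get(c))

def expected_recovery(claim):
    """Scale the amount paid out by the subrogation likelihood."""
    return round(claim["paid_amount"] * subrogation_score(claim), 2)

claim = {
    "third_party_at_fault": True,
    "liable_party_identified": True,
    "damages_documented": False,
    "paid_amount": 10000.0,
}
```

A real tool of this kind would learn such criteria from historical claims rather than hard-coding them.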

Health care

A heat map showing Megaputer's COVID-19 forecast.

PolyAnalyst is used by pharmaceutical companies to assist in pharmacovigilance. The software was used to design a tool that matches descriptions of adverse events to their proper MedDRA codes, determines whether side effects are serious or non-serious, and sets up cases for ongoing monitoring when needed. [8] PolyAnalyst has also been applied to discover new uses for existing drugs by text mining ClinicalTrials.gov, [9] and to forecast the spread of COVID-19 in the United States and Russia. [10] [11]
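The matching step described above can be sketched with simple string similarity. The term list below uses placeholder codes (not real MedDRA codes), and real MedDRA coding involves a far larger hierarchy of terms; this is only a minimal illustration of mapping free text to a coded vocabulary.

```python
# Minimal sketch of matching free-text adverse-event descriptions to a
# coded vocabulary via string similarity. Codes are placeholders, not
# real MedDRA codes.
import difflib

CODED_TERMS = {
    "headache": "T001",
    "nausea": "T002",
    "dizziness": "T003",
}

def match_term(description, cutoff=0.6):
    """Return (term, code) for the closest coded term, or None if no match."""
    hits = difflib.get_close_matches(description.lower(), CODED_TERMS,
                                     n=1, cutoff=cutoff)
    if not hits:
        return None
    return hits[0], CODED_TERMS[hits[0]]
```

For example, a slightly misspelled report such as "Headach" would still resolve to the coded term, while unrelated text falls below the cutoff and is left for manual review.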

Business management

PolyAnalyst is used in business management to analyze written customer feedback including product review data, warranty claims, and customer comments. [12] In one case, PolyAnalyst was used to build a tool which helped a company monitor its employees' conversations with customers by rating their messages for factors such as professionalism, empathy, and correctness of response. The company reported to Forrester Research that this tool had saved them $11.8 million annually. [13]

SKIF Cyberia Supercomputer

PolyAnalyst runs on the SKIF Cyberia supercomputer at Tomsk State University, where it is made available to Russian researchers through the Center for Collective Use (CCU). Researchers at the center use PolyAnalyst to perform scientific research and to manage the operations of their universities. [14] In 2020, researchers at Vyatka State University (in collaboration with the CCU) performed a study in which PolyAnalyst was used to identify and reach out to victims of domestic violence through social media analysis. The researchers scraped the web for messages containing descriptions of abuse, and then classified the type of abuse as physical, psychological, economic, or sexual. They also constructed a chatbot to contact the identified victims and refer them to specialists based on the type of abuse described in their messages. The data collected in this study was used to create the first Russian-language corpus on domestic violence. [15] [16]
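The classify-and-refer step described in the study can be sketched as a keyword-based labeler feeding a routing table. The keyword lists, labels, and specialist mapping below are invented for illustration and do not reflect the researchers' actual models.

```python
# Keyword-based sketch of the classify-and-refer step; keyword lists and
# the specialist routing table are invented for illustration.

ABUSE_KEYWORDS = {
    "physical": {"hit", "beaten", "bruise"},
    "psychological": {"threaten", "humiliate", "isolate"},
    "economic": {"money", "salary", "allowance"},
    "sexual": {"assault", "coerced"},
}

SPECIALISTS = {
    "physical": "medical support line",
    "psychological": "crisis counselor",
    "economic": "legal aid service",
    "sexual": "crisis counselor",
}

def classify(message):
    """Label a message with the abuse type whose keywords match most."""
    words = set(message.lower().split())
    best_label, best_hits = None, 0
    for label, keywords in ABUSE_KEYWORDS.items():
        hits = len(words & keywords)
        if hits > best_hits:
            best_label, best_hits = label, hits
    return best_label

def refer(message):
    """Route a classified message to the matching specialist, if any."""
    label = classify(message)
    return None if label is None else (label, SPECIALISTS[label])
```

The study's actual pipeline used text-mining models over scraped social media data; this sketch only shows the routing logic in its simplest possible form.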

References

  1. Kiselev, Mikhail V. (1994). "PolyAnalyst – A Machine Discovery System Inferring Functional Programs" (PDF). AAAI Technical Report. AAAI. AAAI-94 Workshop on Knowledge Discovery in Databases (WS-94-03): 237–249. Retrieved 15 March 2021.
  2. Apicella, Mario (3 July 2000). "PolyAnalyst 4.1 digs through data for gold". InfoWorld.
  3. Zhang, Qingyu; Segall, Richard S. (1 December 2008). "Web mining: a survey of current research, techniques, and software". International Journal of Information Technology & Decision Making. 7 (4): 683–720. doi:10.1142/S0219622008003150. ISSN 0219-6220.
  4. Zhang, Qingyu; Segall, Richard S. (1 January 2010). "Review of data, text and web mining software". Kybernetes. 39 (4): 625–655. doi:10.1108/03684921011036835. ISSN 0368-492X.
  5. Zhang, Qingyu; Segall, Richard S. (2009), Maimon, Oded; Rokach, Lior (eds.), "Commercial Data Mining Software", Data Mining and Knowledge Discovery Handbook, Boston, MA: Springer US, pp. 1245–1268, Bibcode:2010dmak.book.1245Z, doi:10.1007/978-0-387-09823-4_65, ISBN 978-0-387-09823-4, retrieved 3 October 2020
  6. Halper, Fern (2011). "Predictive Analytics: The Hurwitz Victory Index Report" (PDF). Hurwitz & Associates. Retrieved 28 September 2020.
  7. Wang, John; Yang, James G.S. (2009). "Data mining techniques for auditing attest function and fraud detection" (PDF). Journal of Forensic & Investigative Accounting. 1 (1): 1–24.
  8. "Life sciences: Increasing speed-to-insight in pharma". kmworld.com. Retrieved 22 September 2020.
  9. Su, Eric Wen; Sanger, Todd M. (23 March 2017). "Systematic drug repositioning through mining adverse event data in ClinicalTrials.gov". PeerJ. 5: e3154. doi:10.7717/peerj.3154. ISSN 2167-8359. PMC 5366063. PMID 28348935.
  10. "COVID-19: Megaputer provides interactive geo-map to forecast peak of active cases in U.S." thegeospatial. 29 April 2020. Retrieved 30 September 2020.
  11. "В России представили модели пика заболеваемости COVID-19 в регионах". РБК (in Russian). 17 May 2020. Retrieved 24 September 2020.
  12. Segall, Richard S.; Zhang, Qingyu. "Web Mining of Hotel Customer Survey Data". Systemics, Cybernetics and Informatics. 6 (6): 23–29. CiteSeerX 10.1.1.455.7659.
  13. Evelson, Boris (10 November 2015). "Vendor Landscape: Big Data Text Analytics". Forrester.
  14. редакция, Любимая. "В ТГУ открылся Центр коллективного пользования платформой для аналитики big data". Томский Обзор (in Russian). Retrieved 26 February 2021.
  15. "Ученые ВятГУ совместно с компанией Мегапьютер Интеллидженс разработали чат-бот для помощи жертвам супружеского насилия - Официальный сайт ВятГУ". vyatsu.ru. Retrieved 26 February 2021.
  16. "Суперкомпьютер помогает находить в интернете жертв домашнего насилия | iot.ru Новости Интернета вещей". iot.ru. Retrieved 26 February 2021.