Metatron Discovery

Developer(s): SK Telecom
Type: Business intelligence
Website: metatron.app

Metatron Discovery is a big data analytics platform developed by the South Korean telecommunications provider SK Telecom. It is a partially open-source [1] software system based on the Apache Druid engine. [2] [3]

Overview

Metatron Discovery is a big data analytics platform with capabilities for big data collection, storage, and visualization. SK Telecom, originally a mobile telecommunications carrier, developed Metatron Discovery to meet an internal need: to effectively process and analyze the more than 500 TB of mobile network service data generated daily. SK Telecom subsequently commercialized the platform; about 10 enterprises in South Korea, including SK hynix and the Industrial Bank of Korea, have adopted Metatron Discovery in their systems. [4] [5] In February 2019, SK Telecom and Microsoft agreed to a strategic business partnership to launch Metatron Discovery on Microsoft Azure. [6]

Key Components

Metatron Discovery performs analytics on its ingested data sources or on other external data sources using a range of analytical tools, and presents the results as charts and reports.

Data Preparation

Data Preparation is a tool for defining transformation rules that reshape files and database tables into forms better suited to analysis, and for saving the results to HDFS or Hive.
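
The rule-based flow can be pictured with a small sketch. The rule names, columns, and file paths below are illustrative assumptions, not Metatron's actual rule grammar; the sketch only mirrors the idea of declarative transformation rules applied to a tabular dataset before the result is saved in an HDFS/Hive-friendly format:

```python
import pandas as pd

# Hypothetical rule list; the shapes and names are assumptions for
# illustration, not Metatron's rule syntax.
rules = [
    {"op": "rename", "column": "msisdn", "to": "subscriber_id"},
    {"op": "drop",   "column": "debug_flag"},
    {"op": "derive", "column": "mb_used", "expr": "bytes_used / 1e6"},
]

def apply_rules(df: pd.DataFrame, rules: list) -> pd.DataFrame:
    """Apply each declarative rule to the dataframe in order."""
    for rule in rules:
        if rule["op"] == "rename":
            df = df.rename(columns={rule["column"]: rule["to"]})
        elif rule["op"] == "drop":
            df = df.drop(columns=[rule["column"]])
        elif rule["op"] == "derive":
            df[rule["column"]] = df.eval(rule["expr"])
    return df

df = pd.DataFrame({"msisdn": ["010-1234"], "debug_flag": [0],
                   "bytes_used": [5_000_000]})
cleaned = apply_rules(df, rules)

# Parquet is one HDFS/Hive-friendly output format (requires pyarrow).
cleaned.to_parquet("network_usage_cleaned.parquet")
```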

Data Storage

Data Storage manages data ingested into the Metatron engine for analysis and visualization.
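
Metatron ships a modified Druid, so its exact ingestion interface differs, but stock Apache Druid's native batch ingestion API gives a feel for what ingesting data into the engine involves. The endpoint host, datasource, and field names below are placeholders, not Metatron's actual interface:

```python
import requests

# A minimal native batch ingestion spec for stock Apache Druid.
ingestion_spec = {
    "type": "index_parallel",
    "spec": {
        "dataSchema": {
            "dataSource": "network_usage",
            "timestampSpec": {"column": "event_time", "format": "iso"},
            "dimensionsSpec": {"dimensions": ["subscriber_id", "cell_id"]},
            "granularitySpec": {"segmentGranularity": "day",
                                "queryGranularity": "hour"},
        },
        "ioConfig": {
            "type": "index_parallel",
            "inputSource": {"type": "local", "baseDir": "/data",
                            "filter": "usage-*.json"},
            "inputFormat": {"type": "json"},
        },
    },
}

# Submit the task to the Overlord; host and port are placeholders.
resp = requests.post(
    "http://druid-overlord:8090/druid/indexer/v1/task",
    json=ingestion_spec,
)
print(resp.json())  # returns the task id on success
```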

Data analysis and visualization

Workbook

Workbook is a data visualization module powered by the Metatron Druid engine. Each workbook is a standalone report consisting of multiple dashboards, and each dashboard contains various charts that visualize analyses of the source data.
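
A chart on a dashboard ultimately resolves to a query against the engine. As a rough illustration, here is the kind of native timeseries query a line chart might issue against a stock Apache Druid broker; Metatron's modified engine and internal query path may differ, and the broker address and datasource name are placeholders:

```python
import requests

# Native timeseries query: hourly sum of data usage over one day.
query = {
    "queryType": "timeseries",
    "dataSource": "network_usage",
    "granularity": "hour",
    "intervals": ["2019-02-01/2019-02-02"],
    "aggregations": [
        {"type": "doubleSum", "name": "total_mb", "fieldName": "mb_used"}
    ],
}

resp = requests.post("http://druid-broker:8082/druid/v2", json=query)
for row in resp.json():
    print(row["timestamp"], row["result"]["total_mb"])
```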

Workbench

Metatron Workbench provides an environment for data preparation and analytics based on SQL.
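
For a sense of the workflow, the following sketch runs the kind of SQL a Workbench session might issue, directly against a Hive staging database using PyHive; the host, table, and column names are illustrative assumptions, not Metatron's actual schema:

```python
from pyhive import hive

# Connect to a Hive server (host and database are placeholders).
conn = hive.Connection(host="hive-server", port=10000, database="default")
cursor = conn.cursor()

# Top ten subscribers by total data usage.
cursor.execute(
    """
    SELECT subscriber_id, SUM(mb_used) AS total_mb
    FROM network_usage
    GROUP BY subscriber_id
    ORDER BY total_mb DESC
    LIMIT 10
    """
)
for subscriber_id, total_mb in cursor.fetchall():
    print(subscriber_id, total_mb)
```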

Notebook

Metatron Discovery supports a notebook function. Notebook is a tool for creating and sharing documents that include live codes, equations, visualizations, and descriptive texts. It is mostly used for data cleaning and manipulation, numerical simulations, statistical modeling, and machine learning. External Jupyter and Zeppelin servers can be used in Metatron Discovery.
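
A typical cell in a connected Jupyter or Zeppelin notebook might look like the following Python sketch, which loads illustrative data and plots it; the values and column names are invented for the example:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Illustrative data standing in for a query result pulled into the notebook.
df = pd.DataFrame(
    {"hour": range(24), "mb_used": [120 + 30 * (h % 12) for h in range(24)]}
)

# Summarize and visualize, the common notebook workflow.
df.plot(x="hour", y="mb_used", kind="line", title="Hourly data usage (MB)")
plt.savefig("hourly_usage.png")
```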

Data Monitoring

Data Monitoring tracks the logs of all queries that users submit through Metatron Workbench to the staging database (an internal Hive database) and to external databases connected to Metatron.
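
The kind of filtering such monitoring enables can be sketched over a hypothetical query-history table; the schema below is an assumption for illustration, not Metatron's actual log format:

```python
import sqlite3

# Build a tiny in-memory stand-in for a query-log table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE query_history "
    "(user TEXT, query TEXT, elapsed_ms INTEGER, status TEXT)"
)
conn.executemany(
    "INSERT INTO query_history VALUES (?, ?, ?, ?)",
    [
        ("alice", "SELECT * FROM usage", 120, "SUCCESS"),
        ("bob", "SELECT * FROM usage JOIN cells", 45000, "SUCCESS"),
        ("carol", "SELECT bad syntax", 5, "FAIL"),
    ],
)

# Surface long-running or failed queries, a typical monitoring concern.
rows = conn.execute(
    "SELECT user, status, elapsed_ms FROM query_history "
    "WHERE elapsed_ms > 30000 OR status = 'FAIL'"
).fetchall()
for row in rows:
    print(row)
```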

See also

Business intelligence software
Dundas Data Visualization
Comparison of OLAP servers
Pentaho
HPCC
Cloud analytics
Apache Druid
Google Cloud Platform
High-performance Integrated Virtual Environment (HIVE)
Databricks
Logi Analytics
Presto (SQL query engine)
Apache Kylin
Seeq Corporation
Azure Data Explorer
Apache Pinot
Apache Iceberg

References

  1. "SKT, 오픈플랫폼 전략 박차…이번엔 스마트팩토리" [SKT pushes its open-platform strategy, this time with a smart factory]. www.ddaily.co.kr.
  2. "Druid | Community and Third Party Software". druid.apache.org.
  3. "빅 데이터 프레임워크, 솔루션들의 목적과 역할" [The purposes and roles of big data frameworks and solutions]. Brunch. June 19, 2018.
  4. "SKT partners with MS to enter big data market". The Korea Times. February 20, 2019.
  5. ZDNet Korea. http://www.zdnet.co.kr/view/?no=20181220141344
  6. "SK Telecom and Microsoft Sign MOU for Comprehensive Cooperation in Cutting-Edge ICT". Network Manias.