Metatron Discovery

Developer(s): SK Telecom
Type: Business intelligence
Website: metatron.app

Metatron Discovery is a big data analytics platform developed by the South Korean telecommunications provider SK Telecom. It is a partially open-source [1] software system based on the Apache Druid engine. [2] [3]

Overview

Metatron Discovery is a big data analytics platform with capabilities for big data collection, storage, and visualization. SK Telecom, originally a mobile telecommunications carrier, developed Metatron Discovery to meet an internal need: to effectively process and analyze the more than 500 TB of mobile network service data generated daily. SK Telecom subsequently commercialized the platform; about 10 enterprises in South Korea, including SK hynix and the Industrial Bank of Korea, have adopted Metatron Discovery in their systems. [4] [5] In February 2019, SK Telecom and Microsoft agreed to a strategic business partnership to launch Metatron Discovery on Microsoft Azure. [6]

Key Components

Metatron Discovery performs analytics on its ingested data sources or on other external data sources using a range of analytical tools, and presents the results as charts and reports.

Data Preparation

Data Preparation is a tool for defining transformation rules that reshape files and database tables into forms better suited to analysis, and for saving the results to HDFS or Hive.
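
The rule-based flow can be pictured with a small sketch. The rule names, columns, and file paths below are illustrative assumptions, not Metatron's actual rule grammar; the sketch only mirrors the idea of declarative transformation rules applied to a tabular dataset before the result is saved in an HDFS/Hive-friendly format:

```python
import pandas as pd

# Hypothetical rule list; the shapes and names are assumptions for
# illustration, not Metatron's rule syntax.
rules = [
    {"op": "rename", "column": "msisdn", "to": "subscriber_id"},
    {"op": "drop",   "column": "debug_flag"},
    {"op": "derive", "column": "mb_used", "expr": "bytes_used / 1e6"},
]

def apply_rules(df: pd.DataFrame, rules: list) -> pd.DataFrame:
    """Apply each declarative rule to the dataframe in order."""
    for rule in rules:
        if rule["op"] == "rename":
            df = df.rename(columns={rule["column"]: rule["to"]})
        elif rule["op"] == "drop":
            df = df.drop(columns=[rule["column"]])
        elif rule["op"] == "derive":
            df[rule["column"]] = df.eval(rule["expr"])
    return df

df = pd.DataFrame({"msisdn": ["010-1234"], "debug_flag": [0],
                   "bytes_used": [5_000_000]})
cleaned = apply_rules(df, rules)

# Parquet is one HDFS/Hive-friendly output format (requires pyarrow).
cleaned.to_parquet("network_usage_cleaned.parquet")
```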

Data Storage

Data Storage manages data ingested into the Metatron engine for analysis and visualization.
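
Metatron ships a modified Druid, so its exact ingestion interface differs, but stock Apache Druid's native batch ingestion API gives a feel for what ingesting data into the engine involves. The endpoint host, datasource, and field names below are placeholders, not Metatron's actual interface:

```python
import requests

# A minimal native batch ingestion spec for stock Apache Druid.
ingestion_spec = {
    "type": "index_parallel",
    "spec": {
        "dataSchema": {
            "dataSource": "network_usage",
            "timestampSpec": {"column": "event_time", "format": "iso"},
            "dimensionsSpec": {"dimensions": ["subscriber_id", "cell_id"]},
            "granularitySpec": {"segmentGranularity": "day",
                                "queryGranularity": "hour"},
        },
        "ioConfig": {
            "type": "index_parallel",
            "inputSource": {"type": "local", "baseDir": "/data",
                            "filter": "usage-*.json"},
            "inputFormat": {"type": "json"},
        },
    },
}

# Submit the task to the Overlord; host and port are placeholders.
resp = requests.post(
    "http://druid-overlord:8090/druid/indexer/v1/task",
    json=ingestion_spec,
)
print(resp.json())  # returns the task id on success
```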

Data analysis and visualization

Workbook

Workbook is a data visualization module powered by the Metatron Druid engine. Each workbook is a standalone report consisting of multiple dashboards, and each dashboard contains various charts that visualize analyses of the source data.
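
A chart on a dashboard ultimately resolves to a query against the engine. As a rough illustration, here is the kind of native timeseries query a line chart might issue against a stock Apache Druid broker; Metatron's modified engine and internal query path may differ, and the broker address and datasource name are placeholders:

```python
import requests

# Native timeseries query: hourly sum of data usage over one day.
query = {
    "queryType": "timeseries",
    "dataSource": "network_usage",
    "granularity": "hour",
    "intervals": ["2019-02-01/2019-02-02"],
    "aggregations": [
        {"type": "doubleSum", "name": "total_mb", "fieldName": "mb_used"}
    ],
}

resp = requests.post("http://druid-broker:8082/druid/v2", json=query)
for row in resp.json():
    print(row["timestamp"], row["result"]["total_mb"])
```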

Workbench

Metatron Workbench provides an environment for data preparation and analytics based on SQL.
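
For a sense of the workflow, the following sketch runs the kind of SQL a Workbench session might issue, directly against a Hive staging database using PyHive; the host, table, and column names are illustrative assumptions, not Metatron's actual schema:

```python
from pyhive import hive

# Connect to a Hive server (host and database are placeholders).
conn = hive.Connection(host="hive-server", port=10000, database="default")
cursor = conn.cursor()

# Top ten subscribers by total data usage.
cursor.execute(
    """
    SELECT subscriber_id, SUM(mb_used) AS total_mb
    FROM network_usage
    GROUP BY subscriber_id
    ORDER BY total_mb DESC
    LIMIT 10
    """
)
for subscriber_id, total_mb in cursor.fetchall():
    print(subscriber_id, total_mb)
```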

Notebook

Metatron Discovery supports a notebook function. Notebook is a tool for creating and sharing documents that include live codes, equations, visualizations, and descriptive texts. It is mostly used for data cleaning and manipulation, numerical simulations, statistical modeling, and machine learning. External Jupyter and Zeppelin servers can be used in Metatron Discovery.
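
A typical cell in a connected Jupyter or Zeppelin notebook might look like the following Python sketch, which loads illustrative data and plots it; the values and column names are invented for the example:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Illustrative data standing in for a query result pulled into the notebook.
df = pd.DataFrame(
    {"hour": range(24), "mb_used": [120 + 30 * (h % 12) for h in range(24)]}
)

# Summarize and visualize, the common notebook workflow.
df.plot(x="hour", y="mb_used", kind="line", title="Hourly data usage (MB)")
plt.savefig("hourly_usage.png")
```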

Data Monitoring

Data Monitoring tracks the logs of all queries that users submit through Metatron Workbench to the staging database (an internal Hive database) and to external databases connected to Metatron.
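
The kind of filtering such monitoring enables can be sketched over a hypothetical query-history table; the schema below is an assumption for illustration, not Metatron's actual log format:

```python
import sqlite3

# Build a tiny in-memory stand-in for a query-log table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE query_history "
    "(user TEXT, query TEXT, elapsed_ms INTEGER, status TEXT)"
)
conn.executemany(
    "INSERT INTO query_history VALUES (?, ?, ?, ?)",
    [
        ("alice", "SELECT * FROM usage", 120, "SUCCESS"),
        ("bob", "SELECT * FROM usage JOIN cells", 45000, "SUCCESS"),
        ("carol", "SELECT bad syntax", 5, "FAIL"),
    ],
)

# Surface long-running or failed queries, a typical monitoring concern.
rows = conn.execute(
    "SELECT user, status, elapsed_ms FROM query_history "
    "WHERE elapsed_ms > 30000 OR status = 'FAIL'"
).fetchall()
for row in rows:
    print(row)
```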

See also

Business intelligence software
Dundas Data Visualization
Comparison of OLAP servers
Pentaho
HPCC
Cloud analytics
Apache Druid
Google Cloud Platform
High-performance Integrated Virtual Environment (HIVE)
Databricks
Logi Analytics
Presto (SQL query engine)
Apache Kylin
Seeq Corporation
Azure Data Explorer
Apache Pinot
Apache Iceberg

References

  1. "SKT, 오픈플랫폼 전략 박차…이번엔 스마트팩토리" [SKT pushes its open-platform strategy, this time with a smart factory]. www.ddaily.co.kr.
  2. "Druid | Community and Third Party Software". druid.apache.org.
  3. "빅 데이터 프레임워크, 솔루션들의 목적과 역할" [The purposes and roles of big data frameworks and solutions]. Brunch. June 19, 2018.
  4. "SKT partners with MS to enter big data market". The Korea Times. February 20, 2019.
  5. ZDNet Korea. http://www.zdnet.co.kr/view/?no=20181220141344
  6. "SK Telecom and Microsoft Sign MOU for Comprehensive Cooperation in Cutting-Edge ICT". Network Manias.