Joe Caserta

[Photo: Joe Caserta at MIT CDOIQ Data Symposium, July 18, 2018]

Born: New York, NY, USA
Nationality: American
Education: Columbia University
Employer: Caserta
Title: Founder and President

Joe Caserta is an American information specialist [1] and author. [2] He is best known as the founder and president of Caserta, a data and analytics consulting, architecture, and implementation firm [3] established in 2001. [4]

Joe Caserta was born and raised in New York. [5] He studied database application development and design at Columbia University. [5] He is a data science expert, [6] keynote speaker, [7] and panelist. [8]

Working with Ralph Kimball, he co-authored The Data Warehouse ETL Toolkit (Wiley, 2004), [9] which is used as a textbook in courses on ETL processes in data warehousing. [10]

Caserta is the founder and host of the Big Data Warehousing Meetup group in New York, which has more than 5,000 members. [11]

Related Research Articles

Data warehouse: Centralized storage of knowledge

In computing, a data warehouse (DW), also known as an enterprise data warehouse (EDW), is a system used for reporting and data analysis and is considered a core component of business intelligence. Data warehouses are central repositories of integrated data from one or more disparate sources. They store current and historical data in a single place and are used to create analytical reports for workers throughout the enterprise.

IBM Db2 Family: Relational model database server

Db2 is a family of data management products, including database servers, developed by IBM. They initially supported the relational model, but were extended to support object–relational features and non-relational structures like JSON and XML. The brand name was originally styled as DB/2, then DB2 until 2017, and finally changed to its present form, Db2.

Business intelligence (BI) comprises the strategies and technologies used by enterprises for data analysis and the management of business information. Common functions of business intelligence technologies include reporting, online analytical processing, analytics, dashboard development, data mining, process mining, complex event processing, business performance management, benchmarking, text mining, predictive analytics, and prescriptive analytics.

Extract, transform, load: Procedure in computing

In computing, extract, transform, load (ETL) is the general procedure of copying data from one or more sources into a destination system that represents the data differently from, or in a different context than, the source(s). ETL became a popular concept in the 1970s and is often used in data warehousing.
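
As a rough illustration, a minimal ETL job might look like the following Python sketch (the file, table, and column names here are hypothetical, not drawn from any source cited in this article):

    import csv
    import sqlite3

    # Extract: read raw rows from a source file (illustrative name).
    with open("orders_source.csv", newline="") as f:
        raw_rows = list(csv.DictReader(f))

    # Transform: reshape the data for the destination system,
    # e.g. normalize customer names and convert amounts to integer cents.
    clean_rows = [
        (row["order_id"], row["customer"].strip().title(), int(float(row["amount"]) * 100))
        for row in raw_rows
    ]

    # Load: write the transformed rows into the destination database.
    conn = sqlite3.connect("warehouse.db")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (order_id TEXT, customer TEXT, amount_cents INTEGER)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", clean_rows)
    conn.commit()
    conn.close()

Production pipelines add the cleaning, conforming, and delivery steps that Kimball and Caserta's book covers in depth. [9]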

William H. Inmon is an American computer scientist, recognized by many as the father of the data warehouse. Inmon wrote the first book, held the first conference, wrote the first magazine column, and was the first to offer classes on data warehousing. Inmon created the accepted definition of a data warehouse: a subject-oriented, nonvolatile, integrated, time-variant collection of data in support of management's decisions. Compared with the approach of the other pioneering architect of data warehousing, Ralph Kimball, Inmon's approach is often characterized as top-down.

Dimension (data warehouse): Structure that categorizes facts and measures in a data warehouse

A dimension is a structure that categorizes facts and measures in order to enable users to answer business questions. Commonly used dimensions are people, products, place and time.
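
For example, in a star schema a fact table of measures joins to dimension tables of descriptive attributes. A minimal sketch in Python with SQLite follows; the table names and figures are invented for illustration:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    -- A product dimension: descriptive attributes used to slice the facts.
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
    -- A fact table: measures keyed to the dimension.
    CREATE TABLE fact_sales (product_key INTEGER, quantity INTEGER, revenue REAL);

    INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware'), (2, 'Gadget', 'Hardware');
    INSERT INTO fact_sales VALUES (1, 10, 250.0), (2, 4, 96.0), (1, 3, 75.0);
    """)

    # Business question: revenue per product category.
    for row in conn.execute("""
        SELECT p.category, SUM(f.revenue)
        FROM fact_sales f JOIN dim_product p USING (product_key)
        GROUP BY p.category
    """):
        print(row)  # ('Hardware', 421.0)

Grouping the facts by a dimension attribute (here, product category) is what lets users answer business questions such as "revenue by category."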

Ralph Kimball is an author on the subject of data warehousing and business intelligence. He is one of the original architects of data warehousing and is known for long-term convictions that data warehouses must be designed to be understandable and fast. His methodology, also known as dimensional modeling or the Kimball methodology, has become the de facto standard in the area of decision support.

Business intelligence software is a type of application software designed to retrieve, analyze, transform, and report data for business intelligence. The applications generally read data that has been previously stored, often, though not necessarily, in a data warehouse or data mart.

Precisely, rebranded from Syncsort Incorporated in May 2020, is a software company specializing in big data, high-speed sorting products, data integration, data quality, data enrichment, and location intelligence offerings for IBM i, Hadoop, Microsoft Windows, UNIX, Linux, and mainframe computer systems.

IBM Netezza is a subsidiary of American technology company IBM that designs and markets high-performance data warehouse appliances and advanced analytics applications for uses including enterprise data warehousing, business intelligence, predictive analytics and business continuity planning.

Splunk: American technology company

Splunk Inc. is an American software company based in San Francisco, California, that produces software for searching, monitoring, and analyzing machine-generated data via a Web-style interface.

Vertica: Software company

Vertica Systems is an analytic database management software company. Vertica was founded in 2005 by the database researcher Michael Stonebraker, with Andrew Palmer as the founding CEO. Ralph Breslauer and Christopher P. Lynch served as later CEOs.

ParAccel, Inc. was a California-based software company.

KNIME, the Konstanz Information Miner, is a free and open-source data analytics, reporting and integration platform. KNIME integrates various components for machine learning and data mining through its modular data pipelining "Building Blocks of Analytics" concept. A graphical user interface and use of JDBC allow assembly of nodes blending different data sources, including preprocessing, for modeling, data analysis, and visualization without, or with only minimal, programming.

Hybrid transaction/analytical processing (HTAP) is a term coined by Gartner, Inc., an information technology research and advisory company, in its early 2014 research report Hybrid Transaction/Analytical Processing Will Foster Opportunities for Dramatic Business Innovation. As defined by Gartner:

Hybrid transaction/analytical processing (HTAP) is an emerging application architecture that "breaks the wall" between transaction processing and analytics. It enables more informed and "in business real time" decision making.

Arcplan: Business intelligence software company

Arcplan is software for business intelligence (BI), budgeting, planning and forecasting (BP&F), business analytics, and collaborative business intelligence. It is an enhancement of the enterprise software products inSight and dynaSight from the former German provider arcplan Information Services GmbH.

The functional database model is used to support analytics applications such as financial planning and performance management. The functional database model, or the functional model for short, is different from but complementary to the relational model. The functional model is also distinct from other similarly named concepts, including the DAPLEX functional database model and functional language databases.

Dark data is data which is acquired through various computer network operations but not used in any manner to derive insights or for decision making. The ability of an organisation to collect data can exceed the throughput at which it can analyse the data. In some cases the organisation may not even be aware that the data is being collected. IBM estimates that roughly 90 percent of data generated by sensors and analog-to-digital conversions is never used.

Apache Kylin

Apache Kylin is an open-source distributed analytics engine designed to provide a SQL interface and multi-dimensional analysis (OLAP) on Hadoop and Alluxio, supporting extremely large datasets.

DataOps is a set of practices, processes and technologies that combines an integrated and process-oriented perspective on data with automation and methods from agile software engineering to improve quality, speed, and collaboration and promote a culture of continuous improvement in the area of data analytics. While DataOps began as a set of best practices, it has now matured to become a new and independent approach to data analytics. DataOps applies to the entire data lifecycle from data preparation to reporting, and recognizes the interconnected nature of the data analytics team and information technology operations.
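
As one illustration, a DataOps pipeline typically runs automated data-quality checks on every change, much like unit tests in continuous integration. A minimal sketch in Python, reusing the hypothetical orders table from the ETL example above:

    import sqlite3

    # A few illustrative checks over the hypothetical "orders" table; in a
    # DataOps pipeline these would run automatically on every change, like
    # unit tests in CI, blocking the release of data that fails.
    def run_data_checks(conn: sqlite3.Connection) -> list[str]:
        failures = []
        # Completeness: key columns must not be null.
        if conn.execute("SELECT COUNT(*) FROM orders WHERE order_id IS NULL").fetchone()[0]:
            failures.append("orders.order_id contains nulls")
        # Validity: monetary amounts must be non-negative.
        if conn.execute("SELECT COUNT(*) FROM orders WHERE amount_cents < 0").fetchone()[0]:
            failures.append("orders.amount_cents has negative values")
        # Volume: the table must not be empty.
        if conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0] == 0:
            failures.append("orders is empty")
        return failures

    conn = sqlite3.connect("warehouse.db")
    problems = run_data_checks(conn)
    if problems:
        raise SystemExit("Data checks failed: " + "; ".join(problems))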

References

  1. Caserta, Joe (Caserta Concepts) (2013-04-30). "Data Scientists to Wipe out Business Analysts". Wired. ISSN 1059-1028. Retrieved 2018-12-04.
  2. "2018 – Oregon State University". hedw.org. Retrieved 2018-12-04.
  3. Heizenberg, Jorgen (June 18, 2018). "Market Guide for Data and Analytics Service Providers". www.gartner.com. Retrieved 2018-12-04.
  4. "Caserta Concepts: Number 740 on the 2015 Inc. 5000". Inc.com. 2015. Retrieved 2018-12-04.
  5. "Joe Caserta: A Bold and Innovative Entrepreneur". Insights Success. 2018-03-21. Retrieved 2018-12-04.
  6. "Data science expert interview: Joe Caserta". IBM Big Data & Analytics Hub. Retrieved 2018-12-04.
  7. "Joe Caserta, Caserta | MIT CDOIQ 2018". SiliconANGLE theCUBE. Retrieved 2018-12-04.
  8. "Mobility, The Cloud and Big Data Session - MarketplaceLIVE 2013". Digital Realty. 2013-07-09. Retrieved 2018-12-04.
  9. "The Data Warehouse ETL Toolkit: Practical Techniques for Extracting, Cleaning, Conforming, and Delivering Data". Wiley.com. Retrieved 2018-12-04.
  10. "The Data Warehouse ETL Toolkit". www.protechtraining.com. Retrieved 2018-12-04.
  11. "Big Data Warehousing".