Joe Caserta

Joe Caserta at MIT CDOIQ Data Symposium, July 18, 2018
Born: New York, NY, USA
Nationality: American
Education: Columbia University
Employer: Caserta
Title: Founder and President

Joe Caserta is an American information specialist [1] and author. [2] He is best known as the founder and president of Caserta, a data and analytics consulting, architecture, and implementation firm [3] founded in 2001. [4] The management consulting firm McKinsey & Company acquired Caserta on June 1, 2022. [5]

Joe Caserta was born and raised in New York. [6] He studied database application development and design at Columbia University. [6] He is a data science expert, [7] keynote speaker, [8] and panelist. [9]

Working with Ralph Kimball, he co-authored The Data Warehouse ETL Toolkit (Wiley, 2004), [10] which is used as a textbook in courses on ETL processes in data warehousing. [11]

Caserta is the founder and host of the Big Data Warehousing Meetup group in New York, which has more than 5,000 members. [12]

Related Research Articles

Data warehouse: Centralized storage of knowledge

In computing, a data warehouse, also known as an enterprise data warehouse (EDW), is a system used for reporting and data analysis and is considered a core component of business intelligence. Data warehouses are central repositories of integrated data from one or more disparate sources. They store current and historical data in a single place, where it is used for creating reports. This benefits companies because it lets them interrogate their data, draw insights from it, and make decisions.

IBM Db2: Relational model database server

Db2 is a family of data management products, including database servers, developed by IBM. It initially supported the relational model, but was extended to support object–relational features and non-relational structures like JSON and XML. The brand name was originally styled as DB2 until 2017, when it changed to its present form.

Business intelligence (BI) consists of strategies and technologies used by enterprises for the data analysis and management of business information. Common functions of BI technologies include reporting, online analytical processing, analytics, dashboard development, data mining, process mining, complex event processing, business performance management, benchmarking, text mining, predictive analytics, and prescriptive analytics.

Teradata Corporation is an American software company that provides cloud database and analytics-related software, products, and services. The company was formed in 1979 in Brentwood, California, as a collaboration between researchers at Caltech and Citibank's advanced technology group.

Extract, transform, load: Procedure in computing

In computing, extract, transform, load (ETL) is a three-phase process where data is extracted from an input source, transformed, and loaded into an output data container. The data can be collated from one or more sources and it can also be output to one or more destinations. ETL processing is typically executed using software applications but it can also be done manually by system operators. ETL software typically automates the entire process and can be run manually or on recurring schedules either as single jobs or aggregated into a batch of jobs.

Data engineering refers to the building of systems to enable the collection and usage of data. This data is usually used to enable subsequent analysis and data science, which often involves machine learning. Making the data usable usually involves substantial compute and storage, as well as data processing.

McKinsey & Company: US-based worldwide management consulting firm

McKinsey & Company is an American multinational strategy and management consulting firm that offers professional services to corporations, governments, and other organizations. Founded in 1926 by James O. McKinsey, McKinsey is the oldest and largest of the "MBB" management consultancies. The firm mainly focuses on the finances and operations of its clients.

William H. Inmon is an American computer scientist, recognized by many as the father of the data warehouse. Inmon wrote the first book, held the first conference, wrote the first magazine column, and was the first to offer classes on data warehousing. Inmon created the accepted definition of a data warehouse: a subject-oriented, nonvolatile, integrated, time-variant collection of data in support of management's decisions. Compared with the approach of the other pioneering architect of data warehousing, Ralph Kimball, Inmon's approach is often characterized as top-down.

Ralph Kimball is an author on the subject of data warehousing and business intelligence. He is one of the original architects of data warehousing and is known for his long-held conviction that data warehouses must be designed to be understandable and fast. His bottom-up methodology, also known as dimensional modeling or the Kimball methodology, is one of the two main data warehousing methodologies, alongside that of Bill Inmon.

Business intelligence software is a type of application software designed to retrieve, analyze, transform and report data for business intelligence. The applications generally read data that has been previously stored, often, though not necessarily, in a data warehouse or data mart.

Precisely Holdings, LLC, doing business as Precisely, is a software company specializing in data integrity tools, and also providing big data, high-speed sorting, ETL, data integration, data quality, data enrichment, and location intelligence offerings. The company was originally founded as Whitlow Computer Systems before rebranding as Syncsort Incorporated in 1981, and then to its current form in 2020. Its original, eponymously named product, SyncSort, was the dominant sort program for IBM mainframe computers during much of the 1970s and 1980s.

Vertica: Software company

Vertica is an analytic database management software company. Vertica was founded in 2005 by the database researcher Michael Stonebraker with Andrew Palmer as the founding CEO. Ralph Breslauer and Christopher P. Lynch served as CEOs later on.

RTTS is a professional services organization that provides software quality outsourcing, training, and resources for business applications. With offices in New York City, Philadelphia, Atlanta, and Phoenix, RTTS serves mid-sized to large corporations throughout North America. RTTS uses the software quality and test solutions from IBM, Hewlett Packard Enterprise, Microsoft and other vendors and open source tools to perform software performance testing, functional test automation, big data testing, data warehouse/ETL testing, mobile application testing, security testing and service virtualization.

Cloudant is an IBM software product, primarily delivered as a cloud-based service. It is a non-relational, distributed database service based on the Apache-backed CouchDB project and the open source BigCouch project.

IBM Cognos Analytics: Business intelligence suite

IBM Cognos Analytics with Watson is a web-based integrated business intelligence suite by IBM. It provides a toolset for reporting, analytics, scorecarding, and monitoring of events and metrics. The software consists of several components designed to meet the different information requirements in a company. These components include IBM Cognos Framework Manager, IBM Cognos Cube Designer, and IBM Cognos Transformer.

AlixPartners is a financial advisory and global consulting firm best known for its work in the turnaround space. Jay Alix founded what became AlixPartners LLP in 1981. The firm has advised on some of the largest Chapter 11 reorganizations, including General Motors Co., Kmart, and Enron Corp. The firm has since moved into a more traditional consulting space and grown to a staff of over 1,000. AlixPartners is headquartered in New York and has offices in more than 20 cities around the world. The firm was also involved in the Bernie Madoff scandal, identifying 13,000 investors affected by the scandal for the prosecuting team.

Extract, load, transform (ELT) is an alternative to extract, transform, load (ETL) used with data lake implementations. In contrast to ETL, in ELT models the data is not transformed on entry to the data lake, but stored in its original raw format. This enables faster loading times. However, ELT requires sufficient processing power within the data processing engine to carry out the transformation on demand, to return the results in a timely manner. Since the data is not processed on entry to the data lake, the query and schema do not need to be defined a priori. ELT is a data pipeline model.

Data lake: System or repository of data stored in its natural/raw format

A data lake is a system or repository of data stored in its natural/raw format, usually object blobs or files. A data lake is usually a single store of data including raw copies of source system data, sensor data, social data etc., and transformed data used for tasks such as reporting, visualization, advanced analytics, and machine learning. A data lake can include structured data from relational databases, semi-structured data, unstructured data, and binary data. A data lake can be established "on premises" or "in the cloud".

The functional database model is used to support analytics applications such as financial planning and performance management. The functional database model, or the functional model for short, is different from but complementary to the relational model. The functional model is also distinct from other similarly named concepts, including the DAPLEX functional database model and functional language databases.

DataOps is a set of practices, processes and technologies that combines an integrated and process-oriented perspective on data with automation and methods from agile software engineering to improve quality, speed, and collaboration and promote a culture of continuous improvement in the area of data analytics. While DataOps began as a set of best practices, it has now matured to become a new and independent approach to data analytics. DataOps applies to the entire data lifecycle from data preparation to reporting, and recognizes the interconnected nature of the data analytics team and information technology operations.

References

  1. Joe Caserta, Caserta Concepts (2013-04-30). "Data Scientists to Wipe out Business Analysts". Wired. ISSN 1059-1028. Retrieved 2018-12-04.
  2. "2018 – Oregon State University". hedw.org. Retrieved 2018-12-04.
  3. Heizenberg, Jorgen (June 18, 2018). "Market Guide for Data and Analytics Service Providers". www.gartner.com. Retrieved 2018-12-04.
  4. "Caserta Concepts: Number 740 on the 2015 Inc. 5000". Inc.com. 2015. Retrieved 2018-12-04.
  5. Hampton, Jaime (2022-06-01). "McKinsey Acquires Data Engineering Pioneer Caserta". Datanami. Retrieved 2022-07-09.
  6. Insights Success (2018-03-21). "Joe Caserta: A Bold and Innovative Entrepreneur". Insights Success. Retrieved 2018-12-04.
  7. "Data science expert interview: Joe Caserta". IBM Big Data & Analytics Hub. Retrieved 2018-12-04.
  8. SiliconANGLE theCUBE, Joe Caserta, Caserta | MIT CDOIQ 2018, retrieved 2018-12-04
  9. Digital Realty (2013-07-09), Mobility, The Cloud and Big Data Session - MarketplaceLIVE 2013, retrieved 2018-12-04
  10. "The Data Warehouse ETL Toolkit: Practical Techniques for Extracting, Cleaning, Conforming, and Delivering Data". Wiley.com. Retrieved 2018-12-04.
  11. "The Data Warehouse ETL Toolkit". www.protechtraining.com. Retrieved 2018-12-04.
  12. "Big Data Warehousing".