TREX search engine

TREX is a search engine, built on columnar storage, that forms part of the SAP NetWeaver integrated technology platform produced by SAP SE. [1] The TREX engine is a standalone component that can be used in a range of system environments, but it is used primarily as an integral part of SAP products such as Enterprise Portal, Knowledge Warehouse, and Business Intelligence (BI, formerly SAP Business Information Warehouse). In SAP NetWeaver BI, the TREX engine powers the BI Accelerator, a plug-in appliance that improves the performance of online analytical processing. The name "TREX" stands for Text Retrieval and information EXtraction, but it is not a registered trademark of SAP and is not used in marketing collateral.

Search functions

TREX supports various kinds of text search, including exact search, boolean search, wildcard search, linguistic search (grammatical variants are normalized for the index search) and fuzzy search (input strings that differ by a few letters from an index term are normalized for the index search). Result sets are ranked using term frequency-inverse document frequency (tf-idf) weighting, and results can include snippets with the search terms highlighted.
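The tf-idf ranking mentioned above can be illustrated with a minimal Python sketch. It is a generic textbook formulation, not TREX's actual implementation: the function names `tokenize` and `tf_idf_rank`, the whitespace tokenizer, and the unsmoothed logarithmic idf are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def tf_idf_rank(query, documents):
    """Return (doc_index, score) pairs sorted by descending tf-idf score."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: number of documents that contain each term.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for index, tokens in enumerate(tokenized):
        counts = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if counts[term] == 0:
                continue  # term absent from this document
            tf = counts[term] / len(tokens)            # term frequency
            idf = math.log(n_docs / doc_freq[term])    # inverse document frequency
            score += tf * idf
        scores.append((index, score))

    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    docs = ["sap search engine", "search search index", "portal content"]
    for doc_index, score in tf_idf_rank("search", docs):
        print(doc_index, round(score, 3))
```

A term that occurs in every document gets an idf of zero under this formulation, so it contributes nothing to the ranking; production engines typically use smoothed variants.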

TREX supports text mining and classification using a vector space model. Groups of documents can be classified using query based classification, example based classification, or a combination of these plus keyword management.
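As a rough illustration of example-based classification in a vector space model, the following Python sketch represents each document as a term-count vector and assigns a new document to the class whose summed example vector is closest by cosine similarity. This is a generic technique, not a description of TREX internals; `to_vector`, `cosine`, and `classify` are hypothetical names, and the per-class centroid approach is an assumption.

```python
import math
from collections import Counter


def to_vector(text):
    # Term-count vector for a document (hypothetical whitespace tokenizer).
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(text, examples):
    """Return the best-matching label, or None if nothing overlaps.

    examples maps a class label to a list of example documents.
    """
    vector = to_vector(text)
    best_label, best_score = None, 0.0
    for label, docs in examples.items():
        # Sum the example vectors to form an (unnormalized) class centroid.
        centroid = Counter()
        for doc in docs:
            centroid.update(to_vector(doc))
        score = cosine(vector, centroid)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    examples = {
        "finance": ["invoice payment ledger", "payment tax invoice"],
        "hr": ["employee vacation payroll", "employee hiring"],
    }
    print(classify("overdue invoice payment", examples))
```

Query-based classification could be sketched the same way by replacing the example documents with a stored query string per class.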

TREX supports structured data search not only for document metadata but also for mass business data and data in SAP BusinessObjects. Indexes for structured data are implemented compactly using data compression and the data can be aggregated in linear time, to enable large volumes of data to be processed entirely in memory.
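To make the idea of compact columnar indexes and linear-time aggregation concrete, the sketch below uses dictionary encoding, a common column-store compression scheme. Whether TREX uses exactly this scheme is not stated in the text, so treat it as an assumption; the function names `dictionary_encode` and `sum_by_group` and the sample data are hypothetical.

```python
def dictionary_encode(values):
    """Return (dictionary, codes): sorted distinct values and one integer code per row."""
    dictionary = sorted(set(values))
    position = {value: code for code, value in enumerate(dictionary)}
    codes = [position[value] for value in values]
    return dictionary, codes


def sum_by_group(group_codes, measures, dictionary):
    """Sum a measure column per group in a single pass over the rows."""
    totals = [0] * len(dictionary)
    for code, amount in zip(group_codes, measures):
        totals[code] += amount  # array lookup by code, no string comparison
    return {dictionary[code]: total for code, total in enumerate(totals)}


if __name__ == "__main__":
    region = ["EMEA", "APJ", "EMEA", "AMER", "APJ"]
    revenue = [10, 20, 30, 40, 50]
    dictionary, codes = dictionary_encode(region)
    print(dictionary, codes)
    print(sum_by_group(codes, revenue, dictionary))
```

Each repeated string is stored once in the dictionary and referenced by a small integer, and the aggregation touches every row exactly once, so its cost grows linearly with the number of rows.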

Recent developments include:

History

The first code for the engine was written in 1998, and TREX became an SAP component in 2000. The SAP NetWeaver BI Accelerator was first rolled out in 2005. As of Q1 2013, the current release of TREX was SAP NW 7.1.

Security

A security vulnerability in TREX was first identified and fixed in 2015. [2] [3] The vulnerability was caused by a lack of authentication in TREXnet, an internal communication protocol. The patch addressed the problem by removing some critical functionality.

Later, ERPScan's head of threat intelligence, Mathieu Geli, continued to investigate and found that the vulnerability was still exploitable. [4] A successful attack would allow a remote attacker to gain full control of the server without authorization. [5] The vulnerability was finally patched via SAP Security Note 2419592.

References

  1. Daniel Abadi; Peter Boncz; Stavros Harizopoulos; Stratos Idreos; Samuel Madden (2012). "The Design and Implementation of Modern Column-Oriented Database Systems" (PDF). Foundations and Trends in Databases. 5 (3): 197–280. doi:10.1561/1900000024. Archived from the original (PDF) on 2021-04-12. Retrieved 2016-03-29.
  2. "Unknown". [permanent dead link]
  3. "Unknown". [permanent dead link]
  4. "Critical Vulnerability affects SAP HANA and dozen of other SAP applications". Archived from the original on 2017-07-09. Retrieved 2017-05-03.
  5. "SAP's TREX exposed HANA, NetWeaver". The Register.