User-defined function

Last updated

A user-defined function (UDF) is a function provided by the user of a program or environment, in a context where the usual assumption is that functions are built into the program or environment. UDFs are usually written for the requirement of its creator.

Contents

BASIC language

In some old implementations of the BASIC programming language, user-defined functions are defined using the "DEF FN" syntax. More modern dialects of BASIC are influenced by the structured programming paradigm, where most or all of the code is written as user-defined functions or procedures, and the concept becomes practically redundant.

COBOL language

In the COBOL programming language, a user-defined function is an entity that is defined by the user by specifying a FUNCTION-ID paragraph. A user-defined function must return a value by specifying the RETURNING phrase of the procedure division header and they are invoked using the function-identifier syntax. See the ISO/IEC 1989:2014 Programming Language COBOL standard for details.

As of May 2022, the IBM Enterprise COBOL for z/OS 6.4 (IBM COBOL) compiler contains support for user-defined functions.

Databases

In relational database management systems, a user-defined function provides a mechanism for extending the functionality of the database server by adding a function, that can be evaluated in standard query language (usually SQL) statements. The SQL standard distinguishes between scalar and table functions. A scalar function returns only a single value (or NULL), whereas a table function returns a (relational) table comprising zero or more rows, each row with one or more columns.

User-defined functions in SQL are declared using the CREATE FUNCTION statement. For example, a user-defined function that converts Celsius to Fahrenheit (a temperature scale used in USA) might be declared like this:

CREATEFUNCTIONdbo.CtoF(CelsiusFLOAT)RETURNSFLOATRETURN(Celsius*1.8)+32

Once created, a user-defined function may be used in expressions in SQL statements. For example, it can be invoked where most other intrinsic functions are allowed. This also includes SELECT statements, where the function can be used against data stored in tables in the database. Conceptually, the function is evaluated once per row in such usage. For example, assume a table named Elements, with a row for each known chemical element. The table has a column named BoilingPoint for the boiling point of that element, in Celsius. The query

SELECTName,CtoF(BoilingPoint)FROMElements

would retrieve the name and the boiling point from each row. It invokes the CtoF user-defined function as declared above in order to convert the value in the column to a value in Fahrenheit.

Each user-defined function carries certain properties or characteristics. The SQL standard defines the following properties:

User-defined functions should not be confused with stored procedures. Stored procedures allow the user to group a set of SQL commands. A procedure can accept parameters and execute its SQL statements depending on those parameters. A procedure is not an expression and, thus, cannot be used like user-defined functions.

Some database management systems allow the creation of user defined functions in languages other than SQL. Microsoft SQL Server, for example, allows the user to use .NET languages including C# for this purpose. DB2 and Oracle support user-defined functions written in C or Java programming languages.

SQL Server 2000

There are three types of UDF in Microsoft SQL Server 2000: scalar functions, inline table-valued functions, and multistatement table-valued functions.

Scalar functions return a single data value (not a table) with RETURNS clause. Scalar functions can use all scalar data types, with exception of timestamp and user-defined data types. Inline table-valued functions return the result set of a single SELECT statement. Multistatement table-valued functions return a table, which was built with many TRANSACT-SQL statements.

User-defined functions can be invoked from a query like built‑in functions such as OBJECT_ID, LEN, DATEDIFF, or can be executed through an EXECUTE statement like stored procedures.

Performance Notes: 1. On Microsoft SQL Server 2000 a table-valued function which "wraps" a View may be much faster than the View itself. The following MyFunction is an example of a "function-wrapper" which runs faster than the underlying view MyView:

CREATEFUNCTIONMyFunction()RETURNS@TblTABLE(StudentIDVARCHAR(255),SAS_StudentInstancesIDINT,LabelVARCHAR(255),ValueMONEY,CMN_PersonsIDINT)ASBEGININSERT@Tbl(StudentID,SAS_StudentInstancesID,Label,Value,CMN_PersonsID)SELECTStudentID,SAS_StudentInstancesID,Label,Value,CMN_PersonsIDFROMMyView--whereMyViewselects(withjoins)thesamecolumnsfromlargetable(s)RETURNEND

2. On Microsoft SQL Server 2005 the result of the same code execution is the opposite: view is executed faster than the "function-wrapper".

User-defined functions are subroutines made of one or more Transact-SQL statements that can be used to encapsulate code for reuse. It takes zero or more arguments and evaluates a return value. Has both control-flow and DML statements in its body similar to stored procedures. Does not allow changes to any Global Session State, like modifications to database or external resource, such as a file or network. Does not support output parameter. DEFAULT keyword must be specified to pass the default value of parameter. Errors in UDF cause UDF to abort which, in turn, aborts the statement that invoked the UDF.

CREATEFUNCTIONCubicVolume--Inputdimensionsincentimeters(@CubeLengthdecimal(4,1),@CubeWidthdecimal(4,1),@CubeHeightdecimal(4,1))RETURNSdecimal(12,3)ASBEGINRETURN(@CubeLength*@CubeWidth*@CubeHeight)END

Data type supported in Microsoft SQL Server 2000 Like a temporary table used to store results Mostly used to define temporary variable of type (table) and the return value of a UDF The scope is limited to function, stored procedure, or batch in which it is defined Assignment operation is not allowed between (Table) variables May be used in SELECT, INSERT, UPDATE, and DELETE CREATE FUNCTION to create UDF ALTER FUNCTION to change the characteristics of UDF DROP FUNCTION to remove UDF

Apache Hive

Apache Hive defines, in addition to the regular user-defined functions (UDF), also user-defined aggregate functions (UDAF) and table-generating functions (UDTF). [1] Hive enables developers to create their own custom functions with Java. [2]

Apache Doris

Apache Doris, an open-source real-time analytical database, allows external users to contribute their own UDFs written in C++ to it. [3]

Related Research Articles

Structured Query Language (SQL) is a domain-specific language used in programming and designed for managing data held in a relational database management system (RDBMS), or for stream processing in a relational data stream management system (RDSMS). It is particularly useful in handling structured data, i.e., data incorporating relations among entities and variables.

Online analytical processing, or OLAP, is an approach to answer multi-dimensional analytical (MDA) queries swiftly in computing. OLAP is part of the broader category of business intelligence, which also encompasses relational databases, report writing and data mining. Typical applications of OLAP include business reporting for sales, marketing, management reporting, business process management (BPM), budgeting and forecasting, financial reporting and similar areas, with new applications emerging, such as agriculture.

A stored procedure is a subroutine available to applications that access a relational database management system (RDBMS). Such procedures are stored in the database data dictionary.

Transact-SQL (T-SQL) is Microsoft's and Sybase's proprietary extension to the SQL used to interact with relational databases. T-SQL expands on the SQL standard to include procedural programming, local variables, various support functions for string processing, date processing, mathematics, etc. and changes to the DELETE and UPDATE statements.

In the context of SQL, data definition or data description language (DDL) is a syntax for creating and modifying database objects such as tables, indices, and users. DDL statements are similar to a computer programming language for defining data structures, especially database schemas. Common examples of DDL statements include CREATE, ALTER, and DROP.

The SQL SELECT statement returns a result set of records, from one or more tables.

An SQL INSERT statement adds one or more records to any single table in a relational database.

A database trigger is procedural code that is automatically executed in response to certain events on a particular table or view in a database. The trigger is mostly used for maintaining the integrity of the information on the database. For example, when a new record is added to the employees table, new records should also be created in the tables of the taxes, vacations and salaries. Triggers can also be used to log historical data, for example to keep track of employees' previous salaries.

The following tables compare general and technical information for a number of relational database management systems. Please see the individual products' articles for further information. Unless otherwise specified in footnotes, comparisons are based on the stable versions without any add-ons, extensions or external programs.

Multidimensional Expressions (MDX) is a query language for online analytical processing (OLAP) using a database management system. Much like SQL, it is a query language for OLAP cubes. It is also a calculation language, with syntax similar to spreadsheet formulae.

In computing, a materialized view is a database object that contains the results of a query. For example, it may be a local copy of data located remotely, or may be a subset of the rows and/or columns of a table or join result, or may be a summary using an aggregate function.

SQL CLR or SQLCLR is technology for hosting of the Microsoft .NET common language runtime engine within SQL Server. The SQLCLR allows managed code to be hosted by, and run in, the Microsoft SQL Server environment.

Language Integrated Query is a Microsoft .NET Framework component that adds native data querying capabilities to .NET languages, originally released as a major part of .NET Framework 3.5 in 2007.

Microsoft SQL Server is a proprietary relational database management system developed by Microsoft. As a database server, it is a software product with the primary function of storing and retrieving data as requested by other software applications—which may run either on the same computer or on another computer across a network. Microsoft markets at least a dozen different editions of Microsoft SQL Server, aimed at different audiences and for workloads ranging from small single-machine applications to large Internet-facing applications with many concurrent users.

The MySQLi Extension is a relational database driver used in the PHP scripting language to provide an interface with MySQL databases.

<span class="mw-page-title-main">XLeratorDB</span>

XLeratorDB is a suite of database function libraries that enable Microsoft SQL Server to perform a wide range of additional (non-native) business intelligence and ad hoc analytics. The libraries, which are embedded and run centrally on the database, include more than 450 individual functions similar to those found in Microsoft Excel spreadsheets. The individual functions are grouped and sold as six separate libraries based on usage: finance, statistics, math, engineering, unit conversions and strings. WestClinTech, the company that developed XLeratorDB, claims it is "the first commercial function package add-in for Microsoft SQL Server."

The following tables compare general and technical information for a number of online analytical processing (OLAP) servers. Please see the individual products articles for further information.

<span class="mw-page-title-main">Apache Hive</span> Database engine

Apache Hive is a data warehouse software project built on top of Apache Hadoop for providing data query and analysis. Hive gives an SQL-like interface to query data stored in various databases and file systems that integrate with Hadoop. Traditional SQL queries must be implemented in the MapReduce Java API to execute SQL applications and queries over distributed data. Hive provides the necessary SQL abstraction to integrate SQL-like queries (HiveQL) into the underlying Java without the need to implement queries in the low-level Java API. Since most data warehousing applications work with SQL-based querying languages, Hive aids the portability of SQL-based applications to Hadoop. While initially developed by Facebook, Apache Hive is used and developed by other companies such as Netflix and the Financial Industry Regulatory Authority (FINRA). Amazon maintains a software fork of Apache Hive included in Amazon Elastic MapReduce on Amazon Web Services.

In database management systems (DBMS), a prepared statement, parameterized statement, or parameterized query is a feature where the database pre-compiles SQL code and stores the results, separating it from data. Benefits of prepared statements are:

PL/SQL is Oracle Corporation's procedural extension for SQL and the Oracle relational database. PL/SQL is available in Oracle Database, Times Ten in-memory database, and IBM Db2. Oracle Corporation usually extends PL/SQL functionality with each successive release of the Oracle Database.

References

  1. "LanguageManual UDF - Apache Hive - Apache Software Foundation". 26 June 2015.
  2. "HivePlugins - Apache Hive - Apache Software Foundation". 26 June 2015.
  3. "Apache Doris UDF" . Retrieved 8 April 2023.