Marshalling (computer science)

Last updated

In computer science, marshalling or marshaling (US spelling) is the process of transforming the memory representation of an object into a data format suitable for storage or transmission, especially between different runtimes.[ citation needed ] It is typically used when data must be moved between different parts of a computer program or from one program to another.

Contents

Marshalling simplifies complex communications, because it allows using composite objects instead of being restricted to primitive objects .

Comparison with serialization

Marshalling can be somewhat similar to or synonymous with serialization, although technically serialization is one step in the process of marshalling an object.

Marshalling and serialization might thus be done differently, although some form of serialization is usually used to do marshalling. [1]

The term deserialization is somewhat similar to un-marshalling a dry object "on the server side" ie. demarshalling (or unmarshalling) to get a live object back: the serialized object is transformed into an internal data structure ie. a live object within the target runtime. It usually corresponds to the exact inverse process of marshalling, although sometimes both ends of the process trigger specific business logic.

The accurate definition of marshalling differs across programming languages such as Python, Java, and .NET, and in some contexts, is used interchangeably with serialization.

Marshalling in different programming languages

To "serialize" an object means to convert its state into a byte stream in such a way that the byte stream may be converted back into a copy of the object, which is unmarshalling in essence. Different programming languages either make or don’t make the distinction between the two concepts. A few examples:

In Python, the term "marshal" is used for a specific type of "serialization" in the Python standard library [2] – storing internal python objects:

The marshal module exists mainly to support reading and writing the “pseudo-compiled” code for Python modules of .pyc files.

If you’re serializing and de-serializing Python objects, use the pickle module instead

The Python Standard Library [3]

In the Java-related RFC   2713, marshalling is used when serializing objects for remote invocation. An object that is marshalled records the state of the original object and it contains the codebase (codebase here refers to a list of URLs where the object code can be loaded from, and not source code). Hence, in order to convert the object state and codebase(s), unmarshalling must be done. The unmarshaller interface automatically converts the marshalled data containing codebase(s) into an executable Java object in JAXB. Any object that can be deserialized can be unmarshalled. However, the converse need not be true.

To "marshal" an object means to record its state and codebase(s) in such a way that when the marshalled object is "unmarshalled," a copy of the original object is obtained, possibly by automatically loading the class definitions of the object. You can marshal any object that is serializable or remote (that is, implements the java.rmi.Remote interface). Marshalling is like serialization, except marshalling also records codebases. Marshalling is different from serialization in that marshalling treats remote objects specially.

Any object whose methods can be invoked [on an object in another Java virtual machine] must implement the java.rmi.Remote interface. When such an object is invoked, its arguments are marshalled and sent from the local virtual machine to the remote one, where the arguments are unmarshalled and used.

Schema for Representing Java(tm) Objects in an LDAP Directory (RFC 2713) [4]

In .NET, marshalling is also used to refer to serialization when using remote calls:

When you marshal an object by value, a copy of the object is created and serialized to the server. Any method calls made on that object are done on the server

How To Marshal an Object to a Remote Server by Value by Using Visual Basic .NET (Q301116) [5]

Usage and examples

Marshalling is used within implementations of different remote procedure call (RPC) mechanisms, where it is necessary to transport data between processes and/or between threads.

In Microsoft's Component Object Model (COM), interface pointers must be marshalled when crossing COM apartment boundaries. [6] [7] In the .NET Framework, the conversion between an unmanaged type and a CLR type, as in the P/Invoke process, is also an example of an action that requires marshalling to take place. [8]

Additionally, marshalling is used extensively within scripts and applications that use the XPCOM technologies provided within the Mozilla application framework. The Mozilla Firefox browser is a popular application built with this framework, that additionally allows scripting languages to use XPCOM through XPConnect (Cross-Platform Connect).

Example

In the Microsoft Windows family of operating systems the entire set of device drivers for Direct3D are kernel-mode drivers. The user-mode portion of the API is handled by the DirectX runtime provided by Microsoft.

This is an issue because calling kernel-mode operations from user-mode requires performing a system call, and this inevitably forces the CPU to switch to "kernel mode". This is a slow operation, taking on the order of microseconds to complete. [9] During this time, the CPU is unable to perform any operations. As such, minimizing the number of times this switching operation must be performed would optimize performance to a substantive degree.

Linux OpenGL drivers are split in two: a kernel-driver and a user-space driver. The user-space driver does all the translation of OpenGL commands into machine code to be submitted to the GPU. To reduce the number of system calls, the user-space driver implements marshalling. If the GPU's command buffer is full of rendering data, the API could simply store the requested rendering call in a temporary buffer and, when the command buffer is close to being empty, it can perform a switch to kernel-mode and add a number of stored commands all at once.

Formats

Marshalling data requires some kind of data transfer, which leverages a specific data format to be chosen as the serialization target.

XML vs JSON vs…

XML is one such format and means of transferring data between systems. Microsoft, for example, uses it as the basis of the file formats of the various components (Word, Excel, Access, PowerPoint, etc.) of the Microsoft Office suite (see Office Open XML).

While this typically results in a verbose wire format, XML's fully-bracketed "start-tag", "end-tag" syntax allows provision of more accurate diagnostics and easier recovery from transmission or disk errors. In addition, because the tags occur repeatedly, one can use standard compression methods to shrink the content—all the Office file formats are created by zipping the raw XML. [10] Alternative formats such as JSON (JavaScript Object Notation) are more concise, but correspondingly less robust for error recovery.

Once the data is transferred to a program or an application, it needs to be converted back to an object for usage. Hence, unmarshalling is generally used in the receiver end of the implementations of Remote Method Invocation (RMI) and Remote procedure call (RPC) mechanisms to unmarshal transmitted objects in an executable form.

JAXB

JAXB or Java Architecture for XML Binding is the most common framework used by developers to marshal and unmarshal Java objects. JAXB provides for the interconversion between fundamental data types supported by Java and standard XML schema data types. [11]

XmlSerializer

XmlSerializer is the framework used by C# developers to marshal and unmarshal C# objects. One of the advantages of C# over Java is that C# natively supports marshalling due to the inclusion of XmlSerializer class. Java, on the other hand requires a non-native glue code in the form of JAXB to support marshalling. [12]

From XML to an executable representation

An example of unmarshalling is the conversion of an XML representation of an object to the default representation of the object in any programming language. Consider the following class:

publicclassStudent{privatecharname[150];privateintID;publicStringgetName(){returnthis.name;}publicintgetID(){returnthis.ID;}voidsetName(Stringname){this.name=name;}voidsetID(intID){this.ID=ID;}}
<!-- Code Snippet 1 --><?xml version="1.0" encoding="UTF-8"?><studentid="11235813"><name>Jayaraman</name></student><studentid="21345589"><name>Shyam</name></student>
// Code Snippet 2Students1=newStudent();s1.setID(11235813);s1.setName("Jayaraman");Students2=newStudent();s2.setID(21345589);s2.setName("Shyam");

Unmarshalling is the process of converting the XML representation of Code Snippet 1 to the default executable Java representation of Code Snippet 2, and running that very code to get a consistent, live object back. Had a different format been chosen, the unmarshalling process would have been different, but the end result in the target runtime would be the same.

Unmarshalling in Java

Unmarshaller in JAXB

The process of unmarshalling XML data into an executable Java object is taken care of by the in-built Unmarshaller class. The unmarshal methods defined in the Unmarshaller class are overloaded to accept XML from different types of input such as a File, FileInputStream, or URL. [13] For example:

JAXBContextjcon=JAXBContext.newInstance("com.acme.foo");Unmarshallerumar=jcon.createUnmarshaller();Objectobj=umar.unmarshal(newFile("input.xml"));

Unmarshalling XML Data

Unmarshal methods can deserialize an entire XML document or a small part of it. When the XML root element is globally declared, these methods utilize the JAXBContext's mapping of XML root elements to JAXB mapped classes to initiate the unmarshalling. If the mappings are not sufficient and the root elements are declared locally, the unmarshal methods use declaredType methods for the unmarshalling process. These two approaches can be understood below. [13]

Unmarshal a global XML root element

The unmarshal method uses JAXBContext to unmarshal the XML data, when the root element is globally declared. The JAXBContext object always maintains a mapping of the globally declared XML element and its name to a JAXB mapped class. If the XML element name or its @xsi:type attribute matches the JAXB mapped class, the unmarshal method transforms the XML data using the appropriate JAXB mapped class. However, if the XML element name has no match, the unmarshal process will abort and throw an UnmarshalException. This can be avoided by using the unmarshal by declaredType methods. [14]

Unmarshal a local XML root element

When the root element is not declared globally, the application assists the unmarshaller by application-provided mapping using declaredType parameters. By an order of precedence, even if the root name has a mapping to an appropriate JAXB class, the declaredType overrides the mapping. However, if the @xsi:type attribute of the XML data has a mapping to an appropriate JAXB class, then this takes precedence over declaredType parameter. The unmarshal methods by declaredType parameters always return a JAXBElement<declaredType> instance. The properties of this JAXBElement instance are set as follows: [15]

JAXBElement PropertyValue
namexml element name
valueinstanceof declaredType
declaredTypeunmarshal method declaredType parameter
scopenull (actual size is not known)

See also

Related Research Articles

In distributed computing, a remote procedure call (RPC) is when a computer program causes a procedure (subroutine) to execute in a different address space, which is written as if it were a normal (local) procedure call, without the programmer explicitly writing the details for the remote interaction. That is, the programmer writes essentially the same code whether the subroutine is local to the executing program, or remote. This is a form of client–server interaction, typically implemented via a request–response message-passing system. In the object-oriented programming paradigm, RPCs are represented by remote method invocation (RMI). The RPC model implies a level of location transparency, namely that calling procedures are largely the same whether they are local or remote, but usually, they are not identical, so local calls can be distinguished from remote calls. Remote calls are usually orders of magnitude slower and less reliable than local calls, so distinguishing them is important.

In computing, serialization is the process of translating a data structure or object state into a format that can be stored or transmitted and reconstructed later. When the resulting series of bits is reread according to the serialization format, it can be used to create a semantically identical clone of the original object. For many complex objects, such as those that make extensive use of references, this process is not straightforward. Serialization of object-oriented objects does not include any of their associated methods with which they were previously linked.

<span class="mw-page-title-main">XML</span> Markup language by the W3C for encoding of data

Extensible Markup Language (XML) is a markup language and file format for storing, transmitting, and reconstructing arbitrary data. It defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. The World Wide Web Consortium's XML 1.0 Specification of 1998 and several other related specifications—all of them free open standards—define XML.

In computer science, reflective programming or reflection is the ability of a process to examine, introspect, and modify its own structure and behavior.

Java Management Extensions (JMX) is a Java technology that supplies tools for managing and monitoring applications, system objects, devices and service-oriented networks. Those resources are represented by objects called MBeans. In the API, classes can be dynamically loaded and instantiated. Managing and monitoring applications can be designed and developed using the Java Dynamic Management Kit.

Jakarta XML Binding is a software framework that allows Java EE developers to map Java classes to XML representations. JAXB provides two main features: the ability to marshal Java objects into XML and the inverse, i.e. to unmarshal XML back into Java objects. In other words, JAXB allows storing and retrieving data in memory in any XML format, without the need to implement a specific set of XML loading and saving routines for the program's class structure. It is similar to xsd.exe and XmlSerializer in the .NET Framework.

XML data binding refers to a means of representing information in an XML document as a business object in computer memory. This allows applications to access the data in the XML from the object, rather than using the DOM or SAX to retrieve the data from a direct representation of the XML itself.

<span class="mw-page-title-main">JSON</span> Open standard file format and data interchange

JSON is an open standard file format and data interchange format that uses human-readable text to store and transmit data objects consisting of attribute–value pairs and arrays. It is a commonly used data format with diverse uses in electronic data interchange, including that of web applications with servers.

In the Java computer programming language, an annotation is a form of syntactic metadata that can be added to Java source code. Classes, methods, variables, parameters and Java packages may be annotated. Like Javadoc tags, Java annotations can be read from source files. Unlike Javadoc tags, Java annotations can also be embedded in and read from Java class files generated by the Java compiler. This allows annotations to be retained by the Java virtual machine at run-time and read via reflection. It is possible to create meta-annotations out of the existing ones in Java.

.NET Remoting is a Microsoft application programming interface (API) for interprocess communication released in 2002 with the 1.0 version of .NET Framework. It is one in a series of Microsoft technologies that began in 1990 with the first version of Object Linking and Embedding (OLE) for 16-bit Windows. Intermediate steps in the development of these technologies were Component Object Model (COM) released in 1993 and updated in 1995 as COM-95, Distributed Component Object Model (DCOM), released in 1997, and COM+ with its Microsoft Transaction Server (MTS), released in 2000. It is now superseded by Windows Communication Foundation (WCF), which is part of the .NET Framework 3.0.

Thrift is an interface definition language and binary communication protocol used for defining and creating services for programming languages. It was developed at Facebook. As of 2020, it is an open source project in the Apache Software Foundation.

Language Integrated Query is a Microsoft .NET Framework component that adds native data querying capabilities to .NET languages, originally released as a major part of .NET Framework 3.5 in 2007.

In a distributed computing environment, distributed object communication realizes communication between distributed objects. The main role is to allow objects to access data and invoke methods on remote objects. Invoking a method on a remote object is known as remote method invocation (RMI) or remote invocation, and is the object-oriented programming analog of a remote procedure call (RPC).

Component Object Model (COM) is a binary-interface standard for software components introduced by Microsoft in 1993. It is used to enable inter-process communication (IPC) object creation in a large range of programming languages. COM is the basis for several other Microsoft technologies and frameworks, including OLE, OLE Automation, Browser Helper Object, ActiveX, COM+, DCOM, the Windows shell, DirectX, UMDF and Windows Runtime.

Apache Empire-db is a Java library that provides a high level object-oriented API for accessing relational database management systems (RDBMS) through JDBC. Apache Empire-db is open source and provided under the Apache License 2.0 from the Apache Software Foundation.

<span class="mw-page-title-main">XQuery API for Java</span> Application programming interface

XQuery API for Java (XQJ) refers to the common Java API for the W3C XQuery 1.0 specification.

Windows Runtime (WinRT) is a platform-agnostic component and application architecture first introduced in Windows 8 and Windows Server 2012 in 2012. It is implemented in C++ and officially supports development in C++, Rust/WinRT, Python/WinRT, JavaScript-TypeScript, and the managed code languages C# and Visual Basic .NET (VB.NET).

Oracle TopLink is a mapping and persistence framework for Java developers. TopLink is produced by Oracle and is a part of Oracle's OracleAS, WebLogic, and OC4J servers. It is an object-persistence and object-transformation framework. TopLink provides development tools and run-time functionalities that ease the development process and help increase functionality. Persistent object-oriented data is stored in relational databases which helps build high-performance applications. Storing data in either XML or relational databases is made possible by transforming it from object-oriented data.

Castor is a data binding framework for Java with some features like Java to Java-to-XML binding, Java-to-SQL persistence, paths between Java objects, XML documents, relational tables, etc. Castor is one of the oldest data binding projects.

References

  1. Jeffrey Hantin. "What is the difference between Serialization and Marshaling?". Stack Exchange Network. Retrieved 23 July 2021.
  2. "marshal — Internal Python object serialization". Python Software Foundation. Retrieved 4 November 2016.
  3. "marshal — Internal Python object serialization". Python Software Foundation. Retrieved 9 October 2019.
  4. "Schema for Representing Java(tm) Objects in an LDAP Directory". IETF. October 1999. Retrieved 4 November 2016.
  5. "How To Marshal an Object to a Remote Server by Value by Using Visual Basic .NET". Microsoft. July 2004. Archived from the original on 2004-11-15. Retrieved 4 November 2016.
  6. "Apartments and COM Threading Models". Archived from the original on 2015-09-23. Retrieved 2009-06-19.
  7. "CoInitializeEx function (COM)". Windows Desktop App Development. Retrieved 2013-02-22.
  8. Interop Marshaling Overview
  9. Code Quality: The Open Source Perspective.
  10. What is a DOCX file? https://docs.fileformat.com/word-processing/docx/ Accessed Oct 13, 2020.
  11. "Binding XML Schemas - The Java EE 5 Tutorial". docs.oracle.com. Retrieved 2016-09-14.
  12. "Using the XmlSerializer Class". msdn.microsoft.com. Retrieved 2016-09-23.
  13. 1 2 "Unmarshaller (JAXB 2.2.3)". jaxb.java.net. Retrieved 2016-09-14.
  14. "JAXBContext (JAXB 2.2.3)". jaxb.java.net. Retrieved 2016-09-23.
  15. "JAXBElement (JAXB 2.2.3)". jaxb.java.net. Retrieved 2016-09-23.