Xmark93

Xmark93 is a standardized benchmarking tool for measuring the performance of computer systems running the X Window System. It was developed by the SPEC XPC group in 1993.

Xmark93 allows system evaluators and vendors to compare the performance of X server/hardware systems for a broad set of basic X functions, covering a wide range of applications. The benchmark provides a standardized method for summarizing X11perf results, yielding a single-number measure of overall X server/hardware performance.

Specifications

Xmark93 is derived by calculating the ratio between the weighted geometric mean of the 447 individual X11perf test results for the server/hardware being evaluated and the corresponding weighted geometric mean for a Sun Microsystems SPARCstation 1. The weighting for each X11perf test was obtained by a survey of X11 technical experts and reflects the experts' rating of the relative importance of that operation within a wide mix of applications.
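The published weights and the full set of 447 test results are not reproduced here, but the arithmetic behind the score is simple. The following minimal Python sketch uses hypothetical weights and throughput figures for just three tests (the names and numbers are illustrative, not actual X11perf data):

    import math

    def weighted_geometric_mean(values, weights):
        """exp( sum(w_i * ln(v_i)) / sum(w_i) ), the mean Xmark93 is built on."""
        total = sum(weights)
        return math.exp(sum(w * math.log(v) for v, w in zip(values, weights)) / total)

    def xmark(results, reference, weights):
        """Xmark-style score: weighted geometric mean of the system under test
        divided by the same mean for the reference machine (SPARCstation 1)."""
        return (weighted_geometric_mean(results, weights) /
                weighted_geometric_mean(reference, weights))

    weights = [2.0, 1.0, 3.0]              # assumed expert-survey weights
    sut     = [120000.0, 45000.0, 9000.0]  # system under test (ops/sec)
    sparc1  = [60000.0, 30000.0, 6000.0]   # SPARCstation 1 baseline (ops/sec)
    print(xmark(sut, sparc1, weights))     # > 1.0 means faster than the baseline

Because the weighted geometric mean of per-test ratios equals the ratio of the weighted geometric means, taking each test's ratio against the baseline first yields the same score.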

Related Research Articles

Instructions per second

Instructions per second (IPS) is a measure of a computer's processor speed. For CISC computers different instructions take different amounts of time, so the value measured depends on the instruction mix; even for comparing processors in the same family the IPS measurement can be problematic. Many reported IPS values have represented "peak" execution rates on artificial instruction sequences with few branches and no cache contention, whereas realistic workloads typically lead to significantly lower IPS values. Memory hierarchy also greatly affects processor performance, an issue barely considered in IPS calculations. Because of these problems, synthetic benchmarks such as Dhrystone are now generally used to estimate computer performance in commonly used applications, and raw IPS has fallen into disuse.
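To see why the instruction mix matters, consider a toy model in which each instruction class costs a fixed number of clock cycles. The clock rate and cycles-per-instruction (CPI) figures below are assumptions for illustration, not measurements of any real processor:

    CLOCK_HZ = 100e6                                       # assumed 100 MHz clock
    CPI = {"alu": 1.0, "load_store": 3.0, "branch": 2.0}   # assumed cycle costs

    def ips(mix):
        """Effective instructions per second for a given instruction mix.

        mix maps instruction class to its fraction of executed instructions
        (fractions sum to 1); the average CPI weights each class's cost.
        """
        avg_cpi = sum(frac * CPI[cls] for cls, frac in mix.items())
        return CLOCK_HZ / avg_cpi

    # "Peak" rate on an artificial, branch-free ALU-only sequence:
    print(ips({"alu": 1.0}))                                    # 100 million IPS
    # A more realistic mix with loads, stores and branches:
    print(ips({"alu": 0.5, "load_store": 0.3, "branch": 0.2}))  # ~55.6 million IPS

The same processor thus reports nearly twice the IPS on the artificial sequence, which is the effect behind the "peak" figures described above.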

In software quality assurance, performance testing is a testing practice performed to determine how a system performs in terms of responsiveness and stability under a particular workload. It can also serve to investigate, measure, validate or verify other quality attributes of the system, such as scalability, reliability and resource usage.

Standardized test

A standardized test is a test that is administered and scored in a consistent, or "standard", manner. Standardized tests are designed so that the questions, conditions for administering, scoring procedures, and interpretations are consistent across all test takers.

Standard Performance Evaluation Corporation

The Standard Performance Evaluation Corporation (SPEC) is an American non-profit corporation that aims to "produce, establish, maintain and endorse a standardized set" of performance benchmarks for computers.

Load testing is the process of putting demand on a system and measuring its response.

In computing, a benchmark is the act of running a computer program, a set of programs, or other operations in order to assess the relative performance of an object, normally by running a number of standard tests and trials against it. The term benchmark is also commonly applied to the elaborately designed benchmarking programs themselves.

SPECint is a computer benchmark specification for CPU integer processing power. It is maintained by the Standard Performance Evaluation Corporation (SPEC). SPECint is the integer performance testing component of the SPEC test suite. The first SPEC test suite, CPU92, was announced in 1992. It was followed by CPU95, CPU2000, and CPU2006. The latest standard, SPEC CPU 2017, consists of the SPECspeed and SPECrate suites.

EEMBC, the Embedded Microprocessor Benchmark Consortium, is a non-profit, member-funded organization formed in 1997, focused on the creation of standard benchmarks for the hardware and software used in embedded systems. The goal of its members is to make EEMBC benchmarks an industry standard for evaluating the capabilities of embedded processors, compilers, and the associated embedded system implementations, according to objective, clearly defined, application-based criteria. EEMBC members may contribute to the development of benchmarks, vote at various stages before public distribution, and accelerate testing of their platforms through early access to benchmarks and associated specifications.

In computing, computer performance is the amount of useful work accomplished by a computer system. Outside of specific contexts, computer performance is estimated in terms of accuracy, efficiency and speed of executing computer program instructions. High computer performance may involve one or more factors such as short response time, high throughput, and low utilization of computing resources.

Scalability testing is the testing of a software application to measure its capability to scale up or scale out in terms of any of its non-functional capabilities.

VMmark is a freeware virtual machine benchmark software suite from VMware, Inc. The suite measures the performance of virtualized servers while running under load on a set of physical hardware. VMmark was independently developed by VMware.

This page is a comparison of Windows Vista and Windows XP. Windows XP and Windows Vista differ considerably with regard to their security architecture, networking technologies, management and administration, shell and user interface, and mobile computing. Windows XP has been criticized for security problems and issues with performance. Vista has been criticized for issues with performance and product activation. Another common criticism of Vista concerns the integration of new forms of DRM into the operating system, and the User Account Control (UAC) security technology.

CoreMark is a benchmark that measures the performance of central processing units (CPU) used in embedded systems. It was developed in 2009 by Shay Gal-On at EEMBC and is intended to become an industry standard, replacing the Dhrystone benchmark. The code is written in C and contains implementations of the following algorithms: list processing, matrix manipulation, state machine, and CRC. The code is under the Apache License 2.0 and is free of charge to use, but ownership is retained by the Consortium and publication of modified versions under the CoreMark name is prohibited.
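As a flavor of the kind of integer kernel CoreMark exercises, the sketch below computes a bitwise CRC-16 in Python; it uses the parameters of the common CRC-16/CCITT-FALSE variant and is an illustration of the technique, not CoreMark's actual C code:

    def crc16(data: bytes, poly: int = 0x1021, crc: int = 0xFFFF) -> int:
        """Bitwise (MSB-first) CRC-16 over data, CRC-16/CCITT-FALSE parameters."""
        for byte in data:
            crc ^= byte << 8                 # fold the next byte into the register
            for _ in range(8):
                if crc & 0x8000:             # top bit set: shift and apply polynomial
                    crc = ((crc << 1) ^ poly) & 0xFFFF
                else:
                    crc = (crc << 1) & 0xFFFF
        return crc

    print(hex(crc16(b"123456789")))          # 0x29b1, the standard check value

In CoreMark itself the CRC serves double duty, acting both as a workload and as a self-check on the results of the other kernels.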

The Telecommunication Application Transaction Processing Benchmark (TATP) is a benchmark designed to measure performance of in-memory database transaction systems.

SPARC T3

The SPARC T3 microprocessor is a multithreading, multi-core CPU produced by Oracle Corporation. Officially launched on 20 September 2010, it is a member of the SPARC family, and the successor to the UltraSPARC T2.

HPC Challenge Benchmark combines several benchmarks to test a number of independent attributes of the performance of high-performance computer (HPC) systems. The project has been co-sponsored by the DARPA High Productivity Computing Systems program, the United States Department of Energy and the National Science Foundation.

A browser speed test is a computer benchmark that scores the performance of a web browser by measuring the browser's efficiency in completing a predefined list of tasks. In general the testing software is available online, located on a website, where different algorithms are loaded and performed in the browser client. Typical test tasks are rendering and animation, DOM transformations, string operations, mathematical calculations, sorting algorithms, graphics performance tests and memory instructions. Browser speed tests have been used during the browser wars to prove the superiority of specific web browsers. The popular Acid3 test is not a speed test as such, but checks browser conformance to web standards.

httperf is a testing tool to measure the performance of web servers. It was originally developed by David Mosberger and other staff at Hewlett-Packard Research Laboratories.

perf is a performance analyzing tool in Linux, available from Linux kernel version 2.6.31 in 2009. Its userspace controlling utility, also named perf, is accessed from the command line and provides a number of subcommands; it is capable of statistical profiling of the entire system.

Expert Judgment (EJ) denotes a wide variety of techniques, ranging from a single undocumented opinion, through preference surveys, to formal elicitation with external validation of expert probability assessments. In the nuclear safety area, Rasmussen formalized EJ by documenting all steps in the expert elicitation process for scientific review. This made visible the wide spreads in expert assessments and raised questions regarding the validation and synthesis of expert judgments. The nuclear safety community later adopted expert judgment techniques underpinned by external validation. Empirical validation is the hallmark of science and forms the centerpiece of the classical model of probabilistic forecasting. A European network coordinates workshops, and application areas include nuclear safety, investment banking, volcanology, public health, ecology, engineering, climate change and aeronautics/aerospace; notable applications include a recent large-scale implementation by the World Health Organization and a long-running application at the Montserrat Volcano Observatory.

The classical model scores expert performance in terms of statistical accuracy and informativeness. These terms should not be confused with "accuracy and precision": accuracy "is a description of systematic errors" while precision "is a description of random errors". In the classical model, statistical accuracy is measured as the p-value, the probability with which one would falsely reject the hypothesis that an expert's probability assessments were statistically accurate. A low value means it is very unlikely that the discrepancy between an expert's probability statements and the observed outcomes arose by chance. Informativeness is measured as Shannon relative information with respect to an analyst-supplied background measure; Shannon relative information is used because it is scale invariant, tail insensitive, slow, and familiar. Measures with physical dimensions, such as the standard deviation or the width of prediction intervals, raise serious problems, as a change of units would affect some variables but not others.

The product of statistical accuracy and informativeness for each expert is that expert's combined score. With an optimal choice of a statistical-accuracy threshold beneath which experts are unweighted, the combined score is a long-run "strictly proper scoring rule": an expert achieves his long-run maximal expected score by, and only by, stating his true beliefs. The classical model derives performance-weighted (PW) combinations; these are compared with equally weighted (EW) combinations, more recently with harmonically weighted (HW) combinations, and with individual expert assessments.
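The calibration component of this scoring can be made concrete with a small sketch. Suppose an expert states 5%, 50% and 95% quantiles for a set of seed variables whose true values are later observed, so each realization falls into one of four interquantile bins with theoretical probabilities 0.05, 0.45, 0.45 and 0.05. The Python below is a simplified, hypothetical rendering (not a full implementation of the classical model): it computes the statistical-accuracy p-value via the standard chi-squared approximation and multiplies it by a supplied informativeness figure.

    import numpy as np
    from scipy.stats import chi2

    # Theoretical probabilities of the four interquantile bins for
    # 5%/50%/95% quantile assessments.
    P = np.array([0.05, 0.45, 0.45, 0.05])

    def statistical_accuracy(bin_counts):
        """p-value for the hypothesis that the expert is statistically accurate.

        bin_counts: how many of the N seed-variable realizations fell into
        each interquantile bin. Under the hypothesis, 2*N*I(s; P) is
        asymptotically chi-squared with len(P) - 1 degrees of freedom.
        """
        n = bin_counts.sum()
        s = bin_counts / n                            # empirical bin frequencies
        nz = s > 0
        info = np.sum(s[nz] * np.log(s[nz] / P[nz]))  # Shannon relative info I(s; P)
        return chi2.sf(2 * n * info, df=len(P) - 1)

    def combined_score(bin_counts, informativeness, threshold=0.0):
        """Classical-model combined score: calibration times informativeness,
        with experts below the accuracy threshold unweighted (scored zero)."""
        cal = statistical_accuracy(bin_counts)
        return cal * informativeness if cal >= threshold else 0.0

    # Hypothetical expert: of 10 realizations, 1 fell below the 5% quantile,
    # 4 between 5% and 50%, 4 between 50% and 95%, and 1 above the 95%.
    counts = np.array([1, 4, 4, 1])
    print(statistical_accuracy(counts))               # ~0.83: well calibrated

In a full treatment the informativeness figure would itself be computed as Shannon relative information of the expert's distributions against the background measure; it is passed in as a plain number here to keep the sketch short.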