Web server directory index

www.wikipedia.org, the index of Wikipedia, a multilingual online encyclopedia. Here the website's home page offers many different languages.

When an HTTP client (generally a web browser) requests a URL that points to a directory structure instead of an actual web page within the directory structure, the web server will generally serve a default page, which is often referred to as a main or "index" page.

A common filename for such a page is index.html, but most modern HTTP servers offer a configurable list of filenames that the server can use as an index. If a server is configured to support server-side scripting, the list will usually include entries that allow dynamic content to serve as the index page (e.g. index.cgi, index.pl, index.php, index.shtml, index.jsp, default.asp). It may still be more appropriate to name the output format explicitly (index.html.php or index.html.aspx), since HTML output should not be taken for granted. One example is the popular open-source web server Apache, where the list of filenames is controlled by the DirectoryIndex directive [1] in the main server configuration file or in the configuration file for that directory. It is also possible to omit file extensions entirely, remaining neutral to content delivery methods, and to have the server pick the best file automatically through content negotiation.

If the server cannot find a file with any of the names listed in its configuration, it may either return an error (usually 403 Forbidden or 404 Not Found) or generate its own index page listing the files in the directory. This option, often named autoindex, is usually configurable as well. [2]
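The lookup and fallback described above can be sketched in a few lines of Python. This is an illustration only, not the logic of any particular server: the function name `resolve_directory`, the `INDEX_NAMES` list, and the `autoindex` flag are hypothetical, and returning 403 when listings are disabled is just one of the outcomes mentioned above.

```python
import os
import tempfile

# Hypothetical default list; real servers make this list configurable.
INDEX_NAMES = ["index.html", "index.php", "default.asp"]


def resolve_directory(directory, index_names=INDEX_NAMES, autoindex=False):
    """Return (status, payload) for a request that maps to `directory`."""
    for name in index_names:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            # The first name that exists, in configured order, wins.
            return 200, candidate
    if autoindex:
        # No index file: fall back to a listing of the directory entries.
        return 200, sorted(os.listdir(directory))
    # No index file and listings disabled: refuse the request.
    return 403, None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        print(resolve_directory(root)[0])                  # no index file yet
        open(os.path.join(root, "index.php"), "w").close()
        print(os.path.basename(resolve_directory(root)[1]))
```

Because the names are tried in order, placing index.html before index.php in the list means a static page takes precedence over a script when both exist.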

History

A scheme in which the web server serves a default file on a per-subdirectory basis has been supported as early as NCSA HTTPd 0.3beta (22 April 1993), [3] which serves the index.html file in the directory by default. [3] [4] CERN HTTPd then adopted this scheme, since at least 2.17beta (5 April 1994), whose defaults include Welcome.html and welcome.html in addition to the NCSA-originated index.html. [5]

Later web servers typically support this default file scheme in one form or another; this is usually configurable, with index.html being one of the default file names. [6] [7] [8]

Implementation

An Apache index page.

In some cases, the home page of a website can be a menu of language options for large sites that use geotargeting. It is also possible to avoid this step, for example, by using content negotiation.
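As a rough illustration of how content negotiation can replace a language menu, the sketch below picks a language from the value of an HTTP Accept-Language header. It is a simplified reading of the header, not a complete implementation of the matching rules, and the function name `pick_language`, the `available` set, and the `default` value are assumptions made for this example.

```python
def pick_language(accept_language, available, default="en"):
    """Pick the best available language for an Accept-Language header value."""
    ranked = []
    for position, item in enumerate(accept_language.split(",")):
        parts = item.strip().split(";")
        tag = parts[0].strip().lower()
        if not tag:
            continue
        quality = 1.0  # a missing q parameter means full preference
        for param in parts[1:]:
            key, _, value = param.strip().partition("=")
            if key.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        # Sort by descending quality, then by position in the header.
        ranked.append((-quality, position, tag))
    for neg_quality, _, tag in sorted(ranked):
        if neg_quality == 0:
            continue  # q=0 marks a language as not acceptable
        if tag == "*":
            return default  # simplification: a wildcard selects the default
        # Accept an exact match or a primary-subtag match ("de-at" -> "de").
        for candidate in (tag, tag.split("-")[0]):
            if candidate in available:
                return candidate
    return default


if __name__ == "__main__":
    print(pick_language("de-AT,de;q=0.9,en;q=0.8", {"en", "de", "fr"}))
```

A server using this approach could redirect the visitor straight to the chosen language edition and keep the menu page only as a fallback.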

In cases where no known index.* file exists within a given directory, the web server may be configured to provide an automatically generated listing of the files within the directory instead. With the Apache web server, for example, this behavior is provided by the mod_autoindex module [9] and controlled by the Options +Indexes directive [10] in the web server configuration files. These automated directory listings are sometimes a security risk because they enumerate sensitive files which may not be intended for public access, in a process known as a directory indexing attack. [11] Such a security misconfiguration [12] may also assist in other attacks, such as a path or directory traversal attack. [13]
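To make the exposure concrete, here is a minimal sketch of what an automatically generated listing amounts to. It is not how mod_autoindex is implemented; `render_listing` and its `hide_dotfiles` parameter are hypothetical names for this example. Every entry in the directory ends up in the page unless something filters it out, which is why a listing can reveal files that were never meant to be public.

```python
import html
import os
import tempfile
from urllib.parse import quote


def render_listing(directory, url_path, hide_dotfiles=True):
    """Build a minimal HTML index for `directory`, served at `url_path`."""
    names = sorted(os.listdir(directory))
    if hide_dotfiles:
        # Assumption for this sketch: dotfiles (e.g. .htpasswd) are hidden.
        names = [n for n in names if not n.startswith(".")]
    rows = []
    for name in names:
        is_dir = os.path.isdir(os.path.join(directory, name))
        label = name + "/" if is_dir else name
        # Percent-encode the link target and HTML-escape the visible text.
        rows.append(f'<li><a href="{quote(label)}">{html.escape(label)}</a></li>')
    title = html.escape(f"Index of {url_path}")
    return (
        f"<html><head><title>{title}</title></head><body>"
        f"<h1>{title}</h1><ul>{''.join(rows)}</ul></body></html>"
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        open(os.path.join(root, "report.txt"), "w").close()
        print(render_listing(root, "/files/"))
```

Escaping the names matters as well: a file name is untrusted input, so writing it into the page unescaped would let a crafted name inject markup.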

Performance

When a directory is accessed, the available index methods can differ in their use of OS resources (RAM, CPU time, etc.) and therefore in their effect on web server performance.

Proceeding from fastest to slowest method, here is the list:

References

  1. "mod_dir - Apache HTTP Server". httpd.apache.org. Retrieved 2014-05-30.
  2. ASF Infrabot (2019-05-22). "Directory listings". Apache foundation: HTTPd server project. Retrieved 2021-11-16.
  3. "WWW-Talk Apr-Jun 1993: NCSA httpd version 0.3". 1997.webhistory.org.
  4. "NCSA HTTPd DirectoryIndex". January 31, 2009. Archived from the original on January 31, 2009.
  5. "Change History of W3C httpd". June 5, 1997. Archived from the original on June 5, 1997.
  6. "mod_dir - Apache HTTP Server Version 2.4 § DirectoryIndex Directive". httpd.apache.org. Archived from the original on 2020-11-12. Retrieved 2021-01-13.
  7. "NGINX Docs | Serving Static Content". docs.nginx.com. Archived from the original on 2020-11-11. Retrieved 2021-01-13.
  8. "Default Document <defaultDocument> | Microsoft Docs". docs.microsoft.com. Archived from the original on 2020-12-08. Retrieved 2021-01-13.
  9. "mod_autoindex - Apache HTTP Server Version 2.4". httpd.apache.org. Retrieved 2021-01-13.
  10. "core - Apache HTTP Server Version 2.4 § Options Directive". httpd.apache.org. Retrieved 2021-01-13.
  11. "IBM Docs". IBM. 2021-03-08. Retrieved 2021-05-07.
  12. "A6:2017-Security Misconfiguration". OWASP. Retrieved 2021-05-07.
  13. "Path Traversal". OWASP. Retrieved 2021-05-07.