Original author(s) | Bradford L. Barrett |
---|---|
Initial release | 1997 |
Stable release | 2.23-08 / August 26, 2013 |
Written in | C |
Operating system | Cross-platform |
Available in | Over 30 languages |
Type | Web analytics |
License | GNU General Public License |
Website | webalizer |
The Webalizer is a web log analysis software, which generates web pages of analysis, from access and usage logs. It is one of the most commonly used web server administration tools. It was initiated by Bradford L. Barrett in 1997. Statistics commonly reported by Webalizer include hits, visits, referrers, the visitors' countries, and the amount of data downloaded. These statistics can be viewed graphically and presented by different time frames, such as by day, hour, or month.
Website traffic analysis is produced by grouping and aggregating various data items captured by the web server in the form of log files while the website visitor is browsing the website. The Webalizer analyzes web server log files, extracting such items as client's IP addresses, URL paths, processing times, user agents, referrers, etc. and grouping them in order to produce HTML reports.
Web servers log HTTP traffic using different file formats. Common file formats are Common Log Format (CLF), the Apache Custom Log Format, and Extended Log File Format. An example of a CLF log line is shown below.
192.168.1.20 - - [26/Dec/2006:03:09:16 -0500] "GET HTTP/ 1.1" 200 1774
Apache Custom Log Format can be customized to log most HTTP parameters, including request processing time and the size of the request itself. The format of a custom log is controlled by the format line. A typical Apache log format configuration is shown below.
LogFormat"%a %l \"%u\" %t %m \"%U\" \"%q\" %p %>s %b %D \"%{Referer}i\" \"%{User-Agent}i\""my_custom_log CustomLoglogs/access_logmy_custom_log
Microsoft's Internet Information Services (IIS) web server logs HTTP traffic in W3C Extended Log File Format. Similarly to Apache Custom Log format, IIS logs may be configured to capture such extended parameters as request processing time. W3C extended logs may be recognized by the presence of one or more format lines, such as the one shown below.
#Fields: date time s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username c-ip cs(User-Agent) cs(Referer) sc-status sc-bytes cs-bytes time-taken
The Webalizer can process CLF, Apache and W3C Extended log files, as well as HTTP proxy log files produced by Squid servers. Other log file formats are usually converted to CLF in order to be analyzed. In addition, logs compressed with either GZip (.gz) or BZip2 (.bz2) can be processed directly without the need to uncompress before use.
The Webalizer is a command line application and is launched from the operating system shell prompt. A typical command is shown below.
webalizer -p -F clf -n en.wikipedia.org -o reports logfiles/access_log
This command instructs The Webalizer to analyze the log file access_log, run in the incremental mode (-p), interpret the log as a CLF log file (-F), use the domain name en.wikipedia.org for report links (-n) and produce the output subdirectory of the current directory. Use the -h option to see the complete list of command line options.
Besides the command line options, the Webalizer may be configured through parameters of a configuration file. By default, The Webalizer reads the file webalizer.conf and interprets each line as a processing instruction. Alternatively, a user-specified file may be provided using the -c option.
For example, if the webmaster would like to ignore all requests made from a particular group of hosts, he or she can use the IgnoreSite parameter to discard all log records with the IP address matching the specified pattern:
IgnoreSite 192.168.0.*
There are over one hundred available configuration parameters, which make The Webalizer a highly configurable web traffic analysis application. For a complete list of configuration parameters please refer to the README file shipped with every source or binary distribution.
By default, The Webalizer produces two kinds of reports - a yearly summary report and a detailed monthly report, one for each analyzed month.
The yearly summary report provides such information as the number of hits, file and page requests, hosts and visits, as well as daily averages of these counters for each month. The report is accompanied by a yearly summary graph.
Each of the monthly reports is generated as a single HTML page containing a monthly summary report (listing the overall number of hits, file and page requests, visits, hosts, etc.), a daily report (grouping these counters for each of the days of the month), an aggregated hourly report (grouping counters for the same hour of each day together), a URL report (grouping collected information by URL), a host report (by IP address), website entry and exit URL reports (showing most common first and last visit URLs), a referrer report (grouping the referring third-party URLs leading to the analyzed website), a search string report (grouping items by search terms used in such search engines as Google), a user agent report (grouping by the browser type) and a country report (grouping by the host's country of origin).
Each of the standard HTML reports described above lists only top entries for each item (e.g. top 20 URLs). The actual number of lines for each of the reports is controlled by configuration. The Webalizer may also be configured to produce a separate report for each of the items, which will list every single item, such as all website visitors, all requested URLs, etc.
In addition to HTML reports, The Webalizer may be configured to produce comma-delimited dump files, which list all of the report data in a plain-text file. Dump files may be imported to spreadsheet applications or databases for further analysis.
HTML reports may be produced reports in over 30 languages, including Catalan, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Greek, Hungarian, Icelandic, Indonesian, Italian, Japanese, Korean, Latvian, Malay, Norwegian, Polish, Portuguese, Portuguese (Brazil), Romanian, Russian, Serbian, Simplified Chinese, Slovak, Slovene, Spanish, Swedish, Turkish, Ukrainian.
To generate reports in an alternate language requires a separate webalizer binary compiled specifically for that language.
The World Wide Web is an information system that enables content sharing over the Internet through user-friendly ways meant to appeal to users beyond IT specialists and hobbyists. It allows documents and other web resources to be accessed over the Internet according to specific rules of the Hypertext Transfer Protocol (HTTP).
A web server is computer software and underlying hardware that accepts requests via HTTP or its secure variant HTTPS. A user agent, commonly a web browser or web crawler, initiates communication by making a request for a web page or other resource using HTTP, and the server responds with the content of that resource or an error message. A web server can also accept and store resources sent from the user agent if configured to do so.
In computer network communications, the HTTP 404, 404 not found, 404, 404 error, page not found, or file not found error message is a hypertext transfer protocol (HTTP) standard response code, to indicate that the browser was able to communicate with a given server, but the server could not find what was requested. The error may also be used when a server does not wish to disclose whether it has the requested information.
In computer networking, a proxy server is a server application that acts as an intermediary between a client requesting a resource and the server providing that resource. It improves privacy, security, and possibly performance in the process.
Inline linking is the use of a linked object, often an image, on one site by a web page belonging to a second site. One site is said to have an inline link to the other site where the object is located.
Server Side Includes (SSI) is a simple interpreted server-side scripting language used almost exclusively for the World Wide Web. It is most useful for including the contents of one or more files into a web page on a web server, using its #include
directive. This could commonly be a common piece of code throughout a site, such as a page header, a page footer and a navigation menu. SSI also contains control directives for conditional features and directives for calling external programs. It is supported by Apache, LiteSpeed, nginx, IIS as well as W3C's Jigsaw. It has its roots in NCSA HTTPd.
An .htaccess file is a directory-level configuration file supported by several web servers, used for configuration of website-access issues, such as URL redirection, URL shortening, access control, and more. The 'dot' before the file name makes it a hidden file in Unix-based environments.
A query string is a part of a uniform resource locator (URL) that assigns values to specified parameters. A query string commonly includes fields added to a base URL by a Web browser or other client application, for example as part of an HTML document, choosing the appearance of a page, or jumping to positions in multimedia content.
AWStats is an open source Web analytics reporting tool, suitable for analyzing data from Internet services such as web, streaming media, mail, and FTP servers. AWStats parses and analyzes server log files, producing HTML reports. Data is visually presented within reports by tables and bar graphs. Static reports can be created through a command line interface, and on-demand reporting is supported through a Web browser CGI program.
The Internet Server Application Programming Interface (ISAPI) is an n-tier API of Internet Information Services (IIS), Microsoft's collection of Windows-based web server services. The most prominent application of IIS and ISAPI is Microsoft's web server.
Web analytics is the measurement, collection, analysis, and reporting of web data to understand and optimize web usage. Web analytics is not just a process for measuring web traffic but can be used as a tool for business and market research and assess and improve website effectiveness. Web analytics applications can also help companies measure the results of traditional print or broadcast advertising campaigns. It can be used to estimate how traffic to a website changes after launching a new advertising campaign. Web analytics provides information about the number of visitors to a website and the number of page views, or creates user behavior profiles. It helps gauge traffic and popularity trends, which is useful for market research.
Apache Log4j is a Java-based logging utility originally written by Ceki Gülcü. It is part of the Apache Logging Services, a project of the Apache Software Foundation. Log4j is one of several Java logging frameworks.
A proxy auto-config (PAC) file defines how web browsers and other user agents can automatically choose the appropriate proxy server for fetching a given URL.
For computer log management, the Common Log Format, also known as the NCSA Common log format, is a standardized text file format used by web servers when generating server log files. Because the format is standardized, the files can be readily analyzed by a variety of web analysis programs, for example Webalizer and Analog.
HTTP 403 is an HTTP status code meaning access to the requested resource is forbidden. The server understood the request, but will not fulfill it, if it was correct.
The Netscape Server Application Programming Interface (NSAPI) is an application programming interface for extending server software, typically web server software.
In computing, logging is the act of keeping a log of events that occur in a computer system, such as problems, errors or just information on current operations. These events may occur in the operating system or in other software. A message or log entry is recorded for each such event. These log messages can then be used to monitor and understand the operation of the system, to debug problems, or during an audit. Logging is particularly important in multi-user software, to have a central overview of the operation of the system.
curl-loader is an open-source software performance testing tool written in the C programming language.
Extended Log Format (ELF) is a standardized text file format that is used by web servers when generating log files. In comparison to the Common Log Format (CLF), ELF provides more information and flexibility.
When an HTTP client requests a URL that points to a directory structure instead of an actual web page within the directory structure, the web server will generally serve a default page, which is often referred to as a main or "index" page.