Proxy auto-config

A proxy auto-config (PAC) file defines how web browsers and other user agents can automatically choose the appropriate proxy server (access method) for fetching a given URL.

A PAC file contains a JavaScript function FindProxyForURL(url, host). This function returns a string with one or more access method specifications. These specifications cause the user agent to use a particular proxy server or to connect directly. [1]

Multiple specifications provide a fallback when a proxy fails to respond. The browser fetches this PAC file before requesting other URLs. The URL of the PAC file is either configured manually or determined automatically by the Web Proxy Auto-Discovery Protocol.

Context

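Modern web browsers implement several levels of proxy automation, and users can choose the level appropriate to their needs. Commonly implemented methods are a manually specified proxy server, a manually specified PAC file URL, and automatic discovery of the PAC file through the Web Proxy Auto-Discovery Protocol (WPAD).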

History

The proxy auto-config file format was originally designed by Netscape in 1996 for Netscape Navigator 2.0. [2] A PAC file is a text file that defines at least one JavaScript function.

The PAC File

By convention, the PAC file is normally named proxy.pac. The WPAD standard uses wpad.dat. The .pac file is expected to contain at least one function:

FindProxyForURL(url, host), with two arguments and return value in specific format:
* url is the URL of the object
* host is the host-name derived from that URL. Syntactically it is the same string as between :// and the first : or / after that. [3]
* return "..." is a string of signatures in the following format (see examples below): [note 1]
returnValue = type host, ":", port, [{ ";", returnValue }];
type        = "DIRECT" | "PROXY" | "SOCKS" | "HTTP" | "HTTPS" | "SOCKS4" | "SOCKS5"
host        = UTF16String       (* ECMA262-compatible UTF16 string *)
port        = UTF16String       (* Digits *)
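To make the return format concrete, the following sketch splits a return string into its individual access methods, in the order in which a user agent would try them. It is an illustration only: the name parseProxyList is hypothetical and not part of any PAC API, and real user agents may be more or less lenient than this. The sketch accepts DIRECT without a host and port, as the examples in this article do.

```javascript
// Hypothetical helper, not part of any PAC API: split a FindProxyForURL
// return value into { type, host, port } entries, preserving their order.
function parseProxyList(returnValue) {
  var known = ['DIRECT', 'PROXY', 'SOCKS', 'HTTP', 'HTTPS', 'SOCKS4', 'SOCKS5'];
  var entries = [];
  var parts = returnValue.split(';');
  for (var i = 0; i < parts.length; i++) {
    var spec = parts[i].replace(/^\s+|\s+$/g, ''); // trim whitespace
    if (spec === '') continue;                      // tolerate a trailing ';'
    var fields = spec.split(/\s+/);
    var type = fields[0];
    if (known.indexOf(type) === -1) {
      throw new Error('Unknown access method: ' + type);
    }
    if (type === 'DIRECT') {
      entries.push({ type: type, host: null, port: null });
      continue;
    }
    var hostPort = fields[1] || '';
    var colon = hostPort.lastIndexOf(':');
    if (colon < 1) {
      throw new Error('Expected host:port in: ' + spec);
    }
    entries.push({
      type: type,
      host: hostPort.slice(0, colon),
      port: hostPort.slice(colon + 1) // kept as a string of digits
    });
  }
  return entries;
}

var list = parseProxyList('PROXY proxy.example.com:8080; DIRECT');
// list[0] describes the proxy; list[1] is the DIRECT fallback.
```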


To use it, a PAC file is published to an HTTP server, and client user agents are instructed to use it, either by entering the URL in the proxy connection settings of the browser or through the WPAD protocol. The URL may also reference a local file, for example file:///etc/proxy.pac.

Even though most clients will process the script regardless of the MIME type returned in the HTTP reply, for the sake of completeness and to maximize compatibility, the HTTP server should be configured to declare the MIME type of this file to be either application/x-ns-proxy-autoconfig or application/x-javascript-config.

There is little evidence to favor the use of one MIME type over the other. It would be, however, reasonable to assume that application/x-ns-proxy-autoconfig will be supported in more clients than application/x-javascript-config as it was defined in the original Netscape specification, the latter type coming into use more recently.
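As a rough illustration of declaring the MIME type, the sketch below uses the Node.js http module to answer every request with a PAC file and the application/x-ns-proxy-autoconfig type. The handler name servePac, the in-memory PAC body and the port number are assumptions made for this example; a production setup would more likely configure the MIME type in an existing web server.

```javascript
var http = require('http');

// A simple PAC body, held in memory to keep the sketch self-contained.
var PAC_BODY =
  "function FindProxyForURL(url, host) {\n" +
  "  return 'PROXY proxy.example.com:8080; DIRECT';\n" +
  "}\n";

// Hypothetical request handler: always responds with the PAC file and the
// MIME type defined in the original Netscape specification.
function servePac(req, res) {
  res.writeHead(200, {
    'Content-Type': 'application/x-ns-proxy-autoconfig',
    'Content-Length': Buffer.byteLength(PAC_BODY)
  });
  res.end(PAC_BODY);
}

var server = http.createServer(servePac);
// Port 8000 is an arbitrary choice; uncomment to actually start listening.
// server.listen(8000);
```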

Example

A very simple example of a PAC file is:

function FindProxyForURL(url, host) {
  return 'PROXY proxy.example.com:8080; DIRECT';
}

This function instructs the browser to retrieve all pages through the proxy on port 8080 of the server proxy.example.com. Should this proxy fail to respond, the browser contacts the website directly, without using a proxy. The latter may fail if firewalls, or other intermediary network devices, reject requests from sources other than the proxy, which is a common configuration in corporate networks.

A more complicated example demonstrates some available JavaScript functions to be used in the FindProxyForURL function:

function FindProxyForURL(url, host) {
  // our local URLs from the domains below example.com don't need a proxy:
  if (shExpMatch(host, '*.example.com')) {
    return 'DIRECT';
  }
  // URLs within this network are accessed through
  // port 8080 on fastproxy.example.com:
  if (isInNet(host, '10.0.0.0', '255.255.248.0')) {
    return 'PROXY fastproxy.example.com:8080';
  }
  // All other requests go through port 8080 of proxy.example.com.
  // should that fail to respond, go directly to the WWW:
  return 'PROXY proxy.example.com:8080; DIRECT';
}
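Helper functions such as shExpMatch and isInNet exist only inside a user agent's PAC engine. To try out rules elsewhere (for example with Node.js), one option is to write simplified stand-ins like the ones sketched below. These are assumptions for testing, not the real implementations: the shExpMatch stand-in supports only the * and ? wildcards, and the isInNet stand-in only compares IPv4 literals, whereas the real helper also resolves host names. The helper ipToInt is a hypothetical name introduced here.

```javascript
// Simplified stand-ins for two PAC helpers, for testing outside a browser.
// Remove them from any file that is deployed as a real PAC script.

// Shell-expression match supporting only '*' and '?'.
function shExpMatch(str, pattern) {
  var re = pattern
    .replace(/[.+^${}()|[\]\\]/g, '\\$&') // escape regex metacharacters
    .replace(/\*/g, '.*')
    .replace(/\?/g, '.');
  return new RegExp('^' + re + '$').test(str);
}

// Hypothetical helper: dotted IPv4 literal to a non-negative number, or null.
function ipToInt(ip) {
  var m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(ip);
  if (!m) return null;
  var n = 0;
  for (var i = 1; i <= 4; i++) {
    var octet = Number(m[i]);
    if (octet > 255) return null;
    n = n * 256 + octet;
  }
  return n;
}

// Unlike the real helper, this stand-in does not resolve host names: it
// returns false unless host is already an IPv4 literal.
function isInNet(host, pattern, mask) {
  var h = ipToInt(host), p = ipToInt(pattern), m = ipToInt(mask);
  if (h === null || p === null || m === null) return false;
  // Keep only the network bits of both addresses and compare them.
  return ((h & m) >>> 0) === ((p & m) >>> 0);
}

// The mask 255.255.248.0 covers 10.0.0.0 through 10.0.7.255:
var inside = isInNet('10.0.3.7', '10.0.0.0', '255.255.248.0');   // true
var outside = isInNet('10.0.8.1', '10.0.0.0', '255.255.248.0');  // false
```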

By default, the PROXY keyword means that a proxy corresponding to the protocol of the original request, be it http, https, or ftp, is used. Other supported keywords and proxy types include:

SOCKS
Use a SOCKS proxy.
HTTP, HTTPS
Introduced in more recent versions of Firefox. Specifies an HTTP(S) proxy.
SOCKS4, SOCKS5
Introduced in more recent versions of Firefox. Specifies the SOCKS protocol version.
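The sketch below shows how these keywords might appear in a return value. The host names and ports are placeholders, and since the newer keywords are described above as Firefox additions, support in other user agents is not assumed here and should be checked before relying on them.

```javascript
// Placeholder host names and ports; not real services.
function FindProxyForURL(url, host) {
  // Plain string test on the URL scheme.
  if (url.substring(0, 4) === 'ftp:') {
    // Send FTP requests through a SOCKS version 5 proxy.
    return 'SOCKS5 socks.example.com:1080; DIRECT';
  }
  // Everything else: try an HTTPS proxy, then the generic PROXY keyword,
  // then a direct connection.
  return 'HTTPS secure-proxy.example.com:443; PROXY proxy.example.com:8080; DIRECT';
}
```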

Limitations

PAC Character-Encoding

The encoding of PAC scripts is generally unspecified, and different browsers and network stacks have different rules for how PAC scripts may be encoded. In general, wholly ASCII PAC scripts will work with any browser or network stack. Mozilla Firefox 66 and later additionally supports PAC scripts encoded as UTF-8. [4]
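Because only ASCII is described as working everywhere, a deployment step could check a PAC script for non-ASCII characters before publishing it. The following is a minimal sketch of such a check; the function names isAsciiOnly and firstNonAscii are hypothetical.

```javascript
// Hypothetical pre-deployment check: true if the text contains only
// 7-bit ASCII characters.
function isAsciiOnly(text) {
  return /^[\x00-\x7F]*$/.test(text);
}

// Index of the first non-ASCII character, or -1 if there is none.
function firstNonAscii(text) {
  for (var i = 0; i < text.length; i++) {
    if (text.charCodeAt(i) > 0x7F) return i;
  }
  return -1;
}

var safe = isAsciiOnly("function FindProxyForURL(url, host) { return 'DIRECT'; }");
var risky = isAsciiOnly('// caf\u00e9 network rules');
```

With Node.js, the text could be read with fs.readFileSync('proxy.pac', 'latin1'), so that every byte of the file maps to exactly one character before the check runs.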

DnsResolve

The function dnsResolve (and similar other functions) performs a DNS lookup that can block the browser for a long time if the DNS server does not respond.
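One way to limit the impact is to run cheap string tests first and to call dnsResolve at most once per invocation, reusing the result. The sketch below illustrates that ordering. The dnsResolve defined here is only a stand-in that counts calls so the example runs outside a browser; the host names, the address values and the .example.org suffix rule are assumptions.

```javascript
// Stand-in so the sketch runs outside a browser: it counts calls instead of
// querying DNS. A real PAC engine provides its own dnsResolve.
var lookups = 0;
function dnsResolve(host) {
  lookups++;
  return host === 'intranet.example.com' ? '10.0.1.20' : '203.0.113.10';
}

function FindProxyForURL(url, host) {
  // 1. Cheap string tests first: these cause no DNS traffic.
  if (host.indexOf('.') === -1) {
    return 'DIRECT'; // plain host name such as "printer"
  }
  var suffix = '.example.org';
  if (host.slice(-suffix.length) === suffix) {
    return 'DIRECT';
  }

  // 2. Resolve at most once and reuse the result for every later test.
  var ip = dnsResolve(host);
  if (ip && ip.indexOf('10.') === 0) {
    return 'PROXY fastproxy.example.com:8080';
  }
  return 'PROXY proxy.example.com:8080; DIRECT';
}
```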

myIpAddress

The myIpAddress function has often been reported to give incorrect or unusable results, e.g. 127.0.0.1, the IP address of the localhost. [5] It may help to remove from the system's hosts file (e.g. /etc/hosts on Linux) any lines referring to the machine's host name, while the line 127.0.0.1 localhost can, and should, stay. [citation needed]
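A PAC script can also guard against this case itself by treating a loopback result as "unknown" and falling back to a rule that works from any network. The sketch below assumes this approach. The myIpAddress defined here is a stand-in that simulates the problematic result, isUnusableLocalAddress is a hypothetical helper, and the proxy names and the 10.x rule are placeholders.

```javascript
// Stand-in for the PAC helper; here it simulates the problematic case.
function myIpAddress() {
  return '127.0.0.1';
}

// Hypothetical helper: true if the reported address cannot be used to
// decide which network the client is on.
function isUnusableLocalAddress(ip) {
  return !ip || ip.indexOf('127.') === 0 || ip === '::1';
}

function FindProxyForURL(url, host) {
  var ip = myIpAddress();
  if (isUnusableLocalAddress(ip)) {
    // Location unknown: use a rule that works from anywhere.
    return 'PROXY proxy.example.com:8080; DIRECT';
  }
  if (ip.indexOf('10.') === 0) {
    // Client appears to be on the internal network.
    return 'PROXY fastproxy.example.com:8080';
  }
  return 'DIRECT';
}
```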

Security

In 2013, researchers began warning about the security risks of proxy auto-config. [6] The threat involves using a PAC, discovered automatically by the system, to redirect the victim's browser traffic to an attacker-controlled server instead.

Another issue is that PAC files are typically retrieved over clear-text HTTP, with no security features such as code signing or web certificates, so attackers can easily perform man-in-the-middle attacks.

Old Microsoft problems

Caching of proxy auto-configuration results by domain name in Microsoft's Internet Explorer 5.5 or newer limits the flexibility of the PAC standard. In effect, you can choose the proxy based on the domain name, but not on the path of the URL. Alternatively, you need to disable caching of proxy auto-configuration results by editing the registry, a process described by de Boyne Pollard (listed in further reading).

It is recommended to always use IP addresses instead of host domain names in the isInNet function for compatibility with other Windows components which make use of the Internet Explorer PAC configuration, such as .NET 2.0 Framework. For example,

if (isInNet(host, dnsResolve(sampledomain), '255.255.248.0')) {} // .NET 2.0 will resolve proxy properly
if (isInNet(host, sampledomain, '255.255.248.0')) {} // .NET 2.0 will not resolve proxy properly

The current convention is to fail over to direct connection when a PAC file is unavailable.

Shortly after switching between network configurations (e.g. when entering or leaving a VPN), dnsResolve may give outdated results due to DNS caching.

For instance, Firefox usually keeps 20 domain entries cached for 60 seconds. This may be configured via the network.dnsCacheEntries and network.dnsCacheExpiration configuration variables. Flushing the system's DNS cache may also help, which can be achieved e.g. in Linux with sudo service dns-clean start or in Windows with ipconfig /flushdns.

On Internet Explorer 9, isInNet('localHostName', 'second.ip', '255.255.255.255') returns true and can be used as a workaround.

The myIpAddress function assumes that the device has a single IPv4 address. The results are undefined if the device has more than one IPv4 address or has IPv6 addresses.

Others

Further limitations are related to the JavaScript engine on the local machine.

Advanced functionality

More advanced PAC files can reduce load on proxies, perform load balancing, fail over, or even black/white listing before the request is sent through the network. One can return multiple proxies:

return 'PROXY proxy1.example.com:80; PROXY proxy2.example.com:8080';

The above will try proxy1 first and if unavailable it will then try proxy2.
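Load balancing can be approximated by varying which proxy is listed first. The sketch below picks the first proxy from a simple hash of the host name, so that requests for one host consistently go to the same proxy while the other remains as a fallback. The hash function hostHash is an illustrative choice made for this example, not a standard technique defined by the PAC format.

```javascript
// The two proxies from the example above.
var PROXIES = ['proxy1.example.com:80', 'proxy2.example.com:8080'];

// Simple deterministic string hash (sum of character codes). Any stable
// hash would do; this one is chosen only for brevity.
function hostHash(host) {
  var sum = 0;
  for (var i = 0; i < host.length; i++) {
    sum = (sum + host.charCodeAt(i)) % 65536;
  }
  return sum;
}

function FindProxyForURL(url, host) {
  // Choose the starting proxy from the hash, then list the rest as fallbacks.
  var first = hostHash(host) % PROXIES.length;
  var order = [];
  for (var i = 0; i < PROXIES.length; i++) {
    order.push('PROXY ' + PROXIES[(first + i) % PROXIES.length]);
  }
  return order.join('; ');
}
```

Because the choice depends only on the host name, repeated requests to the same site use the same proxy first, which may help proxy-side caching.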

Notes

  1. EBNF by W3C notation

References

  1. "Proxy Auto-Configuration (PAC) file - HTTP | MDN". developer.mozilla.org. 2023-04-23. Retrieved 2023-07-02.
  2. "Navigator Proxy Auto-Config File Format". Netscape Navigator Documentation. March 1996. Archived from the original on 2007-06-02. Retrieved 2013-07-05.
  3. "Proxy Auto-Configuration (PAC) file - HTTP | MDN". 23 April 2023.
  4. "Bug 1492938 - Proxy autoconfig scripts should be loaded as UTF-8 if they are valid UTF-8, otherwise as Latin-1 (a byte is a code point)". Retrieved 2019-04-10.
  5. "Bug 347307 - Need a way to determine the best local IP address for PAC files to use". Retrieved 2022-04-18.
  6. Lemos, Robert (2013-03-06). "Cybercriminals Likely To Expand Use Of Browser Proxies". Retrieved 2016-04-20.

Further reading

de Boyne Pollard, Jonathan (2004). "Automatic proxy HTTP server configuration in web browsers". Frequently Given Answers. Retrieved 2013-07-05.