RSS tracking

RSS tracking is the practice of measuring how an RSS feed is used, typically how often it is retrieved and by how many subscribers.

RSS family of web feed formats

RSS is a type of web feed which allows users and applications to access updates to online content in a standardized, computer-readable format. These feeds can, for example, allow a user to keep track of many different websites in a single news aggregator. The news aggregator will automatically check the RSS feed for new content, allowing the content to be automatically passed from website to website or from website to user. This passing of content is called web syndication. Websites usually use RSS feeds to publish frequently updated information, such as blog entries, news headlines, or episodes of audio and video series. RSS is also used to distribute podcasts. An RSS document includes full or summarized text, and metadata, like publishing date and author's name.
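As a sketch of the structure described above, the following parses a minimal, hand-written RSS 2.0 document (a hypothetical feed, for illustration) and pulls out the channel title and each item's title and publication date using Python's standard library:

```python
import xml.etree.ElementTree as ET

# A minimal RSS 2.0 document (hypothetical feed, for illustration).
RSS_DOC = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <link>https://blog.example.com/</link>
    <description>Posts from an example blog.</description>
    <item>
      <title>Hello, world</title>
      <link>https://blog.example.com/hello</link>
      <pubDate>Mon, 01 Jan 2024 09:00:00 GMT</pubDate>
      <author>editor@example.com</author>
      <description>A summary of the first post.</description>
    </item>
  </channel>
</rss>"""

def summarize(feed_xml):
    """Return (channel title, [(item title, pubDate), ...]) for an RSS document."""
    channel = ET.fromstring(feed_xml).find("channel")
    items = [(i.findtext("title"), i.findtext("pubDate"))
             for i in channel.findall("item")]
    return channel.findtext("title"), items

title, items = summarize(RSS_DOC)
print(title)        # Example Blog
print(items[0][0])  # Hello, world
```

A news aggregator does essentially this on a schedule: fetch the document over HTTP, parse it, and compare item metadata against what it has already seen.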

History

RSS feeds have been in use since 1999, including as a channel for internet marketing. Unlike other forms of publishing information on the internet, however, the usage of RSS feeds is difficult to track, and feed tracking methods have been growing in popularity as a result.

Technology

Several methods currently exist for tracking RSS feeds, each with its own accuracy problems.

Method 1

Transparent 1×1 pixel images - A tiny image hosted on the publisher's web server can be embedded within the content of the RSS feed. The number of requests made for the image, measured from the web server log files, gives a rough estimate of how many times the RSS feed has been viewed.

The problem with this method is that not all RSS feed aggregators will display images and parse HTML.
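The method amounts to appending a beacon image tag to each item's HTML description. A minimal sketch, assuming a hypothetical beacon endpoint (`rss-pixel.gif`) on the publisher's server and a per-item identifier so the log files can distinguish which item was viewed:

```python
from html import escape

# Hypothetical beacon endpoint; requests for it show up in the web
# server's access log, where they can be counted per item.
BEACON_URL = "https://www.example.com/rss-pixel.gif"

def item_html(body_html, item_id):
    """Append a transparent 1x1 tracking image to an item's HTML description.

    Aggregators that parse HTML and render images will request the image
    once per view; text-only readers simply ignore the tag (which is
    exactly the accuracy problem noted above).
    """
    pixel = ('<img src="%s?item=%s" width="1" height="1" alt="" />'
             % (BEACON_URL, escape(item_id, quote=True)))
    return body_html + pixel

print(item_html("<p>Hello, world</p>", "post-42"))
```

Counting views then reduces to counting `rss-pixel.gif` requests in the access log, grouped by the `item` query parameter.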

Method 2

Third-party services - There are services available on the Internet that will syndicate your RSS feed and then track all requests made to their syndication of your RSS feed. These services come in both free and paid forms.

The problem with this method is that all analytical data about the feed is controlled by the service provider, and so is not easily accessible or transferable.

Method 3

Unique URL per feed - This method requires substantial server-side programming to automatically generate a different RSS feed URL for each visitor to the website. The visitor's RSS feed activity can then be tracked accurately using standard web analytics applications.

The problem with this method is that if the feed is syndicated further, by a search engine for instance, many people could view the RSS feed via a single URL, defeating the purpose of per-visitor URLs.
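The per-visitor URL scheme can be sketched as minting a random token for each visitor and serving the same feed content at a token-specific path. The URL layout and in-memory store below are illustrative assumptions; a real site would persist the token-to-visitor mapping in a database:

```python
import secrets

# In-memory stand-in for a persistent store mapping tokens to visitors.
_token_to_visitor = {}

def feed_url_for(visitor_id, base="https://www.example.com/feed"):
    """Mint a unique feed URL for one visitor (hypothetical URL scheme).

    Because no two visitors share a URL, requests for it can be
    attributed to that visitor by ordinary web analytics tools.
    """
    token = secrets.token_urlsafe(16)
    _token_to_visitor[token] = visitor_id
    return "%s/%s.xml" % (base, token)

url_a = feed_url_for("visitor-1")
url_b = feed_url_for("visitor-2")
assert url_a != url_b  # every visitor receives a distinct URL
```

The weakness described above follows directly from this design: once one token URL is republished elsewhere, all requests for it still map to a single visitor record.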

Method 4

Estimating the number of subscribers from log files - Some aggregators (for example, Bloglines and Google Reader) report, in the HTTP request, the number of unique users on whose behalf the feed is being downloaded. Other readers, such as web browsers, can be counted by noting the number of unique IP addresses that retrieve the file in a given period.

This provides an estimate of actual readership, though probably an overestimate: people may hold accounts with multiple aggregators and never delete old subscriptions, they may read the same feeds from different computers, and the same computer may appear under different IP addresses at different times.
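The estimation above can be sketched as a log-analysis pass: sum the subscriber counts that aggregators report in their User-Agent strings, and add one per unique IP address for clients that report no count. The IP addresses are reserved documentation addresses and the User-Agent strings are modeled on the "N subscribers" convention such aggregators used; both are illustrative assumptions:

```python
import re

# Sample access-log excerpts (hypothetical IPs; User-Agent formats modeled
# on the "N subscribers" convention used by aggregators such as Bloglines).
LOG_LINES = [
    ('203.0.113.5',  'Bloglines/3.1 (http://www.bloglines.com; 510 subscribers)'),
    ('198.51.100.7', 'Feedfetcher-Google; (+http://www.google.com/feedfetcher.html; 42 subscribers)'),
    ('192.0.2.10',   'Mozilla/5.0 (X11; Linux x86_64) Firefox/115.0'),
    ('192.0.2.10',   'Mozilla/5.0 (X11; Linux x86_64) Firefox/115.0'),
    ('192.0.2.11',   'Mozilla/5.0 (Macintosh) Safari/605.1'),
]

SUBSCRIBERS = re.compile(r'(\d+)\s+subscribers')

def estimate_readers(lines):
    """Sum reported aggregator subscriber counts, plus the unique IP
    addresses of direct readers (e.g. browsers) that report no count."""
    total, direct_ips = 0, set()
    for ip, user_agent in lines:
        m = SUBSCRIBERS.search(user_agent)
        if m:
            total += int(m.group(1))
        else:
            direct_ips.add(ip)
    return total + len(direct_ips)

print(estimate_readers(LOG_LINES))  # 554  (510 + 42 aggregated, 2 unique IPs)
```

The double-counting sources described above are visible here: the same person could appear once inside an aggregator's reported total and again as a direct IP address.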
