Filename extension | .zip |
---|---|
Developed by | |
Initial release | 27 September 2006 |
Type of format | Transit schedule format |
Extended from | CSV |
Standard | De facto standard |
Open format? | Yes, CC BY 3.0 |
Website | gtfs |
GTFS or the General Transit Feed Specification defines a common data format for public transportation schedules and associated geographic information. [1] GTFS contains only static or scheduled information about public transport services, and is sometimes known as GTFS Static or GTFS Schedule to distinguish it from the GTFS Realtime extension, which defines how information on the realtime status of services can be shared. [1] [2]
What was to become GTFS started out in 2005 as a side project of Google employee Chris Harrelson, who "monkeyed around with ways to incorporate transit data into Google Maps when he heard from Tim and Bibiana McHugh, married IT managers at TriMet, the transit agency for Portland, Oregon". [3] McHugh is cited as having been frustrated about finding transit directions in unfamiliar cities, while popular mapping services were already offering easy-to-use driving directions at the time. [4]
Bibiana and Tim McHugh eventually got in contact with Google and provided the company with CSV exports of TriMet's schedule data. In December 2005, Portland became the first city to be featured in the first version of Google's "Transit Trip Planner". [5] In September 2006, five more US cities were added to the Google Transit Trip Planner, and the data format was released as the Google Transit Feed Specification. [6]
Before GTFS, the United States had no standard for public transit timetables, not even a de facto one. According to long-time BART website manager Timothy Moore, BART previously had to provide different data consumers with different formats, which made a standardized transit format very desirable. [3] The publicly and freely available format specification, as well as the availability of GTFS schedules, quickly led developers to base their transit-related software on the format. This resulted in "hundreds of useful and popular transit applications" [4] as well as catalogues listing available GTFS feeds. Because those applications adhere to a common data format, solutions do not need to be custom-tailored to one transit operator, but can easily be extended to any region where a GTFS feed is available.
Due to the wide use of the format, the "Google" part of the original name was seen as a misnomer "that makes some potential users shy away from adopting GTFS". As a consequence, in 2009 it was proposed to rename the specification the General Transit Feed Specification. [7]
GTFS is typically used to supply data on public transit for use in multi-modal journey planner applications. In most cases, GTFS is combined with a detailed representation of the street/pedestrian network to allow routing to take place from point to point rather than just between stops. This data is often extended using GTFS-Realtime to factor delays, cancellations, and modified trips into realtime journey planning queries. OpenTripPlanner is open-source software that can do journey planning with a combination of GTFS and OpenStreetMap data. [8] Other general purpose applications exist such as the ArcMap Network Analyst extension which can incorporate GTFS for transit routing. [9]
GTFS was originally designed for use in Google Transit, an online multi-modal journey planning application.
GTFS is often used in research on transit accessibility, typically to estimate travel times by transit from one point to many other points at different times of day. [10] [11] Studies have, however, called such applications into question because they rely on schedules alone, without accounting for reliability issues and regular schedule non-adherence. [12]
GTFS has been used to measure changes in accessibility due to changes in transit service provision, either actual [13] or proposed. [14] Analysis of changes in service over time can be accomplished by simply comparing published GTFS data for the same agency from different time periods. For comparison of existing service with proposed infrastructure or service changes, a future GTFS must often be constructed by hand based on proposed service characteristics. [14]
Public GTFS feeds have been aggregated in a variety of feed registries:
A GTFS feed is a collection of at least six and up to 13 CSV files (with the extension .txt) contained within a .zip file. The preferred character encoding is UTF-8. Together, the related CSV tables describe a transit system's scheduled operations as visible to riders. The specification is designed to be sufficient to provide trip planning functionality, but is also useful for other applications such as analysis of service levels and some general performance measures. In contrast to European transit industry exchange standards such as Transmodel or VDV-45X, GTFS only includes scheduled operations that are meant to be distributed to riders. It does not include real-time information, although real-time information can be related to GTFS schedules through the GTFS Realtime specification. [1] [2]
Following are descriptions of the tables required for a valid GTFS data feed. Each table is literally a text CSV file whose filename is the name of the table, suffixed by '.txt'. So for the 'agency' table below, a CSV file called 'agency.txt' would be included in a valid GTFS feed.
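Because each table is simply a CSV file named after the table inside a .zip archive, a feed can be loaded with standard-library tools alone. The following is a minimal sketch in Python, not a reference implementation: the function names `read_gtfs_tables` and `make_demo_feed` are hypothetical, the demo feed is made up and far from complete, and the sketch assumes the .txt files sit at the root of the archive.

```python
import csv
import io
import zipfile


def read_gtfs_tables(source):
    """Return {table_name: [row_dict, ...]} for every .txt file in a GTFS zip.

    `source` may be a filesystem path or a binary file-like object.
    """
    tables = {}
    with zipfile.ZipFile(source) as archive:
        for name in archive.namelist():
            if not name.endswith(".txt"):
                continue
            with archive.open(name) as raw:
                # utf-8-sig decodes UTF-8 and tolerates an optional byte-order mark.
                text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
                # "agency.txt" becomes the table name "agency".
                tables[name[:-len(".txt")]] = list(csv.DictReader(text))
    return tables


def make_demo_feed():
    """Build a tiny, made-up archive in memory (not a complete valid feed)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(
            "agency.txt",
            "agency_name,agency_url,agency_timezone\n"
            "Demo Transit,https://example.com,America/Los_Angeles\n",
        )
    buffer.seek(0)
    return buffer


demo_tables = read_gtfs_tables(make_demo_feed())
print(demo_tables["agency"][0]["agency_name"])  # Demo Transit
```

Rows are kept as dictionaries of strings, which mirrors the CSV files directly; a real consumer would likely convert dates, times and coordinates to typed values after loading.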
The agency table provides information about the transit agency as such, including name, website and contact information.
Required fields:
The routes table identifies distinct routes. This is to be distinguished from distinct routings (or paths), several of which may belong to a single route.
Required fields:
Required fields:
Optional fields:
Required fields:
Note that dwell time may be modelled by the difference between the arrival and departure times. However, many agencies do not seem to model dwell time for most stops.
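The dwell-time calculation can be sketched as follows, assuming rows with the `arrival_time` and `departure_time` fields of the stop_times table, given as HH:MM:SS strings. The helper names are hypothetical and the sample rows are invented. GTFS times may exceed 24:00:00 for trips that run past midnight, so the sketch converts them to plain seconds rather than clock times.

```python
def gtfs_time_to_seconds(value):
    """Convert a GTFS HH:MM:SS string to seconds from the start of the service day.

    Hours may be 24 or more for trips running past midnight, so
    datetime.time is deliberately not used here.
    """
    hours, minutes, seconds = (int(part) for part in value.strip().split(":"))
    return hours * 3600 + minutes * 60 + seconds


def dwell_seconds(stop_time_row):
    """Scheduled dwell at one stop: departure_time minus arrival_time."""
    return (gtfs_time_to_seconds(stop_time_row["departure_time"])
            - gtfs_time_to_seconds(stop_time_row["arrival_time"]))


# Invented sample rows; the second one belongs to a trip that runs past midnight.
rows = [
    {"stop_sequence": "1", "arrival_time": "08:00:00", "departure_time": "08:00:00"},
    {"stop_sequence": "2", "arrival_time": "25:10:30", "departure_time": "25:12:00"},
]
dwells = [dwell_seconds(row) for row in rows]
print(dwells)  # [0, 90]
```

A dwell of zero, as in the first row, corresponds to the common case mentioned above in which an agency gives identical arrival and departure times.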
The stops table defines the geographic location of every stop or station in the transit system and, optionally, some of the amenities associated with those stops.
Required fields:
The calendar table defines service patterns that operate recurrently, for example every weekday. Service patterns that do not repeat, such as those for a one-time special event, are defined in the calendar_dates table.
Required fields:
Calendar dates is an optional table which adds exceptions to the calendar.txt file. An exception can add or remove service on a given day, such as for holiday service. The file contains only three columns: the service id, the date, and the exception type (either added or removed). A service id does not have to be present in the calendar.txt file to be added to this table.
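The interaction between the two calendar tables can be illustrated with a short Python sketch. It assumes rows shaped like calendar.txt (`service_id`, one column per weekday, `start_date`, `end_date`) and calendar_dates.txt (`service_id`, `date`, `exception_type`, where 1 means service added and 2 means service removed), with dates written as YYYYMMDD. The function name and the sample data are hypothetical.

```python
from datetime import date

WEEKDAY_FIELDS = ["monday", "tuesday", "wednesday", "thursday",
                  "friday", "saturday", "sunday"]


def active_services(day, calendar_rows, calendar_date_rows):
    """Return the set of service_ids running on `day` (a datetime.date)."""
    stamp = day.strftime("%Y%m%d")
    weekday_field = WEEKDAY_FIELDS[day.weekday()]  # Monday is 0
    # Regular pattern: within the date range and flagged for this weekday.
    # YYYYMMDD strings compare correctly as plain text.
    active = {
        row["service_id"]
        for row in calendar_rows
        if row["start_date"] <= stamp <= row["end_date"]
        and row[weekday_field] == "1"
    }
    # Exceptions: type 1 adds service on the date, type 2 removes it.
    for row in calendar_date_rows:
        if row["date"] != stamp:
            continue
        if row["exception_type"] == "1":
            active.add(row["service_id"])
        elif row["exception_type"] == "2":
            active.discard(row["service_id"])
    return active


# Invented sample data: weekday service, replaced by a holiday service on one date.
calendar_rows = [{
    "service_id": "WEEKDAY", "monday": "1", "tuesday": "1", "wednesday": "1",
    "thursday": "1", "friday": "1", "saturday": "0", "sunday": "0",
    "start_date": "20240101", "end_date": "20241231",
}]
calendar_date_rows = [
    {"service_id": "WEEKDAY", "date": "20240704", "exception_type": "2"},
    {"service_id": "HOLIDAY", "date": "20240704", "exception_type": "1"},
]

print(active_services(date(2024, 7, 4), calendar_rows, calendar_date_rows))  # {'HOLIDAY'}
print(active_services(date(2024, 7, 5), calendar_rows, calendar_date_rows))  # {'WEEKDAY'}
```

Note that the `HOLIDAY` service in the sample appears only in the exceptions, which reflects the rule that a service id need not be listed in calendar.txt.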
Rules for drawing lines on a map to represent a transit organization's routes.
This table specifies headway (time between trips) for routes with variable frequency of service.
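As an illustration of how a headway entry relates to individual departures, the Python sketch below expands one row into departure times at the first stop. It assumes the columns `trip_id`, `start_time`, `end_time` and `headway_secs`, and it assumes that departures begin at `start_time` and repeat every `headway_secs` seconds up to, but not including, `end_time`; consumers may treat the end boundary differently. The helper names and the sample row are hypothetical.

```python
def to_seconds(value):
    """Convert HH:MM:SS (hours may exceed 23) to seconds."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_hms(total):
    """Format seconds back into a zero-padded HH:MM:SS string."""
    return f"{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"


def expand_frequency(row):
    """List first-stop departure times implied by one frequencies row.

    Assumption: the window is half-open, so end_time itself is excluded.
    """
    start = to_seconds(row["start_time"])
    end = to_seconds(row["end_time"])
    headway = int(row["headway_secs"])
    return [to_hms(t) for t in range(start, end, headway)]


# Invented sample row: a trip repeated every 20 minutes for one hour.
sample = {"trip_id": "T1", "start_time": "06:00:00",
          "end_time": "07:00:00", "headway_secs": "1200"}
departures = expand_frequency(sample)
print(departures)  # ['06:00:00', '06:20:00', '06:40:00']
```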
Rules for making connections at transfer points between routes.
An optional feed start date and an optional feed expiration date can be set. Agencies may publish feeds that only take effect several days in the future. Journey planning software applications therefore keep multiple feed versions and select the correct feed for a particular day or time.
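One possible way for an application to choose among several stored feed versions is sketched below in Python. It assumes each version carries the optional `feed_start_date` and `feed_end_date` values as YYYYMMDD strings and treats a missing date as an open-ended bound. The `name` key, the function names, and the tie-breaking rule (prefer the version with the latest start date) are assumptions made for this example, not part of the specification.

```python
from datetime import date


def feed_covers(feed_info, stamp):
    """True if `stamp` (YYYYMMDD) falls within the feed's optional date bounds."""
    start = feed_info.get("feed_start_date") or ""
    end = feed_info.get("feed_end_date") or ""
    return (not start or start <= stamp) and (not end or stamp <= end)


def pick_feed(versions, day):
    """Return the version covering `day`, or None if no version does.

    Assumption: when several versions overlap, the one that starts
    latest is taken to be the most recent and is preferred.
    """
    stamp = day.strftime("%Y%m%d")
    candidates = [v for v in versions if feed_covers(v, stamp)]
    if not candidates:
        return None
    return max(candidates, key=lambda v: v.get("feed_start_date") or "")


# Invented feed versions with overlapping validity periods.
versions = [
    {"name": "spring", "feed_start_date": "20240301", "feed_end_date": "20240531"},
    {"name": "summer", "feed_start_date": "20240520", "feed_end_date": "20240831"},
]
print(pick_feed(versions, date(2024, 5, 25))["name"])  # summer
print(pick_feed(versions, date(2024, 4, 1))["name"])   # spring
```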
translations
The translations table consists of the columns table_name, field_name, field_value, record_id, record_sub_id, language, and translation. Translations are broken down by the table they apply to, and any text field or URL may be translated. A row identifies the text to be translated in one of two ways. With record_id, the row refers to a record by its ID, such as a stop_id or trip_id. With field_value, the row matches every record whose field_name field has exactly that original value. Tables whose records are identified by a two-value tuple, such as stop_times, use record_id and record_sub_id together to represent the tuple. The translation column holds the translated text.
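The two lookup styles can be sketched in Python as follows. The function `find_translation` and all sample rows are hypothetical; the sketch assumes rows are dictionaries keyed by the column names listed above and that unused key columns are empty strings.

```python
def find_translation(rows, table_name, field_name, language,
                     record_id="", record_sub_id="", field_value=""):
    """Return the translated text for one field, or None if no row matches.

    Pass record_id (plus record_sub_id for two-part keys such as
    stop_times) to look up by record, or field_value to look up by the
    original text.
    """
    for row in rows:
        if (row["table_name"], row["field_name"], row["language"]) != (
                table_name, field_name, language):
            continue
        # Lookup by record: both parts of the key must agree.
        if (record_id and row.get("record_id") == record_id
                and (row.get("record_sub_id") or "") == record_sub_id):
            return row["translation"]
        # Lookup by original value of the field.
        if field_value and row.get("field_value") == field_value:
            return row["translation"]
    return None


# Invented sample rows covering the three cases described above.
rows = [
    {"table_name": "stops", "field_name": "stop_name", "language": "fr",
     "record_id": "S1", "record_sub_id": "", "field_value": "",
     "translation": "Gare Centrale"},
    {"table_name": "stops", "field_name": "stop_name", "language": "de",
     "record_id": "", "record_sub_id": "", "field_value": "Airport",
     "translation": "Flughafen"},
    {"table_name": "stop_times", "field_name": "stop_headsign", "language": "fr",
     "record_id": "T1", "record_sub_id": "3", "field_value": "",
     "translation": "Centre-ville"},
]

print(find_translation(rows, "stops", "stop_name", "fr", record_id="S1"))
# Gare Centrale
print(find_translation(rows, "stop_times", "stop_headsign", "fr",
                       record_id="T1", record_sub_id="3"))
# Centre-ville
```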
This article contains excerpts from "Opening Public Transit Data in Germany" by Stefan Kaufmann, which is available under a Creative Commons Attribution 3.0 unported license.