Jump to content

URI scheme

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Hex (talk | contribs) at 22:30, 23 August 2015 (Official IANA-registered schemes: Removing this entire nightmare of a section - see talk.). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

A URI scheme is the top level of the uniform resource identifier (URI) naming structure in computer networking. All URIs and absolute URI references are formed with a scheme name, followed by a colon character (":"), and the remainder of the URI called (in the outdated RFCs 1738 and 2396, but not the current STD 66/RFC 3986) the scheme-specific part. The syntax and semantics of the scheme-specific part are left largely to the specifications governing individual schemes, subject to certain constraints such as reserved characters and how to "escape" them.

URI schemes are frequently and incorrectly referred to as "protocols", or specifically as URI protocols or URL protocols, since most were originally designed to be used with a particular protocol, and often have the same name. The http scheme, for instance, is generally used for interacting with web resources using HyperText Transfer Protocol. Today, URIs with that scheme are also used for other purposes, such as RDF resource identifiers and XML namespaces, that are not related to the protocol. Furthermore, some URI schemes are not associated with any specific protocol (e.g. "file") and many others do not use the name of a protocol as their prefix (e.g. "news").

URI schemes should be registered with IANA, although non-registered schemes are used in practice. RFC 7595 describes the procedures for registering new URI schemes.

Generic syntax

Internet standard STD 66 (also RFC 3986) defines the generic syntax to be used in all URI schemes. Every URI is defined as consisting of four parts, as follows:

<scheme name> : <hierarchical part> [ ? <query> ] [ # <fragment> ]

The scheme name consists of a sequence of characters beginning with a letter and followed by any combination of letters, digits, plus ("+"), period ("."), or hyphen ("-"). Although schemes are case-insensitive, the canonical form is lowercase and documents that specify schemes must do so with lowercase letters. The scheme name is followed by a colon (":").

The hierarchical part of the URI is intended to hold identification information hierarchical in nature. If this part begins with a double forward slash ("//"), it is followed by an authority part and a path. If the hierarchical part doesn't begin with ("//") it contains only a path.

  • The authority part holds an optional user-information part, terminated with "@" (e.g. username:password@); a hostname (e.g., domain name or IP address); and an optional port number, preceded by a colon ":".
  • The path part, if present, may optionally begin with a single forward slash ("/"). It may not begin with two slash characters ("//"). The path is a sequence of segments (conceptually similar to directories, though not necessarily representing them) separated by a forward slash ("/"). Historically, each segment was specified to contain parameters separated from it using a semicolon (";"), though this was rarely used in practice and current specifications allow but no longer specify such semantics.

The query is an optional part, separated by a question mark ("?"), that contains additional identification information that is not hierarchical in nature. The syntax of the query string is not well defined; however by convention it is most often a sequence of <key>=<value> pairs separated by a semicolon[1][2][3] or an ampersand. For example:

Semicolon: key1=value1;key2=value2;key3=value3
Ampersand: key1=value1&key2=value2&key3=value3

The fragment is an optional part separated from the front parts by a hash ("#"). It holds additional identifying information that provides direction to a secondary resource, e.g., a section heading (in an article) identified by the remainder of the URI. When the primary resource is an HTML document, the fragment is often an id attribute of a specific element and web browsers will make sure this element is visible.

Examples

The following figure displays two example URIs (foo://username:password@example.com:8042/over/there/index.dtb?type=animal&name=narwhal#nose and urn:example:animal:ferret:nose) and their component parts. (The examples are derived from RFC 3986 — STD 66, chapter 3).

  foo://username:password@example.com:8042/over/there/index.dtb?type=animal&name=narwhal#nose
  \_/   \_______________/ \_________/ \__/            \___/ \_/ \______________________/ \__/
   |           |               |       |                |    |            |                |
   |       userinfo           host    port              |    |          query          fragment
   |    \________________________________/\_____________|____|/ \__/        \__/
 scheme                 |                          |    |    |    |          |
  name              authority                      |    |    |    |          |
   |                                             path   |    |    interpretable as keys
   |                                                    |    |
   |    \_______________________________________________|____|/       \____/     \_____/
   |                         |                          |    |          |           |
 scheme              hierarchical part                  |    |    interpretable as values
  name                                                  |    |
   |            path               interpretable as filename |
   |   ___________|____________                              |
  / \ /                        \                             |
  urn:example:animal:ferret:nose               interpretable as extension

                path
         _________|________
 scheme /                  \
  name  userinfo  hostname       query
  _|__   ___|__   ____|____   _____|_____
 /    \ /      \ /         \ /           \
 mailto:username@example.com?subject=Topic

Unofficial but common URI schemes

Scheme Purpose Defined by General format Notes
app URL scheme can be used by packaged applications to obtain resources that are inside a container. Google app://<application>/<path>

example: app://com.foo.bar/index.html

See more information on: [1] [2]
doi Digital object identifier, a digital identifier for any object of intellectual property. IETF Draft doi:10.<publisher number>/<suffix>
Example: doi:10.1000/182
Used e.g. for most scientific publications.
Can be resolved via HTTP (transformed into a URL) by prepending http://dx.doi.org/ or http://hdl.handle.net/ in front.
javascript Execute JavaScript code IETF Draft javascript:<javascript to execute>
jdbc Connect a database with Java Database Connectivity technology. Database vendor dependent jdbc:somejdbcvendor:other_data...

jdbc:oracle:oci:@host:port(sid or [/service])?params...

jdbc:sqlserver://serverName\instanceName:portNumber;params...

jdbc:mysql://host:port/database?params...

Requires a vendor provided connector (jar archive) to be included in the client library.
odbc Open Database Connectivity IETF Draft
stratum Connectivity URI for the Stratum protocol, used for proof-of-work coordination in pooled cryptocurrency mining. Stratum Protocol Draft stratum+tcp://server:port, stratum+udp://server:port This protocol has completely superseded the now-obsolete Getwork protocol,[4] and was created primarily to reduce network overhead as mining pool sizes inevitably scale upwards.[5]
vnc Virtual Network Computing IETF Draft vnc://[<host>[:<port>]][?<params>]
web+... Effectively namespaces web-based protocols from other, potentially less web-secure, protocols. This convention is defined within the HTML Living Standard specification web+<string of some lower-case alphabetic characters>: This convention is not associated with the registration of any new scheme but is currently a requirement as well as convention for non-whitelisted web-based protocols.

References

  1. ^ RFC 1866 section 8.2.1 : by Tim Berners-Lee in 1995 encourages CGI authors to support ';' in addition to '&'.
  2. ^ HTML 4.01 Specification: Implementation, and Design Notes: "CGI implementors support the use of ";" in place of "&" to save authors the trouble of escaping "&" characters in this manner."
  3. ^ Hypertext Markup Language - 2.0 "CGI implementors are encouraged to support the use of ';' in place of '&' "
  4. ^ Stratum, Stratum Protocol
  5. ^ Stratum mining protocol, ..the official documentation of lightweight bitcoin mining protocol.