Module:Neturl
| This module is rated as ready for general use. It has reached a mature state, is considered relatively stable and bug-free, and may be used wherever appropriate. It can be mentioned on help pages and other Wikipedia resources as an option for new users. To minimise server load and avoid disruptive output, improvements should be developed through sandbox testing rather than repeated trial-and-error editing. |
Module:Neturl contains functions to parse a URL with a querystring and build a new URL.
How to call this module from another module:
url = require "Module:Neturl"
-- or if you'd prefer not to use the syntactic sugar
url = require("Module:Neturl")
-- call a function
result = url.functionName(argument)
The functions, their arguments and expected results are listed in the table that follows. Examples for each function are given below the table.
| Function | Arguments | What to put in them | Example |
|---|---|---|---|
| `url.parse()` | `(url_string)` | A URL string to be parsed | `u = url.parse("http://www.example.com/test/?start=10")`<br>`print(u.scheme) -- http`<br>`print(u.host) -- www.example.com`<br>`print(u.path) -- /test/` |
| `:normalize()` | none (called on parsed URL object) | Operates on an existing url object | `u = url.parse("http://www.FOO.com:80///foo/../foo/./bar"):normalize()`<br>`print(u) -- http://www.foo.com/foo/bar` |
| `:resolve()` | `(relative_url)` | Relative path/URL string to resolve against base | `u = url.parse("http://a/b/c/d;p?q"):resolve("../../g")`<br>`print(u) -- http://a/g` |
| `__div` operator or `u:addSegment()` | `('segment')` | Path segment(s) to append | `u = url.parse("http://example.com")`<br>`u = u / "bands" / "AC/DC"`<br>`print(u) -- http://example.com/bands/AC%2FDC` |
| Module options (`url.options`) | key-value pairs | Adjust parsing/encoding behaviour (e.g. separator, legal chars) | `url = require "Module:Neturl"`<br>`url.options.legal_in_path["+"] = true` |
| `url.parseQuery()` | `(query_string)` | Query string to parse into a Lua table | `query = url.parseQuery("first=abc&a[]=123&a[]=false")`<br>`print(query.a[1]) -- 123` |
| `tostring(query)` | none (called on query table) | Converts parsed query table into a querystring | `query = url.parseQuery("first=abc&a[]=123&a[]=false")`<br>`print(query) -- a[1]=123&a[2]=false&first=abc` |
| `:setQuery()` | `({ table })` | Lua table of key/value pairs to become query params | `u = url.parse("http://www.example.com")`<br>`u:setQuery{ json = true, skip = 100 }`<br>`print(u) -- http://www.example.com/?json=true&skip=100` |
| Direct assignment to `u.query` | `u.query.key = value` | Set query parameters directly via object property | `u = url.parse("http://www.example.com")`<br>`u.query.foo = "bar"`<br>`print(u) -- http://www.example.com/?foo=bar` |
URL parser
The library converts a URL to a table of the elements described in the RFC: scheme, host, path, etc.
u = url.parse("http://www.example.com/test/?start=10")
print(u.scheme)
-- http
print(u.host)
-- www.example.com
print(u.path)
-- /test/
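To make the decomposition concrete, here is a minimal standalone sketch of how a URL can be peeled apart with Lua patterns, in the same order the module's source uses (fragment, scheme, query, authority, then path). `split_url` is a hypothetical helper written for illustration only. It is not part of Module:Neturl and skips validation, user info and port handling.

```lua
-- Simplified sketch; `split_url` is a hypothetical helper, not the module's API.
local function split_url(s)
  local parts = {}
  -- Fragment: everything after the first "#".
  s = s:gsub("#(.*)$", function(v) parts.fragment = v; return "" end)
  -- Scheme: an alphanumeric followed by alphanumerics, "+", "-" or ".", then ":".
  s = s:gsub("^([%w][%w%+%-%.]*):", function(v) parts.scheme = v:lower(); return "" end)
  -- Query: everything after the first "?".
  s = s:gsub("%?(.*)", function(v) parts.query = v; return "" end)
  -- Authority: the text between the leading "//" and the next "/".
  s = s:gsub("^//([^/]*)", function(v) parts.host = v; return "" end)
  -- Whatever remains is the path.
  parts.path = s
  return parts
end

local p = split_url("http://www.example.com/test/?start=10")
print(p.scheme) -- http
print(p.host)   -- www.example.com
print(p.path)   -- /test/
print(p.query)  -- start=10
```

Stripping each component from the string as it is recognised keeps the individual patterns short, since each later pattern only sees what is left.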
URL normalization
u = url.parse("http://www.FOO.com:80///foo/../foo/./bar"):normalize()
print(u)
-- http://www.foo.com/foo/bar
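Besides cleaning the path, the example above ends up with a lowercased host and without the default port `:80`. The following standalone sketch illustrates just those two steps. `normalize_authority` and the two-entry `default_ports` table are assumptions made for this illustration, not the module's API (the module keeps a longer list in `M.services`).

```lua
-- Hypothetical helper for illustration; not part of Module:Neturl.
local default_ports = { http = 80, https = 443 }

local function normalize_authority(scheme, host, port)
  -- Host names are case-insensitive, so lowercase them.
  host = host:lower()
  -- Only keep the port when it differs from the scheme's default.
  if port and default_ports[scheme] ~= port then
    return host .. ":" .. port
  end
  return host
end

print(normalize_authority("http", "www.FOO.com", 80))   -- www.foo.com
print(normalize_authority("http", "www.FOO.com", 8080)) -- www.foo.com:8080
```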
URL resolver
URL resolution follows the examples provided in [RFC 2396](http://tools.ietf.org/html/rfc2396#appendix-C).
u = url.parse("http://a/b/c/d;p?q"):resolve("../../g")
print(u)
-- http://a/g
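The path part of such a resolution can be sketched in a few lines: drop the last segment of the base path, append the relative reference, then fold away `.` and `..` segments. The helper below, `resolve_path`, is a hypothetical simplification written for this page. It is not the module's internal `reducePath`, and it ignores scheme, authority and query handling.

```lua
-- Hypothetical helper for illustration; not part of Module:Neturl.
local function resolve_path(base, rel)
  -- An absolute reference replaces the base path entirely.
  local merged = rel
  if rel:sub(1, 1) ~= "/" then
    -- Otherwise drop the last segment of the base and append the reference.
    merged = base:gsub("[^/]*$", "") .. rel
  end
  local out = {}
  for seg in merged:gmatch("[^/]+") do
    if seg == ".." then
      out[#out] = nil          -- go up one level (no effect at the root)
    elseif seg ~= "." then
      out[#out + 1] = seg
    end
  end
  local path = "/" .. table.concat(out, "/")
  -- Keep a trailing slash when the merged path ends in "/", "/." or "/..".
  if #out > 0 and merged:match("/%.?%.?$") then
    path = path .. "/"
  end
  return path
end

print(resolve_path("/b/c/d;p", "../../g")) -- /g
print(resolve_path("/b/c/d;p", "g"))       -- /b/c/g
print(resolve_path("/b/c/d;p", "./"))      -- /b/c/
```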
Path builder
Path segments can be added using the __div metamethod (the / operator) or u:addSegment(). Both modify the url object and return it.
u = url.parse('http://example.com')
u = u / 'bands' / 'AC/DC'
print(u)
-- http://example.com/bands/AC%2FDC
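The `/` inside "AC/DC" becomes `%2F` because each appended segment is percent-encoded before it is joined to the path. Here is a minimal standalone sketch of that encoding step. `encode_segment` and the small `legal` table are illustrative assumptions; the module itself uses its `legal_in_path` option, which allows more characters.

```lua
-- Hypothetical helper for illustration; not part of Module:Neturl.
local function encode_segment(segment, legal)
  -- Replace every non-alphanumeric character that is not explicitly legal
  -- with its uppercase %XX escape.
  return (segment:gsub("[^%w]", function(ch)
    if legal[ch] then
      return ch
    end
    return string.format("%%%02X", ch:byte())
  end))
end

-- Assumed set of characters to leave untouched in this sketch.
local legal = { ["-"] = true, ["_"] = true, ["."] = true }

print("http://example.com/bands/" .. encode_segment("AC/DC", legal))
-- http://example.com/bands/AC%2FDC
```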
Module Options
- separator specifies the separator used between query parameters. It is & by default.
- cumulative_parameters is false by default. If true, query parameters with the same name will be stored in a table.
- legal_in_path is a table of characters that will not be url encoded in path components.
- legal_in_query is a table of characters that will not be url encoded in query values. Query parameter names, on the other hand, only support a small set of legal characters (-_.).
- query_plus_is_space is true by default, so a plus sign in a query value will be decoded as a space (%20), not as a literal plus (%2B).
If you want to keep the + sign as is in path segments, add it to the table of legal characters in paths. For example:
url = require "Module:Neturl"
url.options.legal_in_path["+"] = true
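The effect of query_plus_is_space can be illustrated with a small standalone decoder. `decode_value` below is a hypothetical function written for this page that takes the flag as an argument; the module reads the flag from `url.options` instead.

```lua
-- Hypothetical helper for illustration; not part of Module:Neturl.
local function decode_value(str, plus_is_space)
  if plus_is_space then
    -- Treat "+" as an encoded space before decoding %XX escapes.
    str = str:gsub("%+", " ")
  end
  return (str:gsub("%%(%x%x)", function(hex)
    return string.char(tonumber(hex, 16))
  end))
end

print(decode_value("rock+roll%2B1", true))  -- rock roll+1
print(decode_value("rock+roll%2B1", false)) -- rock+roll+1
```

The plus signs are replaced before the `%XX` escapes are decoded, so an escaped `%2B` always survives as a literal plus.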
Querystring parser
The library supports brackets in querystrings, like PHP. This means you can use brackets to build multi-dimensional tables. The parsed querystring has a tostring() helper. As usual in Lua, when no index is given between the brackets, numbering starts at index 1.
query = url.parseQuery("first=abc&a[]=123&a[]=false&b[]=str&c[]=3.5&a[]=last")
print(query)
-- a[1]=123&a[2]=false&a[3]=last&b[1]=str&c[1]=3.5&first=abc
print(query.a[1])
-- 123
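To show what the bracket syntax produces, here is a deliberately simplified standalone parser that understands only `name=value` and `name[]=value`. `parse_simple` is a hypothetical function for illustration. It does no percent-decoding and does not handle nested or named keys, which the module does support.

```lua
-- Hypothetical helper for illustration; not part of Module:Neturl.
local function parse_simple(qs)
  local values = {}
  for name, value in qs:gmatch("([^&=]+)=([^&]*)") do
    -- "a[]" means: append the value to the list stored under "a".
    local base = name:match("^(.-)%[%]$")
    if base then
      values[base] = values[base] or {}
      table.insert(values[base], value)
    else
      values[name] = value
    end
  end
  return values
end

local q = parse_simple("first=abc&a[]=123&a[]=false")
print(q.first) -- abc
print(q.a[1])  -- 123
print(q.a[2])  -- false
```

In this sketch every value stays a string, so `q.a[1]` is `"123"` and `q.a[2]` is `"false"`, not a number or a boolean.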
Querystring builder
u = url.parse("http://www.example.com")
u.query.foo = "bar"
print(u)
-- http://www.example.com/?foo=bar
u:setQuery{ json = true, skip = 100 }
print(u)
-- http://www.example.com/?json=true&skip=100
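The parameters in the output above appear in sorted key order. A minimal standalone sketch of such a builder for a flat table follows. `build_simple` is hypothetical and assumes string keys only; unlike the module it performs no percent-encoding and does not handle nested tables.

```lua
-- Hypothetical helper for illustration; not part of Module:Neturl.
-- Assumes a flat table whose keys are all strings.
local function build_simple(params, sep)
  sep = sep or "&"
  -- Collect and sort the keys so the output order is deterministic.
  local keys = {}
  for k in pairs(params) do
    keys[#keys + 1] = k
  end
  table.sort(keys)
  local out = {}
  for _, k in ipairs(keys) do
    out[#out + 1] = k .. "=" .. tostring(params[k])
  end
  return table.concat(out, sep)
end

print(build_simple{ json = true, skip = 100 })      -- json=true&skip=100
print(build_simple({ json = true, skip = 100 }, ";")) -- json=true;skip=100
```

Sorting matters because `pairs` does not guarantee any iteration order, so an unsorted builder could produce a different string on each run.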
Differences with luasocket/url.lua
- luasocket/url.lua can't parse http://www.example.com?url=net correctly because there is no path.
- luasocket/url.lua can't clean and normalize URLs, for example by removing the default port, extra zeros in the port, an empty authority, or uppercase letters in the scheme and domain name.
- luasocket/url.lua doesn't parse the query string parameters.
- luasocket/url.lua is less compliant with RFC 2396 and, with http://a/b/c/d;p?q as the base URL, will resolve:
  - ../../../g to http://ag instead of http://a/g
  - ../../../../g to http://a../g instead of http://a/g
  - g;x=1/../y to http://a/b/c/g;x=1/../y instead of http://a/b/c/y
  - /./g to http://a/./g instead of http://a/g
  - g;x=1/./y to http://a/b/c/g;x=1/./y instead of http://a/b/c/g;x=1/y
Usage
Usage from a template: Use the same functions and arguments as explained above for modules.
{{#invoke:Neturl|function_name|arguments}}
License
Copyright © 2011-2023 Bertrand Mansion
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
The Software is provided "as is", without warranty of any kind, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose and noninfringement. In no event shall the authors or copyright holders be liable for any claim, damages or other liability, whether in an action of contract, tort or otherwise, arising from, out of or in connection with the Software or the use or other dealings in the Software.
-- net/url.lua - a robust url parser and builder
--
-- @module net.url
-- @alias M
-- @alias [[en:Module:Neturl]]
-- @license MIT License Copyright 2011-2024.
-- @author Bertrand Mansion
-- @url [git://github.com/golgote/neturl.git]
local M = {}
M.version = "1.2"
--- url options
-- - `separator` is set to `&` by default but could be anything like `&amp;` or `;`
-- - `cumulative_parameters` is false by default. If true, query parameters with the same name will be stored in a table.
-- - `legal_in_path` is a table of characters that will not be url encoded in path components
-- - `legal_in_query` is a table of characters that will not be url encoded in query values. Query parameter names only support a small set of legal characters (-_.).
-- - `query_plus_is_space` is true by default, so a plus sign in a query value will be converted to %20 (space), not %2B (plus)
-- @todo Add option to limit the size of the argument table
-- @todo Add option to limit the depth of the argument table
-- @todo Add option to process dots in parameter names, ie. `param.filter=1`
M.options = {
separator = '&',
cumulative_parameters = false,
legal_in_path = {
[":"] = true, ["-"] = true, ["_"] = true, ["."] = true,
["!"] = true, ["~"] = true, ["*"] = true, ["'"] = true,
["("] = true, [")"] = true, ["@"] = true, ["&"] = true,
["="] = true, ["$"] = true, [","] = true,
[";"] = true
},
legal_in_query = {
[":"] = true, ["-"] = true, ["_"] = true, ["."] = true,
[","] = true, ["!"] = true, ["~"] = true, ["*"] = true,
["'"] = true, [";"] = true, ["("] = true, [")"] = true,
["@"] = true, ["$"] = true,
},
query_plus_is_space = true
}
--- List of known and common scheme ports.
-- As documented in <a href="http://www.iana.org/assignments/uri-schemes.html">IANA URI scheme list</a>
---@type table{string: integer}
M.services = {
acap = 674,
cap = 1026,
dict = 2628,
ftp = 21,
gopher = 70,
http = 80,
https = 443,
iax = 4569,
icap = 1344,
imap = 143,
ipp = 631,
ldap = 389,
mtqp = 1038,
mupdate = 3905,
news = 2009,
nfs = 2049,
nntp = 119,
rtsp = 554,
sip = 5060,
snmp = 161,
telnet = 23,
tftp = 69,
vemmi = 575,
afs = 1483,
jms = 5673,
rsync = 873,
prospero = 191,
videotex = 516
}
local function decode(str)
return (str:gsub("%%(%x%x)", function(c)
return string.char(tonumber(c, 16))
end))
end
local function encode(str, legal)
return (str:gsub("([^%w])", function(v)
if legal[v] then
return v
end
return string.upper(string.format("%%%02x", string.byte(v)))
end))
end
-- For query values, + can mean space if configured as such
local function decodeValue(str)
if M.options.query_plus_is_space then
str = str:gsub('%+', ' ')
end
return decode(str)
end
local function concat(a, b)
if type(a) == 'table' then
return a:build() .. b
else
return a .. b:build()
end
end
function M:addSegment(path)
if type(path) == 'string' then
self.path = self.path .. '/' .. encode(path:gsub("^/+", ""), M.options.legal_in_path)
end
return self
end
--- Builds the Url
---
---@return string @built Url
function M:build()
local url = ''
if self.path then
local path = self.path
url = url .. tostring(path)
end
if self.query then
local qstring = tostring(self.query)
if qstring ~= "" then
url = url .. '?' .. qstring
end
end
if self.host then
local authority = self.host
if self.port and self.scheme and M.services[self.scheme] ~= self.port then
authority = authority .. ':' .. self.port
end
local userinfo
if self.user and self.user ~= "" then
userinfo = self.user
if self.password then
userinfo = userinfo .. ':' .. self.password
end
end
if userinfo and userinfo ~= "" then
authority = userinfo .. '@' .. authority
end
if authority then
if url ~= "" then
url = '//' .. authority .. '/' .. url:gsub('^/+', '')
else
url = '//' .. authority
end
end
end
if self.scheme then
url = self.scheme .. ':' .. url
end
if self.fragment then
url = url .. '#' .. self.fragment
end
return url
end
--- Builds the querystring
---@param tab table The key/value parameters.
---@param sep? string The separator to use. (optional)
---@param key? any The parent key if the value is multi-dimensional. (optional)
---@return string string Built querystring.
function M.buildQuery(tab, sep, key)
local query = {}
if not sep then
sep = M.options.separator or '&'
end
local keys = {}
for k in pairs(tab) do
keys[#keys+1] = k
end
table.sort(keys, function (a, b)
local function padnum(n, rest) return ("%03d"..rest):format(tonumber(n)) end
return tostring(a):gsub("(%d+)(%.)",padnum) < tostring(b):gsub("(%d+)(%.)",padnum)
end)
for _,name in ipairs(keys) do
local value = tab[name]
name = encode(tostring(name), {["-"] = true, ["_"] = true, ["."] = true})
if key then
if M.options.cumulative_parameters and string.find(name, '^%d+$') then
name = tostring(key)
else
name = string.format('%s[%s]', tostring(key), tostring(name))
end
end
if type(value) == 'table' then
query[#query+1] = M.buildQuery(value, sep, name)
else
local value = encode(tostring(value), M.options.legal_in_query)
if value ~= "" then
query[#query+1] = string.format('%s=%s', name, value)
else
query[#query+1] = name
end
end
end
return table.concat(query, sep)
end
--- Parses the querystring to a table
---
---This function can parse multidimensional pairs and is mostly compatible
---with PHP usage of brackets in key names like `?param[key]=value`
---@param str string Querystring to parse
---@param sep?'&'|string Separator between key-value pairs, defaults to `&`
---@todo Limit the max number of parameters with M.options.max_parameters
---@return table values Query represented as key-value pairs
function M.parseQuery(str, sep)
if not sep then
sep = M.options.separator or '&'
end
local values = {}
for key,val in str:gmatch(string.format('([^%s=]+)(=*[^%s]*)', sep, sep)) do
local key = decodeValue(key)
local keys = {}
key = key:gsub('%[([^%]]*)%]', function(v)
-- extract keys between balanced brackets
if string.find(v, "^-?%d+$") then
v = tonumber(v)
else
v = decodeValue(v)
end
table.insert(keys, v)
return "="
end)
key = key:gsub('=+.*$', "")
key = key:gsub('%s', "_") -- remove spaces in parameter name
val = val:gsub('^=+', "")
if not values[key] then
values[key] = {}
end
if #keys > 0 and type(values[key]) ~= 'table' then
values[key] = {}
elseif #keys == 0 and type(values[key]) == 'table' then
values[key] = decodeValue(val)
elseif M.options.cumulative_parameters
and type(values[key]) == 'string' then
values[key] = { values[key] }
table.insert(values[key], decodeValue(val))
end
local t = values[key]
for i,k in ipairs(keys) do
if type(t) ~= 'table' then
t = {}
end
if k == "" then
k = #t+1
end
if not t[k] then
t[k] = {}
end
if i == #keys then
t[k] = val
end
t = t[k]
end
end
setmetatable(values, { __tostring = M.buildQuery })
return values
end
--- Set the Url query.
---
---@param query string|table<number, string> String to parse or a table of key-value pairs.
---@return string|table<number, string> query Output table of key-value pairs containing data.
function M:setQuery(query)
local query = query
if type(query) == 'table' then
query = M.buildQuery(query)
end
self.query = M.parseQuery(query)
return query
end
--- Set the authority part of the Url
---
---The authority is parsed to find the user, password, port and host if available.
---@param authority string Represents the authority.
---@return string remainder Parsed authority.
function M:setAuthority(authority)
self.authority = authority
self.port = nil
self.host = nil
self.userinfo = nil
self.user = nil
self.password = nil
authority = authority:gsub('^([^@]*)@', function(v)
self.userinfo = v
return ''
end)
authority = authority:gsub(':(%d+)$', function(v)
self.port = tonumber(v)
return ''
end)
local function getIP(str)
-- IPv4
local chunks = { str:match("^(%d+)%.(%d+)%.(%d+)%.(%d+)$") }
if #chunks == 4 then
for _, v in pairs(chunks) do
if tonumber(v) > 255 then
return false
end
end
return str
end
-- IPv6
local chunks = { str:match("^%["..(("([a-fA-F0-9]*):"):rep(8):gsub(":$","%%]$"))) }
if #chunks == 8 or #chunks < 8 and
str:match('::') and not str:gsub("::", "", 1):match('::') then
for _,v in pairs(chunks) do
if #v > 0 and tonumber(v, 16) > 65535 then
return false
end
end
return str
end
return nil
end
local ip = getIP(authority)
if ip then
self.host = ip
elseif type(ip) == 'nil' then
-- Domain
if authority ~= '' and not self.host then
local host = authority:lower()
if string.match(host, '^[%d%a%-%.]+$') ~= nil and
string.sub(host, 0, 1) ~= '.' and
string.sub(host, -1) ~= '.' and
string.find(host, '%.%.') == nil then
self.host = host
end
end
end
if self.userinfo then
local userinfo = self.userinfo
userinfo = userinfo:gsub(':([^:]*)$', function(v)
self.password = v
return ''
end)
if string.find(userinfo, "^[%w%+%.]+$") then
self.user = userinfo
else
-- incorrect userinfo
self.userinfo = nil
self.user = nil
self.password = nil
end
end
return authority
end
--- Parse the url into the designated parts.
---
---Depending on the url, the following parts can be available:
---scheme, userinfo, user, password, authority, host, port, path,
---query, fragment.
---@param url string
---@return table comp Different parts and a few other functions
function M.parse(url)
local comp = {}
M.setAuthority(comp, "")
M.setQuery(comp, "")
local url = tostring(url or '')
url = url:gsub('#(.*)$', function(v)
comp.fragment = v
return ''
end)
url =url:gsub('^([%w][%w%+%-%.]*)%:', function(v)
comp.scheme = v:lower()
return ''
end)
url = url:gsub('%?(.*)', function(v)
M.setQuery(comp, v)
return ''
end)
url = url:gsub('^//([^/]*)', function(v)
M.setAuthority(comp, v)
return ''
end)
comp.path = url:gsub("([^/]+)", function (s) return encode(decode(s), M.options.legal_in_path) end)
setmetatable(comp, {
__index = M,
__tostring = M.build,
__concat = concat,
__div = M.addSegment
})
return comp
end
--- Removes dots and slashes in urls when possible.
---
---This function will also remove multiple slashes
---@param dirtyPath string The string representing the path to clean.
---@return string cleanPath The path without unnecessary dots and segments.
function M.removeDotSegments(dirtyPath)
local path = dirtyPath or ''
local fields = {}
if string.len(path) == 0 then
return ""
end
local startslash = false
local endslash = false
if string.sub(path, 1, 1) == "/" then
startslash = true
end
if (string.len(path) > 1 or startslash == false) and string.sub(path, -1) == "/" then
endslash = true
end
path:gsub('[^/]+', function(c) table.insert(fields, c) end)
local new = {}
local j = 0
for i,c in ipairs(fields) do
if c == '..' then
if j > 0 then
j = j - 1
end
elseif c ~= "." then
j = j + 1
new[j] = c
end
end
local cleanPath = ""
if #new > 0 and j > 0 then
cleanPath = table.concat(new, '/', 1, j)
else
cleanPath = ""
end
if startslash then
cleanPath = '/'..cleanPath
end
if endslash then
cleanPath = cleanPath..'/'
end
return cleanPath
end
local function reducePath(base_path, relative_path)
if string.sub(relative_path, 1, 1) == "/" then
return '/' .. string.gsub(relative_path, '^[%./]+', '')
end
local path = base_path
-- Note: despite its name, `startslash` is true when the base path does NOT start with a slash.
local startslash = string.sub(path, 1, 1) ~= "/";
if relative_path ~= "" then
path = (startslash and '' or '/') .. path:gsub("[^/]*$", "")
end
path = path .. relative_path
path = path:gsub("([^/]*%./)", function (s)
if s ~= "./" then return s else return "" end
end)
path = string.gsub(path, "/%.$", "/")
local reduced
while reduced ~= path do
reduced = path
path = string.gsub(reduced, "([^/]*/%.%./)", function (s)
if s ~= "../../" then return "" else return s end
end)
end
path = string.gsub(path, "([^/]*/%.%.?)$", function (s)
if s ~= "../.." then return "" else return s end
end)
local reduced
while reduced ~= path do
reduced = path
path = string.gsub(reduced, '^/?%.%./', '')
end
return (startslash and '' or '/') .. path
end
--- Builds a new url by using the one given as parameter and resolving paths.
---
---@param newUrl string|table String or table representing a Url.
---@return table newUrl Url table.
function M:resolve(newUrl)
if type(self) == "string" then
self = M.parse(self)
end
if type(newUrl) == "string" then
newUrl = M.parse(newUrl)
end
if newUrl.scheme then
return newUrl
else
newUrl.scheme = self.scheme
if not newUrl.authority or newUrl.authority == "" then
newUrl:setAuthority(self.authority)
if not newUrl.path or newUrl.path == "" then
newUrl.path = self.path
local query = newUrl.query
if not query or not next(query) then
newUrl.query = self.query
end
else
newUrl.path = reducePath(self.path, newUrl.path)
end
end
return newUrl
end
end
--- Normalize a Url path.
---
--- Following some common normalization rules
---described on <a href="http://en.wikipedia.org/wiki/URL_normalization">the URL normalization page of Wikipedia</a>.
---@param self table {string}
---@return table self {path: string} Normalized path.
function M:normalize()
if type(self) == 'string' then
self = M.parse(self)
end
if self.path then
local path = self.path
path = reducePath(path, "")
-- normalize multiple slashes
path = string.gsub(path, "//+", "/")
self.path = path
end
return self
end
return M