Module talk:IP
Plan
A discussion at WP:Lua (permalink) led to the creation of this module with a plan for a general-purpose IP address handler. I pointed out that Module:IPblock (which implements {{blockcalc}}) includes functions that can extract IPv4 and IPv6 IPs from text arguments, and convert IPs for text display, and can do various calculations including determining the best available CIDR block summary of given IPs.
Apart from the advantage of having a general-purpose IP handler, the immediate requirement was for a module that could generate the list at {{Sensitive IP addresses}} and output identical information for use by MediaWiki:Group-sysop.js.
I am contemplating how functions from Module:IPblock might be used here, and these are my preliminary thoughts. General purpose handling of CIDR blocks is tricky and I built in a restriction to make implementation easier. The module accepts individual IPs (IPv4 and IPv6), and CIDR blocks (IPv4 only, and only of form IP/n where n is 16 to 32 inclusive—that is, 65,536 addresses or fewer). The module puts the IPs in two lists, one for v4 and one for v6, then processes each list. Each list is sorted and duplicate IPs are removed (that's helpful for the user and necessary for the method used to create summaries). Then summaries are generated and the results displayed in a table.
A key point is that each CIDR block is expanded to individual IPs: for example, 1.2.3.224/27 would be expanded to 32 individual addresses (1.2.3.4.224 to 1.2.3.4.255 inclusive). That is done because the method used to create summaries needs a list of individual IPs.
It would be possible to work with CIDR blocks without expanding them, but I suspect the code complexity would blow out, and it's already pretty fierce.
The outline in Module:IP is great, and I initially contemplated something like that, but pragmatism overtook me and I did what was needed for {{blockcalc}}. A more general module would be good, but some of the proposed methods may be difficult to implement. I'm thinking of IPBlock removeIPs—if the input is 1.2.3.224/27 and 1.2.3.243 is removed, the result is 31 IPs which can be summarized as four CIDR blocks and an IP, as shown by blockcalc (this is the result of passing the 31 IPs to {{blockcalc}}):
Sorted 31 IPv4 addresses:
- 1.2.3.224 – 1.2.3.242
- 1.2.3.244 – 1.2.3.255
Total affected |
Affected addresses |
Given addresses |
Range | Contribs |
---|---|---|---|---|
32 | 32 | 31 | 1.2.3.224/27 | contribs |
31 | 16 | 16 | 1.2.3.224/28 | contribs |
2 | 2 | 1.2.3.240/31 | contribs | |
1 | 1 | 1.2.3.242 | contribs | |
4 | 4 | 1.2.3.244/30 | contribs | |
8 | 8 | 1.2.3.248/29 | contribs |
The module would either have to keep a list of each individual IP (what Module:IPblock does), or maintain a list of blocks—possible but awkward.
Before starting the implementation, can we discuss the plan. Are addIPs and removeIPs wanted? For any number of IPs?
My pragmatic side also wonders whether the implementation effort would be worthwhile—would it actually be used? The requirement for sensitive IPs probably does not need the generality, and it might not need a module that can process IP addresses at all (although I haven't looked at what is needed for the js output). Johnuniq (talk) 11:03, 14 July 2016 (UTC)
- @Johnuniq: Thank you for taking the time to look at this. In terms of functionality for the JavaScript side of things, we need to know the following:
- Whether the IP an admin is about to block is in a sensitive range
- Whether the IP range that an admin is about to block overlaps with a sensitive range
- A description of the sensitive range
- A list of CIDR blocks in the sensitive range
- Only the first two need to involve the library, I think - we would need to know whether an IP is in a given range (that is IPBlock's obj:containsIP), and whether two IP blocks overlap with each other. The rest can be done by Module:Sensitive IP addresses.
Thank you also for the demonstration of {{blockcalc}}'s functionality. That is even more impressive than I first thought! I am in two minds about the direction we should go in now that I know more about what would be involved in moving functionality from Module:IPblock to this library.
- Option 1: We keep the library small and only implement the functionality that is shared between the different modules. The IPBlock class does not store individual IPs, but only the prefix and bit length.
- Pros: The code will be relatively simple, and can be tested and deployed quickly.
- Cons: Not that much functionality is shared - possibly only the ability to parse IPv4 and IPv6 strings. Integrating the library into Module:IPblock may be more trouble than it's worth.
- Option 2: We integrate more functionality from Module:IPblock into the library. Instead of one IPBlock class, we have two classes: IPBlock and Subnet. The IPBlock class stores individual IPs which can be added and removed, and can take a maximum of 65,536 addresses. The Subnet class is like the IPBlock class from option 1 - it only stores prefix and bit length.
- Pros: It will be easier to integrate into Module:IPblock, and it wins a lot of geek credit.
- Cons: The codebase will be much more complicated. Although much of the code exists in Module:IPblock already, moving it may take some serious development effort, and testing/deployment may be slow.
- Option 1: We keep the library small and only implement the functionality that is shared between the different modules. The IPBlock class does not store individual IPs, but only the prefix and bit length.
- After typing all of that out, I think the way forward has become clearer. At first I think we should implement the IPAddress and Subnet classes, after which I can write Module:Sensitive IP addresses, and then later if we deem it is necessary we can implement the IPBlock class. (Perhaps if someone wants to write another module that needs similar functionality.) Perhaps we should choose different names, as well - something like IPCollection may be better than IPBlock at indicating that objects will hold individual IPs and can be divided up into smaller blocks. I'll have a think about all of this and sketch out another outline for the interface maybe tomorrow. — Mr. Stradivarius ♪ talk ♪ 14:27, 14 July 2016 (UTC)
I added some functions to make IPAddress nearly complete. I just plonked them in with minimal changes to work with the outline—feel free to massage them for style. The following would work.
IPAddress = require('Module:IP').IPAddress ip1 = IPAddress.new(' 1.2.3.4 ') ip2 = IPAddress.new('12.34.56.78') ip3 = IPAddress.new(' 01:002:3:0004:: ') ip1 < ip2 -- true tostring(ip1) -- '1.2.3.4' tostring(ip3) -- '1:2:3:4::'
I'll probably be able to do most of the other stuff in the next two or three days. Please see the "notes" section near the top. Johnuniq (talk) 12:36, 15 July 2016 (UTC)
@Mr. Stradivarius: I changed getIP() to use the new tostring code. That means it returns the normalized string representation of the IP address which may be different from the text that the caller provided. I suspect that is a good idea, but the reason I did it was to make it easier to implement getNextIP and getPreviousIP. It's easier because a new IP can now be constructed using a given table of numbers. It is only possible for methods to do that—a client program does not have access to the IPAddress uniqueId variable and so cannot use the trick employed by getNextIP. Please let me know if there is a better way of integrating the new code.
Please see the TODO items. Fix the style now if wanted, or do it later. What I would like is some thoughts on the first TODO. The points there need to be fixed, and isInSubnet needs to be implemented when Subnet is fixed. Then IPAddress is complete...I think.
I saw Module:IP/testcases but I also created Module:Sandbox/Johnuniq/ip for temporary testing. See the results at Module talk:Sandbox/Johnuniq/ip. Johnuniq (talk) 07:41, 16 July 2016 (UTC)
- Groan, I need watching. Obviously clients do have access to IPAddress! I changed it to uniqueId which is definitely private. I haven't uploaded that yet, and there may be some clever way to avoid the kludge. Johnuniq (talk) 03:16, 17 July 2016 (UTC)
- Sorry for not replying sooner - it's been a busy couple of days. First off, I agree that getIP() should return the normalized string representation of the IP. I expect that keeping the output in a predictable format will be useful for reusers of the library, and if they need their original input string they can always keep it in a variable somewhere. As for getIPParts(), that is a bit ugly, yes. Ideally we shouldn't have to make implementation details like that public. I will have a think whether there is any better way of doing it, but I can't think of any right now. Moving the metatable definition inside IPAddress.new seems like it would be a good idea, but it has the drawback that mt.__eq would no longer work -
__eq requires that both objects being compared use the exact same metatable. Let me ponder this one a bit longer. I'll also have a go at making the style more uniform. — Mr. Stradivarius ♪ talk ♪ 12:15, 17 July 2016 (UTC)- Actually, I see I have that wrong. __eq requires that the function be the same function object for both objects being compared - the metatables themselves can be different. Let me think some more about this. — Mr. Stradivarius ♪ talk ♪ 13:17, 17 July 2016 (UTC)
- Ok, I found a way that works. If we use a unique key and set an __index metamethod inside IPAddress.new, then we can allow access to the data upvalue for any code that knows the unique key. Code that can't access the unique key can't get to the data table, even with pairs or getmetatable. Untrusted code could break the metamethods by setting a new metatable, but that still doesn't allow access to the data. — Mr. Stradivarius ♪ talk ♪ 13:55, 17 July 2016 (UTC)
- The new dataKey is very nice, although it will be a couple of hours before I can do serious thinking and work out exactly how it is being used. Have you finished working on the module for the moment? It is missing uniqueId (which some of the code still uses—possibly when I think about it I'll see what you planned). Also, getIPParts is still in the code and the fact it is no longer defined is probably the reason for the error currently showing at Module talk:Sandbox/Johnuniq/ip. Johnuniq (talk) 23:56, 17 July 2016 (UTC)
- Ok, I found a way that works. If we use a unique key and set an __index metamethod inside IPAddress.new, then we can allow access to the data upvalue for any code that knows the unique key. Code that can't access the unique key can't get to the data table, even with pairs or getmetatable. Untrusted code could break the metamethods by setting a new metatable, but that still doesn't allow access to the data. — Mr. Stradivarius ♪ talk ♪ 13:55, 17 July 2016 (UTC)
- Actually, I see I have that wrong. __eq requires that the function be the same function object for both objects being compared - the metatables themselves can be different. Let me think some more about this. — Mr. Stradivarius ♪ talk ♪ 13:17, 17 July 2016 (UTC)
- Sorry for not replying sooner - it's been a busy couple of days. First off, I agree that getIP() should return the normalized string representation of the IP. I expect that keeping the output in a predictable format will be useful for reusers of the library, and if they need their original input string they can always keep it in a variable somewhere. As for getIPParts(), that is a bit ugly, yes. Ideally we shouldn't have to make implementation details like that public. I will have a think whether there is any better way of doing it, but I can't think of any right now. Moving the metatable definition inside IPAddress.new seems like it would be a good idea, but it has the drawback that mt.__eq would no longer work -
Subnet
I'm wondering about addIPs(...) and addIPsFromString(s). Are those names from the original idea? I can't see how they relate to my understanding of Subnet. Also, what is "options" in Subnet.new(options)? I would have thought that Subnet.new(ip) was wanted, where ip is a CIDR string like "1.2.3.0/24". And addIPs and addIPsFromString would be removed? If what I've suggested is all that is needed for Subnet, I can probably get it working in a day or so. At some later time we could look at addIPsFromString and others. Johnuniq (talk) 09:47, 16 July 2016 (UTC)
- The idea behind addIPs(...) was that you could do something like
addIPs('1.2.3.0', '1.2.3.1', '1.2.3.255')
, and that the prefix and bit length would be recalculated to incorporate the new IPs if they were not in the subnet already. addIPsFromString(s) was going to be the same thing, but from a string, likeaddIPsFromString('1.2.3.0 1.2.3.1 1.2.3.255')
. addIPs(...) would also accept IPAddress objects as well as strings. Perhaps we don't really need this functionality, but then again perhaps implementing it wouldn't be all that much hassle. I'm in two minds about it. The "options" in Subnet.new(options) is a table of named arguments. My original idea was to allowSubnet.new('1.2.3.0/24')
,Subnet.new('1.2.3.0', '1.2.3.1', '1.2.3.255')
andSubnet.new('1.2.3.0 1.2.3.1 1.2.3.255')
. If you allow the latter, though, the module has to know whether you want IPv4 addresses or IPv6 addresses, in order to parse something likeSubnet.new('1.2.3.0 01:002:3:0004:: 1.2.3.255')
. So I thought that maybe we needed something likeSubnet.new{ IPs = '1.2.3.0 01:002:3:0004:: 1.2.3.255', version = 'IPv4' }
. Perhaps I am trying to make things too complicated, however. — Mr. Stradivarius ♪ talk ♪ 12:36, 17 July 2016 (UTC)- I just updated Module:IP and Module:Sandbox/Johnuniq/ip (with its results at Module talk:Sandbox/Johnuniq/ip). It's mostly done, but very rushed and I'll be checking/refactoring it. Thanks for the replies but I have to go and can't contemplate them now. Back later. Johnuniq (talk) 13:02, 17 July 2016 (UTC)
If the functions are needed, they can readily be implemented using the code from Module:IPblock which calculates the number of common prefix bits shared by a list of IPs. One issue is that these functions mean a subnet is mutable—I guess that's ok, but care is needed in the implementation to ensure that all required fields are set. Another possibility would be to have different constructors (don't know how to do that in a pure way!)—a subnet could be constructed from a CIDR string, or constructed from a list of IP strings, or constructed from the union of two subnets. Johnuniq (talk) 08:11, 18 July 2016 (UTC)
Internal IP representation
Something I've been wondering about the internal representation of the IPs - why did you make the bit length of each part 16 bits rather than 32? Lua's double precision floats should be able to precisely represent integers between about -2^53 and 2^53 (I forget the exact cutoff), so an IPv4 address should fit inside one Lua number, and an IPv6 address should fit inside a table of 4 Lua numbers. — Mr. Stradivarius ♪ talk ♪ 14:31, 17 July 2016 (UTC)
- I think side-by-side comparisons would be needed to see what was best in practice. However, I chose 16 bits partly for convenience because it makes working with IPv6 easier—a bunch of calculations would be needed to convert the internal representation to-and-from text if 32 bits were used. A second point which weighed more heavily was a hunch that the code emulating bit-wise operations is more efficient working 16 bits at a time. See copyPrefix and setHostBits. Also, the function common_length from Module:IPblock is used frequently to determine the number of prefix bits that two IPs have in common, and it is able to step towards the answer using 16 bits at a time—if the answer is, say 18 bits, it does a very simple calculation for the first 16 bits, then needs only to determine that the next 2 bits are common. I suspect it would be a little less efficient if it were working 32 bits at a time. Johnuniq (talk) 23:39, 17 July 2016 (UTC)
- Thanks for the explanation - that makes a lot of sense. If we really need to we could do some side-by-side comparisons, but my guess is that we won't need to. — Mr. Stradivarius ♪ talk ♪ 02:28, 18 July 2016 (UTC)
Memory efficiency
I've been testing how many IPAddress objects can fit into memory. Perhaps unsurprisingly, it turns out that you can fit a lot less objects in memory when you put the methods directly inside the constructor. In the version with the methods inside the constructor, I managed to fit 19,456 objects into memory before getting the Scribunto "not enough memory" error. When I moved the methods out of the constructor but still kept the data table private using __index, I managed to increase that number to 39,168. And when I moved the metatable out of the constructor and made the data public (with underscores), I managed to fit 65,536 objects in memory. I know that premature optimisation is the root of all evil, so I'm going to resist the temptation to make the data public and use underscores. However, I can't think of any major disadvantages of switching IPAddress to the second style. What do you think? — Mr. Stradivarius ♪ talk ♪ 02:21, 18 July 2016 (UTC)
- Actually, I've just thought of one - with the methods in the constructor we can use checkSelf to check whether callers are using the colon syntax correctly, but that isn't possible unless the object is available as an upvalue in the methods. — Mr. Stradivarius ♪ talk ♪ 02:32, 18 July 2016 (UTC)
- I'll need some time to digest the various trials you mentioned. It is a shame that upvalues apparently consume 36 bytes each. It looks like it is not possible to have purity + Lua + Scribunto memory limits. I am not recommending that you look at Module:Date but as a matter of interest it uses what seemed a reasonable idea, although it has very little purity. The Date function returns a simple date table with a metatable. I wanted a date to be immutable because it's hard to support changing a date with all its weird internals, yet it's easy to construct a date (if wanted, by varying another date). Therefore I used a trick to make the date object read-only (search for "readonly" to see it—the date information is in the newdate table). I re-used a sandbox module (Module:Sandbox/Johnuniq/testpre) to construct a table of 36,000 dates. Its Lua memory usage is 47.59 MB/50 MB (the results are on its talk).
- Module:Date has an is_date function which it uses instead of checkSelf. A client is supposed to do something like the first two of the following lines. The third line produces the error shown.
local date = Date('2016-07-18') local s = date:text() local s = date.text() date:text: need a date (use "date:text()" with a colon)
IPAddress's getHighestIP and getPrefix
I think we are breaking the IP address metaphor by having getHighestIP and getPrefix methods in IPAddress. An IP address doesn't have a highest IP, for example - it is only one IP. If you wanted to find the highest IP in a subnet from a given IP address, then from an interface point of view ip:getSubnet(bitLength):getHighestIP()
would make more sense than ip:getHighestIP(bitLength)
. Looking at how you have put together the subnet class, though, I can see that just removing those methods is going to create problems. It looks like we need some way to make a new IPAddress object directly from the IP parts, rather than only through IPAddress.new. I am thinking that we need a newIPAddress function that is called by IPAddress.new and is accessible from Subnet, but is not accessible to callers. Will think some more about this. — Mr. Stradivarius ♪ talk ♪ 03:22, 18 July 2016 (UTC)
- Yes, makeSubnet needs a way to construct an IP from the IP's parts. I had better leave doing more edits until you have had a chance to decide on the design. If you wanted to take a break, I would have a look at what newIPAddress might look like, but I would need a couple of days in order to ensure some quiet thinking time. While what you say is correct, from the point of view of makeSubnet, it has an IP object and it needs to construct another IP based on the first. In summary:
local base = IPAddress.new(lhs) -- the IP entered in "IP/n" local prefix = base:getPrefix(n) -- IP from clearing host bits in base data.highestIP = base:getHighestIP(n) -- IP from setting host bits in base