Module talk:IP

Plan

A discussion at WP:Lua (permalink) led to the creation of this module with a plan for a general-purpose IP address handler. I pointed out that Module:IPblock (which implements {{blockcalc}}) includes functions that can extract IPv4 and IPv6 IPs from text arguments, and convert IPs for text display, and can do various calculations including determining the best available CIDR block summary of given IPs.

Apart from the advantage of having a general-purpose IP handler, the immediate requirement was for a module that could generate the list at {{Sensitive IP addresses}} and output identical information for use by MediaWiki:Group-sysop.js.

I am contemplating how functions from Module:IPblock might be used here, and these are my preliminary thoughts. General purpose handling of CIDR blocks is tricky and I built in a restriction to make implementation easier. The module accepts individual IPs (IPv4 and IPv6), and CIDR blocks (IPv4 only, and only of form IP/n where n is 16 to 32 inclusive—that is, 65,536 addresses or fewer). The module puts the IPs in two lists, one for v4 and one for v6, then processes each list. Each list is sorted and duplicate IPs are removed (that's helpful for the user and necessary for the method used to create summaries). Then summaries are generated and the results displayed in a table.

A key point is that each CIDR block is expanded to individual IPs: for example, 1.2.3.224/27 would be expanded to 32 individual addresses (1.2.3.4.224 to 1.2.3.4.255 inclusive). That is done because the method used to create summaries needs a list of individual IPs.

It would be possible to work with CIDR blocks without expanding them, but I suspect the code complexity would blow out, and it's already pretty fierce.

The outline in Module:IP is great, and I initially contemplated something like that, but pragmatism overtook me and I did what was needed for {{blockcalc}}. A more general module would be good, but some of the proposed methods may be difficult to implement. I'm thinking of IPBlock removeIPs—if the input is 1.2.3.224/27 and 1.2.3.243 is removed, the result is 31 IPs which can be summarized as four CIDR blocks and an IP, as shown by blockcalc (this is the result of passing the 31 IPs to {{blockcalc}}):

Sorted 31 IPv4 addresses:

1.2.3.224 – 1.2.3.242

1.2.3.244 – 1.2.3.255

Total affected	Affected addresses	Given addresses	Range	Contribs

32	32	31	1.2.3.224/27	contribs

31	16	16	1.2.3.224/28	contribs
	2	2	1.2.3.240/31	contribs
	1	1	1.2.3.242	contribs
	4	4	1.2.3.244/30	contribs
	8	8	1.2.3.248/29	contribs

The module would either have to keep a list of each individual IP (what Module:IPblock does), or maintain a list of blocks—possible but awkward.

Before starting the implementation, can we discuss the plan. Are addIPs and removeIPs wanted? For any number of IPs?

My pragmatic side also wonders whether the implementation effort would be worthwhile—would it actually be used? The requirement for sensitive IPs probably does not need the generality, and it might not need a module that can process IP addresses at all (although I haven't looked at what is needed for the js output). Johnuniq (talk) 11:03, 14 July 2016 (UTC)[reply]

@Johnuniq: Thank you for taking the time to look at this. In terms of functionality for the JavaScript side of things, we need to know the following:

Whether the IP an admin is about to block is in a sensitive range
Whether the IP range that an admin is about to block overlaps with a sensitive range
A description of the sensitive range
A list of CIDR blocks in the sensitive range

Only the first two need to involve the library, I think - we would need to know whether an IP is in a given range (that is IPBlock's obj:containsIP), and whether two IP blocks overlap with each other. The rest can be done by Module:Sensitive IP addresses.

Thank you also for the demonstration of {{blockcalc}}'s functionality. That is even more impressive than I first thought! I am in two minds about the direction we should go in now that I know more about what would be involved in moving functionality from Module:IPblock to this library.

Option 1: We keep the library small and only implement the functionality that is shared between the different modules. The IPBlock class does not store individual IPs, but only the prefix and bit length.
- Pros: The code will be relatively simple, and can be tested and deployed quickly.
- Cons: Not that much functionality is shared - possibly only the ability to parse IPv4 and IPv6 strings. Integrating the library into Module:IPblock may be more trouble than it's worth.
Option 2: We integrate more functionality from Module:IPblock into the library. Instead of one IPBlock class, we have two classes: IPBlock and Subnet. The IPBlock class stores individual IPs which can be added and removed, and can take a maximum of 65,536 addresses. The Subnet class is like the IPBlock class from option 1 - it only stores prefix and bit length.
- Pros: It will be easier to integrate into Module:IPblock, and it wins a lot of geek credit.
- Cons: The codebase will be much more complicated. Although much of the code exists in Module:IPblock already, moving it may take some serious development effort, and testing/deployment may be slow.

After typing all of that out, I think the way forward has become clearer. At first I think we should implement the IPAddress and Subnet classes, after which I can write Module:Sensitive IP addresses, and then later if we deem it is necessary we can implement the IPBlock class. (Perhaps if someone wants to write another module that needs similar functionality.) Perhaps we should choose different names, as well - something like IPCollection may be better than IPBlock at indicating that objects will hold individual IPs and can be divided up into smaller blocks. I'll have a think about all of this and sketch out another outline for the interface maybe tomorrow. — Mr. Stradivarius ^{♪ talk ♪} 14:27, 14 July 2016 (UTC)[reply]

I added some functions to make IPAddress nearly complete. I just plonked them in with minimal changes to work with the outline—feel free to massage them for style. The following would work.

IPAddress = require('Module:IP').IPAddress
ip1 = IPAddress.new(' 1.2.3.4 ')
ip2 = IPAddress.new('12.34.56.78')
ip3 = IPAddress.new(' 01:002:3:0004:: ')
ip1 < ip2       -- true
tostring(ip1)   -- '1.2.3.4'
tostring(ip3)   -- '1:2:3:4::'

I'll probably be able to do most of the other stuff in the next two or three days. Please see the "notes" section near the top. Johnuniq (talk) 12:36, 15 July 2016 (UTC)[reply]

@Mr. Stradivarius: I changed getIP() to use the new tostring code. That means it returns the normalized string representation of the IP address which may be different from the text that the caller provided. I suspect that is a good idea, but the reason I did it was to make it easier to implement getNextIP and getPreviousIP. It's easier because a new IP can now be constructed using a given table of numbers. It is only possible for methods to do that—a client program does not have access to the ~~IPAddress~~ uniqueId variable and so cannot use the trick employed by getNextIP. Please let me know if there is a better way of integrating the new code.

Please see the TODO items. Fix the style now if wanted, or do it later. What I would like is some thoughts on the first TODO. The points there need to be fixed, and isInSubnet needs to be implemented when Subnet is fixed. Then IPAddress is complete...I think.

I saw Module:IP/testcases but I also created Module:Sandbox/Johnuniq/ip for temporary testing. See the results at Module talk:Sandbox/Johnuniq/ip. Johnuniq (talk) 07:41, 16 July 2016 (UTC)[reply]

Groan, I need watching. Obviously clients do have access to IPAddress! I changed it to uniqueId which is definitely private. I haven't uploaded that yet, and there may be some clever way to avoid the kludge. Johnuniq (talk) 03:16, 17 July 2016 (UTC)[reply]

Sorry for not replying sooner - it's been a busy couple of days. First off, I agree that getIP() should return the normalized string representation of the IP. I expect that keeping the output in a predictable format will be useful for reusers of the library, and if they need their original input string they can always keep it in a variable somewhere. As for getIPParts(), that is a bit ugly, yes. Ideally we shouldn't have to make implementation details like that public. I will have a think whether there is any better way of doing it, but I can't think of any right now. Moving the metatable definition inside IPAddress.new seems like it would be a good idea, but it has the drawback that mt.__eq would no longer work - ~~__eq requires that both objects being compared use the exact same metatable~~. Let me ponder this one a bit longer. I'll also have a go at making the style more uniform. — Mr. Stradivarius ^{♪ talk ♪} 12:15, 17 July 2016 (UTC)[reply]

Actually, I see I have that wrong. __eq requires that the function be the same function object for both objects being compared - the metatables themselves can be different. Let me think some more about this. — Mr. Stradivarius ^{♪ talk ♪} 13:17, 17 July 2016 (UTC)[reply]

Ok, I found a way that works. If we use a unique key and set an __index metamethod inside IPAddress.new, then we can allow access to the data upvalue for any code that knows the unique key. Code that can't access the unique key can't get to the data table, even with pairs or getmetatable. Untrusted code could break the metamethods by setting a new metatable, but that still doesn't allow access to the data. — Mr. Stradivarius ^{♪ talk ♪} 13:55, 17 July 2016 (UTC)[reply]

Subnet

I'm wondering about addIPs(...) and addIPsFromString(s). Are those names from the original idea? I can't see how they relate to my understanding of Subnet. Also, what is "options" in Subnet.new(options)? I would have thought that Subnet.new(ip) was wanted, where ip is a CIDR string like "1.2.3.0/24". And addIPs and addIPsFromString would be removed? If what I've suggested is all that is needed for Subnet, I can probably get it working in a day or so. At some later time we could look at addIPsFromString and others. Johnuniq (talk) 09:47, 16 July 2016 (UTC)[reply]

The idea behind addIPs(...) was that you could do something like addIPs('1.2.3.0', '1.2.3.1', '1.2.3.255'), and that the prefix and bit length would be recalculated to incorporate the new IPs if they were not in the subnet already. addIPsFromString(s) was going to be the same thing, but from a string, like addIPsFromString('1.2.3.0 1.2.3.1 1.2.3.255'). addIPs(...) would also accept IPAddress objects as well as strings. Perhaps we don't really need this functionality, but then again perhaps implementing it wouldn't be all that much hassle. I'm in two minds about it. The "options" in Subnet.new(options) is a table of named arguments. My original idea was to allow Subnet.new('1.2.3.0/24'), Subnet.new('1.2.3.0', '1.2.3.1', '1.2.3.255') and Subnet.new('1.2.3.0 1.2.3.1 1.2.3.255'). If you allow the latter, though, the module has to know whether you want IPv4 addresses or IPv6 addresses, in order to parse something like Subnet.new('1.2.3.0 01:002:3:0004:: 1.2.3.255'). So I thought that maybe we needed something like Subnet.new{ IPs = '1.2.3.0 01:002:3:0004:: 1.2.3.255', version = 'IPv4' }. Perhaps I am trying to make things too complicated, however. — Mr. Stradivarius ^{♪ talk ♪} 12:36, 17 July 2016 (UTC)[reply]

I just updated Module:IP and Module:Sandbox/Johnuniq/ip (with its results at Module talk:Sandbox/Johnuniq/ip). It's mostly done, but very rushed and I'll be checking/refactoring it. Thanks for the replies but I have to go and can't contemplate them now. Back later. Johnuniq (talk) 13:02, 17 July 2016 (UTC)[reply]

Internal IP representation

Something I've been wondering about the internal representation of the IPs - why did you make the bit length of each part 16 bits rather than 32? Lua's double precision floats should be able to precisely represent integers between about -2^53 and 2^53 (I forget the exact cutoff), so an IPv4 address should fit inside one Lua number, and an IPv6 address should fit inside a table of 4 Lua numbers. — Mr. Stradivarius ^{♪ talk ♪} 14:31, 17 July 2016 (UTC)[reply]

I think side-by-side comparisons would be needed to see what was best in practice. However, I chose 16 bits partly for convenience because it makes working with IPv6 easier—a bunch of calculations would be needed to convert the internal representation to-and-from text if 32 bits were used. A second point which weighed more heavily was a hunch that the code emulating bit-wise operations is more efficient working 16 bits at a time. See copyPrefix and setHostBits. Also, the function common_length from Module:IPblock is used frequently to determine the number of prefix bits that two IPs have in common, and it is able to step towards the answer using 16 bits at a time—if the answer is, say 18 bits, it does a very simple calculation for the first 16 bits, then needs only to determine that the next 2 bits are common. I suspect it would be a little less efficient if it were working 32 bits at a time. Johnuniq (talk) 23:39, 17 July 2016 (UTC)[reply]