Talk:Hopcroft–Karp algorithm
Example Python code
def match_hopcroft_karp(adjLR):
    """Find a maximum matching in a bipartite graph L ∪ R, given as an L->R
    adjacency map taking each vertex in L to an iterable of neighbours in R."""
    matchLR = {}   # current matching, L -> R
    matchRL = {}   # current matching, R -> L
    lev = {}       # BFS level of each left vertex in the current phase

    def bfs():
        # Build levels outward from the unmatched left vertices; succeed as soon
        # as an unmatched right vertex (the end of an augmenting path) appears.
        L = set(l for l in adjLR if l not in matchLR)
        lev.clear()
        depth = 0
        seen = set()
        while L:
            for l in L:
                lev[l] = depth
            L1 = set()
            for l in L:
                for r in adjLR[l]:
                    if r not in seen:
                        # l->r must be unmatched here, since 'seen' prevents
                        # returning to l's parent on the right
                        seen.add(r)
                        if r not in matchRL:
                            return True
                        L1.add(matchRL[r])
            depth += 1
            L = L1
        return False

    def dfs(l, targl):
        # Try to extend an alternating path from l, following the BFS levels.
        if lev.get(l) != targl:
            return False
        lev[l] = None   # mark l as used, so it is visited at most once per phase
        for r in adjLR[l]:
            if r not in matchRL or dfs(matchRL[r], targl + 1):
                matchLR[l] = r
                matchRL[r] = l
                return True
        return False

    nmatch = 0
    while bfs():
        # Augment along a maximal set of vertex-disjoint shortest paths.
        for l in adjLR:
            if l not in matchLR and dfs(l, 0):
                nmatch += 1
    return (nmatch, matchLR, matchRL)

if __name__ == '__main__':
    adj = {0: {'a', 'b', 'c'}, 1: {'d'}, 2: {'a', 'c', 'e'}, 3: {'d', 'e'}, 4: {'d'}}
    print(match_hopcroft_karp(adj))
Placeholder edit to start conversation with David Eppstein. (It's past my bedtime now.) I've reordered the statement 'lev[l]=None' because I think it's clearer that way, though equivalent. Each vertex gets marked as used as soon as it is processed. I think this means that dfs() can only be called at most 2m times in a given phase, once for each edge direction. (Under the normal assumption that the graph is connected; otherwise we'd have to say 2m+n.) I'm sorry if my reverts were rude - I'm not very familiar with Wikipedia editing and the best way to converse with other editors. Alex Selby (talk) 05:18, 15 April 2017 (UTC)
- My general feeling is that we should not have example code in Wikipedia at all, except possibly for very simple algorithms. It is too hard to verify and too easy to make big mistakes, too much original research (unless inappropriately copied from somewhere else) and too far from something that will actually assist human readers in understanding the subject. I think my earlier objections (that because this doesn't make each individual dfs call fast, it doesn't meet the advertised time bound) may have been mistaken, but I think the bigger point is still valid. —David Eppstein (talk) 07:35, 15 April 2017 (UTC)
- Ah, that is a more fundamental objection than I expected. From my point of view, I find example code a very useful adjunct in understanding a concept (alongside an informal description, which is also very helpful), and judging by the comments here I am not the only one. Compared with an informal description, where a turn of phrase can hide subtle difficulties, actual code is at least unambiguous. I think Python code has both the advantages of pseudocode, in that it reads more-or-less like English, and also the advantages of a formal system, in that it follows an unambiguous system that anyone can execute. For example, in the present pseudocode, Q isn't initialised anywhere. If you were trying to understand the algorithm for the first time, this would be annoying (does Q persist across calls to BFS or not?). And in fact in Hopcroft and Karp's original paper, there is a bug in their DFS pseudocode (missing DELETE). Of course these can be fixed, but my point is they would never have been there in the first place in code that you can actually execute. Obviously I realise that code can have bugs, but it is at least forced to make sense by the discipline of the language.
- For these reasons I'd be happy to do away with the pseudocode in favour of Python code (not necessarily the program I submitted). The present pseudocode is virtually Python already, but (as mentioned above) it wouldn't work as it is. (Also, I think it is a less straightforward/direct way of writing the algorithm to introduce the special 'NIL' node, though I suppose that is a matter of taste.)
- But I'm not sure how deep your objections go. Perhaps you also think that pseudocode is bad in a Wikipedia article. I would disagree with that too, in that I believe that pseudocode is better than no code at all for similar reasons to those above. Pseudocode imposes more of a discipline than an informal description and makes it harder for the writer to gloss over a tricky point and easier for a reader to cut to the point of what they want to understand. Alex Selby (talk) 11:56, 15 April 2017 (UTC)
- No, pseudocode is ok, because its primary purpose is to be read by humans. Another disadvantage of actual code is that when it is present the articles containing it quickly turn into code farms, with the Ruby and Go and Erlang and Haskell and JavaScript programmers all wanting to show how it's done in their languages. —David Eppstein (talk) 15:42, 15 April 2017 (UTC)
Untitled
This needs an explanation of the "run really fast" step (http://www.fysh.org/~katie/computing/methodologies.txt)
Delta
Is Δ the only standard terminology we can use? I would prefer to leave Δ for the maximum degree of the graph. Adking80 (talk) 22:06, 18 July 2008 (UTC)
- I have seen ⊕ (symmetric difference) used in several papers. It seems a much better notation to me. Also, the meaning of ⊕ is not explained on the page. I have updated the page accordingly. —Preceding unsigned comment added by Michael Veksler (talk • contribs) 22:29, 9 October 2008 (UTC)
Shortestest
Is shortestest a word? ;D User:Ozzie (talk) 20:37, 7 March 2008 (UTC)
Maximum vs. maximal
Changed the incorrect "maximum" to "maximal". —Preceding unsigned comment added by 80.145.225.62 (talk • contribs) 22:44, 3 October 2006 UTC
- I agree. Thanks for persisting after the incorrect revert. —David Eppstein 05:55, 4 October 2006 (UTC)
- I am sorry for the incorrect revert. I now think that I was wrong. I started this article because the algorithm was mentioned in the text of another article. The only reference I had was CLRS, where it said maximum. So when I saw this edit from an unregistered user, I checked in CLRS, saw that it said maximum, and assumed that the editor was either in error or mischievous. Sorry about this. Let me check that we agree on terminology:
- If we go through a one-dimensional state-space (x-axis below), where each state has a value (y-axis below),
           B
           /\
   A      /  \
   /\    /    \
  /  \  /      \
 /    \/        \
/                \
- then here both A and B are maximal points, while only B is a maximum. This is the terminology used in the article Matching.
- The description in CLRS is actually just given as an exercise. I tried to solve it, but could not see how to find a "maximum set of vertex-disjoint shortest augmenting paths" in O(E), so I just wrote the pseudocode for the algorithm as described, and put it in my ever-growing TODO to figure out the missing part. I now found a description of the pseudo-code which still says maximum in the comments, but the code clearly gives only something maximal. So the problem was simpler than I thought :) Thanks for the correction, and sorry about my error. Hugs from Nils Grimsmo 09:16, 4 October 2006 (UTC)
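To make the maximal-versus-maximum distinction concrete in code: below is a minimal greedy matcher (a sketch; the function name and the tiny example graph are made up for illustration). Its result is maximal, since no further edge can be added to it, yet it is half the size of the maximum matching found by the match_hopcroft_karp function from the section above.

def greedy_maximal_matching(adjLR):
    """Match each left vertex to its first free neighbour; the result is
    maximal (no edge can be added) but not necessarily maximum."""
    matchLR = {}
    matchRL = {}
    for l, neighbours in adjLR.items():
        for r in neighbours:
            if r not in matchRL:
                matchLR[l] = r
                matchRL[r] = l
                break
    return matchLR

adj = {1: ['a', 'b'], 2: ['a']}
print(greedy_maximal_matching(adj))  # {1: 'a'}: maximal, but only size 1
print(match_hopcroft_karp(adj)[0])   # 2: the maximum matches 1-'b' and 2-'a'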
Difference from Dinic algorithm (Dinits' algorithm)?
What's the difference between this algorithm and Dinic's algorithm? To me, it just looks like Dinic's algorithm, only run on a specific input, so it has a better time complexity than the general-case Dinic. --Lukas Mach 22:27, 27 December 2006 (UTC)
- There is none... H-K is a 1973 algorithm and Dinic's is a 1976 algorithm. --Lukas Mach 01:55, 26 February 2007 (UTC)
Other estimates
Other estimates are O((V+E)·V^{1/2}) and O(V^{5/2}): Lipski W., Kombinatoryka dla Programistów [Combinatorics for Programmers], Wydawnictwa Naukowo-Techniczne, Warszawa, 1982.--Tim32 (talk) 10:58, 19 November 2007 (UTC)
- Both bounds are worse. We can assume that every vertex is incident to at least one edge, so V = O(E). We also know that E = O(V²). Adking80 (talk) 22:04, 18 July 2008 (UTC)
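Spelled out, the comparison with the O(E·V^{1/2}) bound of Hopcroft–Karp is just (using the two facts above):

\[ V \le 2E \;\Rightarrow\; (V+E)\sqrt{V} \le 3E\sqrt{V} = O(E\sqrt{V}), \qquad E \le V^2 \;\Rightarrow\; E\sqrt{V} \le V^{5/2}, \]

so neither of Lipski's bounds improves on O(E·V^{1/2}).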
Pseudocode
Is it just me, or is the pseudocode rather hard to read? For example, the first loop tripped me up a bit, because it is some awkward mixture of high-level functions ((u,v) in E) and low-level implementation (for node u, node v). Furthermore, the code checks for nodes being in M. I do not understand this, as M is a matching and thus a set of edges? --Tetha (talk) 05:49, 11 June 2008 (UTC)
- M is a matching. When the code asks whether a vertex is in M, it means, is it an endpoint of one of the edges in M? I don't think it's any shorter or clearer than what's here, but if it would help you to see actual code for this algorithm, I have some here. (You can ignore the "imperfections" routine in that file; it solves a different and somewhat more complicated problem.) —David Eppstein (talk) 06:32, 11 June 2008 (UTC)
- At least for me, the Python code helps a lot. My definition of a matching came from graph theory, where a matching is just a set of edges that encodes the matching (that is, an edge (u,v) in M means: u is matched with v). Your matching is a mapping that maps a node u to the node v that u is matched with. In your world, it makes a lot of sense to ask: node in matching; in mine, it did not (I would rather ask: does there exist an x with (x, u) in M?). Thank you. --Tetha (talk) 09:39, 11 June 2008 (UTC)
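The two viewpoints are easy to translate between in code; a tiny illustration (the edge set below is made up):

M = {(1, 'a'), (2, 'b')}            # graph-theoretic style: a matching as a set of edges
M_map = dict(M)                     # mapping style: each matched u -> its partner v

u = 1
print(any(x == u for (x, _) in M))  # set style: does some edge (u, v) lie in M?
print(u in M_map)                   # mapping style: the same test as an O(1) lookup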
LIFO queue?
Page says:
We also maintain a queue(last in first out datastructure)
Last in, first out is a stack, no? —Preceding unsigned comment added by Talex (talk • contribs) 11:40, 20 July 2008 (UTC)
Queue is FIFO, stack is LIFO... it seems to be strange. 210.122.65.2 (talk) 07:50, 18 August 2008 (UTC)
Incorrect pseudo-code
I am sorry that I was trigger-happy with the main article. Only after I have completed my edit did I realize that it should have been discussed on this forum first. (First time to edit a Wikipedia page).
Did not touch the pseudo-code itself, only the part that explains it. It may take me much longer to write a readable and correct version of the pseudo-code.
The pseudo-code is not merely complicated, it is incorrect. I realized this after I had implemented the algorithm, only to see that in some cases it requires almost |V| iterations (rather than the promised O(√|V|)).
The main problem with the pseudo-code (other than clarity) is that the BFS stops immediately when it discovers an augmenting path. This is against the way the original algorithm works, which generates a maximal set of augmenting paths each time.
The proof that only O(√|V|) iterations are needed is based on the premise that each BFS iteration returns strictly longer augmenting paths. The current pseudo-code does not work this way.
For example, consider a graph with edges (divided into n subsets):
- (u_01, v_01) (u_01, v_02) (u_02, v_01)
- (u_11, v_11) (u_11, v_12) (u_12, v_11)
...
- (u_n1, v_n1) (u_n1, v_n2) (u_n2, v_n1)
With the initial matching marked in bold, the pseudo-code will require n iterations, whereas the real algorithm requires only one. Michael Veksler (talk) 23:09, 9 October 2008 (UTC)
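A quick sketch that builds this family of graphs and feeds it to the match_hopcroft_karp function from the "Example Python code" section above (the tuple-based vertex naming is mine); a correct implementation should find the maximum matching of size 2n, using a constant number of phases independent of n:

def gadget_graph(n):
    # Gadget i has the three edges (u_i1, v_i1), (u_i1, v_i2), (u_i2, v_i1).
    adj = {}
    for i in range(n):
        adj[('u', i, 1)] = [('v', i, 1), ('v', i, 2)]
        adj[('u', i, 2)] = [('v', i, 1)]
    return adj

n = 5
nmatch, matchLR, matchRL = match_hopcroft_karp(gadget_graph(n))
print(nmatch == 2 * n)  # True: each gadget matches u_i1-v_i2 and u_i2-v_i1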
Copied the suggested pseudo-code to the main page. Removed it from here to avoid redundancy of a big piece of information. Michael Veksler (talk) 07:21, 11 October 2008 (UTC)
The pseudo-code was introduced by Michael Veksler (talk) 01:31, 10 October 2008 (UTC)
Minor fixes to the previous version of the algorithm (also by me) Michael Veksler (talk) 06:39, 10 October 2008 (UTC)
This pseudo-code is huge. Maybe it should be rewritten in Python as Tetha (talk) has suggested. Unfortunately, I have never written more than a few lines of Python at a time.
Log: Using u.matching instead of M throughout the algorithm. This gives the right sense of the O(1) nature of operations on M. Michael Veksler (talk) 08:30, 10 October 2008 (UTC)
Log: Put constraints over "for all" elements at the "for all" line, rather than on a separate if line.
I have implemented the pseudo-code and it does not seem to be working (Kursancew) —Preceding unsigned comment added by 189.95.58.234 (talk) 15:38, 25 November 2008 (UTC)
I regret that this pseudo-code came out as huge as it is. Maybe it will be possible to refine it into something saner than what I could come up with at the time. I have a Perl implementation of the algorithm that works, but its readability is poor. If anyone wants to have a look at that Perl implementation, I will be happy to mail or publicize it. Michael Veksler (talk) 09:34, 16 December 2008 (UTC)
Terminology: augmenting and alternating paths
It would be nice if the definitions of augmenting/alternating paths in Matching#Definition and Hopcroft–Karp algorithm#Alternating paths matched. — Miym (talk) 23:04, 5 March 2009 (UTC)
- It turns out that the Matching article's terminology matches the terminology in Ahuja, Magnanti, and Orlin. So I'll fix up this article to match. —David Eppstein (talk) 23:19, 5 March 2009 (UTC)
Weighted graphs?
I read some papers that, for a maximum weighted bipartite graph matching, claim they use Hopcroft-Karp instead of the Hungarian algorithm, since it's O(n^2.5) instead of O(n^3). See for example "Tracking Across Multiple Cameras With Disjoint Views" (Javed, Shah, 2003). Are these guys just clueless people trying to make it seem like they're using the new "hip" algorithm that is faster instead of the "old" Hungarian algorithm? Hopcroft-Karp has nothing to do with weights. Or am I missing something? This can be considered a widely referenced paper in Computer Vision. Jotaf (talk) 18:22, 31 August 2010 (UTC)
- If you're missing something, so am I; I don't know of a weighted version of Hopcroft-Karp. —David Eppstein (talk) 19:33, 31 August 2010 (UTC)
Pseudo-code conflicts with the algorithm
The pseudo-code DFSes from vertices that have a Dist of 0, which start a path, whereas the algorithm states that we should begin from vertices that end a path. In terms of Dinic's algorithm, this corresponds to BFS from the sink and DFS from the source. I am not sure whether this makes a difference to the time complexity. — Preceding unsigned comment added by Tonyxty (talk • contribs) 02:04, 14 April 2011 (UTC)
BFS should stop as soon as it reaches NIL
It's been several hours since I implemented this algorithm. This pseudocode is really helpful but...
The BFS function does not do what it is supposed to do. As soon as it reaches NIL, it should clear the queue Q and return true; then there is no need to check NIL's distance at the end of the function body, and false should simply be returned there. As it stands, the algorithm proceeds anyway, causing a (not asymptotically significant) loss of time and possibly strange behaviour on the undefined Adj[NIL]. 80.188.71.176 (talk) 02:10, 16 January 2012 (UTC)
"Analysis"
"It can be shown that each phase increases the length of the shortest augmenting path by at least one: the phase finds a maximal set of augmenting paths of the given length, so any remaining augmenting path must be longer."
This is not that simple. In fact, showing that the length of the shortest augmenting path increases is perhaps the most difficult part of the entire proof. The error comes down to the distinction between "maximal" and "maximum." There may be other paths of the same length, but just not ones which are vertex-disjoint from these ones. 68.61.30.23 (talk) 16:48, 21 April 2013 (UTC)
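For reference, the strict-increase step is usually proved via a lemma from Hopcroft and Karp's original paper (stated here from memory, so worth checking against the source): if P is a shortest augmenting path for a matching M, and P' is any augmenting path for M ⊕ P, then

\[ |P'| \ge |P| + 2\,|P \cap P'|, \]

where |P ∩ P'| is the number of edges shared by P and P'. Together with the maximality of the vertex-disjoint set found in a phase (and the fact that augmenting paths have odd length), this yields the strict increase that the quoted sentence glosses over.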