Talk:Hopcroft–Karp algorithm
Example Python code
def match_hopcroft_karp(adjLR):
    """Find a maximum matching in a bipartite graph L ∪ R, given as an L->R
    adjacency map taking each vertex in L to an iterable of neighbours in R."""
    matchLR = {}   # current matching, L -> R
    matchRL = {}   # current matching, R -> L
    lev = {}       # BFS level of each left vertex in the current phase

    def bfs():
        # Build levels outward from the unmatched left vertices; succeed as soon
        # as an unmatched right vertex (the end of an augmenting path) appears.
        L = set(l for l in adjLR if l not in matchLR)
        lev.clear()
        depth = 0
        seen = set()
        while L:
            for l in L:
                lev[l] = depth
            L1 = set()
            for l in L:
                for r in adjLR[l]:
                    if r not in seen:
                        # l->r must be unmatched here, since 'seen' prevents
                        # returning to l's parent on the right
                        seen.add(r)
                        if r not in matchRL:
                            return True
                        L1.add(matchRL[r])
            depth += 1
            L = L1
        return False

    def dfs(l, targl):
        # Try to extend an alternating path from l, following the BFS levels.
        if lev.get(l) != targl:
            return False
        lev[l] = None   # mark l as used, so it is visited at most once per phase
        for r in adjLR[l]:
            if r not in matchRL or dfs(matchRL[r], targl + 1):
                matchLR[l] = r
                matchRL[r] = l
                return True
        return False

    nmatch = 0
    while bfs():
        # Augment along a maximal set of vertex-disjoint shortest paths.
        for l in adjLR:
            if l not in matchLR and dfs(l, 0):
                nmatch += 1
    return (nmatch, matchLR, matchRL)

if __name__ == '__main__':
    adj = {0: {'a', 'b', 'c'}, 1: {'d'}, 2: {'a', 'c', 'e'}, 3: {'d', 'e'}, 4: {'d'}}
    print(match_hopcroft_karp(adj))
Placeholder edit to start conversation with David Eppstein. (It's past my bedtime now.) I've reordered the statement 'lev[l]=None' because I think it's clearer that way, though equivalent. Each vertex gets marked as used as soon as it is processed. I think this means that dfs() can only be called at most 2m times in a given phase, once for each edge direction. (Under the normal assumption that the graph is connected; otherwise we'd have to say 2m+n.) I'm sorry if my reverts were rude - I'm not very familiar with Wikipedia editing and the best way to converse with other editors. Alex Selby (talk) 05:18, 15 April 2017 (UTC)
- My general feeling is that we should not have example code in Wikipedia at all, except possibly for very simple algorithms. It is too hard to verify and too easy to make big mistakes, too much original research (unless inappropriately copied from somewhere else) and too far from something that will actually assist human readers in understanding the subject. I think my earlier objections (that because this doesn't make each individual dfs call fast, it doesn't meet the advertised time bound) may have been mistaken, but I think the bigger point is still valid. —David Eppstein (talk) 07:35, 15 April 2017 (UTC)
- Ah, that is a more fundamental objection than I expected. From my point of view, I find example code a very useful adjunct in understanding a concept (alongside an informal description, which is also very helpful), and judging by the comments here I am not the only one. Compared with an informal description, where a turn of phrase can hide subtle difficulties, actual code is at least unambiguous. I think Python code has both the advantages of pseudocode, in that it reads more-or-less like English, and also the advantages of a formal system, in that it follows an unambiguous system that anyone can execute. For example, in the present pseudocode, Q isn't initialised anywhere. If you were trying to understand the algorithm for the first time, this would be annoying (does Q persist across calls to BFS or not?). And in fact in Hopcroft and Karp's original paper, there is a bug in their DFS pseudocode (missing DELETE). Of course these can be fixed, but my point is they would never have been there in the first place in code that you can actually execute. Obviously I realise that code can have bugs, but it is at least forced to make sense by the discipline of the language.
- For these reasons I'd be happy to do away with the pseudocode in favour of Python code (not necessarily the program I submitted). The present pseudocode is virtually Python already, but (as mentioned above) it wouldn't work as it is. (Also, I think it is a less straightforward/direct way of writing the algorithm to introduce the special 'NIL' node, though I suppose that is a matter of taste.)
- But I'm not sure how deep your objections go. Perhaps you also think that pseudocode is bad in a Wikipedia article. I would disagree with that too, in that I believe that pseudocode is better than no code at all for similar reasons to those above. Pseudocode imposes more of a discipline than an informal description and makes it harder for the writer to gloss over a tricky point and easier for a reader to cut to the point of what they want to understand. Alex Selby (talk) 11:56, 15 April 2017 (UTC)
- No, pseudocode is ok, because its primary purpose is to be read by humans. Another disadvantage of actual code is that when it is present the articles containing it quickly turn into code farms, with the Ruby and Go and Erlang and Haskell and JavaScript programmers all wanting to show how it's done in their languages. —David Eppstein (talk) 15:42, 15 April 2017 (UTC)
Untitled
This needs an explanation of the "run really fast" step (http://www.fysh.org/~katie/computing/methodologies.txt)
Delta
Is Δ the only standard terminology we can use? I would prefer to leave Δ for the maximum degree of the graph. Adking80 (talk) 22:06, 18 July 2008 (UTC)
- I have seen ⊕ (symmetric difference) used in several papers. It seems a much better notation to me. Also, the meaning of ⊕ is not explained on the page. I have updated the page accordingly. —Preceding unsigned comment added by Michael Veksler (talk • contribs) 22:29, 9 October 2008 (UTC)
Shortestest
Is shortestest a word? ;D User:Ozzie (talk) 20:37, 7 March 2008 (UTC)
Maximum vs. maximal
Changed the incorrect "maximum" to "maximal". —Preceding unsigned comment added by 80.145.225.62 (talk • contribs) 22:44, 3 October 2006 UTC
- I agree. Thanks for persisting after the incorrect revert. —David Eppstein 05:55, 4 October 2006 (UTC)
- I am sorry for the incorrect revert. I now think that I was wrong. I started this article because the algorithm was mentioned in the text of another article. The only reference I had was CLRS, where it said maximum. So when I saw this edit from an unregistered user, I checked in CLRS, saw that it said maximum, and assumed that the editor was either in error or mischievous. Sorry about this. Let me check that we agree on terminology:
- If we go through a one-dimensional state-space (x-axis below), where each state has a value (y-axis below),
           B
           /\
   A      /  \
   /\    /    \
  /  \  /      \
 /    \/        \
/                \
- then here both A and B are maximal points, while only B is a maximum. This is the terminology used in the article Matching.
- The description in CLRS is actually just given as an exercise. I tried to solve it, but could not see how to find a "maximum set of vertex-disjoint shortest augmenting paths" in O(E), so I just wrote the pseudocode for the algorithm as described, and put it in my ever-growing TODO to figure out the missing part. I now found a description of the pseudo-code which still says maximum in the comments, but the code clearly gives only something maximal. So the problem was simpler than I thought :) Thanks for the correction, and sorry about my error. Hugs from Nils Grimsmo 09:16, 4 October 2006 (UTC)
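To make the maximal-versus-maximum distinction concrete in code: below is a minimal greedy matcher (a sketch; the function name and the tiny example graph are made up for illustration). Its result is maximal, since no further edge can be added to it, yet it is half the size of the maximum matching found by the match_hopcroft_karp function from the section above.

def greedy_maximal_matching(adjLR):
    """Match each left vertex to its first free neighbour; the result is
    maximal (no edge can be added) but not necessarily maximum."""
    matchLR = {}
    matchRL = {}
    for l, neighbours in adjLR.items():
        for r in neighbours:
            if r not in matchRL:
                matchLR[l] = r
                matchRL[r] = l
                break
    return matchLR

adj = {1: ['a', 'b'], 2: ['a']}
print(greedy_maximal_matching(adj))  # {1: 'a'}: maximal, but only size 1
print(match_hopcroft_karp(adj)[0])   # 2: the maximum matches 1-'b' and 2-'a'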
Difference from Dinic algorithm (Dinits' algorithm)?
What's the difference between this algorithm and Dinic's algorithm? To me, it just looks like Dinic's algorithm, only run on a specific input, so it has a better time complexity than the general-case Dinic. --Lukas Mach 22:27, 27 December 2006 (UTC)
- There is none... H-K is a 1973 algorithm and Dinic's is a 1976 algorithm. --Lukas Mach 01:55, 26 February 2007 (UTC)
Other estimates
Other estimates are O((V+E)·V^{1/2}) and O(V^{5/2}): Lipski W., Kombinatoryka dla Programistów [Combinatorics for Programmers], Wydawnictwa Naukowo-Techniczne, Warszawa, 1982.--Tim32 (talk) 10:58, 19 November 2007 (UTC)
- Both bounds are worse. We can assume that every vertex is incident to at least one edge, so V = O(E). We also know that E = O(V²). Adking80 (talk) 22:04, 18 July 2008 (UTC)
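Spelled out, the comparison with the O(E·V^{1/2}) bound of Hopcroft–Karp is just (using the two facts above):

\[ V \le 2E \;\Rightarrow\; (V+E)\sqrt{V} \le 3E\sqrt{V} = O(E\sqrt{V}), \qquad E \le V^2 \;\Rightarrow\; E\sqrt{V} \le V^{5/2}, \]

so neither of Lipski's bounds improves on O(E·V^{1/2}).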
Pseudocode
Is it just me, or is the pseudocode rather hard to read? For example, the first loop tripped me up a bit, because it is some awkward mixture of high-level functions ((u,v) in E) and low-level implementation (for node u, node v). Furthermore, the code checks for nodes being in M. I do not understand this, as M is a matching and thus a set of edges? --Tetha (talk) 05:49, 11 June 2008 (UTC)
- M is a matching. When the code asks whether a vertex is in M, it means, is it an endpoint of one of the edges in M? I don't think it's any shorter or clearer than what's here, but if it would help you to see actual code for this algorithm, I have some here. (You can ignore the "imperfections" routine in that file; it solves a different and somewhat more complicated problem.) —David Eppstein (talk) 06:32, 11 June 2008 (UTC)
- At least for me, the Python code helps a lot. My definition of a matching came from graph theory, where a matching is just a set of edges that encodes the matching (that is, an edge (u,v) in M means: u is matched with v). Your matching is a mapping that maps a node u to the node v that u is matched with. In your world, it makes a lot of sense to ask: node in matching; in mine, it did not (I would rather ask: does there exist an x with (x, u) in M?). Thank you. --Tetha (talk) 09:39, 11 June 2008 (UTC)
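The two viewpoints are easy to translate between in code; a tiny illustration (the edge set below is made up):

M = {(1, 'a'), (2, 'b')}            # graph-theoretic style: a matching as a set of edges
M_map = dict(M)                     # mapping style: each matched u -> its partner v

u = 1
print(any(x == u for (x, _) in M))  # set style: does some edge (u, v) lie in M?
print(u in M_map)                   # mapping style: the same test as an O(1) lookup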
LIFO queue?
Page says:
We also maintain a queue(last in first out datastructure)
Last in, first out is a stack, no? —Preceding unsigned comment added by Talex (talk • contribs) 11:40, 20 July 2008 (UTC)
Queue is FIFO, stack is LIFO... it seems to be strange. 210.122.65.2 (talk) 07:50, 18 August 2008 (UTC)
Incorrect pseudo-code
I am sorry that I was trigger-happy with the main article. Only after I have completed my edit did I realize that it should have been discussed on this forum first. (First time to edit a Wikipedia page).
Did not touch the pseudo-code itself, only the part that explains it. It may take me much longer to write a readable and correct version of the pseudo-code.
The pseudo-code is not merely complicated, it is incorrect. I realized this after I had implemented the algorithm, only to see that in some cases it requires almost |V| iterations (rather than the promised O(√|V|)).
The main problem with the pseudo-code (other than clarity) is that the BFS stops immediately when it discovers an augmenting path. This is against the way the original algorithm works, which generates a maximal set of augmenting paths each time.
The proof that only O(√|V|) iterations are needed is based on the premise that each BFS iteration returns strictly longer augmenting paths. The current pseudo-code does not work this way.
For example, consider a graph with edges (divided into n subsets):
- (u_01, v_01) (u_01, v_02) (u_02, v_01)
- (u_11, v_11) (u_11, v_12) (u_12, v_11)
...
- (u_n1, v_n1) (u_n1, v_n2) (u_n2, v_n1)
With the initial matching marked in bold, the pseudo-code will require n iterations, whereas the real algorithm requires only one. Michael Veksler (talk) 23:09, 9 October 2008 (UTC)
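A quick sketch that builds this family of graphs and feeds it to the match_hopcroft_karp function from the "Example Python code" section above (the tuple-based vertex naming is mine); a correct implementation should find the maximum matching of size 2n, using a constant number of phases independent of n:

def gadget_graph(n):
    # Gadget i has the three edges (u_i1, v_i1), (u_i1, v_i2), (u_i2, v_i1).
    adj = {}
    for i in range(n):
        adj[('u', i, 1)] = [('v', i, 1), ('v', i, 2)]
        adj[('u', i, 2)] = [('v', i, 1)]
    return adj

n = 5
nmatch, matchLR, matchRL = match_hopcroft_karp(gadget_graph(n))
print(nmatch == 2 * n)  # True: each gadget matches u_i1-v_i2 and u_i2-v_i1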
Copied the suggested pseudo-code to the main page. Removed it from here to avoid redundancy of a big piece of information. Michael Veksler (talk) 07:21, 11 October 2008 (UTC)
The pseudo-code was introduced by Michael Veksler (talk) 01:31, 10 October 2008 (UTC)
Minor fixes to the previous version of the algorithm (also by me) Michael Veksler (talk) 06:39, 10 October 2008 (UTC)
This pseudo-code is huge. Maybe it should be rewritten in Python as Tetha (talk) has suggested. Unfortunately, I have never written more than a few lines of Python at a time.
Log: Using u.matching instead of M throughout the algorithm. This gives the right sense of the O(1) nature of operations on M. Michael Veksler (talk) 08:30, 10 October 2008 (UTC)
Log: Put constraints over "for all" elements at the "for all" line, rather than on a separate if line.
I have implemented the pseudo-code and it does not seem to be working (Kursancew) —Preceding unsigned comment added by 189.95.58.234 (talk) 15:38, 25 November 2008 (UTC)
I regret that this pseudo-code came out as huge as it is. Maybe it will be possible to refine it into something saner than what I could come up with at the time. I have a Perl implementation of the algorithm that works, but its readability is poor. If anyone wants to have a look at that Perl implementation, I will be happy to mail or publicize it. Michael Veksler (talk) 09:34, 16 December 2008 (UTC)
Terminology: augmenting and alternating paths
It would be nice if the definitions of augmenting/alternating paths in Matching#Definition and Hopcroft–Karp algorithm#Alternating paths matched. — Miym (talk) 23:04, 5 March 2009 (UTC)
- It turns out that the Matching article's terminology matches the terminology in Ahuja, Magnanti, and Orlin. So I'll fix up this article to match. —David Eppstein (talk) 23:19, 5 March 2009 (UTC)
Weighted graphs?
I read some papers that, for a maximum weighted bipartite graph matching, claim they use Hopcroft-Karp instead of the Hungarian algorithm, since it's O(n^2.5) instead of O(n^3). See for example "Tracking Across Multiple Cameras With Disjoint Views" (Javed, Shah, 2003). Are these guys just clueless people trying to make it seem like they're using the new "hip" algorithm that is faster instead of the "old" Hungarian algorithm? Hopcroft-Karp has nothing to do with weights. Or am I missing something? This can be considered a widely referenced paper in Computer Vision. Jotaf (talk) 18:22, 31 August 2010 (UTC)
- If you're missing something, so am I; I don't know of a weighted version of Hopcroft-Karp. —David Eppstein (talk) 19:33, 31 August 2010 (UTC)
Pseudo-code conflicts with the algorithm
The pseudo-code DFSes from vertices that have a Dist of 0, which start a path, whereas the algorithm states that we should begin from vertices that end a path. In terms of Dinic's algorithm, this corresponds to BFS from the sink and DFS from the source. I am not sure whether this makes a difference to the time complexity. — Preceding unsigned comment added by Tonyxty (talk • contribs) 02:04, 14 April 2011 (UTC)
BFS should stop as soon as it reaches NIL
It's been several hours since I implemented this algorithm. This pseudocode is really helpful but...
The BFS function does not do what it is supposed to do. As soon as it reaches NIL, it should clear the queue Q and return true; then there is no need to check NIL's distance at the end of the function body, and false should simply be returned there. As it stands, the algorithm proceeds anyway, causing a (not asymptotically significant) loss of time and possibly strange behaviour on the undefined Adj[NIL]. 80.188.71.176 (talk) 02:10, 16 January 2012 (UTC)
"Analysis"
"It can be shown that each phase increases the length of the shortest augmenting path by at least one: the phase finds a maximal set of augmenting paths of the given length, so any remaining augmenting path must be longer."
This is not that simple. In fact, showing that the length of the shortest augmenting path increases is perhaps the most difficult part of the entire proof. The error comes down to the distinction between "maximal" and "maximum." There may be other paths of the same length, but just not ones which are vertex-disjoint from these ones. 68.61.30.23 (talk) 16:48, 21 April 2013 (UTC)
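For reference, the strict-increase step is usually proved via a lemma from Hopcroft and Karp's original paper (stated here from memory, so worth checking against the source): if P is a shortest augmenting path for a matching M, and P' is any augmenting path for M ⊕ P, then

\[ |P'| \ge |P| + 2\,|P \cap P'|, \]

where |P ∩ P'| is the number of edges shared by P and P'. Together with the maximality of the vertex-disjoint set found in a phase (and the fact that augmenting paths have odd length), this yields the strict increase that the quoted sentence glosses over.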