Wikipedia:Articles for deletion/C10k problem
- The following discussion is an archived debate of the proposed deletion of the article below. Please do not modify it. Subsequent comments should be made on the appropriate discussion page (such as the article's talk page or in a deletion review). No further edits should be made to this page.
The result was keep. No one but the nominator recommends deletion. (Non-admin closure by Intelligentsium 00:30, 4 January 2010 (UTC))
- C10k problem
In my attempts to find information on this topic, every page I found that mentioned "C10K problem" either used the term as a given without justifying it, or referred to the Kegel page referenced in this article, which implies that such a limit exists without substantiating the implication, and then deals entirely with ways to increase the amount of traffic a web server can handle, without any of that text relying on a 10K limit in particular. I don't see that this is a notable topic, because it seems to be one person's name for an unsubstantiated phenomenon, and I don't find any evidence that the 10K limit exists. So, possible WP:N and possible WP:V. —Largo Plazo (talk) 17:47, 13 December 2009 (UTC)
- STRONG KEEP - C10K Problem is very real. Simoncpu (talk) 13:06, 15 December 2009 (UTC)
- Do you have evidence for this? Since the problem I cited is that I couldn't find real evidence of it, and none is given in the article, we need more than a repetition of the belief that it exists to help us out. —Largo Plazo (talk) 13:32, 15 December 2009 (UTC)
- Kegel's thesis is the evidence of the problem that we have been experiencing for quite some time now. The C10K label caught on because that label makes sense. Many software developers don't need much convincing of its existence because it's pretty obvious. My first reaction on seeing someone question its WP:N was "duh." WP:V is possible though. Here are some of the links that reference the C10K Problem. Simoncpu (talk) 07:34, 17 December 2009 (UTC)
- http://portal.acm.org/citation.cfm?id=1534139
- DOI.org
- http://www.wandisco.com/pdf/dcone-whitepaper.pdf
- http://www.lanl.gov/radiant/pubs/hptcp/tcpwindow.ps
- http://www.eecs.umich.edu/~kaushikv/papers/sirocco_febid.pdf
- http://cs.uni-salzburg.at/~ck/teaching/CS-Seminar-Summer-2004/claudiu-survey.pdf
- http://www.eecs.harvard.edu/~mdw/papers/seda-hotos01.pdf
- The first two links above describe something that was given the name "reverse C10k problem" based on the existence of the alleged C10k problem. Every other one of these mentions it and refers the reader, every time, to Kegel's page, http://www.kegel.com/c10k.html. And on that page, all he says is, "It's time for web servers to handle ten thousand clients simultaneously, don't you think? After all, the web is a big place now." He, like all the other sources, makes a remark that presupposes there actually is such a limit as the C10k limit, while not providing us with any evidence that anyone has ever actually found there to be such a limit of approximately 10,000 connections. I'm not saying I disbelieve it, but I am saying that so far the "evidence" that has been provided doesn't lift it above the level of "a common conception", doesn't distinguish it from any urban legend or old wives' tale. —Largo Plazo (talk) 08:19, 17 December 2009 (UTC)
- Note: This debate has been included in the list of Technology-related deletion discussions. -- —Largo Plazo (talk) 17:50, 13 December 2009 (UTC)
- Note: This debate has been included in the list of Internet-related deletion discussions. -- —Largo Plazo (talk) 17:51, 13 December 2009 (UTC)
- I think the problem is not really the 10K connections. The problem is that if the server is not programmed with this "aim", it is capable of serving only a few connections (fewer than the hardware could support). Servers that attack the C10k problem are really attacking the problem of programming the server so that the limitation is the hardware and not the software. And 10K connections is a reasonable target for such an aim. But it's true that there is only one reference (in essence). Thanks. —Preceding unsigned comment added by Xan2 (talk • contribs) 19:18, 15 December 2009 (UTC)
- Relisted to generate a more thorough discussion so consensus may be reached.
Please add new comments below this notice. Thanks, Tim Song (talk) 02:12, 20 December 2009 (UTC)
- Weak Keep If there are published academic papers referring to a variant of it (the reverse C10K), then it seems reasonable that the problem that is the basis for that name must be very well known in its field. I think the evidence shows some degree of notability. Question, though: is Kegel notable enough for an article? If so, there's a possible merge.
- Hoaxes, legends, and misconceptions can be very well known. I think the issue here is whether it's real. If a topic in information technology were notable, it's very hard to imagine that it would be so difficult to find anything substantiating it on the Web; and if no one can provide any resource (unlike the ones given above) that substantiates it rather than presupposing it, it's tough to see what an article on it could be about. —Largo Plazo (talk) 03:22, 21 December 2009 (UTC)
- Relisted to generate a more thorough discussion so consensus may be reached.
Please add new comments below this notice. Thanks, Arbitrarily0 (talk) 16:29, 27 December 2009 (UTC)
- Strong Keep To clarify, the "limit" is certainly not any kind of hard limit; as noted above there won't be any citations found to support this limit. It is simply a recognition that certain kinds of very popular server architectures can't get much beyond a few thousand simultaneous connections. See yaws (web server) - different threading systems are one way to get beyond this limit, lighttpd uses another, select/poll/epoll. I'm confused by this AFD proposal. Can someone enlighten me about the WP:V complaint? Is there some question that web servers like Apache run into these limits? WP:N is just crazy talk - see the papers linked above, see the lighttpd article - it was created in response to c10k! ErikHaugen (talk) 21:00, 28 December 2009 (UTC)
- "...there won't be any citations found to support this limit". You're saying strongly that we should have an unsourced article about something that doesn't exist because something similar to it does exist. I'm quite certain that somebody is capable of writing an article about the phenomenon that does exist, under a title that does not imply the existence of the phenomenon that doesn't exist, with reliable sources that document the real phenomenon, rather than a link to a document that doesn't in any way support the fake phenomenon other than to claim (falsely) that it exists. —Largo Plazo (talk) 23:22, 28 December 2009 (UTC)
- It isn't a fake phenomenon. http://www.sics.se/~joe/apachevsyaws.html See the little blue and green squiggles in the left side of this graph? This is real. C10k is a name used to describe it. It's overly precise and won't always be accurate, but it is nonetheless a name used for it. ErikHaugen (talk) 00:40, 30 December 2009 (UTC)
- What is it with people providing references that don't even mention, let alone support, the thing being claimed? There is no mention on this page of a C10k phenomenon. There is no mention of the number 10,000. All this page shows is that Apache seems to be limited to this number of simultaneous connections, while Yaws seems to be able to achieve that number of simultaneous connections. In fact, the number achieved by Yaws is 80,000, which, to the best of my knowledge, is much more than 10,000, so while somehow you have the impression that this page in some way supports the existence of a phenomenon known as the C10k limit, it actually seems to be a demonstration I might present to show that there is no such limit. —Largo Plazo (talk) 02:56, 30 December 2009 (UTC)
- The link shows the difference between Apache, which like "most web servers" mentioned in the wp article uses a separate o/s thread/process for each connection, and exotic web servers like yaws or lighttpd, which don't - they have "solved" the c10k problem. The graph shows the enormous limitation that web servers constrained by the c10k problem suffer; they are only able to use a fraction of the available hardware. It's quite a dramatic experiment, and clearly demonstrates the failure of the conventional model to handle lots of connections. ErikHaugen (talk) 18:32, 30 December 2009 (UTC)
- Regarding the last part of your question, you answered WP:V yourself: you said "the 'limit' is not any kind of hard limit ... there won't be any citations to support such limit". This "limit" isn't verifiable because it doesn't exist. End of story: the article doesn't satisfy Wikipedia's criterion that articles must be about verifiable topics; if it's false then it certainly isn't verifiable. That there is some limit isn't remarkable: at every single point in computing history, there has been a limit on processor speed, a limit on storage speed and capacity, a limit on network bandwidth and number of connections. The whole point of the alleged C10K problem is that some special, perplexing limit of around 10,000 connections (near enough to warrant the name) has been reached and has been a special source of frustration. And yet, here you are, telling me that it's a fiction, and in response to my concern that there doesn't seem to be any corroboration supporting it, you confirm that there isn't any such corroboration, and yet you feel strongly that this article on this unverifiable, nay, false topic should be kept. I don't get it. And let's not even get into notability, unless someone wants to rewrite the entire article to be about a notable gross misconception prevailing in some quarters that some limit called "C10k" exists. And even then, to avoid being an original research piece, that article would have to provide citations to resources that address this misconception (widely accepted hoax?), rather than making the first-person observation that there are a lot of references to C10k out there. —Largo Plazo (talk) 04:34, 29 December 2009 (UTC)
- Thread/process based servers that many were/are using, such as Apache, hit a limit of a few thousand simultaneous connections, thus underutilizing the hardware for non-cpu/etc bound services. What is hard to understand about this? This isn't controversial, is it? In a few years, better hardware might mean the limit is typically 20-30k instead of a few thousand, sure - but the problem is still referred to as c10k, at least for now. The shortcomings of this architecture are surely interesting enough for an article. What else would you call it? Should we rename the article? Is there another we could merge it with? ErikHaugen (talk) 00:40, 30 December 2009 (UTC)
- Nothing is hard to understand about your first sentence. Your first sentence, however, is not what this article says. Your first sentence has nothing to do with whether there's a specific phenomenon, as claimed in the article, called the C10k phenomenon, that supposedly has something to do, specifically, with the number 10,000. If you'd like to write an article about connection limits on various platforms and their progression over the years, and can provide sources that support what you write about them, please feel free. —Largo Plazo (talk) 03:01, 30 December 2009 (UTC)
- I've tried to say this elsewhere - I think it might be helpful to stop getting hung up on the number 10,000. The problem is called the c10k problem, and perhaps that is a lousy name since it will probably be 50k in a few years, but that is a name for the problem, and the name is quite popular. Not much we can do about that here. ErikHaugen (talk) 18:32, 30 December 2009 (UTC)
- Let's go back and look at the very first sentence in the article: "The C10k problem [1] is the name given to a limitation that most web servers currently have which limits the web servers capabilities to only handle about ten thousand simultaneous connections." I didn't make up that this problem is supposed to have something to do with ten thousand connections: it's the salient point of the lead sentence of the article.
- Are any references given in the article that support this? No.
- Is your point that the article needs more references, or are you unconvinced that the phenomenon is real? ErikHaugen (talk) 18:32, 30 December 2009 (UTC)
- My point is that all the references provided by people determined to document the problem turn out to be non-references, which convinces me even more that there are no actual references, and the whole concept of the C10k problem as defined in the article (and also as implied by its name) is not verifiable. —Largo Plazo (talk) 20:39, 30 December 2009 (UTC)
- Do any of the references provided by anybody in this discussion corroborate this? No.
- I provided experimental results demonstrating that conventional server architectures can't get past a few thousand connections. See Simoncpu's links above - other academics cite Kegel's article in discussion about how thread/process based architectures don't scale as well as event-driven architectures. Is the fact that this phenomenon is discussed in academic papers good enough? ErikHaugen (talk) 18:32, 30 December 2009 (UTC)
- As I already said, at any given technological point in history, any medium or device has various capacity limits. I'm sure that 30 years ago there were limits to the number of connections a system could support, and 20 years ago there were limits as well, albeit larger than the ones 10 years earlier, and there are such limits now. I have yet to see where, in this natural progression, C10k suddenly comes into it, and why 10,000 connections calls for special note. You've said elsewhere that I'm "hung up" on the number 10,000. Well, 10k: there it is, right in front of me, in its name, and also in the article's lead section. This deletion discussion is not about deleting the concept of C10k, and it isn't about deleting the concepts of connection limitations or processing inefficiencies or model improvements. It's about deleting this article, which, as it stands, is about something that doesn't exist, and identifying it by a name that you claim doesn't really mean what it says--except that the article claims it does mean what it says. —Largo Plazo (talk) 20:39, 30 December 2009 (UTC)
- C10K is a term people use commonly to describe the problem. You're right, it is not a very precise/accurate/timeless name (we're going in circles here, I think, about this), but it is the name people are using. Many links have been provided here to support the contention that the name is in widespread use, throughout academia and industry. You seem to agree that an article about the phenomenon itself is potentially reasonable to include on Wikipedia? So... sounds like we can either rename the article or merge/redirect it with another one that describes the problem (is there one?), or you can modify the article to clarify that the 10k limit is not a magical hard limit. Do any of these sound good? ErikHaugen (talk) 23:09, 30 December 2009 (UTC)
- Do any of the references provided in this discussion state in any kind of authoritative way what the C10k problem is, as opposed to taking its existence for granted? No.
- Irrelevant. Maybe it would help to think of c10k as a cultural thing. Clearly it exists, because people are talking about it; referenced from papers, etc - do a google search. The phrase "C10K" is not obscure, and needs an article on its own even if the concept is bogus. It isn't bogus, of course, but that's just gravy as far as whether or not this article should be deleted. In any case, to answer your question, the Kegel article explains what the problem is. Nobody else needs to, because it's explained there. Look at this page: http://www.javafaq.nu/java-article572.html - this is a description of a library that uses c10k in the description and defines it. See how this is being used? Clearly we need this wp article! ErikHaugen (talk) 18:32, 30 December 2009 (UTC)
- We don't carry articles in Wikipedia about things of whose existence some people are convinced. We carry articles about things whose existence is supported by reliable documentation. Was there a Year 2000 problem about which Wikipedia should have an article? Yes, there are thousands of sources that explain in detail what the Year 2000 problem is, that indicate what it actually has to do with the year 2000, that indicate in detail what the specific problems were that needed to be addressed, that indicate what kinds of measures were taken to address it, that indicate what kind of failures resulted from inadequate measures taken to address it. In response to the same kind of question about the C10k problem, we are shown sources that imply that it exists or in which the writer is under the understanding that it exists, but nothing that shows that it exists. We are shown sources that show that different platforms are limited to different numbers of connections, the general issue of connection limits as opposed to the very specific phenomenon that this article purports to be about. —Largo Plazo (talk) 03:15, 30 December 2009 (UTC)
- To build consensus: in my opinion, Largoplazo is right that the article does _not_ have reliable sources. We only have the lighttpd and nginx paragraphs, and that's not enough for any encyclopedia. So we must search for sources on this problem. On the other hand, the 10K limit is trivial. It suggests that web servers do not benefit from all the hardware power of the servers. It seems that if the programmers don't program the web server for this, it serves far fewer pages than the hardware limitations would allow. So 10K is not a hard limit. It's a soft limit. A limit for remind this situation. I propose that we create another article, "Software limitations of web servers" or something similar, redirect C10k to that page, and note there that C10k is one term used to refer to this situation but that, as a limit, it is soft. What do you think?--Xan2 (talk) 16:56, 29 December 2009 (UTC)
- "A limit for remind this situation." - well put. wrt your suggestion, sounds good, go for it! Please leave c10k intact until you create the other article, though, so people can find it when they want to know what c10k means. ErikHaugen (talk) 00:40, 30 December 2009 (UTC)
- Perhaps c10k can redirect here: http://en.wikipedia.org/wiki/Web_server#Load_limits but this section would need to be fleshed out quite a bit first; it doesn't even touch on c10k (i.e., it should be fleshed out anyway!). ErikHaugen (talk) 18:48, 30 December 2009 (UTC)
- Comment Largoplazo - you said above that "Hoaxes, legends, and misconceptions can be very well known. I think the issue here is whether it's real." I don't think this is the issue, actually. The issue is whether the page should be deleted. We have many pages devoted to hoaxes - piltdown man, etc. I don't think there's any question that we want a page for a particular hoax if it is notable enough. These WP:V discussions belong on the article's talk page - should the c10k article say that c10k is a hoax? That's not the debate here. (c10k is, of course, not a hoax, I'm just saying that whether it is or not is not germane to this AFD.) ErikHaugen (talk) 18:43, 30 December 2009 (UTC)
- Either edit http://en.wikipedia.org/wiki/Web_server#Load_limits to cover c10k (it doesn't ATM), or keep the page with some 'unstubbing' (it doesn't look like a proper, developed article now); everybody on this AfD agrees the concept exists, even if it is hard to verify clearly (UFOs are way more unverifiable, but UFOs still DO exist, even if only as an urban legend or the name of objects that can't be identified properly), and there are many benchmarks supporting its existence (see the sources provided above); the point is (in my opinion) that Largo Plazo doesn't understand that, while the name 'y2k problem' is less commonly used (albeit still used) to cover many similar problems (see http://en.wikipedia.org/wiki/Y2k#Date_bugs_similar_to_Y2K), the name c10k refers to a limit that's only CALLED c10k (and it's not 'soft' in the sense of being easily changeable by upgrading the hardware or software - it's 'soft' because it differs between various specific H/S configurations); the article could as well be called 'Simultaneous Web Server Connection & Load Limits Due To The Underlying Software And Hardware Problems And Limitations' - but IMO that would be redundant, as this particular one is already known in the jargon as the 'c10k problem'... —Preceding unsigned comment added by Vaxquis (talk • contribs) 21:18, 2 January 2010 (UTC)
- There must be thousands and thousands of pages about UFOs on the Web, such that the Wikipedia article on UFOs could reference them, and one could look to them to corroborate and expand on what's written in the article. I am not talking about pages where someone uses the word UFO in passing, or says, "Here's what we can do about UFOs" without giving any background on what a UFO is or is even supposed to be. I'm talking about pages about UFOs. Not one person in this discussion has managed, despite all their insistence, to provide one comparable page about the alleged C10k problem. Why is that? —Largo Plazo (talk) 00:05, 3 January 2010 (UTC)
- Is your point that C10K might be a bogus concept? Why are you asking for this? Anyway, to answer it, several have been mentioned above. What is wrong with these: http://www.javafaq.nu/java-article572.html http://www.kegel.com/c10k.html Help me understand what is missing and why it matters. ErikHaugen (talk) 06:56, 3 January 2010 (UTC)
- The above discussion is preserved as an archive of the debate. Please do not modify it. Subsequent comments should be made on the appropriate discussion page (such as the article's talk page or in a deletion review). No further edits should be made to this page.