Wikipedia talk:Category math feature

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Aerik (talk | contribs) at 03:57, 29 March 2006 (this vs DPL: <-- add some more thoughts, fix spelling). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Discuss, don't vote. WP:VIE. >Radiant< 15:34, 5 February 2006 (UTC)

Okay, but straw polls are just to gauge opinions (Wikipedia:Straw polls). -- Zondor 15:41, 5 February 2006 (UTC)
  • Yes, but you need to get some people to express opinions first, otherwise the situation will escalate into a "for-against" segregation on one particular point, instead of working on a compromise (for instance, the "requests for rollback" poll has backfired severely). Note that this feature already exists (it works in WikiNews) but it is disabled on Wikipedia. You may want to ask the devs why; there may be server load issues etc. >Radiant< 15:44, 5 February 2006 (UTC)

I'd love this feature. However, I also see a need to include the "category and all subcategories" operator: Category:Norwegians/with_sub & Category:Computer scientists. Perhaps even the default should be "with_sub" - the category of Norwegians who haven't been subclassified further is a pretty boring/useless category. A number of subcategory trees would be unnecessary with this feature, but I'm pretty sure some would remain. --Alvestrand 13:32, 10 February 2006 (UTC)

Straw poll

This is a straw poll only to gauge opinions, not consensus decision making. Sign your vote below.
  • Support. Categories then need not be so specific, awkwardly long and esoteric, making the organisation so much better. -- Zondor 15:04, 5 February 2006 (UTC)
  • Support. Many of the discussions on the categorisation project could be eliminated using this feature. Not so keen on the name, which is easily confused with mathematical topics. --Salix alba (talk) 19:13, 5 February 2006 (UTC)
  • Conditional Support. I would like to see an assessment of the resources required by this proposal by someone involved in either the coding of Mediawiki or the Wikimedia server management. BigBlueFish 13:21, 6 February 2006 (UTC)
    • Perhaps the use of lightweight tags rather than categories can avoid stress on the servers. -- Zondor 03:55, 7 February 2006 (UTC) Categories are somewhat useless anyway because of their wiki description. -- Zondor 03:59, 7 February 2006 (UTC)
  • Support, also with reservations about the feature's name. Perhaps something like "Category intersection" or "Category overlap" might be more intuitive, if overly specific or technically inaccurate. Feature seems a good idea though! David Kernow 18:33, 11 February 2006 (UTC)
  • Weak support If this idea actually worked as proposed, it would be a wonderful idea. However, as with other voters, I would need to know how it would affect performance etc. before I could put full support behind it. Chairman S. 01:54, 12 February 2006 (UTC)
  • Questions. How would we transition into it, what with the resulting broken links? --Urthogie 16:25, 12 February 2006 (UTC)
  • Conditional Support. I believe two things are necessary. First, expert attention must be given to a user-friendly interface. Second, BigBlueFish is absolutely right about a cost/benefit assessment. PhatJew 09:01, 14 February 2006 (UTC)
  • Conditional Support. Same perf issues as people have asked about above. My hunch is that for relatively simple operations like "in cat A and cat B", it wouldn't be that big of a performance crunch. Assuming that categories are stored in a sane way, it's certainly no more complex than search. Has anyone asked Brion about this? --Dantheox 21:22, 18 February 2006 (UTC)
  • Neutral the idea is a very solid one, and would fix a lot of the issues with the current categorization system. However, the interface and implementation are really the crucial factor; any change would have to be to a categorization system that is at least as intuitive as the current one, and making this feature the same would be very tricky. As it stands there are only a very few intersections that are truly useful (by nationality or location, for example) and if it takes more than one or two clicks to get to said category intersection from an article then this feature would be unnecessary clutter. Ziggurat 00:49, 20 February 2006 (UTC)
  • Yes Please even pretty please. Look at how much this would simplify John Lennon, for only one example of where it would be enormously useful. Septentrionalis 01:16, 25 February 2006 (UTC)
  • Neutral per Ziggurat. If I intersect "Films" and "History" will I get films about history, or the history of films? If we don't want to lose quality of classification we will have to keep more subcategories than we imagine: and we'll have to keep defending them on CfD from people who think that category math has obsoleted them. Plus, if all those cats that John Lennon is in are worth having as cats, then I want them all to be one-click accessible from John Lennon. So what was the advantage, again? —Blotwell 09:05, 28 February 2006 (UTC)
    • History is ambiguous, so it can mean either "about history" or "historical". It needs to be defined, or we should use more specific categories like 1970s, or ones more specific still. -- Zondor 04:26, 1 March 2006 (UTC)
  • Support, per David Kernow. jareha 05:56, 10 March 2006 (UTC)
  • Support Q0 12:06, 27 March 2006 (UTC)
  • Support And I'd be willing to write (re-write, actually) the code, but I'm biased. --Aerik 03:49, 29 March 2006 (UTC)

Handling sub-categories

While in principle this is a good idea, we'd have to think carefully about exactly how it would work....

One issue that occurs to me is how we handle subcategories. Suppose we take two categories X and Y, then we define category Z = X intersect Y. What you'd expect this to do is that Z would hold all pages in both X and Y. Since subcategories are special types of pages, one would then expect that it would hold all categories that are subcategories of both X and Y.

However, that is often not what you'd expect. Suppose we have the following (concocted) categories:

  • Scientists, with subcategories: Physicists, Chemists, etc.
  • European people, with subcategories: French, German, etc.

Now suppose I want to make a European Scientists = European people * Scientists. You'd expect it to have subcategories "European Physicists", "European Chemists", "French Scientists", "German Scientists", etc., etc. However, in actual fact it has none of those; indeed, if Scientists and European people directly have no pages, chances are the intersection would be empty. It would only have pages directly in both categories or subcategories with both categories as immediate parents.
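
To make the problem concrete, here is a minimal Python sketch of a plain intersection over direct members only. The category contents and page names (Alice, Bob, etc.) are invented for illustration and are not taken from any real wiki data.

```python
# Hypothetical toy data: each category maps to the pages filed directly in it.
direct_pages = {
    "Scientists": set(),           # pure container, all pages live in subcategories
    "Physicists": {"Alice", "Bob"},
    "Chemists": {"Carol"},
    "European people": set(),      # pure container as well
    "French": {"Alice", "Carol"},
    "German": {"Dave"},
}


def direct_intersection(a: str, b: str) -> set[str]:
    """Pages filed directly in both a and b; subcategories are ignored."""
    return direct_pages[a] & direct_pages[b]


# Both parents hold no pages directly, so their intersection is empty,
# even though Alice is filed under both Physicists and French.
european_scientists = direct_intersection("Scientists", "European people")
french_physicists = direct_intersection("Physicists", "French")
```

Only the leaf-level intersection (`french_physicists`) finds anything here; the top-level one comes back empty, which is the behaviour described above.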

Someone has already suggested some sort of "subtree" operator. Firstly, as any programmer will tell you, dealing with hierarchical relationships in SQL is a big PITA. Secondly, it has the chance to turn a small, reasonably sized category into something absolutely massive. Thirdly, it loses the useful internal structure of the categories.
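
For comparison, a "subtree" operator could be sketched roughly like this in Python. This assumes the category tree is walked in application code rather than in SQL, and the data and function names are made up for the example.

```python
# Hypothetical toy data; names are illustrative only.
subcats = {
    "Scientists": ["Physicists", "Chemists"],
    "European people": ["French", "German"],
}
direct_pages = {
    "Physicists": {"Alice", "Bob"},
    "Chemists": {"Carol"},
    "French": {"Alice", "Carol"},
    "German": {"Dave"},
}


def subtree_pages(root: str) -> set[str]:
    """Collect the pages of root and of every category below it."""
    seen: set[str] = set()   # guards against cycles in the category graph
    pages: set[str] = set()
    stack = [root]
    while stack:
        cat = stack.pop()
        if cat in seen:
            continue
        seen.add(cat)
        pages |= direct_pages.get(cat, set())
        stack.extend(subcats.get(cat, []))
    return pages


def subtree_intersection(a: str, b: str) -> set[str]:
    """Pages found anywhere under a and also anywhere under b."""
    return subtree_pages(a) & subtree_pages(b)


european_scientists = subtree_intersection("Scientists", "European people")
```

This does find the pages, but it has to visit every descendant category, and the result is one flat set: nothing in it says which page was a physicist or which was French.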

Let me propose a solution:

  • We use for intersection the syntax Category:A/Category:B. Thus, you can intersect any set of categories you choose just by typing in the address bar "/wiki/Category:A/Category:B". You can also link to any category intersection with [[:Category:A/Category:B]]. My reason for choosing the / is so it works fine in URLs.
  • When intersecting categories, we pick up all pages in both categories, plus all subcategories of either category. However, when we pick up the subcategories of A, we intersect those categories with B, and vice versa, i.e. if Category:C is a subcategory of Category:A, then Category:C/Category:B is a subcategory of Category:A/Category:B.
  • Say you want to give the name (X) to Category:A/Category:B. You can edit Category:X and add some markup (similar to #REDIRECT) to redirect the X category to the A/B category.
  • The system should ignore as invalid any links from a page to an intersection category or a category that redirects to an intersection category.
  • You can mark the Category:A/Category:B page with a title (say, using syntax like [[Category:A/Category:B|X]]; this does not conflict with anything else, since we have already forbidden intersection categories from getting members) which will be used in any lists. So, if Category:C/Category:B gives itself the name P this way, the Category:A/Category:B page will output a link named P to the Category:C/Category:B page.
  • I think intersection is the main requirement, and supporting anything else might make it more complex than needed. But, if you really want other operators, you can extend the syntax like this: Category:A/union/Category:B, Category:A/minus/Category:B, etc.
  • If we abandon the idea of other operators, we could simplify Category:A/Category:B to Category:A/B, assuming no "/" are already in category names.... (OS/2?)
  • Another alternative to the / could be :, e.g. Category:A:B. This is less likely to conflict with OS/2 and friends.
  • Also, this could work for as many terms in the intersection as you like, e.g. Category:A/Category:B/Category:C, etc., although for performance reasons we'd want some limit (2? 3?)
  • Implementation detail: we need to define a canonical form, e.g. split on the / and sort in ASCIIbetical order, before we store in the database. This is so that when we want to find that Category:C/Category:B has name P, we just need to search on one ordering (B/C) and not on two (B/C, C/B). Also, the system should treat any category intersection not in canonical form as if it were in canonical form. Thus if you put text in Category:A/Category:B, and then go to Category:B/Category:A, you'll get the exact same page.

Anyway, that's my proposal.
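
A rough Python sketch of two pieces of the proposal: the canonical form, and the rule that subcategories of A are intersected with B (and vice versa). The function names and the sample category tree are hypothetical, and the sketch assumes no category name contains a "/" itself, so it does not address the OS/2 case.

```python
SEP = "/"


def canonical(title: str) -> str:
    """Split an intersection title on '/' and sort the parts.

    Python compares str values by code point, which for ASCII names
    gives the ASCIIbetical order mentioned in the proposal.
    """
    return SEP.join(sorted(title.split(SEP)))


# Hypothetical category tree, keyed by full category title.
subcats = {
    "Category:Scientists": ["Category:Physicists", "Category:Chemists"],
    "Category:European people": ["Category:French", "Category:German"],
}


def intersection_subcats(a: str, b: str) -> set[str]:
    """Subcategories of the intersection a/b under the proposed rule:
    each subcategory of a intersected with b, plus a intersected with
    each subcategory of b, all stored in canonical form."""
    from_a = {canonical(c + SEP + b) for c in subcats.get(a, [])}
    from_b = {canonical(a + SEP + d) for d in subcats.get(b, [])}
    return from_a | from_b


children = intersection_subcats("Category:Scientists", "Category:European people")
```

With this, `canonical("Category:B/Category:A")` and `canonical("Category:A/Category:B")` give the same string, so a lookup only ever has to try one ordering, and `children` contains the four expected intersections (French scientists, German scientists, European physicists, European chemists) without walking the whole tree.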

Although in a perfect world something like this is a good idea, I fear that:

  1. it would be easy to do in a way which wasn't really that useful
  2. it would be a fair bit of work to do it properly (in a way that was really useful), and it's not clear to me that the extra programmer effort and complexity in the code (which then has to be maintained) would be worth the benefits

--SJK 10:58, 25 March 2006 (UTC) (I do PHP programming for a living, although I haven't really touched MediaWiki....)

this vs DPL

Isn't this similar to DPLs? Bawolff 01:18, 27 March 2006 (UTC)

Yes, in fact, it is. Thanks. -- Zondor 04:36, 27 March 2006 (UTC)
It's worth noting that while the end result of DPLs is similar, the implementation and use are different. This is an alternative, imho more powerful, way of viewing all the categorized or tagged information in a wiki; DPLs do something very similar to dynamically create a list of fixed parameters. I think this approach implies more general (as in "not as specific", not as in quantity) categories, and finding information where they intersect. The subcategories are the most significant hurdle of this endeavor, again imho. BTW, I wrote the implementation mentioned on meta, and would be happy to re-write it for MediaWiki 1.5/1.6. I'm sure it could stand some scrubbing by a more knowledgeable PHP guy than I, but the SQL I'm using is pretty solid and more efficient than what is (or was when I checked) being used for DPLs. --Aerik 03:44, 29 March 2006 (UTC)