User:RandomP/Ideas for Wikimedia

First-class citizenship for edits

(This might be a bit hard to read, for now):

One thing I'm thinking about in the long term is whether it would make sense for Wikipedia to treat an "edit" (more precise definition below) as more of a first-class object, like an article. It would have a talk page (though it would make sense to have those included in the "article", again with a more precise definition below), and it would continue to exist independently even if it is reverted. There would be a clear difference between three things: deleting edits (striking them from the record, as in the case of copyvios, slander, or personal information being revealed), reverting them (making them no longer part of the article's history, but still available for future discussion and for other articles to use), and undoing them (putting another edit in the history that does exactly or mostly the reverse of what the first edit did).

As far as I'm aware, the current setup on Wikipedia is this:

A "revision" is a blob containing the entire wikicode (post-substitution) behind what you see when you look at an article, minus the title you see above "From Wikipedia", which is derived from the URL when you access it.

An edit is just a tuple of the old and the new revision, plus some extra information (timestamp, comment, blame attribution, that sort of thing).

An article is something that has a name (the page title), points to a revision, and has an edit history.
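The current setup described above can be sketched roughly as follows. This is only an illustration of the three notions (revision, edit, article), not MediaWiki's actual schema; the class names, field names, and the `save` helper are all assumptions made up for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Revision:
    """A blob holding the entire wikicode of a page (hypothetical model)."""
    wikicode: str


@dataclass(frozen=True)
class Edit:
    """A tuple of old and new revision plus some metadata."""
    old: Optional[Revision]  # None when the page is being created
    new: Revision
    timestamp: str
    author: str
    comment: str = ""


@dataclass
class Article:
    """Something with a title, a current revision, and an edit history."""
    title: str
    history: List[Edit] = field(default_factory=list)

    @property
    def current(self) -> Optional[Revision]:
        # The article points at the revision produced by its latest edit.
        return self.history[-1].new if self.history else None

    def save(self, text: str, author: str, timestamp: str, comment: str = "") -> Edit:
        # Every save stores a complete new blob, not a diff.
        edit = Edit(self.current, Revision(text), timestamp, author, comment)
        self.history.append(edit)
        return edit
```

Note that in this model an edit has no existence apart from the two revisions it connects, which is exactly what the proposals below try to change.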

Suggested New Setup, 1

An "edit" is something like a context diff: it's not strictly limited to applying only between the two revisions it was made for.
More importantly, an "edit" (with an exception, see below) is a "page": it contains, in special magic wikicode markup (to be written), the context diff that it's really all about, as well as the edit summary, timestamp, relevant links, and a talk page.
An "article" is just a page containing a list of edits that can be applied one after the other to get from an empty page to the current text. There's no further information in what an article is. All discussion is in other places, and while the WP server is going to cache the text that results from applying all the edits after each other, it is in theory unnecessary to do so.
The exception is that giving an edit a page is also an edit, which would have to get its own page, which would constitute another edit ... you see the problem. My suggested solution is that an edit that is nothing but the creation of a new page, made automatically by Wikipedia, doesn't get its own page until people actually go looking for it (essentially, when someone hits "history" while already on a history page, and iterates that). In a way, it gets a special page, not a real page in the namespace. There's no reason for it to get a real one, because we can trust Wikipedia, and all the meta-information that "should" be on an edit's page's first edit's page can be derived from the edit's page's first edit, which can be derived from the edit's page itself. I'll refer to such edits as lazy edits.
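To make the idea of "an article is a list of edits applied to an empty page" concrete, here is a minimal sketch. The hunk format is a simplified stand-in for a real context diff, and `Hunk`, `Edit`, `apply_hunk`, and `render` are hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Hunk:
    before: Tuple[str, ...]      # context lines that must precede the change
    old: Tuple[str, ...]         # lines the edit removes
    new: Tuple[str, ...]         # lines the edit adds
    after: Tuple[str, ...] = ()  # context lines that must follow the change


@dataclass(frozen=True)
class Edit:
    summary: str
    hunks: Tuple[Hunk, ...]


class DoesNotApply(Exception):
    """Raised when a hunk's context cannot be found in the text."""


def apply_hunk(lines: List[str], hunk: Hunk) -> List[str]:
    # Look for the first place where context + old lines occur together.
    pattern = list(hunk.before + hunk.old + hunk.after)
    n = len(pattern)
    for i in range(len(lines) - n + 1):
        if lines[i:i + n] == pattern:
            start = i + len(hunk.before)
            return lines[:start] + list(hunk.new) + lines[start + len(hunk.old):]
    raise DoesNotApply(hunk)


def render(history: List[Edit]) -> str:
    # Start from an empty page and apply every edit in order.
    lines: List[str] = []
    for edit in history:
        for hunk in edit.hunks:
            lines = apply_hunk(lines, hunk)
    return "\n".join(lines)
```

Because a hunk only depends on its context lines and not on a specific revision, the same edit can in principle be applied in any article history where that context appears.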

So, how would editing a page work? The interface change would be minimal, or entirely nonexistent: you hit "edit this page", change the text, send it off to the server.

The server then generates a context diff out of your edit (kind of, but not totally, like what "Show changes" gives you now); it creates a new page on Wikipedia containing that context diff, your edit summary, your signature, a timestamp, possibly some other metainformation for the popups plugin.

That part always works. Up to this point, there's in fact no need to even check for an edit conflict, so edits will no longer get lost if there is one.

Next, the server checks whether the context diff would still apply perfectly if it were put as the latest edit on the article's list of edits. If so, yay, there's no edit conflict, and we can go ahead. If not, the edit conflict can be dealt with properly, as the edit itself has already been created (the old behaviour could still be simulated, though). We'll continue with the "yay" case.

So the server now adds a reference to the edit page it just created to the list of edit references that makes up the article's "history" page. This is the article's only page: the viewable page is generated from it by the server when it is requested, and probably cached.
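The two-step save described above (first store the edit, then try to append it to the article's history) might look something like this. It is a sketch under heavy simplification: an edit here is a single find-and-replace whose `old` text must occur exactly once, standing in for a real context diff, and the `Wiki` class and its methods are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Edit:
    old: str      # text that must be present exactly once (the "context")
    new: str      # text that replaces it
    summary: str

    def apply(self, text: str) -> Optional[str]:
        if self.old == "":
            # Page creation only applies cleanly to an empty page.
            return self.new if text == "" else None
        if text.count(self.old) != 1:
            return None  # context missing or ambiguous: does not apply
        return text.replace(self.old, self.new)


class Wiki:
    def __init__(self) -> None:
        self.edit_pages: Dict[int, Edit] = {}      # every edit gets a page
        self.histories: Dict[str, List[int]] = {}  # title -> edit references

    def render(self, title: str) -> str:
        text = ""
        for edit_id in self.histories.get(title, []):
            text = self.edit_pages[edit_id].apply(text)
        return text

    def save(self, title: str, edit: Edit) -> Tuple[int, bool]:
        # Step 1: create the edit page. This always succeeds.
        edit_id = len(self.edit_pages) + 1
        self.edit_pages[edit_id] = edit
        # Step 2: only reference it from the history if it applies cleanly.
        if edit.apply(self.render(title)) is None:
            return edit_id, False  # conflict, but the edit is not lost
        self.histories.setdefault(title, []).append(edit_id)
        return edit_id, True
```

In the conflict case the edit still exists under its own id, so it can be discussed, reworked, or added to the history later.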

Note that, technically, many more database changes happen:

a blob is created for the context diff + metadata
that blob is, with a lazy edit, given a referenceable spot in the database
a blob is created containing a context diff adding the reference created in the previous step to an article's "history" page
that blob is added, with a lazy edit, to the article's history page's history page, which is a special page (at least until someone wants it not to be)

Note that this procedure clearly consists of two parts: an edit is created, and then it is included in an article's history. There's no reason for those two steps to happen at once.

It makes perfect sense, for example, to create a private draft version for an article without making it the official article: you create the edits, and string them together on some private page of yours; then, when you're happy with it, you just stick the edits in the official article's history. Which is another two-stage edit.

Advantages? Edits get to be first-class citizens. Commuting edits (you added a paragraph, but this conflicted with someone fixing a minor typo; you want to have a new edit that applies cleanly after the typo is fixed) would become real editorial activity, rather than something done ad hoc. Essentially, you'd be adding a second context diff to an edit's page, and this one applies perfectly. However, this approach has limits, and there'd be one or two arguments about when two diffs should be on the same page, and when they should be on separate pages.
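As a toy illustration of commuting an edit past a typo fix, here is a sketch in which an edit page can hold several diff variants and uses the first one that applies. The two edit types (`InsertAfter`, `ReplaceLine`), the `commute` function, and the `EditPage` class are all assumptions invented for the example; a real context diff would need far more general handling.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ReplaceLine:
    """Replace one line with another, e.g. a typo fix."""
    old: str
    new: str

    def apply(self, lines: List[str]) -> Optional[List[str]]:
        if lines.count(self.old) != 1:
            return None
        return [self.new if line == self.old else line for line in lines]


@dataclass(frozen=True)
class InsertAfter:
    """Insert new lines after a given context line."""
    context: str
    added: Tuple[str, ...]

    def apply(self, lines: List[str]) -> Optional[List[str]]:
        if lines.count(self.context) != 1:
            return None
        i = lines.index(self.context) + 1
        return lines[:i] + list(self.added) + lines[i:]


def commute(mine: InsertAfter, theirs: ReplaceLine) -> InsertAfter:
    # Produce a variant of `mine` that applies after `theirs` has been applied.
    if mine.context == theirs.old:
        return InsertAfter(theirs.new, mine.added)
    return mine


@dataclass
class EditPage:
    summary: str
    variants: List[InsertAfter] = field(default_factory=list)

    def apply(self, lines: List[str]) -> Optional[List[str]]:
        # Use the first variant whose context matches the current text.
        for variant in self.variants:
            result = variant.apply(lines)
            if result is not None:
                return result
        return None
```

The limits mentioned above show up quickly: once two variants differ in more than their context, it becomes a judgment call whether they are still "the same edit".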

Article histories would be much more useful, because edit wars wouldn't feature in them: edit/revert pairs could be removed (though having a "chronological" article history that includes all edits would still be easy to do).
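A rough sketch of how a "logical" history could be derived from the chronological one by dropping edit/revert pairs follows. The `LogEntry` record and its `reverts` field are hypothetical, and the rule is deliberately simple: a revert only cancels the edit immediately before it in the logical history.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class LogEntry:
    edit_id: int
    summary: str
    reverts: Optional[int] = None  # id of the edit this entry reverts, if any


def logical_history(chronological: List[LogEntry]) -> List[LogEntry]:
    logical: List[LogEntry] = []
    for entry in chronological:
        reverts_last = (
            entry.reverts is not None
            and logical
            and logical[-1].edit_id == entry.reverts
        )
        if reverts_last:
            logical.pop()          # drop the reverted edit and the revert itself
        else:
            logical.append(entry)  # ordinary edit, or a revert we cannot cancel cleanly
    return logical
```

The chronological list is left untouched, so both views stay available.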

Disadvantages: people would start editing article history, even when that has no effect on what people view when they look at Wikipedia without using the history tab. I don't think that's a huge problem, but some would view it as taking time away from editing Wikipedia.

Wikipedia would be slightly more complicated: there'd be a chronological article history, and a logical article history that doesn't go around in circles. Orphaned edits would clutter the database, even when no one cares about them anymore (the same is true now, though).

It's not quite as nice as proposal 2.

The main advantage is that I am absolutely confident I could implement this proposal in reasonable time: while a lot of itty-bitty editing of the server code would be required, it's not like there are any difficult design decisions to be made, other than choosing a good context-diff program.

Suggested New Setup, 2

This is what I actually like more, but unlike the first version, it actually does have problems (though, I think, solvable ones).

The main difference is this: the previous proposal continues to think of what you see in WP articles as a large blob of wikicode, but an article actually does have a fairly standard hierarchical structure, with headings and subheadings, which paragraphs divide further.

So the idea would be to actually reflect that hierarchical structure in the database/the edits: in effect, an article would correspond to a page full of {{subst}}'s, and it is this page which would be generated from the history (and cached). Same for the talk page.
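One way to picture this is an article as a tree of sections whose leaves are references to separately stored paragraphs, with the readable page generated by following the references. The sketch below assumes a hypothetical `Section` class and a plain dictionary as the paragraph store; neither is part of any existing software.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Section:
    heading: str
    # Each part is either a paragraph id (looked up in the store) or a subsection.
    parts: List[Union[str, "Section"]] = field(default_factory=list)


def render(section: Section, paragraphs: Dict[str, str], level: int = 2) -> List[str]:
    marks = "=" * level
    lines = [f"{marks} {section.heading} {marks}"]
    for part in section.parts:
        if isinstance(part, Section):
            lines.extend(render(part, paragraphs, level + 1))
        else:
            lines.append(paragraphs[part])  # pull the paragraph in by reference
    return lines
```

Editing a stored paragraph changes every rendered page that references it, without touching the section tree itself.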

The main problem is that people are simply not used to editing that way, and it's hard to do with the web interface: while we could just let people edit the blob of wikicode, and then go back and try to find out what they "really" did (this paragraph got moved, that section renamed, and here's a typo fix), that's not a computationally trivial problem. Sure, you could go and split up people's edits into bits that our software could understand, but the risk with spending a lot of time refactoring/editing edits is obvious: before you know it, you're editing edits of edits, and the articles themselves are beginning to seem immaterial ;-)

The possibilities of automated paragraph splitting are nice, though: instead of having an "economy" section in every country article and an "economy of X" article, we could just include the lead paragraph of the economy of X article in the country article. However, we could do so now, without ever getting into situations where a user is uncertain which articles will be influenced by their edits.
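If paragraphs are shared between articles, the question of which articles an edit influences can be answered with a simple reverse index. This is only a sketch: the mapping from article titles to paragraph ids, and the ids themselves, are made up for the example.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_usage_index(articles: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Map each paragraph id to the set of articles that include it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for title, paragraph_ids in articles.items():
        for paragraph_id in paragraph_ids:
            index[paragraph_id].add(title)
    return index


# Hypothetical example: the country article reuses the lead of the economy article.
articles = {
    "France": ["france-lead", "economy-of-france-lead"],
    "Economy of France": ["economy-of-france-lead", "economy-of-france-history"],
}
index = build_usage_index(articles)
affected = sorted(index["economy-of-france-lead"])
```

An edit form could show the `affected` list before saving, so the user knows where the change will appear.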

Still, my overall impression is that it'd be worth it, even with another level required (edit sets?): edits would be split up to consist of minimal changes (fix one typo, move one paragraph, include one new paragraph, that sort of thing), and many edit sets would involve many history pages (17 typos in 17 sections could form one edit set, which people might want to revert or discuss as a whole).
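An edit set could be little more than a named group of references to minimal edits on different history pages. The sketch below assumes histories are lists of edit ids keyed by page name, and treats "reverting" as removing the references from those histories; `EditSet` and `revert_edit_set` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class EditSet:
    summary: str
    members: Tuple[Tuple[str, int], ...]  # (history page, edit id) pairs


def revert_edit_set(histories: Dict[str, List[int]], edit_set: EditSet) -> Dict[str, List[int]]:
    # Return new histories with every member of the set removed;
    # the edits themselves would continue to exist as pages.
    doomed = set(edit_set.members)
    return {
        page: [edit_id for edit_id in ids if (page, edit_id) not in doomed]
        for page, ids in histories.items()
    }
```

Whether the remaining edits still apply cleanly after such a removal would have to be checked separately.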

There are details that give me a slight headache (how would page protection work? copy-on-write for sections. How do you write a diff program that still recognizes a moved paragraph _and_ a typo fix?), but the implementation isn't really the main problem. I just think that proposal 2 would change the "ideal" editing process enough to change the way Wikipedia works: instead of hitting "edit this page" and rewriting a bad article or paragraph, people would be encouraged (by clean edit histories) to create a new paragraph that's orphaned, first, and then replace the old paragraph when they're done.
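On the diff question, one naive heuristic is to match paragraphs by text similarity rather than by position, using Python's standard `difflib`. The sketch below is an assumption-laden illustration, not a solved design: the 0.8 threshold is arbitrary, and a paragraph counts as "moved" simply when it matches an old paragraph that came earlier than one already matched.

```python
import difflib
from typing import List, Optional, Tuple

Change = Tuple[str, Optional[int], Optional[int]]  # (kind, old index, new index)


def classify_paragraphs(old: List[str], new: List[str], threshold: float = 0.8) -> List[Change]:
    unmatched = dict(enumerate(old))  # old paragraphs not yet paired up
    changes: List[Change] = []
    highest_old = -1                  # largest old index matched so far
    for j, text in enumerate(new):
        best_i, best_ratio = None, 0.0
        for i, candidate in unmatched.items():
            ratio = difflib.SequenceMatcher(None, candidate, text).ratio()
            if ratio > best_ratio:
                best_i, best_ratio = i, ratio
        if best_i is None or best_ratio < threshold:
            changes.append(("added", None, j))
            continue
        del unmatched[best_i]
        moved = best_i < highest_old  # breaks the original order
        edited = old[best_i] != text  # similar but not identical
        highest_old = max(highest_old, best_i)
        kind = "+".join(
            label for label, flag in (("moved", moved), ("edited", edited)) if flag
        ) or "kept"
        changes.append((kind, best_i, j))
    changes.extend(("removed", i, None) for i in sorted(unmatched))
    return changes
```

This compares every new paragraph against every unmatched old one, so it is quadratic in the number of paragraphs, and which of two swapped paragraphs gets labelled "moved" is somewhat arbitrary.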

I'm not at all sure that would be a bad thing. Entering a term in the search box and hitting "Go" would still take you to a linear text, and it should be fairly easy to convince people that that linear text, not orphaned paragraphs that you created ages ago but failed to include ....