User:Tim Ocean

From Wikipedia, the free encyclopedia

Hi. I've been a Wikipedian for many years. I'm a moderator of two Wikipedias. One of them is a small Wikipedia, which makes me naturally interested in cross-wiki comparisons.

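Currently, Wikipedias are mainly assessed by depth, which is defined as

$$\text{Depth} = \frac{\text{Edits}}{\text{Articles}} \times \frac{\text{Non-articles}}{\text{Articles}} \times \left(1 - \frac{\text{Articles}}{\text{Pages}}\right)$$

where Pages is the total number of pages and Non-articles = Pages − Articles.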

This metric poses a lot of problems:

  • It doesn't make much sense and lacks a clear interpretation.
  • It produces weird results, such as the English Wikipedia being tens or even hundreds of times "better" than other big wikis, including those in major languages like French and German.
  • It suffers from inflation for small wikis.
  • It's easy to manipulate through the number of edits, the number of special pages, or both, and it's apparent that some wikis do exactly this.
  • It makes large and small wikis effectively incomparable.

The second and third terms in this equation do exactly the same thing: they measure the contribution of special pages (discussion pages, user pages, categories, etc.). Multiplying them introduces a strong dependence on the number of special pages and causes huge inflation for small wikis.
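A short worked derivation may make this clearer. It is only a sketch: it assumes the depth formula as given on Meta (edits per article, times non-articles per article, times one minus the share of articles among all pages), and the symbols $A$ (articles), $N$ (non-article pages) and $x = N/A$ are notation introduced here for illustration.

```latex
\underbrace{\frac{N}{A}}_{\text{second term}} \times
\underbrace{\left(1 - \frac{A}{A+N}\right)}_{\text{third term}}
= x \cdot \frac{x}{1+x}
= \frac{x^2}{1+x}
= x - 1 + \frac{1}{1+x}
```

Both terms depend only on $x$. The third term alone, $x/(1+x)$, stays below 1, but the product grows roughly like $x$ without any bound. For Cree in the table below (14 articles, 2342 pages) $x \approx 166$, so the two terms multiply to about 165 while the third term alone is about 0.99.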

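I experimented with various metrics and found that the simplest and best one is

$$\text{Depth}^* = \frac{\text{Words}}{\text{Articles}} \times \left(1 - \frac{\text{Articles}}{\text{Pages}}\right)$$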

So the third term is the same, the second is removed, and the first is altered: edits per article are replaced by words per article.

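The term $1 - \frac{\text{Articles}}{\text{Pages}}$ is a function of the type $1 - \frac{1}{x}$, with $x = \text{Pages}/\text{Articles}$. It grows more and more slowly and has an asymptote at 1.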
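Here is a minimal sketch of how the metric can be computed, assuming it has the form words/articles × (1 − articles/pages). The function name `depth_star` and the `scale` parameter are made up for this example; the input numbers are the English and Cebuano rows of the results table.

```python
def depth_star(words: int, articles: int, pages: int, scale: float = 1.0) -> float:
    """Words per article times the share of non-article pages, times an optional scale."""
    if articles <= 0 or pages < articles:
        raise ValueError("need articles > 0 and pages >= articles")
    words_per_article = words / articles
    non_article_share = 1 - articles / pages  # always below 1
    return scale * words_per_article * non_article_share


# Rows copied from the results table: (words, articles, pages).
english = (5029491210, 7085151, 64391921)
cebuano = (1326291436, 6115898, 11230117)

# Scale chosen so that the English Depth* equals its classic Depth of 1336.
scale = 1336 / depth_star(*english)

print(round(depth_star(*cebuano, scale=scale)))  # 209, as in the table
```

The scaling constant is not part of the metric itself; it only puts Depth* on the same footing as Depth for the English Wikipedia.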

I tested this metric on various Wikipedias (the biggest ones, those in conlangs, and some other interesting ones). Here are the results:

Wikipedia            Articles       Words     Pages  Words/Articles   Depth  Depth*
English               7085151  5029491210  64391921          709.86    1336    1336
Cebuano               6115898  1326291436  11230117          216.86       2     209
German                3065898  1691653886   8411434          551.76      93     742
French                2718357  1835692395  13711543          675.29     274    1145
Swedish               2619342   489510386   6339381          186.88      18     232
Dutch                 2201228   545554446   4737644          247.84      19     281
Spanish               2071945  1314624852   8522690          634.49     193    1016
Russian               2069963  1158572748   8412457          559.71     166     892
Italian               1942793  1061416495   8440484          546.34     195     889
Polish                1673719   540889052   3948992          323.17      36     394
Ukrainian             1396184   551840693   5004148          395.25      61     603
Vietnamese            1296630   360188264  14609301          277.79     536     535
Portuguese            1159159   623934501   5998524          538.26     206     918
Catalan                783388   407411496   1965859          520.06      42     662
Finnish                606606   180505998   1559044          297.57      37     384
Czech                  579646   282045339   1616471          486.58      50     660
Hungarian              562443   243447405   1599186          432.84      60     593
Serbo-Croatian         461167   121817798   4626946          264.15     749     503
Esperanto              377333    92540340    846921          245.25      16     288
Lithuanian             223935    50390882    558880          225.02      30     285
Latin                  140669    21762451    290730          154.71      15     169
Ido                     59986     7872035     84997          131.23       2      82
Volapük                 45855     3295145    163486           71.86     133     109
Scots                   34282     7730427    138167          225.50      59     359
Interlingua             30146     3329674     45762          110.45       4      80
Kotava                  29896     2748601     36342           91.94       0      34
Interlingue             13358     4024721     17638          301.30       0     155
Sardinian                7728     2366856     17522          306.27      17     362
Kashubian                5495      443007      8892           80.62       8      65
Lingua Franca Nova       4490     1603991      7185          357.24       2     283
Pennsylvania German      2039      154774      6042           75.91      68     106
Novial                   1877      155531      4812           82.86      91     107
Tetum                    1380      269417      3952          195.23      61     269
Lojban                   1348      464157      5816          344.33     214     559
Gothic                    976       99310      3946          101.75     113     162
Dinka                     338       90704      1126          268.36      43     397
Cree                       14        2272      2342          162.29  483867     341

My findings and remarks are as follows:

  • I scaled Depth* so that the English Wikipedia's value equals its Depth (1336), which makes the two columns easier to compare.
  • This metric doesn't produce weird results like the previous one.
  • It's inflation-free (look at Cree).
  • It cannot be manipulated through the number of edits because it doesn't use them.
  • It's very hard to manipulate through the number of special pages (look at Serbo-Croatian and Vietnamese).
  • Cheating with the number of special pages has a natural limit, which is words/articles.
  • This metric makes it possible to compare all wikis, large and small.
  • Maybe it can be manipulated by altering the script that counts words; admins would need to say.
  • This metric is simple in form and has a clear interpretation: length and/or quality of articles times contribution of the community and/or quality of articles.
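The "natural limit" remark can be checked with a small experiment. This is only a sketch, assuming the unscaled form words/articles × (1 − articles/pages); the helper name `depth_star_raw` and the inflation factors are hypothetical, and only the Latin row is taken from the table.

```python
def depth_star_raw(words, articles, pages):
    """Unscaled Depth*: words per article times the share of non-article pages."""
    return (words / articles) * (1 - articles / pages)


# Latin row from the results table.
words, articles, pages = 21762451, 140669, 290730
ceiling = words / articles  # the natural limit: words per article

# Hypothetical manipulation: keep the articles, add more and more other pages.
factors = (1, 10, 100, 1000)
inflated = [depth_star_raw(words, articles, pages * f) for f in factors]

for f, value in zip(factors, inflated):
    print(f"pages x{f:<5} raw Depth* = {value:7.2f}  (ceiling {ceiling:.2f})")
```

However many non-article pages are added, the value creeps towards words/articles and never passes it, so the only way to raise the ceiling is to write longer articles.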

I proposed this metric on the talk page of the depth article: https://meta.wikimedia.org/wiki/Talk:Wikipedia_article_depth#Depth%20metrics%20makes%20little%20sense%20-%20part%202

UPDATE:

Graph showing the depth star statistics against number of articles for a sample set of Wikipedias.

I made this graph by plotting the Depth* statistic against the number of articles. It seems that the metric not only avoids inflation for small Wikipedias but is also generally lower for them, as we would expect from small Wikipedias where there are few editors to develop the wiki.

There seems to be some logarithmic dependence. Perhaps it also makes it possible to spot wikis that manipulated their number of articles. But I'd like to do an additional analysis with all wikis.
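As a rough first step towards that analysis, the logarithmic dependence could be checked with an ordinary least-squares fit of Depth* against log10(articles). The sketch below is an assumption-laden illustration, not a result: the choice of a straight line in log10(articles), the helper name `fit_log_line`, and the arbitrary subset of ten rows from the table are all mine.

```python
import math

# (articles, Depth*) pairs copied from a few rows of the results table.
sample = {
    "English": (7085151, 1336),
    "German": (3065898, 742),
    "French": (2718357, 1145),
    "Esperanto": (377333, 288),
    "Latin": (140669, 169),
    "Ido": (59986, 82),
    "Kotava": (29896, 34),
    "Kashubian": (5495, 65),
    "Novial": (1877, 107),
    "Cree": (14, 341),
}


def fit_log_line(points):
    """Ordinary least squares for y = intercept + slope * log10(x)."""
    xs = [math.log10(x) for x, _ in points]
    ys = [y for _, y in points]
    n = len(points)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = fit_log_line(list(sample.values()))

# Residuals: how far each wiki sits from the fitted line.
residuals = {
    name: depth - (intercept + slope * math.log10(articles))
    for name, (articles, depth) in sample.items()
}
for name, r in sorted(residuals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<10} {r:+8.1f}")
```

Wikis that sit far from the fitted line would be the ones worth a closer look, though with so few points this says little until all wikis are included.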