
Wikipedia:PHP script tech talk

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Magnus Manske at 06:04, 5 February 2002 (caching proposal). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

This is the place to discuss bug fixes and planned features on a more "technical" level. Please do not just add bugs and feature requests here at random.

Serious bugs

Things that should be repaired ASAP.

Pages with "wiki.phtml" subpages

  • Does anyone have an idea why this happens? I couldn't reproduce it with my local copy. --Magnus Manske
I haven't seen any of these lately. I *think* it was fixed by using an absolute path for $THESCRIPT. --Brion VIBBER

#REDIRECTs that end in an eternal edit conflict

I'm probably stating the obvious here, but this appears to be caused by the page trying to redirect to a different page than the one specified. If that page doesn't exist (as is usually the case), the page goes into edit mode when you click Save, which produces an edit conflict. If the page does exist, no edit conflict occurs, but the redirect does not go to the expected place (for example, Zundark/Old_Talk redirects to user:Zundark but actually ends up at user:Zundark/Old_Talk). --Zundark, 2002 Feb 3
(2002/02/03 20:43 PST) A fix is in CVS for the case where the redirect target does exist (s/$this->$subPageTitle/$this->subPageTitle/). It doesn't seem to have fixed the doesn't-exist case; I'll look at it some more. --Brion VIBBER
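The mix-up Zundark describes (an absolute target such as user:Zundark being treated as a subpage of the current page) can be illustrated with a minimal sketch of redirect-target resolution. It assumes, hypothetically, that only targets beginning with "/" are meant to be relative to the current page; `resolve_redirect` and the pattern are illustrative names, not the script's actual code.

```python
import re
from typing import Optional

# Hypothetical pattern for "#REDIRECT [[Target]]"; not the script's real parser.
REDIRECT_RE = re.compile(r"^#REDIRECT\s*\[\[([^\]|]+)\]\]", re.IGNORECASE)


def resolve_redirect(current_title: str, text: str) -> Optional[str]:
    """Return the page a redirect points to, or None if text is not a redirect."""
    match = REDIRECT_RE.match(text.strip())
    if match is None:
        return None
    target = match.group(1).strip()
    if target.startswith("/"):
        # Assumed convention: a leading slash marks a subpage of the current page.
        return current_title + target
    # Otherwise the target is absolute; never append the current subpage name.
    return target
```

With this rule, "#REDIRECT [[user:Zundark]]" on Zundark/Old_Talk resolves to user:Zundark, while "#REDIRECT [[/Bar]]" on Foo resolves to Foo/Bar.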

130.94.122.xxx bug

  • This is serious, because of its potential for masking vandalism.
Possible fix submitted; I suspect that there's some kind of proxying going on at the server end. But I could be totally wrong. --Brion VIBBER
We'll see, it is in the mail now... --Magnus Manske
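If the proxy hunch is right, every edit would be logged with the proxy's address instead of the editor's. A minimal sketch of the usual workaround, assuming (hypothetically) that the proxy adds an X-Forwarded-For header: the header is only believed when the request really came from a proxy we trust, since clients can forge it. The trusted address and `client_ip` are placeholders, not anything from the actual server setup.

```python
from typing import Dict

# Placeholder proxy address (documentation range), standing in for the real one.
TRUSTED_PROXIES = {"192.0.2.10"}


def client_ip(remote_addr: str, headers: Dict[str, str]) -> str:
    """Pick the address to record for an edit."""
    if remote_addr not in TRUSTED_PROXIES:
        # Direct connection: REMOTE_ADDR already is the real client.
        return remote_addr
    forwarded = headers.get("X-Forwarded-For", "")
    # The header is a comma-separated chain; our proxy appends the address it saw
    # last, so the right-most entry is the one our own proxy vouches for.
    hops = [hop.strip() for hop in forwarded.split(",") if hop.strip()]
    return hops[-1] if hops else remote_addr
```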

Change password does not work

For more details, see Wikipedia:Bug Reports. Until this is fixed, any user who changes their password cannot log in (including me). --User:Chuck Smith

Strange. It works fine on my local copy. Have you tried logging in with your old password? --Magnus Manske

Volunteers wanted

These tasks need volunteers to hack 'em!

Mask minor edits on Recent Changes

Might be fixed by a patch from Brion VIBBER and myself --Magnus Manske

Fix the Recent Changes "(# changes)" counter

Might be fixed by a patch from Brion VIBBER and myself --Magnus Manske

  • I just looked in CVS (that's 2002/2/4 00:02 Amsterdam time), and it seems you still add a variable $addoriginal to the count. But I think that is silly, because you should never count the current page separately when you are counting the changes. So just remove $addoriginal and the problem is solved. -- Jan Hidders (PS. Wouldn't it be nice if the signature shortcut ~ ~ ~ were always replaced with name and time? :-))
Hey, that's a good idea! Especially for the bug-report pages... Brion VIBBER (2002/2/4 15:18 PST)
Yes, unfortunately it's the only thing that made sense in my remark. :-/ What I should have said was the following: the variable $addoriginal should be 0 if the page did not already exist the previous day and the current page is a minor edit. -- Jan Hidders (2001/2/5 8:45 GMT+1)
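Jan's first suggestion can be sketched as follows, under the assumption (ours, not the script's) that the current revision and all older revisions of a page are available as one flat list of edits. Counting that list directly means the current revision is counted exactly once, so no separate correction term is needed. `Edit` and `change_count` are hypothetical names, and the sketch does not attempt the refined $addoriginal rule from his follow-up.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Edit:
    day: str     # assumed day key, e.g. "20020204"
    minor: bool  # True for edits marked as minor


def change_count(edits: List[Edit], day: str, hide_minor: bool) -> int:
    """Count one page's edits on `day`, as shown in "(# changes)".

    The current revision is simply one of the edits in the list, so it is
    counted once; no extra $addoriginal-style term is added.
    """
    return sum(
        1
        for edit in edits
        if edit.day == day and not (hide_minor and edit.minor)
    )
```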

Fixing some parser bugs

Especially the <pre> tags.
I've replaced removeHTMLtags() with behavior more like the old UseMod version; i.e., instead of forbidding a few tags, it allows only a small number. Thus, no <span>, <object>, etc. However, it still needs to be able to strip out unknown elements/parameters; I can still write naughty things like this. (2002/02/04 20:48 PST) --Brion VIBBER
Also, I commented out the line in subParseContents that turns "&amp;" followed by text that could be an entity name back into that entity. I suspect it was put in to fix pages that were getting over-escaped during editing, but that bug seems to be gone now, and the line just makes it hard to write the name of an entity. I.e., "&amp;" should *not* appear as just an ampersand, but as an ampersand followed by "amp;". --BV
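Both points above (allow only a few tags; keep "&amp;" literal) fall out of one approach, sketched here as a swapped-in technique rather than what removeHTMLtags() actually does: escape everything first, then re-enable only whitelisted tags written without attributes. Unlike stripping, this leaves disallowed markup visible as text. `ALLOWED_TAGS` and `sanitize` are hypothetical.

```python
import html
import re

# Hypothetical whitelist; the script's real list of allowed tags may differ.
ALLOWED_TAGS = {"b", "i", "u", "pre", "code", "tt", "br", "p"}

# After escaping, a bare tag such as <b> or </b> looks like "&lt;b&gt;".
ESCAPED_TAG_RE = re.compile(r"&lt;(/?)([a-zA-Z]+)\s*(/?)&gt;")


def sanitize(wikitext: str) -> str:
    """Escape everything, then restore whitelisted tags that carry no attributes."""
    # Escapes &, < and >, so a typed "&amp;" becomes "&amp;amp;" and shows literally.
    escaped = html.escape(wikitext, quote=False)

    def restore(match: re.Match) -> str:
        closing, name, selfclose = match.groups()
        if name.lower() in ALLOWED_TAGS:
            return f"<{closing}{name.lower()}{selfclose}>"
        return match.group(0)  # unknown tag: left visibly escaped

    return ESCAPED_TAG_RE.sub(restore, escaped)
```

A tag with attributes, such as <b onmouseover="...">, never matches the restore pattern and stays escaped, which also covers the "unknown parameters" case.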

Brainstorming

Ideas for solutions needed here.

Speeding up the PHP script

  • Caching of pages for reading only (Jimbo's idea).
    • Could be tricky. Would have to adapt to viewing preferences and newly created pages (red/blue links).
It may be possible to cache a 'common' almost-final version, which can then have a regexp run over it to set the link color and paragraph justification and then be inserted into the header/footer; this would at least save parsing the wiki page every time. We still need to deal with new pages, though... The simplest way might be to run the "which pages link to this" check on a newly created page and expire the cached versions of every page that links to it. --Brion VIBBER
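Caching an almost-final page and running one regexp over it might look like the following sketch. The `{{pref:...}}` marker syntax, the preference names, and `personalize` are invented for illustration; the point is that the expensive wiki parse happens once, and each view only does a cheap substitution plus the header/footer concatenation.

```python
import re
from typing import Dict

# Hypothetical marker left in the cached HTML wherever a per-user choice applies.
PLACEHOLDER_RE = re.compile(r"\{\{pref:([a-z_]+)\}\}")


def personalize(cached_html: str, prefs: Dict[str, str], header: str, footer: str) -> str:
    """Fill per-user settings into a cached page body, then wrap it."""

    def fill(match: re.Match) -> str:
        # Unknown preference names fall back to an empty string.
        return prefs.get(match.group(1), "")

    body = PLACEHOLDER_RE.sub(fill, cached_html)
    return header + body + footer
```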
A lot of the regexp work could be skipped by using a quick preprocessor (ideally one that slaps in the header/footer without even looking at the text) and/or CSS. --Uriyan
Yup. CSS won't handle the difference between red links and [classic links]? for new pages, though. I recommend we change or eliminate one or the other. --Brion VIBBER

I take that back, CSS should do fine there. How does this sound:

   This is a <span class="newlinkedge">[</span><a href="foo" class="newlink">new
   link</a><span class="newlinkedge">]<a href="foo">?</a></span>.

where we define either:

   a.newlink { color: red; }
   .newlinkedge { display: none; }

or

   a.newlink { color: black; text-decoration: none; }
   .newlinkedge { }

in the style sheet? The text portion will still be clickable in the old-style case, though that could probably be "fixed" if desired. --Brion VIBBER
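Because both link styles come from the same markup, the renderer only ever has to emit one string per missing page, and the choice between red links and the bracket-and-question-mark style moves into the style sheet. A minimal sketch of that emitter, reproducing Brion's markup and HTML-escaping the target and link text; `new_link_html` is a hypothetical name.

```python
import html


def new_link_html(href: str, text: str) -> str:
    """Markup for a link to a page that does not exist yet."""
    h = html.escape(href, quote=True)
    t = html.escape(text, quote=False)
    return (
        # Edge spans are hidden or shown by the .newlinkedge rule.
        '<span class="newlinkedge">[</span>'
        f'<a href="{h}" class="newlink">{t}</a>'
        f'<span class="newlinkedge">]<a href="{h}">?</a></span>'
    )
```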

(2002/02/03 15:05 PST) I've changed the CVS version to use style sheets for the link colors, paragraph justification, and text/background color. (Try it at my test server, if you can find your way around the partially Esperanto-localized interface.) This keeps down the number of things that need to be changed if somebody wants to change the styles further, and should make the HTML-ized page guts cacheable.

Caching proposal

  1. Create a new field in the cur table named cur_cache (MEDIUMTEXT), empty by default
  2. When a page is viewed,
    1. and the cache has been used X times, it is cleared (forcing an up-to-date render)
    2. and the cache entry contains text, the cache is adapted to current user settings and displayed
    3. and the cache is empty, the text is rendered, displayed and stored in the cache field
  3. When a new page is created, the cache of all pages that link to the new page is cleared

--Magnus Manske
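The three steps of the proposal could be sketched in memory as follows. `CurRow` stands in for a cur row with the proposed cur_cache field; the use counter that step 2.1 needs is an assumption (the proposal does not say where the count is kept), and `render`/`adapt` are caller-supplied placeholders for the wiki parser and the per-user adjustment. The backlink index is built from a simplistic `[[...]]` match.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, Set

# Hypothetical link syntax match: [[Target]] or [[Target|label]]
LINK_RE = re.compile(r"\[\[([^\]|]+)")


@dataclass
class CurRow:
    text: str
    cache: str = ""      # stands in for the proposed cur_cache field
    cache_uses: int = 0  # assumption: a use counter, not part of the proposal


class PageCache:
    def __init__(self, render: Callable[[str], str], max_uses: int) -> None:
        self.render = render      # the expensive wiki-to-HTML parse
        self.max_uses = max_uses  # the proposal's X
        self.pages: Dict[str, CurRow] = {}
        self.linked_from: Dict[str, Set[str]] = {}  # target title -> linking titles

    def view(self, title: str, adapt: Callable[[str], str]) -> str:
        row = self.pages[title]
        if row.cache and row.cache_uses >= self.max_uses:
            # Step 2.1: used X times, so clear it to force an up-to-date render.
            row.cache, row.cache_uses = "", 0
        if row.cache:
            # Step 2.2: cached text, adapted to the current user's settings.
            row.cache_uses += 1
        else:
            # Step 2.3: empty cache, so render and store.
            row.cache = self.render(row.text)
        return adapt(row.cache)

    def create(self, title: str, text: str) -> None:
        self.pages[title] = CurRow(text)
        for target in LINK_RE.findall(text):
            self.linked_from.setdefault(target.strip(), set()).add(title)
        # Step 3: pages linking to the new title must drop their cached "new link".
        for linker in self.linked_from.get(title, set()):
            if linker in self.pages:
                self.pages[linker].cache = ""
                self.pages[linker].cache_uses = 0
```

A small X means more re-rendering but fresher pages; step 3 is what stops a link from staying in "new page" style after its target has been written.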


  • Browser-specific page layout
I notice though, that the tables in the page layout are setting their border properties based on whether the user agent is Internet Explorer. This explains why the tables have thin black borders in Internet Explorer and no borders in Mozilla... Magnus, is there any reason for this? I'd prefer to replace the table with some CSS markup in any case. --Brion VIBBER
The reason is that I like the thin black lines in IE, but other browsers don't support that; they draw all lines black, which looks ugly (try it!). If you know how to change it, go right ahead :) --Magnus Manske
Done, checked in. Looks ever so slightly different in IE and Mozilla, but approximately the same as the previous behavior in IE. Also looks okay in Konqueror 2.2.1, ugly but visible in Opera 6 (some beta version I have), but still doesn't show in Netscape 4.x. Brion VIBBER 2002/02/04 15:21 PST
  • Optimize slow code parts (where? why?)
    • Do we actually know what the slow parts are? My gut feeling is that the Recent Changes page is the slowest, but it would be nice if we could do some actual measurements on a server that is serving only one client but has a large database. (Does somebody have a big SQL dump?) Anyway, presuming that it is, I looked at the code and I think it can be made much, much faster by combining the two SQL queries into one. Right now it computes a JOIN and a GROUP BY in PHP, which the database can do far more efficiently. However, that requires being able to do a GROUP BY on the day, which is currently hidden in the timestamp, so we would split this column into a day column and a time column. Do I have your permission to attempt this? (But first I would like to know whether it is worth it, i.e., whether the Recent Changes page plays a major part in the slowdown. I thought Magnus said something about a memory leak in Apache, so perhaps we should try to find that first.) -- Jan Hidders
      • My (old) suggestion for Recent Changes optimization was to make it a separate table containing the last 5000+ changes. (The table would store only the RC-related data, not the actual page contents. It could be simply added to by the edit function, and trimmed in daily/weekly maintenance.) This table would eliminate the need for each RC page to search the *entire* DB looking for the most-recently-updated pages. --Clifford Adams
        • It doesn't; it uses the indexes to do that. That's the whole point of using a database; they are usually very clever at these things. :-) I would like to add that letting the database do the joins for you often leads to a performance improvement of orders of magnitude. If I'm right, this could be a major boost. -- Jan Hidders (2001/2/5 8:48 GMT+1)
  • Eliminate the access count function ("This page has been accessed 6 times"). I have a feeling that the huge number of writes to the database may be killing performance. A much less expensive way to do this would be to process the Apache log files daily or hourly with a separate script. --Clifford Adams
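The separate log-processing script could start from something like this sketch, which tallies successful page views per title from Apache access-log lines. The URL shape (wiki.phtml?title=Some_Page) and the parameter name are assumptions for illustration; the resulting counts would then be written to the database in one batch per run instead of one write per page view.

```python
import re
from collections import Counter
from typing import Iterable
from urllib.parse import parse_qs, urlsplit

# Request and status part of an Apache common/combined log line.
REQUEST_RE = re.compile(r'"GET (\S+) HTTP/[\d.]+" (\d{3}) ')


def count_views(log_lines: Iterable[str]) -> Counter:
    """Tally successful page views per title."""
    views: Counter = Counter()
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if match is None or match.group(2) != "200":
            continue  # not a GET, or not a successful response
        url = urlsplit(match.group(1))
        if not url.path.endswith("wiki.phtml"):
            continue  # style sheets, images and so on
        # Assumed URL shape: wiki.phtml?title=Some_Page (hypothetical parameter name)
        titles = parse_qs(url.query).get("title")
        if titles:
            views[titles[0]] += 1
    return views
```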