Wikipedia:Parser bug reports

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Pgdudda (talk | contribs) at 16:39, 15 May 2002 (redirect boken?). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Several asterisks in a row will prevent linewraps (or increase the linewrap length considerably?) Koyaanis Qatsi, Monday, April 8, 2002

See the history of Talk:Terrists for an example. I doubt this is a common issue, though, since most of us use four dashes.  :-)

This is not a bug, so I'm going to move this to the "fixed" site. Asterisks at the start of a line are used by the wiki software to make bullet lists, which can be nested. A row of, say, 20 asterisks is asking the software to make 20 nested bullet lists, and it does so correctly. It fails to wrap lines because bullet lists are indented, and when you ask for 20 indents, that line becomes very long to accommodate them, and it takes the rest of the page with it. In short, DON'T DO THAT, because it's not ever going to change. -- Lee Daniel Crocker
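
A rough sketch of the rule described above, in Python rather than the wiki's own code; the function names are made up for illustration and this is not the actual parser. It only shows that N leading asterisks ask for N nested lists.

```python
import re


def bullet_depth(line: str) -> int:
    """Count the leading asterisks, i.e. the nesting depth being requested."""
    match = re.match(r"\*+", line)
    return len(match.group(0)) if match else 0


def render_bullet(line: str) -> str:
    """Wrap one bullet line in as many nested <ul><li> levels as it has asterisks."""
    depth = bullet_depth(line)
    if depth == 0:
        return line  # not a bullet line, leave it alone
    text = line[depth:].strip()
    return "<ul><li>" * depth + text + "</li></ul>" * depth
```

With a row of 20 asterisks, render_bullet opens 20 nested lists, each of which a browser indents, which is why the line gets so wide.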

Table positioned between two paragraphs displays at bottom of page
(possibly related to above bug??) Wednesday, April 10, 2002

If you look at the table in Talk:High_German, you'll see that instead of appearing between the two paragraphs of my note, it leaves a "close table" tag where the table belongs and puts the table at the end of the page. I've double-checked my table code for errors, and can't find any. I've also tried just making one big table, with the first and last paragraphs in their own table rows, but the problem persists. Is this a bug, or am I having a Stupid Attack™? pgdudda

You're missing a </center> tag; it looks okay after I added that in. But that did trigger a bug in the parser that caused it to eat the table instead of the center tag... I'll try to fix that, but in the meantime, uh, don't do that. :) Brion VIBBER, Wednesday, April 10, 2002
Oh, so I *was* having a Stupid Attack™, but at least my Stupid Attack helped uncover another bug. Thanks!  :-) pgdudda Thursday, April 11, 2002



Linking error 2/25/02

Oregon constitution had several articles with multiple spaces in their titles, so the link was Article II  title here (two spaces before "title") instead of Article II title here, and the two links resolve to different locations. Rob Salzman

Hmm, I think this is semi-fixed. Anyone still seeing these kinds of errors? Brion VIBBER, Friday, April 19, 2002
STATUS: UNKNOWN
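
One way such links could be made to agree is to normalize whitespace in the title before lookup. This is only a sketch under assumptions: normalize_title is a hypothetical helper, and the underscore convention is inferred from page names like Talk:High_German, not taken from the actual code.

```python
import re


def normalize_title(title: str) -> str:
    """Collapse runs of spaces/underscores into one, trim the ends, use underscores."""
    collapsed = re.sub(r"[ _]+", " ", title).strip()
    return collapsed.replace(" ", "_")
```

With this, "Article II  title here" (two spaces) and "Article II title here" both map to the same key.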

Parser

Last line link in list

(2002/1/29) If the last lines of an article look like this:

* [http://www.yahoo.com/
Yahoo]

then the bottom part of the page ("Main Page | Recent Changes...") will be indented to the left and screwed up. See SandBox for an example. This only happens if all of the following are true:

  1. we are in a list
  2. we have a URL link
  3. The last letter of the URL is /
  4. The name of the link occurs on the next line
  5. You are using IE 5.5 on Windows. Netscape 4.76 on Linux does not show the effect.

AxelBoldt

(2002/3/2) Right now, I see the bug also in Netscape. An example is at the bottom of Duverger's Law. AxelBoldt

That page renders correctly for me on Mozilla 0.9.8 & Netscape 4.78 (Linux). The example in wikipedia:Sandbox still leaves an indent on the following page contents (which is due to a bug in the wiki-to-html rendering code), but not in the link bar at the bottom (which is now separated by a div tag, so there shouldn't be any interference). Brion VIBBER 2002/03/02

Another linking error - My user page had some external links that were just the usual raw html://yaddayadda.com/etcetera and they used to work, but today they didn't. It wasn't because there was an asterisk or a parenthesis immediately before or after the URL. It doesn't seem to be because the URL ends in a / - I looked at other user pages to compare, and I cannot figure out why it wasn't working - gremlins? (Look back one or two levels in the history of my user page to see the formats - I've since forced it to work, by hiding the URL from the displayed page.) -- Marj Tiefert, Wednesday, May 15, 2002


Parser generates extra whitespace

The Bipolar disorder page is full of extra whitespace - looking at the article reveals lots of <p> </p> and <pre> </pre> spans generated.

Similarly, if an otherwise empty line contains some white space, the previous parser took that as a paragraph break, while the new parser treats it as a block of indented nothing, resulting in too much space between the paragraphs.

If whitespace precedes a #, then it is taken to be a numbered list, while before it was taken as a literal # (which is the correct behavior, especially useful for programs). AxelBoldt

STATUS : Solved in CVS
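
For reference, the line handling that the reports above call the previous (expected) behaviour might be sketched as follows. This is Python and purely illustrative; classify_line and its labels are hypothetical and do not come from the parser source.

```python
def classify_line(line: str) -> str:
    """Classify one source line the way the reports above expect it to be treated."""
    if line.strip() == "":
        # Blank, or whitespace only: a paragraph break, not an indented block.
        return "paragraph_break"
    if line[0] in " \t":
        # Leading whitespace: preformatted text, so a following '#' stays literal.
        return "preformatted"
    if line[0] == "#":
        return "numbered_item"
    if line[0] == "*":
        return "bullet_item"
    return "text"
```

The order of the checks is the point: the whitespace tests come before the list-marker tests, so " # comment" in a program listing is never turned into a numbered item.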

Bad table code can screw up layout

(2002/1/28) In the Quaternions article, the first part of the article appears at the bottom of the page, as do all the QuickBar links. --Zundark

This was caused by Bad Table Code in the article. There was no closing TR tag for the last row in the table, and an extra open TR tag after the end of the table. I've fixed the article... The parser could probably be made to be able to normalize these things, though (ie, remove table-ish tags not inside <TABLE>...</TABLE>) --Brion Vibber
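
A minimal sketch of that kind of normalization, assuming a simple regex pass is good enough; strip_stray_table_tags is a hypothetical name and this is Python, not the parser's own code. It only drops row and cell tags that sit outside any table; it does not add missing closing tags.

```python
import re

# Matches opening or closing TABLE/TR/TD/TH tags, in any letter case.
TABLE_TAG = re.compile(r"</?(table|tr|td|th)\b[^>]*>", re.IGNORECASE)


def strip_stray_table_tags(html: str) -> str:
    """Remove table-ish tags that are not inside <TABLE>...</TABLE>."""
    out = []
    depth = 0  # number of currently open <table> elements
    pos = 0
    for match in TABLE_TAG.finditer(html):
        out.append(html[pos:match.start()])  # text between tags is kept as-is
        pos = match.end()
        tag = match.group(0)
        name = match.group(1).lower()
        closing = tag.startswith("</")
        if name == "table":
            if not closing:
                depth += 1
                out.append(tag)
            elif depth > 0:
                depth -= 1
                out.append(tag)
            # a </table> with no table open is dropped
        elif depth > 0:
            out.append(tag)
        # row/cell tags outside any table are dropped
    out.append(html[pos:])
    return "".join(out)
```

On input shaped like the Quaternions case, the extra open TR after the end of the table is removed while the table itself is left untouched.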

Parser issues with header lines

The display of Eight queens puzzle is... less than optimal. The problem is that the leading space on a line used to disable the processing of '#': now the Python program example is damaged.


Definition lists produce invalid HTML, could use some improvement as well

(2002/4/16) Lee Daniel Crocker The line

; term : definition

is rendered as

<DL><dt> term </DL><DL><dt><dd> definition</DL>

Note that neither the "dt" nor "dd" elements are properly closed. Further,

(2002/1/25) Definition lists like:

; Term 1
: Definition 1.
; Term 2
: Definition 2.

each get put in separate <dl> tags, resulting in too much spacing between them. Carey Evans

While we're at it, it would be nice if the DD/DL elements were only closed off on a full blank line (or end of article), and not just a single newline. That would make them more consistent with regular paragraph text, and make articles with long definitions easier to write and edit.

Specifically,

; term
  : long definition blah blah blah blah blah blah blah blah blah
   blah blah blah blah blah blah blah blah blah blah blah blah blah
   blah blah blah blah blah blah
  

should be rendered identically to

; term
  : long definition blah blah blah blah blah blah blah blah blah blah blah
    blah blah blah blah blah blah blah blah blah blah blah blah blah blah
    blah blah blah
  

This should also be the case for the ULs and OLs created by * and #. Of course, if the first character of a new line within a DD is ";", then close the DD and open a DT; if it is ":", insert an empty DT and open a new DD. When a full blank line is encountered, close both the open DD and the DL. I'll take a look at the parser code to see if that's possible.
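
The proposed rules can be sketched roughly as below. This is an illustration in Python, not the parser code; render_definition_lists is a hypothetical name, and splitting "; term : definition" on " : " is an assumption about how the one-line form would be handled.

```python
def render_definition_lists(lines):
    """Render ';' / ':' lines into <dl> blocks, closing a list only on a blank line."""
    html = []
    items = []  # [tag, text] pairs belonging to the currently open <dl>

    def flush():
        # Close the open list, emitting properly closed <dt>/<dd> elements.
        if items:
            body = "".join(f"<{tag}>{text}</{tag}>" for tag, text in items)
            html.append(f"<dl>{body}</dl>")
            items.clear()

    for raw in lines:
        line = raw.strip()
        if not line:
            flush()  # a full blank line closes both the DD and the DL
        elif line.startswith(";"):
            term, sep, definition = line[1:].partition(" : ")
            items.append(["dt", term.strip()])
            if sep:  # one-line "; term : definition" form
                items.append(["dd", definition.strip()])
        elif line.startswith(":"):
            if not items or items[-1][0] == "dd":
                items.append(["dt", ""])  # insert an empty DT before a new DD
            items.append(["dd", line[1:].strip()])
        elif items:
            items[-1][1] += " " + line  # a single newline continues the item
        else:
            html.append(line)  # ordinary text outside any list
    flush()  # end of article also closes the list
    return html
```

Under these rules the two differently wrapped examples above come out identical, and consecutive term/definition pairs share a single DL instead of getting one each.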

STATUS : Solved in CVS

Character entities in links

Sat Feb 2 00:23:40 UTC 2002: On list of food additives, I have additives like [[&beta;-cyclodextrine]]. When I click on the question mark to create an article about it, I get the Main Page displayed for edit instead. Note that since & is a safe character in URI path segments, escaping it as %26 has no effect.

This is due to a bug in the code putting too many HTML escapes into the title; if it were working correctly, the %26 escape would indeed have an effect. My recommendation until this is resolved: use β-cyclodextrine ([[beta-cyclodextrine|&beta;-cyclodextrine]]). --Brion Vibber
There's probably good arguments for actually writing "beta-cyclodextrine" in the article. However, my point about the % escape is that according to RFC 2396, there is no difference between %26 and just & in the path of the URL. --Carey
Well, there's the RFC and then there's the actual behavior of the software... PHP does not seem to consider %26 to be an ampersand for the purposes of extracting variables from the URL's *query* bit. At least my reading of the RFC agrees with it: '3.4 Query Component ... Within a query component, the characters ... "@", "&", "=" ... are reserved.' It's not a problem in the path, only in the query when you're e.g. editing the page. --BV
The URL rewriting to give nice URLs like http://www.wikipedia.com/wiki/MainPage rather than .../wiki.phtml?MainPage makes this a bit more complicated. There's no question mark in the URL for this edit page, so Apache is probably justified in converting %26 to & internally, before processing the Alias or RewriteRule directive, or http://www.wikipedia.com/%77%69%6B%69/ wouldn't work. --Carey
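
The path-versus-query distinction argued over above can be seen in a small experiment. It uses Python's urllib.parse as a stand-in, so it says nothing definite about what PHP or Apache do; the parameter name "title" is made up for the example.

```python
from urllib.parse import parse_qs, unquote

# In a path, decoding %26 gives the same result as a literal ampersand.
path_literal = unquote("/wiki/A&B")
path_escaped = unquote("/wiki/A%26B")

# In a query, a literal & separates variables, while %26 stays inside the value.
query_literal = parse_qs("title=A&B")      # the value is cut off at the &
query_escaped = parse_qs("title=A%26B")    # the value keeps its ampersand
```

So a title containing an ampersand survives a query string only if it is escaped, whereas in the path the two spellings end up the same once decoded.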

(Ideally the URL would be encoded as %CE%B2-cyclodextrine, the UTF-8 encoding of GREEK SMALL LETTER BETA.)

Impossible until the database is converted from ISO-8859-1 to UTF-8. --BV
I would just write <? echo urlencode(recode("h..utf8", $title)) ?>. --Carey
Yeah, that could probably work as long as titles are normalized internally. I'll try banging the code into place... --BV
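
For comparison, the same idea (turn the entity into a real character, then percent-encode its UTF-8 bytes) looks like this in Python. It is a sketch only: encode_title is a hypothetical helper, not the PHP that would go into the wiki, and it assumes the stored title holds HTML entities such as &beta;.

```python
from html import unescape
from urllib.parse import quote


def encode_title(title: str) -> str:
    """Resolve HTML entities, then percent-encode the title as UTF-8."""
    text = unescape(title)       # "&beta;" becomes the character itself
    return quote(text, safe="")  # quote() encodes non-ASCII text as UTF-8 bytes
```

encode_title("&beta;-cyclodextrine") gives %CE%B2-cyclodextrine, the form suggested above.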

References: RFC 2396, W3C on i18n of URIs


--Carey Evans


Problem in "printable version" page?

Please go to Category theory and try the "printable version" link; you will probably see that the word functors remains as a blue link instead of becoming simple italics text. I was unable to spot any sort of difference from other links that would cause this strange behaviour, and I suppose it can be considered a bug, since the printable version should not contain any link in the text part. Daniel M

Yup, it's a bug, caused by the fact that the link looks like [[functor|<b>functors</b>]]. It's fixed in the development version of the code. AxelBoldt
STATUS : Solved in CVS
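
A guess at the shape of the fix, as a Python sketch: the pattern for a piped link has to accept markup in the label part. The regex and the printable function are hypothetical and are not the code that went into CVS; using italics for de-linked text follows the description above.

```python
import re

# [[target]] or [[target|label]]; the label may itself contain HTML such as <b>...</b>.
LINK = re.compile(r"\[\[([^\]|]+)(?:\|(.+?))?\]\]")


def printable(text: str) -> str:
    """Replace every wiki link with its label in plain italics."""
    def replace(match):
        label = match.group(2) or match.group(1)  # fall back to the target name
        return "<i>" + label + "</i>"
    return LINK.sub(replace, text)
```

A label pattern that excluded "<" would skip [[functor|<b>functors</b>]] and leave it as a link, which would match the symptom reported.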



It's always a positive sign that somebody is working hard to improve a system when brand goofy new bugs start to appear. I just put together a new preliminary subject outline for philately with a certain amount of bulleted text. Now, when I enter a new first level bullet after a line with a second level bullet, that first level bullet doesn't appear in the article. Eclecticology, Thursday, May 9, 2002

I've noted this problem in numbered lists too, or something like it. Also, it used to be possible to create a mixed list with a structure something like:
  • One header
    1. A sub-item
    2. Another sub-item
    3. Yet another sub-item
  • A new header
    1. A sub-item for the second header
    2. And so forth

(see Montgomery County, Maryland for an example!) All of a sudden, this has stopped giving reasonable results. -- BRG (Friday, May 10, 2002, but first noticed earlier)
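
For what it's worth, here is a sketch of how mixed prefixes such as * and *# could be turned into nested lists, including the case of a first-level bullet following a second-level one. It is written in Python with hypothetical names and is not the wiki's parser; it assumes a sub-list is nested inside the list item that precedes it.

```python
import re

LIST_TAGS = {"*": "ul", "#": "ol"}


def render_lists(lines):
    """Convert lines with */# prefixes into nested <ul>/<ol> HTML."""
    out = []
    stack = ""  # marker characters of the lists that are currently open
    for line in list(lines) + [""]:  # the trailing "" closes whatever is still open
        prefix = re.match(r"[*#]*", line).group(0)
        text = line[len(prefix):].strip()
        # How many leading markers this line shares with the open lists.
        common = 0
        while (common < min(len(stack), len(prefix))
               and stack[common] == prefix[common]):
            common += 1
        # Close every open list deeper than the shared part, innermost first.
        for marker in reversed(stack[common:]):
            out.append("</li></" + LIST_TAGS[marker] + ">")
        if prefix and common == len(prefix):
            out.append("</li><li>")  # next item in a list that is already open
        else:
            for marker in prefix[common:]:
                out.append("<" + LIST_TAGS[marker] + "><li>")  # open new list(s)
        out.append(text)
        stack = prefix
    return "".join(out)
```

Fed the outline above (bullets with numbered sub-items), this keeps each numbered list inside its bullet and starts the next bullet at the first level again.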


Again with philately: When I linked to this from the "recent changes" page, I was aghast that the work that I had done was gone without explanation. I checked the history and the (diff)'s, but that showed that my input was all still there, and that apparently no human had touched it. From this I made a couple of minor edits (just to have something different) and saved it again. That was last night. Now again if I try to link from "recent changes" I still get an older version of the page. At this point I am beginning to wonder to what extent a link reliably gives the most recent version of a page. Eclecticology, Friday, May 10, 2002

This morning I'd noticed a similar problem with the article on Maryland, which I've worked on a lot; only it had apparently gone back many versions. While trying to restore my edits, however, I found the correct version suddenly appeared! -- BRG

#REDIRECT appears to be broken

When I try to access a redirected article, it appears with the "Redirect from foo_bar" header, but the article itself appears as:

  1. REDIRECT foo_bar

Any idea what's happening? Refreshing the page has no effect. I'm using IE6.0 on WinXP. (2002.05.15 pgdudda)
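
The symptom looks as if the leading # is being read as a numbered-list marker before the redirect is recognized. As an illustration only (Python, hypothetical names, not the actual code), the redirect check would have to run on the raw text before any list processing; accepting an optional [[...]] around the target is an assumption.

```python
import re

# "#REDIRECT target" on the first line, with optional [[...]] around the target.
REDIRECT = re.compile(
    r"^#REDIRECT\s+(?:\[\[)?([^\]\n]+?)(?:\]\])?\s*$", re.IGNORECASE
)


def redirect_target(article_text: str):
    """Return the redirect target if the article is a redirect, otherwise None."""
    first_line = article_text.split("\n", 1)[0]
    match = REDIRECT.match(first_line)
    return match.group(1) if match else None
```

Only if redirect_target returns None should the text go on to the list handling, where a leading # means a numbered item.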