Talk:Comparison of regular expression engines
Computing: Software, Unassessed
Ill-defined terms
Too many of the terms used as headings are vague or apply only to the terminology used for one RE engine. What this article really needs is a glossary of its terms.
There's also a fair point to be made that many of the tables here could be prose, and that would facilitate citing them. -Harmil 19:47, 27 April 2007 (UTC)
- I agree a terminology description would be useful. However, I strongly disagree that some of the tables should be converted to text: first, because that takes away this article's main feature (the ability to see differences within seconds without reading for hours), and second, because citing Wikipedia is discouraged anyway. // Sping 17:20, 11 July 2007 (UTC)
- Care to give examples of terminology you consider too vague or applicable only to "the terminology used for one RE engine?" (I'm not really sure what you mean by that.) I think the terms are fairly straightforward. IMO, a bigger problem is that a very large number of significant features supported by some regex libraries are not currently represented in the comparison tables here. --Monger 04:03, 12 July 2007 (UTC)
- What on earth is a "Lazy Quantifier"? I can't find mention of it anywhere else. 72.220.174.159 20:24, 26 July 2007 (UTC)
- You must not have looked very far. In the context of regular expression quantifiers, lazy is the opposite of greedy. See http://www.regular-expressions.info/repeat.html for more info. I've also seen lazy quantifiers described as "non-greedy" or "reluctant". --Monger 01:17, 27 July 2007 (UTC)
- Lazy is not the opposite of greedy; that is a poor name. Also, I've never seen it called "lazy" before; non-greedy is the standard term. mathrick (talk) 23:42, 2 February 2008 (UTC)
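For anyone else wondering what "lazy" means here, a minimal sketch using Python's standard `re` module (chosen only for illustration; the sample string is made up) shows the difference between a greedy `+` and its lazy (non-greedy, reluctant) counterpart `+?`:

```python
import re

text = "<b>bold</b><i>italic</i>"

# Greedy: .+ grabs as much as it can, then gives back characters only
# until the final ">" can match, so the match runs to the LAST ">".
greedy = re.search(r"<.+>", text).group()

# Lazy (non-greedy, reluctant): .+? takes as little as it can,
# so the match stops at the FIRST ">".
lazy = re.search(r"<.+?>", text).group()

# With findall, the lazy pattern picks out each tag separately.
tags = re.findall(r"<.+?>", text)

print(greedy)  # <b>bold</b><i>italic</i>
print(lazy)    # <b>
print(tags)    # ['<b>', '</b>', '<i>', '</i>']
```

Both patterns accept the same strings; the quantifier only changes which of the possible matches is reported first.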
Removing flavors with no information
Unless others disagree, I plan to remove from the comparison tables any flavors and engines which currently have no information about their features listed. Currently, this includes the following:
- ActionScript3.0
- Boost.Xpressive
- Grep
- GRETA
- Jakarta/Regexp
- Oniguruma
- SubEthaEdit
- Tcl 8.1
- TextMate
I would encourage others to list information about these engines' features, especially since a few of them are very significant and commonly used. However, I do not see any value in listing them without any information (none include any more than a couple "no"s). --Monger 00:54, 17 July 2007 (UTC)
- I've gone ahead and done this. --Monger 01:00, 20 July 2007 (UTC)
Unicode property support
I have not found any evidence that Python supports Unicode properties (like \p{L}). I'm not sure about other implementations, so I am fixing only the Python entry. See e.g. [1]. Mykhal (talk) 21:10, 9 January 2008 (UTC)
Only ICU and Perl offer full Unicode property support as of this writing; notes added. I cannot find any evidence that vim supports Unicode properties (like \pL, \p{Lu}, \p{Alphabetic}, \p{Script=Latin}, or \p{Line_Break=A_Letter}), so I have removed the claim that it does.
I strongly suggest that just mentioning Unicode property support is far too broad a brush for usefulness. The most important thing is whether a regex system is or is not compliant with the requirements spelt out in Unicode Regular Expressions. This is quite specific about formal requirements, such as Level 1, Level 2, or Level 3. Suggestions? Standards compliance is easily referenceable through specific claims in each language's documentation.
Even stating whether things like \w, \s, and \b work with Unicode or are ASCII-only would be much more useful than the current column. 17:50, 5 February 2010 (UTC)
Languages?
What exactly is the Languages table supposed to show? Languages which have regexes builtin? Languages for which a regex library exists? Something else? As it stands today, it's completely meaningless. mathrick (talk) 00:30, 3 February 2008 (UTC)
Table footnotes
I found the footnotes on these tables to be nigh on useless. Why are they using refun? I can see using refun when there are only one or two notes, but not when there are 7. I was forced to compare the link names on the endnotes with the notes themselves to figure out which note I was interested in reading. Argonel (talk) 21:43, 28 May 2008 (UTC)
Speed
Another interesting point of comparison could, of course, be speed (or type of implementation); there are some references in the paper "Regular Expression Matching Can Be Simple And Fast". --Lapo Luchini (talk) 14:58, 31 August 2008 (UTC)
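To show why "type of implementation" matters for speed, here is a toy sketch (not taken from any engine) comparing a recursive backtracking matcher with a Thompson-style simulation that tracks every NFA state at once, on the pathological pattern a?^n a^n against a^n discussed in that paper. The pattern format (a list of `(char, optional)` pairs, so only literals and `?` are supported) and all function names are hypothetical simplifications, and the counts are call/state-visit tallies, not timings.

```python
from typing import List, Set, Tuple

# Hypothetical toy pattern format: ("a", True) means "a?", ("a", False) means "a".
Pattern = List[Tuple[str, bool]]


def pathological(n: int) -> Pattern:
    """Build a?^n a^n (n optional a's followed by n required a's)."""
    return [("a", True)] * n + [("a", False)] * n


def backtrack_match(items: Pattern, text: str) -> Tuple[bool, int]:
    """Full match by recursive backtracking; returns (matched, recursive calls)."""
    calls = 0

    def go(i: int, j: int) -> bool:
        nonlocal calls
        calls += 1
        if i == len(items):
            return j == len(text)
        ch, optional = items[i]
        # Greedy: try consuming the character first...
        if j < len(text) and text[j] == ch and go(i + 1, j + 1):
            return True
        # ...and only on failure try skipping an optional item.
        return optional and go(i + 1, j)

    return go(0, 0), calls


def nfa_match(items: Pattern, text: str) -> Tuple[bool, int]:
    """Full match by simulating all NFA states in lockstep (Thompson-style).

    State s means "about to match items[s]"; state len(items) accepts.
    Returns (matched, total state visits).
    """
    def closure(states: Set[int]) -> Set[int]:
        # An optional item can be skipped without consuming input.
        result: Set[int] = set()
        stack = list(states)
        while stack:
            s = stack.pop()
            if s in result:
                continue
            result.add(s)
            if s < len(items) and items[s][1]:
                stack.append(s + 1)
        return result

    visits = 0
    current = closure({0})
    for c in text:
        visits += len(current)
        current = closure({s + 1 for s in current
                           if s < len(items) and items[s][0] == c})
    return len(items) in current, visits


if __name__ == "__main__":
    for n in (5, 10, 15):
        p, t = pathological(n), "a" * n
        print(n, backtrack_match(p, t), nfa_match(p, t))
```

The backtracking count grows roughly like 2^n because greedy choices for the optional a's must all be undone before the only successful path (skip every a?) is tried, while the state-set simulation never holds more than 2n+1 states per input character.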