User:Wvbailey/Parsing PM
This is a sandbox page for a look at how to parse the propositional formulas in Principia Mathematica.
Algorithm development
Peano 1889
van Heijenoort comments that "Peano's notation is quite superior to that of Boole and Schröder, and it marks an important transition toward modern logic" (p. 84). He primarily faults Peano for his lack of a rule of detachment (modus ponens). But what about his symbolism? How does that fare?
Peano states:
- We shall generally write signs on a single line. To show the order in which they should be taken, we use parentheses, as in algebra, or dots, ., :, :., ::, and so on.
- "To understand a formula divided by dots we first take together the signs that are not separated by any dot, next those separated by one dot, then those separated by two dots, and so on.
- For example, let a, b, c, . . . be any signs. Then ab.cd means (ab)(cd); and ab.cd:ef.gh:.k means (((ab)(cd))((ef)(gh)))k." (p. 86)
What is key here is Peano's definition of logical and and his subsequent abbreviation:
- "The sign ∩ is read and. Let a and b be propositions; then a ∩ b is the simultaneous affirmation of the propositions a and b. For the sake of brevity, we ordinarily write ab instead of a ∩ b." (p. 87)
The question becomes: can the example be parsed, i.e. converted back into a formula with the familiar parentheses? What we will attempt is a "production" of the following form, where s is a symbol-string made of symbols that are not dots. In other words, the search left or right through the string stops when a dot, or an end of the formula, is encountered. The bracket symbols [ and ] are "directives" to remind us that we have to search left and right through the string s:
- s.s ⇒ s[ )( ]s
- s:s ⇒ s[ )( ]s, etc.
But what happens when the symbol string includes dots of a lower number? For example, what happens in this case:
- abc.def:gh:.jk.l
Peano wants us to start with the lowest number of dots. In his example ab.cd:ef.gh:.k this would be 1 dot, which occurs in two places:
- ab.cd:ef.gh:.k ⇒ ab [ )( ] cd:ef.gh:.k
- search left thru ab and hit left end of formula: [ab)( ]cd:ef.gh:.k
- search right through cd and hit double-dot and stop: [ab)(cd]:ef.gh:.k
- [ab)(cd]:ef.gh:.k ⇒ [ab)(cd]:ef [ )( ] gh:.k ⇒
- search left until either dots are hit or end of string: [ab)(cd]:[ef)( ] gh:.k
- search right until either dots are hit or end of string: [ab)(cd]:[ef)(gh]:.k
One-dot occurrences are exhausted. Now proceed to 2-dot occurrences:
- [ab)(cd]:[ef)(gh]:.k ⇒ [ab)(cd] [ )( ] [ef)(gh]:.k
- search left until either dots are hit or end of string: [[ab)(cd])( ] [ef)(gh]:.k
- search right until either dots are hit or end of string: [[ab)(cd])([ef)(gh]]:.k
Two-dot occurrences are exhausted. Now proceed to 3-dot occurrences:
- [[ab)(cd])([ef)(gh]]:.k ⇒ [[ab)(cd])([ef)(gh]][ )( ] k
- search left until either dots are hit or end of string: [[[ab)(cd])([ef)(gh]])( ] k
- search right until either dots are hit or end of string: [[[ab)(cd])([ef)(gh]])(k]
Clean-up: change the brackets to parentheses:
- (((ab)(cd))((ef)(gh)))(k)
With the exception of the right-most (k) this agrees with Peano's example.
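The bracket-walking procedure above can be compressed into a recursive sketch: split the string at its strongest dot-symbols first (they bind last), then recurse into each factor. This is an illustrative reconstruction, not Peano's own algorithm; the function names and the convention that single-letter atoms stay unwrapped are assumptions made here to match his printed example.

```python
import re

DOT = re.compile(r'[.:]+')   # a dot-symbol is any run of '.' and ':'

def weight(sym):
    """Dot-count of a dot-symbol: '.' counts 1, ':' counts 2, so ':.' = 3."""
    return sum(2 if c == ':' else 1 for c in sym)

def parse(s):
    """Restore parentheses in a conjunction-only Peano dot-formula."""
    hits = list(DOT.finditer(s))
    if not hits:
        return s                                  # no dots: an atom like 'ab'
    top = max(weight(m.group()) for m in hits)    # the strongest dots bind last
    parts, prev = [], 0
    for m in hits:
        if weight(m.group()) == top:              # split only at the strongest dots
            parts.append(s[prev:m.start()])
            prev = m.end()
    parts.append(s[prev:])
    # juxtapose the factors; wrap each in parentheses unless it is a single letter
    return ''.join(p if len(p) == 1 else '(' + parse(p) + ')' for p in parts)

print(parse("ab.cd:ef.gh:.k"))   # (((ab)(cd))((ef)(gh)))k
```

Run on Peano's example the sketch reproduces his parenthesization exactly, including the bare final k.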
Let's try #6 in his list of "Propositions of logic", and this time use the production s . s ⇒ s [ )∩( ] s, i.e. include the sign for logical and:
- a = b.b ⊃ c :⊃. a⊃c ⇒ a = b [ )∩( ] b ⊃ c :⊃. a⊃c ⇒ [ a = b )∩( b ⊃ c ]:⊃. a⊃c ⇒
- [ a = b )∩( b ⊃ c ] : ⊃. a ⊃ c ⇒ [ a = b )∩( b ⊃ c ] : ⊃ [ )∩( ] a⊃c ⇒ [ a = b )∩( b ⊃ c ] :[ ⊃ )∩( a⊃c ] ⇒
- [ a = b )∩( b ⊃ c ] :[ ⊃ )∩( a⊃c ] ⇒ [ a = b )∩( b ⊃ c ] [ )∩( ] [ ⊃ )∩( a⊃c ] ⇒ [[ a = b )∩( b ⊃ c ] )∩([ ⊃ )∩( a⊃c ] ]
- Clean-up: change the brackets and the remaining colon to parentheses: ((a = b )∩( b ⊃ c ))∩(( ⊃ )∩( a⊃c ))
Does this make sense? If a equals b and b implies c, then a implies c. It does make sense but only if we can figure out what is to be done with the odd ∩(( ⊃ )∩. Here's Peano's answer:
- "Punctuation signs may be omitted if . . . only one formula, which is just the one we want to write, has meaning." (p. 87)
In other words, just use your intuition.
Principia Mathematica decoding of propositional formulas
Abbreviations: d is a generic dot-symbol; O is the generic "disjunctive operator" drawn from ⊃, ≡, V, plus the sign = used for definitions; Õ and Ō stand for any symbols not in { ⊃, ≡, V, = }; P is a "proposition symbol", e.g. p, q, r, s, t, etc. [ is a (-mark that acts as a "directive", a reminder that the [ must travel left through the string; ] is a )-mark that must travel right through the string. " s " is generically "a string of symbols (of any sort)". PM:9 observes that "Group I" symbols consist of those with "dots adjoining a sign of implication (⊃) or of [logical-]equivalence (≡), or of disjunction (V) or of equality by definition (= Df)". "Group III" symbols consist of dots "which stand between propositions in order to indicate a logical product".
Dots { . , : , :. , :: , etc }: Each instance of a collection of dots such as " :. " is a single symbol. It should be thought of as akin to the numeral " 3 ": although " :. " is made of a colon and a period, it is a single sign. (Peano used the single sign ∴, and the single symbol ∷ is useful.) There is no limit to the count of dots a dot-symbol may have. The count of dots in a symbol-instance represents its precedence relative to the dot-symbols to its left and right, and this is what matters. Thus " :. " (or ∴) represents 3 dots; it has a precedence of 3, which is higher than that of a 2-dot symbol or a 1-dot symbol. The following "schema" will use the letter " d " to represent, generically, a dot-symbol of any precedence.
Group III symbols, representing logical and: Generically PdP, in words: a dot-symbol between two proposition-symbols. This has the lowest "force". It is easy to spot: if a dot-symbol of any count appears between two signs that are not in { ⊃, ≡, V, = }, the dot-symbol represents logical and, for example " p:q ". As noted above, Peano abbreviated the symbolism from " (...) ∩ (...) " to " (...)(...) "; here we write either " )( " with no sign between ) and ( (i.e. the classical Boolean logical product) or, if one prefers, " & " or " ∧ " (Peano used " ∩ "). Generically, PdP produces " P [ )( ] P " or, depending on one's preference, " P [ )&( ] P ", " P [ )∧( ] P ", etc.
Group I symbols, aka "operator symbols", generically dOd where O is in { ⊃, ≡, V, = }: "Group I symbols" is the usage of PM. Pope et al. use the word "operator" to indicate what they call the "disjunctive operators" ⊃, ≡, V, to which the definition sign = needs to be added. If dots are on one side of an operator symbol they will be on the other side too. Let O, for generic Operator, come from the set { ⊃, ≡, V, = }. On both the left and right sides of an instance of an O-symbol will be a dot-symbol, but not necessarily of the same count: generically " dOd ". Each instance of this represents a composite symbol. For example:
- The schema is dOd. This can be "decompressed" as follows:
- { .O. , :O:, :O. , .O: , .O:., :O:., etc } where O comes from the set { ⊃, ≡, V, = }:
- { { .⊃. , .≡. , .V. , .=. }, { :⊃: , :≡: , :V: , :=: }, { :⊃. , :≡. , :V. , :=. }, { .⊃: , .≡: , .V: , .=: }, etc }
Whenever such an operator occurs, a pair of parentheses occurs on either side of the operator-symbol. The brackets are to remind ourselves that we need to close the parentheses.
- { .⊃. , .≡. , .V. , .=. } => { [ )⊃( ] , [ )≡( ] , [ )V( ] , [ )=( ] }
The problem will be to find where in the string to close the left and right parentheses. The " [ " sign travels left until it encounters a dot-symbol with a count equal to or higher than the dot-symbol that spawned it, or the left end of the string. In a similar manner the " ] " bracket travels right.
Logical NOT (negation): Per contemporary symbolism: if there's a single proposition say " p " the negation of this is " ~ p ". If there's an expression to be negated, this will have parentheses enclosing it:
- ~(~a V ~b) defines logical and i.e. a & b
An assertion of truth: (In the following, d is the generic dot-symbol described above.) If the symbols ⊢d appear at the left end of a formula they represent an assertion of the truth of what follows (cf PM:8). The count of this d will be one higher than the highest count inside the expression that follows.[1] This higher-precedence dot-symbol puts "full stops" around the expression (cf "full stops" per PM:8).
Precedence of the signs: logical and has the lowest precedence and should be treated first. Peano expected the lowest-precedence dot-symbol (e.g. 1 dot) to be treated first, then the next higher, and so on.
As noted by Pope et. al. [2]:
- "This language was designed for parsing by humans. Formal language description mechanisms had not been discovered in 1927 when Principia Mathematica was written. As it happens, the notation can be described using an operator precedence grammar with an infinite number of operators . . . here there is no limit to the number of dots that can accompany a conjunction or disjunction."
Just as in Peano, PM replaces parentheses with dots. But the algorithm is more sophisticated, and it eliminates the problem observed in the Peano example. Again, dot-symbols are of this sort: { ▪ , : , :. , :: , etc ad infinitum }
Dots have two usages. The first is to indicate and: the "dot" or "dots" stand free between symbols that are not disjunctive connectives ({ V, ⊃, ≡, = }, see below). This usage is easy to spot; the grammar will always be of the form
- s ▪ s, s:s, s:.s, etc
Here s is a symbol string where the symbols closest to the dot (on left and right) are propositional symbols such as a, b, c, p, q, r and not one of the "operator" symbols { V, ⊃, ≡, = }. By the Peano method (Rule #1 below) we can parse this as follows. (Parsing p.q.r requires an additional rule: start on the left and move to the right.[3])
- p.q:r ⇒
- p [ )&( ] q:r ⇒
- [ p )&( q] :r ⇒
- [ p )&( q] [ )&( ] r ⇒
- [ [ p )&( q] )&( r ] ⇒
- ( ( p ) & ( q ) ) & ( r )
But this usage does not cover the dots shown here, which are associated with (on either side of) the sign = :
- p ⊃ q .=. ~p V q
The second usage of dots is when they are "associated"[4] with symbols such as "the disjunctive connectives"[5] V (logical inclusive or), ⊃ (implication), ≡ (logical equivalence, if and only if). This "association" actually results in a "schema" of composite symbols, an infinite number of possibilities, as shown below. This includes the sign " = " which PM uses in definitions.
The dot-grammar consists of the following composite signs that behave as "operators". In the following O can be one of the following symbols { V, ⊃, ≡, = }. If dots appear on one side they must appear on the other, but not necessarily of the same number.
- { .O. , :O: , :O. , .O: , .O:. , :O:. , etc ad infinitum }
Another view of this would be:
- { .O. , ..O.. , .O.. , ..O. , .O... , ..O... , etc }
- Rule #0: Parsing starts at the left; "Both conjunction and the disjunctive connectives are left associative".
- Rule #1: "Dots for conjunction [logical and, s ▪ s ] have precedence left and right over any smaller group of dots until either end of the proposition", [however]
- Rule #2: "Dots beside a disjunctive operator [V, ⊃, ≡, and = ] have precedence away from the operator over any smaller number of dots next to a disjunctive connective [V, ⊃, ≡, and = ] or a smaller or equal number of dots for conjunction [s ▪ s ].
- Rule #3: "Negation of anything other than a simple proposition . . . requires parentheses".
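Rules #1 and #2 induce a comparison of binding tightness that can be sketched numerically. The encoding below is an assumption made here, not something given by PM or Pope et al.: fewer dots bind tighter, and at an equal dot-count a conjunction binds tighter than a disjunctive connective (rule #2 lets the operator's dots reach over an equal number of conjunction dots).

```python
def prec(kind, dots):
    """Binding tightness implied by rules #1 and #2 (higher = binds tighter).
    kind is 'and' for a free-standing dot-symbol, 'or' for a dotted
    disjunctive connective; dots is the dot-count of the symbol."""
    return -(2 * dots) if kind == 'and' else -(2 * dots + 1)

# bare V binds tighter than one-dot .V., which binds tighter than the
# 2-dot conjunction ':', which in turn binds tighter than ':⊃.'
assert prec('or', 0) > prec('or', 1) > prec('and', 2) > prec('or', 2)
```

Any encoding with these two properties would do; this one simply doubles the dot-count and charges disjunctive connectives one extra.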
An algorithm
Abbreviations: "O-dot symbol" or "O-dot" indicates a symbol chosen from the set { V, ⊃, ≡, = } together with its associated left- and right-dots. "&-dot symbol" or "&-dot" indicates a free-standing dot-symbol representing conjunction, as defined above. "Dot-symbol" indicates one of the following signs: { . , : , :. , :: , etc }.
- Start at the left of the symbol-string and move to the right. Motions through the string will be to the left and to the right. They will be limited by "end-stop" symbols (see below). Copying or "pushing" and "popping" from stack(s) will be required. Observe that if the left-most symbols are ⊢ followed by dots, the dots will always be of a higher number than the highest-number dot-symbol in the string. These are not necessary to parse the string, and they will be discarded.
- ⊢:.p V q : p .V. q ⊃ r :⊃. p V r
- If present, discard the assertion sign ⊢ and the dot-symbol that follows it.
- p V q : p .V. q ⊃ r :⊃. p V r
- Put parentheses around the entire expression. These are left- and right-end stops. Alternately use new symbol $.
- ( p V q : p .V. q ⊃ r :⊃. p V r )
- $ p V q : p .V. q ⊃ r :⊃. p V r $
- Locate the dot-symbol with the highest number of dots. It may belong to an O-dot symbol or be a free-standing &-dot symbol, or the two kinds may tie. If the highest-count dots belong to an O-dot symbol, or tie with an &-dot symbol, the or-rule #2 has precedence; otherwise the and-rule #1 has precedence.
- In the example, a 2-dot indicates and between q and p, but a 2-dot also stands to the left of the implication sign, i.e. :⊃. , so the or-rule #2 has precedence.
- Rule #2 requires a parenthesis-production to the left and right. Do the left side first and then the right (arbitrary):
- Expand the left-side dots: ( s .O ⇒ ( s [ ) O (rule #2)
- Expand the right-side dots: O. s ) ⇒ O ( ] s ) (rule #2)
- Rule #1 requires a production that extends parenthesis-searching to both the left and right:
- ( s . s ) ⇒ ( s [ )&( ] s ) (rule #1)
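The first three steps above (discard ⊢ and its dot-symbol, add end-stops, locate the strongest dot-symbol) can be sketched as follows. The ASCII stand-ins '|-' for the assertion sign and '>' for ⊃, and the function names, are assumptions made for the sketch.

```python
import re

def weight(sym):
    """Dot-count of a dot-symbol: '.' counts 1, ':' counts 2, so ':.' = 3."""
    return sum(2 if c == ':' else 1 for c in sym)

def strip_assertion(s):
    """Discard a leading assertion sign with its dots, then add $ end-stops."""
    s = s.replace(' ', '')
    s = re.sub(r'^\|-[.:]+', '', s)   # '|-' stands in for the turnstile
    return '$' + s + '$'

def highest_dots(s):
    """Return (count, position) of the strongest dot-symbol in the string."""
    best = (0, -1)
    for m in re.finditer(r'[.:]+', s):
        if weight(m.group()) > best[0]:
            best = (weight(m.group()), m.start())
    return best

print(strip_assertion("|- :. p V q : p .V. q > r :>. p V r"))
```

As the algorithm observes, the turnstile's dots always out-rank every dot-symbol inside the formula, so nothing is lost by discarding them.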
Previous attempts
- 3.2 Operator-precedence parsing algorithm
- We will now construct an algorithm for parsing operator precedence grammars. As seen in the explanation above, the way we can read an operator precedence grammar is from left to right: when we see a symbol we just accept it as is, and when it is an operator we will examine the operator to the right of it. If the operator has a greater or equal precedence ( ⋗ or ≐ ) than the operator to the right of it, we have to evaluate the operator. If the operator has a lower precedence ( ⋖ ) than the operator to the right of it, we will have to wait until the operator to the right of it evaluates. We can implement this using two stacks, one stack for the symbols and one stack for the 'suspended' operators. Figure 5 shows an imperative way of expressing this algorithm.
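The two-stack procedure just quoted can be sketched imperatively for a finite operator set. The function name and the fully parenthesized string output are assumptions of this sketch; Pope et al.'s own figure 5 is a functional-language version.

```python
def parse_infix(tokens, prec):
    """Two-stack operator-precedence parse: operands on one stack,
    'suspended' operators on the other.  A suspended operator is reduced
    as soon as the incoming operator does not out-rank it (left-assoc)."""
    operands, operators = [], []

    def reduce_top():
        right, left = operands.pop(), operands.pop()
        operands.append('(' + left + ' ' + operators.pop() + ' ' + right + ')')

    for t in tokens:
        if t in prec:
            # evaluate any suspended operator of greater or equal precedence
            while operators and prec[operators[-1]] >= prec[t]:
                reduce_top()
            operators.append(t)       # suspend the incoming operator
        else:
            operands.append(t)        # a plain symbol is accepted as is
    while operators:
        reduce_top()
    return operands[0]

print(parse_infix(list("p&qVr"), {'&': 2, 'V': 1}))   # ((p & q) V r)
```

With '&' ranked above 'V', the conjunction is reduced before the disjunction, exactly as the quoted description requires.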
- 4 Example: propositional formulas a la Principia Mathematica
- Principia Mathematica [12] introduces a notation for propositional formulas designed to eliminate most parentheses from formulas. The method to accomplish this is to add dots to break an expression into parts, more dots meaning a larger break. A conjunction consists of at least one dot. Dots associated with a conjunction result in less of a break than dots associated with a disjunctive connective such as V, → or ↔. Negation of anything other than a simple proposition still requires parentheses.
The precedence rules for dotted connectives are as follows:
- •1 Dots for conjunction have precedence left and right over any smaller group of dots until either end of the proposition.
- p V q : p .V. q ⊃ r :⊃. p V r ⇒ see rule 2; " :⊃ " has precedence over " q : p " ⇒
- Reduce to the generic form: P is a proposition, O is an "operator", numbers indicate the number of dots:
- $ P O P 2 P 1 O 1 P O P 2 O 1 P O P $
- Scan left to right through the string: look for " P n P " and mark with > < around the n:
- $ P O P > 2 < P 1 O 1 P O P 2 O 1 P O P $
- Scan left to right through the string: look for " n O n " and mark with ] [ around the O:
- $ P O P > 2 < P 1 ]O[ 1 P O P 2 ]O[ 1 P O P $
- •2 Dots beside a disjunctive operator have precedence away from the operator over any smaller number of dots next to a disjunctive connective or a smaller or equal number of dots for conjunction.
- ( p V q : p .V. q ⊃ r :⊃. p V r ) ⇒ ( p V q : p .V. q ⊃ r 〈 ) ⊃. p V r ) ⇒ ( ( p V q : p .V. q ⊃ r ) ⊃. p V r ) ⇒ ( ( p V q : p .V. q ⊃ r ) ⊃ ( 〉 p V r ) ⇒ ( ( p V q : p .V. q ⊃ r ) ⊃ ( p V r ) )
- Go inside the formula to the left of the senior connective: ( p V q : p .V. q ⊃ r ). By rule 1, the " s : s " has precedence over the single dots of .V. ⇒ ( p V q 〈 ) & ( 〉 p .V. q ⊃ r ) ⇒ ( ( p V q ) & ( p .V. q ⊃ r ) )
- Look inside and find dots on the right side: ( p .V. q ⊃ r ) ⇒ ( p 〈 ) V. q ⊃ r ) ⇒ ( ( p ) V. q ⊃ r ) ⇒ ( ( p ) V ( 〉 q ⊃ r ) ) ⇒ ( ( p ) V ( q ⊃ r ) )
- Assemble the fully-parsed formula: ( ( p V q ) & ( p .V. q ⊃ r ) ) ⇒ ( ( p V q ) & ( ( p ) V ( q ⊃ r ) ) ), so ( ( p V q : p .V. q ⊃ r ) ⊃ ( p V r ) ) ⇒ ( ( ( p V q ) & ( ( p ) V ( q ⊃ r ) ) ) ⊃ ( p V r ) )
- And this agrees with Russell p. 10.
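The whole example can be mechanized by combining a dot-weight tokenizer with the two-stack idea: assign each connective-occurrence a precedence from its dot-count. The numeric encoding, the ASCII stand-ins '>' for ⊃ and '&' for conjunction dots, and the restriction to V, ⊃ and conjunction are all assumptions of this sketch (no ≡, =, or ~). Atoms are not individually wrapped, so the output matches Russell's parse in structure rather than letter for letter.

```python
import re

TOK = re.compile(r'[.:]+|[V>]|[a-z]')   # dot-symbols, connectives, atoms

def weight(d):                          # '.' = 1, ':' = 2, so ':.' = 3
    return sum(2 if c == ':' else 1 for c in d)

def tokenize(s):
    """Atoms and (sym, dots) operator pairs; '&' marks a conjunction dot."""
    toks, pend = [], 0                  # pend: dots seen since the last token
    for t in TOK.findall(s.replace(' ', '')):
        if t[0] in '.:':
            if toks and toks[-1][0] == 'op':       # dots on an operator's right
                _, sym, w = toks[-1]
                toks[-1] = ('op', sym, max(w, weight(t)))
            else:
                pend = weight(t)
        elif t in 'V>':
            toks.append(('op', t, pend)); pend = 0
        else:                                       # an atom such as p, q, r
            if pend or (toks and toks[-1][0] == 'atom'):
                toks.append(('op', '&', pend))      # free-standing dots: and
                pend = 0
            toks.append(('atom', t))
    return toks

def prec(sym, dots):
    """Fewer dots bind tighter; conjunction out-ranks equally dotted operators."""
    return -(2 * dots) if sym == '&' else -(2 * dots + 1)

def parse(s):
    operands, ops = [], []
    def reduce_top():
        r, l = operands.pop(), operands.pop()
        sym, _ = ops.pop()
        operands.append('(%s %s %s)' % (l, sym, r))
    for tok in tokenize(s):
        if tok[0] == 'atom':
            operands.append(tok[1])
        else:
            while ops and prec(*ops[-1]) >= prec(tok[1], tok[2]):
                reduce_top()
            ops.append((tok[1], tok[2]))
    while ops:
        reduce_top()
    return operands[0]

print(parse("p V q : p .V. q > r :>. p V r"))
# (((p V q) & (p V (q > r))) > (p V r))
```

The tokenizer reproduces the generic string $ P O P 2 P 1 O 1 P O P 2 O 1 P O P $ from the reduction above, and the two-stack loop then yields the same grouping as the hand parse that agrees with Russell p. 10.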
What is missing is this notion derived from Peano: the production and the search left and right per rules 1 and 2, sL . sR → sL 〈 ) & ( 〉 sR → 〈 sL ) & ( sR 〉. And the idea of composite symbols, i.e. associating dots with the operators: .V. , :V. , :V: , .⊃. , :⊃. etc.
•3 Both conjunction and the disjunctive connectives are left associative.
This language was designed for parsing by humans. Formal language description mechanisms had not been discovered in 1927 when Principia Mathematica was written. As it happens, the notation can be described using an operator precedence grammar with an infinite number of operators as shown in figure 7.
The interesting part of the precedence table is the handling of the conjunction and disjunction operators. Most operator precedence grammars have a finite number of operators, but here there is no limit to the number of dots that can accompany a conjunction or disjunction.
All that is required to parse this grammar using the operator precedence parser defined in section 3 is to create a function encapsulating figure 7 and define parsers for the operator symbols.
Monadic Parsing: A Case Study Bernard Pope, Simon Taylor and Mark Vielaard Department of Computer Science The University of Melbourne Parkville, Vic. 3052, Australia Abstract One of the selling points for functional languages is the ease with which parsers for simple languages can be expressed. In this report we give an introduction to monadic and operator precedence parsing in functional languages, using the parsing of propositional logic formulas as an example.
Here is the source of the following: http://www.patrickkellogg.com/school/papers/chomsky.htm
- "1. Recursive grammar.
- "A recursively enumerable grammar is one that can be instantiated by an algorithm. Another way of saying this is: "there is a Turing machine that decides it" (Lewis, 1998, p. 195). Given an input that is "valid" for the grammar, the Turing machine will always halt in a correct "answer" state. When written using standard notation, both sides of the rewrite rules can have as many symbols as are needed (example, A B C D → M N O P Q (etc.)). A grammar of this type could be said to be "incomplete", since the behavior of the Turing machine is not specified for all inputs. This is the same problem that Bertrand Russell and Alfred North Whitehead had when constructing their unified formal logic "Principia Mathematica" (Hofstadter, 1979, pp 17-619)."
- Hofstadter, Douglas R., "Gödel, Escher, Bach: An Eternal Golden Braid", Vintage Books, New York, 1979.
Footnotes
- ^ This can be inferred from the discussion at PM:10, but not stated outright.
- ^ Monadic Parsing: A Case Study, "4 Example: propositional formulas a la Principia Mathematica"
- ^ " . . . we can read an operator precedence grammar . . . from left to right, when we see a symbol we just accept it as it is . . ." (Pope et al.:7)
- ^ Word used by Pope et al.
- ^ Words used by Pope et al.
References
- Pope, Bernard; Taylor, Simon; Vielaard, Mark [no date], "Monadic Parsing: A Case Study", Department of Computer Science, The University of Melbourne.