User:Wvbailey/Parsing PM
This is a sandbox page for a look at how to parse the propositional formulas in Principia Mathematica. ⊃ Ɔ ≡ ⊂ ∪ ∩ ∨ ∧ ∨ ∩ ∪ ⊂ ⊃ Ɔ ≡ ε Λ ℩ ⊓ ⊔ ▪■︰✸ ✹ ✱ ∪ ∩ ∨ 〉 〈 ≡ ⇒ ⊃ ⊏ ⊐ ⊢ ⊦
Algorithm development
Peano 1889
van Heijenoort comments that "Peaon's notation is quite superior to that of Boole and schroeder, and it marks an important transition toward modern logic" (p. 84). He primarily faults Peano on his lack of a rule of detachment (modus ponens). But what about his symbolism? How does that fare?
Peano states:
- We shall generally write signs on a single line. To show the order in which they should be taken, we use parentheses, as in algebra, or dots, ., :, :., ::, and so on.
- "To understand a formula divded by dots we first take together the signs that are not separated by any dot, next those sperarted by one dot, then thos seqparated by two dots, and so on.
- For example, let a, b, c, . . . be any signs. Then ab.cd means (ab)(cd); and ab.cd:ef.gh:.k means (((ab)(cd))((ef)(gh)))k." (p. 86)
What is key here is Peano's definition of logical and and his subsequent abbreviation:
- "The sign ∩ is read and. Let a and b be propositions; then a ∩ b is the simultaneious affirmation of the propositions a and b. For the sake of brevity, we ordinarily write ab instead of a ∩ b." (p. 87)
The question becomes, can the example be parsed, i.e. converted back into a formula with the familiar parentheses. What we will attemp is have a "production" of the following form, where ssss is a symbol-string made of symbols that are not dots. In other words the search left or right through the string stops when a dot is encountered, or an end of the formula. The symbols ⊏ and ⊐ are "directives" to remind us that we have to go left and right through the string s
- s.s ⇒ s[ )( ]s
- s:s ⇒ s[ )( ]s, etc.
But what happens when the symbol string includes dots of a lower number? For example, what happens in this case:
- abc.def:gh:.jk.l
Peano wants us to start with the lowest number of dots. In this example this would be 1 dot. This occurs two places
- ab.cd:ef.gh:.k ⇒ ab [ )( ] cd:ef.gh:.k
- search left thru ab and hit left end of formula: [ab)( ]cd:ef.gh:.k
- search right through cd and hit double-dot and stop: [ab)(cd]:ef.gh:.k
- [ab)(cd]:ef.gh:.k ⇒ [ab)(cd]:ef [ )( ] gh:.k ⇒
- search left until either dots are hit or end of string: [ab)(cd]:[ef)( ] gh:.k
- search right until either dots are hit or end of string: [ab)(cd]:[ef)(gh]:.k
One-dot occurrences are exhausted. Now proceed to 2-dot occurrences:
- [ab)(cd]:[ef)(gh]:.k ⇒ [ab)(cd] [ )( ] [ef)(gh]:.k
- search left until either dots are hit or end of string: [[ab)(cd])( ] [ef)(gh]:.k
- search right until either dots are hit or end of string: [[ab)(cd])([ef)(gh]]:.k
Two-dot occurrences are exhausted. Now proceed to 3-dot occurrences:
- [[ab)(cd])([ef)(gh]]:.k ⇒ [[ab)(cd])([ef)(gh]][ )( ] k
- search left until either dots are hit or end of string: [[[ab)(cd])([ef)(gh]])( ] k
- search right until either dots are hit or end of string: [[[ab)(cd])([ef)(gh]])(k]
Clean-up: change the brackets to parentheses:
- (((ab)(cd))((ef)(gh)))(k)
With the exception of the right-most (k) this agrees with Peano's example.
Let try #6 in his list of "Propositions of logic", and this time use the production s . s ⇒ s [ ) ∩ ( ], i.e. include the sign for logical and
- a = b.b ⊃ c :⊃. a⊃c ⇒ a = b [ )∩( ] b ⊃ c :⊃. a⊃c ⇒ [ a = b )∩( b ⊃ c ]:⊃. a⊃c ⇒
- [ a = b )∩( b ⊃ c ] : ⊃. a ⊃ c ⇒ [ a = b )∩( b ⊃ c ] : ⊃ [ )∩( ] a⊃c ⇒ [ a = b )∩( b ⊃ c ] :[ ⊃ )∩( a⊃c ] ⇒
- [ a = b )∩( b ⊃ c ] :[ ⊃ )∩( a⊃c ] ⇒ [ a = b )∩( b ⊃ c ] [ )∩( ] [ ⊃ )∩( a⊃c ] ⇒ [[ a = b )∩( b ⊃ c ] )∩([ ⊃ )∩( a⊃c ] ]
- cleanup the : ((a = b )∩( b ⊃ c ))∩(( ⊃ )∩( a⊃c ))
- Reinsert the sign for logical and:
Does this make sense? If a equals b and b implies c, then a implies c. It does make sense but only if we can figure out what is to be done with the odd ∩(( ⊃ )∩. Here's Peano's answer:
- "Punctuation signs may be omitted if . . . only one formula, which is just the one we want to write, has meaning." (p. 87)
In other words, just use your intuition.
Principia Mathematica decoding of propositional formulas
PM:9 observes that "Group I" symbols consist of those with "dots adjoining a sign of implication (⊃) or of [logical-]equivalence (≡), or of disjuction (V) or of equality by definition (= Df)". "Group III" symbols consists of dots which stand between propositions in order to idnicate a logical product".
Dots { . , : , :. , ::, etc }: A collection of dots such as " :. " is a composite symbol, meaning it should be thought of as e.g. " 3 ", a single symbol. In other words, although " :. " is made of a colon and a period it is a single sign. (Peano used the single-sign ∴, and the single symbol ∷ is useful). An instance of a dot-symbol can have no limit to its count of dots. The count of dots in a symbol-intance represents [PRECEDENCE?? over dots to the left or right, and this is what matters]. Thus " :. " (or ∴ ) represents "3 dots"; it has a precedence of 3, which is higher than a 2-dot symbol and a 1-dot symbol. The following "schema" will use the letter " d " to represent, generically, a dot symbol of any precedence.
Group III symbols, logical and: Lowest "force". Easy to spot: if a dot-symbol of any count appears between two signs that are not { ⊃, ≡, V, = } the dot-symbol represents logical and, for example " p:q ". As noted above, Peano abbreviated the symbolism from " (...) ∩ (...) " to " ( ... )(...) here by either " )( " with no sign between ) and ( (i.e. the classical Boolean logical product) or if one prefers, " & " or " ∧ " (Peano used " ∩ "). Example: the signs " p:q " yields " p)(q " or depending on one's preference " p ) & ( q ", " p ) ∧ ( q ", etc. The problem will be to find where in the string to close the left and right parentheses.
Group I symbols aka "operator symbols" generically dOd where O is { ⊃, ≡, V, = }: "Group I symbols" is the usage of PM. Pope et. al. uses the word "operator" to indicate what they call the "disjunctive operators" ⊃, ≡, V to which the definition sign = needs to be added. If dots are on one side of an operator symbol they will be on the other side too. Let O, for generic Operator, come from the set { ⊃, ≡, V, = }. On both left and right sides of an instance of an O-symbol will be a dot symbol, but not necessarily of the same count, generically " dOd ". Each instance of this represents a composite symbol. For example:
- The schema is dOd. This can be "decompressed" as follows:
- { .O. , :O:, :O. , .O: , .O:., :O:., etc } where O comes from the set { ⊃, ≡, V, = }:
- { { .⊃. , .≡. , :V: , :=: }, { :⊃: , :≡: , :V: , :=: }, { :⊃. , :≡. , :V. , :=. }, { .⊃: , .≡: , .V: , .=: }, etc }
Whenever such a operator occurs a pair of parentheses occur on either side of the operator-symbol. The brackets are to remind ourselves that we need to close the parentheses.
- { .⊃. , .≡. , :V: , :=: } => { [ )⊃( ] , [ )≡( ] , [ )V( ] , [ )=( ] }
The problem will be to find where in the string to close the left and right parentheses. Until it encounters a dot-symbol with a number equal to or higher than the dot-symbol that spawned it (or the left end of the string), the " [ " sign will continue to the left. In a similar manner the " ] " bracket will move right.
Logical NOT (negation): Per contemporary symbolism: if there's a single proposition say " p " the negation of this is " ~ p ". If there's an expression to be negated, this will have parentheses enclosing it:
- ~(~a V ~b) defines logical and i.e. a & b
An assertion of truth: (In the following, d is a generic dot-symbol described above). If the symbols ⊢d appear at the left end of a formula they represent an assertion of truth of what follows (cf PM:8). The count of this d will be one higher than the highest count inside the expression that follows[1] This higher-precedence dot-symbol puts "full stops" around the expression (cf "full stops" per PM:8).
Precedence of the signs: logical AND is the lowest precedence. Logical and should be treated first. Peano expected the lowest-precedence dot-symbol (e.g. 1 dot) to be treated first, then the next higher
As noted by Pope et. al. [2]:
- This language was designed for parsing by humans. Formal language description mechanisms had not been discovered in 1927 when Principia Mathematica was written. As it happens, the notation can be described using an operator precedence grammar with an infinite number of operators . . . here there is no limit to the number of dots that can accompany a conjunction or disjunction."
Just as in Peano, PM replaces parentheses with dots. But the [algorithm] is more sophisticated, and it eliminates the problem observed in the Peano example Again, Dot-symbols are of this sort: { ▪, :, :., ::, etc ad infinitum }
Dots have two usages. The first To indicate and, the "dot" or "dots" are free-standing between symbols that are not disjunctive connectives ( { V, ⊃, ≡, = } see below). This usage is easy to spot; the grammar will always be of the form
- s ▪ s, s:s, s:.s, etc
Here s is a symbol string where the symbols closest to the dot (on left and right) are propositional symbols such as a, b, c, p, q, r and not one of these "operator" symbols: { V, ⊃, ≡, = }. By the Peano method (Rule #1 below), we can parse this as follows (Parsing p.q.r requires an additional rule; start on the left and move to the right[3]
- p.q:r ⇒
- p [ )&( ] q:r ⇒
- [ p )&( q] :r ⇒
- [ p )&( q] [ )&( ] r ⇒
- [ [ p )&( q] )&( r ] ⇒
- ( ( p ) & ( q ) ) & ( r )
But not the dots shown here that are associated with (on either side of) the sign =
- p ⊃ q .=. ~p V q
The second usage of dots is when they are "associated"[4] with symbols such as "the disjunctive connectives"[5] V (logical inclusive or), ⊃ (implication), ≡ (logical equivalence, if and only if). This "association" actually results in a "schema" of composite symbols, an infinite number of possibilities, as shown below. This includes the sign " = " which PM uses in definitions.
The dot-grammar consists of the following composite signs that behave as "operators". In the following O can be one of the following symbols { V, ⊃, ≡, = }. If dots appear on one side they must appear on the other, but not necessarily of the same number.
- { .O. , :O:, :O. , .O: , .O:., :O:., etc infinitum }
Another view of this would be:
- { .O. , ..O.., .O.., ..O. , .O..., ..O..., etc
- Rule #3: "Negation of anything other than a simple proposition . . . requires parentheses".
- Rule #1: "Dots for conjunction [logical and, s ▪ s ] have precedence left and right over any smaller group of dots until either end of the proposition", [however]
- Rule #2: "Dots beside a disjunctive operator [V, ⊃, ≡, and = ] have precedence away from the operator over any smaller number
of dots next to a disjunctive connective [V, ⊃, ≡, and = ] or a smaller or equal number of dots for conjunction [s ▪ s ].
- Rule #0: Parsing starts at the left, "Both conjunction and the disjunctive connectives are left associative".
An algorithm
Abbreviations: "O-dot symbol" or "O-dot" indicate an symbol chosen from the set { V, ⊃, ≡, = } together with associated left- and right-dots. "&-dot symbol" or "&-dot" indicate a symbol [as defined above]. "Dot-symbol" indicates of the following signs{ . , :, :. , :: , etc }.
- Start at the left of the symbol-string and move to the right. Motions through the string will be to the left and to the right. They will be limited by "end-stop" symbols (see below). Copying or "pushing" and "popping" from stack(s) will be required. Observe that if the left-most symbols are ⊢ followed by dots, the dots will always be of a higher number than the highest-number dot-symbol in the string. These are not necessary to parse the string, and they will be discarded.
- ⊢:.p V q : p .V. q ⊃ r :⊃. p V r
- If present, discard the assertion sign ⊢ and the dot-symbol that follows it.
:: .p V q : p .V. q ⊃ r :⊃. p V r
- Put parentheses around the entire expression. These are left- and right-end stops. Alternately use new symbol $.
- ( p V q : p .V. q ⊃ r :⊃. p V r )
- $ p V q : p .V. q ⊃ r :⊃. p V r $
- Locate the dot-symbol with the highest number of dots. It may be either the O-dot symbol, or a free-standing &-dot symbol or they may be equivalent. [WORK ON THE LOGIC OF THIS, define an abbreviation the &-dot symbol, the or-dot symbol] : [If both they are equal OR if the or-dot symbol is greater the or-rule #2 has precedence, else the and rule #1 has precedence.
- A 2-dot indicates and between q : p, but a 2-dot is to the left of the implication sign, i.e. :⊃. So the or-rule #2 has precedence.
- Rule #2 requires a parenthesis-production to the left and right. Do the left side first and then the right (arbitrary):
- Expand the left-side dots: ( s .O ➨ ➔ ⇒ ( s [ ) O ⇒ rule #2
- Expand the right-side dots: O. s ) ⇒ O [ ) s ) ⇒ rule #2
- Rule #1 requires a production that extends parenthesis-searching to both the left and right.:
- ( s . s ) ⇒ ( s [ )&( ] s ) ⇒ rule #1
Footnotes
- ^ This can be inferred from the discussion at PM:10, but not stated outright.
- ^ Monadic Parsing: A case Study, "4 Example: propositional formulas a la Principia Mathematica"
- ^ " . . . we can read an operator precedence grammar . . . from left to right, when we see a symbol we just accept it as it is . . ." (Pope et. al.:7)
- ^ Word used by Pope et. al.
- ^ Words used by Pope et. al.
References
- Pope, Taylor, Waylaard [no date], "Monadic Parsing: A Case Study",