Unifying Synchronous Tree-Adjoining Grammars and Tree Transducers via Bimorphisms

Stuart M. Shieber. 2006. Unifying Synchronous Tree-Adjoining Grammars and Tree Transducers via Bimorphisms. In Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics (EACL-06), Trento, Italy, 3–7 April.

Stuart M. Shieber Division of Engineering and Applied Sciences Harvard University Cambridge, MA, USA [email protected]

Abstract We place synchronous tree-adjoining grammars and tree transducers in the single overarching framework of bimorphisms, continuing the unification of synchronous grammars and tree transducers initiated by Shieber (2004). Along the way, we present a new definition of the tree-adjoining grammar derivation relation based on a novel direct inter-reduction of TAG and monadic macro tree transducers. Tree transformation systems such as tree transducers and synchronous grammars have seen renewed interest, based on a perceived relevance to new applications, such as importing syntactic structure into statistical machine translation models or founding a formalism for speech command and control. The exact relationship among a variety of formalisms has been unclear, with a large number of seemingly unrelated formalisms being independently proposed or characterized. An initial step toward unifying the formalisms was taken (Shieber, 2004) in making use of the formallanguage-theoretic device of bimorphisms, previously used to characterize the tree relations definable by tree transducers. In particular, the tree relations definable by synchronous tree-substitution grammars (STSG) were shown to be just those definable by linear complete bimorphisms, thereby providing for the first time a clear relationship between synchronous grammars and tree transducers. In this work, we show how the bimorphism framework can be used to capture a more powerful formalism, synchronous tree-adjoining grammars, providing a further uniting of the various and disparate formalisms.

After some preliminaries (Section 1), we begin by recalling the definition of tree-adjoining grammars and synchronous tree-adjoining grammars (Section 2). We turn then to a set of known results relating context-free languages, tree homomorphisms, tree automata, and tree transducers to extend them for the tree-adjoining languages (Section 3), presenting these in terms of restricted kinds of functional programs over trees, using a simple grammatical notation for describing the programs. This allows us to easily express generalizations of the notions: monadic macro tree homomorphisms, automata, and transducers, which bear (at least some of) the same interrelationships that their traditional simpler counterparts do (Section 4). Finally, we use this characterization to place the synchronous TAG formalism in the bimorphism framework (Section 5), further unifying tree transducers and other synchronous grammar formalisms. We also, in passing, provide a new characterization of the relation between TAG derivation and derived trees, and a new simpler and more direct proof of the equivalence of TALs and the output languages of monadic macro tree transducers.

1

Preliminaries

We will notate sequences with angle brackets, e.g., ha, b, ci, or where no confusion results, simply as abc, with the empty string written ε. Trees will have nodes labeled with elements of a RANKED ALPHABET, a set of symbols F, each with a non-negative integer RANK or ARITY assigned to it, determining the number of children for nodes so labeled. To emphasize the arity of a symbol, we will write it as a parenthesized superscript, for instance f (n) for a symbol f of arity n. Analogously, we write F(n) for the set of symbols in F with arity n. Symbols with arity zero (F(0) ) are called NULLARY symbols or CON -

The set of nonconstants is written F(≥1) . To express incomplete trees, trees with “holes” waiting to be filled, we will allow leaves to be labeled with variables, in addition to nullary symbols. The set of TREES OVER A RANKED AL PHABET F AND VARIABLES X, notated T(F, X), is the smallest set such that (i) f ∈ T(F, X) for all f ∈ F(0) ; (ii) x ∈ T(F, X) for all x ∈ X; and (iii) f (t1 , . . . ,tn ) ∈ T(F, X) for all f ∈ F(≥1) , and t1 , . . . ,tn ∈ T(F, X). We abbreviate T(F, 0), / where the set of variables is empty, as T(F), the set of GROUND TREES over F. We will also make use of the set of n numerically ordered variables Xn = {x1 , . . . , xn }, and write x, y, z as synonyms for x1 , x2 , x3 , respectively. Trees can also be viewed as mappings from TREE ADDRESSES , sequences of integers, to the labels of nodes at those addresses. The address ε is the address of the root, 1 the address of the first child, 12 the address of the second child of the first child, and so forth. We will use the notation t/p to pick out the subtree of the node at address p in the tree t. Replacing the subtree of t at address p by a tree t 0 , written t[p 7→ t 0 ] is defined as (using · for the insertion of an element on a list) STANTS.

the definition given above): f (n) ∈ F(n) x ∈ X ::= x0 | x1 | x2 | · · · t ∈ T(F, X) ::= f (m) (t1 , . . . ,tm ) | x The notation allows definition of classes of expressions (e.g., F(n) ) and specifies metavariables over them ( f (n) ). These classes can be primitive (F(n) ) or defined (X), even inductively in terms of other classes or themselves (T(F, X)). We use the metavariables and subscripted variants on the right-hand side to represent an arbitrary element of the corresponding class. Thus, the elements t1 , . . . ,tm stand for arbitrary trees in T(F, X), and x an arbitrary variable in X. Because numerically subscripted versions of x appear explicitly on the right hand side of the rule defining variables, numerically subscripted variables (e.g., x1 ) on the right-hand side of all rules are taken to refer to the specific elements of x, whereas otherwise subscripted elements (e.g., xi ) are taken generically.

2

Tree-Adjoining Grammars

Tree adjoining grammar (TAG) is a tree grammar formalism distinguished by its use of a tree adjunction operation. Traditional presentations . of TAG, which we will assume familiarity with, take the symbols in elementary and derived trees to be unranked; nodes labeled with a given nonThe HEIGHT of a tree t, notated height(t), is determinal symbol may have differing numbers of fined as follows: height(x) = 0 for all x ∈ X and n children. (Joshi and Schabes (1997) present a height( f (t1 , . . . ,tn )) = 1 + maxi=1 height(ti ) for all good overview.) For example, foot nodes of auxf ∈ F. iliary trees and substitution nodes have no chilWe can use trees with variables as CONTEXTS dren, whereas the similarly labeled root nodes in which to place other trees. A tree in T(F, Xn ) must have at least one. Similarly, two nodes with will be called a context, typically denoted with the the same label but differing numbers of children symbol C. For a context C ∈ T(F, Xn ) and a semay match for the purpose of allowing an adquence of n trees t1 , . . . ,tn ∈ T(F), the SUBSTITU junction (as the root nodes of α1 and β1 in FigTION OF t1 , . . . ,tn INTO C, notated C[t1 , . . . ,tn ], is ure 1). In order to integrate TAG with tree transdefined inductively as follows: ducers, however, we move to a ranked alphabet, which presents some problems and opportunities. ( f (u1 , . . . , um ))[t1 , . . . ,tn ] (In some ways, the ranked alphabet definition of = f (u1 [t1 , . . . ,tn ], . . . , um [t1 , . . . ,tn ]) TAGs is slightly more elegant than the traditional xi [t1 , . . . ,tn ] = ti . one.) Although the bulk of the later discussion integrating TAGs and transducers assumes (withA tree t ∈ T(F, X) is LINEAR if and only if no out loss of expressivity (Joshi and Schabes, 1997, variable in X occurs more than once in t. fn. 6)) a limited form of TAG that includes adjuncWe will use a notation akin to BNF to specify tion but not substitution, we define the more comequations defining functional programs of various plete form here. We will thus take the nodes of TAG trees to be sorts. As an introduction to the notation we will use, here is a grammar defining trees over a ranked labeled with symbols from a ranked alphabet F; alphabet and variables (essentially identically to a given symbol then has a fixed arity and a fixed t[ε 7→ t 0 ] = t 0 f (t1 , . . . ,tn )[(i · p) 7→ t 0 ] = f (t1 , . . . ,ti [p 7→ t 0 ], . . . ,tn ) for 1 ≤ i ≤ n

α1 : S

α2 : T

T↓

c

β1 : S0/ a

β2 : S0/ S

S∗

S∗

1

S T

S

b a

α1 :

b

c

β1 : S0/ a

1

S∗

β2 : S0/ S

b a

1

S∗

εS : S ∗ S b

Figure 1: Sample TAG for the copy language { wcw | w ∈ {a, b}∗ }.

Figure 2: Sample core-restricted TAG for the copy language { wcw | w ∈ {a, b}∗ }.

number of children. However, in order to maintain information about which symbols may match for the purpose of adjunction and substitution, we take the elements of F to be explicitly formed as pairs of an unranked label e and an arity n. (For notational consistency, we will use e for unranked and f for ranked symbols.) We will notate these elements, abusing notation, as e(n) , and make use of a function |·| to unrank symbols in F, so that |e(n) | = e. To handle foot nodes, for each non-nullary symbol e(i) ∈ F(≥1) , we will associate a new nullary symbol e∗ , which one can take to be the pair of e and ∗; the set of such symbols will be notated F∗ . Similarly, for substitution nodes, F↓ will be the set of nullary symbols e↓ for all e(i) ∈ F(≥1) . These additional symbols, since they are nullary, will necessarily appear only at the frontier of trees. Finally, to allow null adjoining constraints, for each f ∈ F(i) , we introduce a symbol f0/ also of arity i, and take F0/ to be the set of all such symbols. We will extend the function |·| to provide the unranked symbol associated with these symbols as well, so |e↓ | = |e∗ | = |e(i) 0/ | = e. A TAG is then a quadruple hF, S, I, Ai, where F is a ranked alphabet; S ∈ F is a distinguished initial symbol; I is the set of initial trees, a finite subset of T(F ∪ F0/ ∪ F↓ ); and A is the set of auxiliary trees, a finite subset of T(F ∪F0/ ∪F↓ ∪F∗ ). An auxiliary tree β whose root is labeled f must have exactly one node labeled with | f |∗ ∈ F∗ and no other nodes labeled in F∗ ; this node is its foot node, its address notated foot(β ). In Figure 1, α1 and α2 are initial trees; β1 and β2 are auxiliary trees. In order to allow reference to a particular tree in the set P, we associate with each tree in P a unique index, conventionally notated with a subscripted α or β for initial and auxiliary trees respectively. This further allows us to have multiple instances of a tree in I or A, distinguished by their index. (We will abuse notation by using the index and the tree that it names interchangably.) The trees are combined by two operations, substitution and adjunction. Under substitution, a

node labeled e↓ (at address p) in a tree α can be replaced by an initial tree α 0 with the corresponding label f at the root when | f | = e. The resulting tree, the substitution of α 0 at p in α, is α[p 7→ α 0 ]. Under adjunction, an internal node of α at p labeled f ∈ F is split apart, replaced by an auxiliary tree β rooted in f 0 when | f | = | f 0 |. The resulting tree, the adjunction of β at p in α, is α[p 7→ β [foot(β ) 7→ α/p]]. This definition (by requiring f to be in F, not F∗ or F↓ ) maintains the standard convention, without loss of expressivity, that adjunction is disallowed at foot nodes and substitution nodes. The TAG in Figure 1 generates a tree set whose yield is the non-context-free copy language { wcw | w ∈ {a, b}∗ }. The arities of the nodes are suppressed, as they are clear from context. A derivation tree D records the operations over the elementary trees used to derive a given derived tree. Each node in the derivation tree specifies an elementary tree α, the node’s child subtrees Di recording the derivations for trees that are adjoined or substituted into that tree. A method is required to record at which node in α the tree specified by child subtree Di operates. For trees recording derivations in context-free grammars, there are exactly as many substitution operations as nonterminals on the right-hand side of the rule used. Thus, child order in the derivation tree can be used to record the identity of the substitution node. But for TAG trees, operations occur throughout the tree, and some, namely adjunctions, can be optional, so a simple convention using child order is not possible. Traditionally, the branches in the derivation tree have been notated with the address of the node in the parent tree at which the child node operates. Figure 4 presents a derivation tree (a) using this notation, along with the corresponding derived tree (b) for the string abcab. For simplicity below, we use a stripped down TAG formalism, one that loses no expressivity in weak generative capacity but is easier for analysis purposes. First, we make all adjunction obligatory, in the

3

A

B0/ a

1

2

B↓ A∗

B b

Figure 3: Sample TAG tree marked with diacritics to show the permutation of operable nodes. sense that if a node in a tree allows adjunction, an adjunction must occur there. To get the effect of optional adjunction, for instance at a node labeled B, we add a vestigial tree of a single node εB = B∗ , which has no adjunction sites and does not itself modify any tree that it adjoins into. It thus founds the recursive structure of derivations. Second, now that it is determinate whether an operation must occur at a node, the number of children of a node in a derivation tree is determined by the elementary tree at that node; it is just the number of adjunction or substitution nodes in the tree, the OPERABLE NODES. All that is left to determine is the mapping between child order in the derivation tree and node in the elementary tree labeling the parent, that is, a permutation π on the operable nodes (or equivalently, their addresses), so that the i-th child of a node labeled α in a derivation tree is taken to specify the tree that operates at the node πi in α. This permutation can be thought of as specified as part of the elementary tree itself. For example, the tree in Figure 3, which requires operations at the nodes at addresses ε, 12, and 2, may be associated with the permutation h12, 2, εi. This permutation can be marked on the tree itself with numeric diacritics i , as shown in the figure. Finally, as mentioned before, we eliminate substitution (Joshi and Schabes, 1997, fn. 6). With these changes, the sample TAG grammar and derivation tree of Figures 1 and 4(a) might be expressed with the core TAG grammar and derivation tree of Figures 2 and 4(c).

3 3.1

Tree Transducers, Homomorphisms, and Automata Tree Transducers

Informally, a TREE TRANSDUCER is a function from T(F) to T(G) defined such that the symbol at the root ofthe input tree and a current state determines an output context in which the recursive images of the subtrees are placed. Formally, we

can define a transducer as a kind of functional program, that is, a set of equations characterized by the following grammar for equations Eqn. (The set of states is conventionally notated Q, with members notated q. One of the states is distinguished as the INITIAL STATE of the transducer.)1 q∈Q f ∈ F(n) g(n) ∈ G(n) xi ∈ X ::= x0 | x1 | x2 | · · · (n)

Eqn ::= q( f (n) (x1 , . . . , xn )) = τ (n) (n)

(n)

τ (n) ∈ R(n) ::= g(m) (τ1 , . . . , τm ) | q j (xi ) where 1 ≤ i ≤ n Intuitively speaking, the expressions in R(n) are right-hand-side terms using variables limited to the first n. For example, the grammar allows definition of the following set of equations defining a tree transducer:2 q( f (x)) = g(q0 (x), q(x)) q(a) = a q0 ( f (x)) = f (q0 (x)) q0 (a) = a This transducer allows for the following derivation: q( f ( f (a)) = g(q0 ( f (a), q( f (a)))) = g( f (q0 (a)), g(q0 (a), q(a))) = g( f (a), g(a, a)) The relation defined by a tree transducer with initial state q is { ht, ui | q(t) = u }. By virtue of nondeterminism in the equations, multiple equations for a given state q and symbol f , tree transducers define true relations rather than merely functions. T REE HOMOMORPHISMS are a subtype of tree transducers, those with only a single state, hence essentially stateless. Other subtypes of tree transducers can be defined by restricting the trees τ 1 Strictly speaking, what we define here are nondeterministic top-down tree transducers. 2 Full definitions of tree transducers typically describe a transducer in terms of a set of states, an input and output ranked alphabet, and an initial state, in addition to the set of transitions, that is, defining equations. We will leave off these details, in the expectation that the sets of states and symbols can be inferred from the equations, and the initial state determined under a convention that it is the state defined in the textually first equation. Note also that we avail ourselves of consistent renaming of the variables x1 , x2 , and so forth, where convenient for readability.

that form the right-hand sides of equations, the elements of R(n) used. A transducer is LINEAR if all such τ are linear; is COMPLETE if τ contains every variable in Xn ; is ε - FREE if τ 6∈ Xn ; is SYMBOL - TO - SYMBOL if height(τ) = 1; and is a DELABELING if τ is complete, linear, and symbolto-symbol. Another subcase is TREE AUTOMATA, tree transducers that compute a partial identity function; these are delabeling tree transducers that preserve the label and the order of arguments. Because they compute only the identity function, tree automata are of interest for their domains, not the mappings they compute. Their domains define tree languages, in particular, the so-called REGU LAR TREE LANGUAGES . 3.2

The Bimorphism Characterization of Tree Transducers

Tree transducers can be characterized directly in terms of equations defining a simple kind of functional program, as above. There is an elegant alternative characterization of tree transducers in terms of a constellation of elements of the various subtypes of transducers — homomorphisms and automata — we have introduced, called a bimorphism. A bimorphism is a triple hL, hi , ho i, consisting of a regular tree language L (or, equivalently, a tree automaton) and two tree homomorphisms hi and ho . The tree relation defined by a bimorphism is the set of tree pairs that are generable from elements of the tree language by the homomorphisms, that is, L(hL, hi , ho i) = {hhi (t), ho (t)i | t ∈ L}

.

We can limit attention to bimorphisms in which the input or output homomorphisms are restricted to a certain type, linear (L), complete (C), epsilonfree (F), symbol-to-symbol (S), delabeling (D), or unrestricted (M). We will write B(I, O) where I and O characterize a subclass of homomorphisms for the set of bimorphisms for which the input homomorphism is in the subclass indicated by I and the output homomorphism is in the subclass indicated by O. Thus, B(D, M) is the set of bimorphisms for which the input homomorphism is a delabeling but the output homomorphism can be arbitrary. The tree relations definable by tree transducers turn out to be exactly this class B(D, M) (Comon et al., 1997). The bimorphism notion thus allows us to characterize the tree transductions purely in terms of tree automata and tree homomorphisms.

We have shown (Shieber, 2004) that the tree relations defined by synchronous tree-substitution grammars were exactly the relations B(LC, LC). Intuitively speaking, the tree language in such a bimorphism represents the set of derivation trees for the synchronous grammar, and each homomorphism represents the relation between the derivation tree and the derived tree for one of the projected tree-substitution grammars. The homomorphisms are linear and complete because the tree relation between a tree-substitution grammar derivation tree and its associated derived tree is exactly a linear complete tree homomorphism. To characterize the tree relations defined by a synchronous tree-adjoining grammar, it similary suffices to find a simple homomorphism-like characterization of the tree relation between TAG derivation trees and derived trees. In Section 5 below, we show that linear complete embedded tree homomorphisms, which we introduce next, serve this purpose.

4

Embedded Tree Transducers

Embedded tree transducers are a generalization of tree transducers in which states are allowed to take a single additional argument in a restricted manner. They correspond to a restrictive subcase of macro tree transducers with one recursion variable. We use the term “embedded tree transducer” rather than the more cumbersome “monadic macro tree transducer” for brevity and by analogy with embedded pushdown automata (Schabes and Vijay-Shanker, 1990), another automata-theoretic characterization of the tree-adjoining languages. We modify the grammar of transducer equations to add an extra argument to each occurrence of a state q. To highlight the special nature of the extra argument, it is written in angle brackets before the input tree argument. We uniformly use the otherwise unused variable x0 for this argument in the left-hand side, and add x0 as a possible right-hand side itself. Finally, right-hand-side occurrences of states may be passed an arbitrary further righthand-side tree in this argument.

f

(n)

q∈Q ∈ F(n) xi ∈ X ::= x0 | x1 | x2 | · · · Eqn ::= qh[x0 ]i( f (n) (x1 , . . . , xn )) = τ (n) (n)

(n)

τ (n) ∈ R(n) ::= f (m) (τ1 , . . . , τm ) | x0 |

(n)

q j hτ j i(xi ) where 1 ≤ i ≤ n

Embedded transducers are strictly more expressive than traditional transducers, because the extra argument allows unbounded communication between positions unboundedly distant in depth in the output tree. For example, a simple embedded transducer can compute the reversal of a string, e.g., 1(2(2(nil))) reverses to 2(2(1(nil))). (This is not computable by a traditional tree transducer.) It is given by the following equations: rhi(x) = = r0 hx0 i(1(x)) = r0 hx0 i(2(x)) = r0 hx0 i(nil)

r0 hnili(x) x0 r0 h1(x0 )i(x) r0 h2(x0 )i(x)

(1)

This is, of course, just the normal accumulating reverse functional program, expressed as an embedded transducer. The additional power of embedded transducers is, we will show in this section, exactly what is needed to characterize the additional power that TAGs represent over CFGs in describing tree languages. In particular, we show that the relation between a TAG derivation tree and derived tree is characterized by a deterministic linear complete embedded tree transducer (DLCETT). The relation between tree-adjoining languages and embedded tree transducers may be implicit in a series of previous results in the formal-language theory literature.3 For instance, Fujiyoshi and Kasai (2000) show that linear, complete monadic context-free tree grammars generate exactly the tree-adjoining languages via a normal form for spine grammars. Separately, the relation between context-free tree grammars and macro tree transducers has been described, where the relationship between the monadic variants of each is implicit. Thus, taken together, an equivalence between the tree-adjoining languages and the image languages of monadic macro tree transducers might be pieced together. In the present work, we define the relation between tree-adjoining languages and linear complete monadic tree transducers directly, simply, and transparently, by giving explicit constructions in both directions, carefully handling the distinction between the unranked trees of tree-adjoining grammars and the ranked trees of macro tree transducers and other important issues of detail in the constructions. The proof requires reductions in both directions. First, we show that for any TAG we can construct a DLCETT that specifies the tree relation between the derivation trees for the TAG and the derived 3 We

are indebted to Uwe M¨onnich for this observation.

trees. Then, we show that for any DLCETT we can construct a TAG such that the tree relation between the derivation trees and derived trees is related through a simple homomorphism to the DLCETT tree relation. 4.1

From TAG to Transducer

Given an elementary tree α with the label A at its root, let the sequence π = hπ1 , . . . , πn i be a permutation on the nodes in α at which adjunction occurs. (We use this ordering by means of the diacritic representation below.) Then, if α is an auxiliary tree, construct the equation qA hx0 i(α(x1 , . . . , xn )) = bαc and if α is an initial tree, construct the equation qA hi(α(x1 , . . . , xn )) = bαc where the right-hand-side transformation b·c is defined by4 bA0/ (t1 , . . . ,tn )c b k A(t1 , . . . ,tn )c bA∗ c bac

A(bt1 c, . . . , btn c) qA hbA0/ (t1 , . . . ,tn )ci(xk ) x0 a (2) Note that the equations are linear and complete, because each variable xi is generated once as the tree α is traversed, namely at position πi in the traversal (marked with i ), and the variable x0 is generated at the foot node only. Thus, the generated embedded tree transducer is linear and complete. Because only one equation is generated per tree, the transducer is trivially deterministic. By way of example, we consider the core TAG grammar given by the following trees: α: βA : βB : εB : εC : εD :

= = = =

1 A(e) A0/ ( 1 B(a), 2 C( 3 D(A∗ ))) 1 B(b, B∗ ) B∗ C∗ D∗

4 It may seem like trickery to use the diacritics in this way, as they are not really components of the tree being traversed, but merely reflexes of an extrinsic ordering. But their use is benign. The same transformation can be defined, a bit more cumbersomely, keeping the permutation π separate, by tracking the permutation and the current address p in a revised transformation b·cπ,p defined as follows:

bA0/ (t1 , . . . ,tn )cπ,p bA(t1 , . . . ,tn )cπ,p bA∗ cπ,p bacπ,p

= = = =

A(bt1 cπ,p·1 , . . . , btn cπ,p·n ) qA hbA0/ (t1 , . . . ,tn )cπ,p i(xπ −1 (p) ) x0 a

We then use bαcπ,ε for the transformation of the tree α.

α1

As a final step, useful later for the bimorphism characterization of synchronous TAG, it is straightforward to show that the transducer so constructed is the composition of a regular tree language and a linear complete embedded tree homomorphism.

β1

4.2

β2

Given a linear complete embedded tree transducer, we construct a corresponding TAG as follows: For each rule of the form

S a

S S

b S 1 α2

α1

ε β1 2 β2

(a)

b a

S T

εS

c

(b)

(c)

Figure 4: Derivation and derived trees for the sample grammars: (a) derivation tree for the grammar of Figure 1; (b) corresponding derived tree; (c) corresponding derivation tree for the core TAG version of the grammar in Figure 2. Starting with the auxiliary tree βA = A0/ ( 1 B(a), 2 C( 3 D(A∗ ))), the adjunction sites, corresponding to the nodes labeled B, C, and D at addresses 1, 2, and 21, have been arbitrarily given a preorder permutation. We therefore construct the equation as follows: qA hx0 i(βA (x1 , x2 , x3 )) = bA0/ ( 1 B(a), 2 C( 3 D(A∗ )))c = A(b 1 B(a)c, b 2 C( 3 D(A∗ ))c) = A(qB hbB0/ (a)ci(x1 ), b 2 C( 3 D(A∗ ))c) = A(qB hB(bac)i(x1 ), b 2 C( 3 D(A∗ ))c) = ··· = A(qB hB(a)i(x1 ), qC hC(qD hD(x0 )i(x3 ))i(x2 ))

From Transducer to TAG

qi h[x0 ]i( f (m) (x1 , . . . , xm )) = τ we build a tree named hqi , f , τi. Where this tree appears is determined solely by the state qi , so we take the root node of the tree to be the state. Any foot node in the tree will also need to be marked with the same label, so we pass this information down as the tree is built inductively. The tree is therefore of the form qi 0/ (dτei ) where the right-hand-side transformation d·ei constructs the remainder of the tree by the inductive walk of τ, with the subscript noting that the root is labeled qi . d f (t1 , . . . ,tm )ei dq j hτi(xk )ei dx0 ei daei

= f0/ (dt1 ei , . . . , dtm ei ) = k q j (dτei ) = qi ∗ = a

Note that at x0 , a foot node is generated of the proper label. (Because the equation is linear, only one foot node is generated, and it is labeled appropriately by construction.) Where recursive processing of the input tree occurs (q j hτi(xl )), we Similar derivations for the remaining trees yield generate a tree that admits adjunctions at q j . The the (deterministic linear complete) embedded tree role of the diacritic k is merely to specify the pertransducer defined by the following set of equamutation of operable nodes for interpreting derivations: tion trees; it says that the k-th child in a derivation qA hi(α(x1 )) = qA hA(e)i(x1 ) tree rooted in the current elementary tree is taken qA hx0 i(βA (x1 , x2 , x3 )) = to specify adjunctions at this node. A(qB hB(a)i(x1 ), qC hC(qD hD(x0 )i(x3 ))i(x2 )) The trees generated by this TAG are intended qB hx0 i(βB (x1 )) = qB hB(b, x0 )i(x1 ) to correspond to the outputs of the corresponding qB hx0 i(εB ()) = x0 tree transducer. Because of the more severe conqC hx0 i(εC ()) = x0 straints on TAG, in particular that all combinatoqD hx0 i(εD ()) = x0 rial limitations on putting subtrees together must We can use this transducer to compute the derived be manifest in the labels in the trees themselves, tree for the derivation tree α(βA (βB (εB ), εC , εD )). the outputs actually contain more structure than the corresponding transducer output. In particuqA hi(α(βA (βB (εB ), εC , εD ))) lar, the state-labeled nodes are merely for book= qA hA(e)i(βA (βB (εB ), εC , εD )) keeping. A homomorphism removing these nodes = A( qB hB(a)i(βB (εB )), gives the desired transducer output. Most imporqC hC(qD hD(A(e))i(εD ))i(εC )) = A(qB hB(b, B(a))i(εB ),C(qD hD(A(e))i(εD ))) tantly, then, the weak generative capacity of TAGs and LCETTs are identical. = A(B(b, B(a)),C(D(A(e))))

Some examples may clarify the construction. Recall the reversal embedded transducer in (1) above. The construction above generates a TAG containing the following trees. We have given them indicative names rather than the cumbersome ones of the form hqi , f , τi. α: βnil : β1 : β2 :

r0/ (1 : r0 (nil)) r0 0/ (r0 ∗ ) r0 0/ (1 : r0 (10/ (r0 ∗ ))) r0 0/ (1 : r0 (20/ (r0 ∗ )))

It is simple to verify that the derivation tree α(β1 (β2 (β2 (βnil )))) derives the tree r(r06 (2(r0 (2(r0 (1(r0 (nil)))))))) Simple homomorphisms that extract the input function symbols on the input and drop the bookkeeping states on the output reduce these trees to 1(2(2(nil))) and 2(2(1(nil))) respectively, just as for the corresponding tree transducer.

5

Synchronous TAGs as Bimorphisms

The major advantage of characterizing TAG derivation in terms of tree transducers (via the compilation (2)) is the integration of synchronous TAGs into the bimorphism framework. A synchronous TAG (Shieber, 1994) is composed of a set of triples htL ,tR , _i where the two trees tL and tR are elementary trees and _ is a set of links specifying pairs of linked operable nodes from tL and tR . Without loss of generality, we can stipulate that each operable node in each tree is impinged upon by exactly one link in _. (If a node is unlinked, the triple can never be used; if overlinked, a set of replacement triples can be “multiplied out”.) In this case, a projection of the triples on first or second component, with a permutation defined by the corresponding projections on the links, is exactly a TAG as defined above. Thus, derivations proceed just as in a single TAG except that nodes linked by some link in _ are simultaneously operated on by paired trees derived by the grammar. In order to model a synchronous grammar formalism as a bimorphism, the well-formed derivations of the synchronous formalism must be characterizable as a regular tree language and the relation between such derivation trees and each of the paired derived trees as a homomorphism of some sort. For synchronous tree-substitution grammars, derivation trees are regular tree languages, and the

map from derivation to each of the paired derived trees is a linear complete tree homomorphism. Thus, synchronous tree-substitution grammars fall in the class of bimorphisms B(LC, LC). The other direction can be shown as well; all bimorphisms in B(LC, LC) define tree relations expressible by an STSG. A similar result follows immediately for STAG. Crucially relying on the result above that the derivation relation is a DLCETT, we can use the method of Shieber (2004) directly to characterize the synchronous TAG tree relations as just B(ELC, ELC). We have thus integrated synchronous TAG with the other transducer and synchronous grammar formalisms falling under the bimorphism umbrella.

Acknowledgements We wish to thank Mark Dras, Uwe M¨onnich, Rebecca Nesson, James Rogers, and Ken Shan for helpful discussions on the topic of this paper. This work was supported in part by grant IIS-0329089 from the National Science Foundation.

References H. Comon, M. Dauchet, R. Gilleron, F. Jacquemard, D. Lugiez, S. Tison, and M. Tommasi. 1997. Tree automata techniques and applications. Available at: http://www.grappa.univ-lille3.fr/ tata. Release of October 1, 2002. A. Fujiyoshi and T. Kasai. 2000. Spinal-formed context-free tree grammars. Theory of Computing Systems, 33:59–83. Aravind Joshi and Yves Schabes. 1997. Treeadjoining grammars. In G. Rozenberg and A. Salomaa, editors, Handbook of Formal Languages, volume 3, pages 69–124. Springer, Berlin. Yves Schabes and K. Vijay-Shanker. 1990. Deterministic left to right parsing of tree adjoining languages. In Proceedings of the 28th Annual Meeting of the Association for Computational Linguistics, pages 276– 283, Pittsburgh, Pennsylvania, 6–9 June. Stuart M. Shieber. 1994. Restricting the weakgenerative capacity of synchronous tree-adjoining grammars. Computational Intelligence, 10(4):371– 385, November. Also available as cmp-lg/9404003. Stuart M. Shieber. 2004. Synchronous grammars as tree transducers. In Proceedings of the Seventh International Workshop on Tree Adjoining Grammar and Related Formalisms (TAG+7), pages 88– 95, Vancouver, Canada, May 20-22.

Stuart M. Shieber. 2006. Unifying Synchronous Tree-Adjoining Grammars and Tree Transducers via Bimorphisms. In Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics (EACL-06), Trento, Italy, 3–7 April.

Stuart M. Shieber Division of Engineering and Applied Sciences Harvard University Cambridge, MA, USA [email protected]

Abstract We place synchronous tree-adjoining grammars and tree transducers in the single overarching framework of bimorphisms, continuing the unification of synchronous grammars and tree transducers initiated by Shieber (2004). Along the way, we present a new definition of the tree-adjoining grammar derivation relation based on a novel direct inter-reduction of TAG and monadic macro tree transducers. Tree transformation systems such as tree transducers and synchronous grammars have seen renewed interest, based on a perceived relevance to new applications, such as importing syntactic structure into statistical machine translation models or founding a formalism for speech command and control. The exact relationship among a variety of formalisms has been unclear, with a large number of seemingly unrelated formalisms being independently proposed or characterized. An initial step toward unifying the formalisms was taken (Shieber, 2004) in making use of the formallanguage-theoretic device of bimorphisms, previously used to characterize the tree relations definable by tree transducers. In particular, the tree relations definable by synchronous tree-substitution grammars (STSG) were shown to be just those definable by linear complete bimorphisms, thereby providing for the first time a clear relationship between synchronous grammars and tree transducers. In this work, we show how the bimorphism framework can be used to capture a more powerful formalism, synchronous tree-adjoining grammars, providing a further uniting of the various and disparate formalisms.

After some preliminaries (Section 1), we begin by recalling the definition of tree-adjoining grammars and synchronous tree-adjoining grammars (Section 2). We turn then to a set of known results relating context-free languages, tree homomorphisms, tree automata, and tree transducers to extend them for the tree-adjoining languages (Section 3), presenting these in terms of restricted kinds of functional programs over trees, using a simple grammatical notation for describing the programs. This allows us to easily express generalizations of the notions: monadic macro tree homomorphisms, automata, and transducers, which bear (at least some of) the same interrelationships that their traditional simpler counterparts do (Section 4). Finally, we use this characterization to place the synchronous TAG formalism in the bimorphism framework (Section 5), further unifying tree transducers and other synchronous grammar formalisms. We also, in passing, provide a new characterization of the relation between TAG derivation and derived trees, and a new simpler and more direct proof of the equivalence of TALs and the output languages of monadic macro tree transducers.

1

Preliminaries

We will notate sequences with angle brackets, e.g., ha, b, ci, or where no confusion results, simply as abc, with the empty string written ε. Trees will have nodes labeled with elements of a RANKED ALPHABET, a set of symbols F, each with a non-negative integer RANK or ARITY assigned to it, determining the number of children for nodes so labeled. To emphasize the arity of a symbol, we will write it as a parenthesized superscript, for instance f (n) for a symbol f of arity n. Analogously, we write F(n) for the set of symbols in F with arity n. Symbols with arity zero (F(0) ) are called NULLARY symbols or CON -

The set of nonconstants is written F(≥1) . To express incomplete trees, trees with “holes” waiting to be filled, we will allow leaves to be labeled with variables, in addition to nullary symbols. The set of TREES OVER A RANKED AL PHABET F AND VARIABLES X, notated T(F, X), is the smallest set such that (i) f ∈ T(F, X) for all f ∈ F(0) ; (ii) x ∈ T(F, X) for all x ∈ X; and (iii) f (t1 , . . . ,tn ) ∈ T(F, X) for all f ∈ F(≥1) , and t1 , . . . ,tn ∈ T(F, X). We abbreviate T(F, 0), / where the set of variables is empty, as T(F), the set of GROUND TREES over F. We will also make use of the set of n numerically ordered variables Xn = {x1 , . . . , xn }, and write x, y, z as synonyms for x1 , x2 , x3 , respectively. Trees can also be viewed as mappings from TREE ADDRESSES , sequences of integers, to the labels of nodes at those addresses. The address ε is the address of the root, 1 the address of the first child, 12 the address of the second child of the first child, and so forth. We will use the notation t/p to pick out the subtree of the node at address p in the tree t. Replacing the subtree of t at address p by a tree t 0 , written t[p 7→ t 0 ] is defined as (using · for the insertion of an element on a list) STANTS.

the definition given above): f (n) ∈ F(n) x ∈ X ::= x0 | x1 | x2 | · · · t ∈ T(F, X) ::= f (m) (t1 , . . . ,tm ) | x The notation allows definition of classes of expressions (e.g., F(n) ) and specifies metavariables over them ( f (n) ). These classes can be primitive (F(n) ) or defined (X), even inductively in terms of other classes or themselves (T(F, X)). We use the metavariables and subscripted variants on the right-hand side to represent an arbitrary element of the corresponding class. Thus, the elements t1 , . . . ,tm stand for arbitrary trees in T(F, X), and x an arbitrary variable in X. Because numerically subscripted versions of x appear explicitly on the right hand side of the rule defining variables, numerically subscripted variables (e.g., x1 ) on the right-hand side of all rules are taken to refer to the specific elements of x, whereas otherwise subscripted elements (e.g., xi ) are taken generically.

2

Tree-Adjoining Grammars

Tree adjoining grammar (TAG) is a tree grammar formalism distinguished by its use of a tree adjunction operation. Traditional presentations . of TAG, which we will assume familiarity with, take the symbols in elementary and derived trees to be unranked; nodes labeled with a given nonThe HEIGHT of a tree t, notated height(t), is determinal symbol may have differing numbers of fined as follows: height(x) = 0 for all x ∈ X and n children. (Joshi and Schabes (1997) present a height( f (t1 , . . . ,tn )) = 1 + maxi=1 height(ti ) for all good overview.) For example, foot nodes of auxf ∈ F. iliary trees and substitution nodes have no chilWe can use trees with variables as CONTEXTS dren, whereas the similarly labeled root nodes in which to place other trees. A tree in T(F, Xn ) must have at least one. Similarly, two nodes with will be called a context, typically denoted with the the same label but differing numbers of children symbol C. For a context C ∈ T(F, Xn ) and a semay match for the purpose of allowing an adquence of n trees t1 , . . . ,tn ∈ T(F), the SUBSTITU junction (as the root nodes of α1 and β1 in FigTION OF t1 , . . . ,tn INTO C, notated C[t1 , . . . ,tn ], is ure 1). In order to integrate TAG with tree transdefined inductively as follows: ducers, however, we move to a ranked alphabet, which presents some problems and opportunities. ( f (u1 , . . . , um ))[t1 , . . . ,tn ] (In some ways, the ranked alphabet definition of = f (u1 [t1 , . . . ,tn ], . . . , um [t1 , . . . ,tn ]) TAGs is slightly more elegant than the traditional xi [t1 , . . . ,tn ] = ti . one.) Although the bulk of the later discussion integrating TAGs and transducers assumes (withA tree t ∈ T(F, X) is LINEAR if and only if no out loss of expressivity (Joshi and Schabes, 1997, variable in X occurs more than once in t. fn. 6)) a limited form of TAG that includes adjuncWe will use a notation akin to BNF to specify tion but not substitution, we define the more comequations defining functional programs of various plete form here. We will thus take the nodes of TAG trees to be sorts. As an introduction to the notation we will use, here is a grammar defining trees over a ranked labeled with symbols from a ranked alphabet F; alphabet and variables (essentially identically to a given symbol then has a fixed arity and a fixed t[ε 7→ t 0 ] = t 0 f (t1 , . . . ,tn )[(i · p) 7→ t 0 ] = f (t1 , . . . ,ti [p 7→ t 0 ], . . . ,tn ) for 1 ≤ i ≤ n

α1 : S

α2 : T

T↓

c

β1 : S0/ a

β2 : S0/ S

S∗

S∗

1

S T

S

b a

α1 :

b

c

β1 : S0/ a

1

S∗

β2 : S0/ S

b a

1

S∗

εS : S ∗ S b

Figure 1: Sample TAG for the copy language { wcw | w ∈ {a, b}∗ }.

Figure 2: Sample core-restricted TAG for the copy language { wcw | w ∈ {a, b}∗ }.

number of children. However, in order to maintain information about which symbols may match for the purpose of adjunction and substitution, we take the elements of F to be explicitly formed as pairs of an unranked label e and an arity n. (For notational consistency, we will use e for unranked and f for ranked symbols.) We will notate these elements, abusing notation, as e(n) , and make use of a function |·| to unrank symbols in F, so that |e(n) | = e. To handle foot nodes, for each non-nullary symbol e(i) ∈ F(≥1) , we will associate a new nullary symbol e∗ , which one can take to be the pair of e and ∗; the set of such symbols will be notated F∗ . Similarly, for substitution nodes, F↓ will be the set of nullary symbols e↓ for all e(i) ∈ F(≥1) . These additional symbols, since they are nullary, will necessarily appear only at the frontier of trees. Finally, to allow null adjoining constraints, for each f ∈ F(i) , we introduce a symbol f0/ also of arity i, and take F0/ to be the set of all such symbols. We will extend the function |·| to provide the unranked symbol associated with these symbols as well, so |e↓ | = |e∗ | = |e(i) 0/ | = e. A TAG is then a quadruple hF, S, I, Ai, where F is a ranked alphabet; S ∈ F is a distinguished initial symbol; I is the set of initial trees, a finite subset of T(F ∪ F0/ ∪ F↓ ); and A is the set of auxiliary trees, a finite subset of T(F ∪F0/ ∪F↓ ∪F∗ ). An auxiliary tree β whose root is labeled f must have exactly one node labeled with | f |∗ ∈ F∗ and no other nodes labeled in F∗ ; this node is its foot node, its address notated foot(β ). In Figure 1, α1 and α2 are initial trees; β1 and β2 are auxiliary trees. In order to allow reference to a particular tree in the set P, we associate with each tree in P a unique index, conventionally notated with a subscripted α or β for initial and auxiliary trees respectively. This further allows us to have multiple instances of a tree in I or A, distinguished by their index. (We will abuse notation by using the index and the tree that it names interchangably.) The trees are combined by two operations, substitution and adjunction. Under substitution, a

node labeled e↓ (at address p) in a tree α can be replaced by an initial tree α 0 with the corresponding label f at the root when | f | = e. The resulting tree, the substitution of α 0 at p in α, is α[p 7→ α 0 ]. Under adjunction, an internal node of α at p labeled f ∈ F is split apart, replaced by an auxiliary tree β rooted in f 0 when | f | = | f 0 |. The resulting tree, the adjunction of β at p in α, is α[p 7→ β [foot(β ) 7→ α/p]]. This definition (by requiring f to be in F, not F∗ or F↓ ) maintains the standard convention, without loss of expressivity, that adjunction is disallowed at foot nodes and substitution nodes. The TAG in Figure 1 generates a tree set whose yield is the non-context-free copy language { wcw | w ∈ {a, b}∗ }. The arities of the nodes are suppressed, as they are clear from context. A derivation tree D records the operations over the elementary trees used to derive a given derived tree. Each node in the derivation tree specifies an elementary tree α, the node’s child subtrees Di recording the derivations for trees that are adjoined or substituted into that tree. A method is required to record at which node in α the tree specified by child subtree Di operates. For trees recording derivations in context-free grammars, there are exactly as many substitution operations as nonterminals on the right-hand side of the rule used. Thus, child order in the derivation tree can be used to record the identity of the substitution node. But for TAG trees, operations occur throughout the tree, and some, namely adjunctions, can be optional, so a simple convention using child order is not possible. Traditionally, the branches in the derivation tree have been notated with the address of the node in the parent tree at which the child node operates. Figure 4 presents a derivation tree (a) using this notation, along with the corresponding derived tree (b) for the string abcab. For simplicity below, we use a stripped down TAG formalism, one that loses no expressivity in weak generative capacity but is easier for analysis purposes. First, we make all adjunction obligatory, in the

3

A

B0/ a

1

2

B↓ A∗

B b

Figure 3: Sample TAG tree marked with diacritics to show the permutation of operable nodes. sense that if a node in a tree allows adjunction, an adjunction must occur there. To get the effect of optional adjunction, for instance at a node labeled B, we add a vestigial tree of a single node εB = B∗ , which has no adjunction sites and does not itself modify any tree that it adjoins into. It thus founds the recursive structure of derivations. Second, now that it is determinate whether an operation must occur at a node, the number of children of a node in a derivation tree is determined by the elementary tree at that node; it is just the number of adjunction or substitution nodes in the tree, the OPERABLE NODES. All that is left to determine is the mapping between child order in the derivation tree and node in the elementary tree labeling the parent, that is, a permutation π on the operable nodes (or equivalently, their addresses), so that the i-th child of a node labeled α in a derivation tree is taken to specify the tree that operates at the node πi in α. This permutation can be thought of as specified as part of the elementary tree itself. For example, the tree in Figure 3, which requires operations at the nodes at addresses ε, 12, and 2, may be associated with the permutation h12, 2, εi. This permutation can be marked on the tree itself with numeric diacritics i , as shown in the figure. Finally, as mentioned before, we eliminate substitution (Joshi and Schabes, 1997, fn. 6). With these changes, the sample TAG grammar and derivation tree of Figures 1 and 4(a) might be expressed with the core TAG grammar and derivation tree of Figures 2 and 4(c).

3 3.1

Tree Transducers, Homomorphisms, and Automata Tree Transducers

Informally, a TREE TRANSDUCER is a function from T(F) to T(G) defined such that the symbol at the root ofthe input tree and a current state determines an output context in which the recursive images of the subtrees are placed. Formally, we

can define a transducer as a kind of functional program, that is, a set of equations characterized by the following grammar for equations Eqn. (The set of states is conventionally notated Q, with members notated q. One of the states is distinguished as the INITIAL STATE of the transducer.)1 q∈Q f ∈ F(n) g(n) ∈ G(n) xi ∈ X ::= x0 | x1 | x2 | · · · (n)

Eqn ::= q( f (n) (x1 , . . . , xn )) = τ (n) (n)

(n)

τ (n) ∈ R(n) ::= g(m) (τ1 , . . . , τm ) | q j (xi ) where 1 ≤ i ≤ n Intuitively speaking, the expressions in R(n) are right-hand-side terms using variables limited to the first n. For example, the grammar allows definition of the following set of equations defining a tree transducer:2 q( f (x)) = g(q0 (x), q(x)) q(a) = a q0 ( f (x)) = f (q0 (x)) q0 (a) = a This transducer allows for the following derivation: q( f ( f (a)) = g(q0 ( f (a), q( f (a)))) = g( f (q0 (a)), g(q0 (a), q(a))) = g( f (a), g(a, a)) The relation defined by a tree transducer with initial state q is { ht, ui | q(t) = u }. By virtue of nondeterminism in the equations, multiple equations for a given state q and symbol f , tree transducers define true relations rather than merely functions. T REE HOMOMORPHISMS are a subtype of tree transducers, those with only a single state, hence essentially stateless. Other subtypes of tree transducers can be defined by restricting the trees τ 1 Strictly speaking, what we define here are nondeterministic top-down tree transducers. 2 Full definitions of tree transducers typically describe a transducer in terms of a set of states, an input and output ranked alphabet, and an initial state, in addition to the set of transitions, that is, defining equations. We will leave off these details, in the expectation that the sets of states and symbols can be inferred from the equations, and the initial state determined under a convention that it is the state defined in the textually first equation. Note also that we avail ourselves of consistent renaming of the variables x1 , x2 , and so forth, where convenient for readability.

that form the right-hand sides of equations, the elements of R(n) used. A transducer is LINEAR if all such τ are linear; is COMPLETE if τ contains every variable in Xn ; is ε - FREE if τ 6∈ Xn ; is SYMBOL - TO - SYMBOL if height(τ) = 1; and is a DELABELING if τ is complete, linear, and symbolto-symbol. Another subcase is TREE AUTOMATA, tree transducers that compute a partial identity function; these are delabeling tree transducers that preserve the label and the order of arguments. Because they compute only the identity function, tree automata are of interest for their domains, not the mappings they compute. Their domains define tree languages, in particular, the so-called REGU LAR TREE LANGUAGES . 3.2

The Bimorphism Characterization of Tree Transducers

Tree transducers can be characterized directly in terms of equations defining a simple kind of functional program, as above. There is an elegant alternative characterization of tree transducers in terms of a constellation of elements of the various subtypes of transducers — homomorphisms and automata — we have introduced, called a bimorphism. A bimorphism is a triple hL, hi , ho i, consisting of a regular tree language L (or, equivalently, a tree automaton) and two tree homomorphisms hi and ho . The tree relation defined by a bimorphism is the set of tree pairs that are generable from elements of the tree language by the homomorphisms, that is, L(hL, hi , ho i) = {hhi (t), ho (t)i | t ∈ L}

.

We can limit attention to bimorphisms in which the input or output homomorphisms are restricted to a certain type, linear (L), complete (C), epsilonfree (F), symbol-to-symbol (S), delabeling (D), or unrestricted (M). We will write B(I, O) where I and O characterize a subclass of homomorphisms for the set of bimorphisms for which the input homomorphism is in the subclass indicated by I and the output homomorphism is in the subclass indicated by O. Thus, B(D, M) is the set of bimorphisms for which the input homomorphism is a delabeling but the output homomorphism can be arbitrary. The tree relations definable by tree transducers turn out to be exactly this class B(D, M) (Comon et al., 1997). The bimorphism notion thus allows us to characterize the tree transductions purely in terms of tree automata and tree homomorphisms.

We have shown (Shieber, 2004) that the tree relations defined by synchronous tree-substitution grammars were exactly the relations B(LC, LC). Intuitively speaking, the tree language in such a bimorphism represents the set of derivation trees for the synchronous grammar, and each homomorphism represents the relation between the derivation tree and the derived tree for one of the projected tree-substitution grammars. The homomorphisms are linear and complete because the tree relation between a tree-substitution grammar derivation tree and its associated derived tree is exactly a linear complete tree homomorphism. To characterize the tree relations defined by a synchronous tree-adjoining grammar, it similary suffices to find a simple homomorphism-like characterization of the tree relation between TAG derivation trees and derived trees. In Section 5 below, we show that linear complete embedded tree homomorphisms, which we introduce next, serve this purpose.

4

Embedded Tree Transducers

Embedded tree transducers are a generalization of tree transducers in which states are allowed to take a single additional argument in a restricted manner. They correspond to a restrictive subcase of macro tree transducers with one recursion variable. We use the term “embedded tree transducer” rather than the more cumbersome “monadic macro tree transducer” for brevity and by analogy with embedded pushdown automata (Schabes and Vijay-Shanker, 1990), another automata-theoretic characterization of the tree-adjoining languages. We modify the grammar of transducer equations to add an extra argument to each occurrence of a state q. To highlight the special nature of the extra argument, it is written in angle brackets before the input tree argument. We uniformly use the otherwise unused variable x0 for this argument in the left-hand side, and add x0 as a possible right-hand side itself. Finally, right-hand-side occurrences of states may be passed an arbitrary further righthand-side tree in this argument.

f

(n)

q∈Q ∈ F(n) xi ∈ X ::= x0 | x1 | x2 | · · · Eqn ::= qh[x0 ]i( f (n) (x1 , . . . , xn )) = τ (n) (n)

(n)

τ (n) ∈ R(n) ::= f (m) (τ1 , . . . , τm ) | x0 |

(n)

q j hτ j i(xi ) where 1 ≤ i ≤ n

Embedded transducers are strictly more expressive than traditional transducers, because the extra argument allows unbounded communication between positions unboundedly distant in depth in the output tree. For example, a simple embedded transducer can compute the reversal of a string, e.g., 1(2(2(nil))) reverses to 2(2(1(nil))). (This is not computable by a traditional tree transducer.) It is given by the following equations: rhi(x) = = r0 hx0 i(1(x)) = r0 hx0 i(2(x)) = r0 hx0 i(nil)

r0 hnili(x) x0 r0 h1(x0 )i(x) r0 h2(x0 )i(x)

(1)

This is, of course, just the normal accumulating reverse functional program, expressed as an embedded transducer. The additional power of embedded transducers is, we will show in this section, exactly what is needed to characterize the additional power that TAGs represent over CFGs in describing tree languages. In particular, we show that the relation between a TAG derivation tree and derived tree is characterized by a deterministic linear complete embedded tree transducer (DLCETT). The relation between tree-adjoining languages and embedded tree transducers may be implicit in a series of previous results in the formal-language theory literature.3 For instance, Fujiyoshi and Kasai (2000) show that linear, complete monadic context-free tree grammars generate exactly the tree-adjoining languages via a normal form for spine grammars. Separately, the relation between context-free tree grammars and macro tree transducers has been described, where the relationship between the monadic variants of each is implicit. Thus, taken together, an equivalence between the tree-adjoining languages and the image languages of monadic macro tree transducers might be pieced together. In the present work, we define the relation between tree-adjoining languages and linear complete monadic tree transducers directly, simply, and transparently, by giving explicit constructions in both directions, carefully handling the distinction between the unranked trees of tree-adjoining grammars and the ranked trees of macro tree transducers and other important issues of detail in the constructions. The proof requires reductions in both directions. First, we show that for any TAG we can construct a DLCETT that specifies the tree relation between the derivation trees for the TAG and the derived 3 We

are indebted to Uwe M¨onnich for this observation.

trees. Then, we show that for any DLCETT we can construct a TAG such that the tree relation between the derivation trees and derived trees is related through a simple homomorphism to the DLCETT tree relation. 4.1

From TAG to Transducer

Given an elementary tree α with the label A at its root, let the sequence π = hπ1 , . . . , πn i be a permutation on the nodes in α at which adjunction occurs. (We use this ordering by means of the diacritic representation below.) Then, if α is an auxiliary tree, construct the equation qA hx0 i(α(x1 , . . . , xn )) = bαc and if α is an initial tree, construct the equation qA hi(α(x1 , . . . , xn )) = bαc where the right-hand-side transformation b·c is defined by4 bA0/ (t1 , . . . ,tn )c b k A(t1 , . . . ,tn )c bA∗ c bac

A(bt1 c, . . . , btn c) qA hbA0/ (t1 , . . . ,tn )ci(xk ) x0 a (2) Note that the equations are linear and complete, because each variable xi is generated once as the tree α is traversed, namely at position πi in the traversal (marked with i ), and the variable x0 is generated at the foot node only. Thus, the generated embedded tree transducer is linear and complete. Because only one equation is generated per tree, the transducer is trivially deterministic. By way of example, we consider the core TAG grammar given by the following trees: α: βA : βB : εB : εC : εD :

= = = =

1 A(e) A0/ ( 1 B(a), 2 C( 3 D(A∗ ))) 1 B(b, B∗ ) B∗ C∗ D∗

4 It may seem like trickery to use the diacritics in this way, as they are not really components of the tree being traversed, but merely reflexes of an extrinsic ordering. But their use is benign. The same transformation can be defined, a bit more cumbersomely, keeping the permutation π separate, by tracking the permutation and the current address p in a revised transformation b·cπ,p defined as follows:

bA0/ (t1 , . . . ,tn )cπ,p bA(t1 , . . . ,tn )cπ,p bA∗ cπ,p bacπ,p

= = = =

A(bt1 cπ,p·1 , . . . , btn cπ,p·n ) qA hbA0/ (t1 , . . . ,tn )cπ,p i(xπ −1 (p) ) x0 a

We then use bαcπ,ε for the transformation of the tree α.

α1

As a final step, useful later for the bimorphism characterization of synchronous TAG, it is straightforward to show that the transducer so constructed is the composition of a regular tree language and a linear complete embedded tree homomorphism.

β1

4.2

β2

Given a linear complete embedded tree transducer, we construct a corresponding TAG as follows: For each rule of the form

S a

S S

b S 1 α2

α1

ε β1 2 β2

(a)

b a

S T

εS

c

(b)

(c)

Figure 4: Derivation and derived trees for the sample grammars: (a) derivation tree for the grammar of Figure 1; (b) corresponding derived tree; (c) corresponding derivation tree for the core TAG version of the grammar in Figure 2. Starting with the auxiliary tree βA = A0/ ( 1 B(a), 2 C( 3 D(A∗ ))), the adjunction sites, corresponding to the nodes labeled B, C, and D at addresses 1, 2, and 21, have been arbitrarily given a preorder permutation. We therefore construct the equation as follows: qA hx0 i(βA (x1 , x2 , x3 )) = bA0/ ( 1 B(a), 2 C( 3 D(A∗ )))c = A(b 1 B(a)c, b 2 C( 3 D(A∗ ))c) = A(qB hbB0/ (a)ci(x1 ), b 2 C( 3 D(A∗ ))c) = A(qB hB(bac)i(x1 ), b 2 C( 3 D(A∗ ))c) = ··· = A(qB hB(a)i(x1 ), qC hC(qD hD(x0 )i(x3 ))i(x2 ))

From Transducer to TAG

qi h[x0 ]i( f (m) (x1 , . . . , xm )) = τ we build a tree named hqi , f , τi. Where this tree appears is determined solely by the state qi , so we take the root node of the tree to be the state. Any foot node in the tree will also need to be marked with the same label, so we pass this information down as the tree is built inductively. The tree is therefore of the form qi 0/ (dτei ) where the right-hand-side transformation d·ei constructs the remainder of the tree by the inductive walk of τ, with the subscript noting that the root is labeled qi . d f (t1 , . . . ,tm )ei dq j hτi(xk )ei dx0 ei daei

= f0/ (dt1 ei , . . . , dtm ei ) = k q j (dτei ) = qi ∗ = a

Note that at x0 , a foot node is generated of the proper label. (Because the equation is linear, only one foot node is generated, and it is labeled appropriately by construction.) Where recursive processing of the input tree occurs (q j hτi(xl )), we Similar derivations for the remaining trees yield generate a tree that admits adjunctions at q j . The the (deterministic linear complete) embedded tree role of the diacritic k is merely to specify the pertransducer defined by the following set of equamutation of operable nodes for interpreting derivations: tion trees; it says that the k-th child in a derivation qA hi(α(x1 )) = qA hA(e)i(x1 ) tree rooted in the current elementary tree is taken qA hx0 i(βA (x1 , x2 , x3 )) = to specify adjunctions at this node. A(qB hB(a)i(x1 ), qC hC(qD hD(x0 )i(x3 ))i(x2 )) The trees generated by this TAG are intended qB hx0 i(βB (x1 )) = qB hB(b, x0 )i(x1 ) to correspond to the outputs of the corresponding qB hx0 i(εB ()) = x0 tree transducer. Because of the more severe conqC hx0 i(εC ()) = x0 straints on TAG, in particular that all combinatoqD hx0 i(εD ()) = x0 rial limitations on putting subtrees together must We can use this transducer to compute the derived be manifest in the labels in the trees themselves, tree for the derivation tree α(βA (βB (εB ), εC , εD )). the outputs actually contain more structure than the corresponding transducer output. In particuqA hi(α(βA (βB (εB ), εC , εD ))) lar, the state-labeled nodes are merely for book= qA hA(e)i(βA (βB (εB ), εC , εD )) keeping. A homomorphism removing these nodes = A( qB hB(a)i(βB (εB )), gives the desired transducer output. Most imporqC hC(qD hD(A(e))i(εD ))i(εC )) = A(qB hB(b, B(a))i(εB ),C(qD hD(A(e))i(εD ))) tantly, then, the weak generative capacity of TAGs and LCETTs are identical. = A(B(b, B(a)),C(D(A(e))))

Some examples may clarify the construction. Recall the reversal embedded transducer in (1) above. The construction above generates a TAG containing the following trees. We have given them indicative names rather than the cumbersome ones of the form hqi , f , τi. α: βnil : β1 : β2 :

r0/ (1 : r0 (nil)) r0 0/ (r0 ∗ ) r0 0/ (1 : r0 (10/ (r0 ∗ ))) r0 0/ (1 : r0 (20/ (r0 ∗ )))

It is simple to verify that the derivation tree α(β1 (β2 (β2 (βnil )))) derives the tree r(r06 (2(r0 (2(r0 (1(r0 (nil)))))))) Simple homomorphisms that extract the input function symbols on the input and drop the bookkeeping states on the output reduce these trees to 1(2(2(nil))) and 2(2(1(nil))) respectively, just as for the corresponding tree transducer.

5

Synchronous TAGs as Bimorphisms

The major advantage of characterizing TAG derivation in terms of tree transducers (via the compilation (2)) is the integration of synchronous TAGs into the bimorphism framework. A synchronous TAG (Shieber, 1994) is composed of a set of triples htL ,tR , _i where the two trees tL and tR are elementary trees and _ is a set of links specifying pairs of linked operable nodes from tL and tR . Without loss of generality, we can stipulate that each operable node in each tree is impinged upon by exactly one link in _. (If a node is unlinked, the triple can never be used; if overlinked, a set of replacement triples can be “multiplied out”.) In this case, a projection of the triples on first or second component, with a permutation defined by the corresponding projections on the links, is exactly a TAG as defined above. Thus, derivations proceed just as in a single TAG except that nodes linked by some link in _ are simultaneously operated on by paired trees derived by the grammar. In order to model a synchronous grammar formalism as a bimorphism, the well-formed derivations of the synchronous formalism must be characterizable as a regular tree language and the relation between such derivation trees and each of the paired derived trees as a homomorphism of some sort. For synchronous tree-substitution grammars, derivation trees are regular tree languages, and the

map from derivation to each of the paired derived trees is a linear complete tree homomorphism. Thus, synchronous tree-substitution grammars fall in the class of bimorphisms B(LC, LC). The other direction can be shown as well; all bimorphisms in B(LC, LC) define tree relations expressible by an STSG. A similar result follows immediately for STAG. Crucially relying on the result above that the derivation relation is a DLCETT, we can use the method of Shieber (2004) directly to characterize the synchronous TAG tree relations as just B(ELC, ELC). We have thus integrated synchronous TAG with the other transducer and synchronous grammar formalisms falling under the bimorphism umbrella.

Acknowledgements We wish to thank Mark Dras, Uwe M¨onnich, Rebecca Nesson, James Rogers, and Ken Shan for helpful discussions on the topic of this paper. This work was supported in part by grant IIS-0329089 from the National Science Foundation.

References H. Comon, M. Dauchet, R. Gilleron, F. Jacquemard, D. Lugiez, S. Tison, and M. Tommasi. 1997. Tree automata techniques and applications. Available at: http://www.grappa.univ-lille3.fr/ tata. Release of October 1, 2002. A. Fujiyoshi and T. Kasai. 2000. Spinal-formed context-free tree grammars. Theory of Computing Systems, 33:59–83. Aravind Joshi and Yves Schabes. 1997. Treeadjoining grammars. In G. Rozenberg and A. Salomaa, editors, Handbook of Formal Languages, volume 3, pages 69–124. Springer, Berlin. Yves Schabes and K. Vijay-Shanker. 1990. Deterministic left to right parsing of tree adjoining languages. In Proceedings of the 28th Annual Meeting of the Association for Computational Linguistics, pages 276– 283, Pittsburgh, Pennsylvania, 6–9 June. Stuart M. Shieber. 1994. Restricting the weakgenerative capacity of synchronous tree-adjoining grammars. Computational Intelligence, 10(4):371– 385, November. Also available as cmp-lg/9404003. Stuart M. Shieber. 2004. Synchronous grammars as tree transducers. In Proceedings of the Seventh International Workshop on Tree Adjoining Grammar and Related Formalisms (TAG+7), pages 88– 95, Vancouver, Canada, May 20-22.