On Indirect Anaphora Resolution
A. F. Gelbukh
Laboratory, Center for Computing Research (CIC), National Polytechnic Institute
Av. Juan de Dios Bátiz, esq. Mendizabal, Zacatenco, C.P. 07738, Mexico D.F., Mexico.
The paper discusses an algorithm for indirect anaphora resolution based on a dictionary of prototypic scenarios associated with each headword and on a standard thesaurus. The case of indirect anaphora expressed by a demonstrative pronoun, as in “I sold a house. What can I do with this money?”, is discussed in more detail. Conditions for filtering the candidate antecedents are presented, and the structure of the prototypic scenario dictionary is discussed.
Key words: anaphora resolution, indirect anaphora, semantics, dictionaries.
Anaphora resolution is one of the most challenging tasks of natural language processing. It is necessary in a wide range of NLP tasks, from language understanding to statistics, translation, and abstracting (Aone and McKee 1993; Carter 1987; Cornish 1996; Fox 1987; Fretheim and Gundel 1996; Hahn, Strube, and Markert 1996; Hirst 1981; Kameyama 1997; Mitkov 1997).
There are two cases of anaphoric relations:
(1) Direct anaphora, as in the discourse “I had a look at a new house yesterday. Its kitchen was extra large” (it = house) and
(2) So-called indirect anaphora, as in the discourse “I had a look at a new house yesterday. The kitchen was extra large” (the kitchen = of the house), example from (Chafe 1974).
In the latter case, the anaphoric relation holds between two conceptually different words, kitchen and house; note that there is no coreference between these two words. As we will show, coreference holds between the word kitchen in the text and the word kitchen implicitly introduced in the discourse by the word house.
Resolution of indirect anaphora, and even detection of its presence, is especially difficult (Erku and Gundel 1987; Gundel, Hedberg, and Zacharski 1988; Indirect Anaphora 1996; Sanford et al. 1983). The most frequent marker of indirect anaphora is definiteness, which in English is expressed on nouns by the definite article (see also Ward and Birner 1994). In direct anaphora, written texts nearly always use personal pronouns in the anaphoric function, so in examples like (1) above at least the presence of anaphora is obvious. In examples like (2), however, the definite article need not be anaphoric; it can also serve other functions, such as deixis or contraposition (see below).
Moreover, the definite article is not the only means of expressing indirect anaphora. A particular type of indirect anaphora marker is the demonstrative pronoun, as in the example “I sold a house. What can I do with this money?”. This case is even more complicated, since a demonstrative pronoun can express even more meanings than the definite article.
Thus, indirect anaphora poses two problems: (a) detecting its presence, and (b) resolving the ambiguity of the anaphoric link. However, we approach these problems in the opposite order: we try to find a plausible resolution of the anaphoric link and, if we succeed, consider the anaphoric element detected in the discourse. This paper discusses a dictionary-driven method for resolving indirect anaphora and the peculiarities of demonstrative pronouns used in the anaphoric function.
First, we consider some illustrative examples. Then we formulate three necessary conditions for the existence of an indirect anaphoric link. Finally, we discuss an algorithm that uses these conditions to detect anaphoric links in the text.
Let us consider the following examples of indirect anaphora. For the further discussion we also need information about whether a demonstrative pronoun is possible in each context. Unacceptable variants are marked with an asterisk.
1. I bought a house. The/*This kitchen (walls, roof) was extremely large.
2. I bought a house. The/*These dimensions were 20 × 20.
3. I bought a house. The/*This previous owner was happy.
4. I was buying a house. I counted the/*this money carefully.
5. I sold a house. What can I do with the/this money?
6. I bought a house. I liked the/this price.
7. John was eating. The/*This table (dish) was dirty.
8. John was eating. It was dark in the/*this forest.
9. John was eating. The/This food was delicious.
10. John was eating. The/These apples were delicious.
11. John was singing. The/This noise disturbed Peter.
12. John was singing. Peter disliked the/this noise.
13. John was reading. He liked the/this author.
14. John died. The/*This widow was mad with grief.
For instance, in example 1 the indirect anaphoric relation holds between kitchen and house: the kitchen is the kitchen of this house. In each of these sentences, we consider a purely anaphoric meaning of the definite article or the pronoun; at least, each example admits such a meaning. The variants marked with an asterisk are impossible in the anaphoric interpretation.
However, in some cases no anaphoric relation like the one in the previous examples is possible with the definite article or the demonstrative pronoun:
15. *I bought a house. The/These flowers are beautiful.
16. *John was eating. It was dark in the/this theater.
17. *John attended a religious ceremony. The mullah and rabbi preached a sermon.
On the other hand, these examples are quite acceptable if there is no anaphoric relation between flowers and house or between theater and eat. In these cases, the two sentences may have no direct relationship and the second sentence may refer to a broader context, or the definite article or pronoun can have a deictic function: the speaker can simply point at the flowers, or be in the theater.
Particularly interesting is the case of demonstrative pronouns. Though most of the examples above do not allow the use of such a pronoun in the anaphoric interpretation, they sound perfectly reasonable in other interpretations:
18. I bought a house. This kitchen (walls, roof) was extremely large.
19. I was buying a house. I counted this money carefully.
One possible non-anaphoric interpretation of such examples is contraposition: “this kitchen is large while the other kitchens are not”; in this case a special intonational stress is used, which is not reflected in the written text. Another possible non-anaphoric interpretation is, again, the deictic function: the speaker is physically in this kitchen or is showing this money to the listener.
Yet another example that does not allow the anaphoric relation is:
20. *Peter disliked that John was eating here. The/This table was dirty.
Figure 1. Three types of indirect anaphoric relationships.
A question arises: how can we distinguish the cases in which the definite article or demonstrative pronoun can express an anaphoric relation in the discourse? In other words, when is such a relation possible? In the next sections, we discuss some necessary conditions for this relation.
Indirect anaphora can be thought of as coreference between a word and an entity implicitly introduced earlier in the text. We call the set of entities implicitly, or even only potentially, introduced by a word the prototypic scenario of this word. Thus, the anaphoric relation here holds between a word and an element of the prototypic scenario of another word in the text; such an element has no surface representation in the text. The idea of explicit scenarios was developed, for example, in (Shank, Lebowitz, and Birnbaum 1980).
There are three possible types of indirect anaphora, depending on the relation between the antecedent and the anaphor: (1) the anaphor is a word in the text, while the antecedent is an element of a scenario implied by another word (the most common case); (2) vice versa, an implied concept refers to a word in the text (a rather rare case); (3) the reference holds between implied concepts (an even rarer case). Let us consider the following examples (see Figure 1):
21. John was eating. The table was dirty.
22. John died. The widow was mad with grief.
23. John was buried. The widow was mad with grief.
Here the definite article is used with the words table and widow. However, these words (and the corresponding concepts) have not literally appeared in the discourse before. What is the reason for their definiteness? It can be explained by the existence of an indirect anaphoric relation: eat ← table, die ← widow, bury ← widow. In the first example, the antecedent to eat contains in its prototypic scenario a slot for a place with a possible value table. In the second example, the verb to die is included in the lexical meaning of the word widow. In the third example, the concept to die is shared by the lexical meanings of widow and to bury.
Thus, we can formulate the following preliminary version of a necessary condition for the possibility of indirect anaphora:
Condition 1 (preliminary). Indirect anaphora is possible only if at least one of the following conditions holds:
· The anaphor belongs to the scenario of the antecedent, or
· The antecedent belongs to the scenario of the anaphor, or
· Their scenarios intersect.
The three parts of Condition 1, which correspond to the three types of indirect anaphora (Figure 1), are not equally likely. A decision made on the basis of the first part deserves a higher score (probability) than one based on the second, and the third part covers an even rarer case and is therefore the least probable. Thus, in the quantitative algorithm the parts receive different scores (probabilities).
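To make the three parts of Condition 1 concrete, here is a minimal Python sketch of the preliminary check. It assumes a toy scenario dictionary and exact word matching; the names `SCENARIOS`, `TYPE_WEIGHTS` and `condition1_preliminary` and all their contents are hypothetical, chosen to mirror examples 21 to 23. The weights only encode the ordering stated above (type 1 most likely, type 3 least) and are not values from the paper.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical toy scenario dictionary: headword -> elements of its prototypic scenario.
SCENARIOS: Dict[str, Set[str]] = {
    "eat": {"food", "table", "dish", "forest"},
    "widow": {"die", "husband"},
    "bury": {"die", "grave", "funeral"},
    "house": {"kitchen", "walls", "roof", "owner"},
}

# Hypothetical weights reflecting only the ordering: type 1 > type 2 > type 3.
TYPE_WEIGHTS: Dict[int, float] = {1: 1.0, 2: 0.5, 3: 0.25}


def condition1_preliminary(antecedent: str, anaphor: str) -> Optional[Tuple[int, float]]:
    """Return (type, weight) for the first part of preliminary Condition 1 that holds, else None."""
    ante_scenario = SCENARIOS.get(antecedent, set())
    anaphor_scenario = SCENARIOS.get(anaphor, set())
    if anaphor in ante_scenario:            # type 1: anaphor belongs to the antecedent's scenario
        return 1, TYPE_WEIGHTS[1]
    if antecedent in anaphor_scenario:      # type 2: antecedent belongs to the anaphor's scenario
        return 2, TYPE_WEIGHTS[2]
    if ante_scenario & anaphor_scenario:    # type 3: the two scenarios intersect
        return 3, TYPE_WEIGHTS[3]
    return None                             # Condition 1 fails: no indirect anaphora possible


print(condition1_preliminary("eat", "table"))      # (1, 1.0)   example 21
print(condition1_preliminary("die", "widow"))      # (2, 0.5)   example 22
print(condition1_preliminary("bury", "widow"))     # (3, 0.25)  example 23
print(condition1_preliminary("house", "flowers"))  # None       example 15
```

Returning the first satisfied part (rather than all of them) reflects the ordering by probability; a fuller version would also allow compatible words instead of exact matches.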
Note that indirect anaphora can combine with phenomena involving the substitution of one word for another, such as synonyms, more general or more specific terms (hypernyms and hyponyms; see examples 12 and 10, respectively), metaphor (example 13), or a change of the surface part of speech (derivation). Such phenomena are transparent for indirect anaphora, though these phenomena (except for derivation) reduce the appropriateness of indirect anaphora in the text and thus the reliability of its detection. Presumably, the greater the distance between the corresponding notions, the less fluent the indirect anaphoric expression sounds.
We call the words that can be substituted for an anaphor or antecedent compatible with it; for our algorithms, they are equivalent to the source word. A compatible word may be, for example, a synonym, hypernym, hyponym, or metaphor of the source word, though not every such word qualifies; this depends on the specific context. The rules for determining compatibility are beyond the scope of this article. Note, for example, that the compatibility relation is not symmetric: a metaphor can hardly appear as a scenario element, while its appearance as a surface anaphoric element (the first type of anaphoric relationship in Figure 1) is more probable. Analogously, a hypernym can hardly be used as a surface anaphor when the hidden antecedent is its hyponym. The reason probably lies in the mechanism of indirect anaphora: the presence of the potentially introduced concept and its surface representation (and thus the type of situation) should be clarified by the explicit context; see examples 8 and 10, and example 24:
24. John attended a religious ceremony. The mullah preached a sermon.
only the second sentence clarifies that the ceremony was Muslim. These facts need further investigation. Here, we just emphasize that they must be taken into account when the conditions of indirect anaphora formulated below are applied to words compatible with the anaphor and the antecedent.
Now we can formulate an improved version of Condition 1:
Condition 1. Indirect anaphora is possible only if at least one of the following conditions holds:
· The anaphor is compatible with an element of the scenario of the antecedent, or
· The antecedent is compatible with an element of the scenario of the anaphor, or
· Their scenarios intersect (in the sense of compatibility).
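The improved Condition 1 can be sketched by replacing exact matching with a compatibility score. The sketch below assumes a tiny hypothetical thesaurus (`THESAURUS`) and hypothetical scores (`RELATION_SCORES`, `TYPE_WEIGHTS`); the paper leaves the compatibility rules open, so the only asymmetry modelled here is the one stated above, that a metaphor may stand on the surface but not as a hidden scenario element. The pair `("dough", "money")` is an invented illustration, not an entry from the authors' thesaurus.

```python
from typing import Dict, Set, Tuple

# Hypothetical toy data; not the paper's dictionaries.
SCENARIOS: Dict[str, Set[str]] = {
    "sing": {"song", "singer"},
    "eat": {"food", "table"},
    "sell": {"money", "price", "buyer"},
}

# Hypothetical thesaurus: (word, scenario_element) -> how `word` relates to the element.
THESAURUS: Dict[Tuple[str, str], str] = {
    ("noise", "song"): "hypernym",   # more general term (cf. example 12)
    ("apples", "food"): "hyponym",   # more specific term (cf. example 10)
    ("dough", "money"): "metaphor",  # hypothetical figurative substitute
}

# Hypothetical scores: more distant substitutions sound less fluent, so they score lower.
RELATION_SCORES = {"same": 1.0, "synonym": 0.9, "hyponym": 0.8, "hypernym": 0.7, "metaphor": 0.5}
TYPE_WEIGHTS = {1: 1.0, 2: 0.5, 3: 0.25}


def compat(word: str, element: str, on_surface: bool) -> float:
    """Score how well `word` can stand for scenario element `element` (0.0 = incompatible)."""
    if word == element:
        return RELATION_SCORES["same"]
    relation = THESAURUS.get((word, element))
    if relation is None:
        return 0.0
    if relation == "metaphor" and not on_surface:
        return 0.0  # asymmetry: a metaphor can hardly be a hidden scenario element
    return RELATION_SCORES[relation]


def condition1(antecedent: str, anaphor: str) -> float:
    """Best weighted score over the three parts of Condition 1; 0.0 means the condition fails."""
    ante = SCENARIOS.get(antecedent, set())
    ana = SCENARIOS.get(anaphor, set())
    part1 = max((compat(anaphor, e, True) for e in ante), default=0.0)
    part2 = max((compat(antecedent, e, True) for e in ana), default=0.0)
    # Both words are hidden scenario elements here, so check both directions without metaphor.
    part3 = max((max(compat(a, b, False), compat(b, a, False)) for a in ante for b in ana),
                default=0.0)
    return max(TYPE_WEIGHTS[1] * part1, TYPE_WEIGHTS[2] * part2, TYPE_WEIGHTS[3] * part3)


print(condition1("sing", "noise"))   # example 12: generalisation
print(condition1("eat", "apples"))   # example 10: specification
print(condition1("eat", "flowers"))  # no compatible scenario element
```

Returning a graded score instead of a yes/no answer lets the result be combined with other scores later, as the algorithm section describes.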
In this article, we consider only anaphoric links between words in different sentences (or different parts of a compound sentence). The possibility of anaphoric links within a simple sentence will be discussed in a separate paper. Here we discuss only one complication, related to embedded sentences.
As example 20 shows, Condition 1 is not the only necessary condition for an anaphoric link. The following analysis addresses the problem raised by this example. At first glance, the following condition seems adequate:
Condition 2 (preliminary). Indirect anaphora is possible only for the uppermost semantic level of the situation.
In example 20, the uppermost level of the situation is “Peter disliked”, and indirect anaphora to the embedded situation is not possible. The uppermost semantic level obviously corresponds syntactically to the main clause of the complex sentence. However, this condition does not hold for the following example:
25. John was dismayed to find that his car wouldn’t start. The/*This battery was dead.
Since examples 20 and 25 both consist of two sentences, the difference between them cannot lie in the relation between the first and the second parts. We believe that the difference lies in the level of coherence between the two sentences. In example 25, the two sentences are coherent: they can be joined into a single sentence by the subordinate conjunction because (John was dismayed to find that his car wouldn’t start because the battery was dead). In example 20, no subordinate conjunction is suitable. So we introduce the notion of syntactic connectability: if two sentences can be connected by a subordinate conjunction, they are syntactically connectable. This notion obviously has a discourse nature and is related to text coherence (Downing and Noonan 1995; Fraurud 1992, 1996; Partee and Sgall 1996; Tomlin 1987). Let us consider more examples of this notion:
26. John disliked that I bought a house. The kitchen (walls, roof) was extremely large.
27. John disliked that I bought a house. The previous owner was happy.
28. *John was satisfied that I bought a house. I disliked the price.
29. *I disliked that John was eating there. The food was delicious.
30. *I was dismayed that John was reading there. He liked the author.
31. *I was very upset that John died. The widow was mad with grief.
Examples 26 and 27 are acceptable only if we assume a causal relation between the first and the second sentences, i.e., that John does not like large kitchens (example 26) or for some reason hates the previous owner (example 27). Without such a relation, the examples are unacceptable. Thus, we can modify Condition 2:
Condition 2. If the parts containing the anaphor and the antecedent are not syntactically connectable, then indirect anaphora is possible only for the uppermost semantic level of the situation.
Unfortunately, we do not yet have an algorithm for detecting syntactic connectability. Meanwhile, the use of Condition 2 could be justified by a statistical evaluation: which is more frequent in texts, the presence or the absence of syntactic connectability? This requires further investigation, though our preliminary results show that when a subordinate relation holds, it usually does have a surface representation. So we do use Condition 2 in our algorithm, though the cases where it applies are rather rare in real texts.
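Since no detector of syntactic connectability exists yet, a sketch of Condition 2 has to take connectability as an input, for example from manual annotation or some heuristic. In the minimal Python sketch below, the `Candidate` class, its `depth` field (0 for the uppermost semantic level, i.e. the main clause) and the `connectable` flag are hypothetical representations, not part of the described system.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    word: str
    depth: int  # hypothetical annotation: 0 = uppermost semantic level (main clause), 1+ = embedded


def condition2(candidate: Candidate, connectable: bool) -> bool:
    """Condition 2: unless the two parts are syntactically connectable,
    only antecedents at the uppermost semantic level are allowed."""
    if connectable:
        return True
    return candidate.depth == 0


# Example 20: "eat" is embedded under "Peter disliked", and no subordinate conjunction fits.
print(condition2(Candidate("eat", depth=1), connectable=False))  # False
# Example 25: "car" is embedded, but "because" can join the two sentences.
print(condition2(Candidate("car", depth=1), connectable=True))   # True
```

Like Conditions 1 and 3, this check can only rule candidates out; passing it does not establish an anaphoric link.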
We emphasise that Conditions 1 and 2 concern only the possibility of indirect anaphora, not its obligatory presence; thus they are necessary but not sufficient conditions.
Observe that the anaphors in our examples have different statuses in the prototypic scenarios of their antecedents. Some of them are necessary parts of the lexical meaning of the corresponding antecedent (as in examples 5, 6, and 9) and thus are implicitly present in the situation, while others are not. For example, the Random House Webster’s dictionary defines the word sell as “to transfer (goods) to or render (services) for another in exchange for money; dispose of to a purchaser for a price.” Thus, the words money (as a concept, not a physical object) and price are parts of the lexical meaning of the word sell.
Moreover, analysis of the meanings of the antecedents shows that the possibility of using a demonstrative pronoun depends on a particular element of their semantics: whether the antecedent denotes a process or situation.
As the analysis of the examples shows, these two factors are crucial for the possibility of an indirect anaphoric relation expressed by a demonstrative pronoun. So the following condition is also necessary in this case:
Condition 3. Indirect anaphora can be expressed by a demonstrative pronoun only if both of the following conditions hold:
· The antecedent denotes a process or situation and
· The anaphor is included into the lexical meaning of the antecedent.
Note that the lexical meaning of a word is always part of its scenario, so the second part of Condition 3 refines Condition 1 without contradicting it.
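Condition 3 needs two pieces of information per headword: its semantic class and which scenario elements are obligatory (part of the lexical meaning). The minimal Python sketch below assumes a hypothetical toy `LEXICON` whose entries are chosen to reproduce the judgements in examples 1, 4, 5, 6, 7, 9, and 14; it uses exact matching, so cases like example 10 (apples for food) would additionally need the compatibility check of Condition 1.

```python
from typing import Dict, Set, Tuple

# Hypothetical toy lexicon: headword -> (semantic class, obligatory elements, optional elements).
LEXICON: Dict[str, Tuple[str, Set[str], Set[str]]] = {
    "sell": ("process", {"money", "price", "buyer"}, set()),
    "buy": ("process", {"price", "seller"}, {"money"}),  # physical money is optional (example 4)
    "eat": ("process", {"food"}, {"table", "dish", "forest"}),
    "die": ("process", set(), {"widow"}),
    "house": ("thing", {"walls", "roof"}, {"kitchen", "owner"}),
}


def condition3(antecedent: str, anaphor: str) -> bool:
    """Condition 3: a demonstrative is possible only if the antecedent denotes a
    process or situation and the anaphor is an obligatory element of its meaning."""
    entry = LEXICON.get(antecedent)
    if entry is None:
        return False
    sem_class, obligatory, _optional = entry
    return sem_class in ("process", "situation") and anaphor in obligatory


print(condition3("sell", "money"))     # True:  example 5, "this money"
print(condition3("buy", "money"))      # False: example 4, "*this money"
print(condition3("house", "kitchen"))  # False: example 1, house denotes an object
```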
Indeed, in examples 1 to 3 the antecedents denote objects (house ← kitchen, house ← dimensions, house ← previous owner). In examples 4, 7, 8, and 14 the anaphors are not included in the lexical meaning of the antecedents (buy ← money (as a physical object), eat ← table, eat ← forest, die ← widow).
The other examples (5, 6, and 9 to 13) allow the use of the demonstrative pronoun. Examples 5, 6, and 9 are the standard cases. Note that in example 4 money is a physical object that is not obligatory in the situation (the purchase could be made with a credit card, for instance), while in example 5 it is an abstract entity, the price, which is part of the lexical meaning of the verb; this is why the demonstrative pronoun is forbidden in example 4 but allowed in example 5. Example 12 demonstrates generalisation: sing ← noise, whereas the prototypic noun would be singing or song. Example 10 demonstrates specification: eat ← apples (a kind of food, which is part of the lexical meaning of eat). For the algorithm to be able to test Condition 3, some elements of the scenario are marked as “obligatory” in our dictionary, while the others are “optional.” We took this information mainly from monolingual English explanatory dictionaries: the words mentioned in the definitions are marked as “obligatory”. However, in many cases additional words had to be marked manually.
Additionally, the dictionary contains the basic semantic class of the word: thing versus process or situation (regardless of the surface part of speech). This information in many cases can also be found in the FACTOTUM SemNet dictionary mentioned below.
The algorithm for detecting anaphoric links works as follows. It considers each word in turn. If a word is introduced by a definite article or a demonstrative pronoun, it is a potential anaphor, and the algorithm tries to find a plausible antecedent for it. It looks for candidate antecedents based on their linear and structural distance from the potential anaphor. In the simplest case, it suffices to try the preceding words from right to left, scoring them lower as the distance grows; the algorithm stops when an antecedent is found or the scores become too low. As mentioned above, the current algorithm does not try words within the same simple sentence.
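The candidate search in the simplest case can be sketched as a generator that walks backwards from the start of the anaphor's sentence. This is a minimal sketch assuming linear distance only and an exponential decay; the function name `candidate_antecedents` and the parameters `decay` and `min_score` are hypothetical, since the paper does not specify how scores fall off with distance.

```python
from typing import Iterator, List, Tuple


def candidate_antecedents(words: List[str], sentence_start: int,
                          decay: float = 0.8, min_score: float = 0.1
                          ) -> Iterator[Tuple[int, str, float]]:
    """Yield (index, word, distance_score) for the words preceding the anaphor's
    sentence, nearest first; stop once the distance score falls below min_score."""
    score = 1.0
    # Start before `sentence_start` so words in the anaphor's own simple sentence are skipped.
    for i in range(sentence_start - 1, -1, -1):
        if score < min_score:
            break
        yield i, words[i], score
        score *= decay  # hypothetical exponential decay with linear distance


# Lemmatised words of "I sold a house. What can I do with this money?";
# the anaphor's sentence starts at index 3.
words = ["I", "sell", "house", "what", "do", "money"]
for index, word, score in candidate_antecedents(words, sentence_start=3):
    print(index, word, round(score, 2))
# 2 house 1.0
# 1 sell 0.8
# 0 I 0.64
```

Using a generator lets the caller stop as soon as a candidate passes the conditions, matching the early stop described above. Structural distance would require a parse and is not modelled here.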
For each potential antecedent, the conditions described above are tested. As discussed, in some cases (for instance, when measuring compatibility) the degree to which a condition is satisfied can be expressed as a score rather than as a yes-or-no answer. In such cases, the scores for the conditions and for the distance are combined (multiplied), and a threshold is used to decide whether a pair of words passes the test. If the candidate satisfies all applicable conditions (or its combined score exceeds the threshold), the anaphoric link is found.
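The score combination can be sketched as a product of the distance score and the per-condition scores, compared with a threshold. In this minimal Python sketch, yes/no conditions are encoded as 1.0 or 0.0; the function `best_antecedent`, the candidate tuple layout and the threshold value 0.3 are hypothetical choices for illustration, and the condition scores for example 5 are filled in by hand.

```python
import math
from typing import Dict, List, Optional, Tuple

# Each candidate: (word, distance_score, {condition_name: score in [0, 1]}), nearest first.
Candidate = Tuple[str, float, Dict[str, float]]


def best_antecedent(candidates: List[Candidate],
                    threshold: float = 0.3) -> Optional[Tuple[str, float]]:
    """Return the first candidate whose product of distance and condition scores
    reaches the threshold, together with that combined score; None if none does."""
    for word, distance_score, condition_scores in candidates:
        combined = distance_score * math.prod(condition_scores.values())
        if combined >= threshold:
            return word, combined
    return None


# Example 5, "this money": "house" fails Conditions 1 and 3, while "sell" satisfies all conditions.
candidates = [
    ("house", 1.0, {"condition1": 0.0, "condition2": 1.0, "condition3": 0.0}),
    ("sell", 0.8, {"condition1": 1.0, "condition2": 1.0, "condition3": 1.0}),
]
print(best_antecedent(candidates))  # ('sell', 0.8)
```

Because any failed yes/no condition contributes a factor of 0.0, multiplication makes each such condition act as a hard filter, while graded scores only lower the total.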
To check the possibility of an indirect anaphoric link between two words, we use a dictionary that lists the members of the prototypic scenario of each word. We compiled this dictionary from several sources, such as Clasitex’s dictionary (Guzmán-Arenas 1998), the FACTOTUM SemNet dictionary derived from Roget’s Thesaurus, and some other dictionaries.
Our dictionary of prototypic scenarios has the structure suggested in (Guzmán-Arenas 1998) and discussed in greater detail in (Gelbukh, Sidorov, and Guzmán-Arenas 1999). In the simplest case, each word in such a dictionary is related to the words that can denote potential participants of the situation expressed by the entry word. The types of relations between words are not specified; that is, a relation need not be one of the standard predefined relations (part, actant, etc.), and the algorithm does not use this kind of knowledge. For example, the dictionary entry for the word church includes the words related to it in the dictionaries mentioned above: priest, candle, icon, prayer, etc.
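Because relation types are not used, entries from several sources can be merged into plain sets of related words. The minimal Python sketch below assumes each source can be read as (headword, relation label or None, related word) triples; that format, the function `merge_sources`, and the sample excerpts are hypothetical and do not reflect the actual formats of Clasitex or FACTOTUM SemNet.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set, Tuple

# A source entry: (headword, relation label or None, related word).
Entry = Tuple[str, Optional[str], str]


def merge_sources(*sources: Iterable[Entry]) -> Dict[str, Set[str]]:
    """Build an unlabeled scenario dictionary: headword -> set of related words.
    Relation labels (part, actant, ...) are discarded, since the algorithm ignores them."""
    scenarios: Dict[str, Set[str]] = defaultdict(set)
    for source in sources:
        for headword, _relation, related in source:
            scenarios[headword].add(related)
    return dict(scenarios)


# Hypothetical excerpts from two sources, one with labeled relations and one without.
labeled = [("church", "agent", "priest"), ("church", "part", "candle")]
unlabeled = [("church", None, "icon"), ("church", None, "prayer"), ("church", None, "priest")]
print(sorted(merge_sources(labeled, unlabeled)["church"]))
# ['candle', 'icon', 'prayer', 'priest']
```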
To check the compatibility of words (synonymy, generalisation, specification, metaphor), we use a thesaurus compiled on the basis of the FACTOTUM SemNet dictionary, WordNet, and some other sources.
We have discussed a dictionary-based algorithm for filtering the candidate antecedents of indirect anaphora expressed with a definite article or, as a special case, with a demonstrative pronoun. Namely, the algorithm checks three conditions: (1) the intersection between the scenarios, (2) the syntactic plausibility of the relation, and (3) in the case of demonstrative pronouns, the semantic type of the antecedent and the inclusion of the anaphor in the list of “obligatory participants” of the antecedent. In practice, we suggest using this algorithm primarily to detect the very presence of indirect anaphora, not just to find the antecedent when the presence of an anaphoric relation is already known.
A paper we are now preparing addresses the conditions for indirect anaphoric links within a simple sentence. In future papers we also plan to analyse languages without articles (like Russian), where the situation is complicated by the ambiguity of expressions without a pronoun; the broader context probably also influences hidden anaphora and its formal markers.
We also plan to extend the information in the dictionary. First, the dictionary should include “weights” of the scenario elements. The obligatory elements have the highest weight; the “optional” elements, however, can be closely related to the headword or rather far from it. For example, the word table in example 7 is not obligatory but is a very probable participant of the situation of eating. On the other hand, the word forest in example 8 is a possible but improbable participant of this situation. In example 16, the word theatre seems to be impossible as a place of eating. Such weights can be obtained both from semantic dictionaries (for instance, as the number of links between the words) and from a large corpus.
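One simple way to obtain such weights from a corpus, assumed here rather than taken from the paper, is sentence-level co-occurrence: an optional element's weight is the fraction of sentences containing the headword that also contain the element, while obligatory elements keep the maximum weight 1.0. The function `corpus_weights` and the tiny corpus below are hypothetical.

```python
from typing import Dict, List, Set


def corpus_weights(sentences: List[List[str]], headword: str,
                   obligatory: Set[str], optional: Set[str]) -> Dict[str, float]:
    """Obligatory elements get weight 1.0; an optional element gets the fraction of
    corpus sentences containing the headword that also contain the element."""
    with_head = [set(s) for s in sentences if headword in s]
    weights = {element: 1.0 for element in obligatory}
    for element in optional:
        hits = sum(1 for s in with_head if element in s)
        weights[element] = hits / len(with_head) if with_head else 0.0
    return weights


# Tiny hypothetical corpus of lemmatised sentences.
corpus = [["eat", "table"], ["eat", "table", "food"], ["eat", "forest"],
          ["eat", "food"], ["sleep", "table"]]
weights = corpus_weights(corpus, "eat", {"food"}, {"table", "forest"})
print(sorted(weights.items()))  # [('food', 1.0), ('forest', 0.25), ('table', 0.5)]
```

An element that never co-occurs with the headword, such as theatre for eat, would receive weight 0.0 under this scheme; with real data, some smoothing would probably be needed.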
The second extension to the dictionary is the specification of alternatives. The most important alternatives are words occupying the same “slot” in the scenario, i.e., the same role, such as subject, place, etc. For instance, in example 17, the scenario for religious ceremony would include both mullah and rabbi as possible participants; the presence of one of them is obligatory, but they cannot both appear in the prototypic scenario. Thus, the scenario should specify a role that lists curé, padre, pope, mullah, and rabbi. The role itself is marked as obligatory, but the word religious ceremony can be the antecedent for only one word from this list. Similarly, the role of place for eating would include table, forest, etc. Thus, we plan to group the words in the scenarios according to their mutual exclusivity in the situation.
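The planned grouping of mutually exclusive alternatives into roles can be sketched as follows. This is a minimal sketch under assumed data structures: the classes `Role` and `Scenario`, the role name "officiant", and the rule that a role accepts only one distinct filler per discourse are illustrative choices, not the authors' design.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Role:
    name: str
    fillers: Set[str]          # mutually exclusive alternatives for this slot
    obligatory: bool = False


@dataclass
class Scenario:
    roles: List[Role]
    bound: Dict[str, str] = field(default_factory=dict)  # role name -> word already linked

    def try_link(self, anaphor: str) -> Optional[str]:
        """Link `anaphor` to the role it fills and return the role name, or return None
        if it fills no role or the role is already taken by a different alternative."""
        for role in self.roles:
            if anaphor in role.fillers:
                current = self.bound.get(role.name)
                if current is not None and current != anaphor:
                    return None  # only one alternative per role can be introduced
                self.bound[role.name] = anaphor
                return role.name
        return None


ceremony = Scenario([Role("officiant", {"curé", "padre", "pope", "mullah", "rabbi"}, obligatory=True)])
print(ceremony.try_link("mullah"))  # officiant (example 24)
print(ceremony.try_link("rabbi"))   # None: example 17 rules out a second officiant
```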
This work was done with partial support from CONACyT grant 26424-A, REDII-IPN, COFAA-IPN, and SNI, Mexico.
Aone, Ch., and D. McKee. 1993. Language-independent anaphora resolution system for understanding multilingual texts. Proceedings of the 31st meeting of the ACL. The Ohio State University, Columbus, Ohio.
Ariel, M. 1988. Referring and accessibility. Journal of Linguistics, 24: 67-87.
Bosch, P. 1988. Representing and accessing focussed referents. Language and Cognitive Processes, 3: 207-231.
Carter, D. 1987. Interpreting anaphora in natural language texts. Ellis Horwood, Chichester.
Cornish, F. 1996. ‘Antecedentless’ anaphors: deixis, anaphora, or what? Some evidence from English and French. Journal of Linguistics, 32: 19-41.
Cowan, R. 1995. What are Discourse Principles Made of? In P. Downing and M. Noonan (Eds.), Word Order in Discourse. Benjamins, Amsterdam/Philadelphia.
Chafe, W. 1976. Givenness, Contrastiveness, Definiteness, Subjects, Topics, and Point of View. In Ch. N. Li (Ed.), Subject and Topic. Academic Press, New York. pp. 27-55.
Chafe, W. 1987. Cognitive Constraints in Information Flow. In R. Tomlin (Ed.), Coherence and Grounding in Discourse. Benjamins, Amsterdam. pp. 21-51.
Chafe, W. 1994. Discourse, Consciousness, and Time. The University of Chicago Press, Chicago – London. 327 pp.
Downing, P., and M. Noonan (Eds.). 1995. Word Order in Discourse. Benjamins, Amsterdam/Philadelphia. 595 pp.
Erku, F., and J. K. Gundel. 1987. The pragmatics of indirect anaphors. In J. Verschueren and M. Bertuccelli-Papi (Eds.), The pragmatic perspective: Selected papers from the 1985 International Pragmatics Conference. John Benjamins, Amsterdam. pp. 533-545.
Fox, B. A. 1987. Discourse structure and anaphora: written and conversational English. Cambridge University Press, Cambridge.
Fraurud, K. 1992. Processing noun phrases in natural discourse. Doctoral dissertation, Stockholm University, Stockholm.
Fraurud, K. 1996. Cognitive ontology and NP form. In T. Fretheim and J. K. Gundel (Eds.), Reference and referent accessibility. John Benjamins, Amsterdam. pp. 193-212.
Fretheim, T., and J. K. Gundel (Eds.). 1996. Reference and referent accessibility. John Benjamins, Amsterdam.
Gelbukh, A. F., G. Sidorov, and A. Guzmán-Arenas. 1999. Use of a Weighted Topic Hierarchy for Document Classification. Proceedings of the International Conference: Text, Speech and Dialogue (TSD’99). Prague. (to appear)
Gundel, J., N. Hedberg, and R. Zacharski. 1988. Givenness, Implicature and Demonstrative Expressions in English Discourse. Proceedings of the 25th meeting of the Chicago Linguistic Society, Part II (Parasession on Language in Context). Chicago. pp. 89-103.
Guzmán-Arenas, A. 1998. Finding the main themes in a Spanish document. Expert Systems with Applications, 14 (1, 2): 139-148.
Hahn, U., M. Strube, and K. Markert. 1996. Bridging textual ellipses. Proceedings of the 16th International Conference on Computational Linguistics. pp. 496-501.
Hellman, C. 1996. The ‘price tag’ on knowledge activation in discourse processing. In T. Fretheim and J. K. Gundel (Eds.), Reference and referent accessibility. John Benjamins, Amsterdam.
Hirst, G. 1981. Anaphora in Natural Language Understanding. Springer Verlag, Berlin.
Indirect Anaphora Workshop. 1996. Lancaster University, Lancaster.
Kameyama, M. 1997. Recognizing Referential Links: an Information Extraction Perspective. Proceedings of ACL’97/EACL’97 workshop on Operational factors in practical, robust anaphora resolution. Madrid.
Lambrecht, K. 1994. Information Structure and Sentence Form. Topic, Focus and the Mental Representation of Discourse Referents. Cambridge University Press, Cambridge. 388 pp.
Mel’čuk, I. A. 1988. Dependency Syntax: Theory and Practice. The State University of New York Press, Albany, New York. 428 pp.
Mel’čuk, I. A. 1999. Communicative Organization in Natural Language: The Semantic-Communicative Structure of Sentence. 380 pp. (to appear)
Mitkov, R. 1997. Factors in Anaphora Resolution: They are not the Only Things that Matter. A Case Study Based on Two Different Approaches. Proceedings of the ACL’97/EACL’97 workshop on Operational factors in practical, robust anaphora resolution. Madrid.
Partee, B., and P. Sgall (Eds.). 1996. Discourse and Meaning. Papers in Honour of Eva Hajičová. Benjamins, Amsterdam/Philadelphia.
Sanford, A. J., S. C. Garrod, A. Lucas, and R. Henderson. 1983. Pronouns without explicit antecedents? Journal of Semantics, 2: 303-318.
Shank, R. C., M. Lebowitz, and L. Birnbaum. 1980. An Integrated Understander. American Journal of Computational Linguistics, 6 (1): 13-30.
Tomlin, R. (Ed.). 1987. Coherence and Grounding in Discourse. Benjamins, Amsterdam. 512 pp.
Ward, G., and B. Birner. 1994. Definiteness and the English existential. Language, 71: 722-742.
Yule, G. 1982. Interpreting anaphora without identifying reference. Journal of Semantics, 1: 315-322.