A Brief History of NLP — Part 1

Published on Jul 07, 2020 by Antoine Louis

This post summarizes the history of NLP before the deep learning era (1950–2000).

1950 - 1960. It is generally agreed that Weaver’s memorandum 1 introduced the idea of the first computer-based application related to natural language: machine translation. It subsequently inspired many projects, notably the Georgetown experiment,2 a joint project between IBM and Georgetown University that successfully demonstrated the machine translation of more than 60 Russian sentences into English. The researchers accomplished this feat using hand-coded language rules, but the system failed to scale up to general translation. Early work in machine translation was simple: most systems looked up each source word in a bilingual dictionary and then reordered the translated words to fit the target language’s word-order rules. This produced poor results because the lexical ambiguity inherent in natural language was not taken into account. Researchers progressively realized that the task was much harder than anticipated and that they needed an adequate theory of language. It was not until 1957 that Chomsky introduced the idea of generative grammar,3 a rule-based system of syntactic structures that brought insight into how mainstream linguistics could help machine translation.
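The dictionary-lookup-then-reorder approach described above can be illustrated with a minimal sketch. Everything here is hypothetical and chosen only for illustration: the Spanish-to-English word pair, the tiny `LEXICON`, the part-of-speech tags, and the single reordering rule. It is not a reconstruction of the Georgetown system or of any other historical program.

```python
# Toy bilingual lexicon (hypothetical data): exactly one English word and one
# part-of-speech tag per Spanish word, which is what makes the approach brittle.
LEXICON = {
    "el": ("the", "DET"),
    "banco": ("bank", "NOUN"),    # "banco" can also mean "bench"; the lexicon cannot say so
    "blanco": ("white", "ADJ"),
    "está": ("is", "VERB"),
    "roto": ("broken", "ADJ"),
}


def translate(sentence):
    """Word-for-word lookup followed by one word-order rule."""
    # Step 1: dictionary lookup; unknown words pass through untranslated.
    tagged = [LEXICON.get(word, (word, "UNK")) for word in sentence.lower().split()]

    # Step 2: reorder. Spanish places most adjectives after the noun,
    # English places them before, so swap each NOUN ADJ pair.
    i = 0
    while i < len(tagged) - 1:
        if tagged[i][1] == "NOUN" and tagged[i + 1][1] == "ADJ":
            tagged[i], tagged[i + 1] = tagged[i + 1], tagged[i]
            i += 2
        else:
            i += 1

    return " ".join(word for word, _ in tagged)


print(translate("el banco blanco está roto"))  # the white bank is broken
```

The output is grammatical English but picks the wrong sense: a white, broken "banco" is far more likely to be a bench than a bank. Because the lexicon stores a single translation per word and never consults the context, the sketch has no way to resolve that ambiguity.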

1960 - 1970. Owing to the development of parsing algorithms and the syntactic theory of language, the 1950s were marked by over-enthusiasm. People believed that fully automatic high-quality translation systems would produce results indistinguishable from those of human translators and that such systems would be in operation within a few years. Given the linguistic knowledge and computer systems available at the time, this expectation was completely unrealistic. After years of research and millions of dollars spent, machine translation was still more expensive than manual human translation, and no computer came anywhere near being able to carry on a basic conversation. In 1966, the Automatic Language Processing Advisory Committee (ALPAC) released a report 4 concluding that machine translation (MT) was not immediately achievable and recommending that funding for it be cut. This substantially slowed down machine translation research and most work in other NLP applications.

Despite this significant slowdown, some exciting developments emerged in the years following the ALPAC report, both in theoretical work and in the construction of prototype systems. Theoretical work in the late 1960s and early 1970s mainly focused on how to represent meaning. Researchers developed new grammar theories that were computationally tractable for the first time, particularly after the introduction of transformational generative grammars,5 which were criticized for being too syntactically oriented and not lending themselves easily to computational implementation. As a result, many new theories appeared to explain syntactic anomalies and provide semantic representations, such as case grammar,6 semantic networks,7 augmented transition networks,8 and conceptual dependency theory.9

Alongside theoretical development, this period also saw the birth of many exciting prototype systems. ELIZA 10 was built to replicate the conversation between a psychotherapist and a patient by merely permuting or echoing the user input. SHRDLU 11 was a simulated robot that used natural language to query and manipulate objects inside a very simple virtual micro-world consisting of colored blocks and pyramids. LUNAR 12 was developed as an interface to a database containing information about lunar rock samples, using augmented transition networks. Lastly, PARRY 13 attempted to simulate a person with paranoid schizophrenia based on concepts, conceptualizations, and beliefs.
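To make "permuting or echoing the user input" concrete, here is a minimal sketch in the spirit of ELIZA. The three patterns, the response templates, and the `REFLECTIONS` table are invented for illustration and are not taken from Weizenbaum's original script.

```python
import re

# First/second-person swaps applied to the echoed fragment (hypothetical table).
REFLECTIONS = {"i": "you", "me": "you", "my": "your", "am": "are"}

# (pattern, response template) pairs, tried in order; all are illustrative.
RULES = [
    (re.compile(r"i feel (.*)", re.IGNORECASE), "Why do you feel {0}?"),
    (re.compile(r"i am (.*)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"my (.*)", re.IGNORECASE), "Tell me more about your {0}."),
]


def reflect(fragment):
    """Swap pronouns so the user's own words can be echoed back."""
    return " ".join(REFLECTIONS.get(word, word) for word in fragment.lower().split())


def respond(utterance):
    """Return the first matching template filled with the reflected input."""
    text = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.match(text)
        if match:
            return template.format(reflect(match.group(1)))
    return "Please go on."  # fallback when no pattern applies


print(respond("I feel sad about my job."))  # Why do you feel sad about your job?
print(respond("Hello"))                     # Please go on.
```

Nothing in the sketch models what the user means. The apparent understanding comes entirely from surface pattern matching and pronoun substitution, which is why such systems break down as soon as the input falls outside the anticipated patterns.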

1970 - 1980. The 1970s brought new ideas into NLP, such as building conceptual ontologies, which structured real-world information into computer-understandable data. Examples are MARGIE,14 TaleSpin,15 QUALM,16 SAM,17 PAM,18 and POLITICS.19

1980 - 1990. In the 1980s, many significant problems in NLP were addressed using symbolic approaches,20 21 22 23 24 i.e., complex hard-coded rules and grammars to parse language. In practice, the text was first segmented into tokens (words and punctuation) that carried no meaning of their own. Representations were then created manually by assigning meanings to these tokens and their mutual relationships through well-understood knowledge representation schemes and associated algorithms. Those representations were eventually used to perform deep analysis of linguistic phenomena.

1990 - 2000. Statistical models 25 26 27 28 came as a revolution in NLP in the late 1980s and early 1990s, replacing most natural language processing systems based on complex sets of hand-written rules. This progress resulted from both the steady increase in computational power and the shift to machine learning algorithms. Some of the earliest machine learning algorithms, such as decision trees,29 30 produced systems similar in performance to the older hand-written rules. Statistical models, however, broke through the complexity barrier of hand-coded rules by learning them automatically, which led researchers to focus increasingly on these models. Unlike hard-coded rules, these statistical models could make soft, probabilistic decisions.
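As a rough illustration of what a "soft, probabilistic decision" learned from data looks like, the sketch below estimates bigram probabilities by counting word pairs. The three-sentence corpus is made up, and the maximum-likelihood bigram model is used only as a simple stand-in; it is not the specific method of any of the papers cited above.

```python
from collections import Counter, defaultdict

# Tiny made-up training corpus (hypothetical data).
CORPUS = [
    "the bank approved the loan",
    "the bank raised rates",
    "the river bank flooded",
]


def train_bigrams(corpus):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def next_word_probs(counts, prev):
    """Maximum-likelihood estimate of P(next word | previous word)."""
    followers = counts.get(prev, Counter())
    total = sum(followers.values())
    if total == 0:
        return {}  # the model has never seen this word
    return {word: count / total for word, count in followers.items()}


counts = train_bigrams(CORPUS)
print(next_word_probs(counts, "the"))
# {'bank': 0.5, 'loan': 0.25, 'river': 0.25}
```

Instead of a rule that fires or does not fire, the model assigns a graded score to every continuation it has observed, and those scores change automatically when the training corpus changes.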

  1. Claude E Shannon and Warren Weaver. The mathematical theory of communication. Urbana: University of Illinois Press, 97, 1949. ↩︎

  2. Leon E Dostert. The Georgetown-IBM experiment. Machine translation of languages. John Wiley & Sons, New York, pages 124–135, 1955. ↩︎

  3. Noam Chomsky. Syntactic structures. The Hague: Mouton, 1957. ↩︎

  4. John R Pierce, John B Carroll, Eric P Hamp, David G Hays, Charles F Hockett, Anthony G Oettinger, and Alan Perlis. Language and machines — computers in translation and linguistics. ALPAC report, National Academy of Sciences, National Research Council, Washington, DC, 1966. ↩︎

  5. Noam Chomsky. Aspects of the theory of syntax. Cambridge: M.I.T. Press, 1965. ↩︎

  6. Charles Fillmore. The case for case. Bach and Harms (Ed.): Universals in Linguistic Theory, 1968. ↩︎

  7. Allan M Collins, M Ross Quillian, et al. Retrieval time from semantic memory. Journal of Verbal Learning and Verbal Behavior, 8(2):240–247, 1969. ↩︎

  8. William A Woods. Transition network grammars for natural language analysis. Communications of the ACM, 13(10):591–606, 1970. ↩︎

  9. Roger C Schank. Conceptual dependency: A theory of natural language understanding. Cognitive psychology, 3(4):552–631, 1972. ↩︎

  10. Joseph Weizenbaum. Eliza—a computer program for the study of natural language communication between man and machine. Communications of the ACM, 9(1):36–45, 1966. ↩︎

  11. Terry Winograd. Procedures as a representation for data in a computer program for understanding natural language. Technical report, Massachusetts Institute of Technology, Cambridge Project, 1971. ↩︎

  12. W Woods, R Kaplan, and B Nash-Webber. The lunar sciences natural language information system: Final report. Bolt, Beranek and Newman, Cambridge, MA, 1972. ↩︎

  13. Kenneth Mark Colby. Ten criticisms of Parry. ACM SIGART Bulletin, 48:5–9, 1974. ↩︎

  14. Roger C Schank and Robert P Abelson. Scripts, plans, and knowledge. In IJCAI, volume 75, pages 151–157, 1975. ↩︎

  15. James Richard Meehan. The metanovel: writing stories by computer. Technical report, Yale Univ. New Haven. Conn. Dept. of Computer Science, 1976. ↩︎

  16. Wendy G Lehnert. A conceptual theory of question answering. In Proceedings of the 5th international joint conference on Artificial intelligence-Volume 1, pages 158–164, 1977. ↩︎

  17. Richard Edward Cullingford. Script application: computer understanding of newspaper stories. Technical report, Yale Univ. New Haven. Conn. Dept. of Computer Science, 1978. ↩︎

  18. Roger C Schank and Robert Wilensky. A goal-directed production system for story understanding. In Pattern-directed inference systems, pages 415–430. Elsevier, 1978. ↩︎

  19. Jaime Guillermo Carbonell. Subjective understanding: computer models of belief systems. Technical report, Yale Univ. New Haven. Conn. Dept. of Computer Science, 1979. ↩︎

  20. Eugene Charniak. Passing markers: A theory of contextual influence in language comprehension. Cognitive science, 7(3):171–190, 1983. ↩︎

  21. Michael G Dyer. The role of affect in narratives. Cognitive Science, 7(3):211–242, 1983. ↩︎

  22. Christopher K Riesbeck and C Martin. Direct memory access parsing. Experience, memory and reasoning, pages 209–226, 1986. ↩︎

  23. Barbara J Grosz, Douglas E Appelt, Paul A Martin, and Fernando CN Pereira. Team: an experiment in the design of transportable natural-language interfaces. Artificial Intelligence, 32(2):173–243, 1987. ↩︎

  24. Graeme Hirst. Semantic interpretation and ambiguity. Artificial Intelligence, 34(2):131–177, 1987. ISSN 0004-3702. ↩︎

  25. Lalit R Bahl, Peter F Brown, Peter V de Souza, and Robert L Mercer. A tree-based statistical language model for natural language speech recognition. IEEE Transactions on Acoustics, Speech, and Signal Processing, 37(7):1001–1008, 1989. ↩︎

  26. Eric Brill, David Magerman, Mitchell Marcus, and Beatrice Santorini. Deducing linguistic structure from the statistics of large corpora. In Proceedings of the 5th Jerusalem Conference on Information Technology, 1990. 'Next Decade in Information Technology', pages 380–389. IEEE, 1990. ↩︎

  27. Mahesh V Chitrao and Ralph Grishman. Statistical parsing of messages. In Speech and Natural Language: Proceedings of a Workshop Held at Hidden Valley, Pennsylvania, June 24-27, 1990, 1990. ↩︎

  28. Peter F Brown, John Cocke, Stephen A Della Pietra, Vincent J Della Pietra, Frederick Jelinek, John Lafferty, Robert L Mercer, and Paul S Roossin. A statistical approach to machine translation. Computational linguistics, 16(2):79–85, 1991. ↩︎

  29. Hideki Tanaka. Verbal case frame acquisition from a bilingual corpus: Gradual knowledge acquisition. In Proceedings of the 15th conference on Computational linguistics-Volume 2, pages 727–731. Association for Computational Linguistics, 1994. ↩︎

  30. Hussein Almuallim, Yasuhiro Akiba, Takefumi Yamazaki, Akio Yokoo, and Shigeo Kaneda. Two methods for learning translation rules from examples and a semantic hierarchy. In COLING 1994 Volume 1: The 15th International Conference on Computational Linguistics, 1994. ↩︎