RESPONSE

Real bad grammar: Realistic grammatical description with grammaticality*

JENNIFER FOSTER


1. Introduction

Sampson (this issue) argues for a concept of “realistic grammatical description” in which the distinction between grammatical and ungrammatical sentences is irrelevant. In this article I also argue for a concept of “realistic grammatical description”, but one in which a binary distinction between grammatical and ungrammatical sentences is maintained. In distinguishing between the grammatical and ungrammatical, this kind of grammar differs from that proposed by Sampson, but it does share the important property that invented sentences have no role to play, either as positive or negative evidence.

Our propensity to make mistakes, and the fact that many people are forced to speak and write in a language that is not their native one, means that sentences are produced which contain grammatical errors. These naturalistic ungrammatical sentences, as opposed to the invented starred examples often used within the linguistics community, have been dismissed as uninteresting. Although I do not wish to give naturalistic ungrammatical sentences the prominence given by Carnie (2002) to invented ungrammatical sentences when he suggests that it is necessary to determine the ungrammatical sentences in a language in order to determine the grammatical ones (see Sampson, this issue), I do, however, think that naturalistic ungrammatical sentences are of interest to linguists studying language production, language loss and language learning, and that the grammatical/ungrammatical distinction cannot therefore be completely dismissed. Also, for grammar development within the field of natural language processing, the grammatical/ungrammatical distinction cannot be ignored or denied because this can lead to the development of grammars which do not accurately analyse ungrammatical sentences. This article focuses particularly on this second argument in favour of maintaining the grammatical/ungrammatical distinction, and when I speak of grammar development, I am thinking in particular of those large-scale natural language grammars which are used to parse natural language automatically.

Corpus Linguistics and Linguistic Theory 3–1 (2007), 73–86. 1613-7027/07/0003–0073. DOI 10.1515/CLLT.2007.005. Walter de Gruyter.

Linguistic evidence in the form of grammaticality judgements can be used to distinguish grammatical sentences from ungrammatical ones but, crucially, these judgements should be made only on naturalistic data in context. Sampson (this issue) argues that Chomsky’s conception of language as a set of sentences, with the role of a linguist to establish which strings are in this set, is unhelpful because it focuses undue attention on the grammatical/ungrammatical distinction: I believe that an unfortunate consequence of this definition of language is that it places too much emphasis on the sentence as an isolated unit.

2. Grammars for natural language processing

A fundamental debate within the linguistics community has concerned what it is that a grammar is supposed to model: should a grammar model competence or performance? Should a grammar reflect a psychological reality or a social reality? Lamb (2000), for example, distinguishes between a “theory of the linguistic extension”, which is a theory of the utterances produced by a speaker or community, and a “theory of the linguistic system”, which is a theory of the human cognitive system which is capable of producing and understanding such utterances. In the practical domain of natural language processing, there is no such debate. The grammar of a computer parser which is to form part of a practical application must be a theory of the linguistic extension and must describe the productions of a speech community. In proposing the competence/performance distinction, Chomsky remarked that the language produced by a speech community is rife with slips and imperfections (Chomsky 1961: 130–131). Therefore, if a computer parser has to accurately parse actual language, it will have to accurately parse imperfect language, in particular the kind of imperfect language that we routinely produce and are capable of understanding. It will be able to do this if it is equipped with some knowledge of deviant sentence structures.

A precision grammar distinguishes between the grammatical and the ungrammatical and purposely describes only grammatical sentences. An example is the English Resource Grammar (ERG) (Copestake and Flickinger 2000), a broad-coverage HPSG grammar of English. Baldwin et al. (2004) make the point that, if a grammar is to form the basis of a natural language processing system which performs not just sentence parsing but also sentence generation, it should not be able to generate ungrammatical sentences. A parser using such a grammar will reject ungrammatical sentences outright. However, a parser which gives the response “no” or “ungrammatical” to a sentence such as (1)¹ may be capable of distinguishing between the grammatical and the ungrammatical, but of what practical use is this ability if it cannot hint at the meaning of an utterance whose ill-formedness is quite commonplace?

(1) Want to saving money?

Of course, one could argue that robust parsing techniques (such as constraint relaxation (Fouvry 2003) or parse-fitting (Penstein Rosé and Lavie 1997)) could be employed to handle ungrammatical sentences, but such techniques will be more effective if they are tailored to specific types of ungrammatical language. A natural extension of this is then to actually let the grammar describe the structure of ungrammatical sentences in the same way that it describes the structure of grammatical sentences.

A parser whose grammar is derived automatically from a treebank of naturalistic sentences is unconcerned with whether or not a sentence is grammatical. Typically, grammaticality is assumed, and this assumption will be quite accurate if the treebank sentences come from a high-quality newspaper such as The Wall Street Journal. The fact that such grammars do not purposely set out to exclude ungrammatical sentences, together with the fact that such grammars are generally based upon a large body of data, means that parsers equipped with such grammars are quite likely to return a parse for an ungrammatical sentence. However, since such a parser does not have a concept of ungrammaticality, it will not be aware that there is something deviant about the sentence, with the result that the parse it produces for the sentence will not necessarily be the correct one, that is, it will not necessarily reflect what the person who produced the ungrammatical sentence intended to express. For example, Charniak’s most recent parser² (Charniak 2000) will provide the reasonable parse in Figure 1 for sentence (1) but it is less successful, for example, on the ungrammatical (2), returning the parse in Figure 2.

(2) The closure in computed breadth-first.
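What it means for a treebank-derived grammar to be unconcerned with grammaticality can be illustrated with a minimal Python sketch that reads context-free rules off bracketed trees and counts them. This is an illustration under stated assumptions, not a description of Charniak’s parser: the two bracketings are hand-made, hypothetical analyses of (1) and of a corrected variant (Want to save money?), not trees from an actual treebank or from Figures 1 and 2, and the names parse_bracketed, productions and TOY_TREEBANK are chosen here purely for the example.

```python
from collections import Counter


def parse_bracketed(text):
    """Parse a bracketed tree string into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        label = tokens[pos + 1]  # tokens[pos] is the opening "("
        pos += 2
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])  # a word at a leaf
                pos += 1
        pos += 1  # step over the closing ")"
        return (label, children)

    return read()


def productions(tree):
    """Yield (lhs, rhs) for each phrasal node; lexical rules are skipped."""
    label, children = tree
    subtrees = [c for c in children if not isinstance(c, str)]
    if not subtrees:
        return  # a part-of-speech tag directly above a word
    yield (label, tuple(sub[0] for sub in subtrees))
    for sub in subtrees:
        yield from productions(sub)


# Hypothetical hand-made bracketings (assumptions, not treebank data).
TOY_TREEBANK = [
    "(SQ (VP (VB Want) (S (VP (TO to) (VP (VB save) (NP (NN money)))))) (. ?))",
    "(SQ (VP (VB Want) (S (VP (TO to) (VP (VBG saving) (NP (NN money)))))) (. ?))",
]

# Every tree contributes to the same rule table, whatever its status.
counts = Counter()
for line in TOY_TREEBANK:
    counts.update(productions(parse_bracketed(line)))

for (lhs, rhs), n in sorted(counts.items()):
    print(f"{n}  {lhs} -> {' '.join(rhs)}")
```

Both trees feed a single rule table, so nothing in the extracted grammar records that one of them came from a deviant sentence.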


3. Grammar requirements

The following are the requirements for the type of grammar which I believe should be developed by computational linguists and used by a parsing system:

1. The grammar should have a component which describes the structure of the grammatical sentences that occur in language.

2. The grammar should have a component which describes the structure of the ungrammatical sentences that occur in language.

Like a treebank grammar, this grammar aims to be a direct reflection of language rather than an indirect reflection via linguistic intuition. However, unlike a treebank grammar, this grammar does explicitly distinguish between the grammatical and the ungrammatical, and this distinction relies on linguistic intuition. This distinction is binary, but this does not mean that the rules in each component of the grammar cannot be probabilistic. A linguistic structure described in the first, grammatical component of the grammar could be assigned a probability based on how frequently this structure appears in grammatical data. Similarly, a linguistic structure described in the ungrammatical component of the grammar could be assigned a probability based on how frequently this structure shows up in ungrammatical data. The grammatical component of the grammar is quite similar to a precision grammar which has been tested using corpus evidence. An example is the aforementioned ERG, which has been tested using sentences from the British National Corpus (Baldwin et al. 2004). The ungrammatical component is, of course, not implemented by a precision grammar.
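The combination of a binary distinction with component-internal probabilities can be sketched in a few lines of Python. This is a minimal illustration, assuming that a “structure” can be stood in for by a plain string label and that probabilities are simple relative frequencies within a component. The class names, the labels and the counts are all hypothetical and do not come from any implemented grammar.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Component:
    """One half of the grammar: structures with their observed frequencies."""
    counts: Counter = field(default_factory=Counter)

    def probability(self, structure):
        # Relative frequency within this component only.
        total = sum(self.counts.values())
        return self.counts[structure] / total if total else 0.0


@dataclass
class TwoComponentGrammar:
    grammatical: Component = field(default_factory=Component)
    ungrammatical: Component = field(default_factory=Component)

    def add(self, structure, judged_grammatical):
        """Record one attested structure under a human grammaticality judgement."""
        part = self.grammatical if judged_grammatical else self.ungrammatical
        part.counts[structure] += 1

    def analyse(self, structure):
        """Return (status, probability within the component that holds the structure)."""
        for status, part in (("grammatical", self.grammatical),
                             ("ungrammatical", self.ungrammatical)):
            if part.counts[structure] > 0:
                return status, part.probability(structure)
        return "unobserved", 0.0


# Hypothetical observations; labels and counts are invented for illustration.
grammar = TwoComponentGrammar()
for _ in range(3):
    grammar.add("to + base verb", judged_grammatical=True)
grammar.add("rare but attested construction", judged_grammatical=True)
grammar.add("to + -ing verb", judged_grammatical=False)
grammar.add("omitted word", judged_grammatical=False)

print(grammar.analyse("to + -ing verb"))
print(grammar.analyse("rare but attested construction"))
```

In this sketch a rare structure in the grammatical component receives a low probability but is still reported as grammatical, while a frequent error is reported as ungrammatical however often it occurs. Status and frequency are stored separately.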

What kind of evidence is needed in order to develop and test the second component of the grammar, the part of the grammar which describes ungrammatical sentences? Since this grammar is to form the basis of a parsing system, its description of ungrammaticality must reflect the kind of ungrammaticality that actually occurs in language. This means that naturalistic ungrammatical sentences, rather than imagined ones, will be needed as evidence. Baldwin et al. (2004) argue that naturalistic ungrammatical sentences such as (1) or (2) constitute “haphazard noise” and are useful only to test that a grammar does not overgenerate. I am arguing that a grammar that is capable of generating the kind of ungrammatical sentences that people actually produce is not guilty of overgeneration, provided that the grammar knows that these kinds of sentences are ungrammatical. Therefore, to test the second, ungrammatical component of the grammar, which is essentially a theory of real ungrammaticality, it is necessary to collect a corpus of naturalistic sentences which are considered by speakers of the language to be ungrammatical.

How does this definition of grammar relate to the one suggested by Sampson (this issue)? The two are broadly in agreement, since both aim to describe language as it is actually used and both reject the need for negative evidence in grammar development. According to Sampson (2001, and footnote 3, this issue), if a grammar is constructed so that it excludes sentences whose structure has not actually been observed, then negative evidence becomes irrelevant. In order to exclude a sentence from the grammar, it is not necessary to verify that it is ungrammatical. It is enough not to have observed the sentence in practice. Once the sentence is observed, then this observation has the potential to count as a refutation of a grammar which excludes the sentence, and the grammar will need to be modified accordingly. As Sampson notes, this is Popper’s view of the nature of a scientific theory: it should maximize the number of statements it makes which are refutable by observable evidence.³ Where the two notions of grammar differ is in their treatment of situations when an ungrammatical sentence such as (1) (repeated for convenience as (3)) is actually observed.

(3) Want to saving money?

(4) Want to save money?

(5) Want to start saving money?

Surely, if such a string is observed, it should serve as a refutation of any grammar which prohibits it and the grammar in question would need to be modified so that it no longer prohibits this sentence? Sampson (2001) argues that the grammar should not be changed to accommodate such an observation since our knowledge that people make mistakes in language (such as omitting a word, using the wrong verb form, etc.) should allow us to relate this sentence to another sentence such as (4) or (5), both of which are accommodated by the grammar, thereby discounting the observation as a genuine refutation. The ungrammatical sentence (3) would, however, be included in the grammar described here, although it would still be recognised as a different kind of observation to a sentence such as (4) or (5) and thus would be included in the second ungrammatical component of the grammar. Recognizing it as a different kind of observation is the same thing as making a grammaticality judgement, and a method to make this kind of judgement as reliably as possible is described in the next section.
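The idea of relating an observed string such as (3) to grammatical neighbours such as (4) and (5) can be made concrete with a word-level alignment. The sketch below is one possible illustration rather than a method proposed by Sampson or in this article: it assumes whitespace tokenisation and uses Python’s standard difflib.SequenceMatcher as a stand-in for whatever knowledge of typical errors a linguist would actually bring to bear. The function name relate is hypothetical.

```python
from difflib import SequenceMatcher


def relate(observed, grammatical):
    """List the word-level edits that turn the observed string into the grammatical one."""
    a, b = observed.split(), grammatical.split()
    opcodes = SequenceMatcher(a=a, b=b, autojunk=False).get_opcodes()
    # Keep only the spans where the two word sequences differ.
    return [(tag, a[i1:i2], b[j1:j2])
            for tag, i1, i2, j1, j2 in opcodes
            if tag != "equal"]


observed = "Want to saving money?"                       # sentence (3)
print(relate(observed, "Want to save money?"))           # sentence (4)
print(relate(observed, "Want to start saving money?"))   # sentence (5)
```

Under these assumptions, (3) differs from (4) by a single replaced word (saving for save) and from (5) by a single missing word (start), which correspond to the two kinds of mistake mentioned above: a wrong verb form and an omitted word.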

The type of grammar suggested by Sampson could actually be used as the grammatical component of the grammar advocated here. It would include rare and odd constructions (Sampson’s Dunster constructions), and if it was a probabilistic grammar it could encode rareness, without linking this rareness in any way to grammaticality. In fact, because of the clear distinction between the two components of the grammar, the concepts of grammaticality and frequency are not conflated. This nonconflation is a positive thing, regardless of where one stands with respect to Sampson’s claim that frequency data cannot be used to predict grammaticality status (see Sampson’s discussion of noun phrase variability in the SUSANNE treebank, this issue).

4. Judging grammaticality

The use of grammaticality judgements as linguistic evidence has always been controversial. A large body of literature spanning several decades casts doubt on the validity of grammaticality judgements (see for example Labov 1972; Derwing 1979; Schütze 1996). These critiques cover various problems with the judgement process: defining grammaticality, choice of informant, the measurement scale used to measure judgements and the role of sentential context. After concluding that the grammaticality of a sentence cannot be inferred from its frequency, Sampson (this issue) dismisses as scientifically suspect the alternative method of using grammaticality judgements. The fact that it is difficult to reliably infer grammaticality is one of the arguments he uses to support his claim that the concept of grammaticality should be more or less ignored in grammar development. I disagree to a certain extent: in building treebanks (which are now a fundamental ingredient for natural language processing), we rely on the linguistic intuition of the treebank annotators to parse sentences, and I think that it would be useful to view the grammaticality judgement process in a similar way, as a necessary evil. Grammaticality judgements, although undoubtedly problematic, can be used to effectively carve out a grammatical/ungrammatical distinction (albeit not a particularly exciting one), and I focus for the rest of the article on how this might be achieved, dealing particularly with the problems of sentential context and defining what it means for a sentence to be ungrammatical.
