WHAT IS STANDARDIZATION?
A standardized test presupposes certain standard objectives, or criteria, that are held constant from one form of the test to another. The criteria in large-scale standardized tests are designed to apply to a broad band of competencies that are usually not exclusive to one particular curriculum. A good standardized test is the product of a thorough process of empirical research and development. It dictates standard procedures for administration and scoring, and finally, it is typically a norm-referenced test, the goal of which is to place test-takers on a continuum across a range of scores and to differentiate test-takers by their relative ranking.
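The norm-referenced idea of placing each test-taker on a continuum can be made concrete with a percentile rank: the percentage of a norming sample that scored below a given raw score. The sketch below is a minimal illustration with hypothetical scores, not the scaling procedure of any particular test.

```python
from bisect import bisect_left

def percentile_rank(score, norm_scores):
    """Percentage of the norming sample scoring strictly below `score`."""
    ranked = sorted(norm_scores)
    below = bisect_left(ranked, score)  # count of scores below `score`
    return 100.0 * below / len(ranked)

# Hypothetical norming sample of raw scores
norms = [42, 47, 51, 55, 55, 58, 61, 64, 68, 73]
print(percentile_rank(61, norms))  # 60.0 -> outperforms 60% of the sample
```

Real standardized tests convert raw scores to scaled scores through much larger norming samples and equating procedures, but the ranking logic is the same.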
Examples of standardized tests in the United States include college entrance exams such as the Scholastic Aptitude Test (SAT®), which is part of the educational experience of many high school seniors seeking further education. The Graduate Record Exam (GRE®) is a required standardized test for entry into many graduate school programs. Tests like the Graduate Management Admission Test (GMAT) and the Law School Aptitude Test (LSAT) specialize in particular disciplines.
ADVANTAGES AND DISADVANTAGES OF STANDARDIZED TESTS
Advantages of standardized testing include a ready-made, previously validated product that frees the teacher from having to spend hours creating a test. Administration to large groups can be accomplished within reasonable time limits. In the case of multiple-choice formats, scoring procedures are streamlined (for either scannable computerized scoring or hand-scoring with a hole-punched grid) for fast turnaround time. And, for better or worse, there is often an air of face validity to such authoritative-looking instruments.
Disadvantages center largely on the inappropriate use of such tests, for example, using an overall proficiency test as an achievement test simply because of the convenience of the standardization. Another disadvantage is the potential misunderstanding of the difference between direct and indirect testing. Some standardized tests include tasks that do not directly test performance in the target objective.
DEVELOPING A STANDARDIZED TEST
Three different standardized tests will be used to exemplify the process of standardized test design:
(A) The Test of English as a Foreign Language (TOEFL®), Educational Testing Service (ETS)
(B) The English as a Second Language Placement Test (ESLPT), San Francisco State University (SFSU)
(C) The Graduate Essay Test (GET), SFSU.
The first is a test of general language ability or proficiency. The second is a placement test at a university. And the third is a gate-keeping essay test that all prospective students must pass in order to take graduate-level courses.
- Determine The Purpose And Objectives Of The Test
Most standardized tests are expected to provide high practicality in administration and scoring without unduly compromising validity. The initial outlay of time and money for such a test is significant, but it is offset by the test's repeated use. It is therefore important for its purpose and objectives to be stated specifically. Let's look at the three tests.
(A) The purpose of TOEFL is “to evaluate English proficiency of people whose native language is not English” (TOEFL Test and Score Manual, 2001, p. 9). More specifically, the TOEFL is designed to help institutions of higher learning make “valid decisions concerning English language proficiency in terms of (their) own requirements” (p. 9). Most colleges and universities in the United States use TOEFL scores to admit or refuse international applicants for admission.
(B) The ESLPT is designed to place already admitted students at San Francisco State University in an appropriate course in academic writing, with the secondary goal of placing students into courses in oral production and grammar-editing. While the test's primary purpose is to make placements, another desirable objective is to provide teachers with some diagnostic information about their students on the first day or two of class. The ESLPT is locally designed by university faculty and staff.
(C) The GET is given to prospective graduate students – both native and non-native speakers – in all disciplines to determine whether their writing ability is sufficient to permit them to enter graduate-level courses in their program. It is offered at the beginning of each term. Students who fail or marginally pass the GET are technically ineligible to take graduate courses in their field. Instead, they may elect to take a course in graduate-level writing of research papers. A pass in that course is equivalent to passing the GET.
The objectives of each of these tests are specific. The content of each test must be designed to accomplish those particular ends. This first stage of goal-setting might be seen as one in which the consequential validity of the test is foremost in the mind of the developer: each test has a specific gate-keeping function to perform; therefore the criteria for entering those gates must be specified accurately.
- Design Test Specifications
Decisions need to be made on how to go about structuring the specifications of the test. Before specs can be addressed, a comprehensive program of research must identify a set of constructs underlying the test itself. This stage of laying the foundation stones can occupy weeks, months, or even years of effort. Standardized tests that don't work are often the product of short-sighted construct validation. Let's look at the three tests again.
(A) Construct validation for the TOEFL is carried out by the TOEFL staff at ETS under the guidance of a Policy Council that works with a Committee of Examiners composed of appointed external university faculty, linguists, and assessment specialists. Dozens of employees are involved in a complex process of reviewing current TOEFL specifications, commissioning and developing test tasks and items, assembling forms of the test, and performing ongoing exploratory research related to formulating new specs. Reducing such a complex process to a set of simple steps runs the risk of gross overgeneralization, but here is an idea of how the TOEFL is created.
How you view language will make a difference in how you assess language proficiency. After breaking language competence down into subsets of listening, speaking, reading, and writing, each performance mode can be examined on a continuum of linguistic units: phonology (pronunciation) and orthography (spelling), words (lexicon), sentences (grammar), discourse, and pragmatic (sociolinguistic, contextual, functional, cultural) features of language.
(B) Designing the test specs for the ESLPT was a somewhat simpler task because the purpose is placement, and the construct validation of the test consists of an examination of the content of the ESL courses. In fact, in a recent revision of the ESLPT (Imao et al., 2000; Imao, 2001), content validity (coupled with its attendant face validity) was the central theoretical issue to be considered. The major issue centered on designing practical and reliable tasks and item response formats. Having established the importance of designing ESLPT tasks that simulated classroom tasks used in the courses, the designers ultimately specified two writing production tasks (one a response to an essay that students read, and the other a summary of another essay) and one multiple-choice grammar-editing task. These specifications mirrored the reading-based, process writing approach used in the courses.
(C) Specifications for the GET arose out of the perceived need to provide a threshold of acceptable writing ability for all prospective graduate students at SFSU, both native and non-native speakers of English. The specifications for the GET are the skills of writing grammatically and rhetorically acceptable prose on a topic of some interest, with clearly produced organization of ideas and logical development. The GET is a direct test of writing ability in which test-takers must, in a two-hour time period, write an essay on a given topic.
- Design, Select, And Arrange Test Tasks/Items
Once specifications for a standardized test have been stipulated, the sometimes never-ending task of designing, selecting, and arranging items begins. The specs act much like a blueprint in determining the number and types of items to be created. Let's look at three examples.
(A) TOEFL test design specifies that each item be coded for content and statistical characteristics. Content coding ensures that each examinee will receive test questions that assess a variety of skills (reading, comprehending the main idea, or understanding inferences) and cover a variety of subject matter without unduly biasing the content toward a subset of test-takers (for example, in the listening section involving an academic lecture, the content must be universal enough for students from many different academic fields of study). Statistical characteristics, including the IRT equivalents of estimates of item facility (IF) and the ability of an item to discriminate (ID) between higher and lower ability levels, are also coded.
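The classical (non-IRT) versions of these two statistics are simple to compute, and a minimal sketch may make the concepts concrete. The formulas below are the standard classroom formulas, not ETS's IRT-based procedures; the data are hypothetical.

```python
def item_facility(responses):
    """IF: proportion of test-takers answering the item correctly.
    `responses` is a list of 0/1 scores for one item."""
    return sum(responses) / len(responses)

def item_discrimination(item_scores, total_scores, fraction=0.25):
    """ID: IF of the top-scoring group minus IF of the bottom-scoring
    group, where groups are defined by total test score."""
    n = max(1, int(len(total_scores) * fraction))
    ranked = sorted(range(len(total_scores)), key=lambda i: total_scores[i])
    low, high = ranked[:n], ranked[-n:]
    if_high = sum(item_scores[i] for i in high) / n
    if_low = sum(item_scores[i] for i in low) / n
    return if_high - if_low

# Hypothetical data: one item's 0/1 scores and the same
# eight test-takers' total test scores
item = [1, 1, 1, 0, 1, 0, 0, 0]
totals = [90, 85, 80, 70, 60, 50, 40, 30]
print(item_facility(item))               # 0.5
print(item_discrimination(item, totals)) # 1.0 (item separates high from low)
```

An IF near 0.5 and a high positive ID are the profile of a useful norm-referenced item; an ID near zero (or negative) flags an item for revision or rejection.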
Items are then designed by a team who select and adapt items solicited from a bank of items that have been "deposited" by freelance writers and ETS staff. Probes for the reading section, for example, are usually excerpts from authentic general or academic reading that are edited for linguistic difficulty, cultural bias, or other topic biases. Items are designed to test overall comprehension, certain specific information, and inference.
(B) The selection of items in the ESLPT entailed two entirely different processes. In the two subsections of the test that elicit writing performance (summary of reading; response to reading), the main hurdles were (a) selecting appropriate passages for test-takers to read, (b) providing appropriate prompts, and (c) processing data from pilot testing.
- Make Appropriate Evaluations Of Different Kinds Of Items
- Specify Scoring Procedures And Reporting Formats
- Perform Ongoing Construct Validation Studies
STANDARDIZED LANGUAGE PROFICIENCY TESTING
Tests of language proficiency presuppose a comprehensive definition of the specific competencies that comprise overall language ability. The specifications for the TOEFL provided an illustration of an operational definition of ability for assessment purposes. Swain (1990) identified linguistic traits (grammar, discourse, and sociolinguistics) that can be assessed by means of oral, multiple-choice, and written responses. Swain's conception was not meant to be an exhaustive analysis of ability, but rather to serve as an operational framework for constructing proficiency assessments.
Another definition and conceptualization of proficiency is suggested by the ACTFL association. ACTFL takes a holistic and more unitary view of proficiency in describing four levels: superior, advanced, intermediate, and novice. Within each level, descriptions of listening, speaking, reading, and writing are provided as guidelines for assessment.
FOUR STANDARDIZED LANGUAGE PROFICIENCY TESTS
Use the following questions to help evaluate these four tests and their subsections:
- What item types are included?
- How practical and reliable does each subsection of each test appear to be?
- Do the item types and tasks appropriately represent a conceptualization of language proficiency (ability)? Can we evaluate their construct validity?
- Are the tasks authentic?
- Is there some washback potential in the tasks?
Test of English as Foreign Language (TOEFL®)
Michigan English Language Assessment Battery (MELAB)
International English Language Testing System (IELTS)
Test of English for International Communication (TOEIC®)