Monday, 2 January 2017

ASSESSING SPEAKING AND READING

Teachers are often asked to evaluate learner progress during courses, perhaps by preparing progress tests. It can seem straightforward enough to test grammar or vocabulary with pen-and-paper tests, but if our students’ work includes speaking, then it also seems necessary to assess their speaking skills. Teachers often feel unsure how to do this. Here are some ideas.
Criteria rather than marks
What’s the aim of a progress test? Often it’s to give encouragement that something is being done well, or to point out areas where a learner is not achieving as much as they could. With this kind of aim, giving marks may not be the most effective way to assess. An interesting alternative for progress tests is to assess learners against 'can do' criteria statements (i.e. statements listing things “I can do”), such as “I can describe what’s happening in a picture of town streets” or “I can take part in a discussion and explain my point of view clearly and politely.” To prepare a criteria list, think of about ten kinds of speaking that students have worked on over the course and turn them into criteria.

Too many students!
A frequent problem for teachers is having so many learners in one class that assessing speaking seems unrealistic. With a list of criteria (such as those above) it becomes considerably more straightforward to assess even a large group. Explain to your class what you will be doing; then, the next three or four times you set speaking tasks (i.e. where learners work in pairs or groups), walk around the class with a list of names, listening in to various groups and noting successes, keeping track of individual 'can do’s'. Extend your assessment over a few lessons; keep listening and adjusting your evaluation over a variety of tasks.

Speaking tasks
What are possible speaking tasks for assessment? Well, almost anything you do in normal class work: narrating a picture story, role-plays, pair-work information-gap exchanges, discussions, etc. If you have a smaller class and enough time, then a “three learners with one teacher” activity is a very good way to assess, i.e. setting a task that gets the three learners to interact together while you watch and evaluate.

Self-assessment
Although fear of bad marks can sometimes be motivating, it’s surprising how empowering students find it to assess themselves. It can be a real awareness-raising activity. Distribute a list of criteria and ask students first to write a short line comparing themselves against each criterion (in English or in their own language), a reflective view rather than just a 'yes' or 'no'. Encourage guilt-free, honest reflection. After the writing stage, learners can meet in small groups and talk through their thoughts, explaining why they wrote what they did.



Assessing Student Writing

What does it mean to assess writing?

Assessment is the gathering of information about student learning. It can be used for formative purposes (to adjust instruction) or summative purposes (to render a judgment about the quality of student work). It is a key instructional activity, and teachers engage in it every day in a variety of informal and formal ways.
Assessment of student writing is a process. Assessment of student writing and performance in the class should occur at many different stages throughout the course and could come in many different forms. At various points in the assessment process, teachers usually take on different roles, such as motivator, collaborator, critic, and evaluator (see Brooke Horvath for more on these roles), and give different types of response.
One of the major purposes of writing assessment is to provide feedback to students. We know that feedback is crucial to writing development. The 2004 Harvard Study of Writing concluded, "Feedback emerged as the hero and the anti-hero of our study - powerful enough to convince students that they could or couldn't do the work in a given field, to push them toward or away from selecting their majors, and contributed, more than any other single factor, to students' sense of academic belonging or alienation" (http://www.fas.harvard.edu/~expos/index.cgi?section=study).
Source: Horvath, Brooke K. "The Components of Written Response: A Practical Synthesis of Current Views." Rhetoric Review 2 (January 1985): 136-56. Rpt. in Corbett, Edward P. J., Nancy Myers, and Gary Tate, eds. The Writing Teacher's Sourcebook. 4th ed. New York: Oxford University Press, 2000.

Suggestions for Assessing Student Writing

Be sure to know what you want students to be able to do and why. Good assessment practices start with a pedagogically sound assignment description and learning goals for the writing task at hand. The type of feedback given on any task should depend on the learning goals you have for students and the purpose of the assignment. Think early on about why you want students to complete a given writing project (see the guide to writing strong assignments). What do you want them to know? What do you want students to be able to do? Why? How will you know when they have reached these goals? What methods of assessment will allow you to see that students have accomplished these goals (portfolio assessment, multiple drafts, rubrics, etc.)? What will distinguish the strongest projects from the weakest?
Begin designing writing assignments with your learning goals and methods of assessment in mind.
Plan and implement activities that support students in meeting the learning goals. How will you support students in meeting these goals? What writing activities will you allow time for? How can you help students meet these learning goals?
Begin giving feedback early in the writing process. Give multiple types of feedback early in the writing process: for example, talking with students about ideas, writing responses on drafts, and having students respond to their peers' drafts in process. These are all ways for students to receive feedback while they are still in the process of revising.
Structure opportunities for feedback at various points in the writing process. Students should also have opportunities to receive feedback on their writing at various stages in the writing process. This does not mean that teachers need to respond to every draft of a writing project. Structuring time for peer response and group workshops can be a very effective way for students to receive feedback from other writers in the class and for them to begin to learn to revise and edit their own writing.
Be open with students about your expectations and the purposes of the assignments. Students respond better to writing projects when they understand why the project is important and what they can learn through the process of completing it. Be explicit about your goals for them as writers and why those goals are important to their learning. Additionally, talk with students about methods of assessment. Some teachers have students help collaboratively design rubrics for the grading of writing. Whatever methods of assessment you choose, be sure to let students in on how they will be evaluated.
Do not burden students with excessive feedback. Our instinct as teachers, especially when we are really interested in students' writing, is to offer as many comments and suggestions as we can. However, providing too much feedback can leave students feeling daunted and uncertain where to start in terms of revision. Try to choose one or two things to focus on when responding to a draft. Offer students concrete possibilities or strategies for revision.
Allow students to maintain control over their paper. Instead of acting as an editor, suggest options or open-ended alternatives the student can choose for their revision path. Help students learn to assess their own writing and the advice they get about it.
Purposes of Responding

We provide different kinds of response at different moments. But we might also fall into a kind of "default" mode, working to get through the papers without making a conscious choice about how and why we want to respond to a given assignment. So it might be helpful to identify the two major kinds of response we provide:
  • Formative Response: response that aims primarily to help students develop their writing. It might focus on confidence-building, engaging the student in a conversation about her ideas or writing choices so as to help her see herself as a successful and promising writer. It might focus on helping the student develop a particular writing project from one draft to the next. Or it might suggest some general skills the student could focus on developing over the course of a semester.
  • Evaluative Response: response that focuses on evaluating how well a student has done. It might be related to a grade, and might be used primarily on a final product or portfolio. It tends to emphasize whether or not the student has met the criteria operative for the specific assignment and to explain that judgment.

Means of Responding

We respond to many kinds of writing and at different stages in the process, from reading responses, to exercises, to generation or brainstorming, to drafts, to source critiques, to final drafts. It is also helpful to think of the various forms that response can take.
  • Conferencing: verbal, interactive response. This might happen in class or during scheduled sessions in offices. Conferencing can be more dynamic: we can ask students questions about their work, modeling a process of reflecting on and revising a piece of writing. Students can also ask us questions and receive immediate feedback. Conferencing is typically a formative response mechanism, but it might also serve usefully to convey evaluative response.
  • Written Comments on Drafts
  1. Local: when we focus on "local" moments in a piece of writing, we are calling attention to specifics in the paper. Perhaps certain patterns of grammar or moments where the essay takes a sudden, unexpected turn. We might also use local comments to emphasize a powerful turn of phrase, or a compelling and well-developed moment in a piece. Local commenting tends to happen in the margins, to call attention to specific moments in the piece by highlighting them and explaining their significance. We tend to use local commenting more often on drafts and when doing formative response.
  2. Global: when we focus more on the overall piece of writing and less on specific moments in and of themselves. Global comments tend to come at the end of a piece, in narrative-form response. We might use these to step back and tell the writer what we learned overall, or to comment on a piece's general organizational structure or focus. We tend to use these for evaluative response and often, deliberately or not, as a means of justifying the grade we assigned.
  3. Rubrics: charts or grids on which we identify the central requirements or goals of a specific project, then evaluate whether or not, and how effectively, students met those criteria. These can be written with students as a means of helping them see and articulate the goals of a given project.

Rubrics: Tools for Response and Assessment

Rubrics are tools teachers and students use to evaluate and classify writing, whether individual pieces or portfolios. They identify and articulate what is being evaluated in the writing, and offer "descriptors" to classify writing into certain categories (1-5, for instance, or A-F). Narrative rubrics and chart rubrics are the two most common forms. Here is an example of a narrative rubric, followed by a sketch of how a chart rubric might condense the same descriptors:
Example: Narrative Rubric for Inquiring into Family & Community History
An "A" project clearly and compellingly demonstrates how the public event influenced the family/community. It shows strong audience awareness, engaging readers throughout. The form and structure are appropriate for the purpose(s) and audience(s) of the piece. The final product is virtually error-free. The piece seamlessly weaves in several other voices, drawn from appropriate archival, secondary, and primary research. Drafts - at least two beyond the initial draft - show extensive, effective revision. Writer's notes and final learning letter demonstrate thoughtful reflection and growing awareness of writer's strengths and challenges.
A "B" project clearly and compellingly demonstrates how the public event influenced the family/community. It shows strong audience awareness, and usually engages readers. The form and structure are appropriate for the audience(s) and purpose(s) of the piece, though the organization may not be tight in a couple places. The final product includes a few errors, but these do no interfere with readers' comprehension. The piece effectively, if not always seamlessly, weaves several other voices, drawn from appropriate archival, secondary, and primary research. One area of research may not be as strong as the other two. Drafts - at least two beyond the initial drafts - show extensive, effective revision. Writer's notes and final learning letter demonstrate thoughtful reflection and growing awareness of writer's strengths and challenges.
A "C" project demonstrates how the public event influenced the family/community. It shows audience awareness, sometimes engaging readers. The form and structure are appropriate for the audience(s) and purpose(s), but the organization breaks down at times. The piece includes several, apparent errors, which at times compromises the clarity of the piece. The piece incorporates other voices, drawn from at least two kinds of research, but in a generally forced or awkward way. There is unevenness in the quality and appropriateness of the research. Drafts - at least one beyond the initial draft - show some evidence of revision. Writer's notes and final learning letter show some reflection and growth in awareness of writer's strengths and challenges.
A "D" project discusses a public event and a family/community, but the connections may not be clear. It shows little audience awareness. The form and structure is poorly chosen or poorly executed. The piece includes many errors, which regularly compromise the comprehensibility of the piece. There is an attempt to incorporate other voices, but this is done awkwardly or is drawn from incomplete or inappropriate research. There is little evidence of revision. Writer's notes and learning letter are missing or show little reflection or growth.
An "F" project is not responsive to the prompt. It shows little or no audience awareness. The purpose is unclear and the form and structure are poorly chosen and poorly executed. The piece includes many errors, compromising the clarity of the piece throughout. There is little or no evidence of research. There is little or no evidence of revision. Writer's notes and learning letter are missing or show no reflection or growth.

ASSESSING LISTENING AND ASSESSING READING

Assessing Listening Proficiency

You can use post-listening activities to check comprehension, evaluate listening skills and use of listening strategies, and extend the knowledge gained to other contexts. A post-listening activity may relate to a pre-listening activity, such as predicting; may expand on the topic or the language of the listening text; or may transfer what has been learned to reading, speaking, or writing activities.
In order to provide authentic assessment of students' listening proficiency, a post-listening activity must reflect the real-life uses to which students might put information they have gained through listening.
  • It must have a purpose other than assessment.
  • It must require students to demonstrate their level of listening comprehension by completing some task.
To develop authentic assessment activities, consider the type of response that listening to a particular selection would elicit in a non-classroom situation. For example, after listening to a weather report one might decide what to wear the next day; after listening to a set of instructions, one might repeat them to someone else; after watching and listening to a play or video, one might discuss the story line with friends.
Use this response type as a base for selecting appropriate post-listening tasks. You can then develop a checklist or rubric that will allow you to evaluate each student's comprehension of specific parts of the aural text. (See Assessing Learning for more on checklists and rubrics.)
For example, for listening practice you have students listen to a weather report. Their purpose for listening is to be able to advise a friend what to wear the next day. As a post-listening activity, you ask students to select appropriate items of clothing from a collection you have assembled, or write a note telling the friend what to wear, or provide oral advice to another student (who has not heard the weather report). To evaluate listening comprehension, you use a checklist containing specific features of the forecast, marking those that are reflected in the student's clothing recommendations.
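
To make the checklist idea concrete, here is a minimal sketch in Python (the forecast features, clothing items, and tallying scheme are illustrative assumptions, not from any published rubric): each forecast feature that is reflected in the student's recommendation counts toward the comprehension tally.

# A minimal sketch (hypothetical data): scoring a student's clothing
# recommendations against a checklist of forecast features.

forecast_features = {
    "rain expected": "raincoat or umbrella",
    "cold morning": "jacket or sweater",
    "warm afternoon": "light layers",
}

# Clothing the student selected or advised the "friend" to wear.
student_recommendation = {"raincoat or umbrella", "light layers"}

matched = [feature for feature, clothing in forecast_features.items()
           if clothing in student_recommendation]

print(f"Forecast features reflected: {len(matched)} of {len(forecast_features)}")
for feature in matched:
    print(f"  - {feature}")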




Assessing Reading Proficiency

Reading ability is very difficult to assess accurately. In the communicative competence model, a student's reading level is the level at which that student is able to use reading to accomplish communication goals. This means that assessment of reading ability needs to be correlated with purposes for reading.

Reading Aloud

A student's performance when reading aloud is not a reliable indicator of that student's reading ability. A student who is perfectly capable of understanding a given text when reading it silently may stumble when asked to combine comprehension with word recognition and speaking ability in the way that reading aloud requires.
In addition, reading aloud is a task that students will rarely, if ever, need to do outside of the classroom. As a method of assessment, therefore, it is not authentic: It does not test a student's ability to use reading to accomplish a purpose or goal.
However, reading aloud can help a teacher assess whether a student is "seeing" word endings and other grammatical features when reading. To use reading aloud for this purpose, adopt the "read and look up" approach: Ask the student to read a sentence silently one or more times, until comfortable with the content, then look up and tell you what it says. This procedure allows the student to process the text, and lets you see the results of that processing and know what elements, if any, the student is missing.

Comprehension Questions

Instructors often use comprehension questions to test whether students have understood what they have read. In order to test comprehension appropriately, these questions need to be coordinated with the purpose for reading. If the purpose is to find specific information, comprehension questions should focus on that information. If the purpose is to understand an opinion and the arguments that support it, comprehension questions should ask about those points.
In everyday reading situations, readers have a purpose for reading before they start. That is, they know what comprehension questions they are going to need to answer before they begin reading. To make reading assessment in the language classroom more like reading outside of the classroom, therefore, allow students to review the comprehension questions before they begin to read the test passage.
Finally, when the purpose for reading is enjoyment, comprehension questions are beside the point. As a more authentic form of assessment, have students talk or write about why they found the text enjoyable and interesting (or not).

Authentic Assessment

In order to provide authentic assessment of students' reading proficiency, a post-reading activity must reflect the real-life uses to which students might put information they have gained through reading.
  • It must have a purpose other than assessment.
  • It must require students to demonstrate their level of reading comprehension by completing some task
To develop authentic assessment activities, consider the type of response that reading a particular selection would elicit in a non-classroom situation. For example, after reading a weather report, one might decide what to wear the next day; after reading a set of instructions, one might repeat them to someone else; after reading a short story, one might discuss the story line with friends.
Use this response type as a base for selecting appropriate post-reading tasks. You can then develop a checklist or rubric that will allow you to evaluate each student's comprehension of specific parts of the text. See Assessing Learning for more on checklists and rubrics.


Monday, 14 November 2016

4. STANDARDIZED TESTING

WHAT IS STANDARDIZATION?
            A standardized test presupposes certain standard objectives, or criteria, that are held constant from one form of the test to another. The criteria in large-scale standardized tests are designed to apply to a broad band of competencies that are usually not exclusive to one particular curriculum. A good standardized test is the product of a thorough process of empirical research and development. It dictates standard procedures for administration and scoring, and it is typically a norm-referenced test, the goal of which is to place test-takers on a continuum across a range of scores and to differentiate test-takers by their relative ranking.
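
To make "placing test-takers on a continuum" concrete, the short Python sketch below (using hypothetical raw scores) converts each score into a standard (z) score and a percentile rank, two common norm-referenced reporting scales:

# A minimal sketch (hypothetical scores): norm-referenced reporting places
# each test-taker relative to the whole group, not against fixed criteria.

def percentile_rank(score, group):
    """Percentage of the norm group scoring below the given score."""
    below = sum(1 for s in group if s < score)
    return 100.0 * below / len(group)

scores = [38, 42, 45, 45, 50, 53, 57, 60, 66, 71]  # hypothetical raw scores

mean = sum(scores) / len(scores)
sd = (sum((s - mean) ** 2 for s in scores) / len(scores)) ** 0.5  # population SD

for s in sorted(set(scores)):
    z = (s - mean) / sd  # standard score: distance from the group mean, in SDs
    print(f"raw={s:3d}  z={z:+.2f}  percentile={percentile_rank(s, scores):5.1f}")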
            Examples of standardized tests in the United States include college entrance exams such as the Scholastic Aptitude Test (SAT®), which is part of the educational experience of many high school seniors seeking further education. The Graduate Record Exam (GRE®) is a required standardized test for entry into many graduate school programs. Tests like the Graduate Management Admission Test (GMAT) and the Law School Admission Test (LSAT) specialize in particular disciplines.

ADVANTAGES AND DISADVANTAGES OF STANDARDIZED TESTS
            Advantages of standardized tests include a ready-made, previously validated product that frees the teacher from having to spend hours creating a test. Administration to large groups can be accomplished within reasonable time limits. In the case of multiple-choice formats, scoring procedures are streamlined (for either scannable computerized scoring or hand-scoring with a hole-punched grid) for fast turnaround time. And, for better or worse, there is often an air of face validity to such authoritative-looking instruments.
            Disadvantages center largely on the inappropriate use of such tests, for example, using an overall proficiency test as an achievement test simply because of the convenience of the standardization. Another disadvantage is the potential misunderstanding of the difference between direct and indirect testing. Some standardized tests include tasks that do not directly test performance of the target objective.

DEVELOPING A STANDARDIZED TEST
            Three different standardized tests will be used to exemplify the process of standardized test design:
(A) The Test of English as a Foreign Language (TOEFL®), Educational Testing Service (ETS)
(B)  The English as a Second Language Placement Test (ESLPT), San Francisco State University (SFSU)
(C)  The Graduate Essay Test (GET), SFSU.
The first is a test of general language ability or proficiency. The second is a placement test at a university. And the third is a gate-keeping essay test that all prospective students must pass in order to take graduate-level courses.
  1. Determine The Purpose And Objectives Of The Test
Most standardized tests are expected to provide high practicality in administration and scoring without unduly compromising validity. The initial outlay of time and money for such a test is significant, but the test can then be used repeatedly. It is therefore important for its purpose and objectives to be stated specifically. Let’s look at the three tests.
(A)    The purpose of the TOEFL is “to evaluate English proficiency of people whose native language is not English” (TOEFL Test and Score Manual, 2001, p. 9). More specifically, the TOEFL is designed to help institutions of higher learning make “valid decisions concerning English language proficiency in terms of (their) own requirements” (p. 9). Most colleges and universities in the United States use TOEFL scores to admit or refuse international applicants.
(B)    The ESLPT is designed to place already admitted students at San Francisco State University in an appropriate course in academic writing, with the secondary goal of placing students into courses in oral production and grammar-editing. While the test’s primary purpose is to make placements, another desirable objective is to provide teachers with some diagnostic information about their students on the first day or two of class. The ESLPT is locally designed by university faculty and staff.
(C)    The GET is given to prospective graduate students – both native and non-native speakers – in all disciplines to determine whether their writing ability is sufficient to permit them to enter graduate-level courses in their program. It is offered at the beginning of each term. Students who fail or marginally pass the GET are technically ineligible to take graduate courses in their field. Instead, they may elect to take a course in graduate-level writing of research papers. A pass in that course is equivalent to passing the GET.
The objectives of each of these tests are specific. The content of each test must be designed to accomplish those particular ends. This first stage of goal-setting might be seen as one in which the consequential validity of the test is foremost in the mind of the developer: each test has a specific gate-keeping function to perform; therefore the criteria for entering those gates must be specified accurately.

  2. Design Test Specifications
Decisions need to be made on how to structure the specifications of the test. Before specs can be addressed, a comprehensive program of research must identify a set of constructs underlying the test itself. This stage of laying the foundation stones can occupy weeks, months, or even years of effort. Standardized tests that don’t work are often the product of short-sighted construct validation. Let’s look at the three tests again.
(A)    Construct validation for the TOEFL is carried out by the TOEFL staff at ETS under the guidance of a Policy Council that works with a Committee of Examiners composed of appointed external university faculty, linguists, and assessment specialists. Dozens of employees are involved in a complex process of reviewing current TOEFL specifications, commissioning and developing test tasks and items, assembling forms of the test, and performing ongoing exploratory research related to formulating new specs. Reducing such a complex process to a set of simple steps runs the risk of gross overgeneralization, but here is an idea of how TOEFL is created.
How you view language will make a difference in how you assess language proficiency. After breaking language competence down into subsets of listening, speaking, reading, and writing, each performance mode can be examined on a continuum of linguistic units: phonology (pronunciation) and orthography (spelling), words (lexicon), sentences (grammar), discourse, and pragmatic (sociolinguistic, contextual, functional, cultural) features of language.
(B)    Designing the test specs for the ESLPT was a somewhat simpler task because the purpose is placement, and the construct validation of the test consists of an examination of the content of the ESL courses. In fact, in a recent revision of the ESLPT (Imao et al., 2000; Imao, 2001), content validity (coupled with its attendant face validity) was the central theoretical issue to be considered. The major issue centered on designing practical and reliable tasks and item response formats. Having established the importance of designing ESLPT tasks that simulated classroom tasks used in the courses, the designers ultimately specified two writing production tasks (one a response to an essay that students read, and the other a summary of another essay) and one multiple-choice grammar-editing task. These specifications mirrored the reading-based, process-writing approach used in the courses.
(C)    Specifications for the GET arose out of the perceived need to provide a threshold of acceptable writing ability for all prospective graduate students at SFSU, both native and non-native speakers of English. The specifications for the GET are the skills of writing grammatically and rhetorically acceptable prose on a topic of some interest, with clearly produced organization of ideas and logical development. The GET is a direct test of writing ability in which test-takers must, in a two-hour time period, write an essay on a given topic.

  3. Design, Select, And Arrange Test Tasks/Items
Once specifications for a standardized test have been stipulated, the sometimes never-ending task of designing, selecting, and arranging items begins. The specs act much like a blueprint in determining the number and types of items to be created. Let’s look at three examples.
(A)    TOEFL test design specifies that each item be coded for content and statistical characteristics. Content coding ensures that each examinee will receive test questions that assess a variety of skills (reading and comprehending the main idea, or understanding inferences) and cover a variety of subject matter without unduly biasing the content toward a subset of test-takers (for example, in the listening section involving an academic lecture, the content must be universal enough for students from many different academic fields of study). Statistical characteristics, including the IRT equivalents of estimates of item facility (IF) and the ability of an item to discriminate (ID) between higher and lower ability levels, are also coded (a sketch of how classical IF and ID are computed appears below).
Items are then designed by a team who select and adapt items solicited from a bank of items that have been “deposited” by freelance writers and ETS staff. Passages for the reading section, for example, are usually excerpts from authentic general or academic reading that are edited for linguistic difficulty, cultural bias, or other topic biases. Items are designed to test overall comprehension, certain specific information, and inference.
(B)    The selection of items in the ESLPT entailed two entirely different processes. In the two subsections of the test that elicit writing performance (summary of reading; response to reading), the main hurdles were (a) selecting appropriate passages for test-takers to read, (b) providing appropriate prompts, and (c) processing data from pilot testing.
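
As background on the item statistics mentioned under (A): in their classical (non-IRT) form, IF and ID are simple proportions. Below is a minimal Python sketch with hypothetical response data; IF is the proportion of the whole group answering an item correctly, and ID is the difference in that proportion between the top- and bottom-scoring thirds of the group.

# A minimal sketch (hypothetical data) of classical item facility (IF)
# and item discrimination (ID).

# Rows: test-takers, sorted by total score (highest first);
# columns: items (1 = correct, 0 = incorrect).
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 1, 1, 0],
    [1, 0, 0, 1],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
]

n = len(responses)
third = n // 3
high, low = responses[:third], responses[-third:]  # top and bottom thirds

for item in range(len(responses[0])):
    if_all = sum(row[item] for row in responses) / n
    id_val = (sum(row[item] for row in high) / third
              - sum(row[item] for row in low) / third)
    print(f"item {item + 1}: IF = {if_all:.2f}, ID = {id_val:+.2f}")

An item with IF near 1.0 is very easy for the group; an ID near +1.0 means the item separates high and low scorers well, while an ID near zero (or negative) flags an item worth revising.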
  4. Make Appropriate Evaluations Of Different Kinds Of Items
  5. Specify Scoring Procedures And Reporting Formats
  6. Perform Ongoing Construct Validation Studies

STANDARDIZED LANGUAGE PROFICIENCY TESTING
            Tests of language proficiency presuppose a comprehensive definition of the specific competencies that comprise overall language ability. The specifications for the TOEFL provided an illustration of an operational definition of ability for assessment purposes. Swain (1990) described linguistic traits (grammar, discourse, and sociolinguistics) that can be assessed by means of oral, multiple-choice, and written responses. Swain’s conception was not meant to be an exhaustive analysis of ability, but rather to serve as an operational framework for constructing proficiency assessments.
Another definition and conceptualization of proficiency is suggested by ACTFL (the American Council on the Teaching of Foreign Languages). ACTFL takes a holistic and more unitary view of proficiency, describing four levels: superior, advanced, intermediate, and novice. Within each level, descriptions of listening, speaking, reading, and writing are provided as guidelines for assessment.

FOUR STANDARDIZED LANGUAGE PROFICIENCY TESTS
            Use the following questions to help evaluate the four tests listed below and their subsections:
  1. What item types are included?
  2. How practical and reliable does each subsection of each test appear to be?
  3. Do the item types and tasks appropriately represent a conceptualization of language proficiency (ability)? Can we evaluate their construct validity?
  4. Are the tasks authentic?
  5. Is there some washback potential in the tasks?
Test of English as a Foreign Language (TOEFL®)
Michigan English Language Assessment Battery (MELAB)
International English Language Testing System (IELTS)
Test of English for International Communication (TOEIC®)


