DEVELOPING AND VALIDATING EFL READING TEST

Received June 9, 2020; Revised Oct 7, 2020; Accepted Dec 30, 2020
EFL students need to learn reading comprehension because it is one of the essential abilities in learning English. To measure students' reading comprehension, a reading test is needed. Nevertheless, many students have difficulty completing reading tests. This happens because the tests are written by teachers who do not analyze item difficulty, item discrimination, distractors, and reliability. These analyses are needed to show that a reading test meets the criteria of a good test, and those criteria matter because the reading test is used to measure the achievement of EFL students. This study aims to develop a reading test based on the Indonesia National Curriculum basic competence, learning indicators, and English lessons, and to examine whether it meets the criteria of a good test. The researcher adapted Borg and Gall's (1983) theory in developing the reading test.

test (Emaliana et al., 2019). The criteria of a good test are needed because the reading test is used for measuring students' achievement. A good reading test is created by considering the criteria of a language test. There are five criteria for a good test. The first criterion is validity: the reading test should measure what it is supposed to measure (Heaton, 1990). Validity means that the conclusions drawn from the test results are appropriate, meaningful, and useful in terms of the purpose of the assessment. The validity of a test is the extent to which it measures exactly what it is supposed to measure (Heaton, 1990). A test should aim to provide a true measure of the particular skill it is intended to measure, and should not measure external knowledge and other skills at the same time.
The second criterion is reliability: the reading test should be a reliable measuring instrument, which is verified by the consistency of the test results (Hughes, 2003). Reliability means the ability of an assessment tool to produce stable and consistent results (Hughes, 2003). A test is reliable if, when the teacher gives the same test to the same student on two different occasions, it produces similar results. Therefore, reliability refers to the consistency of the scores obtained by students.
The third criterion is discrimination: the reading test should discriminate between students' performances (Heaton, 1990). Heaton (1990) stated that a test must have the capacity to discriminate individual understanding within the learning group. Heaton (1990) divided the difficulty of test items into ten levels: extremely easy items, very easy items, fairly easy items, items below average difficulty, items of average difficulty, items above average difficulty, fairly difficult items, difficult items, very difficult items, and extremely difficult items.
The fourth criterion is practicality: the reading test should be practical and efficient to administer (Brown, 2004). This means that the test is not excessively expensive, stays within appropriate time constraints, is relatively easy to administer, and has a scoring or evaluation procedure that is specific and time-efficient (Brown, 2004). Furthermore, the details of the test administration must be clearly defined before the test, and all materials and equipment for the test must be ready. Moreover, the test costs must be within the budgeted limit, and the scoring or evaluation system must fit within the teacher's time frame. Students must be able to complete the test naturally within the specified time.
Practicality can be simply defined as the relationship between the resources available for the test (human resources, material resources, and time) and the resources required in the design, development, and use of the test.
The last criterion is authenticity: the reading test should be authentic (Brown, 2004). Authenticity is the degree of correspondence of the characteristics of a given language test task to the features of a target language task (Brown, 2004). Authenticity may be present in a test in the following ways: the language in the test is as natural as possible; items are contextualized rather than isolated; topics are meaningful (relevant, interesting) to the learners; some thematic organization of items is provided, such as through a story or episode; and tasks represent, or closely approximate, real-world tasks.
In reality, based on the need analysis conducted by the researcher through an interview with an English teacher at SMAK Frateran Malang, the main problem was the process of making the reading test. In making the reading test, the teacher had not analyzed the item difficulty, item discrimination, distractor effectiveness, and reliability of the reading test because of time limitations. However, these analyses are needed because the reading test is used for measuring the achievement of grade ten students.
Considering the existing problem of having no standardized English reading test for tenth grade students, the researcher designed and developed a reading test that emphasizes the criteria of a good reading test. The researcher adopted the theory from

Participants
Participants were 27 senior high school students in Malang. The students were 15 or 16 years old. They were at the intermediate level, with TOEFL scores of 425-450.

Procedure
In the procedure of Research and Development, the researcher applied the cycles proposed by

Need Analysis
Need analysis, as the starting point of research and development, is done in order to find the discrepancy between the factual condition and a desired set of conditions (Borg & Gall, 2003). This step is crucial since its result becomes the foundation of the whole research. The information needed in a research and development design can be collected in numerous ways, such as interviews, questionnaires, observation, data collection, informal consultation with practitioners or experts, and others. In this research, the data for the need analysis were obtained through an interview.
The interview with the teacher was intended to gather information about the current reading test and the expected product. The interview with the English teacher was conducted using an interview guideline. From the results of the interview, the researcher formed the concept for planning the reading test.

Planning
The researcher planned the reading test specification and blueprint. First, the researcher identified the objectives of the reading test. Identifying the objectives of the course is important to ensure the content validity of the test, because the test should meet the objectives of the course. Therefore, the researcher had to identify the course objectives and then determine the objective of the test, the micro skills tested, and the types of texts. The course objectives were designed to improve the students' reading skills. The specific instructional objectives evaluate students' understanding of the topic, main ideas, and word meanings. Second, the researcher determined the type of test, the number of texts, the number of items, the time allotted, the equipment involved, and the scoring method. In scoring, a correct answer gets a score of 1, while a wrong answer gets a score of 0. Last, the researcher designed the blueprint and wrote indicators based on the Indonesia National Curriculum (Curriculum 2013) basic competences number 3.9 and 4.13, which suit the needs of the students.
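The dichotomous scoring rule described above (1 for a correct answer, 0 for a wrong one) can be illustrated with a minimal sketch. The answer key, the student identifiers, and the function name `score_responses` are hypothetical and are not taken from the study; a blank answer is assumed to count as wrong.

```python
from typing import Dict, List


def score_responses(answer_key: List[str],
                    responses: Dict[str, List[str]]) -> Dict[str, List[int]]:
    """Return a 0/1 vector per student: 1 for a correct answer, 0 otherwise."""
    scored = {}
    for student, answers in responses.items():
        if len(answers) != len(answer_key):
            raise ValueError(
                f"{student}: expected {len(answer_key)} answers, got {len(answers)}"
            )
        scored[student] = [
            1 if given == key else 0 for given, key in zip(answers, answer_key)
        ]
    return scored


# Hypothetical five-item key and two hypothetical students.
answer_key = ["A", "C", "E", "B", "D"]
responses = {
    "student_01": ["A", "C", "E", "B", "D"],
    "student_02": ["A", "B", "E", "", "D"],  # blank answer is scored 0 (assumption)
}

scored = score_responses(answer_key, responses)
totals = {student: sum(vector) for student, vector in scored.items()}
```

The resulting 0/1 vectors are the usual input for item difficulty, discrimination, and reliability analyses.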

Writing the Test
After making the test specification and blueprint, the next step was writing the reading test, which covered writing the instructions and the items. The instructions have to be clear so that the students are not confused when reading them. The question items are based on reading micro skills. The test items were distributed as follows: fourteen implied detail questions, five word meaning questions, and one main idea question.

Expert Validation
The reading test was validated by one lecturer in the English Language Teaching Program at Universitas Brawijaya. The validation checklist had ten items: the construct of the texts, the content of the texts, the length of the texts, the difficulty of the texts, the number of items, the number of distractors, the micro skills, the questions, the directions, and the time allotment. She judged the reading test appropriate for use, with minor revisions to word choice and grammar.

First Revision
The researcher received feedback on whether the reading test needed to be revised. The main purpose of the revision was to improve the reading test before administering it to the students. Based on that evaluation, the researcher revised the word choices and grammar of the sentences in the reading test. After the revision, the reading test was ready to be used with the students.

Main Field Testing
Due to the limited amount of time, the tryout was conducted only once in the main field testing. At the opening of the activity, the researcher informed the students of the purpose of the tryout. During the tryout, the researcher asked the students to complete the reading test. They were asked to take out a pen and put away their English textbooks and dictionaries. The students were permitted to ask questions if they encountered a problem while working on the reading test. At the end of the activity, the students were asked to hand in the reading test.

Analyzing the Results of the Main Field Testing
Item Difficulty Analysis
The researcher used the classification from Djiwandono (1996) to determine the difficulty level of the reading test. The classification is as follows:
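Before any classification can be applied, a difficulty index has to be computed for each item. The sketch below assumes the conventional index, the proportion of students who answer the item correctly; the source does not state the formula it used, and the Djiwandono (1996) categories are not reproduced here. The score matrix and the function name `item_difficulty` are hypothetical.

```python
from typing import List


def item_difficulty(score_matrix: List[List[int]]) -> List[float]:
    """Proportion of students answering each item correctly (the p-value)."""
    if not score_matrix:
        raise ValueError("score_matrix must contain at least one student")
    n_students = len(score_matrix)
    n_items = len(score_matrix[0])
    return [
        sum(row[i] for row in score_matrix) / n_students for i in range(n_items)
    ]


# Hypothetical 0/1 scores: 4 students (rows) x 3 items (columns).
scores = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 1, 0],
]

p_values = item_difficulty(scores)
```

A higher value means that more students answered the item correctly, so the item is easier.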

Item Discrimination Analysis
The discrimination of the test items tells how well the items in the test separate the higher group from the lower group. The researcher used the classification from Djiwandono (1996) to obtain the discrimination score of the reading test. The classification is as follows:
11, 12, 13, 14 and 20. The score is higher than 0.400.
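The idea of separating a higher group from a lower group can be sketched with the upper-lower discrimination index. This is an assumption about the kind of index involved, since the source does not give its formula or say how the two groups were formed; here each group is a fixed share of the students ranked by total score. The data and the names `discrimination_index` and `fraction` are hypothetical.

```python
from typing import List


def discrimination_index(score_matrix: List[List[int]],
                         fraction: float = 0.5) -> List[float]:
    """Upper-lower discrimination index D for each item.

    D = (correct in upper group - correct in lower group) / group size.
    `fraction` is the share of students placed in each group; the value
    is an assumption, not a setting reported in the source.
    """
    n_students = len(score_matrix)
    group_size = max(1, int(n_students * fraction))
    if 2 * group_size > n_students:
        raise ValueError("upper and lower groups would overlap")
    # Rank students by total score, highest first (ties keep input order).
    ranked = sorted(score_matrix, key=sum, reverse=True)
    upper, lower = ranked[:group_size], ranked[-group_size:]
    n_items = len(score_matrix[0])
    return [
        (sum(row[i] for row in upper) - sum(row[i] for row in lower)) / group_size
        for i in range(n_items)
    ]


# Hypothetical 0/1 scores: 4 students (rows) x 3 items (columns).
scores = [
    [1, 1, 1],
    [1, 1, 0],
    [0, 1, 0],
    [0, 0, 0],
]

d_values = discrimination_index(scores)
```

An index near 1 means that the item is answered correctly mostly by the higher group, while an index near 0 means that the item does not separate the two groups.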

Effective Distractors Analysis
The data show that ten items of the reading test have effective distractors and the other ten items have ineffective distractors. The researcher used the classification from Djiwandono (1996) to obtain the distractor score of the reading test. The classification is as follows:
The researcher used the classification from Djiwandono (1996) to obtain the reliability score of the reading test. The classification is as follows:
Based on the Item and Test Analysis Program (ITEMAN) version 4.3 results, the reliability score of the reading test is 0.603. This score falls in the range from 0.401 to 0.700, which means that the reliability is moderate. Based on this score, it can be concluded that the reading test is reliable.
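The two analyses in this part can be sketched as follows. The first function tallies how often each option of one item is chosen, which is the raw information behind any judgment of distractor effectiveness. The second computes the Kuder-Richardson 20 (KR-20) coefficient for 0/1 scored items. It is an assumption that the reliability reported by ITEMAN corresponds to this formula, and the criterion for an effective distractor used in the study is not reproduced. The five options A to E follow the option labels mentioned in the revision notes; all data and function names are hypothetical.

```python
from collections import Counter
from statistics import pvariance
from typing import Dict, List


def option_counts(chosen: List[str], options: str = "ABCDE") -> Dict[str, int]:
    """Count how many students chose each option of one item."""
    counts = Counter(chosen)
    return {option: counts.get(option, 0) for option in options}


def kr20(score_matrix: List[List[int]]) -> float:
    """Kuder-Richardson 20 reliability for dichotomously (0/1) scored items."""
    n_students = len(score_matrix)
    n_items = len(score_matrix[0])
    totals = [sum(row) for row in score_matrix]
    total_variance = pvariance(totals)  # population variance of total scores
    if n_items < 2 or total_variance == 0:
        raise ValueError("KR-20 needs at least two items and non-zero variance")
    p = [sum(row[i] for row in score_matrix) / n_students for i in range(n_items)]
    sum_pq = sum(p_i * (1 - p_i) for p_i in p)
    return (n_items / (n_items - 1)) * (1 - sum_pq / total_variance)


# Hypothetical choices of six students on one item (blank = no answer).
counts = option_counts(["A", "C", "C", "E", "C", ""])

# Hypothetical 0/1 scores: 4 students (rows) x 3 items (columns).
scores = [
    [1, 1, 1],
    [1, 1, 0],
    [0, 1, 0],
    [0, 0, 0],
]
reliability = kr20(scores)
```

In the option tally, an option that no student chooses (here B and D) attracts nobody and is therefore a candidate for revision.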

Final Revision
The researcher revised the test items based on the results of the item analysis because the item difficulty, item discrimination, and distractors did not meet the required standard. Based on the item analysis results, three items can be used directly, fourteen items have to be revised, and three items have to be replaced with new items. The revised items are those with a low level of discrimination or poorly functioning distractors. The level of difficulty was not revised because it already met the test difficulty curve. The items are treated as follows. Item 1 has to be revised in terms of the question. Item 2 has to be revised in terms of the question and option E. Item 3 has to be revised in terms of the question and option C. Item 4 is replaced by a new item. Item 5 is replaced by a new item. Item 6 has to be revised in terms of the question and option D. Item 7 is replaced by a new item. Item 8 has to be revised in terms of the question. Item 9 has to be revised in terms of the question and options D and E. Item 10 has to be revised in terms of the question. Item 11 has to be revised in terms of option E. Item 12 has to be revised in terms of option D. Item 13 has to be revised in terms of options D and E. Item 14 has to be revised in terms of option E. Item 15 can be used directly. Item 16 can be used directly. Item 17 has to be revised in terms of the question and option C. Item 18 has to be revised in terms of the question. Item 19 has to be revised in terms of the question and option E. Item 20 can be used directly.

Final Reading Test
After all the development processes were completed, the final reading test was produced. The final reading test consists of 20 items that focus on implied details, main idea, and word meaning. Furthermore, the final reading test was created by considering the criteria of a language test: validity, reliability, discrimination, practicality, and authenticity (Heaton, 1990; Hughes, 2003; Brown, 2004). Moreover, the final reading test has been evaluated based on theories from Emaliana et al. (2019) and Djiwandono (1996). It can be concluded that the final reading test, as the product of this research, is ready to be used for testing the English reading of EFL students.

Discussion
Generally, the development and validation of the reading test items appear to be feasible. Novice writers of reading test items will find the development and validation procedures intuitive and easy to use. Starting by deciding on the test specification, especially the test blueprint, is useful for structuring the writer's plan for developing the reading test. The test writer produced 35 items in roughly 3 days of working time at home, plus 6 hours of meeting review time. After the validation, some of the items were not usable; therefore, the final test consists of 20 items. The results of this development and validation are encouraging, although they are limited by several factors. First, the tryout was done only once, after the test items had been written and reviewed by the language assessment expert, because of limited time and the permission granted by the school. Second, the test was developed as a formative rather than a summative test; future researchers need to consider this in their research. Finally, judgments about the quality of the selected test items rest on a single measurement because the tryout was done only once.

Conclusion
As a Research and Development study, this study develops a reading test for EFL students. The reading test was designed through the stages of development proposed by Borg and Gall (2003). The stages are conducting a need analysis, planning, developing a preliminary form of the product, preliminary field testing, main product revision, main field testing, operational product revision, and final product.
The reading test was designed by referring to the Indonesia National Curriculum (Curriculum 2013) basic competences number 3.9 and 4.13. Furthermore, the reading test was created by considering the criteria of a language test: validity, reliability, discrimination, practicality, and authenticity (Heaton, 1990; Hughes, 2003; Brown, 2004). Moreover, the reading test has been evaluated based on theories from Emaliana et al. (2019) and Djiwandono (1996). It can be concluded that the reading test, as the product of this research, is ready to be used for testing the English reading of EFL students.
Once finalized, the reading test is ready to be used for testing reading comprehension. For teachers, this reading test is expected to help them find a reading test that conforms to the standards of a good reading test for testing reading comprehension. For further development, it is hoped that the reading test can be developed for other materials, with various other techniques, and for other schools. The material can be changed according to the different competences to be learned based on the syllabus.