Reliability and Validity of the DISCOVER Assessment 

     This section summarizes the studies completed to establish the reliability and validity of the DISCOVER Assessment.  To accommodate non-technical readers, terms and procedures familiar to researchers are explained in plain language, and some conclusions are generalized from the underlying data.  Researchers interested in the statistical particulars of individual studies can click the “Details” graphic where available.

     The DISCOVER Assessment was developed and refined over a 13-year period with support from the Office of Bilingual Education and Minority Languages Affairs and the Javits Gifted and Talented Education Program.  It has been used with diverse multicultural populations in the United States and abroad, and with students from a range of economic levels.  To learn more, click DISCOVER Assessment.  To learn more about proposed future development of the Assessment, click DISCOVER—Personalized Education—The Assessment.

     Reliability:  Reliability is the degree to which an instrument produces consistent results.  For example, a highly reliable instrument yields similar results when it is administered under similar circumstances by different individuals.  Reliability studies of the DISCOVER Assessment examined consistency in several categories, comparing administrations by DISCOVER staff with administrations by non-DISCOVER staff, with Observers in both groups at varying levels of experience.

     The studies focused on Observers, the trained individuals who observe students' problem-solving strategies during the Assessment.  We wanted to know whether different Observers, rating independently, would reach the same conclusions after viewing the same Assessment.  As it turned out, there was very little difference between DISCOVER staff and trained non-DISCOVER personnel, but agreement varied considerably with level of experience.

     Observers were categorized as Novice (fewer than 10 Assessments observed), Experienced (10 to 29 Assessments), or Expert (30 or more Assessments).  Agreement between Novice Observers varied widely, ranging from 47% to 92%, whereas agreement between Expert Observers ranged from 92% to 100%.  Interestingly, experience did not appear to affect agreement on the “Definitely a Superior Problem Solver” rating, the rating most schools use as the criterion for placement in special programs.  Observers at all experience levels agreed on this rating 95% of the time.
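
     The agreement figures above are percentages of cases in which two Observers gave the same rating.  The Python sketch below shows one straightforward way to compute simple percent agreement, together with the experience bands just described.  The function names and every rating label other than “Definitely” are hypothetical placeholders, and the DISCOVER studies may have computed agreement differently.

```python
from typing import Sequence


def experience_level(assessments_observed: int) -> str:
    """Return an Observer's experience band using the cutoffs given in the text."""
    if assessments_observed < 0:
        raise ValueError("count cannot be negative")
    if assessments_observed < 10:
        return "Novice"        # fewer than 10 Assessments observed
    if assessments_observed < 30:
        return "Experienced"   # 10 to 29 Assessments
    return "Expert"            # 30 or more Assessments


def percent_agreement(ratings_a: Sequence[str], ratings_b: Sequence[str]) -> float:
    """Percentage of students on whom two Observers gave exactly the same rating."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists aligned by student")
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return 100.0 * matches / len(ratings_a)


# Hypothetical ratings of the same four students by two Observers.
# Only "Definitely" comes from the text; "Probably" and "Maybe" are placeholders.
observer_1 = ["Definitely", "Probably", "Maybe", "Definitely"]
observer_2 = ["Definitely", "Maybe", "Maybe", "Definitely"]

print(experience_level(12))                       # Experienced
print(percent_agreement(observer_1, observer_2))  # 75.0
```

     Simple percent agreement does not correct for agreement expected by chance; statistics such as Cohen's kappa make that correction.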

     Recent studies, spanning all levels of experience, have found that agreement among DISCOVER personnel averages 81%, with 100% agreement on the “Definitely” rating.  Overall agreement between DISCOVER personnel and school district teams averaged 85%, and overall agreement among members of the district teams was 82%.
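
     These averages combine agreement across many pairs of raters.  As a minimal sketch, assuming each rater's ratings are stored as a list in the same student order, the code below averages simple percent agreement over every pair of raters within one team, and over every cross-team pair (one rater from each team).  The team names, rater names, rating labels other than “Definitely”, and helper functions are all hypothetical; the exact averaging procedure used in the studies is not stated here.

```python
from __future__ import annotations

from itertools import combinations, product
from statistics import mean


def percent_agreement(a: list[str], b: list[str]) -> float:
    """Percentage of students on whom two raters gave the same rating."""
    if len(a) != len(b) or not a:
        raise ValueError("ratings must be non-empty and aligned by student")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def within_team_agreement(team: dict[str, list[str]]) -> float:
    """Mean percent agreement over every pair of raters in one team."""
    pairs = list(combinations(team.values(), 2))
    if not pairs:
        raise ValueError("need at least two raters")
    return mean(percent_agreement(a, b) for a, b in pairs)


def between_team_agreement(team_x: dict[str, list[str]],
                           team_y: dict[str, list[str]]) -> float:
    """Mean percent agreement over every (team_x rater, team_y rater) pair."""
    pairs = list(product(team_x.values(), team_y.values()))
    if not pairs:
        raise ValueError("both teams need at least one rater")
    return mean(percent_agreement(a, b) for a, b in pairs)


# Hypothetical ratings for four students, listed in the same order for every rater.
discover_staff = {
    "staff_1": ["Definitely", "Probably", "Maybe", "Definitely"],
    "staff_2": ["Definitely", "Maybe", "Maybe", "Definitely"],
}
district_team = {
    "teacher_1": ["Definitely", "Probably", "Maybe", "Maybe"],
    "teacher_2": ["Definitely", "Probably", "Probably", "Definitely"],
}

print(within_team_agreement(discover_staff))                  # 75.0
print(between_team_agreement(discover_staff, district_team))  # 62.5
```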

     Because reliability depends in part on Observer experience, we stress the importance of sufficient training and practice for new Observers, as well as regular use of the skills learned.  We now also require that Observers be certified by DISCOVER trainers and that they complete recertification and supplemental training each year.

     Validity:  Validity, in a broad sense, is the extent to which an instrument actually measures what it is intended to measure.  Several types of validity exist; here we focus on three as they relate to the DISCOVER Assessment:  Theoretical, Concurrent, and Predictive.

     Theoretical Validity addresses the extent to which results align with what the underlying theory predicts.  Dr. Maker built into the design of DISCOVER the belief that all racial groups have roughly equal percentages of exceptional, or “gifted,” individuals: not necessarily gifted in the same ways but, when the different intelligences are considered, gifted in equal proportions.  Therefore, if a school were to use the DISCOVER Assessment as a placement mechanism for gifted programs, one would expect the ethnic composition of those programs to approximate the overall ethnic composition of the school.  This is, in fact, what occurs: the percentage of students who receive the highest ratings (relative to their group size) is similar across ethnic, cultural, language, and economic groups (Nielson, 1994; Maker, 1997; Sarouphim, 1999).
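
     The expectation described above can be checked with simple arithmetic: compute, for each group, the share of its students who receive the top rating, and compare each group's share of the top-rated students with its share of the whole school.  A ratio near 1.0 means the group is represented in proportion to its size.  The sketch below is illustrative only; the group labels, the data, the “Other” label, and the choice of ratio are assumptions, not the method used in the cited studies.

```python
from collections import Counter


def top_rating_rates(records, top_label="Definitely"):
    """For each group, the fraction of its students who received top_label."""
    sizes = Counter(group for group, _ in records)
    top = Counter(group for group, rating in records if rating == top_label)
    return {group: top[group] / n for group, n in sizes.items()}


def representation_ratios(records, top_label="Definitely"):
    """Each group's share of top-rated students divided by its share of all students.

    1.0 means proportional representation; above 1.0 means over-representation.
    """
    total = len(records)
    sizes = Counter(group for group, _ in records)
    top = Counter(group for group, rating in records if rating == top_label)
    n_top = sum(top.values())
    if n_top == 0:
        raise ValueError("no student received the top rating")
    return {group: (top[group] / n_top) / (n / total) for group, n in sizes.items()}


# Hypothetical school of 30 students as (group, rating) pairs.
records = (
    [("Group A", "Definitely")] * 2 + [("Group A", "Other")] * 8
    + [("Group B", "Definitely")] * 4 + [("Group B", "Other")] * 16
)

print(top_rating_rates(records))  # {'Group A': 0.2, 'Group B': 0.2}
print(representation_ratios(records))
```

     When every group's top-rating rate is the same, every representation ratio works out to 1.0, which is the pattern the theory predicts.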

     Concurrent Validity compares a new instrument with more established instruments intended to measure the same things.  Concurrent validity has proven difficult to pin down for the Assessment because few similar instruments exist.  As discussed in the Problem Solving section, most other tests look only at problem types one and two, whereas DISCOVER collects information on all five problem types, creating a proverbial “apples and oranges” comparison.  Nevertheless, we would expect some degree of correlation between certain aspects of the Assessment and other established instruments, and with a couple of exceptions, this has been the case.  We are using these studies to fine-tune Assessment procedures so as to raise (or lower, as the case may be) these correlations.  Click the “Details” button for research data.
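
     Correlations of this kind are commonly summarized with the Pearson correlation coefficient, r, which ranges from -1 to 1.  The sketch below computes r for paired scores from two instruments using only the Python standard library.  The score lists are invented for illustration, and this section does not state whether the DISCOVER studies used Pearson's r or another coefficient.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of paired scores."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two pairs of scores")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one set of scores is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical paired scores for five students: a score derived from the
# Assessment and a score on an established instrument.
assessment_scores = [3, 5, 2, 4, 1]
established_scores = [60, 82, 55, 70, 50]

print(round(pearson_r(assessment_scores, established_scores), 3))
```

     Note that r captures only linear association, so a modest r between instruments that measure partly different things does not by itself show that either instrument is flawed.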

     Predictive Validity addresses whether a test can predict who will do well at certain activities, and whether similar results will be obtained next year or five years from now.  This type of validity is also difficult to determine, because of the dynamics of intellectual growth.  The strengths measured by DISCOVER are not fixed in an individual; these abilities can increase or decrease depending on how much they are developed.  As a result, ratings can fluctuate from year to year, and it is difficult to determine whether a given fluctuation reflects a change in the individual or a property of the Assessment design.  We are currently analyzing an 8-year longitudinal study to isolate these patterns as far as possible.  So far, fluctuations have been relatively small, but they still warrant attention.

     Another completed study compared results over a three-year period.  Romanoff (1999) examined a problem-solving assessment (PSA) containing three of the DISCOVER Assessment activities.  The PSA was used with students whom their teachers referred as showing promise; students who were referred, tested, and selected were compared with those who were referred, tested, and not selected.  On the North Carolina End-of-Grade tests, reading and math scores averaged across grades 3, 4, and 5 were significantly higher for all students identified as gifted (M = 84.03) than for those not identified (M = 58.37).  She also found that the differences between gifted African American and Caucasian students were smaller than those between non-gifted African American and Caucasian students.
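
     Romanoff's comparison rests on two steps: averaging each student's scores across grades 3, 4, and 5, and testing whether the group means differ.  This section does not say which significance test was used; the sketch below uses Welch's t statistic, one common choice for comparing two independent group means, purely as an assumption.  The scores are invented and are not data from the study, and the function names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def student_average(scores_by_grade):
    """Average one student's scores across the grades provided (e.g. 3, 4, 5)."""
    return mean(list(scores_by_grade.values()))


def welch_t(group_a, group_b):
    """Welch's t statistic for the difference in means of two independent groups."""
    if len(group_a) < 2 or len(group_b) < 2:
        raise ValueError("each group needs at least two students")
    # statistics.variance uses the sample (n - 1) denominator.
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se


# Invented grade-level scores (grade -> score) for a handful of students.
selected = [student_average(s) for s in (
    {3: 78, 4: 80, 5: 82}, {3: 84, 4: 82, 5: 86}, {3: 88, 4: 86, 5: 90})]
not_selected = [student_average(s) for s in (
    {3: 54, 4: 56, 5: 58}, {3: 56, 4: 58, 5: 60}, {3: 60, 4: 62, 5: 58})]

print(round(welch_t(selected, not_selected), 2))  # 10.07
```

     A t statistic alone is not a p-value; converting it requires the t distribution with Welch-Satterthwaite degrees of freedom, which a statistics library would normally handle.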
