Originally Authored by Jacobus Donders, Brainne Elzinga, David Kuipers, Emily Helder, & John R. Crawford in Child Neuropsychology
This study evaluated the degree to which an 8-subtest short form of the Wechsler Intelligence Scale for Children—Fourth Edition would yield acceptable estimates of the long-form Full-Scale IQ index while clarifying the underlying factor structure in a sample of 100 children and adolescents with traumatic brain injury. The short-form Full-Scale IQ had sufficient (i.e., at least two thirds) nonerror covariance with its full-length counterpart. In addition, a sufficient proportion (i.e., > 80%) of these short-form estimates fell within the 90% confidence interval of the respective full-length scores. Importantly, the elimination of 2 subtests, and in particular the Picture Concepts subtest, resulted in a factor structure where each remaining subtest was fairly specifically associated with its intended scale. It is concluded that this short form can be used clinically in children with traumatic brain injury without sacrificing reliability and with more straightforward interpretability at the level of the factor index scores.
Keywords: Assessment; Intelligence; Psychometrics; Short form
The Wechsler Intelligence Scale for Children—Fourth Edition (WISC-IV; Wechsler, 2003) is a widely used test of general intelligence with excellent psychometric properties. The four-factor structure of this instrument has been replicated in various clinical samples, although it has also been pointed out that the influence of general ability level or g should not be underestimated when interpreting the factor index scores (Bodin, Pardini, Burns, & Stevens, 2009; Watkins, 2010). Previous research (Allen, Thaler, Donohue, & Mayfield, 2010; Donders & Janke, 2008) has provided support for the criterion validity of the Processing Speed index of this instrument as being sensitive to the effects of traumatic brain injury (TBI) in children and adolescents.
One practical concern about the test pertains to lengthy administration duration (75–90 minutes), which can become problematic when considering issues ranging from patient fatigue to time constraints. This would not be a problem if a shorter IQ test were used, like the Wechsler Abbreviated Scale of Intelligence—Second Edition (Wechsler, 2011), but that would mean sacrificing the Working Memory and Processing Speed subtests, which is not desirable when working with pediatric populations where those constructs are of particular interest. An at least equally important concern is that some of the WISC-IV subtests appear to be quite insensitive to severity of TBI, which may lead to an overestimate of children’s true ability levels postinjury. For example, Donders and Janke (2008) found that the Picture Concepts subtest actually yielded higher mean scores in children with moderate-to-severe TBI than in demographically matched controls. This raised concern about the interpretability of the Perceptual Reasoning index in such children.
Research with an earlier version of this test demonstrated that a short form, based on fewer subtests but maintaining the overall structure, could be applied reliably and validly after pediatric TBI (Donders & Warschausky, 1996). Until recently, such a short form was not available for the WISC-IV. However, Crawford, Anderson, Rankin, and MacDonald (2010) developed a short form of the WISC-IV that was based on 7 instead of 10 subtests while preserving the four-index structure of the instrument. This short form had strong psychometric properties, including reliabilities ranging from .87 to .96, and uncorrected correlations with full-length indices ranging from .86 to .99. In addition, an attractive feature of this short form was that the authors provided confidence intervals, actuarial information about abnormality of differences, and the base rates for specific numbers of abnormal scores. At the same time, a potential concern about this short form was that the Working Memory index was estimated on the basis of only one subtest (Digit Span), which could potentially compromise the reliability of that short-form index in clinical samples.
Crawford and colleagues (2010) developed their short form utilizing the standardization sample of the WISC-IV but it has not yet been validated in specific clinical samples. Given the recommendation that new psychometric instruments should be validated for use with specific populations (American Educational Research Association, American Psychological Association, & National Council on Measurement in Education, 1999), the first goal of the current study was to evaluate the potential clinical utility of the Crawford et al. short form of the WISC-IV in children with TBI. A second goal was to see to what extent modifying this 7-subtest short form to an 8-subtest one (i.e., with two subtests for each of the four indices) would improve (a) the clinical utility of the Full-Scale IQ estimate and (b) the interpretability of the underlying factor structure. Full-Scale IQ was considered to be important because of the known strong influence of g in WISC-IV data whereas clarification of the factor structure was considered pivotal to interpreting the various factor indices with sufficient confidence.
Consistent with previous research (Donders & Axelrod, 2002), which followed recommendations from Nunnally (1978, pp. 245–246) and Kaufman (1994, p. 99), the following a priori criteria were set to evaluate whether the short-form Full-Scale IQ estimate would be acceptable for clinical application: (a) Short- and long-form Full-Scale IQ indices should share at least two thirds of nonerror variance, as reflected in part-whole correlations that were corrected for shared error variance ≥ .82; and (b) the vast majority (i.e., ≥ 81%) of the short-form Full-Scale IQ estimates should fall within the 90% confidence intervals of their full-length counterparts. With regard to the second goal of this investigation, it was expected that the eight-subtest short form should yield a factor structure that was more clearly interpretable than the full-length version. It was determined a priori, based on sample size and standard errors, that this would be reflected by specific subtests loading ≥ 0.50 on their assigned factor index and ≤ 0.25 on any of the other three indices (Stevens, 2002).
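The two acceptance criteria can be expressed as a simple check. The following is a minimal sketch, not the authors' code; the function names and the example scores and interval bounds are hypothetical and serve only to illustrate the logic.

```python
import math

def shared_variance(r_corrected):
    """Proportion of nonerror variance shared, given a corrected correlation."""
    return r_corrected ** 2

def proportion_within_ci(short_scores, ci_bounds):
    """Fraction of short-form estimates inside the full-length 90% CI.

    ci_bounds holds one (lower, upper) pair per participant.
    """
    hits = sum(lo <= s <= hi for s, (lo, hi) in zip(short_scores, ci_bounds))
    return hits / len(short_scores)

def acceptable(r_corrected, short_scores, ci_bounds, min_r=0.82, min_prop=0.81):
    """Apply both a priori criteria described in the text."""
    return (r_corrected >= min_r
            and proportion_within_ci(short_scores, ci_bounds) >= min_prop)

# Two thirds of shared variance implies r >= sqrt(2/3), which rounds to .82.
print(round(math.sqrt(2 / 3), 3))

# Hypothetical data for five children (not taken from the study).
short = [98, 104, 87, 110, 93]
bounds = [(94, 102), (100, 108), (88, 96), (105, 113), (90, 98)]
print(proportion_within_ci(short, bounds))   # 4 of 5 estimates fall inside
print(acceptable(0.90, short, bounds))       # fails criterion (b): 0.80 < 0.81
```

In this illustration the correlation criterion is met but the coverage criterion is not, which shows that the two criteria are evaluated independently and both must hold.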
Following institutional review board approval, participants were selected for this investigation from an approximately 7-year series of consecutive outpatient referrals for pediatric neuropsychological evaluation at a Midwestern rehabilitation facility, according to the following criteria: (a) age between 6 and 16 years; (b) diagnosis of TBI, defined as an acute external force to the head with associated alteration of consciousness; (c) evaluation with the WISC-IV within 1–12 months after injury; (d) no premorbid history of special education placement, treatment for attention deficit/hyperactivity disorder (ADHD) or other psychiatric disorder, neurological illness, substance abuse, or personal abuse; and (e) performance in the valid range on a test of effort and motivation (Tombaugh, 1996). Data collection continued until there were 100 participants who met these criteria. The 40 participants from the Donders and Janke (2008) investigation were also included in the current sample. During the time period of data collection, the WISC-IV had been included on a routine basis in neuropsychological assessments of pediatric patients with TBI at the organization where this research was completed, except under circumstances that would have invalidated some of the results, such as English not being the primary language or an orthopedic injury to the dominant hand that would interfere with the manipulation of a pencil.
The final sample included 58 boys and 42 girls. They were evaluated at an average of 125.04 days postinjury (SD = 82.44), at a mean age of 12.50 years (SD = 3.07). Parent-identified ethnicities included Caucasian (n = 76), Latino/a (n = 9), African (n = 8), Asian (n = 2), and Other (n = 5). Mean parental level of education was 13.61 years (SD = 2.33). The majority of the participants (n = 57) had sustained a TBI in a motor vehicle accident, with other causes of injury including primarily falls and recreational activities (n = 36), with a small group (n = 7) of various other circumstances.
Several measures of injury severity were considered. Exact Glasgow Coma Scale scores and/or prospective estimates of posttraumatic amnesia were unavailable for about a fifth of the sample (n = 19). However, information on duration of coma, defined as the time to reliably follow verbal commands, and on the presence or absence of acute intracranial findings on neuroimaging, was available for all participants. Mean duration of coma was 1.62 days (Mdn = 0, SD = 4.13, range = 0–25), with 60 participants having coma ≤ 30 minutes, and 31 participants remaining in coma ≥ 24 hrs. Fifty-seven participants had positive diffuse (e.g., edema, shear injury; n = 23) and/or focal (e.g., hemorrhage, contusion; n = 53) abnormalities on neuroimaging, with some participants having both focal and diffuse lesions. Forty of the participants met conventional criteria for uncomplicated mild TBI, including coma ≤ 30 minutes and negative findings on neuroimaging.
To ensure that the results were not confounded by any association between injury severity and the time interval between the TBI and the psychometric assessment, a few preliminary analyses were performed. The average number of days since injury of the 40 children with uncomplicated mild TBI (M = 137.05, SD = 89.44) was not statistically different from that of the remaining 60 children with relatively more severe injuries (M = 117.03, SD = 77.16), t(98) = 1.19, p > .23. In the complete sample, there was also no statistically significant correlation between time since injury and any of the full-length or short-form WISC-IV indices (p > .25 for all variables). Thus, time since injury was not used as a covariate in any of the analyses.
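The reported group comparison can be reproduced from the summary statistics alone. The sketch below assumes a pooled-variance (Student) independent-samples t-test, which is consistent with the reported 98 degrees of freedom; the function name is illustrative.

```python
import math

def pooled_t(m1, sd1, n1, m2, sd2, n2):
    """Independent-samples t statistic with pooled variance, from summary stats."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, df

# Days since injury: uncomplicated mild TBI vs. more severe injuries (from the text).
t, df = pooled_t(137.05, 89.44, 40, 117.03, 77.16, 60)
print(f"t({df}) = {t:.2f}")  # t(98) = 1.19
```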
The WISC-IV was administered according to standardized procedures to participants on an outpatient basis when they were medically stable and could recall meaningful information from day to day. In the case that a child had more than one neuropsychological evaluation after their TBI, only the first one was used. Scaled (M = 10, SD = 3) subtest scores and standard (M = 100, SD = 15) index scores from the WISC-IV were used for all of the analyses.
In order to compute the short-form estimates of the index scores, the formulas described by Crawford et al. (2010) were applied, which were based on the WISC-IV standardization sample summary statistics as reported in the test manual (i.e., subtest means, standard deviations, correlations, and reliability coefficients). The revised computer scoring software (SF_WISC4_8.exe), which is based on eight subtests, can be downloaded from http://www.abdn.ac.uk/~psy086/dept/sf8_wisc4.htm. The original short form was based on seven subtests, including Block Design, Similarities, Digit Span, Coding, Vocabulary, Matrix Reasoning, and Symbol Search. The revised short form used the same seven subtests, along with Letter-Number Sequencing.
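To illustrate how an index score can be estimated from a subset of subtests using only summary statistics, the sketch below applies a generic linear composite formula: the sum of the scaled scores is standardized using the subtest means, standard deviations, and intercorrelations, and then rescaled to M = 100, SD = 15. This is an illustration of the general approach, not a reproduction of the Crawford et al. (2010) software, and the correlation of .50 in the example is a hypothetical value, not one taken from the WISC-IV manual.

```python
import math

def short_form_index(scaled_scores, pairwise_r, subtest_mean=10.0, subtest_sd=3.0):
    """Estimate an index score (M = 100, SD = 15) from k subtest scaled scores.

    pairwise_r lists the k*(k-1)/2 correlations among the subtests.
    """
    k = len(scaled_scores)
    if len(pairwise_r) != k * (k - 1) // 2:
        raise ValueError("need one correlation per pair of subtests")
    # Variance of a sum of k equally scaled variables:
    # sd^2 * (k + 2 * sum of pairwise correlations).
    var_sum = subtest_sd ** 2 * (k + 2 * sum(pairwise_r))
    z = (sum(scaled_scores) - k * subtest_mean) / math.sqrt(var_sum)
    return 100 + 15 * z

# Two subtests with scaled scores 7 and 8 and a hypothetical correlation of .50.
print(round(short_form_index([7, 8], [0.50]), 1))  # 85.6
```

A child scoring exactly at the subtest mean on every subtest receives an estimate of 100 regardless of the correlations, which is a quick sanity check on any implementation of this kind.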
Paired samples t-tests were used to compare the means of the long- and short-form indices in the clinical sample. The procedure described by Levy (1967) was used for the correction of spurious part-whole correlations between short and long forms due to shared error variance. An alpha level of .05 was maintained for statistical significance, but the stepdown Bonferroni correction was used to balance the risk of Type I and Type II errors when making multiple comparisons. This method, also known as the Holm correction, is a sequentially rejective approach that controls the error rate in a somewhat less conservative manner but without assuming independence of comparisons (for more detail, see Millis, 2003). In addition, a medium effect size was specified a priori as required for clinical significance. For comparisons of mean factor index scores, this would be reflected in a minimum value of 0.33 for Cohen’s d, equivalent to about one third of a standard deviation, which is comparable to the standard error of measurements for these scores.
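The sequentially rejective logic of the stepdown Bonferroni (Holm) correction can be sketched as follows. The p-values in the example are hypothetical and are not results from this study.

```python
def holm_reject(p_values, alpha=0.05):
    """Return a list of booleans: True where the null hypothesis is rejected.

    P-values are tested from smallest to largest against alpha / (m - rank);
    testing stops at the first nonsignificant result.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # all larger p-values are retained as nonsignificant
    return reject

# Hypothetical p-values for four comparisons.
p = [0.001, 0.012, 0.02, 0.04]
print(holm_reject(p))                     # [True, True, True, True]
print([x <= 0.05 / len(p) for x in p])    # plain Bonferroni: [True, True, False, False]
```

The comparison with the plain Bonferroni result shows why the stepdown version is described as less conservative: the threshold relaxes as each successive hypothesis is rejected.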
With regard to the second goal of the study, exploratory factor analysis was performed because the sample size was not considered large enough for confirmatory factor analysis. Principal factor analysis was used instead of principal component analysis because the interest for this particular goal was not primarily in data reduction but in clarifying the latent structure of the WISC-IV in children with TBI. Furthermore, because there was good reason to believe that the factors would be correlated to some extent, as opposed to measuring completely independent constructs, an oblique rotation was preferred over an orthogonal one. Two separate analyses were performed: one using all 10 standard WISC-IV subtests and one using only the eight subtests of the revised short form. In both cases, a four-factor solution was specified a priori.
The average full-length and short-form WISC-IV standard scores of the clinical sample are presented in Table 1. The associated subtest profile is presented in Figure 1 for illustrative purposes. Inspection of Table 1 suggests that, although there were statistically significant differences between several of the respective short-form and full-length standard scores, these differences were quite small (i.e., consistently less than two points) and did not meet the a priori established level of clinical significance, as reflected in the effect sizes.
Table 2 presents the primary data of interest from the clinical sample for the first goal of this study. Inspection of this table suggests that the Full-Scale IQ short-form estimates of both the original and the revised Crawford et al. formula met the a priori established criteria of (a) error-corrected part-whole correlations ≥ .82 as well as (b) ≥ 81% of the estimates falling within the 90% confidence interval of the corresponding full-length values. The revised estimate was slightly more robust, likely because of the inclusion of the additional subtest Letter-Number Sequencing. Although the majority of the original short-form Working Memory estimates (i.e., without Letter-Number Sequencing) fell within the required 90% interval, this estimate had only modest covariance with its full-length counterpart after correction for shared error variance. Importantly, this proportion was much better for the revised short form, a difference that was highly statistically significant, z = 4.59, p < .001.
It is noteworthy that the Perceptual Reasoning and General Ability short-form estimates fell outside of the 90% confidence interval of their full-length counterpart in about one out of every four cases, but this was anticipated and not considered to be problematic because the elimination of the Picture Concepts subtest was expected to reduce the suspected overestimation of ability levels, as reflected in these indices. It should also be noted that, although the revised Crawford et al. formula yields slightly different standard scores for Working Memory and Processing Speed than the WISC-IV manual, virtually all of them fell within the associated 90% confidence interval.
We used data from the WISC-IV manual to calculate the reliability of the eight-subtest short form in the U.S. standardization sample. The resulting summary statistics are presented in Table 3. Inspection of this table suggests that all short-form indices were very reliable. Corrected correlations with the full-length counterparts were also high. These findings confirm the robustness of the eight-subtest short form.
We then conducted principal factor analyses with oblique rotation to determine the latent structure of the WISC-IV in this sample. These findings are presented in Table 4, whereas the factor intercorrelations are included in Table 5. Inspection of Table 4 suggests that, in the 10-subtest solution, Picture Concepts had low loadings across all factor indices. In the eight-subtest solution, all of the remaining subtests met the a priori specified criteria of loading ≥ .50 on their assigned scale and ≤ .25 on any of the other three indices. The eight-subtest solution clearly preserved the original WISC-IV four-factor model, with a sufficiently clear association of each subtest with its corresponding factor index. The findings from Table 5 suggest that the amounts of shared variance between the factors in the eight-subtest solution ranged from a low of 3% between Verbal Comprehension and Processing Speed to a high of 37% between Verbal Comprehension and Perceptual Reasoning. These findings suggest acceptable degrees of overlap between nonredundant constructs.
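The a priori loading criterion and the conversion between factor correlations and shared variance can be illustrated with a short sketch. The loadings below are hypothetical values chosen for illustration and are not the values in Table 4; the use of absolute loadings is also an assumption of this sketch.

```python
import math

def violations(loadings, assigned, primary_min=0.50, cross_max=0.25):
    """List subtests that fail the criterion of loading >= primary_min on the
    assigned factor and <= cross_max on every other factor."""
    failed = []
    for subtest, row in loadings.items():
        target = assigned[subtest]
        primary_ok = abs(row[target]) >= primary_min
        cross_ok = all(abs(v) <= cross_max
                       for j, v in enumerate(row) if j != target)
        if not (primary_ok and cross_ok):
            failed.append(subtest)
    return failed

# Columns: Verbal Comprehension, Perceptual Reasoning, Working Memory,
# Processing Speed. Hypothetical loadings, NOT taken from Table 4.
loadings = {
    "Similarities":     [0.78, 0.10, 0.05, -0.02],
    "Block Design":     [0.08, 0.71, 0.02, 0.12],
    "Digit Span":       [0.11, 0.04, 0.66, 0.03],
    "Coding":           [-0.05, 0.09, 0.07, 0.74],
    "Picture Concepts": [0.21, 0.28, 0.19, 0.06],
}
assigned = {"Similarities": 0, "Block Design": 1, "Digit Span": 2,
            "Coding": 3, "Picture Concepts": 1}
print(violations(loadings, assigned))  # ['Picture Concepts']

# Shared variance is the squared factor correlation, so the reported 3% and
# 37% correspond to correlations of roughly .17 and .61 in absolute value.
for shared in (0.03, 0.37):
    print(round(math.sqrt(shared), 2))
```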
This study investigated the potential clinical utility of an eight-subtest short form of the WISC-IV, modified from Crawford et al.’s (2010) seven-subtest version, in children and adolescents with TBI. Based on the a priori criteria, the eight-subtest short-form estimate of Full-Scale IQ was found to be unequivocally acceptable. It had more than adequate covariance, corrected for shared error, with its full-length counterpart, and the vast majority of these estimates also fell within the 90% confidence interval of the long-form version.
The results from the exploratory factor analysis of the complete (10-subtest) WISC-IV suggested that Picture Concepts failed to be meaningfully associated with any of the four factor index scores and clearly not specifically with the Perceptual Reasoning index. Keith, Goldenring-Fine, Taub, Reynolds, and Kranzler (2006) have suggested that the Perceptual Reasoning subtests actually assess two related but distinct latent constructs; that is, fluid reasoning and visual processing. Since one of each is retained in the short form (i.e., Matrix Reasoning as a measure of the former, and Block Design as representing the latter), no meaningful content validity is lost with the revised short form. In fact, the factor analysis of the eight-subtest short form suggests an apparently more straightforward factor structure, which is an important consideration for clinical neuropsychological assessment.
It should be noted that the purpose of the development of this eight-subtest short form was not exclusively or even primarily to deal with situations where time constraints or patient fatigue are at stake. If only a quick and reliable estimate of Full-Scale IQ is required, then there are other and much shorter alternatives, such as the WASI-II. However, in clinical populations with specific neuropsychological impairment, it is highly desirable to have the differentiation that is provided by the four factor indices of the WISC-IV. This opportunity is preserved in this short form. In addition, the available software automatically provides confidence intervals and actuarial information about abnormality of differences, without the need to look this up in any paper materials. Importantly, the software also provides information about the base rates for specific numbers of abnormal scores and abnormal score differences, which is not readily available from the WISC-IV manual. These base rates are obtained via Monte Carlo simulation using methods developed by Crawford, Garthwaite, and Gault (2007). It also provides a multivariate index of the overall level of abnormality of a child’s index score profile, which is only practical via a computer as the necessary calculations involve matrix inversion (see Crawford et al., 2010, for further details). Thus, although the savings in test administration time are modest (i.e., median of about 20 minutes, based on the estimates provided by Ryan, Glass, and Brown, 2007), the availability of free scoring software that provides incremental information is likely to be attractive to pediatric neuropsychologists.
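The idea behind Monte Carlo base rates can be conveyed with a small simulation: correlated index scores are drawn for many simulated cases, and the proportion of cases with 0, 1, 2, ... abnormally low scores is tallied. This is a simplified sketch of the general principle, not the procedure implemented in the scoring software; the uniform correlation of .50, the 5th-percentile cutoff, and all function names are assumptions made for illustration.

```python
import math
import random

def cholesky(matrix):
    """Lower-triangular Cholesky factor of a positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower

def base_rates(corr, cutoff_z=-1.645, n_sims=20000, seed=1):
    """Estimate the proportion of simulated cases with 0..n abnormal scores.

    A score counts as abnormal when its z value falls below cutoff_z
    (about the 5th percentile by default).
    """
    rng = random.Random(seed)
    n = len(corr)
    lower = cholesky(corr)
    counts = [0] * (n + 1)
    for _ in range(n_sims):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Multiply by the Cholesky factor to induce the target correlations.
        scores = [sum(lower[i][k] * z[k] for k in range(i + 1)) for i in range(n)]
        counts[sum(s < cutoff_z for s in scores)] += 1
    return [c / n_sims for c in counts]

# Hypothetical correlation matrix for four index scores (all r = .50).
corr = [[1.0 if i == j else 0.5 for j in range(4)] for i in range(4)]
rates = base_rates(corr)
for k, rate in enumerate(rates):
    print(f"{k} abnormal scores: {rate:.3f}")
```

Because the indices are positively correlated, low scores tend to cluster in the same simulated cases, so these base rates differ from what would be expected if the four indices were treated as independent.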
Limitations of this investigation must also be considered. We used a referred convenience sample that included relatively more patients with severe TBI than would have been found with consecutive emergency room admissions. However, this also prevented restriction of range effects on variables such as coma and intracranial findings. Most of the children with uncomplicated mild TBI were referred because they had other clinical indicators for potentially more significant trauma, such as presence of multiple fractures, any episode of altered hemodynamic status with concurrent abdominal injury, and/or concern about comorbid posttraumatic stress disorder. However, all children in this sample had unequivocal evidence for head trauma in the absence of any significant premorbid health or academic concerns, and this is therefore not a group of widely diverse etiologies, which would otherwise have been a concern from a factor-analytic point of view (Delis, Jacobson, Bondi, Hamilton, & Salmon, 2003). A more important consideration is that this sample was limited to patients with TBI. Validation of the eight-subtest short form in other clinical samples is still needed. With these reservations in mind, it is concluded that this revised eight-subtest short form is acceptable for determination of WISC-IV standard scores in the clinical assessment of children and adolescents with TBI.
Allen, D. N., Thaler, N. S., Donohue, B., & Mayfield, J. (2010). WISC-IV profiles in children with traumatic brain injury: Similarities and differences from the WISC-III. Psychological Assessment, 22, 57–64.
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.
Bodin, D., Pardini, D. A., Burns, T. G., & Stevens, A. B. (2009). Higher order factor structure of the WISC-IV in a clinical neuropsychological sample. Child Neuropsychology, 15, 417–424.
Crawford, J. R., Anderson, V., Rankin, P. M., & MacDonald, J. (2010). An index-based short-form of the WISC-IV with accompanying analysis of the reliability and abnormality of differences. British Journal of Clinical Psychology, 49, 235–258.
Crawford, J. R., Garthwaite, P. H., & Gault, C. B. (2007). Estimating the percentage of the population with abnormally low scores (or abnormally large score differences) on standardized neuropsychological test batteries: A generic method with applications. Neuropsychology, 21, 419–430.
Delis, D. C., Jacobson, M., Bondi, M. W., Hamilton, J. M., & Salmon, D. P. (2003). The myth of testing construct validity using factor analysis or correlations with normal or mixed clinical populations: Lessons from memory assessment. Journal of the International Neuropsychological Society, 9, 936–946.
Donders, J., & Axelrod, B. N. (2002). Two-subtest estimations of WAIS-III factor index scores. Psychological Assessment, 14, 360–364.
Donders, J., & Janke, K. (2008). Criterion validity of the Wechsler Intelligence Scale for Children—Fourth Edition after pediatric traumatic brain injury. Journal of the International Neuropsychological Society, 14, 651–655.
Donders, J., & Warschausky, S. (1996). Validity of a short form of the WISC-III in children with traumatic head injury. Child Neuropsychology, 2, 227–232.
Kaufman, A. S. (1994). Intelligent testing with the WISC–III. New York, NY: Wiley.
Keith, T. Z., Goldenring-Fine, J., Taub, G. E., Reynolds, M. R., & Kranzler, J. H. (2006). Higher order, multisample, confirmatory factor analysis of the Wechsler Intelligence Scale for Children—Fourth Edition: What does it measure? School Psychology Review, 35, 108–127.
Levy, P. (1967). The correction for spurious correlation in the evaluation of short-form tests. Journal of Clinical Psychology, 23, 84–86.
Millis, S. (2003). Statistical practices: The seven deadly sins. Child Neuropsychology, 9, 221–233.
Nunnally, J. C. (1978). Psychometric theory. New York, NY: McGraw-Hill.
Ryan, J. J., Glass, L. A., & Brown, C. N. (2007). Administration time estimates for Wechsler Intelligence Scale for Children-IV subtests, composites, and short forms. Journal of Clinical Psychology, 63, 309–318.
Stevens, J. P. (2002). Applied multivariate statistics for the social sciences. Mahwah, NJ: Erlbaum.
Tombaugh, T. N. (1996). Test of Memory Malingering. Toronto, ON: Multi Health Systems.
Watkins, M. W. (2010). Structure of the Wechsler Intelligence Scale for Children—Fourth Edition among a national sample of referred students. Psychological Assessment, 22, 782–787.
Wechsler, D. (2003). Wechsler Intelligence Scale for Children (4th ed.). San Antonio, TX: Psychological Corporation.
Wechsler, D. (2011). Wechsler Abbreviated Scale of Intelligence (2nd ed.). Bloomington, MN: Pearson.