Study Results
Sample
A total of 293 articles were used in the analysis. Of these, 41 appeared in MIS Quarterly, 38 in Information Systems Research, 119 in Information & Management, 78 in Journal of Management Information Systems, and 17 in Management Science. Most (68%) were field studies; the coded works also included laboratory experiments (22%), case studies (5%), and field experiments (5%). In only 23% of the sampled articles did students (undergraduate or graduate) fill out the instrument; in the remainder, the instrument was administered to workers, sometimes in conjunction with students. As for data collection techniques, the majority (85%) of the sampled studies collected data through surveys (questionnaires). Interviews were the second most used technique, appearing in 17% of the sampled articles. The use of more than one data collection technique occurred in 31% of the studies.
Validation of Coding
Inter-rater reliability was assessed to verify that our coding was reliable (Miles & Huberman, 1994). A second, independent coder thus coded a subset of the sampled articles. For the 11 coded attributes, the following percentages of agreement were obtained: type of research—77%; research method—85%; pretest—82%; pilot test—95%; content validity—90%; construct validity—85%; reliability—85%; manipulation check—95%; nature of the instrument—82%; instrument validation section—92%; and use of second-generation statistical technique—95%.
Cohen's (1960) kappa coefficient was also calculated; it is a more stringent measure than simple percentage agreement. Across all criteria, the average kappa was 0.76, above the recommended minimum inter-rater reliability of 0.70 (Bowers & Courtright, 1984; Landis & Koch, 1977; Miles & Huberman, 1994). As is customary, disagreements between the coders were reconciled before further analysis was performed.
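To make the two measures concrete, here is a minimal sketch of how percentage agreement and Cohen's kappa can be computed for a single coded attribute. The labels and ratings below are hypothetical, not drawn from the sampled articles.

```python
# Percentage agreement and Cohen's (1960) kappa for one coded attribute,
# rated independently by two coders. The ratings are hypothetical.
from collections import Counter

coder_1 = ["field study", "experiment", "field study", "case study", "field study"]
coder_2 = ["field study", "experiment", "case study", "case study", "field study"]

n = len(coder_1)

# Simple percentage of agreement: share of items both coders labeled identically.
p_observed = sum(a == b for a, b in zip(coder_1, coder_2)) / n

# Agreement expected by chance, from each coder's marginal label frequencies.
freq_1, freq_2 = Counter(coder_1), Counter(coder_2)
p_expected = sum(freq_1[label] * freq_2[label] for label in freq_1) / n**2

# Kappa corrects observed agreement for chance agreement, which is why it is
# the more stringent of the two measures.
kappa = (p_observed - p_expected) / (1 - p_expected)
print(f"agreement = {p_observed:.2f}, kappa = {kappa:.2f}")
```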
Overview of Findings
Table 1 shows that, over the past 13 years, instrument validation has improved in all the categories we assessed. In addition, in two categories (pretest/pilot and reliability), the proportion of published studies validating their instruments now exceeds the proportion not doing so. Relative to Boudreau et al. (2001), the largest improvement is for construct validity, which was then assessed in 37% of the studies compared to 45% of the studies today. All other categories show gains of up to five percentage points. Overall, although the improvement over Boudreau et al.'s results is modest, it is still comforting to observe a consistent increase in the use of all validation techniques.
Table 1. Use of Validation Techniques Over Time

| Validation Criterion | Straub (1989) | Boudreau et al. (2001) | Current Study |
|---|---|---|---|
| Pretest | 13% | 26% | 31% |
| Pilot | 6% | 31% | 31% |
| Pretest or Pilot [i] | 19% | 47% | 51% |
| Previous Instr. Utilized | 17% | 42% | 43% |
| Content Validity | 4% | 23% | 26% |
| Construct Validity | 14% | 37% | 45% |
| Reliability | 17% | 63% | 68% |
Reliability remained the most frequently assessed validation criterion, compared with every other criterion taken singly, in the previous studies as well as in this one. As in Boudreau et al. (2001), a majority of studies assessing the reliability of their instruments did so through the standard coefficient of internal consistency, Cronbach's α (84%). The second most popular technique was inter-coder tests, reported by 15% of the studies that appraised the reliability of their instrument. The use of more than one reliability method remains rare: only 10% of the studies assessing reliability did so.
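For readers less familiar with the coefficient, the following is a minimal sketch of computing Cronbach's α for one multi-item scale; the 7-point Likert responses are hypothetical.

```python
# Cronbach's alpha (internal consistency) for a multi-item scale.
import numpy as np

# Rows = respondents, columns = items belonging to one construct's scale.
items = np.array([
    [5, 6, 5, 6],
    [3, 3, 4, 3],
    [6, 7, 6, 6],
    [4, 4, 5, 4],
    [2, 3, 2, 3],
])

k = items.shape[1]                          # number of items
item_vars = items.var(axis=0, ddof=1)       # per-item sample variances
total_var = items.sum(axis=1).var(ddof=1)   # variance of the summed scale

# Standard formula: alpha = k/(k-1) * (1 - sum(item variances) / total variance)
alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(f"Cronbach's alpha = {alpha:.2f}")
```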
A closer look at the studies that assessed construct validity reveals that diverse approaches were used for that purpose. More specifically, convergent, discriminant, and nomological validity were assessed in 50%, 58%, and 6% of these studies, respectively. Predictive and concurrent validity were reported in 7% and 1.5% of these studies. Construct validity per se (rather than one of its five components) was reported in 80% of the studies that assessed this kind of validity.
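One common way to operationalize convergent and discriminant validity, though not necessarily the one used by any given sampled study, is the Fornell-Larcker test. The sketch below assumes standardized loadings from an already-fitted measurement model; the construct names, loading values, and latent correlation are hypothetical.

```python
# Fornell-Larcker sketch: AVE for convergent validity, sqrt(AVE) versus
# inter-construct correlation for discriminant validity. Values hypothetical.
import numpy as np

# Standardized item loadings per construct (e.g., from a CFA).
loadings = {
    "ease_of_use": np.array([0.82, 0.78, 0.85]),
    "usefulness":  np.array([0.88, 0.81, 0.79]),
}

# Average variance extracted (AVE): mean squared loading per construct.
# Convergent validity is commonly supported when AVE > 0.50.
ave = {c: (l**2).mean() for c, l in loadings.items()}

# Hypothetical estimated correlation between the two latent constructs.
corr = 0.46

# Discriminant validity: sqrt(AVE) of each construct should exceed its
# correlation with every other construct.
for construct, value in ave.items():
    print(f"{construct}: AVE = {value:.2f}, sqrt(AVE) = {value**0.5:.2f}, "
          f"discriminant OK = {value**0.5 > corr}")
```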
Table 1 shows that the utilization of previously existing instruments has more than doubled over the last 13 years. Also, as detailed in Table 2, studies using existing instruments were sometimes more inclined to validate their instrument than studies developing their own from scratch. Indeed, construct validity and reliability were more frequently assessed in studies using a previously utilized instrument than in those that did not (50% vs. 42%; 74% vs. 63%). Pretests or pilot studies and content validity, however, were reported more often in studies creating a new instrument than in studies using an existing one (55% vs. 46%; 28% vs. 24%). The table reveals another interesting fact: over the past two years, articles that created their own instrument improved their validation practices to a greater extent than articles that reused an existing instrument.
Table 2. Validation and the Origin of the Instrument

| Validation Criterion | Existing Instrument (Boudreau et al., 2001) | Existing Instrument (Current Study) | New Instrument (Boudreau et al., 2001) | New Instrument (Current Study) |
|---|---|---|---|---|
| Pretest or Pilot | 43% | 46% | 50% | 55% |
| Content Validity | 20% | 24% | 25% | 28% |
| Construct Validity | 44% | 50% | 32% | 42% |
| Reliability | 74% | 74% | 54% | 63% |
It is interesting to observe how confirmatory studies (133 articles, or 45% of the total) compare to exploratory studies (160 articles, or 55% of the total). The present survey indicates that, for all criteria except the use of pretests or pilot studies, exploratory studies showed less interest in validating their instruments than confirmatory studies (see Table 3). Indeed, content validity, construct validity, and reliability were all assessed more frequently among confirmatory studies than among exploratory studies, the same trend observed in Boudreau et al. (2001).
Table 3. Validation in Confirmatory Versus Exploratory Studies

| Validation Criterion | Confirmatory (Boudreau et al., 2001) | Confirmatory (Current Study) | Exploratory (Boudreau et al., 2001) | Exploratory (Current Study) |
|---|---|---|---|---|
| Pretest or Pilot | 47% | 49% | 47% | 53% |
| Content Validity | 35% | 35% | 17% | 19% |
| Construct Validity | 53% | 61% | 29% | 33% |
| Reliability | 69% | 75% | 60% | 62% |
The bearing of the research method on instrument validation is also worth examining. In Straub's (1989) original study, it was argued that experimental and case researchers were less likely to validate their instruments than field study researchers. Boudreau et al.'s (2001) study showed a similar trend when comparing field studies to experimental studies, but not to case studies. The additional data used in the present study indicate that Straub's initial inference holds true today for all of the validation criteria introduced above (see Table 4): field study researchers in our sample were more inclined to validate their instruments than experimental and case researchers. The most notable difference concerned construct validity, with a gap of 31 percentage points between experimental and field study research.
Table 4. Validation by Research Method

| Validation Criterion | Field Studies | Experiments | Case Studies |
|---|---|---|---|
| Pretest or Pilot | 59% | 36% | 31% |
| Previous Inst. Utilized | 47% | 38% | 23% |
| Content Validity | 32% | 15% | 15% |
| Construct Validity | 55% | 24% | 38% |
| Reliability | 69% | 65% | 62% |
The inclusion of an Instrument Validation section, as originally suggested in Straub (1989), was tallied as frequently in the current study as in Boudreau et al.'s (2001): only 24% of the surveyed articles included such a section. This minority of articles reported pretest or pilot studies (80% vs. 42%), content validity (52% vs. 18%), construct validity (82% vs. 34%), and reliability (88% vs. 61%) at much higher rates. These percentages are hardly surprising: if one feels compelled to include a specific section on instrument validation, it is because effort has been made in this area. It is disappointing, however, not to observe an increase in the percentage of studies that include a special section reporting their instrument validation efforts.
Noticeable improvement has occurred in the use of manipulation checks in the past few years. As indicated in Table 5, among the field and laboratory experiments in our sample, 30% performed one or several manipulation checks of the treatments, compared to 22% in Boudreau et al.'s (2001) study (a minimal sketch of such a check follows Table 5). Percentages have particularly increased in two journals, MIS Quarterly (an increase of 21 percentage points) and Information Systems Research (an increase of 12 points). The absence of manipulation checks in the experimental studies of Management Science may be due to the tendency for articles in this journal to use directly observable measurements, such as time, rather than latent constructs.
Table 5. Manipulation Checks in Experimental Studies, by Journal

| Journal | Boudreau et al. (2001) | Current Study |
|---|---|---|
| Information & Management | 24% | 25% |
| Information Systems Research | 38% | 50% |
| MIS Quarterly | 29% | 50% |
| Journal of Management Information Systems | 17% | 19% |
| Management Science | 0% | 0% |
| All Five Journals | 22% | 30% |
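As promised above, here is a minimal sketch of a manipulation check. The manipulated treatment (task complexity), the perceived-complexity measure, and the scores are all hypothetical.

```python
# Manipulation check: after manipulating task complexity across two treatment
# groups, verify that subjects actually perceived different complexity levels.
from scipy import stats

perceived_low  = [2, 3, 2, 4, 3, 2, 3]   # perceived complexity, "low" condition
perceived_high = [5, 6, 6, 5, 7, 6, 5]   # perceived complexity, "high" condition

# A significant between-group difference indicates the manipulation "took".
t_stat, p_value = stats.ttest_ind(perceived_low, perceived_high)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```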
A greater percentage of studies in our sample than in Boudreau et al. (2001) used second-generation statistical techniques (e.g., structural equation modeling) as opposed to first-generation techniques (regression, ANOVA, LOGIT, etc.): from 15% in Boudreau et al. (2001), this percentage increased to 19% in the present study (see Table 6). However, the extent of instrument validation did not change much when comparing first- to second-generation techniques across the two studies. As was the case in Boudreau et al., studies making use of SEM techniques scored higher in all categories, particularly construct validity and reliability. Among the studies using second-generation statistical techniques, the most commonly used tools were PLS (42%), LISREL (21%), and EQS (18%).
Table 6. Validation by Statistical Technique

| Validation Criterion | First-Generation (Boudreau et al., 2001) | Second-Generation (Boudreau et al., 2001) | First-Generation (Current Study) | Second-Generation (Current Study) |
|---|---|---|---|---|
| Pretest or Pilot | 44% | 64% | 48% | 63% |
| Previous Inst. Utilized | 42% | 46% | 43% | 46% |
| Content Validity | 19% | 43% | 23% | 39% |
| Construct Validity | 29% | 82% | 36% | 86% |
| Reliability | 57% | 96% | 61% | 93% |
A possible reason for this difference is that SEM analyzes both the structural model (the assumed causation) and the measurement model (the loadings of the observed items). As a result, validity assessment is an integral part of SEM: the validity statistics appear explicitly in the output, and the degree of statistical validity directly affects the overall model fit indexes. In first-generation statistical techniques, on the other hand, validity and reliability assessments are performed in separate analyses that are not related to the actual hypothesis testing and thus do not affect the overall fit indexes.
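The sketch below illustrates this joint estimation, assuming the semopy package (one of several SEM tools) and its lavaan-style model syntax; the constructs, item names, and synthetic data are hypothetical and not drawn from any sampled study.

```python
# Why validity assessment is integral to SEM: the measurement model (item
# loadings) and the structural model (latent paths) are estimated jointly,
# so both appear in one output and both feed the overall fit indexes.
import numpy as np
import pandas as pd
import semopy  # assumed SEM package with lavaan-style syntax

rng = np.random.default_rng(0)
n = 300

# Two correlated latent constructs, each measured by three noisy indicators.
usefulness = rng.normal(size=n)
intention = 0.6 * usefulness + rng.normal(scale=0.8, size=n)
cols = {}
for i in range(1, 4):
    cols[f"pu{i}"] = usefulness + rng.normal(scale=0.5, size=n)
    cols[f"bi{i}"] = intention + rng.normal(scale=0.5, size=n)
data = pd.DataFrame(cols)

# Measurement model (=~) and structural model (~) in a single specification.
desc = """
PU =~ pu1 + pu2 + pu3
BI =~ bi1 + bi2 + bi3
BI ~ PU
"""

model = semopy.Model(desc)
model.fit(data)
# Loadings (measurement validity) and the structural path estimate are
# reported together in one set of results.
print(model.inspect())
```

By contrast, a first-generation analysis would regress summed scales on one another, with any reliability or validity checks run as separate, preliminary analyses.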
Summary of Key Points
It should be considered good news that, in the short period of two years since the last study assessing instrument validation practices, IS researchers have improved the validation of their instruments. Granted, this improvement is not as large as the one observed when using Straub's (1989) study as the baseline, but that is understandable given the much longer time period involved there. Although better, current validation practices are far from perfect, and IS researchers still need to achieve greater rigor in the validation of their instruments and their research. In particular, the following nine key findings should prompt further reflection and action:
1. Over the past two years, instrument validation practices have steadily improved.
2. In two categories (pretest/pilot and reliability), the proportion of published studies validating their instruments now exceeds the proportion not doing so.
3. The assessment of construct validity has improved the most over the past two years.
4. Published studies are increasingly using preexisting instruments and, when doing so, assessing reliability and construct validity more frequently.
5. Confirmatory studies are more likely than exploratory studies to assess reliability, content validity, and construct validity.
6. Laboratory experiments, field experiments, and case studies lag behind field studies on all validation criteria.
7. Although the inclusion of an Instrument Validation subsection goes hand in hand with greater reporting of validation practices, such a section appears infrequently in empirical studies.
8. There has been a noticeable improvement in the use of manipulation checks in the past few years; in some publication outlets, however, manipulation checks are still performed by only a minority of IS experimenters.
9. Published studies using second-generation statistical techniques (SEM) are much more likely to validate their instruments than studies using first-generation statistical techniques.