Copyright (c) 2013 John L. Jerz

Reliability and Validity Assessment (Carmines, Zeller, 1979)

Quantitative Applications in the Social Sciences

[Cover image: CarminesZeller.jpg]

This guide demonstrates how social scientists assess the reliability and validity of empirical measurements. This monograph is a good starting point for those who want to familiarize themselves with the current debates over "appropriate" measurement designs and strategies.
 
5.0 out of 5 stars A must read for scale development, May 14, 2002
By Vong Tze Ngai (Macau)
 
This is a must read for those laymen who would like to get started with developing their own measurement scales. While many references on scale development devote a chapter to scale reliability and validity testing, they are far from comprehensive. Here is a monograph that discusses this critical issue in detail. It is written in an easily understood manner, with a good balance between theory and application. The use of exploratory factor analytic techniques for testing scale validity, though, assumes readers have a prerequisite understanding of this multivariate technique.

p.5,7 Reliability and Validity Assessment by Edward G. Carmines and Richard A. Zeller presents an elementary and exceptionally lucid introduction to issues in measurement theory... RELIABILITY AND VALIDITY ASSESSMENT is merely the first step toward understanding the complex issues of measurement in theoretical and applied research settings... The Carmines and Zeller paper provides an excellent basis for understanding some of the more complex issues in measurement theory. - John L. Sullivan, Series Editor
 
p.10 measurement is most usefully viewed as the "process of linking abstract concepts to empirical indicants" (Zeller and Carmines, forthcoming) [JLJ - from Zeller, Carmines, Measurement in the Social Sciences: the link between theory and data, 1980, p.2], as a process involving an "explicit, organized plan for classifying (and often quantifying) the particular sense data at hand - the indicants - in terms of the general concept in the researcher's mind" (Riley, 1963: 23).
 
p.10 From an empirical standpoint, the focus is on the observable response... Theoretically, interest lies in the underlying unobservable (and directly unmeasurable) concept that is represented by the response... Measurement focuses on the crucial relationship between the empirically grounded indicator(s) - that is, the observable response - and the underlying unobservable concept(s)
 
p.12 But an indicator must be more than reliable if it is to provide an accurate representation of some abstract concept. It must also be valid... An indicator of some abstract concept is valid to the extent that it measures what it purports to measure... Indeed, strictly speaking, one does not assess the validity of an indicator but rather the use to which it is being put.
 
p.17 "One validates, not a test, but an interpretation of data arising from a specified procedure" (Cronbach, 1971: 447)... one validates not the measuring instrument itself but the measuring instrument in relation to the purpose for which it is being used.
 
p.17 Nunnally has given a useful definition of criterion-related validity. Criterion-related validity, he notes, "is at issue when the purpose is to use an instrument to estimate some important form of behavior that is external to the measuring instrument itself, the latter being referred to as the criterion" (1978: 87). For example, one "validates" a written driver's test by showing that it accurately predicts how well some group of persons can operate an automobile... The operational indicator of the degree of correspondence between the test and the criterion is usually estimated by the size of their correlation.
 
p.17-18 for some well-defined group of subjects, one correlates performance on the test with performance on the criterion variable... Obviously the test will not be useful unless it correlates significantly with the criterion; and similarly, the higher the correlation, the more valid is this test for this particular criterion.
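[JLJ - the correlational logic Carmines and Zeller describe can be sketched numerically. The data below are invented for illustration only: paired scores on a test and on its criterion for the same subjects, with the Pearson correlation serving as the validity coefficient.]

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between paired observations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Invented data: written driver's-test scores vs. rated road performance
# for the same seven subjects (the book's driver's-test example, p.17).
test_scores = [62, 70, 75, 80, 85, 90, 95]
road_scores = [55, 66, 72, 70, 84, 88, 93]

r = pearson_r(test_scores, road_scores)
print(f"criterion-related validity coefficient r = {r:.3f}")
```

The higher the coefficient, the more valid the test is for this particular criterion; a near-zero coefficient would mean the written test tells us little about actual driving.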
 
p.18 "if it were found that accuracy in horseshoe pitching correlated highly with success in college, horseshoe pitching would be a valid measure for predicting success in college" (Nunnally, 1978: 88). [JLJ - Nunnally implies that whatever metric we find that correlates highly with a promising position, such as piece mobility, has potential for use in the evaluation function of a computer chess program. If our metric correlates highly we can be more severe in our pruning.]
 
p.18 concurrent validity is assessed by correlating a measure and the criterion at the same point in time... Predictive validity, on the other hand, concerns a future criterion which is correlated with the relevant measure. Tests used for selection purposes in different occupations are, by nature, concerned with predictive validity. Thus, a test used to screen applicants for police work could be validated by correlating their test scores with future performance in fulfilling the duties and responsibilities associated with police work.
 
p.19 As we have seen, the logic underlying criterion validity is quite simple and straightforward. It has been used mainly in psychology and education for analyzing the validity of certain types of tests and selection procedures. It should be used in any situation or area of scientific inquiry in which it makes sense to correlate scores obtained on a given test with performance on a particular criterion or set of relevant criteria.
 
p.20 Fundamentally, content validity depends on the extent to which an empirical measurement reflects a specific domain of content... obtaining a content-valid measure of any phenomenon involves a number of interrelated steps. First, the researcher must be able to specify the full domain of content that is relevant to the particular measurement situation... Second, one must sample... from this collection since it would be impractical to include all [domain content] in a single test.
 
p.22-23 As Cronbach and Meehl observe, "Construct validity must be investigated whenever no criterion or universe of content is accepted as entirely adequate to define the quality to be measured" (1955: 282). Construct validity is woven into the theoretical fabric of the social sciences, and is thus central to the measurement of abstract theoretical concepts.
 
p.23 Construct validation involves three distinct steps. First, the theoretical relationship between the concepts themselves must be specified. Second, the empirical relationship between the measures of the concepts must be examined. Finally, the empirical evidence must be interpreted in terms of how it clarifies the construct validity of the particular measure.
  It should be clear that the process of construct validation is, by necessity, theory-laden. Indeed, strictly speaking, it is impossible to "validate" a measure of a concept in this sense unless there exists a theoretical network that surrounds the concept. For without this network, it is impossible to generate theoretical predictions which, in turn, lead directly to empirical tests involving measures of the concept.
 
p.27 The social scientist can assess the construct validity of an empirical measurement if the measure can be placed in theoretical context. Thus, construct validation focuses on the extent to which a measure performs in accordance with theoretical expectations. Specifically, if the performance of the measure is consistent with theoretically derived expectations, then it is concluded that the measure is construct valid.
