References

American Educational Research Association, American Psychological Association, and National Council on Measurement in Education. 2018. Standards for Educational and Psychological Testing. American Educational Research Association.
Anderson, Daniel, and Christopher M. Loan. 2022. exirt: Analyze Data from the Oregon Extended Assessment. https://github.com/datalorax/exirt.
Campbell, Donald T, and Donald W Fiske. 1959. “Convergent and Discriminant Validation by the Multitrait-Multimethod Matrix.” Psychological Bulletin 56 (2): 81–105.
Carmichael, Sheila Byrd, Gabrielle Martino, Kathleen Porter-Magee, and W Stephen Wilson. 2010. “The State of State Standards–and the Common Core–in 2010.” Thomas B. Fordham Institute.
Cizek, Gregory J. 2012. Setting Performance Standards: Foundations, Methods, and Innovations. Routledge.
Hallgren, Kevin A. 2012. “Computing Inter-Rater Reliability for Observational Data: An Overview and Tutorial.” Tutorials in Quantitative Methods for Psychology 8 (1): 23–34.
Hambleton, Ronald K, and Mary J Pitoniak. 2006. “Setting Performance Standards.” In Educational Measurement, 4th ed., edited by Robert L Brennan, 433–70. Westport, CT: Praeger Publishers.
Holland, Paul W, and Dorothy T Thayer. 1988. “Differential Item Performance and the Mantel-Haenszel Procedure.” In Test Validity, 129–45.
Kamata, Akihito, and Brandon K Vaughn. 2004. “An Introduction to Differential Item Functioning Analysis.” Learning Disabilities: A Contemporary Journal 2 (2): 49–69.
Lathrop, Quinn N. 2015. cacIRT: Classification Accuracy and Consistency Under Item Response Theory. https://CRAN.R-project.org/package=cacIRT.
Messick, Samuel. 1989. “Meaning and Values in Test Validation: The Science and Ethics of Assessment.” Educational Researcher 18 (2): 5–11.
Robitzsch, Alexander, Thomas Kiefer, and Margaret Wu. 2022. TAM: Test Analysis Modules. https://CRAN.R-project.org/package=TAM.
Rudner, Lawrence M. 2005. “Expected Classification Accuracy.” Practical Assessment, Research, and Evaluation 10 (1): 13.
Scott, Neil W, Peter M Fayers, Neil K Aaronson, Andrew Bottomley, Alexander de Graeff, Mogens Groenvold, Chad Gundy, et al. 2009. “A Simulation Study Provided Sample Size Guidance for Differential Item Functioning (DIF) Studies Using Short Scales.” Journal of Clinical Epidemiology 62 (3): 288–95.
Tindal, Gerald, Marilee McDonald, Marick Tedesco, Aaron Glasgow, Pat Almond, Lindy Crawford, and Keith Hollenbeck. 2003. “Alternate Assessments in Reading and Math: Development and Validation for Students with Significant Disabilities.” Exceptional Children 69 (4): 481–94.
Webb, Norman L. 2002. “Depth-of-Knowledge Levels for Four Content Areas.” Language Arts 28 (March).
Yen, Wendy M, and Anne R Fitzpatrick. 2006. “Item Response Theory.” In Educational Measurement, 4th ed., edited by Robert L Brennan. Westport, CT: Praeger Publishers.