Announcing the GenAI Evidence Insights Hub

Contributing Authors: Melanie Kurimchak | John Whitmer
AI Disclosure: Claude Sonnet v4.5 was used in the initial drafting of this post; human editing and review were conducted throughout.

Welcome to the GenAI Evidence Insights Hub.

In this blog, we will openly document the complex, imperfect, and often challenging work of building a robust evidence base for Generative AI (GenAI) in education assessment. We are doing this work in public because we believe that GenAI has tremendous promise for these applications. The flexibility of this new technology and the speed at which innovations can be created call for increased measurement – to make sure that learners and teachers have high-quality materials and to distinguish research-backed results from “AI slop.” Research in this area is proliferating, helping us use methods that work and identify new areas of practice.

What This Blog Is

This blog is the behind-the-scenes record of the GenAI Evidence Hub for Educational Assessment, a philanthropically supported project focused on systematically analyzing more than 250 research studies and developing shared standards for evaluating the validity, reliability, and fairness of AI in assessment contexts: specifically, automated scoring, formative feedback, item generation, and multimodal applications.

Here, you will find:
  • Deep dives into methodological problems we are actively working through, including foundational questions like what actually counts as a model
  • Candid progress updates on what is working, what is not, and why
  • Results from AI-supported research experiments, paired with honest analysis of where AI adds value and where it does not
  • Open questions where we need informed critique and community input
  • Lessons learned from failures, course corrections, and false starts, not just successes

What This Blog Isn't

This is not:
  • A polished showcase of finished work
  • A marketing channel for AI hype
  • A space where we pretend the research process is clean or linear
  • Content hidden behind a paywall or restricted access

Why We're Doing This in Public

We are open science advocates with a deep commitment to advancing education R&D using new technologies. We believe that the field advances faster when people show their work. Some of us have been in the field for a while and have learned that behind every clear result and impactful finding there are usually many ideas that didn’t pan out as anticipated.

We are showing the messy parts on purpose.

Who This Is For

This work is primarily intended for people who need to evaluate or build evidence in fast-moving, AI-driven domains, including:
  • Researchers developing evidence bases in areas where the technology changes faster than the literature
  • Developers building AI-based assessment tools who need to understand what rigorous evidence actually looks like
  • Educators and institutional decision makers evaluating claims about AI effectiveness
  • Methodologists interested in AI-assisted research synthesis and its real-world limitations

How to Follow Along

Contributors

Core Team 

  • John Whitmer – Conceptualization, Supervision, Project Administration, Funding Acquisition, Outreach, Writing – Review & Editing, Dissemination and Community Engagement, Investigation, Data Curation, Reviewer Training
  • Alexis Andres – Methodology, Software, Data Curation, Formal Analysis, Visualization, Writing – Original Draft, Writing – Review & Editing, AI/LLM Integration, Study Design, Practitioner Guidance Development, Investigation, Reviewer Training
  • Melanie Kurimchak – Project Administration, Coordination, Resources, Outreach/Communication, AI Analysis & Experimentation, Visualization, Advisory Committee Management, Workflow Design, Writing – Original Draft, Writing – Review & Editing
  • Maggie Beiting-Parrish – Methodology, Investigation, Data Curation, Validation, Reviewer Training, Practitioner Guidance Development, Dissemination and Community Engagement
  • Aaron Wong – Investigation, Data Curation, Methodology, Validation, Reliability Testing, Pilot Studies, Reviewer Training

Supplemental Coders

  • Heeryung Choi – Investigation, Data Curation, Validation
  • Chris Steadman – Investigation, Data Curation, Validation
  • Alexander White – Investigation, Data Curation, Validation
  • Nidhi Nasiar – Investigation, Data Curation, Validation

Advisory Committee

  • Kristen DiCerbo, Khan Academy (Chair)
  • Brandon Olszewski, ISTE and ASCD
  • Susan Lottridge, Pearson
  • Amy Hendrickson, College Board and NCME
  • Michael Feldstein, 1EdTech
  • Nancy Otero, Bill and Melinda Gates Foundation, ex officio
