Editor’s note: This essay is an entry in Fordham’s 2023 Wonkathon, which asked contributors to answer this question: “How can we harness the power but mitigate the risks of artificial intelligence in our schools?” Learn more.
How can we harness the power but mitigate the risks of AI in our schools? To paraphrase mega-wonk James Carville: it's about the data!
Although AI has a sixty-year history in education, today’s meaning of AI is heavily influenced by just one year of experience with ChatGPT. This essay, however, is not about harnessing the power and mitigating the risks of ChatGPT, or of generative AI conceived narrowly in ChatGPT’s image. That would be shortsighted.
Instead, a wise policy should be grounded in sound, long-lasting foundations. The foundation of today’s AI is data. AI works by finding patterns in data, abstracting these to a model, and then applying the model to produce novel information that reproduces characteristics of the original data. Thus, the core problem in either harnessing the power or mitigating the risks of using today’s generative AI in education is the data.
(ChatGPT was not built using educational data or information about educational processes. Although it can produce outputs similar to those produced in educational settings, it cannot model the processes that enable humans to learn, because the bots that crawl the web to assemble the data for building today’s large language models have no access to learning-process data.)
Sufficient data is also not available to today’s researchers, developers, or innovators who are experts in applying AI to education. Indeed, those who are best positioned to build education-specific “foundational models” complain of the difficulty of obtaining large, relevant data sets. Consequently, the path to powerful, low-risk AI in education must start with the problem of safely providing access to enough data to abstract out models specific to educational problems. It has to be data about the processes of teaching and learning.
When talking about big data in education, it is necessary to acknowledge the fallout of the well-publicized failure of InBloom, which was an attempt to centralize ed-tech data. The public will not give carte blanche for the centralization of private student data. But massive centralization is not necessary.
Instead, an AI model could be built using a privacy-protecting data-sharing mechanism, such as a secure enclave. In an enclave, the data remains local. An API allows local servers to run a computation and return statistical, aggregate information about the data; the underlying records themselves are never disclosed. Although untested, an AI model builder could thereby develop a model without ever having access to the complete data, asking only for certain statistical properties of the local data to be reported back. The first component of my policy proposal is: (1) fund research into building AI models via privacy-protecting mechanisms, so that local communities can participate in building relevant models without disclosing all their data.
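To make the idea concrete, here is a minimal sketch of what such an exchange might look like. Everything in it is hypothetical for illustration: the `LocalEnclave` class, its `query_mean_and_count` method, the `min_group_size` threshold, and the `pooled_mean` aggregation are stand-ins for whatever a real enclave platform would provide, not an existing API.

```python
# Hypothetical sketch: a model builder requests only aggregate statistics
# from local data enclaves; raw student records never leave the district.

from dataclasses import dataclass
from typing import List


@dataclass
class LocalEnclave:
    """One district's secure enclave; raw records stay on local servers."""
    records: List[float]      # e.g., per-student outcome measures (never shared)
    min_group_size: int = 20  # refuse queries over groups too small to be anonymous

    def query_mean_and_count(self):
        """Answer an API query with aggregate statistics only, never raw data."""
        if len(self.records) < self.min_group_size:
            return None  # suppress small cells rather than risk re-identification
        return sum(self.records) / len(self.records), len(self.records)


def pooled_mean(enclaves):
    """Model builder's side: combine per-enclave aggregates into one statistic."""
    total, n = 0.0, 0
    for enclave in enclaves:
        answer = enclave.query_mean_and_count()
        if answer is None:
            continue  # the enclave declined; its data simply stays local
        mean, count = answer
        total += mean * count
        n += count
    return total / n if n else None


# Three districts contribute aggregates; no individual record crosses the API.
districts = [
    LocalEnclave(records=[0.62] * 25),
    LocalEnclave(records=[0.71] * 40),
    LocalEnclave(records=[0.55] * 10),  # below the threshold, so suppressed
]
print(pooled_mean(districts))
```

In a real deployment, each query would execute inside the district’s own enclave, and techniques such as differential privacy or secure aggregation could further limit what any single response reveals. The point of the sketch is simply that model building can proceed on statistical summaries while complete records stay local.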
When AI starts by building extremely general models and then attempting to apply them to specific educational situations, risks abound. Thus, a second aspect of my proposal is that our efforts toward powerful, safe AI should begin with well-bounded problems. One that seems well suited to today’s AI is determining how to provide optimal supports for learners with disabilities to progress in mathematics problem solving. Although I believe parents are not willing to share their children’s data in general, I can imagine a collective of parents becoming highly motivated to share data if it might help their specific neurodiverse student thrive in mathematics. Further, only limited personal data might be needed to make progress on such a problem. Thus a second element of my proposal is (2) energize nonprofits that work with parents on specific issues to determine how to achieve buy-in to bounded, purpose-specific data sharing. This could involve a planning-grant stage which, if successful, would provide the money needed to establish a local, privacy-protected method of sharing data.
The third part of my proposal would be to fund teams to build new, education-specific models around targeted problems and privacy-protected data. There are already enough talented AI researchers who would step up to the challenge of crafting education-specific AI models, given an important societal problem and access to enough relevant data. However, there would initially need to be support to (a) negotiate mutually agreeable limitations and benefits with the parent and school groups who control the data and (b) establish ethical policies ensuring that the benefits of the resulting AI models are available to those who participated in the research, and that those participants see additional favorable returns should the models prove useful outside their originating collective. I believe existing federal grants (at multiple agencies) could support this. Thus, the third part would use existing grant-funding mechanisms to build powerful, low-risk AI models via researcher-parent partnerships that focus on targeted uses of AI in education.
In summary, I argue against trying to domesticate the wild capabilities of today’s general-purpose generative AI tools. Instead, I argue for an alternative regime based around three commitments:
- Enabling local owners of student and teacher data to share statistical properties of their data, without sharing the data itself, via emerging privacy-protecting technologies.
- Developing buy-in for creating new education-specific AI models by forming collectives of parents who are motivated to share data to solve targeted problems.
- Funding partnerships, for example between talented AI researchers and parent groups, to use data to build models that solve targeted problems the parents and researchers mutually care about.
Harnessing powerful AI while mitigating risks would thus be achieved by providing privacy-protecting mechanisms for sharing data, building buy-in for limited data sharing, and forming collectives of scientists and members of the public who agree to work together on a targeted educational problem.