Artificial Intelligence as jurassic technology

Unpacking some thoughts from a recent IMLS convening on AI in museums has me thinking about issues of trust and authority. So naturally, I’m thinking about one of my favorite museum experiences, the Museum of Jurassic Technology. It’s really the only museum I know of that should come with a spoiler alert. If you’ve not had the pleasure of a visit, this article gives a nice sense of the experience, while Lawrence Weschler’s Mr. Wilson’s Cabinet of Wonder, about the Museum and its MacArthur Genius Award-winning founder, is well worth the deeper dive.

But to boil it down, the MoJT takes on the “form” of a museum (objects in cases with wall labels, organized into themed exhibitions) while playing fast and loose with the underlying facts that typically support that form. Some of the objects are “real,” and supported with factual interpretive text, but some of the objects are made up completely, and some of the interpretive text about even the “real” objects on display is pure fiction. But all of it is played completely deadpan: nothing about the presentation gives away the game. All of the material, and the way it is presented, feels completely plausible, even if a lot of it is not necessarily accurate or true.

MoJT makes literal the negotiated nature of the relationship between a museum and its visitors. A visitor has to essentially submit to the museum’s authority on the subject matter being presented, whether that authority be an assertion of the importance of certain artists, or even just the names of key physicists that the average visitor might not know. But MoJT makes every effort to sound authoritative without actually asserting authority. Which is fine at MoJT, because that is the experience. It is an experiment in exploring exactly this relationship.

Plausible isn’t the same as true

But what if you walked into a “normal” museum, and found yourself asking the same questions? What if every tenth wall label you read had an obvious factual error in it? How quickly would a museum’s authority evaporate?

While participants at the IMLS convening (representing some of the best minds in museum technology) were focused on how AI could be used in an augmentative context, we all feared that the hype surrounding this technology would inevitably prompt its use in an automated context. That is, AI (or at least Generative AI as we currently think of it) would be used to replace key functions in the museum stack rather than supplement them.

The appeal to museum leadership here is obvious. For a cash-strapped institution, the labor-saving promises of AI (you can run your whole museum with 25 people instead of 50!) will be irresistible. Using AI to replace jobs will no doubt look to some museum boards like financial prudence rather than malfeasance.

But for an institution to truly automate away any of its critical functions, the thing doing the automating must behave predictably 100% of the time. This means that with the same set of inputs, the system will reliably, and repeatedly, produce the same outputs. LLMs, and the applications built on them, do not operate this way. Instead, they might get, say, 95% of things absolutely correct, while the last 5% will be incorrect in unpredictable ways. Glyph Lefkowitz articulates this issue well:

It gets most things right, but it consistently makes mistakes in the places that you are least likely to notice. In places where a person wouldn’t make a mistake. Your brain keeps trying to develop a theory of mind to predict its behavior but there’s no mind there, so it always behaves infuriatingly randomly.

(Glyph Lefkowitz, I Think I’m Done Thinking About genAI for Now)

Reducing cost, introducing risk

And so a museum that seeks to reduce labor costs by, say, generating wall labels is putting its own authority at considerable risk in that deal. Let’s say that one out of every 20 labels produced contains a clear factual error (like asserting that Abraham Lincoln once hosted The Beatles at the White House). An error like that would be obvious to any visitor reading the label, but easy to miss for an underpaid assistant/accountability sink tasked with reviewing hundreds of automatically generated wall labels.

You now have a situation in which roughly 5% of the labels in a museum contain obvious factual errors, which raises a question: how many errors would a visitor need to find before that visitor would start questioning every factual statement? Once that occurs, the trust the visitor places in the museum, and the authority the museum has asserted, collapse. This is the experience a visitor has at MoJT (that is sort of the point, after all), but it is potentially catastrophic for just about any other museum that asserts its authority on a foundation of rigorous and factual analysis.
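To put rough numbers on the 5% scenario, here is a minimal Python sketch of how quickly a visitor is likely to run into a bad label. It rests on assumptions that are not established anywhere above: every label has the same independent 5% chance of containing an error, and the visitor notices every error they read. The function names are hypothetical, chosen only for illustration.

```python
import math


def chance_of_seeing_error(labels_read: int, error_rate: float = 0.05) -> float:
    """Probability that at least one of the labels read contains an error.

    Assumes each label is wrong independently with probability `error_rate`
    (an illustrative simplification, not a measured figure).
    """
    return 1.0 - (1.0 - error_rate) ** labels_read


def labels_until_likely(threshold: float = 0.5, error_rate: float = 0.05) -> int:
    """Smallest number of labels read at which the chance of having seen
    at least one error reaches `threshold`."""
    return math.ceil(math.log(1.0 - threshold) / math.log(1.0 - error_rate))


if __name__ == "__main__":
    for n in (5, 10, 20, 40):
        print(f"{n:>2} labels read: {chance_of_seeing_error(n):.0%} chance of at least one error")
    print("Labels needed for better-than-even odds:", labels_until_likely(0.5))
```

Under those assumptions, a visitor who reads 20 labels has roughly a 64% chance of hitting at least one error, and the odds pass even at the 14th label. What the sketch cannot say is how many such encounters it takes before trust actually breaks.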

I’ve always been somewhat skeptical of the “museums are the most trusted institutions in the universe” trope, but here is a situation in which a loss of authority in something as small as a wall label has the potential to lead to a real collapse in the purpose of the institution.

What we don’t yet know

What I’m aiming at here is not to enumerate the reasons why museums shouldn’t use AI, or why they should. What’s important here is that the dynamics of how this technology will play out, when implemented in an environment as complex as a museum, are unpredictable, and that technology deployed to solve a problem on the balance sheet could potentially create a bigger existential issue from which it might be impossible to recover.

That said, museums are a fascinating and somewhat controlled environment for exploring these questions. If I were to think of a museum visit as a usability test, I would structure an experiment in which a visitor is ushered into a gallery in which one label is created entirely with AI, no human in the loop. Then a gallery with two such labels. Then three. At what point does the visitor’s relationship to the museum break down completely? How much AI can you introduce to the museum experience before you torch the relationship forever?
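The stepped design described here (one AI label, then two, then three) could be laid out with something like the following Python sketch. Everything in it is an assumption made for illustration: the `plan_galleries` name, the fixed number of labels per gallery, and the choice to place the AI-written labels at random positions so that visitors cannot learn where to look.

```python
import random


def plan_galleries(num_galleries: int, labels_per_gallery: int, seed: int = 0) -> list[list[str]]:
    """Return one list per gallery marking each label as "ai" or "human".

    Gallery 1 gets one AI-written label, gallery 2 gets two, and so on,
    capped at the number of labels in the gallery. Positions are random.
    """
    rng = random.Random(seed)  # seeded so a given plan can be reproduced
    plan = []
    for gallery in range(1, num_galleries + 1):
        ai_count = min(gallery, labels_per_gallery)
        ai_positions = set(rng.sample(range(labels_per_gallery), ai_count))
        plan.append(
            ["ai" if i in ai_positions else "human" for i in range(labels_per_gallery)]
        )
    return plan


if __name__ == "__main__":
    for number, labels in enumerate(plan_galleries(num_galleries=4, labels_per_gallery=8), start=1):
        print(f"Gallery {number}: {labels.count('ai')} AI label(s) ->", labels)
```

A real study would also need some way of measuring trust after each gallery, which this sketch deliberately leaves out.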

And probably “the visitor” is not a constant. Maybe the average visitor to a science museum is more tolerant of AI “tone,” but less tolerant of factual errors than the average visitor to an art museum. Maybe AI tolerance is different when broken down by John Falk’s visitor types: perhaps “explorers” are less tolerant of AI than “rechargers.”

These are things we don’t yet know, but that are certainly worthy of study, particularly if an enlightened funder like IMLS or Mozilla (or hey, maybe even the Anthropic Institute!) would be willing to support this kind of knowledge-gathering. Because there are real stakes here. A museum that deploys AI to solve a budget problem and inadvertently becomes the Museum of Jurassic Technology (without the winking self-awareness) has not saved itself. It has quietly hollowed out the thing that makes it worth saving.