The Story Behind Gannett's AI Debacle

A Nike Air Vapor football is seen on the field before an NCAA football game on Saturday, Sept. 16, 2023, in Morgantown, W. Va.
( Gregory Payan / AP Photo )

Micah Loewinger: This is On The Media. I'm Micah Loewinger. This week, two top executives were fired from Sports Illustrated's publisher. The news comes a little over a week after the tech publication Futurism noticed that something was off with certain author profiles on the Sports Illustrated site.

Male Speaker 8: Authors like Drew Ortiz, who, according to a since-deleted bio, grew up in a farmhouse, or there's Sora Tanaka, who loves to try different foods and drinks. The problem is both their photos were reportedly found on a website that sells AI-generated headshots.

Micah Loewinger: In a recent company-wide call, the majority stakeholder reportedly told employees to "stop doing dumb stuff." Sports Illustrated has said the dismissals this week were unrelated to the AI scandal. The outlet is one of several media companies that have come under scrutiny for their alleged or stated use of artificial intelligence. In August, the country's largest newspaper company, Gannett, rolled out a new non-generative AI service that would provide automated high school sports coverage in a number of states, but readers quickly discovered that bizarre phrases like "close encounters of the athletic kind" had shown up in hundreds of local news stories.

Jay Allred: Our client had a PR problem on their hands.

Micah Loewinger: Jay Allred is the CEO of Source Media Properties, which includes Richland Source, a local news organization in Ohio. He's also the co-founder of Lede AI, the company that built the technology that Gannett was using to automate some of its coverage.

Jay Allred: Gannett put an indefinite pause on the project of reporting high school sports results using AI with us.

Micah Loewinger: In September, Jay agreed to speak to me about what happened, his first extensive interview since his deal with Gannett blew up. He told me that his team began building and using Lede AI in his own newsroom at Richland Source a few years ago after they learned that they could draw on high school sports results from a service called ScoreStream, which collects game results often recorded by fans.

Jay Allred: If we're looking at a football game, we're trying to figure out, was it a close game? Was it an overtime? Was it a blowout? Was it a come-from-behind win in the fourth quarter? We've grouped those different outcomes into scenarios, and then we're going to pull from a library of pre-written templates, plug those variables into those pre-written templates for the customer.

Micah Loewinger: The goal was that you could basically be offering, let's say, fairly rudimentary coverage of high school sports all across Ohio, or wherever, that your writers and editors wouldn't necessarily have to be solely on the hook for producing, and then they could go and do more meaningful coverage.

Jay Allred: We're a small newsroom. There's only 10 of us, and there's only one full-time sports reporter. There's well over 20 high schools in our region. What this lets us do is be able to provide coverage to communities that we wouldn't have been able to be at that game at all. Our sports reporter covers the A game or the number one game, we'll cover the B and the C game with our two other reporters, and then Lede AI will be in to write the briefs for us for those other three games. From that standpoint, our editor that's on the desk that night can call coaches, flesh out that Lede AI story, combining the technology that Lede AI provides with the actual journalism our newsroom provides.

Micah Loewinger: How do you communicate to readers that what they're reading was not written by a human?

Jay Allred: Every single story that publishes on Richland Source has an author, and that author is called Otto Newsdesk. If you click on Otto Newsdesk, it identifies itself as an AI tool right out of the gate. At the bottom of the article, we are disclosing that it's an AI tool that we're using. We're actually linking to Lede AI's website, we have a feedback form that publishes with every piece of content that we publish.

Micah Loewinger: How do people react?

Jay Allred: In general, the readers understand it's information, it's not journalism. Of course, a lot of times readers want the content to be longer and then to include player names and photos, and video.

Micah Loewinger: They want it to be a reported article. [laughs]

Jay Allred: Exactly.

Micah Loewinger: How exactly do you attempt to make Lede AI produce a human-sounding article? I mean, I know that with some of these large language models, they require lots of data, and this has led to a lot of controversy around AI start-ups scraping enormous parts of the web, including books, entire news outlets, entire forums like Reddit. Explain to me how you feed language and templates to Lede AI.

Jay Allred: Every single word, every comma, every semicolon in our database has been written by a person and then it's been checked by another person and checked by a person after that. It's what allows us to be confident in all cases that if we're using our standard data set that the content that we're producing is accurate as long as the data is accurate and it's very accurate.

Micah Loewinger: Okay, that's interesting because in late August people on social media began posting some of the really awkward phrases that Lede AI has put into some local news sources. The one that caught a lot of attention on Twitter, for instance, was a piece in the Columbus Dispatch and some other Gannett-owned papers. Readers were finding examples of Lede AI using phrases like, quote-unquote, "close encounters of the athletic kind." There were a lot of articles referring to high school sports action, or how one team "took victory away from another team." These are phrases that most human journalists would consider ranging from awkward to poor writing, so how did that happen?

Jay Allred: I knew you were going to go there and I'm glad you did. In mid-August, our technology powered a really big launch with Gannett across, I think, six or seven major markets in the US. We had written some custom code for that particular customer, and the code had bugs in it, Micah. Some of those things that showed up in those Gannett articles, especially the errors, were the result of a small company working really, really hard to get ready for a launch with a very big company. As far as the awkwardness of the phrasing and the now infamous "close encounters of the athletic kind," a human being wrote that, Micah.

[laughter]

Jay Allred: A person wrote that, and we got called out on a few phrases and they are no longer in our database. It was as simple as taking them out.

Micah Loewinger: I was curious to know if this was a feature or a bug. I actually just searched some of these phrases on the Richland Source and I counted over 140 articles on the Richland Source from this year that featured the phrase "close encounters of the athletic kind" or similar phrases, including 50 articles from this year that featured the phrase "close encounter of the winning kind" in the headline. So I don't really buy that it was just a fluke that happened with launching a new service with Gannett. Like, you have been publishing these sentences for years. [laughs]

Jay Allred: No, and I appreciate you calling that out because that phrase has been in our code for years. The things that were unique to the Gannett launch were some other great, he says sarcastically, some other unfortunate stuff. For example, there were a couple of leads that published in some of the papers where we had plugged in a variable where there should have been a mascot name. There were instances where we published two very similar lead paragraphs, they said exactly the same thing in terms of factual information, but they said it slightly differently. Those were bugs that were built into that custom code, but those awkward phrases that the internet called out, that's been there for years.
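One way to guard against the kind of bug described above, where a variable appears in print instead of a mascot name, is to check that every field in a template has a real value before publishing. The sketch below is an assumption about how such a check could look, not a description of Lede AI's fix; the function names and the sample template are hypothetical.

```python
from string import Formatter


def missing_fields(template: str, values: dict) -> list[str]:
    """Return the template fields that have no usable value."""
    # Formatter().parse yields (literal_text, field_name, format_spec, conversion).
    needed = {name for _, name, _, _ in Formatter().parse(template) if name}
    missing = []
    for name in needed:
        value = values.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return sorted(missing)


def render(template: str, values: dict) -> str:
    """Fill the template, refusing to publish if any field is unfilled."""
    missing = missing_fields(template, values)
    if missing:
        raise ValueError(f"unfilled template fields: {missing}")
    return template.format(**values)


template = "The {winner} {mascot} won the game."
print(missing_fields(template, {"winner": "Team A"}))
print(render(template, {"winner": "Team A", "mascot": "Tigers"}))
```

Failing loudly on a missing field keeps a single bad record from being repeated across every paper that receives the same automated story.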

Micah Loewinger: I guess this is what sends a shiver down the spine of media critics and journalists and editors because we're talking about high school sports, this is not the highest stakes beat in all of journalism, but it seems like it does speak to the risk of automation where one small mistake when automated becomes 150 small mistakes all across the country.

Jay Allred: Yes, absolutely. What if this had been crime reporting? What if these had been arrest reports? Real harm could have been done. As leaders in the industry, I think it should give us all pause, it's why I'm having this conversation with you.

Micah Loewinger: I appreciate that. I appreciate your vulnerability and your openness to introspection. Are you at all concerned that local newsrooms would see the promise of Lede AI, maybe think that it's capable of doing more than it is, and kill entry-level jobs?

Jay Allred: I think about that every single day. In three years of talking to news leaders around the country, I've never once heard one of them say I'm super excited for AI because I get to reduce my head count.

Micah Loewinger: Well, no one says those things, they say we would like to be more efficient.

Jay Allred: I agree with you and with all of those things said I still lose sleep over it at night.

Micah Loewinger: What do you lose sleep over?

Jay Allred: Is the intention to use your euphemism to find efficiency and to do that through less people, or is the intention to create more value for consumers so that we can get the nose of this airplane pointed up and we can start to create a future where local news entrepreneurs can think of local news as a good small business. I think that there's lessons to be learned here and we can grow as an industry and get better because the reality is this stuff is, it's not coming, it's here.

Micah Loewinger: That's what I've heard in some of your answers is still this implicit belief that the rushing river of technology is coming no matter what. I wonder if this is a moment in time to say like, there might be some uses for AI, but we don't just have to see it to its logical conclusion just because technology is great, bro.

Jay Allred: Yes, I agree with you. I think we should use techs like Lede AI to report unreported stories that would never go reported otherwise. We should interrogate that technology vigorously and make sure that it can be trusted and be accurate and I know that there are ways to do that.

Micah Loewinger: You've been forthcoming about the mistakes your team made and the limitations of the technology. Do you feel that any of the backlash to AI within the media has been unfair? Like, has any of it, you think, missed the mark?

Jay Allred: I think that our industry has a tendency to respond to stuff like AI from a very defensive position. It's super understandable, our industry has done nothing but cut newsrooms to the bone for going on two decades now. I wish we could get into spaces where we understood that we were more all in this together and that we are trying to figure it out. I think we as an industry need to be able to hold multiple things to be true at the same time, which is malevolent deployment of AI inside of our industry is going to hurt our industry. Intentional thoughtful deployment for the benefit of readers and communities and reporters can benefit our industry. Both things might happen. I hope it's the second that's going to be the work I continue to do.

Micah Loewinger: Jay, thank you very much.

Jay Allred: Thank you, Micah. I was glad to be invited onto your program.

Micah Loewinger: Jay Allred is the CEO of Source Media Properties.

[music]

Micah Loewinger: That's it for this week's show. On the Media is produced by Eloise Blondiau, Molly Rosen, Rebecca Clark-Callender, and Candice Wang, with help from Shaan Merchant. Our technical director is Jennifer Munson. Our engineer this week was Brendan Dalton. Katya Rogers is our executive producer. On the Media is a production of WNYC Studios. Brooke will be back next week. I'm Micah Loewinger.

[music]

Copyright © 2023 New York Public Radio. All rights reserved. Visit our website terms of use at www.wnyc.org for further information.

New York Public Radio transcripts are created on a rush deadline, often by contractors. This text may not be in its final form and may be updated or revised in the future. Accuracy and availability may vary. The authoritative record of New York Public Radio’s programming is the audio record.

Hosted by Micah Loewinger
WNYC Studios