The Future of Voice and the Implications for News

Executive Summary

Voice-activated speakers powered by intelligent assistants, such as Amazon Alexa and Google Assistant, are growing faster than smartphones and tablets did at a similar stage.1 But how are these devices and assistants being used and what is the potential for news? There are very few data available about the usage of news content on these platforms, or on publisher and platform strategies around news. This report aims to address these gaps by combining in-home research with focus groups, surveys, and interviews to provide a snapshot of current behaviours in the United States, United Kingdom, and Germany. We have also interviewed more than 20 publishers and other experts to understand more about current perceptions and future potential.

This report is focused on a new set of devices – smart speakers – but the technological changes that lie behind them are much more profound, as intelligent assistants like Amazon Alexa, Google Assistant, Apple Siri, and Samsung Bixby will ultimately sit inside many other devices we rely on in our everyday life, from phones to cars and beyond.

Here are some of our key findings.

Penetration of voice-activated speakers is growing rapidly and is now reaching mainstream audiences, but currently most usage is at a basic level with much consumer frustration around more complex tasks.

  • More than one in ten US adults (14%) regularly use these devices, equating to around 34m people and 17m homes. Usage in the UK (10%) and Germany (5%) is a little lower but has roughly doubled in the last year.
  • For heavy users, voice is now the first and final contact point with technology (often replacing the smartphone or radio in the bedroom). This suggests that voice could become a critical gateway to media going forward.
  • Most users report high levels of satisfaction with their smart speakers. Almost a third of owners (32% in the UK) have bought additional devices. Two-thirds (69%) say they will replace or upgrade their speaker when this becomes necessary. They find them convenient and fun, but usage is today largely confined to a small set of basic ‘command and control’ tasks such as accessing music, asking for the weather, or setting timers.
  • Wider use of functionality is limited by lack of awareness, poor voice recognition, and the difficulties of remembering more than a few simple commands. This is leading to consumer frustration and abandonment of complex tasks. The platform companies behind smart speakers and voice systems know they need to rapidly address these issues if early promise is to be realised.
  • Smart speakers are most popular with those aged 35–44 and they have also proved a surprise hit with much older groups and the disabled due to the simplicity of operation.
  • Smart speakers are mostly replacing radios in the home, particularly in living rooms, kitchens, and bedrooms. Some regular users also say they spend less time with the television and with other screens. Consumers see voice as a chance to de-clutter. Within a few years, many expect voice to have largely replaced remote controls, simplifying access to a range of TV, radio, and other devices.

Despite the rapid growth and strong promotion of voice technology, news consumption on these devices is currently lower than might be expected, with most usage focusing on very short news briefings. Many users are unaware of the wider range of options around news, including how to access their favourite brand. Others are underwhelmed by existing content, which is mostly reversioned from radio or print.

  • Although around half of smart speaker users say they use the device for news, only around one in five (21% UK, 18% US) use the news briefing functionality daily.
  • Just one in a hundred (1%) say news is the most important function on the device compared with 61% who cite playing music, 6% answering general queries, and 4% getting weather updates (UK data).
  • Despite this, regular users of news updates say they like the brevity, the control, and the focus. Around half of those who use briefing functionality say that they feel better informed as a result (56% US, 45% UK). The majority of usage is in the mornings, where new habits are emerging, and last thing at night.
  • For those who are not using these devices for news, the main reason cited was the ease of accessing news on other devices (52% US, 51% UK). Only around one in ten (14% US, 10% UK) said they didn’t know how to access the news. This illustrates how critical the development of more device-specific content might be – along with better user interfaces.
  • Very few people bother to personalise or change their news settings on the Amazon or Google platforms. As a result, news providers that are suggested as part of platform defaults – BBC in the UK and ARD in Germany – currently have a significant advantage. In the UK, two-thirds of usage is for BBC News (64%), followed by Sky News at 19%. Usage of other brands is extremely limited. The combination of the importance of defaults and the limitations of voice as output suggests that this environment will be characterised by heavy winner-takes-all dynamics.
  • There is a problem of attribution. Around a quarter in the UK (23%) and nearly one in ten in the US (7%) could not remember the brand that produced their daily news update. Having said that, strong branding within the audio itself may explain figures that are higher than previous attribution studies in search and social media, where over half failed to recall the brand.2
  • Many users complain about the quality of news briefings, about how often they are updated, and about production quality. Some users complain briefings are too long and would prefer updates of no longer than a minute.
  • We also find that podcasts do not yet attract significant usage on smart speakers (15% UK, 22% US). Some consumers are unaware that podcasts are available or how to ask for them, while others prefer to consume them on the go (e.g. while commuting, exercising). Speakers are often in the wrong room, they say, or podcasts are too personal to share with family.

News publishers are pursuing a variety of strategies around voice, with broadcasters generally more proactive than newspapers. While some remain to be convinced about the need to invest heavily today, most believe that voice will significantly affect their business over the next decade.

  • Broadcasters (particularly those with strong radio heritage) see voice as an existential threat. They have moved fast to secure early mover advantage on the platform but recognise that there is a danger of disruption as their privileged position in audio comes under threat. The focus has been on repackaging radio news bulletins and making live streams and podcasts accessible, but now many are starting to create bespoke content and are experimenting with new formats.
  • Newspaper groups have generally been more cautious. They have been burned in the past by investing resources in creating content for new platforms, without a path to monetisation. Many complain about the lack of visibility of their brands in end-user discovery processes. A number are investing in daily current affairs podcasts or niche briefings that are less expensive to produce than round the clock news updates. Others are looking at cost-effective audio services such as text to speech, to provide additional value to consumers.
  • Publishers are concerned about the extra power platforms could hold in the voice environment where typically only one answer/brand can be given in response to a command or question. They recognise that they will need to learn how to optimise their content and programmes for voice, but fear that platforms will become choke points, making it even harder to build direct connections with users.
  • Publishers want to see better tools to make it easier and quicker to integrate content with the growing number of voice platforms. They also want (a) fair and transparent processes to ensure a level playing field, (b) more common commands across news to reduce consumer confusion, (c) better data from platforms on usage, and (d) a much clearer plan for the monetisation of different types of services.

Technology platforms are developing extremely quickly, with the introduction of new devices (with screens) and the integration of assistants into cars and headphones. Over the next few years voice technologies are likely to move beyond the home, becoming increasingly embedded in every part of our lives.

Although usage today remains limited, we should remember that we are still at a very early stage of development. Voice recognition is improving rapidly, as is the quality of the synthesised voices/responses that can be returned.

Technology platforms will improve discovery, on-boarding, identity management, data, and tooling – but the tensions with publishers are likely to increase as the platforms become critical gateways for accessing media. In the next year we are likely to see voice platforms themselves playing a bigger role aggregating news automatically, while issues of branding/attribution, data access, and monetisation are likely to be the key flashpoints.

1. Methodology and Approach

This report uses a combination of qualitative, quantitative, and interview-based methodologies to gain a holistic understanding of news production and usage on voice platforms.

Working with Differentology, a UK-based market research agency, we conducted in-home depth interviews and focus groups in three countries (UK, US, and Germany) during August and September 2018. The aim was to understand the behaviour of early adopters who had been using news on these devices for six months or more, to understand how media habits had changed and how voice speakers fitted with a range of other devices. We ran a second series of groups with those who had not yet bought the device (but were open to the idea) to understand barriers to purchase. Both groups were exposed to a range of voice news experiences.

  • Eight depth in-home interviews – US and UK.
  • Six focus groups – US, UK, and Germany. Half the focus groups were early adopters of voice technology, 50% using news and 50% not using voice tech for news. Spread of gender and ages. The other focus groups were for non-users of voice technology but with an interest in news.

Working with YouGov, an international market research agency, we conducted nationally representative online surveys in the UK and US where we asked specifically about different kinds of news usage and attitudes to news. Samples of 3,000 in each country were boosted by an extra 1,000 smart speaker owners in the UK and 500 extra in the United States. This allowed us to drill down – with robust numbers – into demographics and explore differences in behaviour between platforms. In most cases we quote the nationally representative numbers but make clear where we are using the smart speaker boosts, which are not subject to the same quotas.
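As a rough illustration of why the boosted samples matter, the sketch below estimates an approximate 95% margin of error for the sample sizes quoted above. It assumes simple random sampling and a worst-case share of 50%, neither of which is stated in this report; the surveys here are quota-based online samples, so the figures are indicative only. The function name margin_of_error is hypothetical and introduced purely for this example.

```python
import math


def margin_of_error(share, base, z=1.96):
    """Approximate 95% margin of error for a proportion.

    Assumption: simple random sampling. Quota-based or weighted
    surveys would typically have somewhat wider margins than this.
    """
    return z * math.sqrt(share * (1 - share) / base)


# Sample sizes quoted in the methodology; the 50% share is a
# worst-case assumption, not a figure taken from the survey.
samples = [
    ("National sample", 3000),
    ("UK smart speaker boost", 1000),
    ("US smart speaker boost", 500),
]

for label, base in samples:
    moe = margin_of_error(0.5, base)
    print(f"{label} (n={base}): +/-{moe * 100:.1f} percentage points")
```

The smaller the base, the wider the margin, which is why results based only on smart speaker owners should be read with more caution than the nationally representative figures.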

The author interviewed a range of key players from the news publishing industry. The companies that took part were:

  • United Kingdom: BBC, Sky News, the Guardian (×2), Telegraph Media Group (×2), Financial Times (×2), The Economist, Reuters
  • United States: Wall St Journal (×2), New York Times (×2), Washington Post, National Public Radio (NPR), CNN
  • Germany: Der Spiegel (×2), T Online, Die Zeit, ARD
  • Rest of the World: Swedish Radio (SR) (×2), Australian Broadcasting Corporation (ABC) (×2)

Finally, we asked the three main platforms, Amazon, Google, and Apple, to share key data points and answer questions about their existing and future news plans.

All three platforms declined our requests for data on the numbers of devices sold in the UK, US, and Germany. They also declined to answer specific questions about the frequency and amount of news consumed by smart speaker owners. Amazon did tell us that tens of millions of devices had been sold worldwide – but not how many tens of millions or in which countries. Amazon, Google, and Apple also provided basic information about country availability, about the type and number of news publishers currently available on their platforms, and about how on-boarding worked. Neither Amazon nor Apple was prepared to discuss future plans or offer an interviewee to respond to the issues raised by publishers in this report. Google did provide a background briefing on future developments and access to a senior member of the US product team, who is quoted in this report.

A full list of interviewees is included in the report as an appendix.

2. What is Voice?

This report is nominally about a new category of devices (Amazon Echo, Google Home, Apple HomePod) that offer new ways for publishers to distribute content, but the technological changes that lie behind these voice-activated speakers are much more profound. Intelligent assistants (Amazon Alexa, Google Assistant, Apple Siri, and Samsung Bixby) will ultimately sit inside many other devices – smartphones, cars, televisions, even microwaves and fridges. We will increasingly use our voice to control devices and access media, because it is a quicker and more convenient input for many purposes than touchscreens or remote controls. Output is a more complex story and, depending on the context, may often need to involve a screen display of some kind.

In the second half of 2018 there has been a stream of new product launches that may indicate the direction of travel. The Google Home Hub (October 2018) is a screen-based home assistant that adds a visual layer to voice-driven experiences, competing with the existing Amazon Echo Show. Facebook also entered the market in October with Portal, a new screen-based device, which contains Alexa voice functionality as well as its proprietary voice recognition for video calling. BMW has announced its own voice assistant for its fleet of cars worldwide, while Amazon is also targeting drivers with the $50 Echo Auto, a credit card-sized box that sits on your dashboard. The latest Bose headphones now come with both Amazon Alexa and Google Assistant inside, allowing users to call up podcasts and instant answers directly into the ear.

This is not just about smart speakers

Voice Platforms and their Penetration

Amazon was the first tech company to launch a smart speaker, in late November 2014, promising a new easy way to control your music with your voice. Since then the Alexa assistant that powers Echo devices has evolved to manage a wider range of tasks from weather, news, and traffic updates to ordering goods and services. Google was almost two years behind with its Google Assistant, launching the first Google Home speaker in November 2016, but long-standing investment in automatic language translation has helped it roll out to more markets (19 compared with 12 for Amazon as of October 2018). The Apple HomePod, a premium speaker powered by Siri, was released in 2018 and is available in eight markets.

Smart speaker launches by country

2014
  • Amazon Echo etc: US pre launch (Nov 2014); US public launch (June 2015)

2016
  • Amazon Echo etc: UK & Ireland, Germany (Sept)
  • Google Home/Mini etc: US launch (Nov)

2017
  • Amazon Echo etc: India, Canada, Japan (Nov)
  • Google Home/Mini etc: UK (Apr), Canada (June), Australia (July), France (Aug), Germany (Aug), Japan (Oct)

2018
  • Amazon Echo etc: Australia, New Zealand (Feb), France (June), Italy, Spain (Oct)
  • Google Home/Mini etc: Italy (Mar), India, Singapore (Apr), Spain, Mexico, Ireland, Austria (June), South Korea (Sept), Sweden, Norway, Denmark, Netherlands (Oct)
  • Apple HomePod: US, UK, Australia (Feb), Canada, France, Germany (May), Spain, Mexico (Oct)

In Asia, a number of other devices are popular including Line Clova (Japan), SK Nugu, Naver Friends, Naver Wave, and the KaKao Mini (South Korea).

China launched its first AI-driven smart speakers in 2017, with Alibaba and Xiaomi the early market leaders. Chinese smart speakers are responsible for around a third of global sales.3 Samsung has demonstrated a Galaxy Home speaker powered by its smart assistant Bixby, which is due to launch before the end of 2018. In terms of wider access, Google says that its Assistant is already available on more than 400 million devices, including phones, headphones, TVs, and watches. It will be available in 30 markets by the end of 2018,4 including Hindi, Indonesian, and Thai. Apple says Siri is used on more than 500 million active devices and helps with over 2 billion requests each week. It supports 21 languages and is customised for 36 markets.

Growth of Smart Speakers in the US, UK, and Germany

Smart speaker penetration by country and age

Our study focuses on the United States, the United Kingdom, and Germany where both Amazon Echo and Google Home devices have been available for two years or more. In all three markets we have seen an extremely rapid take up of devices – more than doubling between 2017 and 2018 and then increasing again between January and September 2018. One in ten (10%) of our nationally representative sample now has one or more smart speakers in the UK, with the figure even higher in the United States (14%). We did not survey in Germany in September 2018 but penetration in January was already 5%. Usage peaks between the ages of 30 and 45 in both the UK and the US, though our UK sample shows strong usage with older groups too.

Our data, which ask about devices that are ‘owned and used nowadays’, indicate slightly lower levels of ownership than some other surveys, but industry analysts believe that low cost and increased functionality will drive considerable growth in developed markets at least. Juniper Research predicts that smart speakers will be found in 55% of US households by 2022 (70 million homes). Irrespective of the precise pace, it seems clear that smart speakers specifically and voice systems more broadly will see mass adoption in the near future.5

2017/Jan 2018 Digital News Report Q. Which, if any, of the following devices do you ever use (for any purpose)? Showing smart speaker code. Base: All (approx. 2000 in each country). Smart speaker survey (Sept 2018) Q. Which of the following devices do you own and use nowadays? Showing smart speaker code. Base: All, UK=2104, US=3288.

Platform Wars – Amazon vs Google vs Apple

In all three markets studied, Amazon’s first move has enabled it to take a dominant position, even if this is starting to be eroded by other players.

Market share of smart speakers – UK

Smart Speakers Survey (Sept 2018). Base: All who own a smart speaker in UK (213). NB: Share is based on model owned OR model used most often if more than one model of smart speaker is owned.

Our YouGov poll (September 2018) shows Amazon with three-quarters of the market (74%) in the UK and almost two-thirds (63%) in the United States. Google has more than one in ten devices (14%) in the UK and a quarter (26%) in the United States. Sonos One (which is powered by Alexa) is a significant player in the UK (5%), while the Apple HomePod has 2% in the US.6

Market share of smart speakers – US

Smart Speakers Survey (Sept 2018). Base: All who own a smart speaker in US (474). NB: Share is based on model owned OR model used most often if more than one model of smart speaker is owned.

This pattern may not be reflected in other markets going forward. In Australia, where Amazon did not have a first mover advantage, Google has emerged as the main provider. Indeed, the Australian Broadcasting Corporation (ABC) told us that it was hard to find Amazon Echo users for its recent testing. In the tech savvy markets of Norway, Sweden, Denmark, and the Netherlands, Google currently has an open playing field, with other platforms still to launch.

Future Plans

Technology companies are extremely reluctant to talk about business strategy or future roadmaps but it is clear that Amazon sees voice supporting its e-commerce business, while Apple is interested in selling more premium devices to its loyal customers. With more searches transferring to voice each year, this is also potentially a huge area of disruption for Google’s core advertising business, which depends on data collection at scale. Tech companies are looking at ways to reach the next billion internet users, many of whom are in the developing world with growing access to mobile technology. The hope is that voice interfaces could be a simpler, cheaper, and more natural way to interact than traditional computing – even if barriers around cost and language remain. ‘We see voice as the ubiquitous “always with you” platform that allows you to do things in the real world’, says Steve McLendon, news product lead for voice at Google. Amazon’s Dave Limp talks in similar terms about the Alexa platform: ‘We think of it as ambient computing, which is computer access that’s less dedicated personally to you but more ubiquitous.’7 The platforms see voice as a critical component of this new ambient era, where technology fades into the background until you need it.

3. How Voice is Being Used Today

In this section we explore how people are using voice-activated speakers today and where news fits into that.

The chart below shows the features that are most regularly used by UK smart speaker users compared with the features that are most valued. In both cases the ability to play music comes out on top, with four in five (84%) saying they use this feature and almost two-thirds (61%) saying it is their most valued feature. News is used about half as much (46%) as music, with just 1% saying it is the most important feature for them. A range of information-based tasks, such as asking general questions (64%) and checking the weather (58%), are more widely used than news updates.

Top/most valued features on smart speakers (UK)

Q. Which, if any, of the following features do you use/is most important on your speaker? Base: UK All that own a smart speaker & are aware of its features = 185.

In our focus groups and in-home interviews, it also became clear that these devices are mainly being used for simple, straightforward tasks. They have taken the friction out of activating Spotify, Amazon, or Apple music, reducing the need to press buttons or pair devices. This in turn has increased the frequency and volume of usage.8 Switching on the radio is easier too, while setting alarms and turning lights on and off also fits into a ‘command and control’ usage pattern.

We also found a thrill for some older users in feeling part of the future. We came across one taxi driver in his seventies who had never been able to master a computer, a smartphone, or a tablet but learnt how to interact with his Amazon Echo within a few days. The simplicity of these devices and the lack of the need for fine motor skills has made them a surprise hit with older groups and those with disabilities.

Many respondents also found there was a social and fun element to these devices. Often they played games together, or asked questions to settle an argument, for example, the year of a film release or the age of a politician. Many appreciated features that introduced whimsy, such as ‘tell me a joke’. Alexa, in particular, was often treated as a member of the family, brought into conversations, and asked for ‘her’ opinions.

Early adopters really love their smart speakers

Personality test – Echo versus Google Home

Our research suggests that Alexa’s engaging personality has helped voice devices be welcomed into homes and get over some of the natural concerns about a new technology. But as voice-activated technologies become more entrenched, Google’s more functional approach and commitment to blend voice across a range of existing services could prove equally valuable to consumers.

Picking up on these trends, it is striking that many media companies have focused on building social and family-based experiences. The BBC launched an interactive app (or skill) for younger children in September 2018, allowing them to interact (dance) with characters from popular CBeebies TV shows. Mukul Devichand, executive editor, voice & AI, says the key aim at this stage is to learn what works:

The technology is still new and we’re experimenting with what our audiences really want on these new platforms. High quality content for children is key to the BBC’s public service goals.

ABC and Swedish Radio are also looking at content for children, while at least one publisher with a print background, the New York Times, is exploring a news quiz to capitalise on the same insights around the interactive and social nature of these devices.

Usage Throughout the Day

One task we set respondents was to understand how they used voice throughout the day, and how this fitted in with other media use. We were keen to explore what habits had changed and what had stayed the same.

The next chart sets out the general picture, aggregating the results of our research. It was striking how quickly new patterns of usage had developed and how easy they were to recall.

In the early part of the day, users were often in information-seeking mode, looking for news, weather, and travel updates before the morning commute. For a minority, this routine included an audio check of their diary for the day. While bespoke news updates were sometimes part of this routine, many were just using the device to play the radio. Later in the day, usage was geared more towards entertainment – playing music, games, or relaxing with a podcast while cooking.

Smart speaker use peaks early and late in the day

For the most part, we found that smart speakers were not replacing other media but were used to access the same media in a different way. Our in-home research showed that the location of the speaker had a significant impact on usage, as did family situation (e.g. living alone or with children).

Case Study 1: Elaine (58) married, grown up children (UK)

  • Occupation: Part-time optician
  • News use: Medium, traditional
  • Location of speaker(s): One – living room

Elaine watches the TV in the bedroom first thing and typically asks her Amazon Echo to turn on the radio as she comes downstairs. She often asks for a news bulletin before leaving the house, when she reverts to the car radio. She would ideally like Alexa to be integrated into her car and also into her TV at home. She finds the BBC News bulletin on her Echo device too long and would prefer it to be just one minute.

In Elaine’s case, the arrival of Alexa led to a significant change within a week. ‘I just got rid of the radio because there was no need for it anymore’, says Elaine, who thinks she listens to more music nowadays and says she is less likely to turn on the TV during the day.

Case Study 2: Jeremy (40s) divorced with three children (UK)

  • Occupation: Political consultant
  • News use: Power user, broadcast, and digital
  • Location of speaker(s): Two – bedroom and living room

Jeremy starts his day with an alarm call from the Google Mini in the bedroom and then asks for a news update, which he has configured to include the BBC News and a current affairs podcast from Monocle 24. In the living room he switches to his larger Google Home device, which he uses to access the radio (BBC Radio or LBC). Out and about he mainly uses his laptop or smartphone. In the evenings he uses a wider range of functions such as shopping lists and music. He ends the day with a check on his diary and by setting his wake-up call.

Jeremy says that he rarely watches TV these days. He is also a self-declared early adopter of technology. He often dictates voice notes into his speaker, has started using more voice searches on his smartphone, and frequently shares voice messages now with his children via WhatsApp.

He thinks that news usage on his smart speaker has helped him become better informed and his discovery of the Monocle podcast (US-based) has introduced him to new perspectives. ‘It brings me a huge degree of convenience and encourages me to search out more diverse sources of information than I might ordinarily use,’ he says.

Case Study 3: Adam (40) wife and child with twins on the way (US)

  • Occupation: Works for technology start-up
  • News use: Digital, sometimes avoidant
  • Location of speaker(s): One in flat

Adam purchased a smart speaker to make life easier and also because he feels he needs to keep up with technology for his work. He uses Alexa every day, mostly for weather updates and fun activities with his son.

He is critical of the current news bulletins available through the device and of American media in general. He is interested in a wider international agenda but hasn’t configured his device to deliver this. He finds bulletins much too long and frequently not updated enough. As a result, he rarely uses the news.

Case Study 4: Karissa (37), New York with son and partner (US)

  • Occupation: Studying for a masters in theology
  • News use: Digital, traditional
  • Location of speaker(s): One in studio flat

Karissa has always been a ‘gamer’ and uses technology easily. She is an audio learner who prefers to ask questions and hear responses. She does this for everything from studying for her masters to playing games as a family. Karissa is highly politically engaged (Democrat) and uses the device to listen to news but also to fact check what she’s hearing. She mostly listens to National Public Radio (NPR) when on her own but once the family are home, lack of space dictates group listening/viewing only. She is an Amazon fan – and even uses Dash buttons to order goods – but she is using a Google device because it was bought as a present. Her son set the Google speaker up and often tends to dictate how they use it.

Karissa says she consumes roughly the same amount of news as previously – but finds audio a more convenient format than previous options.

Overwhelmed by Technology and by Screens

With the exception of early adopters, one striking finding from our interviews and focus groups was how frustrated many people feel by technology. Visiting homes for this report we often saw people battling with up to 10 different remote controls and other complex interfaces for accessing media and controlling devices. Although we were talking to respondents specifically about smart speakers, the qualitative research suggests that their underlying need was not for another device; rather they wanted their existing devices to be easier to use. In this respect they see voice input as a way to de-clutter and simplify their lives and make interaction far more natural and intuitive.

We have 5 Bose speakers and we barely use them – the quality is nowhere near as good [on the Alexa] but it’s just so much more convenient.

I don’t even have a radio anymore, it’s great – it totally declutters.
(In-Home Interviews, US)

A second theme was the desire – almost universally expressed – to spend less time with screens. Respondents felt overwhelmed, assaulted by technology and often by news as well. Many spend all day at work on screens or looking at their smartphone. Some resent the way in which the internet can distract and waste time by taking people down ‘rabbit holes’. Part of the appeal of voice devices is they act differently. They provide focused information when summoned and, for the moment at least, the lack of a screen means less distraction.

I quite like that it doesn’t have a screen actually. One less thing to … too many screens all the time.
(Focus Group, UK)

Early adopters we spoke to felt that voice was enabling them to control technology rather than the other way around.

It’s the technology I own which doesn’t ask for my attention all the time.
(Focus Group, Germany)

It makes me feel much more in control.
(Depth Interview, UK)

Many see voice as a chance to de-clutter

Most want to spend less time with screens

These findings about screens do raise questions about the new product releases for the home – and how successful they might be. If Alexa and Google Assistant are embedded within televisions and provide easier access via smartphones and tablets, is there really a consumer need for a new set of screen-based products? It may benefit technology companies today to add screens to enable advertising and other promotional opportunities, but so far consumers are underwhelmed. Screen-based devices (Echo Show/Echo Spot) currently make up just 8% of the total market in our UK survey and 6% in the United States.

Barriers to the Use of Voice-Activated Devices

While millions of smart speakers have been sold, many still doubt the long-term value of these new technologies for mainstream audiences. Is the hype justified? Is growth sustainable?

To understand these factors, we explored (a) the value existing users felt they were getting from their devices, and (b) the barriers to further growth from those who have not yet bought them.

In terms of existing users, our survey suggests relatively high levels of satisfaction. Around a third (32%) of users in the UK have bought at least one additional device. Almost one in ten (8%) have bought at least two additional devices. Overall, around two-thirds (69%) say they would be likely to replace or upgrade their speaker when this became necessary. Around a quarter (23%) said they would not do so, suggesting that not everyone is getting value from these devices.

Concerns about Platform Motivations and Privacy

Some clues about the barriers to growth came from our three focus groups where we talked to those who had not yet bought a device. A critical concern for this group was privacy. Widespread concerns were expressed about how giant tech companies could now listen to every conversation in your home.

I’d be a little afraid to acquire it because I’d fear for my own security.
(Focus Group – potential voice customer, Germany)

A story about Alexa letting out a creepy laugh without being prompted was mentioned in a number of focus groups, even if no one had heard this directly themselves. Having said that, most felt they would be prepared to trade off some privacy for convenience, as they already have done in the case of Facebook.

It’s not nice to know you’re being listened in on all day, but in the end I don’t give a shit.
(Focus Group – potential voice customer, Germany)

You do wonder, but then I’ve got nothing to hide.
(Depth Interview, UK)

Linked to privacy, there was also some concern about the role and motivation of the tech platforms. There was discussion about why Google and Amazon were developing these technologies and how it might link to collecting data to sell advertising or goods. This is a barrier for some (and will prevent the most opposed from adopting these technologies), but most understand the core trade-off: namely, providing more data than they’re entirely comfortable with in return for useful free services.

But, you’re already being monitored when you post certain things on Facebook, for example.
(Focus Group – non-speaker user, Germany)

User perceptions of tech company motivations
Summarised view from focus groups

25927.png

Feeling Foolish?

Other participants raised a related but fundamental concern. How comfortable do we really feel about talking to a computer rather than a human? For many, this still feels unnatural.

I think what’s still weird for me is when I get an answer. It’s weird, because I’m actually talking to my mobile, and it’s totally unfamiliar.
(Focus Group – non-speaker user, Germany)

But respondents who had been using smart speakers for some time said the discomfort tended not to last very long. They also told us that talking to Alexa or Google Assistant had made them more likely to use voice on their smartphone. These qualitative observations are supported by a recent poll9 that shows that almost three-quarters (72%) of smart speaker owners are happy talking to a voice assistant in front of others compared with under a third (29%) of those who do not own the devices.

It doesn’t feel weird anymore. It just feels normal.
(Focus Group – voice user, US)

Many participants talked about how they had incorporated the smart assistant into their family conversations. It may also be that confidence is built up in more personal contexts (such as the car or via a phone in a bedroom) and then spreads to more shared spaces.

Conclusions

The shift to voice clearly requires a significant change of mindset for consumers. It will take some time and even then not everyone will be convinced. Users and non-users across all three markets seem to be carefully weighing the upsides and downsides of voice technologies. Although they often reach different conclusions, the factors under consideration are remarkably consistent.

Benefits include making life easier, enabling new and useful behaviours, and reducing clutter. The downsides include concerns about privacy, discomfort with talking to machines, poor voice recognition for complex queries, and limited functionality.

The evidence from early adopters is that the simplicity, control, and convenience of voice could be tipping the balance most of the time, even if they still have frustrations and concerns. Voice-activated speakers are physically replacing radios – but not necessarily radio listening. Indeed, there is evidence that overall they are increasing access to audio of all types. Beyond that we are starting to see emerging behaviours such as query answering and a range of social activities, such as family games and children’s education, perhaps not envisaged even by those who created the technology.

One of the most significant insights is how many of these respondents start and end their day with voice technology. In this specific respect, it is the smartphone that is being displaced and that may have profound implications for media owners looking to distribute content as well as for the platforms that manage to forge a dominant and trusted position with consumers.

4. News Usage in Detail

In this section we look in more detail at the type of news experiences that are currently available via smart speakers. Our focus groups and depth interviews indicate that news usage is not yet as deep or valued as we might hope – but what is working for publishers and consumers?

The limitations of both technology and content have confined publisher activity today to four early formats.

  1. News briefings: The three main platforms have created a specific format designed to provide a quick update on the news. Amazon calls this a Flash Briefing format, Google defines this as Narrative News and Apple uses the term Audio News Briefings. Updates are triggered by user commands, such as ‘play me the news’, ‘what’s the latest’, or ‘give me the headlines’. Each platform has a different length limit. Amazon allows a maximum of 10 minutes.
  2. Live streams and podcasts: Existing news radio channels like National Public Radio in the United States or Radio 5 Live in the UK can be accessed by name. Music-based radio stations can also be accessed and often include news bulletins at the top of each hour. On demand programmes (or podcasts) can also be called up.
  3. Questions and answers: Examples include queries such as ‘What’s the temperature in Rome at the weekend?’, ‘What’s the latest Manchester United score?’, ‘How old is Donald Trump?’ At this stage, most of these queries are answered directly by Google or Amazon rather than a news organisation.
  4. Interactive experiences: There has been limited experimentation with a range of more innovative uses such as quizzes, recipe exploration for cookery, and virtual travel.

We will now explore more about the content and usage in each of these areas.

News Briefings

Platforms have encouraged publishers to create content that provides a quick update on a broad or narrow news subject. They have also promoted commands such as ‘play the latest news’ as part of their marketing strategies.

NPR.psd

But how many people use these news briefings? Platforms have declined to share data about news usage publicly, but our YouGov survey in the United States and United Kingdom suggests that owners of smart speakers are not using this core functionality with great enthusiasm or frequency. Only around one in five (21% UK, 18% US) access them daily, with the majority of smart speaker owners not using news update functionality at all in the last month.

25986.png

Default Status Matters

Broadcasters like the BBC and NPR in the US have had a significant advantage. Because of their leading offline position they were awarded prominent status on these platforms at the start. Our research shows that these default starting points tend not to be changed (see survey data below) and although additional providers can be added to create a longer sequence of news briefings, this option is rarely exercised.

25995.png

Though a list of options is presented during set up, qualitative interviews also suggest that most users don’t know how to change these via the Google, Amazon, or Apple apps. In the UK, we also found little desire to change the default setting, which is BBC News on all three platforms. Given this, it is not surprising that the BBC dominates our survey with 64% of all news update usage, followed by Sky News, which offers a business and showbiz briefing in addition to basic news. Other providers struggle for attention.

26004.png

Q. When playing the news headlines from your smart speaker, which brand(s) do you hear? Select all that apply. Base: All those who have accessed news briefings Nat Rep 109.

Broadcast brands tend to attract the majority of usage in general. This is partly because they can deliver regular audio news updates without adding too much extra cost and partly because users trust these brands already for audio news. This trend also holds true in the United States, but here we see a much more even usage split between a number of national and international broadcasters. This can be explained by a recent platform decision to change the most prominent option shown (NPR) with the arrival of screen-based devices like the Echo Show.

This led to Reuters TV, CNN, and US TV networks receiving greater promotion, although NPR remains the default option on Apple devices. NPR says this change led to a significant fall-off in new users, although existing users have remained loyal. More US users have changed or added brands than in the UK, which may relate to a combination of the different default options and the more polarised nature of news provision in the United States.

26013.png

Q. When playing the news headlines from your smart speaker, which brand(s) do you hear? Select all that apply. Base: All US Adults who have accessed news briefings Nat Rep 177.

Improvements to News Updates

In general we found that news updates were not greatly loved – even if they are the most actively used news feature. Content is mostly an offcut from broadcast or print output, with tone and length rarely adapted for smart speakers. This leads to a wide range of user complaints:

  • Overlong updates – the typical duration is around five minutes, but many wanted something much shorter.
  • They are not updated often enough. News and sports bulletins are sometimes hours or days out of date.
  • Some bulletins still use synthesised voices (text to speech), which many find hard to listen to.
  • Some updates have low production values or poor audio quality.
  • Where bulletins from different providers run together, there is often duplication of stories.
  • Some updates have intrusive jingles or adverts.
  • There is no opportunity to skip or select stories.

Adam, one of our New York based in-home interviewees, was particularly critical. ‘When someone asks for an update on something, they are asking for a summary. Don’t give me something that is longer than a minute,’ he says.

He has selected the New York Times – a brand he respects – on his Amazon Echo device, but this plays its podcast, The Daily, which drills down into one subject in great detail. ‘So I don’t use it. I get pretty frustrated with it.’

The New York Times has done its own research and has heard many of the same criticisms. Partly as a result, it is introducing a new, more ‘native’ briefing to replace The Daily in that slot. The Economist has also replaced its Economist radio current affairs segments in the US with a more focused Espresso update for similar reasons.

Reduced length does not necessarily mean lower impact though. It was striking that regular users of news briefings felt better informed (56% in the US and 45% in the UK – only 3% felt less well informed in the US and 1% in the UK). This may be a reaction to the fragmented nature of much news consumption nowadays, which means it can be easy to miss the overview of what is generally important.

26077.png

The BBC is also looking to refresh its UK-based news update to make it more bespoke and personalised. The broadcaster is experimenting with functionality to allow users to ‘skip’ stories as well as ‘dive’ into greater depth on others. Currently voice interfaces don’t make this type of navigation easy, which is where screens may play an increasingly important role.

The tech platforms also recognise that the consistency and discovery of news updates remain problematic – and are working hard to improve relevance.

Publishers expect that at least one of the big platforms will introduce an aggregated and personalised news service in the near future, which allows users to select specific stories and automatically compile their own bulletins from multiple providers. This won’t replace branded news updates but could increase tension with publishers who are concerned about attribution and maintaining their own presence on the platforms.

Live Streams and Podcasts

As previously noted, many people are using their smart speakers as a way of playing live radio stations or programmes. This often means music radio, but even this typically contains some kind of news. In the UK, where there is a strong tradition of radio listening, almost two-thirds of speaker owners (60%) said they had done this in the last month. There was less live radio listening in the United States (41%) but it is still a significant growth area.

‘Almost a fifth (19%) of all online listening to NPR’s member stations’ live radio streams now comes from smart speakers’, says Joel Sucherman, VP of new platform partnerships, and almost all of this is additional. Listening from computers and smartphones has remained steady, he says, as smart speaker use has grown. To achieve this, NPR has had to invest in systems to make the right streams available and to build an app – known as a ‘skill’ by Amazon and as an ‘action’ by Google – to make it easy for listeners to find their local station. This has now been integrated at the operating system level so this works automatically for new users.

26094.png

Older groups (over 55s) are much more likely to listen to live radio on smart speakers. Younger groups (especially 25–34s) are more likely to access podcasts. NPR is trying to create technology that achieves the best of both worlds. The NPR One skill (app) automatically curates a longer on demand news experience that mixes local news, national news, and current affairs. NPR has invested in a three-person ‘curation’ team to tag audio segments with extra metadata so they can be reassembled in the most logical order. Users can rate and skip stories, which helps to train the algorithm. In this way NPR hopes to create a seamless flow of media that better fits each individual’s needs and interests.

Joel Sucherman says this kind of innovation is critical to future proofing NPR: ‘What happens if people don’t want to consume in a linear fashion is something that we’ve been thinking about for the last few years’. The NPR One experience is trying to give users control at the same time as offering a recommendation system that takes account of NPR’s editorial judgement.

Live is still a critical component of listening to news radio. However, in an on demand world in which listeners expect the kind of control afforded by platforms like Spotify and Pandora, NPR must be able to satisfy the expectations of younger listeners.

Swedish Radio (SR) is taking a similar approach, looking at personalised news through an app, which will ultimately carry through to voice platforms. Audio production can then be augmented with graphics and pictures that can be displayed on other devices to provide extra context. ‘This is about enhancing the audio, building functions that will make life more interesting for the listener and take radio to the next level’, says head of voice products Tomas Granryd. ‘If we are not in the lead of this, we are really not doing our job.’

Podcasts Not a Big Hit on Smart Speakers

Podcasts are used less on smart speakers than one might expect and much less than live radio. Our survey shows that just 15% of smart speaker owners in the UK and 22% in the US say they have accessed a podcast in the last month. Publishers told us that typically only around 1–5% of podcast listening was coming from smart speakers.10

When conducting in-home interviews it became apparent that there were a number of reasons for this relatively low podcast use.

  • Smart speakers are currently disproportionately owned by older people, whereas podcasts are much more likely to be used by under 35s.
  • The discovery of podcasts can be challenging. Users are not aware of the commands to use or whether they need to install a special app.
  • Podcasts are often niche and personal. They don’t always work within a shared space at home.
  • The majority of podcast listening is out of home (when commuting, exercising, etc.) where smart speakers are not relevant.
  • Even at home, the smart speaker is often in the wrong room for podcasts (e.g. the bedroom rather than kitchen). They are not portable.
  • The speaker is often not of a good enough quality when compared with the hi-fi already in the living room.

I’m not sure how much I would listen to it there [on a smart speaker], because podcasts are the sort of thing I would listen to on a train.
(Focus Group, UK)

As smart assistants become integrated into Bluetooth headphones, it is possible that voice access to podcasts and other on demand audio will grow substantially. Google is also looking to improve podcast discovery mechanisms for both its mobile and voice platforms and this may eventually replace existing aggregators like TuneIn, which currently comes preinstalled on both Google and Amazon devices. But the need to discover content through aggregators is not ideal for publishers and broadcasters that are looking to build a direct relationship through their own apps (‘skills’ and ‘actions’). Such apps may give publishers more control over recommendations and onward journeys – as well as potentially attribution and branding. Most of the broadcasters we talked to (BBC, NPR, ABC, and Swedish Radio) had developed – or were planning – special apps that contained all their podcasts as well as live streams. The BBC’s radio skill had 1 million downloads in the first six weeks of operation and today is accessed by around 500,000 weekly browsers.

Not all publishers will have the market power of the BBC and NPR to persuade users to install these apps or to persuade platforms to pre-install them. As with mobile there are likely to be a few big winners with a long tail of other publishers struggling for attention.

Questions and Answers

As we discovered earlier, one of the most popular uses of smart speakers is to answer everyday questions. This generally works well for queries about the weather, as well as sports scores and cinema listings, because there are set ways of asking about these subjects and structured data that can be used to return answers. Even so, some queries still go wrong:

Yesterday I said to her [Alexa] ‘What’s the weather forecast for tomorrow in Barnet?’ and she kept getting Barnet, Australia [rather than London].
(In-Home Interview, UK)

Every day the biggest platforms (Amazon and Google) are learning what queries are working and which are not, perfecting their algorithms through advanced machine learning (ML). Over time we can expect these simple queries to get much better. The same is unlikely to apply to news, however, where the data are messy and the number of potential queries almost infinite.

In our focus groups, where we asked participants to try a number of news voice queries, the majority failed due to the complexity of the query or problems of phrase recognition.

Maybe it’s just learning. But if it’s a question that could have [multiple] meanings, it makes me ask it again and again [until it has enough detail]. By which time I forgot what the original question was.
(In-depth Interview, UK)

In other cases, answers were inconsistent. As one example, we asked for ‘the number of people who died in the Grenfell Fire’. Google and Alexa gave slightly different numbers because they drew the result from two different news articles published at different times. One platform (Google) made clear what the source was (the Independent) and also gave the date. But this answer was also a rather complex and long-winded way of getting the number that we were after.

A few weeks later, the answers given to this query had changed. Alexa successfully and precisely returned the officially recognised number (72) – but with no additional information about the source. Google had changed its answer to select a relevant part of a Wikipedia entry that also contained the exact number (72).

This example shows the difficulty in returning news results in voice without the kind of extra context that is often visible in a screen-based result. It also shows how quickly machine learning is improving the accuracy of results in cases where people have asked a question before.

Google’s Steve McLendon recognises that many voice queries are ‘not perfect’ right now. He also thinks that voice sharpens the tension between efficiency and choice: ‘If we simplify interactions, we get stuff done more efficiently. [But] if we give the voice equivalents of blue links it breaks that. That is the tension we are grappling with.’

One way of developing this space is to ask publishers to create content in response to specific news questions. Platforms are on record as wanting to develop an ‘answer engine’ but they don’t currently have the right kind of news content to feed this. Google has recently released a specification for a new metadata field for articles that can contain text that is designed to be read out.11 Effectively publishers are being asked to develop new content and news skills around Answer Engine Optimisation (AEO) through adding voice-optimised text. But what is in it for them?
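As a rough illustration of what voice-optimised metadata of this kind might look like, the sketch below builds a JSON-LD block for an article. It assumes the specification in question resembles the schema.org `speakable` property, which uses a `SpeakableSpecification` to point at the parts of a page suited to being read aloud. The function name, headline, URL, and CSS selectors are hypothetical and are not drawn from any publisher interviewed for this report.

```python
import json


def build_speakable_markup(headline, url, css_selectors):
    """Return JSON-LD (as a dict) marking which parts of an article
    are suitable for text-to-speech playback.

    Assumption: the schema.org `speakable` property is used, with a
    SpeakableSpecification listing CSS selectors on the article page.
    """
    return {
        "@context": "https://schema.org",
        "@type": "NewsArticle",
        "headline": headline,
        "url": url,
        "speakable": {
            "@type": "SpeakableSpecification",
            "cssSelector": list(css_selectors),
        },
    }


# Hypothetical article: the selectors would point at a short headline
# and a summary written to be heard rather than read.
markup = build_speakable_markup(
    headline="Example headline written for listening",
    url="https://news.example.com/articles/example",
    css_selectors=[".voice-headline", ".voice-summary"],
)

print(json.dumps(markup, indent=2))
```

In practice the extra work for a newsroom lies less in the markup than in writing the short, listenable summary that the selectors point to, which is the editorial cost publishers raise in the discussion that follows.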

Publishers we interviewed were extremely wary of helping Google (or Amazon) build a huge global ‘answer engine’, without compensation – and it is difficult to see how advertising or sponsorship could work around such short pieces of content. Instead, one publisher suggested that ‘news answers’ could be developed as a premium service (e.g. bundled with Amazon Prime) in conjunction with a number of interested news organisations. Each would be paid in proportion to the number of queries that were considered most relevant and then read out.
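The proportional payment idea suggested by this publisher can be sketched in a few lines. This is only an illustration of the arithmetic: the size of the revenue pool, the publisher names, and the answer counts are invented, and no platform has announced a scheme of this kind.

```python
def share_revenue(pool, answers_read_out):
    """Split a revenue pool between publishers in proportion to the
    number of their answers that were selected and read out.

    pool: total amount available for distribution (hypothetical).
    answers_read_out: mapping of publisher name -> count of answers read out.
    """
    total = sum(answers_read_out.values())
    if total == 0:
        # Nothing was read out, so nobody is paid.
        return {name: 0.0 for name in answers_read_out}
    return {
        name: pool * count / total
        for name, count in answers_read_out.items()
    }


# Invented figures for three hypothetical publishers.
payouts = share_revenue(
    pool=1000.0,
    answers_read_out={"Publisher A": 600, "Publisher B": 300, "Publisher C": 100},
)

for name, amount in payouts.items():
    print(f"{name}: {amount:.2f}")
```

A scheme like this rewards whichever publisher's answer the platform chooses to read out, so the platform's relevance ranking would in effect decide how the pool is divided.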

Publishers and platforms will be experimenting with different question and answer formats over the next few years. While new factual answer formats are one possibility (e.g. 72 people died at Grenfell Tower – says BBC News, sourcing the official inquiry), many would like to enable more complex conversations on broader topics like Brexit or about the choices people face in elections. The US publisher Quartz has been experimenting with short question and answer text formats in its mobile app, but it may be that voice will ultimately be a more satisfying – or just an additional – output. Ultimately these formats could be applied to any number of news and current affairs subjects but success will require platforms and publishers to work together over time both to create valuable experiences and to educate users.

Interactive Experiences and Experimentation

These are early days, and few of those interviewed for this study claimed to know how these voice devices and assistants might be used in the future. This is why there is a significant focus on more open-ended experimentation. Some of this is being funded by platforms in a similar way to early VR and Live Video content. Others are reluctant to take money but still keen to use these platforms to innovate and learn.

The Guardian is using Google funding to run a Voice Lab for six months (from November 2018).12 A product and editorial team of four will test a range of propositions around synthesised voices and interactive formats. The team will manage a weekly blog where they share learning with the rest of the industry and useful code that emerges will be open-sourced. With limited funding available for innovation, this platform-funded approach is one way in which the Guardian can dip its toe in the water, says its global head of audio and video, Christian Bennett:

We have to think about how to innovate in a way that will work for us and doesn’t just help the platforms. We have a finite amount of resources and we have to put them in sensible places.

Others are switching investment from other areas to focus on voice. The BBC has set up a number of new product and innovation teams to work on news, radio, and children’s voice output. Teams from the BBC News Labs have developed a number of internal prototypes to explore options. One involved cutting up a 45-minute radio interview with a sleep expert and mapping answers to different queries. This has been turned into an Amazon skill, which has been tested with users.

The Financial Times has been experimenting with automated text to speech services for some time, but has recently worked on an interactive project to showcase content that is optimised specifically for these new platforms. The Hidden Cities project is part of a collaboration with Google, which combines a physical map in the newspaper (November 2018) with a rich interactive audio tour of Berlin. A specific audio prompt from the map provides access to the FT’s bureau chief talking about the new city airport or reporting from the queue of one of Berlin’s top clubs. Rather like a museum audio guide, users can skip, stop, or go deeper. The FT hopes it will learn more about how to put these experiences together but also what works for audiences.

FIGURE_CITIES.psd

Multimodal Outputs

Some of the most interesting experiments are around mixing voice with smartphone or other screens. In the example here, a recipe can be found using a search on the smartphone and then sent to the voice device. When you say ‘start recipe’ your voice device starts reading the ingredients and steps.

A reverse use case involves asking about films playing at your local cinema on your voice device but, at the point of booking, a message is sent to your phone to allow you to choose the performance time and add your payment details. Few people are using these advanced functions yet, but Google’s commitment could change that over time. A key question is how news publishers can take advantage of these potential linkages between screens and voice-enabled speakers.

Conclusions

Overall, we can detect different types of news usage emerging on voice platforms. These could be characterised as:

Our research shows that the most common type of usage today is passive and essentially involves a more convenient way of accessing linear output from broadcasters and other publishers. But we also see a significant minority beginning to develop new habits of ‘command and control’ access to short news updates, even if our research suggests that the content itself is not yet sufficiently tailored to the context of this new platform.

There is some evidence that audiences would like more personalised experiences and more control over the content too, but the process of configuring this is still too difficult for most. Perhaps the most interesting area is around assistive experiences such as conversational and immersive interactions. This is the least developed currently but could ultimately be the most disruptive to existing content as well as offering the greatest scope for creativity.

1 2017 NPR/Edison Smart Audio report via https://finance.yahoo.com/news/smart-speaker-sales-growing-faster-152339686.html

2 Antonis Kalogeropoulos and Nic Newman, ‘I Saw the News on Facebook’: Brand Attribution When Accessing News from Distributed Environments, RISJ, 2017. https://reutersinstitute.politics.ox.ac.uk/sites/default/files/2017-07/Brand%20attributions%20report.pdf

3 https://voicebot.ai/2018/08/16/china-is-driving-half-of-global-smart-speaker-growth/

4 https://blog.google/products/assistant/google-assistant-going-global/

5 Juniper Research Report, Nov. 2017. https://www.juniperresearch.com/press/press-releases/amazon-echo-google-home-reside-over-50pc-us-house

6 These data represent those who say these are their MAIN device used in the home.

7 Amazon quote from https://www.theguardian.com/technology/2018/jan/06/how-smart-speakers-stole-the-show-from-smartphones, Steve McLendon interview with author, Sept. 2018.

8 Edison Smart Audio Report 2018 for National Public Radio (NPR). https://www.edisonresearch.com/the-smart-audio-report-from-npr-and-edison-research-spring-2018/

9 Adobe Digital Assistants report. https://www.slideshare.net/adobe/adi-state-of-voice-assistants-113779956

10 A number of publishers distribute via podcast distributor Acast. Using its proprietary dashboards, it is possible to see the proportion of users from Amazon and Google devices.

13 Assuming 1m of 1.2m devices active, 50% of these access news, and roughly 250,000 of these have used Tagesschau.

5. Publisher Strategies and Monetisation

In this section, we explore in more detail how publishers and platforms are looking at voice devices. How do broadcasters, newspapers, and digital-born outlets think about content and monetisation? What do publishers want from platforms and vice versa?

Most companies are still formulating strategies around voice. Others are overlaying them on top of existing audio strategies, which have become more central in recent years. Here is a selection of thinking from leading news organisations.

Washington Post

Type: Newspaper background, US

Monetisation: Advertising, supporting subscription a secondary goal

Given that the Post is owned by Amazon’s Jeff Bezos, it is not surprising that it has taken a strong interest in voice. It created one of the first Alexa skills back in 2016 for the presidential election, followed by an on demand politics brief, now called The Daily 202’s Big Idea. The Post recognises that it needs to offer something distinctive that complements news bulletins offered by other publishers. This includes two additional Flash Briefing products, one for local Washington weather and another called Retropod, an audio snack that introduces listeners to a ‘fascinating moment from history’ every weekday. ‘Flash Briefings can be very repetitive, with many publishers covering the same national and political content. With Retropod, we wanted to add whimsy and delight to the smart speaker experience,’ says product manager for off-platform services Joseph Price. Another early learning is that audiences ‘appreciate brevity more than breadth’. Retropod runs to three or four minutes, The Daily 202 a little longer. The Post finds that the audience for Flash Briefings is twice as large in the morning as in the evening, so it tries to create clear signposting within the programming that bears in mind the busy listening context at that time of day. ‘We want to create programming that works everywhere the user can encounter the audio, whether it’s the traditional podcast or newer smart speaker ecosystems,’ Price says.

Price does not think enabling conversations around the news using voice is likely to be a mass-market activity any time soon. But as voice search becomes much more important he is acutely aware of the potentially powerful role of platforms. Like other publishers, the Post would like to know how it could get attribution and be compensated for the extra effort required to plug content into voice search.

The Post is exploring a new daily show, which will be the ‘most ambitious attempt yet’ to translate its everyday journalism into audio. Like its other audio programming, it will likely be offered both as a podcast and in flash briefing format.

The Post sees good monetisation potential in existing audio formats, especially those that attract a large and regular audience. Its Flash Briefing programming includes short (under 10 seconds) pre-roll advertising. Supporting subscription is a secondary goal for audio products, with a priority on expanding the Post’s overall audio audience.

New York Times

Type: Newspaper background, US

Monetisation: A work in progress

The New York Times has taken a cautious approach to voice, partly due to the lack of good data and partly because of the expense involved in creating bespoke content. Instead of going ‘all-in’ on voice, they have been conducting qualitative and interview-based research to drive independent insights before setting their strategy. In common with our findings, they were struck by the simple ways in which the platform was being used today and by the clear desire to spend less time with screens. The new editorial lead for voice, Dan Sanchez, will be working closely with research and product teams to run a series of experiments exploring both new consumer experiences and commercial potential.

The Times is focused on short-, medium-, and long-term goals and is looking to identify areas where it can create differentiated experiences. One of the first short-term changes will be to replace The Daily with a shorter, more native flash briefing. ‘[The Daily] is a great narrative deep dive but it [is] not really a way to quickly get informed about what you need to know at the beginning of every day,’ says Sanchez. A medium-term goal is to launch an interactive news quiz, while longer-term innovations may relate to multimodal experiments linking, say, the newspaper to voice and vice versa. With discovery happening off-platform – another key insight – the Times is looking for the best way to raise awareness of new product offerings.

Wall Street Journal

Type: Newspaper background, US

Monetisation: Nascent, a tool for reaching new audiences

The Journal runs news briefings on both Amazon and Google platforms that are updated two or three times a day. They also run a custom skill, which offers podcasts and market updates, but awareness remains a barrier to usage.

As with many news organisations, ownership of voice is split between editorial teams responsible for audio and video, audience engagement teams, and product teams. The audience team led by Carla Zanoni sees voice as an important new way to reach new audiences rather than driving immediate commercial return. The focus is on experimentation, similar to recent work with Snapchat and Messenger bots. Like other publishers from a newspaper background, the Journal sees voice as a medium- to long-term play and thinks it will only really take off when assistants like Alexa become truly mobile through deep integration into headphones and cars, where Americans spend considerable time.

The Economist

Type: Magazine publisher, UK/US

Monetisation: Value add for subscribers plus sponsorship, plus brand marketing

The Economist has been a pioneer in audio for many years. A significant proportion of app usage (10%) already comes from its audio (human-read) edition consumed on the go.

This ‘audio edition’ will be the first of a number of premium voice services that the magazine hopes to extend to the Alexa and Google platforms when functionality is in place. Only subscribers will have access to the audio edition, helping to create extra value and lock-in.

At the same time, the team will be working on a number of new ‘free’ products that they can use to market their subscription trials and offers. Like many of the other publishers we have talked to, The Economist will soon launch a daily podcast.

Advertising and sponsorship will help cover the costs of these enterprises but the real aim is to help The Economist bring in a new generation of subscribers. Innovation work will focus on contributing to new voice search services, providing audio responses to key questions in areas where The Economist feels it has particular expertise.

Telegraph Media Group

Type: Newspaper background, UK

Monetisation: Mainly marketing at this stage. Money can come later

The Telegraph has invested significantly in voice, hoping to get ‘first mover advantage’ by embedding the brand into emerging morning routines.

An early flash briefing driven by automated feeds was badly received by audiences and was replaced by professionally produced news bulletins in September 2017, read by journalists with a broadcast background. ‘Having a team that has these skills is valuable,’ says director of video and audio Robert Owers. He says it helps the ‘whole organisation think differently’.

The bulletins are currently about two minutes long and are updated four to six times each day. A version with visuals was added for the Amazon Echo Show. Given the strength of the BBC in the UK, the team is now considering making the briefings more focused on the big story of the day to showcase the Telegraph’s strength in politics and original reporting. ‘We can’t compete with the broadcasters. It doesn’t make sense for every brand to be doing the same thing,’ says Owers. A technology brief was added in 2018, partly due to a wider strategic focus on this vertical. The team has also experimented with pop-up briefings (for the Rugby World Cup) but found it hard to build awareness in a short period of time.

In terms of monetisation, the Telegraph can see potential advertising as well as subscription revenue, but believes that building sticky products should be the priority right now. ‘I do believe the money will follow’, says Owers. ‘Everyone knew that it would be slow at the start, I think personally as soon as it moves to cars that breaks down a lot of the barriers for people and there is another reason to speak rather than pressing buttons.’

Sky News

Type: Commercial broadcaster, UK

Monetisation: Advertising focus

Sky News has built its reputation on breaking news in TV, radio, and online and the flash briefing format fits well with this brand. Sky’s round-the-clock operation and broadcast infrastructure have allowed it to deliver four updated briefings for News, Sport, Showbiz, and Business – each of around 90 seconds.

Sky would like to offer greater depth, explainers, even conversations around big subjects like Brexit and climate change, but their data suggest that audiences are mainly looking for short, sharp, snappy information in response to very specific commands. Senior product owner Hugh Westbrook says the main limitation right now is that the platform is forcing users to learn its language – rather than the other way round: ‘It will only get good when it understands the person and learns how they talk and think – so it gets it right more often.’

ARD, Tagesschau

Type: Broadcaster, Germany

Monetisation: None (public service)

Public broadcaster ARD’s Tagesschau 100-second news briefing comes pre-installed on Alexa devices in Germany. ARD says around 250,000 unique browsers access it each month (September 2018), with each browser accessing the bulletin, on average, around five or six times in that period. We did not survey brand usage in Germany, but ARD’s publisher data suggest that around 50% of those consuming news on Amazon devices may be accessing the Tagesschau bulletin13 – again confirming the power of the default. The frequency data are also consistent with our survey findings, which suggest that only a minority access bulletins daily.

Using new devices like the Echo Show, ARD is looking to offer screen-based controls to allow users to move forwards and backwards through a bulletin. Christian Radler, editor of strategy and innovation at ARD, sees voice as both interesting and disruptive but feels that more complex interactions remain extremely disappointing:

It is a new way of accessing content, of understanding context and another way of moving from a broadcast world to a one-to-one personalised world. But we are a long way from that.

T-Online

Type: Digital born, Germany

Monetisation: Brand building and learning for now

T-Online is one of the biggest news websites in Germany, now owned by an advertising and marketing company (Ströer Group) which has built a new Berlin newsroom with a focus on more original journalism and better experiences.

Florian Harms, editor in chief of T-Online, is a huge advocate: ‘I believe in voice technology because it’s the easiest and most natural way to retrieve information’. He has installed Alexa devices in the newsroom and in the washrooms to help change culture. During the World Cup the washroom walls displayed tips on ways to engage with the devices to get staff playing and learning. T-Online has brought in a former ARD broadcaster to help create and develop native audio experiences.

Currently the main focus is on a flash briefing for Amazon devices. This was conceived as an audio briefing – everything you need to know in two or three minutes. It takes its agenda and name from the email newsletter – Daybreak – that Florian Harms puts out himself. Production is outsourced to a Leipzig radio station and the edition is available at 6am every weekday. The numbers are still small but growing. As the graph shows, the majority of usage comes as part of early morning routines, with a peak at around 7am.

[Figure: T-Online news usage by time of day]

T-Online’s ownership structure means that there is little pressure for immediate financial return. Learning what works in voice is extremely valuable to a company that makes its money from digital marketing and advertising.

Zeit Online

Type: Newspaper background but digital-born approach

Monetisation: Limited advertising and brand marketing (cross-promotion)

Die Zeit has been running a short daily news update for over a year, which generates significant traffic both as a podcast, and via the Alexa platform. With ARD’s Tagesschau providing regular news updates, Zeit’s briefing is focussed on background and original perspectives on the news. Most of the bulletin is finished by 9pm the previous evening and contains around five or six short stories per episode. It tends to include one funny or unusual story – and this serendipitous approach has gone down well in audience testing. They are about to launch a news quiz skill (app) for Alexa – essentially an audio version of the popular website quiz.

Die Zeit is looking to counter declining returns from online advertising. In this respect they see audio and voice as interesting because of the levels of attention being generated and because these platforms are not subject to ad-blocking. Their research also shows that audio could be a way of building bridges to the main website and magazine brands. User testing shows that many podcast listeners do not currently use Zeit Online, so the focus right now is on using voice to market other services. Advertisers in Germany are holding back from going all-in on podcasts but this may shift as audiences grow. Only a small percentage (under 5%) of listening to popular Zeit podcasts comes via smart speakers at the moment.

What is Stopping Publishers Investing More in Voice?

From talking to publishers, we find four key barriers at this stage:

  1. Lack of resources for innovation.
  2. Lack of a clear path to monetisation.
  3. Problems of discovery and awareness.
  4. Lack of usage data to guide development.

Legacy newspaper groups are finding it particularly hard to find new money for innovation without a clear business case. Previous investments in podcasting, online video, or VR have not always worked out and news organisations tend to be wary these days when the platforms come calling with the next big thing. ‘If you are a broadcaster then it is a no brainer to get into this early but it is much more difficult for us’, says Christian Bennett, global head of video and audio at the Guardian. Spinning up new audio teams is expensive and Bennett is yet to be convinced that there is an audience big enough to start making bespoke content. Part of the problem is that platforms have not been sharing data on the amount of news usage on the platform and that makes him suspicious:

No one showed me any figures anywhere. If a new platform is good they will sing from the rooftops and I’m not seeing that right now.

Talking to publishers, we heard many complaints about lack of data that could help them make rational decisions. As one example, platforms have been pushing for content to be created for their new screen-based devices, they say, but are not providing any information on how many of these devices have been sold. While understanding that there are some constraints around data sharing, one leading publisher said they currently had no demographic data on their own usage and no details of frequency. They also wanted benchmarks from platforms, so they could understand relative performance.

We’ve been told that our content performs better than expected but what does that mean? We see the number of plays, but is that a good number? We don’t have context for the data we have.

In terms of monetisation, publishers are aware that the platform is still developing but want their requirements in this area to be prioritised. Subscription-based publishers, for example, are asking for ways to seamlessly enable premium services on smart speakers. Others would like some kind of payment from platforms in return for content to help build usage, not least because few believe that selling advertising around short-form voice content can ever deliver financial return. This is why podcasts or slightly longer daily briefings are currently favoured options, even if these are not always the most native way to use these technologies.

Discoverability of news content remains a critical barrier for others. With broadcasters dominating default positions, the rest are finding it hard to raise awareness of new products. Most discovery is currently happening off-platform but invocation terms (the way things have to be asked for) are sometimes different across Amazon, Google, Apple, and Samsung platforms. This makes it complicated to promote a new offering and drive usage. If platforms can’t create better on-boarding – such as using pre-existing signals about brand preference to flag relevant content – there will be little incentive for smaller players to create that content in the first place.

There are also technical concerns related to the complexity of generating content for different devices. There is little tooling to help media companies create and distribute across platforms. To help with this, Der Spiegel is working with a range of other publishers (the Guardian, El Pais, the Volkskrant, Le Monde, NZZ, FAZ) to make this interface cheaper and quicker.

We want to build an infrastructure for audio content and text-to-speech solutions that helps us get into this voice ecosystem much more efficiently.
(Stefan Ottlitz, head of product, Spiegel Group)

Using money from Google’s Digital News Initiative (DNI) they are looking to develop:

  • Tools for cross-platform content creation.
  • Standards of metadata to describe content that is designed to be read out.
  • Tools to help improve the pronunciation of emerging news terms for synthesised voice outputs.
  • Easier ways to optimise text content for voice.

Broadcasters may be Disrupted More Quickly

Broadcasters have many of the same concerns as former newspaper publishers but the constraints are often different. With changes in audio listening in full swing, voice is much more central to company strategies.

Our hunch is that voice is a disruptive technological change, much as the mobile phone was or the internet itself.
(Mukul Devichand, executive editor, voice + AI, BBC)

The BBC is investing in product teams, editorial innovation teams, and in understanding how media companies can interact with Artificial Intelligence (AI) platforms at a deeper level. Constraints are related to internal culture and defining products that can break away from linear broadcast thinking. The wide remit of the BBC may make it a little harder to focus than audio-focused organisations such as NPR and Swedish Radio (SR). These companies are probably the most forward-thinking when it comes to rethinking linear radio.

In Sweden, the biggest barrier has been the time it has taken for voice platforms to arrive in the first place. Google Home devices only launched in autumn 2018 – and there are no Amazon devices for the Swedish market. Even so, take-up in this early adopter market is expected to be extremely rapid.

Broadcasters are also worried about their increased dependence on platforms. Public service broadcasters, in particular, have traditionally had privileged access to their channels and programmes. In the world of voice, there is only one answer to any given question, making optimisation for voice critical in a more competitive audio landscape. This is causing some broadcasters to agonise over the naming of popular programmes. ABC has a programme AM that doesn’t work well on these devices – and there are also frequent naming clashes (many programmes called the same thing) where the platforms have to use a range of signals to make the decision. ‘I think it changes everything and organisations like ours need to be aware of this’, says Stuart Watt, head of distribution at ABC News. ‘As soon as you move to voice, as opposed to touch, as the main interface between the people and the platforms, you are ceding any opportunity to make a decision because the machine is going to have to make the decision for you.’

Platforms will not just be critical gateways for accessing content but also for onward journeys and next step listening. These features have yet to be built out and could further dilute direct connection, attribution, and monetisation by news organisations. This is another reason why broadcasters and print publishers alike would prefer to build up and promote their own destinations on these platforms.

6. Future Developments and Conclusions

Given the lack of data from platforms, this research has provided much-needed evidence about current usage patterns of voice-activated speakers – for news but also more widely.

Device penetration is growing fast across countries, driven by the relatively low cost and the simplicity of hands-free interfaces. Early adopters appreciate the speed with which they can now access existing music and radio services as well as news, weather, and traffic updates. Others find it fun as well as functional, with smart assistants like Alexa welcomed into family conversations and interactions. Voice has helped some older groups access the internet for the first time, whilst a number of our respondents talked about finally feeling in control of technology rather than the other way round.

But significant problems and barriers still exist. Most users are only using a handful of functions, with little desire to learn more. Beyond initial set up, there is little attempt to configure or personalise these devices. Users also report the clunky nature of these early interfaces and frustration in getting the technology to understand the way they talk rather than expecting the reverse to happen. More complex outputs are also painful to access in voice and may ultimately be better expressed in combination with a screen of some kind. For non-users, in particular, there are concerns about the amount of personal data that could be collected by technology companies, along with fears about looking foolish when talking to computers. Among all three countries in our study, non-users are balancing these concerns with the promise of convenience, utility, and fun that they hear about from those who already have them.

[Figure: Consumer trade-offs]

For news, our research suggests a rather mixed picture in terms of both current usage and future potential. Beyond passive uses of these devices to play radio or specific podcasts – which is largely replacement activity – native interactions with news are generally short and not particularly frequent. While around a half have used the news function, only about a quarter are asking for the news on a daily basis and just 1% say it is the most important feature. Many of those we talked to felt the existing news updates were too long and wanted a minute or less. From a consumer’s point of view, this brevity gives them more control and wastes less time, but publishers have a different agenda. They want longer dwell times, the ability to promote further content, and to sell adjacent advertising. None of this looks easy via native voice interfaces right now, which is why some news publishers see voice as an existential threat.

On the other hand, we did also find some latent needs around depth, serendipity, and interaction. A few respondents were already enjoying longer personalised audio programming that they had assembled from multiple brands. Some of our respondents also showed interest in prototypes that might answer deeper and complex questions around the news. Although few were using podcasts through smart speakers, many said that they would be very interested to access them more easily in cars or via headphones and mobile phones. As voice develops beyond the home, there will be more opportunities to distribute and monetise longer and richer audio content. Indeed, we already know that early adopters of these devices consume more audio, more frequently, than ever before.

So What Should Publishers Do Today?

So how can publishers capitalise on these trends and insights? What is the right time to invest? And how much of a priority should it be today?

For broadcasters it is clearly critical that they make their streams and podcasts as accessible as possible. NPR’s revelation that almost 20% of online radio listening is now coming from these devices shows how important these devices already are today. Broadcasters can also leverage their infrastructure and audio skills in the news update space relatively cheaply to secure the majority of news update listening. But beyond this, it is possible that the traditional ways in which radio news is made – and the tone of that content – may need to be rethought. NPR’s partly automated, personalised wheel of news is one option. More frequent, bite-sized conversational content may be another. Experimentation will help us find out.

Meanwhile, publishers from a newspaper background are hampered by the lack of resources to invest in innovation, and are wary of providing content to platforms without a clear path to monetisation. The lack of good data around news use and frequency is another significant issue holding back investment in native voice experiences. Against that background, it is not surprising that many are following the New York Times in investing in longer, more traditional podcasts and audio briefings – though they should be aware that the supply of content is growing as fast as – or faster than – demand.

In terms of innovation, it may be that newspaper publishers are in a better position to break away from traditional audio conventions. Success is likely to come from ‘differentiated experiences’, as the New York Times puts it. This could be mass products or niche ones, but will need to fit the context of the consumer, rather than be an offcut of some other process. Local media could consider short but useful interactions around events, travel, weather, or news, while national publishers could look at owning and monetising a specific topic niche or using the social nature of these devices to create events or games. Answering questions around a specific area of expertise is likely to be challenging in the short term, but could also deliver significant value over time. Assistive and interactive experiences are likely to take time to catch on. Here, as the BBC is discovering, experimentation will be key. In Children’s and in News they are starting with simple interactions and then learning what works through iterating a series of prototypes.

The Role of Platforms

Voice remains a critical focus for technology platforms. Platforms believe that it will change the way we interact with the internet and that in turn will disrupt and change their business models – be that e-commerce, advertising, or hardware sales. The tech companies’ vision for voice goes way beyond smart speakers to become embedded in every device, supporting and anticipating user needs. For consumers, it provides a new and often more convenient way of signalling our intent than a traditional computer mouse or mobile touchscreen.

How news media will be affected by these changes is another question. It should enable quicker and more seamless access to all kinds of content that we already know exists but raises challenges around how new content and choices might be surfaced.

Relations between publishers and platforms are already strained, but the issues of discovery in voice are likely to be even more fraught in a world where the algorithm needs to decide which ‘one answer’, which ‘one brand’, to return. Voice search optimisation and answer engine optimisation are likely to enter the news media lexicon over the next few years, requiring new skills and new ways of describing (and tagging) content. As consumers look to platforms to answer more questions, what will be the role of publishers? Will aggregators like Google and Amazon take most of the value or will they be prepared to share the spoils with those creating the content? Core issues around data, attribution, monetisation, and fairness are re-emerging in a new context. It remains to be seen if this time platforms and publishers can find ways to resolve these tensions in more constructive ways.

References

Adobe Digital Assistants Report: https://www.slideshare.net/adobe/adi-state-of-voice-assistants-113779956

Edison Smart Audio Report 2018 for National Public Radio (NPR): https://www.edisonresearch.com/the-smart-audio-report-from-npr-and-edison-research-spring-2018/

Juniper Research Report, November 2017: https://www.juniperresearch.com/press/press-releases/amazon-echo-google-home-reside-over-50pc-us-house

Newman, N., Fletcher, R., Kalogeropoulos, A., Levy, D. A. L., and Nielsen, R. K. 2018. Reuters Institute Digital News Report 2018. Oxford: Reuters Institute for the Study of Journalism.

Differentology, Qualitative Research findings, for Reuters Institute, September 2018. https://www.digitalnewsreport.org/risj-future-of-voice-qual-report-slides/

Differentology, Final Quotes tables, for Reuters Institute, September 2018. https://www.digitalnewsreport.org/final-quotes-tables/

Appendix: List of Interviewees

Positions held at the time of the interviews

UK

Mukul Devichand, Executive Editor, Voice + AI, BBC

Hugh Westbrook, Senior Product Owner, Sky News

Ana Jakimovska, Director of Product, the Guardian

Christian Bennett, Global Head of Video and Audio, the Guardian

Robert Owers, Director of Video and Audio, Telegraph Media Group

Louise Curtis, Product Manager for Offsite Platforms, Telegraph Media Group

Chris Gathercole, Head of FT Labs, Financial Times

Alastair Mackie, Commercial Development, Financial Times

Remy Becher, Head of Digital Innovation, The Economist

Nick Cohen, Global Head of Video Products, Reuters

USA

Carla Zanoni, Editor, Audience and Analytics, Wall Street Journal

Leandro Oliva, Engagement Editor, Wall Street Journal

Dan Sanchez, Editorial Lead for Voice, New York Times

Kourtney Bitterly, Research & Development Lead, Voice, New York Times

Elizabeth Johnson, Senior Editor, Projects and New Initiatives, CNN

Joel Sucherman, Vice President, New Platform Partnerships, NPR

Germany

Stefan Ottlitz, Head of Product, Spiegel Group

Matthias Streitz, Editorial Chief of Product, Spiegel Online

Marc Krüger, Audio Editor, T-Online

Florian Harms, Editor in Chief, T-Online

Holger Wiebe, Head of Editorial R&D, Zeit Online

Christian Radler, Editor for Strategy and Innovation, Tagesschau, ARD

Rest of World

Simon Gooch, Chief Innovation Officer, Swedish Radio

Tomas Granryd, Head of External Collaboration, Swedish Radio

Craig McCosker, Product Strategy Manager (CUI, Bots & Future Focus), ABC

Stuart Watt, Head of Distribution, ABC News

Platform Interviews

Steve McLendon, Product Lead, News on Assistant, Google

Anneka Sharpley, News Partnership Manager, Google

Laura Doward, Publishers Marketing Lead, EMEA, Google

About the Author

Nic Newman is Senior Research Associate at the Reuters Institute and lead author of the Digital News Report, as well as an annual study looking at trends in technology and journalism. He is also a consultant on digital media, working actively with news companies on product, audience, and business strategies for digital transition.

Acknowledgements

The author is particularly grateful to media companies and experts for giving their time to share insights for this report in such an enthusiastic and open way. Particular thanks, also, to Peter Stewart for his early encouragement and for his extremely informative daily Alexa ‘flash briefings’ on the ever-changing voice scene. The author is also grateful to Differentology and YouGov for the professionalism with which they carried out the qualitative and quantitative research respectively and for their flexibility in accommodating our complex and often changing requirements. The research team at the Reuters Institute provided valuable advice on methodology and content and the author is grateful to Lucas Graves and Rasmus Kleis Nielsen for their constructive and thoughtful comments on the manuscript. Thanks also to Alex Reid at the Reuters Institute for keeping the publication on track at all times.

Published by the Reuters Institute for the Study of Journalism with the support of the Google News Initiative.

Please note that Google played no role in selecting the subject of this report, nor were they consulted about the research design or approach. Google, Amazon and Apple were all given the opportunity to participate in the report, to provide data and talk about future plans. The analysis and content of the report is the responsibility of the author and the Reuters Institute.