Dan’s Take

Dan Benyamin, CEO, Aeon

There is significant apprehension about Generative AI (GenAI): technologies that can create new content such as text, images, and video. Yet it represents a transformative force, likely as impactful as the smartphone or even the internet.

While we can only begin to understand the ways in which AI can impact our lives, I believe it offers some of the greatest opportunities to augment human creativity and overcome the challenges of democratizing information and communication.

I am a Tech Optimist.

At Aeon, our point of view isn't about perceiving AI as a threat to employment or industry, but about seeing it as a catalyst for business opportunities that never existed before. Specifically, we are excited about the applications of AI in the video space: creating videos that would not have existed without AI.

What could you do with that?

Sell premium advertising, make digestible ‘teasers’ of your content, use it to drive more traffic to your site, enhance your commerce initiatives, or maybe make an entirely new streaming video channel for your brand. Or perhaps have an AI-powered video assistant tailored to your customers’ needs? The possibilities are endless…

This guide attempts to provide a concise yet in-depth look at AI’s history, its impact, and the potential changes it may bring to publishers, media companies, and brands.

-- Dan Benyamin, CEO, Aeon


Introduction

The integration of Artificial Intelligence (AI) across various industries has marked a revolutionary shift in content creation and media consumption, significantly impacting sectors like publishing, entertainment, and digital media.

Over the past decade, the publishing industry has seen steady advancement in AI technologies, with major media outlets adopting automated systems to produce content. Beginning with routine reports like weather forecasts, sports recaps, and financial reports, automation has gradually extended to a wider array of creative tasks.

In 2020, the spotlight shifted to Generative AI (GenAI) models, with which computers arguably began to exhibit creativity. Capable of processing and emulating human language using advanced machine-learning algorithms, these models became adept at deciphering patterns in vast, unstructured datasets. They could analyze millions of images, books, and articles to generate original, human-like content based on textual inputs. However, despite impressive achievements, the use of AI remained restricted to a few in academia and industry.

This changed with the release of ChatGPT in November 2022, which marked a pivotal moment and significantly boosted public interest in AI. The chatbot’s user-friendly interface brought the power of GenAI to the masses. In a mere five days, it surpassed the one-million-user mark, a feat that Instagram, the next-fastest app at the time, took 2.5 months to achieve.

Time taken by popular apps (as of Nov 2022) to hit one million users (Source: Statista)

While ChatGPT made AI more accessible, its high-quality output led influential publishers like Nature Publishing Group and PNAS Journals to reconsider their editorial policies and take a cautious stance towards AI. But there was no looking back. Within two months of launch, ChatGPT crossed the 100 million user mark, propelling OpenAI’s revenue to over $1 billion and fueling a surge in venture capital investment that reached $40 billion in the first half of 2023, with even publishers such as Axel Springer joining in.

Source: Deloitte Insights

While GenAI has had (or will have) an impact on almost every industry imaginable, its impact on the world of entertainment and publishing has been particularly transformational. In a short time, it has empowered legions of creators, disrupted industry workflows, and sharply amplified challenges to intellectual property, trust, and ownership. It is bringing about a revolution in innovation and changing the way content is produced and consumed.

As per a report released by Market.us, the GenAI in Media and Entertainment market is poised to cross $1,743.6 million in 2024 and is likely to attain a valuation of $11,570 million by 2032, a significant increase from the $1,158.5 million in revenue recorded in 2022. This translates to a Compound Annual Growth Rate (CAGR) of 26.3% from 2023 to 2032. The Text-to-Image Generation segment, valued at $299.3 million in 2022, is projected to soar to approximately $2,644.9 million by 2032. Similarly, the Image-to-Image Generation segment is witnessing substantial growth while offering cost-effective solutions for high-quality visual-content creation in entertainment.

GenAI is revolutionizing how content is created, consumed, and experienced, enabling creators to streamline repetitive tasks, enhance audio and visual effects, and offer personalized and interactive experiences to their audiences. Adept at analyzing extensive datasets, recognizing patterns, and generating content aligned with individual preferences, GenAI has elevated productivity to previously unimaginable levels.

Likewise, AI's role in music generation is reshaping music production, allowing for the creation of genre-specific compositions and custom playlists. Video Generation through AI is also revolutionizing video production, offering efficient alternatives to traditional methods in various sectors like marketing and education. 3D Modeling and Animation is another segment leveraging AI to enhance the creation of realistic 3D content, particularly in gaming and film production.

History of GenAI

What Is Generative AI?

Generative AI, or GenAI, involves machine-learning (ML) models that create new content such as text, images, music, animation, or code. These models are built on large foundation models: deep-learning systems trained on extensive and varied unstructured data, such as text and images, covering a wide range of topics. They process vast quantities of human-created content through self-supervised learning, enabling them to imitate human creativity.

Previously, AI in areas like image recognition focused on teaching computers to identify content within images. However, GenAI has shifted the focus to generating images, garnering significant attention. For instance, in January 2021, OpenAI introduced DALL-E, a model that transforms text descriptions into visual art.

An early example of publishers using GenAI: the cover of The Economist (Source: The Economist)

GenAI applications utilize a type of machine-learning model called the Large Language Model (LLM). These models, loosely inspired by the neural architecture of the human brain, treat words and their components as points in a multi-dimensional space and measure the distances between them to predict the next likely word in a sequence. LLMs learn through self-supervised training, improving largely autonomously: as they are fed more data, they produce increasingly sophisticated text or more accurate and relevant visuals.
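To make the “points in a multi-dimensional space” idea concrete, here is a toy Python sketch of next-word scoring. Everything in it, the five-word vocabulary, the random vectors, and the mean-pooled context, is invented for illustration; real LLMs learn billions of parameters and far richer context representations.

```python
import numpy as np

# Toy illustration: words live as vectors in a multi-dimensional space,
# and the model scores how likely each vocabulary word is to come next.
# The vectors here are random stand-ins, not learned parameters.
vocab = ["the", "cat", "sat", "mat", "ran"]
embeddings = np.random.default_rng(0).normal(size=(len(vocab), 8))

def next_word_probs(context_vector):
    # Score each word by its dot product with the context, then
    # softmax-normalize the scores into a probability distribution.
    logits = embeddings @ context_vector
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

# Pretend the context "the cat" is summarized as the mean of its vectors.
context = embeddings[[0, 1]].mean(axis=0)
probs = next_word_probs(context)
print(dict(zip(vocab, probs.round(3))))
```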

GenAI for Text

The use of AI in text generation belongs to a field of AI known as Natural Language Processing (NLP), which focuses on the interactions between computers and natural languages, specifically how to make computers process large amounts of natural language data and ultimately understand even contextual nuance.

From word-processing software detecting spelling mistakes, to Google predicting search terms before they are keyed in, to the most advanced ChatGPT, all of these have their roots in NLP, whose beginnings date back to the mid-1950s, when computer-science pioneers Alan Turing and John McCarthy proposed early models of computation, hinting at machines mimicking human intelligence.

A significant early example of GenAI in text was the ELIZA chatbot, developed in the mid-1960s by Joseph Weizenbaum, which simulated a psychotherapist and could interact in natural language. Although primitive by today’s standards, ELIZA was groundbreaking for its time and laid the foundation for future advances in NLP.

While the 1960s and 70s saw advancements in computer vision and the development of expert systems such as MYCIN for diagnosing bacterial infections, by the late 1970s the AI buzz had died down due to a lack of funding.

It was only in the late 90s and early 2000s, when the rise of the internet led to an explosion in data while processing chips simultaneously became cheaper and more powerful, that perfect conditions emerged for the computationally intensive field of AI. This is when the field witnessed a resurgence. Numerous milestones were achieved in machine learning, neural networks, and deep learning, which form the foundation of today’s AI models.

Some of the foundational advancements of the late 90s and early 2000s paved the way for the rapid rise of AI in the past decade (Image source: Towards Data Science)

2017 saw the rise of a new type of deep learning model: the transformer. First described in the paper "Attention Is All You Need" by Ashish Vaswani and colleagues at Google Brain, Google Research, and the University of Toronto, its release is considered a watershed moment in the field of GenAI, given its widespread applications.

Capable of translating text and speech in near-real time, transformers have become fundamental to NLP and have found applications in a wide range of tasks. Most text-based GenAI tools, including ChatGPT, have benefited from transformer models and their ability to predict the next word in a sequence of text based on a large, complex dataset.
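For readers curious what that machinery looks like, below is a minimal NumPy sketch of the scaled dot-product attention at the heart of the transformer, following the formula from "Attention Is All You Need": softmax(QKᵀ/√d_k)·V. The input vectors are random placeholders; production models add learned projections, multiple attention heads, and many stacked layers.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Each position attends to every other position, weighting the
    # values V by how well its query matches each key.
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # query-key similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V  # weighted mix of value vectors

rng = np.random.default_rng(1)
seq_len, d_model = 4, 8                      # 4 tokens, 8-dim vectors
x = rng.normal(size=(seq_len, d_model))
out = scaled_dot_product_attention(x, x, x)  # self-attention: Q = K = V
print(out.shape)                             # (4, 8): one vector per token
```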

GenAI for Image

Early approaches to image-related AI included statistical models like Hidden Markov Models and Gaussian Mixture Models, which found applications in optical character recognition. The development of neural networks in the 2000s further improved pattern-recognition capabilities and allowed for making predictions without explicit programming.

While Recurrent Neural Networks (RNNs) were commonly used for NLP and speech recognition, Convolutional Neural Networks (CNNs, or ConvNets) were developed for classification and computer vision tasks.

Prior to CNNs, manual, time-consuming feature-extraction methods were used to identify objects in images. CNNs provided a more scalable approach to image classification and object recognition, leveraging principles from linear algebra, specifically matrix multiplication, to identify patterns within an image. However, they were computationally demanding and required graphics processing units (GPUs) to train.
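As a rough illustration of that matrix arithmetic, here is a bare-bones 2D convolution in Python. The image and kernel values are arbitrary stand-ins; real CNNs learn their filters during training and run them on GPUs.

```python
import numpy as np

def convolve2d(image, kernel):
    # A small filter slides over the image; each output pixel is the
    # sum of element-wise products between the filter and the patch
    # of the image currently under it.
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
edge_kernel = np.array([[1.0, -1.0],   # a toy vertical-edge detector
                        [1.0, -1.0]])
print(convolve2d(image, edge_kernel))
```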

In 2014, Ian Goodfellow and his colleagues at the University of Montreal developed a deep-learning architecture known as the Generative Adversarial Network (GAN), which transformed the generative capabilities of AI. The key idea was to pit two neural networks, a generator and a discriminator, against each other so that the generator learns to produce authentic-looking new data from a given training dataset. For instance, a GAN could generate new images from an existing image database or original music from a database of songs.

This approach proved successful in generating high-quality pictures and has since been used in a variety of applications, including generating images of faces, landscapes, and objects. GANs have also been adapted for generating detailed images from rough sketches, transferring the style of one image to another, and replacing fragments of an image with a desired object.
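For the technically inclined, the adversarial setup can be sketched in a few dozen lines. The toy below (assuming PyTorch is installed) trains a generator to mimic a 1-D Gaussian rather than images; every architecture and hyperparameter choice is illustrative only, not a recipe from the original GAN paper.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Generator maps noise to samples; discriminator outputs P(sample is real).
generator = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 1))
discriminator = nn.Sequential(nn.Linear(1, 16), nn.ReLU(),
                              nn.Linear(16, 1), nn.Sigmoid())
g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
loss_fn = nn.BCELoss()

for step in range(2000):
    real = torch.randn(64, 1) * 1.5 + 4.0  # "real" data: N(4, 1.5)
    fake = generator(torch.randn(64, 4))

    # Discriminator: label real samples 1, generated samples 0.
    d_loss = (loss_fn(discriminator(real), torch.ones(64, 1)) +
              loss_fn(discriminator(fake.detach()), torch.zeros(64, 1)))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # Generator: try to fool the discriminator into outputting 1.
    g_loss = loss_fn(discriminator(fake), torch.ones(64, 1))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()

# The generator's output mean should drift toward the real mean of 4.0.
print(generator(torch.randn(1000, 4)).mean().item())
```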

The combination of Variational Autoencoders, a form of neural network, and Transformers led to the development of models like DALL-E, which are capable of generating high-quality images from textual descriptions. This period also saw the emergence of diffusion-based models, which add and then remove noise from images to create new visuals.
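The "adding and then removing noise" idea can also be made concrete. The sketch below implements only the forward, noise-adding half of a standard diffusion process on a toy array, using the common closed form q(x_t | x_0) = √ᾱ_t·x_0 + √(1−ᾱ_t)·ε; the hard part in real systems is training a network to run the process in reverse.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_noise(x0, t, num_steps=1000):
    # Linear noise schedule: alpha_bar shrinks from ~1 to ~0 as t grows,
    # so the original signal fades and noise takes over.
    betas = np.linspace(1e-4, 0.02, num_steps)
    alpha_bar = np.cumprod(1.0 - betas)[t]
    noise = rng.normal(size=x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * noise

image = rng.uniform(size=(8, 8))  # stand-in for a real image
slightly_noisy = add_noise(image, t=50)
almost_pure_noise = add_noise(image, t=950)

# Correlation with the original: high at small t, near zero at large t.
print(np.corrcoef(image.ravel(), slightly_noisy.ravel())[0, 1])
print(np.corrcoef(image.ravel(), almost_pure_noise.ravel())[0, 1])
```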

Today, diffusion models, often paired with transformer components, power advanced image-generation apps like Midjourney and DALL-E, while GANs remain widely used for image synthesis and editing tasks.

GenAI for Video

Generating coherent video introduces a whole new set of challenges for AI models. They must model implied physics, interpolate motion, and preserve identities and spatial relationships across frames to avoid glitching characters and backgrounds; not an easy task.

The best way to learn about the history, origin, and future of generative video technology is to look at companies positioned at the vanguard of the field: companies like Runway AI, Pika Labs, and Stability AI (Stable Video Diffusion). Operating at the forefront, these startups have made significant advancements and show promising potential:

Runway AI

Established in 2018, Runway AI focuses on developing AI-powered video-editing software. Its tools are used by TikTokers and YouTubers as well as mainstream movie and TV studios. In 2021, the company collaborated with researchers at the University of Munich to build the first version of Stable Diffusion, which was later mainstreamed by Stability AI.

In early 2023, the company released Gen-1, an AI model that transforms existing videos into new ones using text prompts or reference images. The model could create claymation puppets from street clips or turn books on a table into a nighttime cityscape.

While Google’s VideoPoet and Meta’s Make-A-Video offer similar solutions, Gen-1 offers a step up in video quality, not just transforming existing footage but also producing much longer videos. Runway has built its model with customers in mind, interacting closely with a community of content creators, filmmakers, and VFX editors.

Runway is backed by Google and Nvidia and has partnered with Getty Images, one of the largest repositories of paid stock and editorial imagery in the world. CEO and co-founder Cristóbal Valenzuela expresses strong optimism about the future of GenAI video.

Pika Labs

Headquartered in Palo Alto, CA, Pika Labs aims to lower the barrier for producing captivating animations. The company offers both casual creators and professional studios new ways to unlock imagination.

Using a specialized video generation framework called ParticLE (Particulate Luck Engine), which is built on top of Stable Diffusion, Pika provides tools that break text prompts down into semantic and visual particles, which it then maps sequentially to video frames. This modular, particle-based approach allows for greater coherence as scenes transition. Early benchmarks indicate a 32% performance boost compared to previous methods.

Pika promises to democratize video creation by unlocking several new possibilities: animating storyboards 5x faster, rapidly producing VFX shots, accelerating studio workflows (for example, using text-to-video generation to block out scenes and character motions), and giving authors the ability to bring pivotal literary scenes to life for eventual movie adaptations. The model can also enhance social content through unlimited idea springboards, engaging GIFs, dancing avatars, scenic timelapses, and more, all mere prompts away.

While generative video AI will open new creative possibilities long before perfect photorealism is achieved, over the next 5-10 years the technology holds the potential to turn studios into "automated animation factories". The future could center on "AI-Assisted Filmmaking", in which generative algorithms amplify creativity rather than replace animators outright.

Pika Labs has backed up that promise with funding, raising $55 million across pre-seed and seed rounds in its first six months. Its prominent investors include Elad Gil, Adam D’Angelo (Quora), Andrej Karpathy, Clem Delangue (Hugging Face), and Alex Chung (Giphy).

Stability AI’s Stable Video Diffusion

Stability AI, the world’s leading open-source GenAI company, began its mission of democratizing AI in 2019. The company has since amassed a community of more than 300,000 creators, developers, and researchers around the world.

The latest addition to the company's impressive AI portfolio is Stable Video Diffusion (SVD), an image-to-video model that generates videos by animating existing images. It is one of the few video-generating models available as open source and is designed for applications in various sectors, including advertising, marketing, TV, film, and gaming.

SVD offers frame interpolation for 24-frames-per-second video output and includes safety and watermarking features. The model can generate 2 seconds of video, comprising 25 generated frames and 24 frames of FILM interpolation, with an average generation time of 41 seconds. It features motion-strength control and supports multiple layouts and resolutions (including 1024x576, 768x768, and 576x1024).
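Those figures are consistent with each other, as a quick back-of-the-envelope check shows:

```python
# 25 generated frames plus 24 FILM-interpolated frames, played back
# at 24 frames per second, as quoted above.
generated_frames = 25
interpolated_frames = 24
fps = 24

total_frames = generated_frames + interpolated_frames  # 49
duration_seconds = total_frames / fps                  # ~2.04 s
print(f"{total_frames} frames / {fps} fps = {duration_seconds:.2f} s of video")
```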

As of November 2023, SVD was available only for research and did not support textual control. Its limitations included short video length (less than 4 seconds), a lack of perfect photorealism, and a lack of camera motion (except slow pans). The model also could not generate legible text or reliably render faces. While still in its early days, Stability notes that the model is quite extensible and can be adapted to use cases like generating 360-degree views of objects.

Unlike its rivals, Stability has been through rough times. In April 2023, Semafor reported that Stability AI was burning through cash, spurring an executive hunt to ramp up sales. Earlier, Forbes had reported that the company repeatedly delayed or outright failed to pay wages and payroll taxes, leading AWS, Stability’s partner for computation needs, to threaten to revoke access to its GPU instances.

What can we learn from companies in GenAI video?

Startups like Runway AI, Pika Labs, and Stability AI are at the forefront of revolutionizing video creation. They represent the significant initial steps towards more accessible, efficient, and creative video production.

As is evident from the evolution of AI, the initial steps have always been slow. It took almost six decades for computers to accurately recognize text, but a mere six years to generate it with human-like precision. There comes a pivotal moment when progress is sudden and disruptive. AI in video is fast approaching that juncture.

There is no doubt that the future of video will be increasingly automated, with AI-assisted processes aiding filmmaking and content creation and driving a dramatic shift in how visual content is produced and consumed.

AI in Media, Advertising & Entertainment

Machines have been helping the media, advertising and entertainment business for years: the Associated Press (AP) began publishing automated company earnings reports in 2014. The New York Times uses machine learning to decide how many free articles to show readers before they hit a paywall.

Businesses are using GenAI in existing marketing and advertising workflows to generate copy and images in less time, personalize campaigns, and respond to and learn from customer feedback. Research by McKinsey suggests that a fifth of sales-team functions could be automated using AI.

The entertainment industry has become increasingly diverse and complex. It encompasses sectors such as film, television, music, gaming, and sports, with the boundaries that once defined these segments fading. But they all share one common objective: providing captivating content that can be monetized.

AI in entertainment can help achieve that objective by addressing three key business challenges: enhancing content creation and production, personalizing audience experiences, and improving monetization.

Let's look at some specific use cases of AI in Media, Advertising and Entertainment:

Data Analysis

AI-driven analytics tools have become easier and less costly to implement, while offering complexity and speed that far exceed human capacity. This has led to an explosion of “usable” data (data that can be used to formulate insights and suggest tangible actions) and accessible technology (such as computation power and open-source algorithms).

AI, coupled with company-specific data and context, now provides consumer insights at the most granular level. Winning B2B companies use AI to go beyond account-based marketing and apply hyper-personalization in their outreach.

Similarly, publishers discern trends in reader preferences and behaviors to gain insights that lead to more engaging content and improved return on investment (ROI). In advertising, AI is transforming ad delivery - platforms like Google Ads and Facebook Ads use machine learning to tailor ads to user interests, resulting in higher engagement and ROI.

Content Creation, Optimization and Personalization

With its ability to analyze customer behavior, preferences, and demographics, GenAI can generate personalized content and messaging, allowing more targeted marketing and sales campaigns. Examples include hyper-personalized follow-up emails at scale and contextual chatbot support.

AI can also act as a 24/7 virtual assistant for team members and customers alike, offering tailored recommendations, reminders, and feedback, resulting in higher engagement and conversion rates. For example, when a new customer joins, GenAI can provide a warm welcome with personalized training content, highlighting relevant best practices.

Another use case is operational cost reduction, as AI can significantly cut down manual labor and reduce workloads. For example, editing aids like Grammarly or Jasper empower even smaller teams to achieve greater output. Publishers can use such tools to cover a wider array of topics and connect with niche audiences with minimal investment.

In journalism, as research and writing become increasingly augmented by GenAI models, a safe approach for journalists is to evaluate the content for accuracy, style, and comprehensiveness before finalizing the article for publication. Once finalized, GenAI models can additionally be trained on regional data to hyper-localize the message for readers through language translation, dialect intricacies, latest news integration, and more:

GenAI can increase journalists’ productivity by automating portions of the content lifecycle (Source: Deloitte)

Research and Fact-Checking

The practice of using AI for fact-checking dates back to the Cold War: in the 1960s, the U.S.’s NSA and Britain’s GCHQ explored early AI to transcribe and translate enormous volumes of Soviet phone intercepts. The technology was immature then, but today its applications reach far and wide.

AI-assisted fact-checking can now spot faked images, check disinformation against trusted sources and identify social-media bots. It can even block cyber-attacks by analyzing patterns of activity on networks and devices or fight organized crime by spotting suspicious chains of financial transactions.

The Nuclear Threat Initiative, an NGO, recently showed that applying machine learning to publicly available trade data could spot previously unknown companies suspected of involvement in the illicit nuclear trade.

Likewise, journalists have several fact-checking tools at their disposal, capable of detecting coordinated misinformation campaigns, automatically surfacing emerging narratives and coordinated patterns across social media, and analyzing web sentiment and trends in real time to curb the spread and virality of misinformation.

DMINR, a joint collaboration between the Department of Journalism and the Centre for Human-Computer Interaction Design at City, University of London, aims to blend journalistic expertise and routines with the many opportunities AI technologies offer. Another research project, “new/s/leak”, aims to support journalists by allowing them to quickly and intuitively explore large textual datasets such as war diaries or embassy cables. Such projects exemplify AI's role in investigative journalism.

Implementation Challenges

To be fair, the challenge in implementing AI isn’t new, but it’s an increasingly pressing one. For media companies, AI introduces ethical dilemmas, forcing them to balance technological experimentation with maintaining public trust, and upholding legal rights.

A 2023 survey by the World Association of News Publishers (WAN-IFRA) revealed that 49% of respondents were actively working with GenAI, but 85% of them were concerned about the inaccuracy of information and the quality of content:

Source: WAN-IFRA

Smaller media houses in particular face financial and technical limitations that can hinder AI integration. Quality concerns arise when AI-generated content inherits bias or inaccuracy from training datasets, leading to public criticism. Such issues require careful management and can lead to legal concerns, including data security and copyright issues.

AI Video for Publishers in 2023: Exciting but Uncertain Times

We now turn our focus to the video space. As we have noted earlier, image and video generation hold the promise of nearly limitless creative possibilities. They have rightfully captured the attention of creative departments and IT teams alike. Looking back at this year, however, we can see how multiple challenges have made the nascent business models of AI companies less appealing in the eyes of publishers.

How Publishers perceive the impact of AI Video

Rapidly evolving technology and a shifting landscape rife with uncertainty pose a formidable challenge in predicting the impact AI can have on video production. A good place to start, however, is by taking a pulse check on how video creators and producers feel.

A survey of video creators and producers by Kapwing reveals some interesting insights:

Positivity about the future of AI

A significant 79% of those surveyed express at least cautious optimism regarding the future of AI in the video sector. Conversely, a smaller portion, 16%, are apprehensive:

Source: Kapwing

Growth challenges for AI in video production

The majority of participants identified either quality or accuracy as the primary obstacle they faced while integrating AI into their video production workflow. In the context of this survey, 'quality' was interpreted as adherence to brand standards, while 'accuracy' meant the content did not necessitate further review or fact-checking.

Source: Kapwing

How good are today’s GenAI video tools?

Do generative AI video tools live up to their promise of creating videos from text prompts? The majority opinion suggests they do not.

A significant 74% of users feel that the current capabilities of GenAI video tools fall short of their expectations. Despite this, there is a sense of optimism about the future possibilities of GenAI, with many excited about its potential. In stark contrast, only a minor 6% believe GenAI is overrated.

Around 25% of the users acknowledge that GenAI tools largely meet their expectations. Yet, a mere 6% of these individuals release their AI-generated videos without any manual modifications. This indicates that even satisfied users anticipate the need for some manual intervention before finalizing their videos.

Source: Kapwing

Broader challenges to AI’s success

Though impressive in its current form, GenAI video must first overcome several quality-related challenges, some of which are:

Data Limitations and Bias: AI algorithms require large and diverse datasets for training. However, the availability of high-quality and unbiased video data can be limited. Biased training data can lead to the perpetuation of stereotypes and unfair representation in generated content.

Bias is complicated: sometimes obvious, other times more nuanced. For example, quantifying how often skin tones and perceived genders appear is one of the clearer signals, but harder to measure are attributes such as religious accessories or types of facial hair, which also contribute to the overall bias encoded in GenAI outputs.

Complexity of Content: Videos often involve intricate details, such as facial expressions, natural language, and context. Current AI models struggle to accurately capture such complexity, real-world scenes, and emotions.

Ethical and Legal Concerns: The use of AI-generated content raises ethical questions, particularly in cases where the generated content is deceptive or harmful. The ownership and copyright of AI-generated videos are complex legal issues that require clarification.

A survey conducted by the Pew Research Center shows increasing concern among US adults about the role of artificial intelligence in daily life.

Source: Pew Research Center survey

Computational Resources and Energy Consumption: Training advanced AI models for video generation demands substantially more computational power and energy than text or image generation. This raises concerns about environmental impact and accessibility for firms with limited resources.

Lack of Fine-grained Control: Current AI video generation techniques lack fine-grained control over the generated content. This limitation hinders the ability of creators to achieve specific artistic or storytelling objectives.

Achieving Realism: While current AI-generated videos remain far from photo-realistic, progress has been steady. However, generated videos sometimes fall into the “uncanny valley,” where subtle discrepancies can make the content appear unsettling to human viewers.

Value Proposition: Is the Juice worth the Squeeze?

A look back at the survey shows quite clearly where we are: even with all that high-powered computing, the most common use case for AI in video is the humble subtitle. Scriptwriting, brainstorming, audio editing, and voiceover are close runners-up.

Source: Kapwing

Indeed, these are time-consuming, thankless, and wholly uncreative tasks. The value proposition is quite anticlimactic. Is the value of AI really just cost savings from eliminating entry-level, offshore contract work?

We believe that 2024 will usher in the real change: we will start to see what Deloitte calls an “AI-first” approach, giving rise to a new media value-chain model where all aspects of the creative process commence with GenAI and conclude with human refinement. To quote: “Companies that can effectively harness the wealth of data generated through media consumption–through an interactive content/data feedback loop–will carve out a competitive edge. This future emphasizes the importance of understanding and responding to audience behaviors and preferences in real-time, effectively targeting and interacting with a segment of one.”

2024: Generative AI means business

In 2024, we'll see the main value proposition, expediency and human-like thinking and creativity, applied to new revenue streams, not merely cost reduction. This will usher in a new sequence of innovation that allows for the bulk creation of content while addressing the concerns of 2023: namely, accuracy and quality.

Three Business Drivers: the dollars that fuel innovation

We anticipate the emergence of a new generation of tools specifically tailored for the creation of video content for publishers. The significance of video in enhancing publishers' business is undeniable, ranging from providing premium ad inventory to building a more engaging shopping experience. Let's break down these main business drivers:

1. Advertising: Selling Ads alongside Videos for every Article

People love video. YouTube’s 2-billion-plus monthly users, most of whom watch videos daily, consume over 694,000 hours of video each minute. Likewise, American adults spend over 4.43 billion total minutes per day on TikTok. On Facebook, video drives 59.3% of ad clicks versus just 29.6% for images:

Source: TargetVideo

However, for most publishers not named YouTube, there exists a huge dichotomy between what people want and what they find on the web today. Over 80% of web traffic to top publisher sites consists of video, but just 16% of publisher pages have a video on them.

This disparity can be primarily attributed to the cost- and time-intensive nature of video production, which runs 10 to 100 times higher than that of written content, depending on the method of measurement.

Thankfully, the cost is commensurate with video’s value. Video ad CPM (Cost Per Mille) rates are generally higher than traditional display advertising, with earnings of $5-$30 CPM for desktop ads in Tier 1 countries. Directly selling ads to advertisers and hosting videos on-site is the most lucrative approach, allowing publishers to retain 100% of revenue.
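Since CPM is cost per thousand impressions, translating those rates into revenue is simple arithmetic; the impression volume below is a hypothetical example, not publisher data:

```python
# CPM means cost per thousand impressions, so revenue scales
# linearly with impressions served.
def ad_revenue(impressions, cpm_dollars):
    return impressions / 1000 * cpm_dollars

monthly_impressions = 2_000_000  # hypothetical publisher volume
for cpm in (5, 20, 30):          # the CPM range quoted above
    print(f"${cpm} CPM -> ${ad_revenue(monthly_impressions, cpm):,.0f}/month")
```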

While video advertising’s lucrative rates are not new to publishers, it may get much harder to achieve them. In August 2022, the IAB Tech Lab updated its Ad Formats Guidelines for Digital Video and CTV, aiming to enhance transparency in the industry.

The amended guidelines narrowed the definition of in-stream video to video that plays with sound on, before, during, or after streaming video content the user has requested.

Consumers, and consequently advertisers and ad agencies, are no longer willing to invest top dollar in a subpar video experience. Top dollar, $20 CPM or higher, goes to in-stream ad placements where users proactively click the play button with sound on; these are videos the user actually intends to watch.

Unfortunately for consumers, such videos are in short supply. The IAB estimates that over 90% of videos on the internet do not qualify as high quality. The days of spammy video are over; only quality video can bring in those high CPMs.

Can 2024 be the year AI delivers that?

2. Content Discovery and Syndication: Video as a traffic driver

Recent findings highlight a striking 157% boost in organic traffic from Search Engine Results Pages (SERPs) attributable to video content. Given the shift towards visual search platforms like Instagram and TikTok, particularly among younger demographics, optimizing video content for these platforms is crucial for future-proofing SEO efforts.

AI can be effectively utilized to harness SEO and search data for the creation of video content. By monitoring search trends throughout the year, businesses can tailor their video topics to match user interests, effectively addressing specific queries and demands. This, of course, means being able to produce video as quickly as creating written content, which is something generative AI could be particularly well suited for.

In the realm of video marketing, a common mistake to avoid is neglecting the promotion and optimization of content post-creation. Active socialization and ongoing nurturing of the video are crucial, and these processes can be automated with the aid of AI.

3. eCommerce: Video drives sales

Research has shown that 69% of customers prefer watching a video over reading text to learn about products and services, while using video on a landing page has been shown to increase conversion rates by 80%.

According to a study by Wyzowl, 84% of people say they have been convinced to buy a product or service by watching a brand’s video. AI-generated video can help e-commerce businesses create personalized and engaging videos for their customers based on their preferences, behavior, and feedback. It can also help optimize product listings, increase conversions, reduce returns, and improve customer loyalty by delivering relevant and compelling content at scale.

Generative AI: A checklist for Publishers

With the challenges laid out, but the opportunity for huge gains with GenAI clear, what should publishers and brands look for when researching AI tools for their business? We believe it boils down to the following:

AI tools need to be “on Brand”

First and foremost, if AI is generating content for your brand, you should treat it like any other member of your content team. That means any tool you choose should be able to take in a brand kit: brand guidelines, logos, style guides, and more. In addition, for individual projects, a tool should be able to understand the creative briefs and business goals needed for your success.

Aeon’s software features an extensive Brand Guidelines section where you can upload an entire style guide

AI tools need to QA their work

Speed means nothing if you can’t use the content. While watching ChatGPT studiously march words across the screen faster than you can type brings a smile to most, what good is that speed if you have to rewrite everything? Any tool you use must demonstrate that it takes steps to get things right. Does it provide citations for information it sources from third parties? Alternatively, does it confirm that it uses only primary sources? Does it QA its own work and keep a record of doing so?

Aeon’s software during a final QA step, ensuring a high quality output

AI tools need to learn your business

No one gets it right the first time, whether human or machine. The goal, of course, is to learn effectively over time. Does your AI tool do that? Does it incorporate both explicit feedback (for example, when a human gives a ‘thumbs down’ to content) and implicit feedback (when videos get skipped, does the AI learn they probably weren’t very good)? Brands need to think about training their AI systems the same way they train their employees: it is an investment that pays back many-fold over time.

When editing a video in Aeon, editors can directly provide training feedback that is utilized for future production
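As a purely hypothetical sketch of that explicit-plus-implicit feedback loop (the names and weights below are invented for illustration and do not describe Aeon’s actual implementation), a tool might blend both signals into a single quality score to train against:

```python
from dataclasses import dataclass

# Hypothetical feedback record: an explicit rating plus an implicit
# engagement signal. Neither the fields nor the weights come from any
# real product; they only illustrate combining the two signal types.
@dataclass
class VideoFeedback:
    thumbs: int            # +1 thumbs-up, -1 thumbs-down, 0 no rating
    watch_fraction: float  # 0.0 (skipped instantly) .. 1.0 (watched fully)

def quality_score(fb: VideoFeedback,
                  explicit_weight: float = 0.7,
                  implicit_weight: float = 0.3) -> float:
    explicit = (fb.thumbs + 1) / 2  # map -1..1 onto 0..1
    implicit = fb.watch_fraction
    return explicit_weight * explicit + implicit_weight * implicit

print(quality_score(VideoFeedback(thumbs=-1, watch_fraction=0.1)))  # low
print(quality_score(VideoFeedback(thumbs=1, watch_fraction=0.9)))   # high
```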

Final Thoughts

It’s expected that 90% of online content may be synthetically generated by 2026. If that forecast proves accurate, AI is set to profoundly impact publishing and content creation as a whole. Newspaper editors, TikTok creators, designers, writers: everyone within the creative space will feel the impact of this scaling in some way or another.

What are the steps between here and there?  We think the sequence looks something like this:

  1. Today: Text Generation, Image Generation, Video Generation

  2. AI Video Production: Web page to video production at scale

  3. Personalized AI: Personalized video production based on consumer data

  4. Realtime AI: Realtime, interactive video experiences

As mentioned earlier, we will start seeing solutions that take us from mass communication to niche communication, and eventually individual, interactive communication.

The advent of AI is poised to reshape job markets, emphasizing the need for journalists and publishers to upgrade their skills. As the integration of AI advances, publishers will be compelled to embrace transparent and ethical strategies, necessitating the establishment of dedicated task forces to navigate the transforming landscape.

The publishing industry’s business aspirations must align closely with ethical values to guard against reputational risks. That being said, publishers will be expected to leverage AI to maintain authority, explore reader behavior, reach wider audiences and, above all, remain competitive.