Less than half (47%) of media leaders interviewed by the Reuters Institute at the beginning of this year were confident about the prospects for journalism and publishing. Some immediate concerns were rising costs, declining ad revenue, and slowing subscription growth. However, a key challenge was embracing Artificial Intelligence (AI). 

If you're a publisher waiting to take this step, read on. This blog explores why 2024 presents unique opportunities and is the ideal time to leverage AI with significantly lower investments in time and resources than what was required just a few years ago.

New Models Emerge As Older Ones Evolve

Most of the current excitement in AI has been focused on two families of models: large language models (LLMs) for text and diffusion models for images. Both are deep neural networks, meaning they have many more layers of neurons than earlier models, and they can churn quickly through reams of data.

If you're interested in gaining a deeper understanding, here’s a brief explainer video on neural networks, covering their layered structure and the underlying mathematics:

But what is a neural network? | Chapter 1, Deep learning
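
If you prefer to see the layered idea in code, here is a minimal, purely illustrative sketch of data flowing through two layers of neurons. The sizes, weights, and names are arbitrary stand-ins, not any real model.

```python
import numpy as np

# A minimal sketch of a two-layer neural network forward pass.
# All shapes and values are illustrative only.
rng = np.random.default_rng(0)

x = rng.normal(size=(1, 4))     # one input with 4 features
W1 = rng.normal(size=(4, 8))    # first layer: 4 inputs -> 8 neurons
W2 = rng.normal(size=(8, 2))    # second layer: 8 neurons -> 2 outputs

hidden = np.maximum(0, x @ W1)  # ReLU activation on the hidden layer
output = hidden @ W2            # raw scores from the output layer

print(output.shape)             # (1, 2)
```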

Large Language Models (LLMs) 

LLMs such as GPT, Gemini, Claude, and Llama are all built on the transformer architecture, which Ashish Vaswani and his team at Google Brain introduced in 2017. The key principle is "attention": an attention layer lets the model learn how different parts of an input, such as words at certain distances from each other in a text, relate to one another. Stacking many attention layers in a row allows a model to learn associations at different levels of granularity, between words, phrases, or even paragraphs.
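
As a rough illustration (not the production code of any of these models), here is a minimal sketch of a single scaled dot-product self-attention layer in Python; all shapes and variable names are illustrative.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """One attention head: every position attends to every other position."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                    # how related each pair of tokens is
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over positions
    return weights @ V                                 # weighted mix of value vectors

# Toy example: 5 "tokens", each represented by an 8-dimensional vector.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(5, 8))
out = scaled_dot_product_attention(tokens, tokens, tokens)  # self-attention
print(out.shape)  # (5, 8)
```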

Transformer-based models can also generate images. The first version of DALL-E, released by OpenAI in 2021, was a transformer that learned associations between groups of pixels in an image instead of words in a text. 

State-of-the-art LLMs are getting larger and better every day, even surpassing human performance on some benchmarks.

For example, OpenAI's GPT-3 achieved an accuracy of just 43.9% on MMLU, a popular benchmark for evaluating language models' capabilities. The developers of MMLU estimate that human domain experts achieve around 89.8% accuracy. Google's Gemini Ultra, its largest model built for highly complex tasks, has scored 90%.

MMLU scores for various foundational AI models 

(Source: Papers with Code - Multi-task Language Understanding on MMLU)

 

Diffusion Models

Transformer-based models are prone to so-called "hallucinations": they make up plausible-looking wrong answers, sometimes even with citations to support them. For the same reason, some of the images produced by early transformer-based models broke the rules of physics or were implausible in other ways, making it difficult to produce photo-realistic images.

Diffusion models have proven to be a better alternative for image generation. They work by progressively adding noise to an image until it appears completely random. The model is then trained to reverse this process and reconstruct the original image, using a technique called self-supervised learning.

This is similar to how LLMs are trained on text: instead of covering up words in a sentence and learning to predict the missing ones, the neural network in a diffusion model learns to remove increasing amounts of noise to reproduce the original image. As it works through billions of images, learning the patterns needed to remove distortions, the network gains the ability to create entirely new images out of nothing more than random noise.
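
To make the idea concrete, here is a heavily simplified conceptual sketch of that training setup in Python. The noise schedule and the placeholder denoiser are illustrative assumptions, not the internals of any production diffusion model.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_noise(image, t, T=1000):
    """Forward process: mix the image with Gaussian noise; at t = T it is pure noise."""
    alpha = 1.0 - t / T                              # simplified noise schedule
    noise = rng.normal(size=image.shape)
    return np.sqrt(alpha) * image + np.sqrt(1 - alpha) * noise, noise

def denoiser(noisy_image, t):
    """Placeholder for the neural network that learns to predict the added noise."""
    return np.zeros_like(noisy_image)

# Training signal, conceptually: given a noisy image and the step t,
# the network is asked to predict the noise that was added.
image = rng.normal(size=(64, 64))                    # stand-in for a real image
noisy, true_noise = add_noise(image, t=500)

predicted_noise = denoiser(noisy, 500)
loss = np.mean((predicted_noise - true_noise) ** 2)  # the self-supervised objective
```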

Most state-of-the-art image-generation systems today use diffusion for image generation, though they differ in how they “de-noise” or reverse distortions. Some even combine the diffusion model with the transformer architecture to improve results. 

For example, FLUX.1, the latest model from Black Forest Labs, a Germany-based company founded by researchers who developed Stable Diffusion, uses what the company calls a "hybrid architecture" combining transformer and diffusion techniques, scaled up to 12 billion parameters.


An image generated using FLUX.1 by Black Forest Labs
Prompt: "A beautiful queen of the universe holding up her hands, face in the background." Source: Ars Technica

Recommendation Systems

It is rare to get a glimpse of the inner workings of recommendation AI models because most are closely guarded by the companies that build them. In 2019, Meta released details about its deep learning recommendation model (DLRM), which has three main parts.

First, it converts inputs (such as a user's age, their "likes" on the platform, or the content they have consumed) into "embeddings". It learns in such a way that similar things (like tennis and ping pong) end up close to each other in this embedding space.

The DLRM then uses a neural network to perform what is called matrix factorization. Think of a spreadsheet where the columns are videos and the rows are users, and each cell records how much a given user likes a given video; most of the cells in this grid are empty. In the third step, the recommendation system predicts, for every empty cell, how much each user will like each video.
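
As a toy illustration of that factorization step (not Meta's actual DLRM code), the sketch below fills in the empty cells of the user-video grid by taking dot products between learned user and video embeddings; all sizes and names are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_videos, dim = 100, 500, 16

# Learned embedding tables (random stand-ins here): one vector per user and
# one per video. Training would nudge similar users and videos closer together.
user_emb = rng.normal(size=(n_users, dim))
video_emb = rng.normal(size=(n_videos, dim))

# Matrix factorization view: the predicted "spreadsheet" of how much every
# user likes every video is the product of the two embedding tables.
predicted_scores = user_emb @ video_emb.T   # shape (n_users, n_videos)

# Recommend the top 5 videos for user 42, including cells that were "empty"
# (videos that user has never interacted with).
top5 = np.argsort(-predicted_scores[42])[:5]
print(top5)
```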

Today, the same approach can be applied to advertisements, songs on a streaming service, products on an e-commerce platform, or even articles and videos on a publishing platform.

AI Models Are Going Multi-Modal

Early AI models were single-modal, meaning they could process only one data type at a time, such as text, images, or voice. Most AI models today can incorporate diverse data types. OpenAI's latest model, GPT-4o, for example, can reason across audio, vision, and text in real time:


Source: OpenAI
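
For developers, tapping into such a multimodal model can be as simple as a single API call. Below is a minimal sketch using the OpenAI Python SDK to send text plus an image to GPT-4o; the image URL is a placeholder, and you would need your own API key.

```python
from openai import OpenAI

# Minimal multimodal request: one text prompt plus one image.
# Requires an OPENAI_API_KEY; the image URL below is just a placeholder.
client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one sentence."},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
)

print(response.choices[0].message.content)
```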

AI In Video Has Reached An Inflection Point

The use of artificial intelligence in video generation can be traced back to the early days of computer graphics and algorithms. However, the true transformation came with the rise of deep learning and convolutional neural networks (CNNs) in the 2010s. By harnessing the power of CNNs, AI systems could analyze and understand video frames, enabling applications like automatic video tagging, content moderation, and video summarization. 

Today, advanced models like OpenAI's Sora can create videos by manipulating pixels and conceptualizing three-dimensional scenes that unfold in time. Sora combines the diffusion and transformer architectures into a single "diffusion transformer" model that powers features such as:

  • Text-to-video
  • Image-to-video: bringing still images to life
  • Video-to-video: changing the style of a video into something else
  • Extending video in time, both forwards and backwards
  • Creating seamless loops
  • Image generation
  • Simulating virtual worlds, such as Minecraft and other video games
  • Creating videos up to one minute long with multiple shots

In addition to advancements in video generation models, segmentation models—capable of identifying and grouping pixels in images or videos to determine object boundaries—have also become increasingly sophisticated.

Meta's SAM 2, released last month, can segment videos in real time and performs extremely well in practical image- and video-processing applications. These models can streamline and automate video editing and VFX workflows for publishers aiming to shift toward video content, a process that was once time-consuming, skill-intensive, and resource-heavy.

At Aeon, we leverage foundational models to tackle real-world challenges in the publishing industry. The future of AI-assisted video generation holds immense promise. With our AI-powered article-to-video service, publishers can scale video content and streamline their workflows.

Artificial intelligence will undoubtedly unlock numerous opportunities in the publishing industry, especially in video, whose power has yet to be leveraged at scale.

AI Chips Are Getting Better: More Power, Lower Cost

Microchips have brought increasingly smaller, more powerful, and more efficient electronic devices within reach of the masses. Following Moore's Law, they have advanced to the point where the smartphone in your pocket is more powerful than the computer that landed a man on the moon or the supercomputers of the '80s.

While Moore's Law may now be hitting a wall, artificial intelligence and its vast need for power and speed are driving a new generation of microchip innovations. Nvidia's latest chip, unveiled in March this year, can perform tasks 30 times faster than its predecessor.

Unlike general-purpose processors, AI chips like Nvidia's can process tasks in parallel. Also known as graphics processing units (GPUs) or "accelerators," they were initially designed for video games. They break each computation into smaller chunks and distribute them among the chip's many "cores," the brains of the processor. This means a GPU can complete computations far faster than if it worked through them sequentially.
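
The speedup comes from that chunking. As a loose, CPU-based analogy (Python's multiprocessing rather than real GPU code), the sketch below splits a computation into chunks and spreads them across cores; the workload and chunk size are arbitrary.

```python
from multiprocessing import Pool

def square(x):
    """Stand-in for one small chunk of a larger computation."""
    return x * x

if __name__ == "__main__":
    data = list(range(100_000))

    # Sequential: a single worker handles every item in order.
    sequential = [square(x) for x in data]

    # Parallel: the work is split into chunks and spread across CPU cores,
    # loosely analogous to how a GPU fans a computation out over its many cores.
    with Pool(processes=4) as pool:
        parallel = pool.map(square, data, chunksize=10_000)

    assert sequential == parallel
```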

On June 2nd, Nvidia’s CEO Jensen Huang unveiled detailed plans for upcoming chips. He highlighted a 98% cost savings and 97% reduction in energy consumption with Nvidia’s technology while noting that these figures represent “CEO math, which is not accurate, but it is correct.”

Scientists are unlocking next-generation advances in microchip technology by using particles of light instead of electricity to carry data. For example, Boston-based startup Lightmatter uses light to multiply processing power and cut the huge energy demand for chips used in AI technologies. 

German startup Semron is developing a chip designed to run AI programs locally on smartphones, earbuds, virtual reality headsets, and other mobile devices. Generative AI and sustainable computing were among the key innovations highlighted in the World Economic Forum's Top 10 Emerging Technologies of 2023.

While Nvidia holds over 80% of the market share in AI chips, several tech giants have ambitions in this space (see figure below). Given the lucrative AI chip opportunities, it's only a matter of time before more competitors emerge and fight for market share.


Source: Forbes

Competition Is Good

For application developers and publishers alike, this is an ideal moment to leverage AI, as increasing competition will accelerate the democratization of access to advanced AI models and computing capabilities.

In May, the price of AI services in China plummeted after ByteDance kicked off a price war by pricing access to its LLMs at 99.8 percent below GPT-4. In the US, tech giants like Google, Meta, Amazon, and Microsoft have also been competing in AI by following the “blitzscaling” playbook that has become commonplace in Silicon Valley.

Today, Google Cloud sells Gemini Pro for $0.002 per second of video. At that rate, a full-length movie of roughly 90 minutes (about 5,400 seconds) costs around $10 to analyze and turn into a 150-page scene-by-scene synopsis!



Investment In AI Continues To Be Strong

Microsoft has invested a cumulative $13 billion in OpenAI, while Amazon has invested $4 billion in Anthropic. Over the coming years, the tech industry is expected to spend a trillion dollars building out the artificial intelligence industry.

No one spends a trillion dollars on something unless they believe in it — and Silicon Valley believes in the transformative economic potential of AI. In the US alone, AI startups raised $23 billion in capital in 2023, and more than 200 such companies around the world are unicorns — meaning they’re valued at $1 billion or more.

According to a recent Crunchbase report, AI startup funding doubled to $24.1 billion in Q2 of 2024:


Source: Crunchbase

All this money is, in part, a measure of tech’s confidence that the AI market will eventually prove titanically huge. One forecast by the consultancy PwC estimates that AI could add nearly $16 trillion to the global economy by 2030, chiefly from vastly enhanced productivity.

Today, tech giants Nvidia, Amazon, Google, Microsoft, and Apple are worth $14.5 trillion combined and make up about 32% of the S&P 500. Nvidia's data center revenue is growing at about 60%. If that pattern continued over the next decade, Nvidia alone would have a market cap of about $49 trillion, more than the combined value of all companies in today's S&P 500. But it's too early to say whether Nvidia has the juice to lead Big Tech into the AI frontier in the long term.

One thing is certain: with massive investments in AI technology, competition will inevitably drive rapid innovation. This will create exciting opportunities for application developers and users, including digital media publishers, to experiment with AI and boost productivity at a fraction of the cost compared to what was possible just a few years ago.

Open Models Will Soon Rival Proprietary Models

Closed-source AI refers to proprietary and confidential models, datasets, and algorithms. Examples include ChatGPT, Google's Gemini, and Anthropic's Claude. Though anyone can use these products, there is no way to find out what datasets and source code were used to build the AI model or tool. This risks undermining public trust and accountability.

In contrast, an open-source model offers transparency by allowing the public to explore the inner workings of these sophisticated models. Meta has taken up the fight for open-source AI in a big way: in recent weeks, it has released a new collection of large AI models, including SAM 2, its state-of-the-art segmentation model.

Open-source models foster rapid development through community collaboration and enable smaller organizations, and even individuals, to take part in AI development. This makes a huge difference for small and medium-sized enterprises, since training large AI models from scratch is prohibitively expensive.

One of the most crucial advantages of open-source AI for publishers is the ability to scrutinize and identify potential biases and vulnerabilities.

Conclusion

Publishers should see AI not merely as a curiosity but as an essential part of their business strategy. Starting early and forming strong partnerships will be key to building a competitive advantage with AI. With several favorable factors aligning, 2024 is the year to take the leap and implement AI.