The large multimodal language model, GPT-4, is ready for prime time, although, contrary to reports circulating since Friday, it doesn’t support the ability to produce videos from text.
GPT-4 can, however, accept image and text input and produce text output. Over a range of domains — including documents with text and photographs, diagrams, or screenshots — GPT-4 exhibits similar capabilities as it does on text-only inputs, OpenAI explained on its website.
That feature, though, is in “research preview” and won’t be publicly available.
OpenAI explained that GPT-4, while less capable than humans in many real-world scenarios, exhibits human-level performance on various professional and academic benchmarks.
For example, it passed a simulated bar exam with a score around the top 10% of test takers. In contrast, GPT-3.5’s score was around the bottom 10%.
Leaps Over Past Models
One of the early users of GPT-4 is Casetext, maker of an AI legal assistant, CoCounsel, which it says is capable of passing both the multiple-choice and written portions of the Uniform Bar Exam.
“GPT-4 leaps past the power of earlier language models,” Pablo Arredondo, co-founder and chief innovation officer for Casetext, said in a statement. “The model’s ability not just to generate text, but to interpret it, heralds nothing short of a new age in the practice of law.”
“Casetext’s CoCounsel is changing how the law is practiced by automating critical, time-intensive tasks and freeing our lawyers to focus on the most impactful aspects of practice,” Frank Ryan, Americas Chair of DLA Piper, a global law firm, added in a press release.
OpenAI explained it had spent six months aligning GPT-4 using lessons from its adversarial testing program, as well as ChatGPT, resulting in its best-ever results — though far from perfect — on factuality, steerability, and refusing to go outside of guardrails.
It added that the GPT-4 training run was unprecedentedly stable. It was the company’s first large model whose training performance it was able to predict ahead of time accurately.
“As we continue to focus on reliable scaling,” it wrote, “we aim to hone our methodology to help us predict and prepare for future capabilities increasingly far in advance — something we view as critical for safety.”
Subtle Distinctions
OpenAI noted that the distinction between GPT-3.5 and GPT-4 could be subtle. The difference comes out when the complexity of the task reaches a sufficient threshold, it explained. GPT-4 is more reliable and creative and can handle more nuanced instructions than GPT-3.5.
GPT-4 can also be customized more than its predecessor. Rather than the classic ChatGPT personality with a fixed verbosity, tone, and style, OpenAI explained, developers — and soon ChatGPT users — can now prescribe their AI’s style and task by describing those directions in the “system” message. System messages allow API users to customize their users’ experience within bounds significantly.
API users will have to initially wait to try out that feature, however, since their access to GPT-4 will be restricted by a waiting list.
OpenAI acknowledged that despite its capabilities, GPT-4 has similar limitations as earlier GPT models. Most importantly, it still is not fully reliable. It “hallucinates” facts and makes reasoning errors.
Great care should be taken when using language model outputs, particularly in high-stakes contexts, OpenAI cautioned.
GPT-4 can also be confidently wrong in its predictions, not taking care to double-check work when it’s likely to make a mistake, it added.
T2V Absent
Anticipation for the new release of GPT was stoked over the weekend after a Microsoft executive in Germany suggested that a text-to-video capability would be part of the final package.
“We will introduce GPT-4 next week, where we have multimodal models that will offer completely different possibilities — for example, videos,” Andreas Braun, chief technology officer for Microsoft in Germany, said at a press event on Friday.
Text-to-video would be very disruptive, observed Rob Enderle, president and principal analyst at the Enderle Group, an advisory services firm in Bend, Ore.
“It could change dramatically how movies and TV shows are created, how news programs are formatted by providing a mechanism for highly granular user customization,” he told TechNewsWorld.
Enderle noted that one initial use of the technology could be in creating storyboards from drafts of scripts. “As this technology matures, it will advance to something closer to a finished product.”
Video Proliferation
Content created by text-to-video applications is still basic, noted Greg Sterling, co-founder of Near Media, a news, commentary, and analysis website.
“But text-to-video has the potential to be disruptive in the sense that we’ll see lots more video content generated at very low or almost no cost,” he told TechNewsWorld.
“The quality and effectiveness of that video is a different matter,” he continued. “But I suspect some of it will be decent.”
He added that explainers and basic how-to information are good candidates for text-to-video.
“I could imagine that some agencies will use it to create video for SMBs to use on their sites or YouTube for ranking purposes,” he said.
“It will not be good — at least at first — at any branded content,” he continued. “Social media content is another use case. You’ll see creators on YouTube use it to crank out volume to generate views and ad revenue.”
Not Fooled By Deepfakes
As was discovered with ChatGPT, there are potential dangers to technology like text-to-video.
“The most dangerous use cases, like all tools like this, are the garden variety scams impersonating people to relatives or attacks on particularly vulnerable persons or institutions,” observed Will Duffield, a policy analyst with the Cato Institute, a Washington, D.C. think tank.
Duffield, though, discounted the idea of using text-to-video to produce effective “deepfakes.”
“When we’ve seen well-resourced attacks, like the Russian deepfake of Zelenskyy surrendering last year, they’ve failed because there’s enough context and expectation in the world to disprove the fake,” he explained.
“We have very well-defined notions of who public figures are, what they’re about, what we can expect them to do,” he continued. “So, when we see media of them behaving in a way that’s aberrant, that doesn’t comport with those expectations, we’re likely to be very critical or skeptical of it.”