I test AI for a living and these are the 5 most amazing AI tools of the year
A lot of changes can happen over a year, especially in the world of technology. 2023 is no different. These past 12 months will go down as the year of AI, where new models, products and use cases for generative AI were unveiled and the entire tech sector changed forever.
I have tried a wide range of artificial intelligence products from image generators to video production tools. I’ve played with apps that make music from a simple text prompt and others that can turn one voice into another in seconds.
Despite the flurry of new releases, new models and rival chatbots, ChatGPT is still my AI app of the year. Until we see what Gemini Ultra can do for Google Bard, GPT-4 still outperforms all other artificial intelligence models by some margin. Here's why.
Why is ChatGPT my app of the year?
While it might seem like the easy and obvious choice, what makes ChatGPT the app of the year is how big an impact it has had on the entire AI sector. Just over a year ago, there were little more than tentative steps toward commercialized generative AI in mainstream products. Today we’ve got chatbots integrated into Windows 11, image generation in Photoshop and a way to make a song out of nothing in a web browser. All of that is due in part to the success of ChatGPT.
OpenAI didn’t rest on its laurels with ChatGPT. Over the past year, it has been upgraded multiple times. It has turned from a research experiment into a useful product and become multimodal.
You can now give ChatGPT a photograph and have it describe the contents, write a poem about it and even produce an image to illustrate that poem. ChatGPT also has a new voice mode that lets you interact with it much like you would Alexa or Siri, but with intelligent responses.
Best AI Tools 2023 — my top picks
There were so many choices this year. They include apps like Otter.ai that take note-taking to a new level. Let's not forget the various Meta AI tools embedded in Instagram and Facebook, along with open-source AI models like Mixtral that can run on your device.
In the end, there were a handful of genuine standout moments, services and models in the world of AI this year that weren't ChatGPT. Here's the list.
Runway for AI video generation
The only program that had me questioning whether it should be my AI tool of the year over ChatGPT was Runway's Gen-2. This multimodal video AI model came out in June, and using it for the first time triggered a “wow” response in me similar to the one I felt when I first used ChatGPT.
Other commercial and non-commercial video AI tools started to emerge in the second half of the year, including the impressive Pika 1.0 from Pika Labs, Stable Video Diffusion from StabilityAI and Meta's Emu. However, for the simple fact that it hit early, hit hard and was impressive out of the gate, I have to give it to Runway.
ElevenLabs for human-like voice AI
While it might not get as much attention as the flashier image, video and text generation tools, ElevenLabs' ability to create impressively natural-sounding synthetic voices and clone a voice from minutes of audio is a standout.
Text-to-speech software isn't a new idea. We've had synthetic voices and "read-aloud" features for years. What ElevenLabs achieved was to make those voices sound so human that you can barely tell they are synthetic. The company also introduced a new feature that converts voice to voice. Essentially you speak and it makes it sound different using its AI voices.
Other tools like HeyGen get a special mention in this category for demonstrating the potential of live and video translation. HeyGen can create an avatar from nothing, complete with an artificial voice, or translate a voice into another language while keeping the original tone and accent.
MidJourney for hyper-realistic images
AI image generation is a crowded space, but I give it to MidJourney because it has stayed at the top of its game. Its images have a style and flair that others are only just catching up to, and v6 can add text to pictures.
Even with the prompt to make it photorealistic, other image generators still struggle with a degree of artificiality or cartoonish overtones. MidJourney seems to be able to get as close to replicating reality as I've seen.
Honorable mentions go to the multitude of models built on top of Stable Diffusion, an open-source technology built in part by Runway and funded by StabilityAI. SDXL 1.0 is close to MidJourney in quality, Turbo can create images in real time and adaptations from firms like Leonardo take it to a whole new level of performance.
Claude 2 for large context chat
Anthropic’s Claude 2 is a chatbot that doesn’t get the kudos it deserves. It has a massive context window, has impressive reasoning and creativity skills and can take large files as input and analyze the contents in seconds.
The company was also one of the first to put efforts into the concept of Constitutional AI. This is a concept where large AI models are given explicit values determined by a constitution rather than from human feedback.
Under the feedback approach, model behavior was guided by responses from human contractors comparing output and selecting the one that was more helpful or more harmless. Anthropic's approach defines a set of principles for the AI at a high level that it uses to make judgment calls.
Other chatbots may be better at different tasks. Bard can interact with other Google products and even analyze a YouTube video. Pi is quick and has impressive reasoning abilities, and Llama is free and open source, but its approach to safety, context size and reasoning give Claude 2 an edge.
StabilityAI for investment in open source
My final pick, although this list could have been significantly longer, is StabilityAI. It is not a specific model in itself but a company focused on bringing together various aspects of generative AI tools into one platform.
Its standout investment is in the various upgrades to Stable Diffusion: the Turbo model, added video capabilities and improved quality. However, the company also has 3D, audio and text generation models in its library.
StabilityAI's text model Zephyr is small enough to install on a laptop and offers rapid, well-reasoned responses to queries without sending any data to the cloud. Its image models are accessible and free to use for non-commercial purposes on local machines, or licensed for external products.
This approach is why, in the future, we may all be using models built, licensed or in some way invested in by StabilityAI without even realizing it. A video editor that lets you add clips from text may use Stable Video Diffusion, text summarizing in a homework app might use Zephyr and a future audio editor might incorporate a version of Stable Audio.