Emad Mostaque, unleashing AI
The founder of Stability AI’s vision for open source AI, and the many challenges that come with it.
Written by Meghna Rao
Photography by Tex Bishop
In one scenario, Emad Mostaque’s startup, Stability AI, which builds and funds open source text-to-image artificial intelligence, opens a Pandora’s box. Suddenly, everyone can train their own model based on Stable Diffusion (at least, everyone who has access to a computer with enough GPUs and the money to procure hundreds of thousands of images). The technology creates horrors that our current brains can’t even envision: existing bigotry is amplified, stretched, distorted; deep fakes run rampant and become more believable than what we ourselves post and verify; funhouses are created around people that confirm their biases endlessly and divide us all; the labor dynamics of the world are rapidly shifted and people are left without jobs and with no plan B.
In another scenario, Mostaque is like Prometheus unleashing fire; a house might burn down and a shirt might catch on fire, but this new, flickering technology can cook, warm up homes and make winter walks tolerable, light up dark rooms and keep things moving at night, serve as inspiration for stories of all sorts. This second is the scenario that Mostaque is betting on and the one that he describes for me. What he thinks is most important about this scenario is not just the invention of a text-to-image deep learning model; it is ensuring that no one will own the fire.
“The only people who can build this are the big tech guys and us,” Mostaque says, and when he says “us,” he’s referring to people like himself: A well-capitalized entrepreneur who was running his own hedge fund by 23. “And do you really want this all-powerful technology in the hands of the big tech guys?”
Mostaque shows up to our Zoom chat in a T-shirt, his hair unkempt. It’s not exactly the image of someone who’s comparing his startup to Prometheus’ fire, but he’s just finished celebrating Thanksgiving. “My wife is American,” he says. “American wife, American life.”
I’m surprised that Mostaque doesn’t look a little more stressed; like anyone remotely online, I’ve been seeing the news. The day before we’re set to speak, Stable Diffusion 2.0 has just come out. Mostaque has explained publicly that it’s merely an upgrade to the existing model, with improvements like higher-resolution images than were available before. They’ve announced that they’ve done nothing to create preference around certain results; all they’ve done is move from CLIP, a model released by OpenAI that was not open about what influenced its dataset, to LAION, which is open, and was funded by Stability AI.
"A woman and the moon in the style of Aubrey Beardsley," made by Shreeda Segan on Hugging Face, using Stable Diffusion
Still, observers and a loyal community that spans from r/stablediffusion to Discord and elsewhere start to notice that they can no longer generate NSFW (not safe for work) images from prompts; others find that this change has influenced how accurately humans turn out. They also find that they can’t use artists’ styles anymore to influence their drawings, a big change from what they had been able to do before. They wonder whether Stability AI is not quite what it stood for, whether the startup is actually more like the corporate, gatekept entities that it criticizes and less like the mirror held to the existing structures of the world that Mostaque has spoken about before.
Mostaque responds that, yes, they’ve removed NSFW, primarily to reduce the possibility of people using it to portray children. And the artist thing has to do with the changed datasets, he adds. And any individual can train Stable Diffusion 2.0 on datasets that work for them, whether that’s the work of Thomas Cole or a range of nude artists (see, for example, Unstable Diffusion, which is raising $25M to train its model with 75M high-quality NSFW images — which will still exclude children).
Still, people aren’t happy. The year is 2022 and releasing a new technology is nothing like releasing Google Search in 1998; we’ve all had years of experience with tech; we all have hopes, dreams, visions for what it might look like, what it can do for us, how it enters our worlds.
“From heroes to villains in just over a month,” writes one HackerNews commenter about Stability AI. Another suggests the team has choked the model in its infancy before it can grow up to become what it should be.
“I get shit from all sides,” Mostaque tells me, but he’s smiling, as if he’s happy people care enough to give feedback and prod the model in the right direction. “But people online are 1% of it. I want to build for the next 99%, and I know this is the right move.”
A week later, a surprise announcement: Stable Diffusion 2.1 is out. “The filter dramatically cut down on the number of people in the dataset and that meant folks had to work harder to get similar results generating people,” the blog states. With 2.1, they’ve kept the NSFW filters on but they’re less aggressive, the blog explains, leading to better fine-tuning of people.
The comments roll in, some positive, a few negative, a handful apathetic. It is a reminder that for a company that is replicating humans in the digital sphere, there are a lot of people involved in the development of Stable Diffusion.
Of course, there is Stability AI’s community, those on Hugging Face and r/stablediffusion and Discord and elsewhere, not a buzzword-y, Silicon Valley community, but an actual group of people who are invested in and interested in what Stability AI is building, who push back on what it builds and are angry and critical, but also hopeful and watchful.
Then, there are the actual people involved in the project. From its own corpus, Stability AI funds and works with teams to create the projects it wants to see out in the world, including Harmonai, which is working on open-source generative audio tools that Mostaque says will be released soon, and OpenBioML, which is working on the open space between machine learning and biology. There are popular consumer-facing concepts like DreamStudio, which lets people generate images for a small cost in the vein of DALL-E and others, and more in-progress ones, like CarperAI, a research team that is building a large language model (LLM) that will follow human instructions, similar to OpenAI’s InstructGPT, a sibling model to its wildly popular ChatGPT.
The difference between CarperAI and what OpenAI has built, however, is that Stability AI is partnering with a range of organizations: Eleuther AI and Multi are applying LLMs to automation to train the model; Hugging Face, ScaleAI, and Humanloop will fine-tune it. And eventually, the model itself will be open source.
In the future, says Mostaque, those communities might even include a whole subsidiary just for, say, India or Kenya, individual country-based AIs that are trained on datasets like academic papers and national broadcasts.
And then, there are the countless companies being built on top of Stable Diffusion. Here, it’s important to investigate Stability AI’s open source moniker; when one drills down to the legalese, it becomes clear that the company is not exactly what one assumes from the outside. In fact, Stable Diffusion is released under the Creative ML OpenRAIL-M license, which is “permissive.” That makes it different from historic precedents in open source like, say, Linux, which requires that anyone building on it remain open source (except for certain proprietary software); the permissive license merely asks that credit be given where it’s due.
This makes Stable Diffusion fertile ground to build on top of. The recently launched and now-viral Lensa, for example, is built on Stable Diffusion and is reportedly making $1M+ in revenue per day, shortly after launching.
And yet, people are people; organizations clunk heads and disagree in ways that can often be much messier and slower than interacting with deep learning models.
One small example is October’s fiasco with RunwayML, which helped develop, in part, the core model behind Stable Diffusion 1.0. Rumors circulated around the release; for a little bit, many believed that Stability AI had asked Runway to take down their version for IP reasons; later screenshots dispelled these rumors; others put the blame on Runway, calling them a bad actor.
What is of note about Stability AI is that it is extremely expensive to run. Currently, Mostaque’s dream is fueled by capital from his own pockets, a $100M round led by Coatue, and what he describes as an almost-nonsensical lack of fear. Mostaque spent $600,000 of his own money to build the computers for launch; he tells me that the supercomputer itself costs $5M a month and implies that there are many other places he’s funneling his own money. He also has big plans for the future, including building the fastest supercomputer in the world.
Mostaque tells me that he plans to make money by creating his own mansion on Stable Diffusion’s fertile ground; they’re starting by focusing on Bollywood, the world’s second-largest film industry. Stability AI recently signed a deal with Eros International, one of India’s largest production companies.
“Philosophically and ethically, I’m a little different from the Silicon Valley mindset,” Mostaque says. “People view technology as a tool they can use to achieve objectives. I view it as something that can really improve our systems, and it’s better to give it away in those areas and make the money in very specific areas.”
This, even today, is Mostaque’s core focus — figuring out how to make the world’s systems work so that Stable Diffusion can thrive. He describes himself as a mechanism designer. His team and his community build the tools; he just figures out the rules that need to be in place, how to make people interact, how to get the money for the supercomputers and the data, and how to catalyze a community to get the outcomes he wants.
Inevitably, the question of ethics comes up; there’s no avoiding it. Artificial intelligence, and more importantly, artificial general intelligence (AGI), in which a system can learn on its own, have been steady characters in dystopian, utopian, you-name-it versions of the future, ever-present in our imaginaries. A long list of people have questioned Mostaque’s position, his experience.
In this race to the market, are we forgetting to dot our i’s and cross our t’s along the way? Will we regret this? Who can we even consult?
At that question, Mostaque points to himself, a child born in Jordan, where he spent about 30 days, after which he returned to Dhaka, Bangladesh. At age seven, he moved with his family to the U.K.; his father took a job as a business lecturer. In the U.K., he “slowly learned a British accent.”
When Mostaque was 19 and on a student trip to the U.S., he met his wife, then 18. Mostaque has been diagnosed with Asperger's and ADHD. His eyes light up when he tells me about organizing the British Independent Film Awards and acting as an independent film reviewer.
He lives in London now, ran his own hedge fund by 23, and has returned to one major theme over the course of his career: He believes that the world is broken because people lack agency, and that all of that can be fixed by handing them the right tools.
“I think agency makes people happier,” he says. “I look at children today. The stuff they look at all day feeds on them. Creation beats consumption and I think we need more of that.”
He’s approached the problem of agency from different angles. In his 20s, he began to feel that certain small factions of the Islamic world were starting to experience a crisis that was pushing people to the extremes. Much of this crisis, he believed, was fueled by misinformation and a lack of better options. He created several online forums for Muslim communities. Later, he began to develop what he describes as “Islamic AI,” a guide that would steer those looking for the next steps on their religious journey toward healthy, agency-filled paths.
“That is my ethics,” he says. “That is considered ethics just as much as anything else. Most of the world is invisible right now. Even our colleagues are invisible because they can’t communicate, they can’t create. A tool like Stable Diffusion will change that.”
To criticisms about artist ownership, Mostaque points to those like Interdependence’s Holly Herndon, who has built Spawning, which lets artists remove themselves from training sets.
Where his beliefs seem most sure is that he is looking for an agency-driven solution to an agency-driven problem.
As of November 2022, the team is 120 people. Mostaque’s worst fear is slowing down. “Most companies commit suicide as they scale,” he says. “They don’t get outcompeted. I want us to be Google back in 2012 forever, and I want to make sure that we don’t fall apart or get regulated out of existence.”
"A frog dancing on the beach in Udupi," made by Meghna Rao in Playground.AI
After I finish speaking with Mostaque, I meet my father. I try to describe to him the person I just spoke to, but he can’t seem to grasp what I’m saying. So I pull up Playground.AI, which runs on Stable Diffusion. “Imagine anything,” I tell him. At first, he comes up with simplistic ideas; a bowl of fruits, a row of sneakers. Then, he thinks bigger. “A frog dancing on a beach in Udupi,” he says.
The image pulled up is funny, nonsense almost, but the beach in the background has the same low curve as the beach he grew up near, the same way of rolling into the ocean as if its lines are being blurred.
And in that brief moment, we really felt as if we were witness to a groundbreaking new technology, one that was ready to light a fiery path through the earth.