“AI Town” Deep Dive: The Good, The Bad, And The Ugly
By Jiao Dong
19 November 2023

This is the first post of the inaugural Founder’s Series! I’m Jiao Dong (董骥骜), CEO and co-founder of Nuva Lab. An engineer passionate about AI / AI Infra, content creation, gaming, and consumer tech.

Through this series, we aspire to provide insights that shed light on our product and share our story — where we started, where we stand now, and where we’re headed.

In this blog, you’ll discover:

Join our Discord to influence our roadmap and follow our Twitter to keep up with latest changes!

Who Am I

In my tech career, I started at Facebook, working on video products for content creators and launching their first ‘creator recommendation’ feature in a matchmaking platform. This led me to Facebook’s AI Infrastructure team, where I contributed to scaling large recommendation models, including deploying their first terabyte-sized model.

In 2021, I joined Anyscale to contribute to the open-source Ray project. Here, I focused on two main goals: maximizing industry impact on the machine learning (ML) ecosystem, and pushing the boundaries of large model scalability. My time at Anyscale involved spearheading collaborations for ML platforms and large language models (LLMs) with companies like Airbnb, Pinterest, Instacart, and Nvidia. A highlight of my work includes training a 175-billion-parameter LLM using 1024 A100 GPUs in partnership with Nvidia.

I’m also a heavy gamer. I was deeply hooked into Starcraft II through college and my early career. I was in the master’s league Zerg & Protoss with a peak MMR of 4500, and had won against professional players :)

Fun fact: I almost quit my job at FB after 1.5 years. The only thing that held me back was if I left, our Starcraft II team probably won’t make it to the playoffs. So I stayed for a few more months, we made it to the quarter finals, and at the end of the tournament I found the AI Infra org that I would love to join.

Starcraft II

The Inception Idea

The inception idea of Nuva Lab came from a seemingly silly but an experience that I have deep compassion for – wouldn’t it be super cool if my animal crossing villagers come alive with true human-like interactions ?

Animal Crossing

(My wife and I love that game, particularly through the pandemics. I can proudly say that I caught every single fish in the game with 100% completion!)

Meanwhile, from observations across GTC and GDC in March 2023, we realized that the intersection of entertainment and AI is extremely underserved – everyone believed that modern AI will revolutionize entertainment, but no one seems to have a clear idea of exactly how.

This realization was exciting enough to prompt the start of a new journey.

Initial Prototype V0 And V1

(prototype v0, March 2023, a simple AI-assistant guide, RPGMaker, GGML, Vicuna-7B)

Our first prototype was extremely simple; it was just a test to see if adding a Large Language Model (LLM) to an in-game assistant character would make the gaming experience more immersive. My wife even designed this character to resemble her, ensuring our first agent had a personal touch :)

Looking back, I think this example, though simple, was very engaging. It was the first time I had a guide that walked me through the game world, explained what I could do, and most importantly, responded to my inputs. What makes it more engaging was the “crafting” of the agent and seeing it alive responding to you later.

I also received intriguing validation from the RPGMaker forum. When I asked, “Hey guys, is there a component I can use to take free inputs from users?” the response was: “Why do you even need that?”. Apparently few people cared about tools built around free text input and response. You know you’re onto something interesting when you hear a response like that.

It was Vicuna-7B with GGML running on my laptop, but it opened up a whole new world.

(prototype v1, March - May 2023, an “AI Town”, full open source stack with in-house serving)

Excited by the success of prototype v0, I enlisted the help of a few friends to develop prototype v1. We envisioned it as a larger-scale “AI Town Simulation,” where villagers would come alive, each with their own personality and actions – my living version of “Animal Crossing.”

Note: As of March 2023, terms like “AI Agent” or “AI Town” weren’t popular, to the extent that we weren’t sure what to call our project. We tentatively named it “island Alohomora”…

The town design was straightforward: I created an island named “The Earth,” designed to challenge the common knowledge of existing LLMs. In this simulation, all villagers believed the Earth was round (an oblate spheroid, to be precise). However, there was one exception: “Adam.” He arrived on the island after a shipwreck, stumbled upon an observatory, and discovered that in our simulated world, the Earth was actually flat.

Adam’s job is to passionately engage in conversations with everyone, with a few key objectives: To pique their interest in exploring the shape of “The Earth.”

Again, we built this entirely using an open-source stack, focusing on two main questions:

Fortunately, the answer to both questions is a resounding “YES”. These sparks of discovery have become the cornerstone of Nuva Lab.

“AI Town” Deep Dive

We conducted numerous debugging sessions and playtests on this prototype with friends and family, and even occasionally with professional game level designers. We designed it for an imaginary user who would enjoy watching or interacting with the villagers and making a meaningful impact.

The Good

“You’re designing an experience in a way that I’ve never dreamed of… so I don’t know how my expertise applies here, except to encourage you to keep thinking from first principles.”

These words came from a professional game designer friend I spoke with. The experience is as open-ended as the free input text box of LLM. What should I type? How will they respond? What will happen next?

The room for imagination is vast, and the sparkles of magical moments genuinely make you feel, “Wow, I have never experienced this before.” That’s the most exciting part of this work.

The Bad

User Influence vs. Agent Autonomy

When a user enters your story, what role do they play? An observer? A citizen? Or perhaps … a god? There are products that cater to these roles, but it fundamentally boils down to how we perceive the user and the agents.

We believe that agents should be autonomous and purely AI-driven, living their own lives. Users, however, often like to feel “in control.” They want to clearly sense the influence they exert in the world or the experience they’re having.

An example of agent to agent communication shows why this is an underserved use case by chatbot LLMs: They never really know how to end the conversation, and very rarely challenge others. As a result, by default we would see almost never ending “bye and have a great day” among the agents and it often ends up being way too polite and AI-assistant like than necessary.

This leads to an interesting question: how should a designer balance the roles of “User” and “Agent” to create an engaging experience for users?

Would You Play Monopoly With Infinite Money?

Too many choices equate to no choice. This is the dilemma with free text input offering arbitrary, unexpected responses. People get overwhelmed. I’ve seen numerous user tests where users try a few simple inputs, then just hand the laptop back to me, saying, “Well… I’m not that creative; maybe you can show me how it works?” This indicates a UX design failure for consumers. There are ways to mitigate this, but it requires careful consideration and design.

Cost

During that experiment, we were still heavily focused on infrastructure. We thought that being able to support hundreds or even thousands of agents on a single GPU, by leveraging ideas like multiplexing LoRA, would be a unique edge for further scaling.

We did a number of extensive benchmarks with in-house serving, flattened the cost curve from token to GPU and were quite excited about how our AI Infra expertise can make hosting this town order of magnitudes cheaper and affordable.

Looking back, I think cost is indeed an important factor, as running large numbers of agents efficiently requires deep AI Infra expertise. However, it’s probably not the most pressing issue compared to others that need attention first. I firmly believe that the cost of LLM inference will decrease exponentially, and having “cost” as the main challenge is a sweet problem to have: that means people want your product, and the solution is well-defined on a good trajectory.

The Ugly

Controllability & Consistency in Experience Design

There’s a significant gap between “writing a cool blog post” and “building an engaging product that people love to use EVERY DAY.” The main issue with a random AI Town, from a user’s perspective, is that the likelihood of encountering a magical experience is incredibly low – I would say less than 1%.

We once jokingly said, “Well, we should go all in on this if the villagers start worshiping Adam as a new god due to his knowledge about The Earth!” However, this never happened, not even once in our thousands of simulations. This experience made us realize that LLMs, while powerful, are not as magical as ‘AGI’ – they’re potent but not powerful enough to fulfill everything we dream of.

We could cherry-pick the 1%, record a video, and hope it goes viral on Twitter. That might work for media purposes, but not for a product. Nobody wants to use a product where, 99% of the time, they feel extremely confused and question the creators’ competence.

Experience needs to be carefully designed. Thus, the approach of “throwing all AI Agents into a town and hoping for AI to work its magic” is a naive and irresponsible way to design a product, and we need to put more work into it.

This realization led to our next prototype version, which addresses these issues based on the learnings above.

What’s In The Next Blog Post

I hope you’ve enjoyed reading up to this point. In our next blog post, we will discuss our latest prototype, focusing on experience design, controllability, and consistency.

We are rolling out the next version of our product and are actively looking for beta users.

Again, please join our Discord to influence our roadmap and follow our Twitter to keep up with latest changes!