
Serverless Generative AI

Offloading compute-intensive workloads to the cloud has never been easier. Serverless 3D model generation using OpenAI Point-E and Modal.

Lately, generative AI has been taking over the world, but the cost of training and running these models is still out of reach for most people. This is where serverless comes in. Serverless is a cloud computing execution model in which the cloud provider runs the servers and dynamically manages the allocation of machine resources: you only pay for the time your code is running, and you can spin up as many instances as you need.

This is perfect for ML workloads: you can scale out to as many instances as you need and pay only while they are running, which is a huge cost saving compared to running these models on your own hardware.

Modal, currently in beta, is a serverless framework that can be used to run generative AI models on the cloud. It is built for more than just AI, but it is a great fit for it.

Let's take a look at how to use Modal to run a generative AI model on the cloud.

Getting Started

As I've mentioned before, lately I've been getting into creating "art" with generative AI. You might think that this is going to be another GPT-3 post, but no! This time, we are going to take a look at Point-E, an OpenAI model that generates 3D models from text descriptions.

Running this model on your own hardware is not cheap, especially if, like me, you do not have a GPU. Luckily, we can use Modal to run it in the cloud and only pay for the time it is running.

Let's take a look at how Point-E works first of all.

Point-E

Point-E is a model that generates 3D models from text descriptions. The results are not amazing quality, but it is a great example of the power of generative AI, and a great fit for serverless execution!

The main part of the code to generate a PLY formatted mesh file using Point-E looks like this:

print("Running PointE in Modal")
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

print("creating base model...")
base_name = "base40M-textvec"
base_model = model_from_config(MODEL_CONFIGS[base_name], device)
base_model.eval()
base_diffusion = diffusion_from_config(DIFFUSION_CONFIGS[base_name])

print("creating upsample model...")
upsampler_model = model_from_config(MODEL_CONFIGS["upsample"], device)
upsampler_model.eval()
upsampler_diffusion = diffusion_from_config(DIFFUSION_CONFIGS["upsample"])

print("downloading base checkpoint...")
base_model.load_state_dict(load_checkpoint(base_name, device))

print("downloading upsampler checkpoint...")
upsampler_model.load_state_dict(load_checkpoint("upsample", device))

sampler = PointCloudSampler(
    device=device,
    models=[base_model, upsampler_model],
    diffusions=[base_diffusion, upsampler_diffusion],
    num_points=[1024, 4096 - 1024],
    aux_channels=["R", "G", "B"],
    guidance_scale=[3.0, 0.0],
    model_kwargs_key_filter=(
        "texts",
        "",
    ),  # Do not condition the upsampler at all
)

# Produce a sample from the model.
samples = None
for x in tqdm(
        sampler.sample_batch_progressive(
            batch_size=1, model_kwargs=dict(texts=[prompt])
        )
):
    samples = x

pc = sampler.output_to_point_clouds(samples)[0]

print("creating SDF model...")
name = "sdf"
model = model_from_config(MODEL_CONFIGS[name], device)
model.eval()

print("loading SDF model...")
model.load_state_dict(load_checkpoint(name, device))

# Produce a mesh (with vertex colors)
print("generating mesh...")
return marching_cubes_mesh(
    pc=pc,
    model=model,
    batch_size=4096,
    grid_size=32,  # increase to 128 for resolution used in evals
    progress=True,
)

So what happens here? In broad strokes, we create a base model, an upsampler model, and an SDF model. The base model turns the text prompt into a coarse point cloud of 1,024 points, the upsampler densifies it to 4,096 points, and the SDF model turns the final point cloud into a mesh via marching cubes.
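
If you want to sanity-check the intermediate result before meshing, the point cloud object exposes its raw data directly. This is only a minimal sketch, assuming pc is the point cloud returned by output_to_point_clouds above and that it carries the coords and channels attributes of point-e's PointCloud class:

# The point cloud is just xyz coordinates plus per-point color channels.
print(pc.coords.shape)     # (4096, 3): positions of the upsampled cloud
print(pc.channels.keys())  # dict_keys(['R', 'G', 'B'])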

We won't dive any deeper than this, because the internals are not the point of this post. The point is that this is a lot of compute-hungry code, and it is not easy to run on your own hardware. This is where Modal comes in.

Running Point-E on Modal

Modal provides a Python client that can be used to run code in the cloud. It is very easy to use: you can offload a function with just a few lines of code. Let's take a look at how we can use it to run Point-E.
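
Before wiring up Point-E, here is a minimal sketch of the basic pattern, using the beta-era client API (modal.Stub, stub.run(), Function.call()); the square function is purely illustrative:

import modal

stub = modal.Stub("hello-serverless")

@stub.function
def square(x: int) -> int:
    # This body runs in a container in the cloud, not on your machine.
    return x * x

if __name__ == "__main__":
    with stub.run():
        print(square.call(7))  # computed remotely, prints 49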

First of all, we need to create a Modal account. They have a generous free tier, which is more than enough for this tutorial: you get $30 of free credits, and for comparison, generating one mesh with Point-E costs about $0.11. And that's using a dedicated GPU! At that rate, the free credits cover a couple of hundred meshes.

Let's get the boilerplate code out of the way first.

stub = modal.Stub("pointe")

image = (
    modal.Image.debian_slim()
    .apt_install(["git"])
    .pip_install(["git+https://github.com/openai/point-e"])
)

volume = modal.SharedVolume().persist("model-cache-vol")
cache_path = "/root/point_e_model_cache"

stub.image = image

This code creates a Modal stub and defines the container image to use: a Debian slim base with git and the point-e library installed. We also create a persisted shared volume to cache the model checkpoints, mounted at the directory point-e downloads its checkpoints into. This is important because the model files are quite large, and we don't want to download them every time we run the model.

Now we can slap a decorator on our function, and offload it to the cloud!

@stub.function(
    gpu="A10G",
    shared_volumes={cache_path: volume},
)
def run_pointe(prompt: str):
    # generate pretty 3D models from text descriptions!
    # (the Point-E sampling code from the previous section goes here)
    ...

That's ... it?

Yes, that's it! Whenever you execute this function in the context of the defined stub, Modal builds the container image for it (or reuses a cached one), spins up a machine with an A10G GPU, and runs the function in the cloud.
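
For completeness, here is a rough sketch of what the call site might look like. It assumes the beta-era entrypoint style (stub.run() plus Function.call()) and that run_pointe returns the mesh object from the earlier snippet:

if __name__ == "__main__":
    with stub.run():
        # Runs on an A10G in the cloud; only the result comes back.
        mesh = run_pointe.call("A red sportscar.")

    # point-e meshes can be serialized straight to PLY.
    with open("mesh.ply", "wb") as f:
        mesh.write_ply(f)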

The result of this function is a mesh, which we can write to a PLY file as sketched above, but we need a little extra work to make "art". I was inspired by the banner of the original paper and decided to create spinning 3D models in GIF format.

3D models generated by Point-E

To do this, we can use Blender and GraphicsMagick. Both are old-school tools, but they are still very powerful.

Here's what the core of the code looks like:

import os
import subprocess
from math import radians

import bpy
from mathutils import Euler

# Clear the default scene: select and delete any existing meshes.
bpy.ops.object.select_all(action="DESELECT")
for obj in bpy.data.objects:
    obj.select_set(obj.type == "MESH")

# call the operator once
bpy.ops.object.delete()

# importing the ply file with color
bpy.ops.import_mesh.ply(filepath=input_ply_path)

object_list = bpy.data.objects
meshes = []
for obj in object_list:
    if obj.type == "MESH":
        meshes.append(obj)

for _object in meshes:
    if _object.type == "MESH":
        bpy.context.view_layer.objects.active = _object
        _object.select_set(True)
        mat = bpy.data.materials.new("material_1")
        _object.active_material = mat
        mat.use_nodes = True
        nodes = mat.node_tree.nodes
        mat_links = mat.node_tree.links
        bsdf = nodes.get("Principled BSDF")
        assert bsdf  # make sure it exists to continue
        vcol = nodes.new(type="ShaderNodeVertexColor")
        # vcol.layer_name = "VColor" # the vertex color layer name
        vcol.layer_name = "Col"
        mat_links.new(vcol.outputs["Color"], bsdf.inputs["Base Color"])
    elif _object.type == "CAMERA":
        _object.data.clip_end = 1000000
        _object.data.clip_start = 0.01
        _object.select_set(False)
    else:
        _object.select_set(False)

bpy.ops.wm.save_as_mainfile(filepath=f"{os.getcwd()}/scene.blend")

scene = bpy.context.scene
scene.render.image_settings.file_format = "PNG"  # set output format to .png

frames = range(1, 20)
x_rotation = radians(0)
y_rotation = radians(0)
z_rotation = radians(0)

for frame_nr in frames:
    scene.frame_set(frame_nr)

    context = bpy.context
    scene = context.scene
    _object = context.view_layer.objects.active  # the imported Point-E mesh
    _object.name = "tree"
    _object.rotation_euler = Euler((x_rotation, y_rotation, z_rotation), "XYZ")
    scene.render.filepath = f"frame_{frame_nr:04d}.png"
    bpy.ops.render.render(write_still=True)
    z_rotation = z_rotation + radians(20)

subprocess.run(
    'gm convert -delay 20 -loop 0 "*.png" "output.gif"',
    shell=True,
    check=True,
)

This code is a bit messy, but it does the job: it imports the mesh, hooks the vertex colors up to a material, rotates the object around the z axis in 20-degree steps, renders each frame to a PNG, and finally uses GraphicsMagick to stitch the frames into a GIF. If I knew what I was doing I could probably do this more elegantly, but I obviously don't. Anyway, it works!
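
One way to drive a script like this is Blender's headless mode. The snippet below is only a sketch: render_spin.py is a hypothetical file name for the script above, and it would need to read the PLY path from sys.argv (Blender passes everything after "--" through to the script):

import subprocess

# Render the frames without opening the Blender UI.
subprocess.run(
    ["blender", "--background", "--python", "render_spin.py", "--", "mesh.ply"],
    check=True,
)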

Let's generate a mesh fooooooor... a car!

The full prompt I'll use is

"A red sportscar."

On the Modal website, we get a neat, live summary of how much our function runs have cost. We can see that each function call costs around 5 to 10 cents. Pretty cheap, considering we are using an A10G GPU for each mesh generation!

After a few seconds our mesh.ply file is ready to be piped into our Blender script!

It's time for the pièce de résistance, let's see our generated animation!

vrooooom

Well ... it kinda, sorta looks like a car, right? Like it or not, this is what peak machine-learning performance looks like!

Anyway ~ prompt engineering is a topic for a later article! Also, the models currently available are fairly simple; hopefully the folks at OpenAI will release the ones used in the paper soon!

Check the full code here.

Conclusion

This is a very simple example of how Modal can be used to run complex models in the cloud, and a great way to get started with both serverless computing and generative AI. I hope you enjoyed this post, and that you'll give Modal and Point-E a try; both are fun technologies!