How to Host Label Studio on AWS Elastic Beanstalk with Persistent Storage and RDS PostgreSQL

Published on July 06, 2022

By Hyuntaek Park

Senior full-stack engineer at Twigfarm

At Twigfarm, we wanted to host Label Studio on AWS Elastic Beanstalk. I searched for instructions and found only one article: https://medium.com/@KerleIndia/how-to-host-label-studio-on-aws-elastic-beanstalk-41abcbcced4

It provides a simple way to host Label Studio on AWS Elastic Beanstalk, but it has a data persistence problem: the storage is only temporary, so login sessions and files are gone within a few hours.

An alternative is Heroku, where you can deploy the Label Studio application with just a few clicks. However, we wanted to control the database and files directly, so we could not give up on the AWS Elastic Beanstalk option.

In this tutorial, I go over how to deploy the Label Studio application on AWS Elastic Beanstalk while keeping persistent storage and separating the database. Custom domains and HTTPS settings are beyond the scope of this tutorial. You can refer to this link for some hints: https://io.twigfarm.net/aws/https-ec2/

Architecture

Here is a simple diagram of how Label Studio is deployed on AWS Elastic Beanstalk.

Note that I am using the smallest possible instances for demo purposes. For a production deployment, you will probably need larger instances and a properly configured load balancer.

Let us start by setting up RDS Postgres.

AWS RDS Postgres Setup

Go to RDS and create a PostgreSQL database. Leave the settings at their defaults except for the following:

Choose PostgreSQL
Set username
Set the master password
Confirm password
Initial database name in Additional configuration (Important!): I used “labelstudio_db”

Then wait until the database has been created.
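
If you prefer the command line to refreshing the console, the wait can be sketched with the AWS CLI. This is only a minimal sketch, assuming the AWS CLI is installed and configured. The identifier `labelstudio-db` is a hypothetical name, so substitute whatever you entered as the DB instance identifier.

```shell
# Hypothetical identifier: use the "DB instance identifier" you chose in the console.
DB_INSTANCE_ID="${DB_INSTANCE_ID:-labelstudio-db}"

# Block until RDS reports the instance as available.
wait_for_db() {
  aws rds wait db-instance-available --db-instance-identifier "$1"
}

# Print the endpoint address and port, which are needed later
# for the POSTGRE_HOST and POSTGRE_PORT environment variables.
db_endpoint() {
  aws rds describe-db-instances \
    --db-instance-identifier "$1" \
    --query 'DBInstances[0].Endpoint.[Address,Port]' \
    --output text
}

# Usage:
#   wait_for_db "$DB_INSTANCE_ID" && db_endpoint "$DB_INSTANCE_ID"
```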

Docker Build

Go ahead and clone the Label Studio GitHub Repository.

git clone https://github.com/heartexlabs/label-studio.git label-studio-eb

ECR Setup

Now it is time to set up the repository for the Docker image.

Go to “Amazon Elastic Container Service (ECS)”, then choose Repositories under Amazon ECR in the left panel of the AWS console.

Click “Create repository”.

Repository name: labelstudio

Click “Create repository”.

Once the repository is created, click “View push commands” and run the commands shown there. They are run in the following order:

Login
Docker build
Tag
Push

After the successful push, you should be able to see the image that was just pushed.
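
For reference, the four push steps usually look roughly like the sketch below. The console shows the exact commands for your account, so treat this as an approximation. The account ID and region in the usage line are placeholders, and `ecr_registry` and `push_image` are helper names I made up for illustration.

```shell
# Build the registry host name: <account>.dkr.ecr.<region>.amazonaws.com
ecr_registry() {
  printf '%s.dkr.ecr.%s.amazonaws.com\n' "$1" "$2"
}

push_image() {
  local account="$1" region="$2" repo="${3:-labelstudio}"
  local registry
  registry="$(ecr_registry "$account" "$region")"

  # 1. Login: exchange AWS credentials for a temporary Docker login
  aws ecr get-login-password --region "$region" |
    docker login --username AWS --password-stdin "$registry" || return 1
  # 2. Docker build: run from the root of the cloned label-studio-eb directory
  docker build -t "$repo" . || return 1
  # 3. Tag the local image with the repository URI
  docker tag "$repo:latest" "$registry/$repo:latest" || return 1
  # 4. Push
  docker push "$registry/$repo:latest"
}

# Usage (from inside label-studio-eb, with placeholder account ID and region):
#   push_image 123456789012 ap-northeast-2
```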

IAM role to connect to ECR from the Elastic Beanstalk

Go to “IAM” in the AWS console. Choose “aws-elasticbeanstalk-ec2-role” under Access management > Roles. Add the “AWSAppRunnerServicePolicyForECRAccess” policy.
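
The same step can probably be done from the AWS CLI. The sketch below looks up the policy ARN by name instead of hard-coding it, and `attach_ecr_policy` is a hypothetical helper name. It assumes your CLI user is allowed to list policies and modify the role.

```shell
attach_ecr_policy() {
  local role="${1:-aws-elasticbeanstalk-ec2-role}"
  local policy_name="${2:-AWSAppRunnerServicePolicyForECRAccess}"
  local arn

  # Find the ARN of the AWS managed policy with this name.
  arn="$(aws iam list-policies --scope AWS \
    --query "Policies[?PolicyName=='${policy_name}'].Arn" --output text)" || return 1
  [ -n "$arn" ] || { echo "policy ${policy_name} not found" >&2; return 1; }

  # Attach it to the instance role used by Elastic Beanstalk.
  aws iam attach-role-policy --role-name "$role" --policy-arn "$arn"
}

# Usage:
#   attach_ecr_policy
```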

EFS

Go to EFS (Elastic File System) in the AWS console. Create a file system by clicking the “Create file system” button. Leave all the settings at their defaults. We create one with the “Standard” storage class.

Make a note of the “File system ID”, which we will need in the next section.
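
If you lose track of the ID, something like the following should list it again. The format check is a loose sanity test based on my assumption that IDs look like `fs-` followed by hex digits. Both function names are hypothetical.

```shell
# List file systems with their IDs and Name tags.
list_file_systems() {
  aws efs describe-file-systems \
    --query 'FileSystems[].[FileSystemId,Name]' --output text
}

# Loose check (assumed ID shape): "fs-" followed by lowercase hex digits.
is_file_system_id() {
  printf '%s\n' "$1" | grep -Eq '^fs-[0-9a-f]+$'
}

# Usage:
#   list_file_systems
#   is_file_system_id fs-0123456789abcdef0 && echo "looks like an EFS ID"
```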

Elastic Beanstalk Deployment Files

We need to create four files:

.ebextensions/01_mount.config # For EFS mount
.platform/00_nginx.config # For nginx settings
.platform/nginx/conf.d/proxy.conf # For unlimited file size upload
Dockerrun.aws.json # For the Docker image, port, and volume mapping

EFS mount

Persistent storage is very important because we upload source files and create output files that should be retained over time and shared across users. I chose EFS because it is pretty consistent and scales automatically.

First, create the .ebextensions directory and the following file.

# .ebextensions/01_mount.config

option_settings:
  aws:elasticbeanstalk:application:environment:
    FILE_SYSTEM_ID: '<YOUR_FILE_SYSTEM_ID>'
    MOUNT_DIRECTORY: '/efs'

##############################################
#### Do not modify values below this line ####
##############################################

packages:
  yum:
    amazon-efs-utils: []

commands:
  01_mount:
    command: '/tmp/mount-efs.sh'

files:
  '/tmp/mount-efs.sh':
    mode: '000755'
    content: |
      #!/bin/bash

      EFS_MOUNT_DIR=$(/opt/elasticbeanstalk/bin/get-config environment -k MOUNT_DIRECTORY)
      EFS_FILE_SYSTEM_ID=$(/opt/elasticbeanstalk/bin/get-config environment -k FILE_SYSTEM_ID)

      echo "Mounting EFS filesystem ${EFS_FILE_SYSTEM_ID} to directory ${EFS_MOUNT_DIR} ..."

      echo 'Stopping NFS ID Mapper...'
      service rpcidmapd status &> /dev/null
      if [ $? -ne 0 ] ; then
          echo 'rpc.idmapd is already stopped!'
      else
          service rpcidmapd stop
          if [ $? -ne 0 ] ; then
              echo 'ERROR: Failed to stop NFS ID Mapper!'
              exit 1
          fi
      fi

      echo 'Checking if EFS mount directory exists...'
      if [ ! -d ${EFS_MOUNT_DIR} ]; then
          echo "Creating directory ${EFS_MOUNT_DIR} ..."
          mkdir -p ${EFS_MOUNT_DIR}
          if [ $? -ne 0 ]; then
              echo 'ERROR: Directory creation failed!'
              exit 1
          fi
      else
          echo "Directory ${EFS_MOUNT_DIR} already exists!"
      fi

      mountpoint -q ${EFS_MOUNT_DIR}
      if [ $? -ne 0 ]; then
          echo "mount -t efs -o tls ${EFS_FILE_SYSTEM_ID}:/ ${EFS_MOUNT_DIR}"
          mount -t efs -o tls ${EFS_FILE_SYSTEM_ID}:/ ${EFS_MOUNT_DIR}
          if [ $? -ne 0 ] ; then
              echo 'ERROR: Mount command failed!'
              exit 1
          fi
          chmod 777 ${EFS_MOUNT_DIR}
          runuser -l  ec2-user -c "touch ${EFS_MOUNT_DIR}/it_works"
          if [[ $? -ne 0 ]]; then
              echo 'ERROR: Permission Error!'
              exit 1
          else
              runuser -l  ec2-user -c "rm -f ${EFS_MOUNT_DIR}/it_works"
          fi
      else
          echo "Directory ${EFS_MOUNT_DIR} is already a valid mountpoint!"
      fi

      echo 'EFS mount complete.'
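
The only value you need to change in this file is the `<YOUR_FILE_SYSTEM_ID>` placeholder. You can edit it by hand, or fill it in with a small helper like the sketch below. `set_file_system_id` is a hypothetical name, and the ID in the usage line is a made-up example.

```shell
# Replace the <YOUR_FILE_SYSTEM_ID> placeholder in the given config file.
# The result is written to a temporary file first, so a failed sed
# leaves the original file untouched.
set_file_system_id() {
  local config="$1" fs_id="$2" tmp status
  tmp="$(mktemp)" || return 1
  sed "s/<YOUR_FILE_SYSTEM_ID>/${fs_id}/" "$config" > "$tmp" && cat "$tmp" > "$config"
  status=$?
  rm -f "$tmp"
  return "$status"
}

# Usage:
#   set_file_system_id .ebextensions/01_mount.config fs-0123456789abcdef0
#   grep FILE_SYSTEM_ID .ebextensions/01_mount.config
```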

nginx configuration

We often deal with large files of more than 10 MB in Label Studio. We need to manually override the default nginx settings so that large files can be uploaded. Two files achieve this.

# .platform/00_nginx.config
files:
  "/etc/nginx/conf.d/proxy.conf":
    mode: "000644"
    owner: root
    group: root
    content: |
      keepalive_timeout 120s;
      proxy_connect_timeout 120s;
      proxy_send_timeout 120s;
      proxy_read_timeout 120s;
      fastcgi_send_timeout 120s;
      fastcgi_read_timeout 120s;
container_commands:
  nginx_reload:
    command: "sudo service nginx reload"

# .platform/nginx/conf.d/proxy.conf
client_max_body_size 0;
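
The nested directory is easy to get wrong, so here is a small sketch that creates the second file in the right place. `write_proxy_conf` is a hypothetical helper, and it takes the root of your deployment directory as its argument.

```shell
# Create .platform/nginx/conf.d/proxy.conf under the given bundle root.
write_proxy_conf() {
  local root="${1:-.}"
  mkdir -p "$root/.platform/nginx/conf.d" || return 1
  # A value of 0 disables nginx's check of the request body size.
  printf 'client_max_body_size 0;\n' > "$root/.platform/nginx/conf.d/proxy.conf"
}

# Usage (from the directory that will be zipped):
#   write_proxy_conf .
```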

You can consult more details about Elastic Beanstalk platform extensions in this link: https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/platforms-linux-extend.html

Dockerrun.aws.json File

{
  "AWSEBDockerrunVersion": "1",
  "Image": {
    "Name": "<YOUR_ECR_REPOSITORY_URI>:latest",
    "Update": "true"
  },
  "Ports": [
    {
      "ContainerPort": "8080"
    }
  ],
  "Volumes": [
    {
      "HostDirectory": "/efs/data/",
      "ContainerDirectory": "/label-studio/data/"
    }
  ]
}
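
To fill in `<YOUR_ECR_REPOSITORY_URI>`, you can copy the URI from the ECR console, or try something like the sketch below. It assumes the repository is named `labelstudio` as above. `repository_uri` and `set_image_uri` are hypothetical helper names.

```shell
# Look up the URI of the ECR repository created earlier.
repository_uri() {
  aws ecr describe-repositories --repository-names "${1:-labelstudio}" \
    --query 'repositories[0].repositoryUri' --output text
}

# Replace the <YOUR_ECR_REPOSITORY_URI> placeholder in Dockerrun.aws.json.
set_image_uri() {
  local file="$1" uri="$2" tmp status
  tmp="$(mktemp)" || return 1
  # '|' is the sed delimiter because the URI itself contains '/'.
  sed "s|<YOUR_ECR_REPOSITORY_URI>|${uri}|" "$file" > "$tmp" && cat "$tmp" > "$file"
  status=$?
  rm -f "$tmp"
  return "$status"
}

# Usage:
#   set_image_uri Dockerrun.aws.json "$(repository_uri labelstudio)"
```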

Upload

Now that our Elastic Beanstalk configuration files are ready, let us zip them into one file: deploy.zip

zip -r deploy.zip Dockerrun.aws.json .platform .ebextensions
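
Since a missing file only shows up later as a failed deployment, it may be worth checking the bundle before zipping. The sketch below simply tests that the four files from this tutorial exist. `check_bundle` is a hypothetical helper name.

```shell
# Return non-zero (and list what is missing) if any expected file is absent.
check_bundle() {
  local root="${1:-.}" missing=0 f
  for f in Dockerrun.aws.json \
           .ebextensions/01_mount.config \
           .platform/00_nginx.config \
           .platform/nginx/conf.d/proxy.conf; do
    if [ ! -f "$root/$f" ]; then
      echo "missing: $f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   check_bundle . && zip -r deploy.zip Dockerrun.aws.json .platform .ebextensions
```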

Almost done. Let us create an Elastic Beanstalk application to deploy the system.

Elastic Beanstalk Creation

Go to Amazon Elastic Beanstalk Environments in the AWS console. Start by clicking “Create a new environment”.

Environment tier: Web server environment
Application name: labelstudio-eb
Platform: Docker
Platform branch: Docker running on 64bit Amazon Linux 2
Platform version: 3.4.17 (Recommended)
Application code: Upload your code
Source code origin: Local file
Choose file: deploy.zip

It is not over yet. Click the “Configure more options” button to set environment variables for the database. Find the “Software” section and click the “Edit” button. Then scroll down and enter the environment variables under “Environment properties”.

Networking Setup

Choose the same VPC as the one that the EFS file system is in.

Instance Security Group Setup

Choose the default security group, since it allows all incoming traffic from other resources in the same group. More specifically, you need a security group that allows traffic on port 2049 (NFS) so that the instances can connect to EFS.
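
If you use separate security groups instead of the default one, the NFS rule could be added with a command along these lines. This is a sketch under my own assumptions: both group IDs in the usage line are placeholders, and `allow_nfs` is a hypothetical helper name.

```shell
# Allow NFS (TCP 2049) into the security group of the EFS mount targets
# from the security group attached to the Elastic Beanstalk instances.
allow_nfs() {
  local efs_sg="$1" instance_sg="$2"
  aws ec2 authorize-security-group-ingress \
    --group-id "$efs_sg" --protocol tcp --port 2049 \
    --source-group "$instance_sg"
}

# Usage (placeholder group IDs):
#   allow_nfs sg-0aaaaaaaaaaaaaaaa sg-0bbbbbbbbbbbbbbbb
```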

Environment Variable Set

POSTGRE_USER: <DB_USERNAME>
POSTGRE_PASSWORD: <DB_PASSWORD>
POSTGRE_HOST: <HOST_URL> # Find it under RDS > Connectivity & security > Endpoint
POSTGRE_PORT: 5432 # Find it under RDS > Connectivity & security > Port
POSTGRE_NAME: labelstudio_db
DJANGO_DB: default
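
If you need to change these values after the environment exists, the AWS CLI can probably do it without the console. The following is a rough sketch: the environment name in the usage line is a placeholder, `env_option` and `set_db_env` are hypothetical helpers, and the shorthand syntax used here will not work for values that contain commas.

```shell
# Build one --option-settings entry in the AWS CLI shorthand syntax.
env_option() {
  printf 'Namespace=aws:elasticbeanstalk:application:environment,OptionName=%s,Value=%s\n' "$1" "$2"
}

# Set the Label Studio database variables on an existing environment.
set_db_env() {
  local env_name="$1" user="$2" password="$3" host="$4"
  aws elasticbeanstalk update-environment --environment-name "$env_name" \
    --option-settings \
      "$(env_option DJANGO_DB default)" \
      "$(env_option POSTGRE_NAME labelstudio_db)" \
      "$(env_option POSTGRE_USER "$user")" \
      "$(env_option POSTGRE_PASSWORD "$password")" \
      "$(env_option POSTGRE_HOST "$host")" \
      "$(env_option POSTGRE_PORT 5432)"
}

# Usage (placeholder values):
#   set_db_env <ENVIRONMENT_NAME> <DB_USERNAME> <DB_PASSWORD> <HOST_URL>
```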

Click the “Save” button, then the “Create environment” button. This will take a few minutes.

Once the environment is ready, click the “Go to environment” button to see if the application works well.