How to Host Label Studio on AWS Elastic Beanstalk with Persistent Storage and RDS PostgreSQL
Published on July 06, 2022
By Hyuntaek Park
Senior full-stack engineer at Twigfarm
At Twigfarm, we wanted to host Label Studio on AWS Elastic Beanstalk. I searched for a guide and found only one article: https://medium.com/@KerleIndia/how-to-host-label-studio-on-aws-elastic-beanstalk-41abcbcced4
It provides a simple method to host Label Studio on AWS Elastic Beanstalk, but it has a data persistence problem: the storage is only temporary, so login status and files are gone within a few hours.
An alternative solution is to use Heroku. With just a few clicks you can easily deploy the Label Studio application on Heroku. However, we want to control the database and files directly, so we could not give up on the AWS Elastic Beanstalk option.
In this tutorial, I go over how to deploy the Label Studio application on AWS Elastic Beanstalk with persistent storage and a separate database. Custom domains and HTTPS settings are beyond the scope of this tutorial. You can refer to this link for some hints: https://io.twigfarm.net/aws/https-ec2/
Architecture
Here is a simple diagram of how Label Studio is deployed on AWS Elastic Beanstalk.
One thing to note is that I am using the smallest possible instances for demo purposes. For a production deployment, you may need larger instances and a properly configured load balancer.
Let us start by setting up RDS PostgreSQL.
AWS RDS Postgres Setup
Go to RDS and create a PostgreSQL database. Leave the settings at their defaults except for the following:
- Choose PostgreSQL
- Set username
- Set the master password
- Confirm password
- Initial database name in Additional configuration (Important!): I used “labelstudio_db”
Then wait until the database has been created.
Docker Build
Go ahead and clone the Label Studio GitHub Repository.
git clone https://github.com/heartexlabs/label-studio.git label-studio-eb
ECR Setup
Now that the source code is cloned, it is time to set up a repository for the Docker image.
Go to “Amazon Elastic Container Service (ECS)”, then choose Repositories under Amazon ECR in the left panel of the AWS console.
Click “Create repository”.
- Repository name: labelstudio
Click “Create repository”.
Once the repository is created, click “View push commands” and run the commands it shows. They are run sequentially:
- Login
- Docker build
- Tag
- Push
After the successful push, you should be able to see the image that was just pushed.
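The exact push commands are shown in the console and are specific to your account and region. As a rough sketch of what that sequence usually looks like, assuming the AWS CLI v2 and Docker are installed: the account ID and region below are made-up placeholders, and `push_image` is only an illustrative helper name.

```shell
#!/bin/sh
# Placeholders (assumptions): replace with your own account ID and region.
AWS_ACCOUNT_ID="123456789012"
AWS_REGION="ap-northeast-2"
REPO_NAME="labelstudio"   # repository name created above

REGISTRY="${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com"
IMAGE_URI="${REGISTRY}/${REPO_NAME}:latest"

# Runs the four steps in order and stops at the first failure.
push_image() {
  # 1. Login: hand a temporary ECR token to Docker
  aws ecr get-login-password --region "$AWS_REGION" \
    | docker login --username AWS --password-stdin "$REGISTRY" || return 1
  # 2. Docker build (run from the root of the cloned repository)
  docker build -t "$REPO_NAME" . || return 1
  # 3. Tag the local image with the repository URI
  docker tag "${REPO_NAME}:latest" "$IMAGE_URI" || return 1
  # 4. Push
  docker push "$IMAGE_URI"
}

echo "Target image: ${IMAGE_URI}"
```

Once the placeholders are filled in, calling `push_image` from inside the label-studio-eb directory runs the four steps. Prefer the commands from “View push commands” if they differ from this sketch.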
IAM role to connect to ECR from the Elastic Beanstalk
Go to “IAM” in the AWS console. Choose “aws-elasticbeanstalk-ec2-role” under Access management > Roles. Add the “AWSAppRunnerServicePolicyForECRAccess” policy.
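If you prefer the command line, the same step can probably be done with the AWS CLI. This is a sketch under the assumption that your CLI credentials are allowed to modify IAM roles; `attach_ecr_policy` is an illustrative helper name. It looks the policy ARN up by name instead of hard-coding it.

```shell
#!/bin/sh
ROLE_NAME="aws-elasticbeanstalk-ec2-role"
POLICY_NAME="AWSAppRunnerServicePolicyForECRAccess"

# Looks up the AWS managed policy's ARN by name, then attaches it to the role.
attach_ecr_policy() {
  policy_arn=$(aws iam list-policies --scope AWS \
    --query "Policies[?PolicyName=='${POLICY_NAME}'].Arn" \
    --output text) || return 1
  if [ -z "$policy_arn" ]; then
    echo "Policy ${POLICY_NAME} not found" >&2
    return 1
  fi
  aws iam attach-role-policy --role-name "$ROLE_NAME" --policy-arn "$policy_arn"
}
```

Calling `attach_ecr_policy` performs the lookup and the attachment. Nothing happens until the function is called.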
EFS
Go to EFS (Elastic File System) in the AWS console. Create a file system by clicking the “Create file system” button. Leave all the settings at their defaults. We create one with the “Standard” storage class.
Make a note of the “File system ID”, which we will need in the next section.
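File system IDs have the form `fs-` followed by a hexadecimal string. Here is a small sketch of a sanity check you could run on the value you noted; `is_efs_id` is an illustrative helper, not part of any AWS tool, and the ID in the example is made up.

```shell
#!/bin/sh
# Succeeds when the argument looks like an EFS file system ID,
# i.e. "fs-" followed by 8 to 40 lowercase hexadecimal characters.
is_efs_id() {
  printf '%s\n' "$1" | grep -Eq '^fs-[0-9a-f]{8,40}$'
}

# Try a made-up ID and an unreplaced placeholder.
for candidate in "fs-0123456789abcdef0" "<YOUR_FILE_SYSTEM_ID>"; do
  if is_efs_id "$candidate"; then
    echo "looks valid: $candidate"
  else
    echo "not a file system ID: $candidate"
  fi
done
```

With the AWS CLI configured, `aws efs describe-file-systems --query 'FileSystems[].FileSystemId' --output text` should list the IDs in the current region.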
Elastic Beanstalk Deployment Files
We need to create four files:
- .ebextensions/01_mount.config # For EFS mount
- .platform/00_nginx.config # For nginx settings
- .platform/nginx/conf.d/proxy.conf # For unlimited file size upload
- Dockerrun.aws.json # For the Docker image, port, and volume settings
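As a starting point, the directory layout for the three configuration files above plus the Dockerrun.aws.json file described later can be scaffolded like this. It is a minimal sketch meant to be run in an empty working directory; the files are created empty and filled in over the following sections.

```shell
#!/bin/sh
# Create the directories that sit at the root of the deployment bundle.
mkdir -p .ebextensions .platform/nginx/conf.d

# Create empty files; their contents come from the sections that follow.
touch .ebextensions/01_mount.config \
      .platform/00_nginx.config \
      .platform/nginx/conf.d/proxy.conf \
      Dockerrun.aws.json

# List what was created.
find .ebextensions .platform Dockerrun.aws.json -type f | sort
```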
EFS mount
Persistent storage is very important, since we upload source files and create output files that should be retained over time and shared across users. I chose EFS because it is consistent and scales automatically.
First, create the following directory and file.
# .ebextensions/01_mount.config
option_settings:
  aws:elasticbeanstalk:application:environment:
    FILE_SYSTEM_ID: '<YOUR_FILE_SYSTEM_ID>'
    MOUNT_DIRECTORY: '/efs'
##############################################
#### Do not modify values below this line ####
##############################################
packages:
  yum:
    amazon-efs-utils: []
commands:
  01_mount:
    command: '/tmp/mount-efs.sh'
files:
  '/tmp/mount-efs.sh':
    mode: '000755'
    content: |
      #!/bin/bash
      EFS_MOUNT_DIR=$(/opt/elasticbeanstalk/bin/get-config environment -k MOUNT_DIRECTORY)
      EFS_FILE_SYSTEM_ID=$(/opt/elasticbeanstalk/bin/get-config environment -k FILE_SYSTEM_ID)
      echo "Mounting EFS filesystem ${EFS_FILE_SYSTEM_ID} to directory ${EFS_MOUNT_DIR} ..."
      echo 'Stopping NFS ID Mapper...'
      service rpcidmapd status &> /dev/null
      if [ $? -ne 0 ] ; then
        echo 'rpc.idmapd is already stopped!'
      else
        service rpcidmapd stop
        if [ $? -ne 0 ] ; then
          echo 'ERROR: Failed to stop NFS ID Mapper!'
          exit 1
        fi
      fi
      echo 'Checking if EFS mount directory exists...'
      if [ ! -d ${EFS_MOUNT_DIR} ]; then
        echo "Creating directory ${EFS_MOUNT_DIR} ..."
        mkdir -p ${EFS_MOUNT_DIR}
        if [ $? -ne 0 ]; then
          echo 'ERROR: Directory creation failed!'
          exit 1
        fi
      else
        echo "Directory ${EFS_MOUNT_DIR} already exists!"
      fi
      mountpoint -q ${EFS_MOUNT_DIR}
      if [ $? -ne 0 ]; then
        echo "mount -t efs -o tls ${EFS_FILE_SYSTEM_ID}:/ ${EFS_MOUNT_DIR}"
        mount -t efs -o tls ${EFS_FILE_SYSTEM_ID}:/ ${EFS_MOUNT_DIR}
        if [ $? -ne 0 ] ; then
          echo 'ERROR: Mount command failed!'
          exit 1
        fi
        chmod 777 ${EFS_MOUNT_DIR}
        runuser -l ec2-user -c "touch ${EFS_MOUNT_DIR}/it_works"
        if [[ $? -ne 0 ]]; then
          echo 'ERROR: Permission Error!'
          exit 1
        else
          runuser -l ec2-user -c "rm -f ${EFS_MOUNT_DIR}/it_works"
        fi
      else
        echo "Directory ${EFS_MOUNT_DIR} is already a valid mountpoint!"
      fi
      echo 'EFS mount complete.'
nginx configuration
We often deal with files larger than 10 MB in Label Studio. We need to override the default nginx settings manually so that we can upload large files. We need two files to achieve this.
# .platform/00_nginx.config
files:
  "/etc/nginx/conf.d/proxy.conf":
    mode: "000644"
    owner: root
    group: root
    content: |
      keepalive_timeout 120s;
      proxy_connect_timeout 120s;
      proxy_send_timeout 120s;
      proxy_read_timeout 120s;
      fastcgi_send_timeout 120s;
      fastcgi_read_timeout 120s;
container_commands:
  nginx_reload:
    command: "sudo service nginx reload"
# .platform/nginx/conf.d/proxy.conf
client_max_body_size 0;
You can find more details about Elastic Beanstalk platform extensions at this link: https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/platforms-linux-extend.html
Dockerrun.aws.json File
{
  "AWSEBDockerrunVersion": "1",
  "Image": {
    "Name": "<YOUR_ECR_REPOSITORY_URI>:latest",
    "Update": "true"
  },
  "Ports": [
    {
      "ContainerPort": "8080"
    }
  ],
  "Volumes": [
    {
      "HostDirectory": "/efs/data/",
      "ContainerDirectory": "/label-studio/data/"
    }
  ]
}
Upload
Now that our Elastic Beanstalk configuration files are ready, let us zip them into one file, deploy.zip:
zip -r deploy.zip Dockerrun.aws.json .platform .ebextensions
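A bundle with a missing file or a leftover `<YOUR_...>` placeholder tends to fail only after a slow deployment, so a quick local check before uploading can save time. The following is a minimal sketch; `check_bundle` is an illustrative helper name, and the file list is the one used in this tutorial.

```shell
#!/bin/sh
# Prints one line per problem found in the bundle directory "$1"
# and returns non-zero if there was at least one.
check_bundle() {
  dir="$1"
  rc=0
  for f in Dockerrun.aws.json \
           .ebextensions/01_mount.config \
           .platform/00_nginx.config \
           .platform/nginx/conf.d/proxy.conf; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $f"
      rc=1
    elif grep -q '<YOUR_[A-Z_]*>' "$dir/$f"; then
      echo "placeholder not replaced in: $f"
      rc=1
    fi
  done
  return "$rc"
}

# Check the current directory.
check_bundle . || echo "Fix the items above, then build deploy.zip again."
```

To confirm what actually went into the archive, `unzip -l deploy.zip` lists its contents.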
Almost done. Let us create an Elastic Beanstalk application to deploy the system.
Elastic Beanstalk Creation
Go to Amazon Elastic Beanstalk Environments in the AWS console. Start by clicking “Create a new environment”.
- Environment tier: Web server environment
- Application name: labelstudio-eb
- Platform: Docker
- Platform branch: Docker running on 64bit Amazon Linux 2
- Platform version: 3.4.17 (Recommended)
- Application code: Upload your code
- Source code origin: Local file
- Choose file: deploy.zip
It is not over yet. Click the “Configure more options” button to set up networking, the instance security group, and the environment variables for the database, as described in the sections below. For the environment variables, find the “Software” section, click the “Edit” button, and scroll down to “Environment properties”.
Networking Setup
Choose the same VPC that the EFS file system is in.
Instance Security Group Setup
Choose the default security group, since it allows all traffic between resources that belong to the same group. To be more specific, the instances need a security group that can reach the EFS mount targets on port 2049 (NFS).
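If the instances and the EFS mount targets end up in different security groups, an inbound rule for port 2049 is needed on the EFS side. Here is a sketch of how that rule could be added with the AWS CLI; both group IDs are made-up placeholders and `allow_nfs_from_instances` is an illustrative helper name.

```shell
#!/bin/sh
# Placeholders (assumptions): the security group attached to the EFS mount
# targets and the one attached to the Elastic Beanstalk instances.
EFS_SG_ID="sg-0123456789abcdef0"
INSTANCE_SG_ID="sg-0fedcba9876543210"
NFS_PORT=2049

# Adds an inbound rule to the EFS security group that admits NFS traffic
# coming from the instances' security group.
allow_nfs_from_instances() {
  aws ec2 authorize-security-group-ingress \
    --group-id "$EFS_SG_ID" \
    --protocol tcp \
    --port "$NFS_PORT" \
    --source-group "$INSTANCE_SG_ID"
}
```

Calling `allow_nfs_from_instances` adds the rule. When both sides use the same default security group, this step should not be necessary.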
Environment Variable Set
- POSTGRE_USER: <DB_USERNAME>
- POSTGRE_PASSWORD: <DB_PASSWORD>
- POSTGRE_HOST: <HOST_URL> # Find it under RDS > Connectivity & security > Endpoint
- POSTGRE_PORT: 5432 # Find it under RDS > Connectivity & security > Port
- POSTGRE_NAME: labelstudio_db
- DJANGO_DB: default
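A wrong host, port, or database name usually shows up only as a failing deployment, so it can help to test the values first. The sketch below assumes `psql` is installed on a machine that can reach the RDS instance (for example, one inside the same VPC); the user and host values are made-up placeholders, and `pg_uri` and `check_db` are illustrative helper names.

```shell
#!/bin/sh
# Placeholders (assumptions): use the same values as the environment properties.
POSTGRE_USER="labelstudio"
POSTGRE_HOST="labelstudio.abcdefghijkl.ap-northeast-2.rds.amazonaws.com"
POSTGRE_PORT=5432
POSTGRE_NAME="labelstudio_db"

# Builds a connection URI without the password; psql prompts for it
# or reads it from the PGPASSWORD environment variable.
pg_uri() {
  printf 'postgresql://%s@%s:%s/%s\n' \
    "$POSTGRE_USER" "$POSTGRE_HOST" "$POSTGRE_PORT" "$POSTGRE_NAME"
}

# Runs a trivial query; it succeeds only if the connection settings are right.
check_db() {
  psql "$(pg_uri)" -c 'SELECT 1;'
}

echo "Would connect to: $(pg_uri)"
```

Calling `check_db` attempts the connection. A successful `SELECT 1` suggests that the same values will work as environment properties.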
Click the “Save” button, then the “Create environment” button. This will take a few minutes.
Now the environment is ready. Click the “Go to environment” button to see if the application works well.