Zipping Large S3 Folders and Files Using a Node.js Lambda and EFS

Published on February 10, 2022

By Hyuntaek Park

Senior full-stack engineer at Twigfarm

AWS S3 is a very convenient cloud storage service. You can upload and download files easily in various ways with the AWS CLI, SDKs, the API, and so on. But can you download an entire folder with its sub-folders and files recursively? Unfortunately, S3 does not provide such a feature, so we need to build our own way to recursively zip a folder and make the zip file available for download.

Requirements

Our goal is to zip entire folders, their sub-folders, and the files under them in our S3 bucket while preserving the folder structure. The files can be large (more than 512 MB, which is the size of Lambda's temporary storage).

How files are treated in S3

We have created folders and uploaded files in our S3 bucket as follows.

[Screenshot: folder and file structure in the S3 bucket]

However, to be precise, they are not folders in S3. There are just four files with the following keys:

  • folder1/sub1/image.png
  • folder1/sub2/test.txt
  • folder2/large.mov
  • folder2/test2.pdf

Solution

Although S3 does not have a concept of folders, the key of each file carries the folder information as a prefix. Each folder level is delimited by ‘/’ and followed by the file name (e.g., folder1/sub1/image.png).

Using the folder prefix in the key, we can create the corresponding folders in EFS and then download the file from S3 into them.
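
As a minimal sketch of that key-to-path mapping (the helper name efsPathForKey is my own, and the /mnt/efs/temp root is the path used later in this article), the folder part of the key can be split off with Node's path module:

const fs = require("fs");
const path = require("path");

const EFS_ROOT = "/mnt/efs/temp"; // assumed EFS working directory, same as in the Lambda code below

// "folder1/sub1/image.png" -> directory "folder1/sub1", file "image.png"
const efsPathForKey = (key) => {
  const dir = path.posix.dirname(key); // folder prefix of the S3 key
  fs.mkdirSync(path.posix.join(EFS_ROOT, dir), { recursive: true }); // like `mkdir -p`
  return path.posix.join(EFS_ROOT, key); // full EFS path to write the object to
};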

The Lambda function then simply does the zipping and uploads the zip file back to S3. The following diagram shows the sequence of our implementation and how files are represented differently in S3 and EFS.

[Diagram: sequence of the implementation and how files are represented in S3 vs. EFS]

One thing to keep in mind is that our Lambda and the EFS must be in the same VPC.

Create EFS (Elastic File System) and access point

There are a couple of reasons why Amazon EFS comes in handy.

  • EFS behaves just like a Linux file system. You can use file commands such as mkdir, ls, cp, rm, etc.
  • Lambda’s temporary storage has a size limit of 512 MB, which is not enough for our large files.

Let’s create an EFS. Go to Elastic File System in the AWS console and click Create file system.

[Screenshot: Create file system dialog]

Then click Create.

Now it is time to create an access point, which will be used by the Lambda function later. Choose the file system we just created, then click Access points –> Create access point.

Here are the input values you should enter (the same settings can also be applied programmatically, as sketched after the screenshot below):

  • Root directory path: /efs
  • POSIX user
    • User ID: 1000
    • Group ID: 1000
  • Root directory creation permissions
    • Owner user ID: 1000
    • Owner group ID: 1000
    • POSIX permissions to apply to the root directory path: 0777

[Screenshot: access point settings]
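
If you prefer to script this step instead of clicking through the console, a rough sketch with the AWS SDK for JavaScript (the same aws-sdk v2 package used later) might look like the following; fs-xxxxxxxx is a placeholder for your own file system ID, so adjust the values to your setup.

const AWS = require("aws-sdk");
const efs = new AWS.EFS();

// Creates an access point rooted at /efs with POSIX user/group 1000
// and 0777 permissions, mirroring the console settings above.
const createAccessPoint = async () =>
  efs
    .createAccessPoint({
      FileSystemId: "fs-xxxxxxxx", // placeholder: your EFS file system ID
      PosixUser: { Uid: 1000, Gid: 1000 },
      RootDirectory: {
        Path: "/efs",
        CreationInfo: {
          OwnerUid: 1000,
          OwnerGid: 1000,
          Permissions: "0777",
        },
      },
    })
    .promise();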

Create and configure EFS attached Lambda function

Let’s create a Node.js Lambda function as follows:

[Screenshots: Lambda function creation settings]

Once the Lambda function is created, click Configuration –> File systems –> Add file system.

[Screenshot: Add file system dialog]

Choose the EFS access point that we have just created and enter /mnt/efs for Local mount path. This is important because /mnt/efs will be your EFS folder inside the Lambda function.

Click Save; now you have access to /mnt/efs from the Lambda function.
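
For completeness, the same mount can be attached programmatically; here is a sketch with the aws-sdk Lambda client, assuming a hypothetical function name and access point ARN (the function must already be in the same VPC as the EFS, as noted earlier).

const AWS = require("aws-sdk");
const lambda = new AWS.Lambda();

// Attaches the EFS access point to the function at /mnt/efs.
const attachEfs = async () =>
  lambda
    .updateFunctionConfiguration({
      FunctionName: "zip-s3-folders", // placeholder: your function name
      FileSystemConfigs: [
        {
          // placeholder: the ARN of the access point created earlier
          Arn: "arn:aws:elasticfilesystem:REGION:ACCOUNT_ID:access-point/fsap-xxxxxxxx",
          LocalMountPath: "/mnt/efs",
        },
      ],
    })
    .promise();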

Access to S3 from Lambda

VPC Endpoints

According to https://docs.aws.amazon.com/vpc/latest/privatelink/vpc-endpoints.html,

A VPC endpoint enables connections between a virtual private cloud (VPC) and supported services, without requiring that you use an internet gateway, NAT device, VPN connection, or AWS Direct Connect connection.

To access S3 buckets from Lambdas inside a VPC, we need to set up a VPC endpoint for S3. Go to VPC and click Endpoints –> Create endpoint, then fill in the inputs as follows:

[Screenshots: Create endpoint settings for the S3 gateway endpoint]

Then click Create endpoint. Technically the Lambdas within the VPC can reach S3 now, but one more step is required to actually access a specific S3 bucket.
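
This step can also be scripted; a minimal sketch with the aws-sdk EC2 client, using placeholder VPC, region, and route table IDs:

const AWS = require("aws-sdk");
const ec2 = new AWS.EC2();

// Creates a gateway endpoint so that traffic from the VPC to S3 stays on the AWS network.
const createS3Endpoint = async () =>
  ec2
    .createVpcEndpoint({
      VpcEndpointType: "Gateway",
      VpcId: "vpc-xxxxxxxx", // placeholder: the VPC the Lambda runs in
      ServiceName: "com.amazonaws.REGION.s3", // replace REGION, e.g. us-east-1
      RouteTableIds: ["rtb-xxxxxxxx"], // placeholder: route tables of the Lambda subnets
    })
    .promise();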

Lambda role

A Lambda role is created while creating our Lambda function. You can also use an existing role for the Lambda, but here we just use the newly created one. Go to our Lambda function, click Configuration –> Permissions, and then choose the role under Execution role.

[Screenshot: Lambda execution role under Configuration –> Permissions]

Then go to Permissions policies, click Add permissions –> Create inline policy. On the next screen, choose the JSON tab, then copy and paste the following, replacing YOUR_BUCKET_NAME with your own bucket name.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor1",
      "Effect": "Allow",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::YOUR_BUCKET_NAME",
        "arn:aws:s3:::YOUR_BUCKET_NAME/*"
      ]
    }
  ]
}

Click Review policy, enter a policy name you like, and then click Create policy.
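
The s3:* wildcard keeps the example short, but the Lambda only needs to list, read, and write objects; a tighter variant (my own suggestion, not part of the original setup) could look like this:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ListBucket",
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::YOUR_BUCKET_NAME"
    },
    {
      "Sid": "ReadWriteObjects",
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:::YOUR_BUCKET_NAME/*"
    }
  ]
}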

More Lambda configuration

Since downloading takes time and file sizes can be hundreds of megabytes, Lambda’s default memory size (128 MB) and timeout (3 seconds) are not enough. For this demonstration, memory size and timeout are set to 4096 MB and 2 minutes, respectively, in Configuration –> General configuration.
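
These two values can also be set with the same updateFunctionConfiguration call sketched earlier, for example:

const AWS = require("aws-sdk");
const lambda = new AWS.Lambda();

// Raises memory to 4096 MB and timeout to 2 minutes (the values used in this demo).
const raiseLimits = async () =>
  lambda
    .updateFunctionConfiguration({
      FunctionName: "zip-s3-folders", // placeholder: your function name
      MemorySize: 4096,
      Timeout: 120, // seconds
    })
    .promise();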

Lambda code

Here’s the final Lambda code. The code implements what we have discussed.

  1. Copies folders and files from S3 to EFS.
  2. Zips the downloaded files in EFS.
  3. Uploads the zip file back to S3.
  4. Removes the temporary EFS files.

const { exec } = require("child_process");
const fs = require("fs");
const archiver = require("archiver");

const AWS = require("aws-sdk");
const s3 = new AWS.S3();

const bucketName = "zip-s3-test";
const sourceDir = "/mnt/efs/temp";
const outPath = "/mnt/efs/temp.zip";

exports.handler = async () => {
  await copyFilesFromS3BucketToEfs(bucketName);
  await zipFolders(sourceDir, outPath);
  await uploadZipFileToS3(bucketName, outPath);
  await executeSystemCommand("rm -rf /mnt/efs/*");
};

const copyFilesFromS3BucketToEfs = async (bucket) => {
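  // Note: listObjectsV2 returns at most 1,000 objects per request; for larger
  // buckets you would paginate with the ContinuationToken field.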
  const s3Objects = await s3
    .listObjectsV2({
      Bucket: bucket,
    })
    .promise();

  for (const content of s3Objects.Contents) {
    if (content.Size === 0) {
      // only zip the files with size > 0
      continue;
    }
    const s3KeyArray = content.Key.split("/");
    const fileName = s3KeyArray.pop();
    const folderName = s3KeyArray.join("/"); // join with "/" so the nested folder structure is preserved
    await executeSystemCommand(`mkdir -p '/mnt/efs/temp/${folderName}'`);
    const s3Object = await s3
      .getObject({
        Bucket: bucket,
        Key: content.Key,
      })
      .promise();
    fs.writeFileSync(`/mnt/efs/temp/${folderName}/${fileName}`, s3Object.Body);
  }
};

const zipFolders = async (sourceDir, outPath) => {
  const archive = archiver("zip");
  const stream = fs.createWriteStream(outPath);

  return new Promise((resolve, reject) => {
    archive
      .directory(sourceDir, false)
      .on("error", (err) => reject(err))
      .pipe(stream);

    stream.on("close", () => resolve());
    archive.finalize();
  });
};

const uploadZipFileToS3 = async (bucket, filePath) => {
  await s3
    .upload({
      Bucket: bucket,
      Key: "my-archive.zip",
      Body: fs.readFileSync(filePath),
    })
    .promise();
};

// executes a Linux command and resolves with its stdout
const executeSystemCommand = (command) =>
  new Promise((resolve, reject) => {
    exec(command, (error, stdout, stderr) => {
      if (error) {
        reject(error);
        return;
      }
      if (stderr) {
        reject(stderr);
        return;
      }
      resolve(stdout);
    });
  });

I hope the code itself is self-explanatory. One thing worth mentioning is that we used an open-source Node.js package called archiver for zipping folders and files. There are many ways to zip files in Node.js; choose whatever suits you best.

Obviously there should be try / catch blocks to deal with error cases, but we omit them here for simplicity.
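
If the archive needs to be handed to someone without console access, one option (not part of the original flow) is a pre-signed URL; a minimal sketch:

const AWS = require("aws-sdk");
const s3 = new AWS.S3();

// Generates a temporary download link for the uploaded archive (valid for 1 hour).
const getDownloadUrl = () =>
  s3.getSignedUrl("getObject", {
    Bucket: "zip-s3-test",
    Key: "my-archive.zip",
    Expires: 3600, // seconds
  });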

Results

Let’s go check our S3 bucket.

[Screenshot: S3 bucket containing my-archive.zip]

As you can see, there is a new zip file called my-archive.zip. Click the file name, then download and unzip the file.

[Screenshot: unzipped folder structure on the local machine]

The folder and file structure is exactly the same as the one shown at the top of this article.

We had to follow many steps to achieve this simple requirement of zipping folders and files in S3, but they are pretty standard when you deal with AWS:

  • Create and launch AWS service
  • Give appropriate permissions
  • Execute the logic

It took a while for me to get used to it! :)

Thanks for reading.