AWS S3 Loader Using Lambda


 Versions 2.26.5, 2.27.1 and higher

In versions 2.26.5 and 2.27.1, we introduced a revamped S3 loader microservice, which loads data from the AWS Cost and Usage Reports (CURs). The new S3 loader scales more efficiently and substantially improves performance for customers with large amounts of AWS spend data.

Switching to the new S3 loader requires some changes in your AWS environment, which we cover in the "Required Configuration" section below.

Overview of Changes

The new S3 loader leverages Lambda functions and Amazon Simple Queue Service (SQS) to divide the CUR loading process into smaller chunks, which it can complete more efficiently. The process includes:

  1. Unzip files. The S3 loader will invoke a Lambda function to unzip the CUR files and write the unzipped contents to the S3 reporting bucket.
  2. Poll queue. The SQS queue tracks the unzipping process. When either of the following conditions is met, the process moves to the next step:
    • Confirmation that all files have successfully unzipped in S3, OR
    • The timeout window has elapsed. This timeout window is configurable in the database and is set to five minutes by default. In internal tests, very large datasets finished unzipping in less than five minutes.
  3. Process report files and generate S3 manifest. The S3 loader will invoke two Lambda functions: one for data in the main CUR table and one for data in the tags table. These Lambda functions will spawn a series of other Lambda functions (50+ in total), which process the data in small, manageable chunks that can run simultaneously. The results are then written to the S3 bucket along with the S3 manifests.
  4. Load data from S3. The final step uses a database trigger to load the data from S3 into two tables in the RDS database.
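
The exit logic of step 2 can be sketched as follows. This is a minimal illustration, not the loader's actual code: the function name wait_for_unzip and the receive_confirmations callable (a stand-in for reading confirmation messages from the SQS queue) are hypothetical, and the polling interval is an assumption. Only the five-minute default timeout comes from the description above.

```python
import time
from typing import Callable, Iterable, Set


def wait_for_unzip(
    expected_files: Iterable[str],
    receive_confirmations: Callable[[], Iterable[str]],
    timeout_seconds: float = 300.0,  # five-minute default, as described above
    poll_interval: float = 5.0,      # assumed value, not taken from the product
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True if every file was confirmed, False if the timeout elapsed first."""
    pending: Set[str] = set(expected_files)
    deadline = clock() + timeout_seconds
    while pending:
        # Drop every file name that has been confirmed since the last poll.
        pending.difference_update(receive_confirmations())
        if not pending:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval)
    return True
```

Either return value lets the process move on to step 3; the boolean only records which of the two conditions ended the wait.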

Please note that the use of Lambda functions and an SQS queue can incur charges in your AWS account, but these costs should be negligible. The entire process only runs when a new CUR is available, which is usually every 12 hours.

Required Configuration

Aside from upgrading to version 2.26.5 or 2.27.1+, you'll need to perform the following manual configuration in AWS to start using the new S3 loader:

In the Account Where cloudtamer Is Installed

  1. Run the s3loader_rds_cft.json CloudFormation template, which is available here.
    • Inputs include:
      • ARN of S3 bucket where the payer account stores CURs.
      • Parameter group family. This needs to match the DB engine version (see the AWS docs for more information).
    • What it does:
      • Creates an IAM role giving read/write access to the CUR S3 bucket.
      • Creates an RDS cluster parameter group that allows the LOAD DATA FROM S3 MANIFEST statement to run and sets the aurora_load_from_s3_role and aws_default_s3_role parameters to the new role.
  2. Attach the new IAM role to the RDS cluster.
  3. Attach the new cluster parameter group to the RDS cluster. You'll need to restart the cluster after attaching the parameter group.
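
Steps 2 and 3 can also be scripted instead of being done in the console. The sketch below assumes the boto3 RDS client, whose add_role_to_db_cluster and modify_db_cluster calls are used here; the function name attach_loader_resources and all argument values are hypothetical, and ApplyImmediately=True is an assumption rather than a documented requirement.

```python
from typing import Any


def attach_loader_resources(
    rds_client: Any,
    cluster_id: str,
    role_arn: str,
    parameter_group_name: str,
) -> None:
    """Attach the new IAM role and cluster parameter group to an RDS cluster."""
    # Step 2: associate the IAM role created by the CloudFormation template.
    rds_client.add_role_to_db_cluster(
        DBClusterIdentifier=cluster_id,
        RoleArn=role_arn,
    )
    # Step 3: switch the cluster to the new cluster parameter group.
    rds_client.modify_db_cluster(
        DBClusterIdentifier=cluster_id,
        DBClusterParameterGroupName=parameter_group_name,
        ApplyImmediately=True,  # assumption: apply now rather than in the maintenance window
    )
```

With boto3 installed, rds_client would typically be boto3.client("rds"). The cluster restart mentioned in step 3 is not covered by this sketch and still has to be performed separately.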

In the Payer Account

  1. Run the s3loader_payer_cft.json CloudFormation template, which is available here.
    • Inputs include:
      • ARN of S3 bucket where the payer account stores CURs.
      • IAM role name for the cloudtamer service role (defaults to cloudtamer-service-role).
    • What it does:
      • Creates the SQS queue.
      • Creates an IAM role for the Lambda functions.
      • Creates three Lambda functions, which:
        • Unzip report files.
        • Generate CUR data for the database.
        • Generate CUR tag data for the database.
      • Creates a policy for the cloudtamer service role, which:
        • Allows read/delete access on the SQS queue.
        • Allows invoke on Lambda.
        • Allows read/delete access on the S3 report bucket.
  2. Modify the policy on the S3 report bucket by adding the following statement, which allows RDS to pull directly from S3. Replace each placeholder, including the angle brackets (<>), with the appropriate ARN.
    "Effect": "Allow",
    "Principal": {
        "AWS": "<ARN of IAM role created on cloudtamer account>"
    },
    "Action": [
    "Resource": "<ARN of bucket>/*"

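If the S3 report bucket already has a policy, the new statement has to be merged into it rather than replacing it. The following sketch shows one way to do that with Python's json module. The function name add_rds_statement is hypothetical, and the list of actions is deliberately left as a parameter because the policy fragment above does not show which actions to grant.

```python
import json
from typing import List


def add_rds_statement(
    policy_json: str,
    role_arn: str,
    bucket_arn: str,
    actions: List[str],
) -> str:
    """Return the bucket policy with a statement added for the RDS role."""
    policy = json.loads(policy_json)
    statement = {
        "Effect": "Allow",
        "Principal": {"AWS": role_arn},
        "Action": actions,  # supplied by the caller; not specified in this article
        "Resource": f"{bucket_arn}/*",
    }
    statements = policy.setdefault("Statement", [])
    # A policy may hold a single statement object rather than a list.
    if isinstance(statements, dict):
        statements = [statements]
        policy["Statement"] = statements
    # Avoid adding the same statement twice when the script is rerun.
    if statement not in statements:
        statements.append(statement)
    return json.dumps(policy, indent=2)
```

The function only produces the updated policy text; applying it to the bucket is a separate step.
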
In the Database

Run the following query in the database to activate the new version of the S3 loader. You'll also need to restart the database for the new loader to start running.

UPDATE cloudtamer_config SET ct_value = 'v2' WHERE ct_key = 's3_loader_version';