How I Root Caused a CPU Bottleneck in an RDS Database

An application was behaving very sluggishly, and I decided to identify and fix the cause. The problem was narrowed down to the RDS database taking a long time to respond to requests, and the mitigation the team decided on was to "get a bigger instance." However, I wanted to understand the root of the problem to make sure that throwing more compute at it would actually solve it.

Back Up Before Modifying a Production AWS RDS Database Managed by Terraform

Periodic changes to production cloud resources should be expected, as the cloud offers the elasticity to scale in and out with demand. Although some changes are riskier than others, the AWS RDS processes for applying (and rolling back) these changes have been battle-tested. Even so, it is always good for organizations to have their own backup and restore strategies in place before riskier changes are applied; after all, the data belongs to them. In this post, I'll propose several methods to back up production AWS RDS databases that are managed via Terraform, along with the considerations for each.

How to ssh into Containers in AWS EKS

I was experimenting with how I could expose applications in AWS Elastic Kubernetes Service (EKS) via Kubernetes Service resources and AWS load balancers. Out of curiosity, I also wanted to know if I could ssh into containers in EKS without using "kubectl exec" or any container runtime commands (e.g. "docker attach"). One scenario would be when I need to access the container's filesystem to extract a log/config file, but 1) I do not have the EKS cluster admin role needed for more permissive actions, and 2) the kubectl environment is exposed via a structured CI/CD pipeline and is non-interactive. As I could not find any concrete examples or tutorials, here are my implementation setup and steps.

Setting up a JFrog Artifactory 7 and Xray 3 Sandbox in AWS Using minikube and Helm Charts

JFrog Artifactory and its complementary suite of tools are well known across the industry. As part of preparing for a certification, I wanted to find out more about how it is administered. This post covers how to install JFrog Artifactory 7 and Xray 3 using Helm Charts on an AWS EC2 instance.

How Amazon Aurora’s Architecture Solves Scale Out Bottlenecks of Traditional Relational Databases

The casual observer might wonder: if there are existing RDS offerings for both MySQL and PostgreSQL, why was Amazon Aurora introduced? To understand its unique selling point, and its claims of scalability and cost effectiveness, we need to look at how traditional relational databases handle scaling out.

Optimizing Redshift SQL Queries Via Query Plan Estimates

Using SQL queries to generate reports across several days can take a non-trivial amount of time. While it is tempting to simply throw more hardware at the problem, it does little to address the potential problem of inefficient queries. Inefficient queries are precursors to their final production ready counterparts, similar to developing software whereby the …

AWS Certified Cloud Practitioner Certification

I recently passed the AWS Certified Cloud Practitioner exam and attained its certification which is valid for 2 years. I was sitting on the fence for this one, especially when the official certification site mentioned that the recommended AWS knowledge was "at least six (6) months of experience with the AWS Cloud in any role, including …

Query Optimization – Processing 660 Million Rows Twice as Fast

I was recently given the opportunity to optimize a query that processed a total of 660 million rows. The problem with this query was that it took 150 minutes to complete, provided that it did not time out (which it did ~40% of the time). The query timing out caused two key problems, namely: 1) …