Thickening Words in Paint.NET

I stumbled upon Paint.NET back in college when I was looking for free image editing software for poster creation. It has since become my go-to software whenever I need to do some image/photo editing. A while ago, I helped a friend thicken (bold) some handwritten words in a scan of a concert poster so as …

Optimizing Redshift SQL Queries Via Query Plan Estimates

Using SQL queries to generate reports across several days can take a non-trivial amount of time. While it is tempting to simply throw more hardware at the problem, doing so does little to address the underlying problem of inefficient queries. Inefficient queries are precursors to their final production-ready counterparts, similar to developing software whereby the …

Data Pipeline Architecture for Job Posting Analytics

In my previous post, I presented a data dashboard that allowed the viewer to slice and dice 2017/18 Infocomm Job Postings from the Careers@Gov portal. Here, I will explain the data pipeline that I used to perform the necessary ETL operations for preparing the data dashboard. Architecture: Based on the architecture above, you can tell …

Data Dashboard: 2017/18 Infocomm Job Postings on Careers@Gov (Part 2)

Foreword: All opinions expressed in this post are mine alone and are not representative of any organization or group, regardless of affiliation status. The work done here was carried out purely with my own resources and time (which also explains the untimeliness of this post). This is a hobby project. You would be ill-advised to make decisions/judgement based …

Query Optimization – Processing 660 Million Rows Twice as Fast

I was recently given the opportunity to optimize a query that processed a total of 660 million rows. The problem with this query was that it took 150 minutes to complete, provided that it did not time out (which it did ~40% of the time). The query timing out caused two key problems, namely: 1) …

Text Based Custom Named Entity Tagger (TeBaC-NET)

I was recently exploring spaCy for some NLP work, and found that the default model was not sufficient for tagging entities in the domain I was exploring. The documentation was very helpful in explaining how I could train the statistical model of the named entity recognizer, but I needed training and evaluation data. While I could …