- All opinions expressed in this post are mine alone, and are not representative of any organization or group regardless of affiliation status
- The work done here was carried out purely with my own resources and time (which also explains the untimeliness of this post)
- This is a hobby project. You would be ill-advised to make decisions or judgements based on any content or views in this post, whether written or implied
I analyzed job postings in the “InfoComm, Technology, New Media Communications” category on Careers@Gov, from 1 Mar 2017 to 31 Dec 2018. In an earlier analysis, I presented the trends in the form of a static blog post here. This time around, I have created a self-service data dashboard using Google Data Studio.
Please enjoy my interactive dashboard.
In an earlier post, I analyzed job postings in the “InfoComm, Technology, New Media Communications” category on Careers@Gov from 1 March 2017 to 28 Feb 2018 (a one-year period).
As there were only about 1.8K unique job postings, it was fairly easy to slice and dice the data and present my findings in a blog post. The disadvantage of that approach is that readers (i.e. you) could not define custom date ranges that were meaningful to them.
Self-Service Data Dashboard Approach
With more data on hand, I needed an approach that could maximize the data’s value, which meant that a one-size-fits-all approach would not cut it. I considered a range of options (including vending the data through a REST API) before deciding to try Google Data Studio, the new(er?) tool that launched in 2016.
The advantage of a self-service data dashboard is that the same set of data can benefit a wider general audience. Granted, some of the more in-depth analyses are no longer possible (due to product or platform limitations), but such use cases are few and far between.
New Feature: “Find Similar Jobs”
Besides the usual trend reporting (e.g. postings per month, top employers), I have added a new feature called “Find Similar Jobs” that allows the viewer to, well, find similar job postings.
The motivation is that jobs with similar roles and responsibilities may be posted under different job titles or designations. For example, a QA Software Developer in Organization X may have the same role as a Selenium Automation Engineer in Organization Y. However, their job descriptions would share keywords that distinguish them from other roles. By grouping job postings based on their features (i.e. keywords), we can discover similar jobs despite differences in their titles.
As there are over 3,000 job postings in the given date range, grouping them manually would be far too tedious. Therefore, I used NLTK and scikit-learn, running on Docker, to automate the grouping. Although this approach is much faster, my random sampling turned up at least one erroneous grouping, where an HR Director position was grouped with a Software Developer position in the same organization. Still, this small error rate is acceptable to me, as I saved a tremendous amount of time and effort compared to classifying the postings manually.
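The idea of matching postings on their description keywords rather than their titles can be illustrated with a small sketch. This is not the pipeline behind the dashboard (which used NLTK and scikit-learn, with details left for a later post); it is a minimal, standard-library-only example that assumes TF-IDF keyword weighting and cosine similarity as the similarity measure. The stop-word list, the function names (`tokenize`, `tfidf_vectors`, `cosine`, `find_similar_jobs`) and the sample postings are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical, tiny stop-word list. NLTK ships a much larger one
# (nltk.corpus.stopwords), not used here to keep the sketch stdlib-only.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "with",
              "on", "is", "are", "be", "will"}


def tokenize(text):
    """Lower-case the text, keep alphabetic runs only, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (term -> weight) per document."""
    tokenized = [tokenize(doc) for doc in docs]
    n = len(docs)
    # Document frequency: how many postings mention each term at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Raw term count times a smoothed IDF, log((1 + n) / (1 + df)) + 1,
        # so keywords that appear in many postings are down-weighted.
        vectors.append({t: c * (math.log((1 + n) / (1 + df[t])) + 1)
                        for t, c in counts.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def find_similar_jobs(postings, query_index, top_k=3):
    """Return up to top_k (title, score) pairs most similar to postings[query_index].

    `postings` is a list of (title, description) tuples. Only descriptions are
    compared, so differently titled jobs can still match on shared keywords.
    """
    vectors = tfidf_vectors([description for _, description in postings])
    query = vectors[query_index]
    scored = [(title, cosine(query, vectors[i]))
              for i, (title, _) in enumerate(postings) if i != query_index]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical sample postings (not real Careers@Gov data).
postings = [
    ("QA Software Developer",
     "Write Selenium test scripts and automate regression testing for web applications"),
    ("Selenium Automation Engineer",
     "Automate regression testing of web applications using Selenium test scripts"),
    ("HR Director",
     "Lead human resource policies, recruitment and staff welfare programmes"),
    ("Network Engineer",
     "Configure routers, firewalls and monitor network performance"),
]

for title, score in find_similar_jobs(postings, query_index=0):
    print(f"{title}: {score:.2f}")
```

On this sample, the Selenium Automation Engineer posting ranks first for the QA Software Developer query despite the different title, while the HR and network postings share no keywords with it and score 0.0. Grouping every posting at once (rather than ranking the neighbours of one) would typically add a clustering step over these vectors, which this sketch leaves out.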
I’ll save the technical details for another blog post – in the meantime, feel free to poke and prod the dashboard here! If you are unable to view the dashboard, you can check out some of the screenshots below: