In this post, I elaborate on the software architecture and infrastructure that keep Paddle-SG up and running. I will start from the lowest level (i.e. hardware) and slowly move up the “stack”.
I started developing and testing the scripts on my laptop. However, for the Minimum Viable Product (MVP), I debated between running it on the cloud and running it on another physical machine.
Speed of implementation was the deciding factor, and I chose the latter. I picked a Raspberry Pi for its low power consumption, at the expense of program execution speed.
The major drawback of this approach is that I am not comfortable leaving the Raspberry Pi running for extended periods without any form of supervision. However, this arrangement suffices for the MVP.
While there are many operating system options for the Raspberry Pi, I chose a supported distro to avoid unnecessary complications. As such, I am using the Raspbian Jessie Lite distro.
I downloaded the image and used Rufus to write it onto the SD card. Once that was done, I inserted the SD card into the Raspberry Pi and turned it on.
Once I logged in as the default sudoer (pi), I changed the keyboard setting to a US layout (using $ sudo raspi-config), and installed the Git package on it ($ sudo apt-get install git).
I initially connected to the Internet via a wireless USB dongle, but poor network connectivity made me switch to a wired connection.
The main function of GitHub here is to act as a source code repository with version control. The version control history also doubles as a data backup within the same repository.
I considered a second source code host (Bitbucket) as an “off-site” backup, but did not implement it because the site is so simple.
There are two main groups of Python scripts: those that query and process the raw data, and those that format the information into the Markdown format required for GitHub Pages.
Afterwards, the updated webpages (i.e. the Markdown files) are pushed to the GitHub repository using git commands.
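To make the formatting step more concrete, here is a minimal sketch of how processed course data could be turned into a Markdown page. The record fields (`name`, `date`, `vacancies`), the sample values and the function name `render_markdown` are assumptions for illustration only; the actual Paddle-SG scripts are not shown in this post and may look quite different.

```python
# Hypothetical course records: the real field names are not shown in this post.
COURSES = [
    {"name": "Course A", "date": "2017-01-07", "vacancies": 3},
    {"name": "Course B", "date": "2017-01-08", "vacancies": 0},
]


def render_markdown(title, courses):
    """Return a Markdown page with one table row per course."""
    lines = [
        "# {}".format(title),
        "",
        "| Course | Date | Vacancies |",
        "| --- | --- | --- |",
    ]
    for course in courses:
        # Show "Full" instead of 0 so the table reads at a glance.
        if course["vacancies"] == 0:
            vacancies = "Full"
        else:
            vacancies = str(course["vacancies"])
        lines.append(
            "| {} | {} | {} |".format(course["name"], course["date"], vacancies)
        )
    # End with a newline so the file is a well-formed text file.
    return "\n".join(lines) + "\n"
```

The returned string would then be written to a Markdown file inside the repository (for example a hypothetical `index.md`) before the git commands are run.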
Python (and its Packages)
I installed Python3 and its corresponding package installer Pip, alongside the preinstalled Python2.7.
$ sudo apt-get install python3
$ sudo apt-get install python3-pip
Using Pip, I installed the BeautifulSoup4 package.
$ sudo pip3 install beautifulsoup4
Automation Script and Job Scheduling
During the development phase, I executed the scripts individually from the terminal. However, this method is not sustainable, as every update required user intervention (i.e. me). I wrote a simple shell script to execute the scripts and git commands in the correct order, and scheduled it with cron to run every 2 hours between 10am and 6pm (at 10am, 12pm, 2pm, 4pm and 6pm). This timing is based on the opening hours of the operator, when changes to the course listings and/or availability occur.
# Scheduling of Paddle-SG automation script
0 10-18/2 * * * /path/to/script.sh
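The idea of running every step in a fixed order can also be sketched in Python. This is not the shell script I actually use, just an illustration of the same logic. The script names `fetch_courses.py` and `build_pages.py` and the commit message are hypothetical placeholders.

```python
import subprocess

# Hypothetical step list: the script names and commit message are placeholders.
STEPS = [
    ["python3", "fetch_courses.py"],   # query and process the raw data
    ["python3", "build_pages.py"],     # write the Markdown pages
    ["git", "add", "--all"],
    ["git", "commit", "-m", "Automated update"],
    ["git", "push"],
]


def run_pipeline(steps=STEPS, run=subprocess.check_call):
    """Run each command in order and return the commands that completed.

    subprocess.check_call raises CalledProcessError on a non-zero exit code,
    so a failing step stops the pipeline before anything is pushed. Note that
    "git commit" exits non-zero when there is nothing to commit, which also
    (harmlessly) skips the push.
    """
    completed = []
    for command in steps:
        run(command)
        completed.append(command)
    return completed
```

Calling `run_pipeline()` from the repository directory would execute the steps for real; the `run` parameter exists only so the ordering can be checked without touching git.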
Web Hosting Configuration
GitHub Pages allows pages to be hosted directly from a GitHub repository. This feature must be enabled for the desired repository under Settings > “GitHub Pages”.
I pointed the “www.paddlesg.com” domain to the GitHub repository, which requires a CNAME file in the repository containing the same string. Thankfully, GitHub makes this very simple within the Settings > “GitHub Pages” section: just enter the domain name under “Custom Domain” and it takes care of the rest.
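For reference, the CNAME file is nothing more than a one-line text file holding the custom domain. If you ever need to create it by hand instead of through the Settings page, a minimal sketch could look like this (the helper name `write_cname` is my own and not part of any GitHub tooling):

```python
import os


def write_cname(repo_dir, domain):
    """Write a CNAME file containing only the custom domain."""
    path = os.path.join(repo_dir, "CNAME")
    with open(path, "w") as handle:
        # The file holds the bare domain on a single line.
        handle.write(domain.strip() + "\n")
    return path
```

For this site, the call would be `write_cname("/path/to/repo", "www.paddlesg.com")`, followed by the usual commit and push.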
With user experience in mind, I wanted the site to be accessible via the “paddlesg.com” URL as well. To do this, I added a URL Redirect Record to the DNS to redirect “paddlesg.com” to “www.paddlesg.com”.
Last but not least, I tested both URLs to ensure that they were bringing me to the right website.
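I did this check manually in a browser, but it could also be scripted. The sketch below assumes the redirect is served as a normal HTTP redirect, which `urllib.request.urlopen` follows by default; the function names and the `opener` parameter are my own additions for illustration.

```python
import urllib.request


def final_url(url, opener=urllib.request.urlopen):
    """Follow any redirects and return the URL that finally served the page."""
    with opener(url) as response:
        return response.geturl()


def same_destination(urls, opener=urllib.request.urlopen):
    """Return True if every URL ends up at the same final address."""
    destinations = {final_url(url, opener) for url in urls}
    return len(destinations) == 1
```

Calling `same_destination(["http://paddlesg.com", "http://www.paddlesg.com"])` performs real network requests, so it only makes sense on a machine with Internet access.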
All the steps listed above document the current back-end setup for Paddle-SG. Over time, I hope to make additional changes so as to improve the UX of the site.