An Excuse for MongoDB – Analyzing 10 Years of Activity Schedules

While tidying up 10 years worth of digital documents from a previous volunteer role, I found my excuse to get familiar with MongoDB. For this post, I will primarily focus on answering several “business” related questions. In a subsequent post, I will elaborate further on the technical side of the project.

Background

After I graduated from secondary school (aka high school), I volunteered to return as a mentor for one of the Co-Curricular Activities (aka school club). The activities comprised of a weekly session, and at least 2 local camps.

While I enjoyed my 14 years stint, I always wondered how things have changed over time. With over a decade of weekly schedules, I decided to see if I can identify interesting trends, and answer some questions that I often pondered about.

Overview

Here is a list of simplified concepts before we begin:

  • Cohort – A logical grouping of first year students that joined the school club. They are given a code word from the list of NATO phonetic alphabets for a sense of belonging and ease of reference. A cohort lasts for 4 years starting from the year of joining.
  • Weekly activity schedule – a simple document that shows the key activities of each cohort for the given session.
  • Activity – The smallest measurable unit. Every activity has the same basic information (e.g. duration, topic), and a set of optional fields (e.g. venue, subtopic, person in charge).

As each activity does not have a fixed format, I decided to use a document-orientated database to store and query the records given its nature. After doing a landscape survey, I decided on MongoDB Atlas as it offered a simple and quick cloud offering that suited my needs.

Data Quality

The range of data is between June 2004 and June 2017. There is no data for the following years:

  • Second half of 2008
  • The entire year of 2014

Questions to Answer

1. What was the average duration of the weekly session across the years?

From 2004 to 2006, the average weekly session decreased from 4 hours to 3.5 hours. This average maintained for several years, before declining again from 2010.

The lowest average was recorded in 2013, where each weekly session was only around 2.47 hours. However, the average increased from 2015 to 2017, with the last recorded value at 3 hours.

{ "_id" : 2004, "averageDurationHours" : 4 }
{ "_id" : 2005, "averageDurationHours" : 3.607142857142857 }
{ "_id" : 2006, "averageDurationHours" : 3.5 }
{ "_id" : 2007, "averageDurationHours" : 3.5 }
{ "_id" : 2008, "averageDurationHours" : 3.5 }
{ "_id" : 2009, "averageDurationHours" : 3.5 }
{ "_id" : 2010, "averageDurationHours" : 3.3375 }
{ "_id" : 2011, "averageDurationHours" : 3.234375 }
{ "_id" : 2012, "averageDurationHours" : 2.85 }
{ "_id" : 2013, "averageDurationHours" : 2.477272727272727 }
{ "_id" : 2015, "averageDurationHours" : 2.5 }
{ "_id" : 2016, "averageDurationHours" : 2.909090909090909 }
{ "_id" : 2017, "averageDurationHours" : 3 }

2. What is the cumulative duration spent, per cohort, in their school club activities?

The findings should be taken with a grain of salt as missing data in 2008 and 2014 skews the results for cohorts who were active during those years.

The PAPA cohort came in first (213 hrs), while the LIMA and OSCAR tied in second place (199.25 hrs). QUEBEC followed closely (199 hrs) and NOVEMBER was slightly further back (187.75 hrs).

{ "_id" : "india", "totalActivityDurationHours" : 64.75 }
{ "_id" : "juliet", "totalActivityDurationHours" : 118.25 }
{ "_id" : "kilo", "totalActivityDurationHours" : 176.5 }
{ "_id" : "lima", "totalActivityDurationHours" : 199.25 }
{ "_id" : "mike", "totalActivityDurationHours" : 174.5 }
{ "_id" : "november", "totalActivityDurationHours" : 187.75 }
{ "_id" : "oscar", "totalActivityDurationHours" : 199.25 }
{ "_id" : "papa", "totalActivityDurationHours" : 213 }
{ "_id" : "quebec", "totalActivityDurationHours" : 199 }
{ "_id" : "romeo", "totalActivityDurationHours" : 146 }
{ "_id" : "sierra", "totalActivityDurationHours" : 103 }
{ "_id" : "tango", "totalActivityDurationHours" : 115.75 }
{ "_id" : "uniform", "totalActivityDurationHours" : 126.5 }
{ "_id" : "victor", "totalActivityDurationHours" : 128.25 }
{ "_id" : "whiskey", "totalActivityDurationHours" : 83.75 }
{ "_id" : "xray", "totalActivityDurationHours" : 13.5 }

3. Which cohort had the longest leadership position?

After three and a half years, students take on leadership positions that see them assuming responsibilities for all most club activities (e.g. maintenance of logistics, teaching of juniors, upkeep of administrative records). Based on changing academic schedules, the exact date when this happens differ across the cohorts

TANGO cohort recorded the most hours in leadership roles (64 hrs), followed by NOVEMBER (63.5 hrs), INDIA (58 hrs), and UNIFORM (57 hrs). When interpreting the following results, keep in mind that missing data in 2008 and 2014 will result in some cohorts having very low leadership stints.

{ "_id" : { "cohortName" : "india", "topic" : "leadership role" }, "totalActivityDurationHours" : 58 }
{ "_id" : { "cohortName" : "juliet", "topic" : "leadership role" }, "totalActivityDurationHours" : 54 }
{ "_id" : { "cohortName" : "kilo", "topic" : "leadership role" }, "totalActivityDurationHours" : 49 }
{ "_id" : { "cohortName" : "lima", "topic" : "leadership role" }, "totalActivityDurationHours" : 19.25 }
{ "_id" : { "cohortName" : "mike", "topic" : "leadership role" }, "totalActivityDurationHours" : 41.5 }
{ "_id" : { "cohortName" : "november", "topic" : "leadership role" }, "totalActivityDurationHours" : 63.5 }
{ "_id" : { "cohortName" : "oscar", "topic" : "leadership role" }, "totalActivityDurationHours" : 55.5 }
{ "_id" : { "cohortName" : "papa", "topic" : "leadership role" }, "totalActivityDurationHours" : 42 }
{ "_id" : { "cohortName" : "quebec", "topic" : "leadership role" }, "totalActivityDurationHours" : 47.5 }
{ "_id" : { "cohortName" : "romeo", "topic" : "leadership role" }, "totalActivityDurationHours" : 17.5 }
{ "_id" : { "cohortName" : "sierra", "topic" : "leadership role" }, "totalActivityDurationHours" : 5 }
{ "_id" : { "cohortName" : "tango", "topic" : "leadership role" }, "totalActivityDurationHours" : 64 }
{ "_id" : { "cohortName" : "uniform", "topic" : "leadership role" }, "totalActivityDurationHours" : 57 }

4. What is the most frequently used venue for each year?

There was some diversity in the most frequently used venue during the initial years of the data set. However, in the second half of the data, it had mainly defaulted to the same location. Looking back, I can think of two reasons for this trend: 1) the school was undergoing renovations in late 2000s which caused some usual venues to be unavailable, and 2) after the renovation, the quadrangle provided the most convenient location for teachers to interact with the students.


{ "_id" : { "venue" : "quadrangle", "year" : 2004 }, "count" : 20 }
{ "_id" : { "venue" : "backlane", "year" : 2005 }, "count" : 40 }
{ "_id" : { "venue" : "backlane", "year" : 2006 }, "count" : 43 }
{ "_id" : { "venue" : "outside np room", "year" : 2007 }, "count" : 39 }
{ "_id" : { "venue" : "outside np room", "year" : 2008 }, "count" : 28 }
{ "_id" : { "venue" : "classroom", "year" : 2009 }, "count" : 44 }
{ "_id" : { "venue" : "quadrangle", "year" : 2010 }, "count" : 48 }
{ "_id" : { "venue" : "quadrangle", "year" : 2011 }, "count" : 59 }
{ "_id" : { "venue" : "quadrangle", "year" : 2012 }, "count" : 62 }
{ "_id" : { "venue" : "quadrangle", "year" : 2013 }, "count" : 75 }
{ "_id" : { "venue" : "quadrangle", "year" : 2015 }, "count" : 66 }
{ "_id" : { "venue" : "quadrangle", "year" : 2016 }, "count" : 120 }
{ "_id" : { "venue" : "quadrangle", "year" : 2017 }, "count" : 51 }

5. Who are the top 10 individuals who had the longest duration of being an activity in charge?

Each activity has one or more individuals who are in charge of it. For generic activities, the field is either left empty, or had a cohort instead of individual name(s). For the former, each individual got the same number of hours counted towards them, while records for the latter were not counted.

For privacy purposes, the names have been obfuscated. Though it is interesting to note that all individuals came from a time when the average duration of a weekly session is at least 3.5 hrs (see question #1).

{ "_id" : "Individual 1", "totalActivityDurationHours" : 55 }
{ "_id" : "Individual 2", "totalActivityDurationHours" : 49 }
{ "_id" : "Individual 3", "totalActivityDurationHours" : 46.5 }
{ "_id" : "Individual 4", "totalActivityDurationHours" : 46.5 }
{ "_id" : "Individual 5", "totalActivityDurationHours" : 44.5 }
{ "_id" : "Individual 6", "totalActivityDurationHours" : 44.5 }
{ "_id" : "Individual 7", "totalActivityDurationHours" : 44.25 }
{ "_id" : "Individual 8", "totalActivityDurationHours" : 43.25 }
{ "_id" : "Individual 9", "totalActivityDurationHours" : 43.25 }
{ "_id" : "Individual 10", "totalActivityDurationHours" : 42 }

Opinions

Given the increasing focus in academics, it will be unlikely that the average duration per weekly session will go above 3 hours. This means that future numbers would not exceed those that were during the period from 2004 to 2009.

However, not all hope is lost as the club’s teachers can create more opportunities for students to participate in club activities, allowing them to increase the average in years to come.

Summary

I heard about NoSQL databases and MongoDB for quite a while now, but had rarely found the need to use a document DB for my personal projects. I am glad that this little analysis project gave me the opportunity to do so.

Technically, I could use a relational database to achieve the same results, but it would require some upfront planning and designing to avoid having to change the schema halfway through the project. If this project were to scale, it might be impossible to understand and prepare for all possible combinations.

In this project’s following post, I’ll explain the document schema and the MongoDB queries used for the analysis.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s