Apache Hadoop Data Capacity Planning

Planning capacity for a Hadoop cluster is not easy, as there are many factors to consider – from the software, hardware, and data aspects. Planning a cluster with too little data capacity and/or processing power may limit the number of operations/analytics that can be run on it, while planning for every possible scenario may be […]

Architecting an Environment to Share and Collaborate with Jupyter Notebooks

Jupyter Notebooks are very useful for developing (and sharing) data analytics. In addition, their flexibility allows them to be used for much more than that – teaching materials, self-learning programming languages, and (re)publication of academic papers and ebooks are other interesting uses. A while back, I helped architect and implement a collaborative environment that allowed […]

Remote Access to a Public Jupyter Notebook Server

Jupyter Notebook is a great way to share documents with collaborators (e.g. team members) working on analytic use cases. Of course, it is not limited to that: The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text. […]