Big Data Series

Posted on Sat 02 September 2017 in Academics

I've just added three blog posts that I wrote during the Big Data bachelor course at Radboud University. As a master's student I'm allowed to take one or two bachelor courses if there's a good reason. Because no other course really goes into Spark, Hadoop and Scala, I figured it would be a nice addition to the Python-heavy curriculum. Not that I dislike Python, of course.

There are three posts in total:

Hadoop and the HDFS: an introduction to Hadoop and HDFS.
Spark: a look at a Kaggle competition data set in Spark.
The class project: a solo project about submitting code to a national research cluster and running queries against 1.73 billion web pages.

You can find the posts here: Big Data Series

I learnt a lot and finished the class project with a 9.5, so I wanted to share it.