Big Data Parallel Programming

7.5 credits

Course objectives:

Processing huge amounts of data is at the core of data mining, deep learning and real-time autonomous decision making, which in turn are at the core of modern artificial intelligence applications. Data can reside more or less permanently in the cloud and be accessed via distributed file systems, and/or be streamed in real time from multiple sensors at very high rates. Both access to and processing of the data are done using well-engineered frameworks in which storage and processing are carried out in parallel. The purpose of this course is to introduce you to this infrastructure, including the parallel programming used to implement these frameworks. This should enable you to judge which framework to choose for your applications, identify its pros and cons, and suggest and even implement improvements.

Course content:

The course covers modern techniques, methods and tools for distributed storage of static and streamed massive data, for example distributed, fault-tolerant key-value tables, including replication and coordination mechanisms. It also covers modern techniques, methods and tools for distributed processing of static and streamed massive data, including frameworks such as MapReduce and Spark. Finally, the course covers concepts, methods and tools for parallel programming of computing clusters, including GPUs.

Education occasions