Python MapReduce Programming with Pydoop

by Simone Leo for EuroPython 2011

Hadoop is the leading open source implementation of MapReduce, Google’s large-scale distributed computing paradigm. Hadoop’s native API is in Java, and its built-in options for Python programming (Streaming and Jython) have several drawbacks: the former gives access to only a small subset of Hadoop’s features, while the latter carries all of Jython’s limitations with respect to CPython.

Pydoop is an API for Hadoop that makes most of its features available to Python programmers while allowing CPython development. Its core consists of Boost.Python wrappers for Hadoop’s C/C++ interface.

The talk consists of a MapReduce/Hadoop tutorial and a presentation of the Pydoop API, with the main goal of bridging the gap between the Hadoop and Python communities. A basic knowledge of distributed programming is helpful but not strictly required.
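
For readers new to the paradigm, the MapReduce model covered in the tutorial part can be sketched in a few lines of plain Python. This is only an illustrative, single-process simulation of the map, shuffle, and reduce phases applied to word counting; it uses neither Hadoop nor the Pydoop API, and the function names (`map_words`, `shuffle`, `reduce_counts`, `word_count`) are hypothetical names chosen for this example.

```python
from collections import defaultdict
from itertools import chain


def map_words(line):
    """Map phase: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle phase: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(word, counts):
    """Reduce phase: collapse the list of values for one key into a total."""
    return word, sum(counts)


def word_count(lines):
    """Run the three phases in sequence on an in-memory list of lines."""
    mapped = chain.from_iterable(map_words(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_counts(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    sample = ["the quick brown fox", "the lazy dog", "The end"]
    print(word_count(sample))
```

In a real Hadoop job the framework runs many map and reduce tasks in parallel across a cluster and performs the shuffle itself; the programmer supplies only the map and reduce logic.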

Language: EN
Duration: 90 minutes (including Q&A)