Most big-data posts on the web, if not all of them, demonstrate only a part of the overall process. For me, big-data processing begins where the source data originates, and that place is usually a web browser. I firmly believe that being a fully informed data engineer requires not just command-line, programming and SQL skills but also web development, APIs, data visualization and beyond.
In that spirit, I want to contribute my share by walking through every stage of development required to make an idea take shape, almost life-like.
This is an overview of a multi-part series of posts. The upcoming posts cover the following:
Develop a web-front-end (a.k.a. client) application to capture data.
People like to put labels on this kind of work, but in my opinion you learn whatever is necessary to get the task done. In our case: learn, build, fix and repeat!
Anyhow, to cover all this ground without losing sight of the goal, we had better pick a use case that piques our interest. What’s better than talking about movies?
Let’s download the IMDb dataset and explore it a bit using the code below.
Download dataset
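A minimal sketch of the download step, using only the Python standard library. The URL is an assumption: it points at IMDb's public bulk export of title data, and the post does not say which file was actually used. The function name `download_dataset` and the `data` directory are hypothetical choices for illustration.

```python
import shutil
import urllib.request
from pathlib import Path

# Assumption: IMDb's public bulk export; swap in whichever file you need.
DATASET_URL = "https://datasets.imdbws.com/title.basics.tsv.gz"


def download_dataset(url=DATASET_URL, dest_dir="data"):
    """Stream the file at `url` into `dest_dir` and return the local path."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # Name the local file after the last segment of the URL.
    target = dest / url.rsplit("/", 1)[-1]
    # Copy in chunks so a large archive never has to fit in memory.
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target
```

Calling `download_dataset()` with no arguments would fetch the assumed file into `data/`. Streaming with `shutil.copyfileobj` is a deliberate choice here, since these archives can be large.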
Unzip dataset
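A minimal sketch of the unzip step, assuming the downloaded file is gzip-compressed (a `.gz` file rather than a `.zip` archive). The helper name `gunzip` is hypothetical.

```python
import gzip
import shutil
from pathlib import Path


def gunzip(archive):
    """Decompress `name.tsv.gz` next to itself as `name.tsv` and return that path."""
    source = Path(archive)
    if source.suffix != ".gz":
        raise ValueError(f"expected a .gz file, got {source.name}")
    # with_suffix("") drops only the trailing ".gz", keeping ".tsv".
    target = source.with_suffix("")
    # Decompress in chunks instead of reading the whole file at once.
    with gzip.open(source, "rb") as compressed, open(target, "wb") as out:
        shutil.copyfileobj(compressed, out)
    return target
```

For example, `gunzip("data/title.basics.tsv.gz")` would write `data/title.basics.tsv` and leave the compressed file in place.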
Explore dataset
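A minimal sketch of the exploration step, assuming the decompressed dataset is a tab-separated file with a header row. The function names `preview` and `print_preview` are hypothetical, and this is not necessarily the script the post originally used.

```python
import csv
from itertools import islice


def preview(path, limit=5):
    """Return the first `limit` rows of a tab-separated file as dicts keyed by header."""
    with open(path, newline="", encoding="utf-8") as handle:
        # QUOTE_NONE treats double quotes as ordinary characters, which is
        # safer for titles that contain unbalanced quotes (an assumption
        # about the data, not something the post states).
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        # islice stops after `limit` rows, so the rest of the file is never read.
        return list(islice(reader, limit))


def print_preview(path, limit=5):
    """Print the first `limit` rows, one dict per line."""
    for row in preview(path, limit):
        print(row)
```

Calling `print_preview("data/title.basics.tsv")` would print the first five rows, assuming a file of that name exists. Reading lazily with `islice` keeps the peek cheap even when the file has millions of rows.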
Save this code to imdb_process.py and run it as follows:
The output looks like the following (only a few rows are shown …)
Okay, we’ve got our baseline setup. We’ll explore building a web-application in the upcoming Part-1 of this series. Stay tuned!!