“Big data” is a buzzword these days, not only in the United States but also in China.

But what is big data?

People often say, “I have terabytes or petabytes of data, so this is big data.”

To me, anything above 4 GB is big data.

The reason is simple: my laptop has 4 GB of memory.

I often use R to analyze my data.

R is a tool that mainly keeps data in memory while it computes interesting things.

So when I get a dataset of about 5.18 GB and want to load all of it, what can I do?
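To make the problem concrete, here is a rough back-of-the-envelope check in Python. It is only a sketch: the helper name `fits_in_memory` and the `usable_fraction` parameter are my own illustrative assumptions, and I treat "GB" as 1024³ bytes. Real tools also need extra working memory, so the true limit is lower than the raw RAM size.

```python
GIB = 1024 ** 3  # bytes per gigabyte (assumed binary units)


def fits_in_memory(data_bytes, ram_bytes, usable_fraction=0.5):
    """Return True if the data fits in the usable share of RAM.

    usable_fraction is a hypothetical safety margin: the OS and the
    analysis tool itself also need memory, so not all RAM is available.
    """
    return data_bytes <= ram_bytes * usable_fraction


data_bytes = int(5.18 * GIB)  # the dataset size from this article
ram_bytes = 4 * GIB           # my laptop's memory

# Even if every byte of RAM were available, the data would not fit.
print(fits_in_memory(data_bytes, ram_bytes, usable_fraction=1.0))  # False
print(fits_in_memory(data_bytes, ram_bytes))                       # False
```

Either way the answer is `False`, so an in-memory tool cannot hold the whole dataset on this laptop.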

I think this is a good chance to use so-called big data tools to analyze my data.

So I started my exploration of Spark.

Here is my simple introduction to Spark in Python.

Your advice and suggestions are welcome!

For the record, this article was posted on LinkedIn and had 50 views as of November 2021.