Happy Hour: 아파치 스파크: 차세대 빅 데이터?

2014년 1월 21일 화요일

Apache Spark: The Next Big Data Thing? http://blog.mikiobraun.de/2014/01/apache-spark.html

Spark의 basic abstraction은 Resilient Distributed Datasets (RDDs)이다.

Scalding is a Scala library that makes it easy to specify Hadoop MapReduce jobs.
https://github.com/twitter/scalding

Resilient Distributed Datasets: A Fault-Tolerant Abstraction for
In-Memory Cluster Computing
http://www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf

Discretized Streams: A Fault-Tolerant Model for
Scalable Stream Processing
http://www.eecs.berkeley.edu/Pubs/TechRpts/2012/EECS-2012-259.pdf

Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing.
http://storm-project.net/

Immutability, MVCC, and garbage collection
http://www.xaprb.com/blog/2013/12/28/immutability-mvcc-and-garbage-collection/

Happy Hour