Apache Spark: The Next Big Data Thing?
http://blog.mikiobraun.de/2014/01/apache-spark.html
Spark's basic abstraction is the Resilient Distributed Dataset (RDD).
Scalding is a Scala library that makes it easy to specify Hadoop MapReduce jobs.
https://github.com/twitter/scalding
Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing
http://www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf
Discretized Streams: A Fault-Tolerant Model for Scalable Stream Processing
http://www.eecs.berkeley.edu/Pubs/TechRpts/2012/EECS-2012-259.pdf
Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing.
http://storm-project.net/
Immutability, MVCC, and garbage collection
http://www.xaprb.com/blog/2013/12/28/immutability-mvcc-and-garbage-collection/