2014년 1월 21일 화요일

아파치 스파크: 차세대 빅 데이터?

Apache Spark: The Next Big Data Thing? http://blog.mikiobraun.de/2014/01/apache-spark.html

Spark의 basic abstraction은 Resilient Distributed Datasets (RDDs)이다. 

Scalding is a Scala library that makes it easy to specify Hadoop MapReduce jobs.
https://github.com/twitter/scalding

Resilient Distributed Datasets: A Fault-Tolerant Abstraction for
In-Memory Cluster Computing
http://www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf

Discretized Streams: A Fault-Tolerant Model for
Scalable Stream Processing
http://www.eecs.berkeley.edu/Pubs/TechRpts/2012/EECS-2012-259.pdf

Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing.
http://storm-project.net/

Immutability, MVCC, and garbage collection
http://www.xaprb.com/blog/2013/12/28/immutability-mvcc-and-garbage-collection/

댓글 없음:

댓글 쓰기

Generic interfaces 요점

 https://go.dev/blog/generic-interfaces  Generic interface를 정의할 때 최소한의 제약만을 정의하고 실제 구현체들이 자신만의 필요한 제약을 추가할 수 있도록 하는 것이 좋다. pointer receiver를...