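Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams. Flink has been designed to run in all common cluster environments and to perform computations at in-memory speed and at any scale.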

Here, we explain important aspects of Flink's architecture.

Process Unbounded and Bounded Data

Any kind of data is produced as a stream of events. Credit card transactions, sensor measurements, machine logs, and user interactions on a website or mobile application are all generated as streams.

Data can be processed as unbounded or bounded streams.

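Unbounded streams

Unbounded streams have a start but no defined end. They do not terminate, and they provide data as it is generated. Unbounded streams must be processed continuously; that is, events must be handled promptly after they have been ingested. Because the input is unbounded and will not be complete at any point in time, it is not possible to wait for all input data to arrive. Processing unbounded data often requires that events are ingested in a specific order, such as the order in which they occurred, so that the completeness of results can be reasoned about.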

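Bounded streams

Bounded streams have a defined start and end. They can be processed by ingesting all data before performing any computation. Ordered ingestion is not required to process bounded streams, because a bounded data set can always be sorted.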
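
The difference between the two processing styles can be sketched in plain Python. This is an illustrative sketch only and does not use Flink's API; the `(timestamp, key)` event shape and the function names `process_bounded` and `process_unbounded` are assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Event = Tuple[int, str]  # (event timestamp, key); a hypothetical event shape


def process_bounded(events: List[Event]) -> Dict[str, int]:
    """Bounded input: all data is available, so sort first, then compute once."""
    counts: Dict[str, int] = defaultdict(int)
    for _, key in sorted(events):  # ingestion order is irrelevant after sorting
        counts[key] += 1
    return dict(counts)


def process_unbounded(events: Iterable[Event]) -> Iterator[Dict[str, int]]:
    """Unbounded input: update state per event and emit a result each time."""
    counts: Dict[str, int] = defaultdict(int)
    for _, key in events:  # the input may never end, so we cannot wait for it
        counts[key] += 1
        yield dict(counts)  # intermediate result after every event
```

The bounded variant returns a single final result, while the unbounded variant yields an updated result after each event because it can never assume the input is complete.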

Apache Flink excels at processing unbounded and bounded data sets. Precise control of time and state enables Flink’s runtime to run any kind of application on unbounded streams. Bounded streams are internally processed by algorithms and data structures that are specifically designed for fixed-size data sets, yielding excellent performance.

Convince yourself by exploring the use cases that have been built on top of Flink.

Deploy Applications Anywhere
Apache Flink is a distributed system and requires compute resources in order to execute applications. Flink integrates with all common cluster resource managers such as Hadoop YARN, Apache Mesos, and Kubernetes, but can also be set up to run as a stand-alone cluster.

Flink is designed to work well with each of the previously listed resource managers. This is achieved by resource-manager-specific deployment modes that allow Flink to interact with each resource manager in its idiomatic way.

When deploying a Flink application, Flink automatically identifies the required resources based on the application’s configured parallelism and requests them from the resource manager. In case of a failure, Flink replaces the failed container by requesting new resources. All communication to submit or control an application happens via REST calls. This eases the integration of Flink in many environments.
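
As a rough illustration of how a resource request can follow from the configured parallelism, the sketch below computes how many worker containers would be needed. It is a simplified model, not Flink's actual resource accounting: it assumes that every container offers the same number of slots and that the application needs one slot per unit of parallelism. The function name `containers_needed` and both parameters are hypothetical.

```python
import math


def containers_needed(parallelism: int, slots_per_container: int) -> int:
    """Number of worker containers required to host `parallelism` slots.

    Simplifying assumption: one slot per unit of parallelism and
    identically sized containers.
    """
    if parallelism < 1 or slots_per_container < 1:
        raise ValueError("both arguments must be positive")
    # Round up: a partially used container still has to be requested.
    return math.ceil(parallelism / slots_per_container)
```

Under this model, replacing a failed container amounts to requesting one more container of the same size from the resource manager.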

Run Applications at any Scale
Flink is designed to run stateful streaming applications at any scale. Applications are parallelized into possibly thousands of tasks that are distributed and concurrently executed in a cluster. Therefore, an application can leverage virtually unlimited amounts of CPUs, main memory, disk and network IO. Moreover, Flink easily maintains very large application state. Its asynchronous and incremental checkpointing algorithm ensures minimal impact on processing latencies while guaranteeing exactly-once state consistency.
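
The idea of splitting an application into parallel tasks that each own part of the state can be sketched as follows. This is a sequential simulation for illustration, not Flink's partitioning mechanism: the CRC32-based routing and the names `task_index` and `run_partitioned` are assumptions of this example.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def task_index(key: str, parallelism: int) -> int:
    """Map a key to one of `parallelism` tasks with a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % parallelism


def run_partitioned(
    events: Iterable[Tuple[str, int]], parallelism: int
) -> List[Dict[str, int]]:
    """Route each (key, value) event to one task and sum values per key there."""
    task_states: List[Dict[str, int]] = [
        defaultdict(int) for _ in range(parallelism)
    ]
    for key, value in events:
        # All events with the same key reach the same task, so each task
        # can keep the state for its keys locally.
        task_states[task_index(key, parallelism)][key] += value
    return [dict(state) for state in task_states]
```

Because a key always maps to the same task, no two tasks ever hold state for the same key, which is what allows the tasks to run independently of each other.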

Users reported impressive scalability numbers for Flink applications running in their production environments, such as

applications processing multiple trillions of events per day,
applications maintaining multiple terabytes of state, and
applications running on thousands of cores.

Leverage In-Memory Performance
Stateful Flink applications are optimized for local state access. Task state is always maintained in memory or, if the state size exceeds the available memory, in access-efficient on-disk data structures. Hence, tasks perform all computations by accessing local, often in-memory, state, which yields very low processing latencies. Flink guarantees exactly-once state consistency in case of failures by periodically and asynchronously checkpointing the local state to durable storage.
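
The interplay of local state, periodic checkpoints, and recovery can be sketched in a few lines. This is a simplified, synchronous model and not Flink's checkpointing algorithm, which is asynchronous and incremental. The class `CountingTask`, the function `run`, and the parameters `checkpoint_every` and `fail_at` are hypothetical names used only for this illustration.

```python
import copy
from typing import Dict, List, Optional, Tuple

Checkpoint = Tuple[int, Dict[str, int]]  # (input offset, copy of the state)


class CountingTask:
    """Keeps per-key counts in local memory, together with the input offset."""

    def __init__(self) -> None:
        self.state: Dict[str, int] = {}
        self.offset = 0  # position of the next input record to read

    def process(self, key: str) -> None:
        self.state[key] = self.state.get(key, 0) + 1
        self.offset += 1

    def snapshot(self) -> Checkpoint:
        # Copy the state so that later updates do not alter the checkpoint.
        return self.offset, copy.deepcopy(self.state)

    def restore(self, checkpoint: Checkpoint) -> None:
        self.offset = checkpoint[0]
        self.state = copy.deepcopy(checkpoint[1])


def run(
    records: List[str], checkpoint_every: int, fail_at: Optional[int] = None
) -> Dict[str, int]:
    """Process all records, optionally simulating one failure at `fail_at`."""
    task = CountingTask()
    durable = task.snapshot()  # stands in for durable storage
    failed = False
    while task.offset < len(records):
        if fail_at is not None and not failed and task.offset == fail_at:
            failed = True
            task = CountingTask()  # the local state is lost
            task.restore(durable)  # roll back state and input position together
            continue
        task.process(records[task.offset])
        if task.offset % checkpoint_every == 0:
            durable = task.snapshot()
    return task.state
```

Because the state and the input position are restored together, records processed after the last checkpoint are replayed against the rolled-back state, so each record is reflected in the final counts exactly once, with or without the simulated failure.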


Last updated: 2019-07-10 03:53   Author: admin