Background

Apache Calcite is a dynamic data management framework.

It contains many of the pieces that comprise a typical database management system, but omits some key functions: storage of data, algorithms to process data, and a repository for storing metadata.

Calcite intentionally stays out of the business of storing and processing data. As we shall see, this makes it an excellent choice for mediating between applications and one or more data storage locations and data processing engines. It is also a perfect foundation for building a database: just add data.

To illustrate, let’s create an empty instance of Calcite and then point it at some data.

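    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;
    import java.util.Properties;
    import org.apache.calcite.adapter.java.ReflectiveSchema;
    import org.apache.calcite.jdbc.CalciteConnection;
    import org.apache.calcite.schema.Schema;
    import org.apache.calcite.schema.SchemaPlus;

    // Employee and Department are plain classes (not shown) whose public
    // fields, such as empid and deptno, become columns.
    public static class HrSchema {
      // Placeholders: populate these arrays with real rows.
      public final Employee[] emps = {};
      public final Department[] depts = {};
    }

    Class.forName("org.apache.calcite.jdbc.Driver");
    Properties info = new Properties();
    info.setProperty("lex", "JAVA");
    Connection connection =
        DriverManager.getConnection("jdbc:calcite:", info);
    CalciteConnection calciteConnection =
        connection.unwrap(CalciteConnection.class);
    SchemaPlus rootSchema = calciteConnection.getRootSchema();
    // Depending on your Calcite version, this may instead be written as
    // new ReflectiveSchema(new HrSchema()).
    Schema schema = ReflectiveSchema.create(calciteConnection,
        rootSchema, "hr", new HrSchema());
    rootSchema.add("hr", schema);
    Statement statement = calciteConnection.createStatement();
    ResultSet resultSet = statement.executeQuery(
        "select d.deptno, min(e.empid)\n"
        + "from hr.emps as e\n"
        + "join hr.depts as d\n"
        + " on e.deptno = d.deptno\n"
        + "group by d.deptno\n"
        + "having count(*) > 1");
    print(resultSet);  // print is a helper that is not shown here
    resultSet.close();
    statement.close();
    connection.close();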

Where is the database? There is no database. The connection is completely empty until ReflectiveSchema.create registers a Java object as a schema and its collection fields emps and depts as tables.
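
To make the idea of "a Java object as a schema" more concrete, here is a minimal sketch that uses only java.lang.reflect to list the array fields of an object as tables and the public fields of each element class as columns. It is an illustration of the concept, not Calcite's ReflectiveSchema implementation; the class ReflectionSketch, the method describeTables, and the Employee and Department field layouts are assumptions made for this example.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only; this is not Calcite's ReflectiveSchema code.
class ReflectionSketch {
  // Hypothetical row classes, shaped like the columns the query uses.
  static class Employee {
    public int empid;
    public int deptno;
  }

  static class Department {
    public int deptno;
  }

  static class HrSchema {
    public final Employee[] emps = {};
    public final Department[] depts = {};
  }

  /**
   * Maps each public array field (a "table") to the sorted public
   * field names of its element class (the "columns").
   */
  static Map<String, List<String>> describeTables(Object target) {
    Map<String, List<String>> tables = new TreeMap<>();
    for (Field field : target.getClass().getFields()) {
      if (!field.getType().isArray()) {
        continue; // only array-valued fields are treated as tables here
      }
      List<String> columns = new ArrayList<>();
      for (Field column : field.getType().getComponentType().getFields()) {
        if (!Modifier.isStatic(column.getModifiers())) {
          columns.add(column.getName());
        }
      }
      Collections.sort(columns); // getFields() makes no ordering promise
      tables.put(field.getName(), columns);
    }
    return tables;
  }

  public static void main(String[] args) {
    // Prints {depts=[deptno], emps=[deptno, empid]}
    System.out.println(describeTables(new HrSchema()));
  }
}
```

The arrays are empty on purpose: the table and column names come from the declared types, so no rows are needed to describe the schema.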

Calcite does not want to own data; it does not even have a favorite data format. This example used in-memory data sets, and processed them using operators such as groupBy and join from the linq4j library. But Calcite can also process data held elsewhere, such as in a database reached through JDBC. In the first example, replace

    Schema schema = ReflectiveSchema.create(calciteConnection,
        rootSchema, "hr", new HrSchema());

with

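    // Assumes Apache Commons DBCP 2 for BasicDataSource; adjust the import
    // if your project uses a different pooling library.
    import org.apache.calcite.adapter.jdbc.JdbcSchema;
    import org.apache.commons.dbcp2.BasicDataSource;

    Class.forName("com.mysql.jdbc.Driver");
    BasicDataSource dataSource = new BasicDataSource();
    dataSource.setUrl("jdbc:mysql://localhost");
    dataSource.setUsername("username");
    dataSource.setPassword("password");
    // Arguments: parent schema, schema name, data source, catalog, schema.
    Schema schema = JdbcSchema.create(rootSchema, "hr", dataSource,
        null, "name");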

and Calcite will execute the same query in JDBC. To the application, the data and API are the same, but behind the scenes the implementation is very different. Calcite uses optimizer rules to push the JOIN and GROUP BY operations to the source database.
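
For contrast with the pushed-down case, the sketch below shows roughly what evaluating the same JOIN, GROUP BY and HAVING inside the JVM involves. It uses plain Java collections rather than linq4j, so it is only an approximation of the in-memory path. The class InMemoryQuerySketch and its row classes are hypothetical, and the code assumes deptno is unique within depts.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Illustrative sketch only; Calcite's in-memory path uses linq4j operators.
class InMemoryQuerySketch {
  // Hypothetical row classes with just the columns the query touches.
  static class Employee {
    final int empid;
    final int deptno;

    Employee(int empid, int deptno) {
      this.empid = empid;
      this.deptno = deptno;
    }
  }

  static class Department {
    final int deptno;

    Department(int deptno) {
      this.deptno = deptno;
    }
  }

  /**
   * Returns rows of {deptno, min(empid)} for departments that match
   * more than one employee. Assumes deptno is unique within depts.
   */
  static List<int[]> minEmpidPerDept(List<Employee> emps, List<Department> depts) {
    // Build side of a hash join: the set of department keys.
    Set<Integer> deptKeys = new HashSet<>();
    for (Department d : depts) {
      deptKeys.add(d.deptno);
    }

    // Probe side plus GROUP BY: keep {count, min(empid)} per deptno.
    Map<Integer, int[]> groups = new TreeMap<>();
    for (Employee e : emps) {
      if (!deptKeys.contains(e.deptno)) {
        continue; // inner join drops employees without a department
      }
      int[] acc = groups.computeIfAbsent(
          e.deptno, k -> new int[] {0, Integer.MAX_VALUE});
      acc[0]++;
      acc[1] = Math.min(acc[1], e.empid);
    }

    // HAVING count(*) > 1
    List<int[]> result = new ArrayList<>();
    for (Map.Entry<Integer, int[]> group : groups.entrySet()) {
      if (group.getValue()[0] > 1) {
        result.add(new int[] {group.getKey(), group.getValue()[1]});
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<Employee> emps = new ArrayList<>();
    emps.add(new Employee(100, 10));
    emps.add(new Employee(110, 10));
    emps.add(new Employee(150, 20));
    List<Department> depts = new ArrayList<>();
    depts.add(new Department(10));
    depts.add(new Department(20));
    for (int[] row : minEmpidPerDept(emps, depts)) {
      System.out.println(row[0] + ", " + row[1]);
    }
  }
}
```

When the operations are pushed to the source database, none of this work happens in the application process; only the final rows come back.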

In-memory and JDBC are just two familiar examples. Calcite can handle any data source and data format. To add a data source, you need to write an adapter that tells Calcite what collections in the data source it should consider “tables”.
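
As a rough illustration of the "which collections count as tables" decision, the sketch below treats every file whose name ends in .csv as one table. It uses only the Java standard library; the class TableDiscoverySketch and the method discoverTables are hypothetical and are not part of Calcite's adapter API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only; a real adapter implements Calcite's schema interfaces.
class TableDiscoverySketch {
  private static final String SUFFIX = ".csv";

  /** Maps table name to file name for each ".csv" file; other files are ignored. */
  static Map<String, String> discoverTables(List<String> fileNames) {
    Map<String, String> tables = new TreeMap<>();
    for (String fileName : fileNames) {
      if (fileName.toLowerCase(Locale.ROOT).endsWith(SUFFIX)) {
        // The table name is the file name without its extension.
        String tableName =
            fileName.substring(0, fileName.length() - SUFFIX.length());
        tables.put(tableName, fileName);
      }
    }
    return tables;
  }

  public static void main(String[] args) {
    // Prints {DEPTS=DEPTS.csv, EMPS=EMPS.csv}
    System.out.println(
        discoverTables(Arrays.asList("EMPS.csv", "DEPTS.csv", "README.txt")));
  }
}
```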

For more advanced integration, you can write optimizer rules. Optimizer rules allow Calcite to access data of a new format, allow you to register new operators (such as a better join algorithm), and allow Calcite to optimize how queries are translated to operators. Calcite will combine your rules and operators with built-in rules and operators, apply cost-based optimization, and generate an efficient plan.
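
The cost-based step can be pictured with a toy example: given two candidate join operators and row-count estimates, pick the one with the lower estimated cost. The cost formulas, the class CostBasedChoiceSketch, and the operator names below are assumptions made for illustration; they are not Calcite's cost model or planner API.

```java
// Illustrative sketch only; Calcite's planner and cost model are far richer.
class CostBasedChoiceSketch {
  // Toy formula: a nested-loop join compares every pair of rows.
  static double nestedLoopCost(long leftRows, long rightRows) {
    return (double) leftRows * rightRows;
  }

  // Toy formula: build a hash table on the right input, then probe with the left.
  static double hashJoinCost(long leftRows, long rightRows) {
    return 2.0 * rightRows + leftRows;
  }

  /** Returns the name of the cheaper join operator for the given row-count estimates. */
  static String chooseJoin(long leftRows, long rightRows) {
    double nestedLoop = nestedLoopCost(leftRows, rightRows);
    double hash = hashJoinCost(leftRows, rightRows);
    return hash < nestedLoop ? "HashJoin" : "NestedLoopJoin";
  }

  public static void main(String[] args) {
    System.out.println(chooseJoin(1, 1000));    // NestedLoopJoin
    System.out.println(chooseJoin(1000, 1000)); // HashJoin
  }
}
```

Registering a new operator amounts to adding another candidate with its own cost estimate; the optimizer then keeps whichever candidate is cheapest for the query at hand.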

Writing an adapter

The subproject under example/csv provides a CSV adapter, which is fully functional for use in applications but is also simple enough to serve as a good template if you are writing your own adapter.

See the tutorial for information on using the CSV adapter and writing other adapters.

See the HOWTO for more information about using other adapters, and about using Calcite in general.

Status

The following features are complete.

  • Query parser, validator and optimizer

  • Support for reading models in JSON format

  • Many standard functions and aggregate functions

  • JDBC queries against Linq4j and JDBC back-ends

  • Linq4j front-end

  • SQL features: SELECT, FROM (including JOIN syntax), WHERE, GROUP BY (including GROUPING SETS), aggregate functions (including COUNT(DISTINCT …) and FILTER), HAVING, ORDER BY (including NULLS FIRST/LAST), set operations (UNION, INTERSECT, MINUS), sub-queries (including correlated sub-queries), windowed aggregates, LIMIT (PostgreSQL syntax); more details in the SQL reference

  • Local and remote JDBC drivers; see Avatica

  • Several adapters

Last updated: 2018-07-26 11:54