อ่าน 3 นาที
Jaql
Jaql (pronounced "jackal") is a functional data processing and query language most commonly used for JSON query processing on big data.
Jaql
| Jaql | |
|---|---|
| Paradigm | Functional |
| Designed by | Vuk Ercegovac (Google) |
| First appeared | October 9, 2008 |
| Stable release | 0.5.1 / July 12, 2010 |
| Implementation language | Java |
| OS | Cross-platform |
| License | Apache License 2.0 |
| Website | code |
| Major implementations | |
| IBM BigInsights | |
Jaql (pronounced "jackal") is a functional data processing and query language most commonly used for JSON query processing on big data.
It started as an open source project at Google[1] but the latest release was on 2010-07-12. IBM[2] took it over as primary data processing language for their Hadoop software package BigInsights.
Although having been developed for JSON it supports a variety of other data sources like CSV, TSV, XML.
A comparison[3] to other BigData query languages like PIG Latin and Hive QL illustrates performance and usability aspects of these technologies.
Jaql supports[4]lazy evaluation, so expressions are only materialized when needed.
Syntax
The basic concept of Jaql is
source -> operator(parameter) -> sink ; where a sink can be a source for a downstream operator. So typically a Jaql program has to following structure, expressing a data processing graph:
source -> operator1(parameter) -> operator2(parameter) -> operator2(parameter) -> operator3(parameter) -> operator4(parameter) -> sink ; Most commonly for readability reasons Jaql programs are linebreaked after the arrow, as is also a common idiom in Twitter Scalding:
source -> operator1(parameter) -> operator2(parameter) -> operator2(parameter) -> operator3(parameter) -> operator4(parameter) -> sink ; Core operators
Source:[5]
Expand
Use the EXPAND expression to flatten nested arrays. This expression takes as input an array of nested arrays [[T]] and produces an output array [T], by promoting the elements of each nested array to the top-level output array.
Filter
Use the FILTER operator to filter away elements from the specified input array. This operator takes as input an array of elements of type T and outputs an array of the same type, retaining those elements for which a predicate evaluates to true. It is the Jaql equivalent of the SQL WHERE clause. Example:
data=[{name:"Jon Doe",income:20000,manager:false},{name:"Vince Wayne",income:32500,manager:false},{name:"Jane Dean",income:72000,manager:true},{name:"Alex Smith",income:25000,manager:false}];data->filter$.manager;[{"income":72000,"manager":true,"name":"Jane Dean"}]data->filter$.income<30000;[{"income":20000,"manager":false,"name":"Jon Doe"},{"income":25000,"manager":false,"name":"Alex Smith"}]Group
Use the GROUP expression to group one or more input arrays on a grouping key and applies an aggregate function per group.
Join
Use the JOIN operator to express a join between two or more input arrays. This operator supports multiple types of joins, including natural, left-outer, right-outer, and outer joins.
Sort
Use the SORT operator to sort an input by one or more fields.
Top
The TOP expression selects the first k elements of its input. If a comparator is provided, the output is semantically equivalent to sorting the input, then selecting the first k elements.
Transform
Use the TRANSFORM operator to realize a projection or to apply a function to all items of an output.
See also
ลิงก์ภายนอก
- นิยามของภาษา JAQL
- JAQL ใน Hadoop – บทนำโดยสังเขป
- การเปรียบเทียบกับ HIVE และ PIG เก็บถาวรเมื่อ 2014-03-01 ที่Wayback Machine
- การประมวลผลแบบปรับตัวของข้อมูลรวมที่ผู้ใช้กำหนดใน Jaql
สรุปเนื้อหา
ข้อมูลสำคัญจากบทความ
ข้อมูลสำคัญเกี่ยวกับ Jaql
Jaql (pronounced "jackal") is a functional data processing and query language most commonly used for JSON query processing on big data.
ลิงก์ภายนอก
นิยามของภาษา JAQL JAQL ใน Hadoop – บทนำโดยสังเขป การเปรียบเทียบกับ HIVE และ PIG เก็บถาวรเมื่อ 2014-03-01 ที่ Wayback Machine การประมวลผลแบบปรับตัวของข้อมูลรวมที่ผู้ใช้กำหนดใน Jaql ดึงข้อมูลมาจาก " https://en.wikipedia.org/w/index.php?