Contents |
Every executing SQL query will be converted to internal query execution plan, expressed as Scimores' DQL language. DQL (Distributed Query Language) is a near-intuitive language that provides experienced developers with a very powerful tool for maximizing the efficiency of database updates or queries. Less experienced or novice developers will also appreciate that it's easy to learn and quick to get started with. SQL (Structured Query Language) defines the methods used to create and manipulate relational databases so that queries can be performed, but DQL gives developers the power to decide how these queries are best performed. This can greatly improve the efficiency of performing distributed queries involving databases hosted on multiple machines or, for example, when tables are opened on one machine but sorted on others. Such scenarios are perfectly normal in today's world of web-orientated database-driven applications, and judicious use of DQL can dramatically improve their performance.
DQL is a pipeline-based query language. "Pipelining" means that multiple inputs (or processes) are transparently combined to produce one output or result. In this context, a process is a C function or class identified by a name, for example, "Join".
In order to execute DQL using SQL interface, use EXECUTE DQL command:
EXECUTE dql BEGIN ...place here DQL text.... END
DQL is essentially a set of processes which have different functions to produce results, and which can be combined according to one or more query definitions.
A process represents a relatively complex communication between standard well-defined classes that control process behaviour. Almost all classes are hierarchically related to each other, but it is the top-level classes that really define the output. All classes communicate internally through a standard representation of a communication layer.
DQL includes an industry standard left-to-right bottom-up parser that conforms to standard practice. It takes full advantage of plug-in technology, which makes it easy to distribute newly developed functions/processes throughout an enterprise, and to develop more advanced functions. DQL syntax descriptions rely on the following notation:
| Token | Production |
|---|---|
| batch | command * command batch; |
| command | function * function to * function operators * function to operators |
| command | command on |
| function | text funcparams; |
| funcvalue | handle * constant; |
| funcparams | funcvalue * funcvalue ',' funcparams; |
| on | 'on' constant; |
| to | 'to' handle; |
| handle | '@' text; |
| operators | operator * operator operators; |
| operator | filter * project * restrict * index * with * 'set' set; |
| with | 'with' conjunction * 'with' order * 'with' constant; |
| set | condition * condition ',' set; |
| filter | 'filter' 'by' disjunction; |
| project | 'project' 'by' projectlist; |
| restrict | 'restrict' 'to' projectlist; |
| index | 'index' 'by' number; |
| disjunction | conjunction * conjunction 'or' disjunction; |
| conjunction | condition * condition 'and' conjunction * '(' conjunction ')'; |
| condition | column comp column; |
| projectlist | column * column ',' projectlist; |
| column | handle '.' text * constant * parameter; |
| constant | '\ text '\; |
| parameter | '?' text; |
| order | column ordertype * column ordertype ',' order; |
| ordertype | ASC * DESC |
| comp | '=' * '!=' * '<' * '<=' * '>' * '>='; |
The process supports several predefined process operators. Process behaviour is controlled by one or more operators, which in DQL are called Process Operators. These are provided from DQL process definitions and made available during the process initialisation phase.
[[file::operators.jpg]]
If you wish to extend the functionality of the process, you can add more defined operators, but this does require detailed knowledge of writing C++ functions. The DQL provides operators for modify, query records:
| Operator | Function |
|---|---|
| project by (projectlist*) | The project by operator is used to list the attributes desired in the output stream of a query. The attributes can be fields of any input/output streams, functions or constants. More simply, "project by list of columns to return to output" (like select colx ... in SQL). |
| restrict to start, number of rows | Restrict allows you to limit output to a specific number of records. If query performs sorting, the "restrict to" operator can have very dramatic effect on query cost. Judicious selection of good parameters can significantly reduce the time taken to process queries. The “restrict to” operator is convenient to use for navigational queries to retrieve desired window of records of processes output stream. Consider the results of search page. First time user execute query, page retrieve [0,10] records of 1000 found, second ]10,20] and so on. The goal is to avoid transfer records not relevant to user. |
| filter by (disjunction*) | Filter rows according to conditions |
| with (conjunction*) | Seek matching rows using index , the index id must be defined using <index by> operator. Example: with myfield > 0 |
| index by number | optimizer hint that index to use. Identified by index id (sysindexes(id)) + 100. |
| set (set*) | Modify fields in a stream |
| ON 'partition group id' | Identify the partition group by (sysinstances). The partition group set by distributed query optimizer. When executing DQL from SQL interface, skip declaration. |
Please note an asterisk corresponds to DQL notation of detailed syntax.
DQL Server highlights the following Data Manipulation Language (DML) topics.
DQL Server supports the following data modification statements.
To insert data into a table, we either specify a record to be inserted or an input stream whose result set of records to be inserted. The syntax of the “insert” process, which cause data values to be inserted as one row:
| Statement | Function |
|---|---|
| insert 'path' | Specify the path of DQL table to insert a single row. |
| set (set*) | Assign values to the collection of all columns of the table. |
The following 'insert' process inserts multiple records into a table:
| Statement | Function |
|---|---|
| insert_bulk @T1,'path' | More generally, we might want to insert records based on the result set of an input stream @T1. |
| set @this.FieldName1 = @T1.Field1, …, @this.FieldNameN = @T1.Field2 | Value to be assigned to column can be either field of input stream, constant or function expression. |
Example1
Inserting a single row to the CUSTOMER table, database MYDB:
EXECUTE dql BEGIN INSERT 'mydb.customer' SET @this.Name = 'Marius', @this.LastName = 'Slyzius', @this.Birthdate = '19/01/1972' END
Example2
Inserting a set of rows (bulk insert).
EXECUTE dql BEGIN open_table 'mydb.account' TO @T1 insert_bulk @T1, 'mydb.customer' SET @this.Name = @T1.Name, @this.FullName = concat(@T1.Name,'-', @T1.LastName), @this.Birthdate = '19/01/1972' END
For each row of ouput @T1, new row will be inserted into mydb.customer table with the values according set operator.
The 'delete' process is expressed in the much same way as an open_table. Except, all successor records will be deleted instead pushed to an output stream. In addition, the 'delete' process can accept an input stream(s) that references the table and a result set which records are to be deleted. In this case, we can delete records that match join condition with other table and to perform multitable deletion in single operation as well. The simple 'delete' clause:
| Required | Statement | Function |
|---|---|---|
| [yes] | delete 'table path' | Specify internal table to delete row(s). |
| [no] | with (conjunction*) | Delete rows using index search. |
| [no] | filter by (disjunction*) | Search condition is table scan |
| [no] | restrict to 'start', 'size' | Limit the number of the rows to update. |
Another approach of 'delete' process:
| Statement | Function |
|---|---|
delete_view @T1, ..., @TN
oid @T1.[primarykeyfield1],..,@T1.[primarykeyfieldN],'mydb.mytable1' oid @TN.[primarykeyfield1],..,@TN.[primarykeyfieldK],'mydb.mytableN' | Deletes from single or multiple tables. [OID] operator references each deleting table. The operators' fields must define tables' be primary key value(s). |
Example1 Delete first 100 customers who has born between ]1940, 1960[:
DELETE 'mydb.customer' ON 'marius2k' WITH @this.Birthdate < '01/01/1960' AND @this.Birthdate > '01/01/1940' RESTRICT TO '0', '100'
Example2
Delete rows from 2 tables matching criteria: orders.product_id = products.product_id
SQL:
DELETE FROM products, orders WHERE orders.product_id = products.product_id
DQL:
EXECUTE dql BEGIN open_table 'mydb.orders(xlock_row)' TO @TT1 INDEX BY '100' project BY @this.[order_id],@this.[product_id],@this.[##rid] open_table 'mydb.products(xlock_row)' TO @TT2 INDEX BY '100' join_inner 'TT1','TT2' TO @TT3 WITH @TT1.[product_id]=@TT2.[product_id] FILTER BY ( @TT1.[product_id]=@TT2.[product_id] ) project BY @TT1.[order_id],@TT1.[product_id],@TT1.[##rid],@TT2.[product_id] AS u1, @TT2.[##rid] AS u2 delete_view 'TT3' oid @TT3.[u1],@TT3.[u2],'mydb.products' oid @TT3.[order_id],@TT3.[##rid],'mydb.orders' END
You must specify magic field [##rid] , that is RID. Because of MVCC, row id's are used to identify the row version.
In certain situation, we may wish to change value of the column in a record. For this purpose the update process can be used. We can choose the records to be updated using conditional operators 'with' and/or 'filter by' applied to the process. The syntax of the update process is:
| Required | Statement | Function |
|---|---|---|
| [yes] | update 'table path' | Specify internal table to update rows with new values set by 'set' operator. |
| [yes] | set @this.Field1='value1', ..., @this.FieldN='valueK' | Assign new values to the fields we are updating. Value to be assigned to a column can be either any field of record we updating, constant or function expression. |
| [no] | with (conjunction*) | search rows by index |
| [no] | filter by (disjunction*) | search rows by table scan, or within range returned by [with] operator. |
| [no] | restrict to 'start', 'size' | Limit the number of the rows to update. More... |
The 'update_view' process can modify multiple pipelined input records in the same manner like 'delete_view' process. Syntax of 'update_view' process is:
| Statement | Function |
|---|---|
update_view @T1, ..., @TN
oid @T1.[primarykeyfield1],..,@T1.[primarykeyfieldN],'mydb.mytable1' oid @TN.[primarykeyfield1],..,@TN.[primarykeyfieldK],'mydb.mytableN' | The rules for input stream(s) appears to be the same as for delete_view process. [OID] operators define the updating tables. |
| set @T1.Field1 = 'value1', ..., @TN.FieldN = @T1.Field1 | Assign new values to the fields we are updating on any input stream. Value to be assigned to a column can be either any field of any input stream record we updating, constant or function expression. |
Example
Update [products] and [orders] tables matching [orders.product_id = products.product_id] criteria.
SQL:
UPDATE products, orders SET products.name = 'new name', orders.ammount = 0 WHERE orders.product_id = products.product_id
DQL:
EXECUTE dql BEGIN open_table 'mydb.orders(xlock_row)' TO @TT1 INDEX BY '100' project BY @this.[order_id],@this.[product_id],@this.[ammount],@this.[##rid] open_table 'mydb.products(xlock_row)' TO @TT2 INDEX BY '100' join_inner 'TT1','TT2' TO @TT3 WITH @TT1.[product_id]=@TT2.[product_id] project BY @TT1.[order_id],@TT1.[product_id],@TT1.[ammount],@TT1.[##rid],@TT2.[product_id] AS u1,@TT2.[name],@TT2.[##rid] AS u2 update_view 'TT3' SET @TT2.[name]='new name', @TT1.[ammount]=0 oid @TT3.[u1],@TT3.[u2],'mydb.products','TT2' oid @TT3.[order_id],@TT3.[##rid],'mydb.orders','TT1' END
To retrieve single or multiple rows from the table use 'open table' process:
| Required | Statement | Function |
|---|---|---|
| [yes] | open_table 'path' to @myoutput | This opens a specified internal table to retrieve the selected output. The selected records are pushed to output stream, named '@myoutput'. Thus, the use of @myoutput.FieldName would indicate a reference to value of column used by operators defined within process definition or by adjacent pipeline process pair. The result will be sent to the client, when output stream name is @Results. |
| [no] with (conjunction*) | The 'with' operator corresponds to the retrieval of matching records to output stream by using index. | |
| [no] | filter by (disjunction*) | Apply filter condition. When 'with' operator is skipped, the full table scan is performed to push to output stream records thus satisfying the filter's selection criteria, causing the query execution cost to be quite costly. In practice combination of 'with' and 'filter by' operators indicate much better performance than use only single filter operator. |
| [no] | restrict to 'start', 'size' | Restrict output to desired window of records of processes output stream. By skipping this operator all records will be returned. |
| [no] | project by (projectlist*) | Use this operator to build result set that contain only fields you want. Field names can be repeated or omitted. There are performance benefits to excluding field names. This is especially true if only a few columns are needed from a table with a large number of fields. If 'project by' operator has not been declared within process, all fields of table are included into output stream. |
Examples: Consider the following collection of a table and indexes: table: {Id, Name, LastName, Birthdate, Occupation}, located at 'd:\memodb\data\customers.dat' with indexes: primary index1: {Id} secondary index2: {Name} secondary index3: {Birthdate}
Example1 Similar to the SQL statement: select * from mydb.customers, that display all attributes and records of the table:
EXECUTE dql BEGIN open_table 'mydb.customers' TO @Results END
Example 2 Specifies filter criteria which must meet for rows to be included in the query results. A range search returns all records where birthday between [1972, 1980]. Note: the full table scan is performed even if there is an appropriate secondary index:
EXECUTE dql BEGIN open_table 'mydb.customers' TO @Results FILTER BY @this.Birthdate < '01/01/1980' AND @this.Birthdate > '01/01/1972' project BY @this.Name, @this.LastName END
Example 3
Demonstrates similar to (example 2) a range query by using secondary index on {BirthDate}:
EXECUTE dql BEGIN open_table 'mydb.customers' TO @Results WITH @this.Birthdate < '01/01/1980' AND @this.Birthdate > '01/01/1972' INDEX BY 102 project BY @this.Name, @this.LastName END
Example 4
Limit query output to 10 rows, starting from 5th row:
EXECUTE dql BEGIN open_table 'mydb.customers' TO @Results WITH @this.Birthdate < '01/01/1980' AND @this.Birthdate > '01/01/1972' INDEX BY 102 project BY @this.Name, @this.LastName RESTRICT TO '5', '10' END
The DQL offers the user some control over the order in which records in the process output are displayed. The 'sort' process causes the rows of an input stream to appear in sorted order on the processes output stream.
| Required | Statement | Function |
|---|---|---|
| [yes] | sort @T1 to @output | Sort the input stream @T1 to temporary stream @output. |
| [yes] | with (order*) | The 'with' operator enlist the attributes to sort of an input's stream. To specify the sort order, we may specify DESC for descending order or ASC for ascending order. Furthermore, ordering can be performed on multiple attributes. |
| [no] | restrict to 'start', 'size' | Restrict output to desired window of records of processes output stream. By skipping this operator all records will be returned. Note: that the 'restrict to' operator can have very dramatic effects on query optimisation. Judicious selection of good parameters can very significantly reduce the time taken to sort input stream. |
Sort Examples
Consider the table: customer on {Name, LastName, Birthdate}.
Example1 Suppose we wish to list entire customer table in ascend order of Name.
EXECUTE dql BEGIN open_table 'mydb.customer' TO @T1 project BY @this.Name, @this.LastName, @this.Birthdate sort @T1 TO @Results WITH @T1.Name ASC END
Example 2
If several customers have the same Name, we order them in descending order by LastName.
EXECUTE dql BEGIN open_table 'mydb.customer' TO @T1 project BY @this.Name, @this.LastName, @this.Birthdate sort @T1 TO @Results WITH @T1.Name ASC, @T1.LastName DESC END
The 'group_sorted' process causes the rows of an input stream to appear in the groups into which output rows are to be placed and calculates a summary value of an aggregation function for each group. The group attribute is a list of columns on which grouping is performed. In addition, 'group_sorted' process expects the input stream to be sorted on a grouping attribute(s).
| Required | Statement | Function |
|---|---|---|
| [yes] | group_sorted @T1 to @T2 | Group the input stream @T1 by simple sorting approach - we receive stream @T1 sorted on the grouping attribute and then scan it to compute the result of each aggregate operation for each step and then push it to output stream @T2. |
| [yes] | project by (projectlist*) | The 'project by' operator declares grouping attribute(s) and aggregate function(s) to be computed for each group. The grouping attribute may contain multiple columns, functions or constants. Once again, @T1 input stream MUST be sorted by grouping attribute(s). The sort order can be any combination of ascending or descending orders of grouping attributes. |
| [no] | filter by (disjunction*) | At times, it is useful to state a condition that applies to groups rather than to records. To express such a query, we use the 'filter by' operator (like in SQL - having by clause). Predicates in the 'filter by' operator are applied after the formation of groups, so results of aggregation functions may be used to validate if groups can appear into output stream. |
| [no] | restrict to 'start', 'size' | Restrict output to desired window of records of processes output stream. By skipping this operator all records will be returned. |
Aggregate Examples Consider the table: customer on {Name, LastName, Birthdate} with primary index on {Name}.
Example1 At times, we wish to treat entire table as a single group. In such cases, we do not use a group_sorted process. The following query retrieves count of rows of the customer's table:
EXECUTE dql BEGIN open_table 'mydb.customer' TO @Results project BY COUNT() END
Example2
Consider the query (Fetch all distinct names of customers and the count of customers share that name):
EXECUTE dql BEGIN open_table 'mydb.customer.dat' TO @T1 group_sorted @T1 TO @Results project BY @T1.Name, COUNT('*') END
In the example above the grouping attribute is a column Name. As the rows are already clustered by Name, we can skip heavy sort process and immediately group @T1 stream rows. Sometimes, when table contains large number of records and group queries are often, you can benefit query performance by choosing a primary index on grouping attribute(s) and therefore skipping temporary sort.
Example3 Consider another query (Fetch all distinct names of customers, shared it more than once):
EXECUTE dql BEGIN open_table 'mydb.customer.dat' TO @T1 group_sorted @T1 TO @Results project BY @T1.Name, COUNT('*') AS total FILTER BY @this.total > 1 END
The total attribute is a reference to value of a count aggregate function per each group. The behavior is similiar to SQL- HAVING BY clause.
Example 4
Consider the query (Fetch all distinct birth dates of all customers and the number of customers share it):
EXECUTE dql BEGIN open_table 'mydb.customer' TO @T1 project BY @this. Birthdate sort @T1 TO @T2 WITH @T1.Birthdate ASC group_sorted @T2 TO @Results project BY @T1.Birthdate, COUNT('*') END
In the example above the grouping attribute is a column Birthdate. The @T1 stream rows are sorted by Name, but the group process expects rows to be sorted by Birthdate, therefore, in addition, we sort @T1 stream to temporary stream @T2 by the column Birthdate, and then group it.
The 'union' process combines multiple input streams into a single output stream. It is the simplest union operation without eliminating any duplicates. The number of records in output stream is equal to sum of number records in each input stream.
| Required | Statement | Function |
|---|---|---|
| [yes] | union @T1, .., @T(N) to @output | There can be up to 8 input streams for 'union' process where they will be combined by simple approach - read all rows from first input until reached end, then from the second input stream and so on. Also if some records are identical, they will not be removed by 'union' process and, therefore, we have to know in advance that all records are distinct in order to return correct result set.
Note: the 'project by' operator is missing, but it is used internally and constructed by inheriting all attributes from the first input's 'project by' operator. Therefore, all input streams must have equal number of attributes and of the same data type according to their positions in the stream |
| [no] | restrict to 'start', 'size' | Restrict output to desired window of records of processes output stream. By skipping this operator all records will be returned. |
Union Examples
The SQL query:SELECT * FROM customer WHERE Name IN ('Marius', 'Peter')
in the DQL semantic:
EXECUTE dql BEGIN open_table 'mydb.customer' TO @T1 WITH @this.Name = 'Marius' project BY @this.Name, @this.LastName, @this.Birthdate open_table 'mydb.customer' TO @T2 WITH @this.Name = 'Peter' project BY @this.Name, @this.LastName, @this.Birthdate UNION @T1, @T2 TO @Results END
Both @T1 and @T2 fetch matching rows by using primary B+Tree index on {Name}. When they deliver result set, we combine them into single output @Results, knowing, in advance, that all records are distinct.
The join process dedicated to combine the contents of two tables and produce a result set that incorporates rows and columns from each table. Tables are typically joined using data that they have in common, by, so called, a join condition. The comparison operator in a join condition determines how the two sides of the condition are evaluated and which records returned. The most common comparison operator is equivalence (=), however, range (<, >) operators can be part of join condition too.
The DQL supports a set of join processes, suitable for a specific type of query:
If an index is available on the inner loop's join condition, 'join_inner' process can be used.
| Required | Statement | Function |
|---|---|---|
| [yes] |
| The left-hand stream @T1 must be pipelined and for each record in @T1, the index is used to look up rows in right-hand stream @T2 that will satisfy the join condition. In case of 'join_outer', if the matching rows not found in @T2, the null values for @T2 will be projected. |
| [yes] | with (conjunction*) | The 'with' operator corresponds to the join condition of 'join_inner' process. As record is received for the left-hand side of the join, it is used to create the index key value for the right-hand side stream @T2 and to fetch matching join condition rows by using index on @T2. The join condition right-hand side attributes must satisfy index selection rules and use only AND statements, exactly like in the open_table process. |
| [no] | filter by (disjunction*) | For each matched pair records of @T1 and @T2, the 'filter by' operator evaluates attributes of both streams if they satisfy filter condition. Any attribute of both streams can be a part of filter's condition. |
| [no] | restrict to 'start', 'size' | Restrict output to desired window of records of processes output stream. By skipping this operator all records will be returned. |
| [yes] | project by (projectlist*) Use this operator to build result set @output that combine any attributes of both inputs. |
Inner join example
Indexed Nested-Loop Join (using index, the most common join algorithm):
Consider two tables:
customer {Name, LastName, accId}, with two indexes on :{accId} and {Name}
account {accId, balance}, with index on {accId}
For SQL query:
SELECT c.Name, c.LastName, c.accId, a.balance FROM customer c, account a WHERE c.Name='Marius' AND a.balance < '2500' AND c.accId = a.accId
DQL:
EXECUTE dql BEGIN open_table 'mydb.customer' TO @T1 WITH @this.Name = 'Marius' open_table 'mydb.account' TO @T2 INDEX BY 100 join_inner @T1, @T2 TO @Results WITH @T2.accId = @T1.accId FILTER BY @T2.balance < '2500' project BY @T1.Name, @T1.LastName, @T1.accId, @T2.balance END
Note: the (@T2.balance < '2500') condition cannot be declared within with operator, otherwise it will violate index selection rules for primary key. Also NEVER declare filter operator within right-hand stream process (@T2), always use filter condition with join_inner process.
The merge join requires that both inputs be sorted on the merge columns, which are defined by the equality clauses of the join condition. Because each input is sorted, the Merge Join operator gets a row from each input and compares them. For example, for inner join operations, the rows are returned if they are equal. If they are not equal, whichever row has the lower value is discarded and another row is obtained from that input. This process repeats until all rows have been processed. Merge join itself is very fast, but it can be an expensive choice if explicit sorting is required. However, if the inner input is large and they are sorted by join columns, merge join is often the fastest available join algorithm.
| Required | Statement | Function |
|---|---|---|
| [yes] | join_merge @T1, @T2 to @output | Both @T1 and @T2 inputs must be sorted by join fields. |
| [yes] | with (conjunction*) Define the join condition. Only (=) operator is permitted. | |
| [no] | filter by (disjunction*) | For each matched pair records of @T1 and @T2, the 'filter by' operator evaluates attributes of both streams if they satisfy filter condition. Any attribute of both streams can be a part of filter's condition. |
| [no] restrict to 'start', 'size' Restrict output to desired window of records of processes output stream. By skipping this operator all records will be returned. | ||
| [yes] | project by (projectlist*) | Use this operator to build result set @output that combine any attributes of both inputs. |
Merge join example
Since both inputs are sorted by join column {accId}, we can rewrite DQL to use merge join process instead.
EXECUTE dql BEGIN open_table 'mydb.customer' TO @T1 WITH @this.Name = 'Marius' open_table 'mydb.account' TO @T2 join_merge @T1, @T2 TO @Results WITH @T2.accId = @T1.accId FILTER BY @T2.balance < '2500' project BY @T1.Name, @T1.LastName, @T1.accId, @T2.balance END