Scimore Blog
News from the Scimore factory

Indexing 500 millions log entries, and searching the Google way

February 29, 2012 11:52 by scimore
Massive logs are quite common at large enterprises. It is very much used for support, and is essential for running the business. Some industries e.g. finance services must retain activity logs for ever. This drives 2 problems: for supporters - make it easy access/search logs, for administrator: where to store it, and make it available on demand. Imagine a single point/hub, where supporter or developer can type the search query for each log type and get the response instantly. Avoiding logon to machine, copying files, and, manually searching, it's a big saver to have. And, it seems the market is currently booming on the topic.

We have been looking into the problem for a while, and finally, it seems we got it right. Our solution is to combine row-based engine (the traditional RDBMS) with columnar/freetext storage (CLucene) in a distributed environment. Row-based storage engine is ScimoreDB native format, utilizing B+Tree and linear scalability, capable of storing billions of rows at a rate of 500K rows/sec. Hence, Lucene indexing rate is slower (down to 15K per CPU core), which is expected, since it has very powerful query engine, and does more work when indexing; still, the speed is quite all right. To deal with the massive logs, we summarize to the 6 steps to follow to implement scalable log indexing and searching solution:

1. Normalize the Log database: in log table, store attribute Id's rather than value. Attribute values store on separate tables. Have a single index (primary clustered) per table. Remember, you are not going to search the log storage, the freetext index will be used for the search. The goal of normalization is to make as compact as possible the log storage to safe the disk and memory.

2. Use HDD and SSD as a stable storage: HDD to store database tables and SSD for freetext indexes. Freetext indexes is much more hungry for random/high disk IO's, while row-based engine will do much fewer IO's, and, it will be sequential/batched reads/writes making HDD good choice so far.

3. Freetext. Index as denormalized data views. Instead indexing attribute Id's, index the values. Denormalized view could be an UNION ALL of multiple JOIN'ed SQL statements. Following SQL example at the end of the article explains in details.

4. Freetext. Index the most recent data, and, drop no longer used freetext indexes. For example, you can create 2 freetext indexes: one for January 12, and, another for February 2012. When you no longer need searches for January log, drop it. If you will need it again, you can create a new freetext index, and, index January again.

5. Distribute database among CPU's or machines. For example, take a single box with Xeon, 16 cores. Then, create 15 scimoredb nodes, such, each CPU core, will service a single node. In such environment, the log data will be partitioned among 15 distinct nodes. And, when executing search, 15 CPU's will be searching local indexes in parallel. It will be 15 times faster, than a single server on the box. If you still need more speed, you can go further and add more machines to the database cluster.

6. Security. Users, who restricted to login to production server and read the log files or access the SQL database, must not be able to do so in log search solution. ScimoreDb supports windows integrated security, and can grant access to Log tables for the particular windows user or the AD group.

What could you do more with the logs? How about using them to answer BI queries, like the number of errors per day/hour, product application loggings per day per version and much more... To compare with the aditional OLAP, the multiple dimension tables can be replaced with a single freetext index that JOIN's with Log table (Fact).

A simple walk through example to cover installing, indexing and searching...
Tags:
Categories:
Actions: E-mail | Permalink | Comments (0) | Comment RSSRSS comment feed

ScimoreDB TPC-C benchmarks: comparing with Clustrix, MySQL

December 18, 2011 12:00 by Marius

Recently, I had a chance to evaluate ScimoreDB on a single box: running Windows 2008 Server R2, and hardware: CPU 64 cores, 256 GB RAM, 100 GB Intel SSD. The machine was kindly provided by Bayonette AS in Norway with their new cloud effort called Hydra.

First, I installed the database and run the load test that executes a simple "SELECT 1" SQL command. To my big surprise, the stress tool, running 60 threads, could not achieve more than 60K queries per second. The server utilized 6% of CPU, 200 Mbs network. As it seems, there was a plenty of power left, I assumed, I can start one more stress tool, and throughput should double. After starting the second stress tool, the CPU was 35%, network 200 Mbs, and, still the same - 60K queries / sec. Something just eat 29% of CPU! Going through performance counters revealed that it was spin locks, used by the critical sections. Even, when critical section locks as small as code block around a single IF statement - the 64 threads, each running on separate CPU, sooner or later will catch up each other and collide - leading to 1 thread issuing a heavy semaphore lock, while other 63 threads keeps spinning and wasting the CPU. I found a quite good article by VoltDB , describing why traditional databases have a difficulty to scale. According VoltDB, we could end up wasting more than 90% of resources for not database related tasks. This gave me an idea to try to pursue the VoltDB like path: instead of running one giant database per machine, run many distinct databases utilizing a fewer CPU's in a single distributed database cluster. To verify the theory, I have installed a single database using max 7 CPU's. This time, instead of executing SELECT 1, I execute a stored procedure that queries the table of 1 mln. rows on a primary key. A single stress tool executed 90K queries/sec using 8% of CPU. After launching additional stress tool, the throughput increased to 120K and CPU 11%. Now, everything looks right, since the CPU usage follows the throughput. Later, I discovered, it was network bottleneck, preventing to reach higher numbers.

So far, I have done the simple tests. But how about doing something more realistic? TPCC benchmark seems like a good candidate, especially we have it ready and other database vendors do it too. As I already learnt, I can't use a single ScimoreDb instance, instead, I install 8 instances (we call them nodes too) on the machine, and join them to the cluster. Each node configured to use 15GB of RAM and 7 CPU's. Next, I create TPCC distributed database, populate 400 warehouses and restart all nodes to anticipate the data cache. Finally, I run the TPCC test application and compare the results with MySQL and Clustrix TPCC workloads, created by Percona. The TPCC client application tests with 64, 256 and 1024 concurrent threads (user sessions). For each set of user sessions, I gather the number of new order transactions (NOT) executed in 10 seconds; the same way Percona does.

Throughput Comparison:
------------------------------------------------------------------
                Throughput per threads - NOT/10sec.
                 ----------------------------------------------------
System                 64   256   1024
------------------------------------------------------------------
Clustrix - 3 Nodes  2673 3298 3121
Clustrix - 6 Nodes  4583 7590 7022
Intel SSD Server    1880 1946 1798
HP / FusionIO       2608 2287 1554
ScimoreDB            3661 3850 3375
-----------------------------------------------------------------
While ScimoreDB performance for 1024 user sessions slightly drops, the overall results seems still quite all right, to compare with MySQL and Clustrix 3 nodes.

ScimoreDB can scale quite well on a large server using VoltDB approach. However, you need to be careful choosing right database setup. The rule of thumb, if machine has more than 16 cores, install multiple nodes, each utilizing 7 CPU's and join them to a cluster. The client application can connect to any ScimoreDB node, or, list the nodes in a connection string for load balancing purpose

I encourage everyone to try our TPCC benchmarks.


ScimoreDB version 4.0 released

October 25, 2011 11:17 by scimore

After 2 years of development, ScimoreDB version 4.0 is now officially released! New features enhance distributed cluster management, scalability and programmability.

ScimoreDB version 4.0 adds new scalability capabilities to meet new standards by growing NewSQL market; it includes significant enhancements in cluster management scaling reads and writes of large active datasets, monitoring, and self-healing. Now, scaling ScimoreDB is a matter of simple SQL command. Use SQL, to add or remove nodes, while cluster continues to be operational.

The ScimoreDB 4.0 adds new feature: the ability to prioritize the queries. It helps to deal with the issues like - one query kills all. The database dynamically prioritizes queries, and, when query becomes too aggressive, the engine will decrease ("throttle") the priority, preventing to capitalize the database power for a single query.

Database interoperability features improved schema and data importing from SQL Server, and, ScimoreDB platform independent .net provider targeting .NET 2.0 or .NET 4.0 frameworks.

The ScimoreDB is ideal for developers who meet scalability issues, especially when scale-up legacy RDBMS is no longer an option.

 


Tags:
Categories:
Actions: E-mail | Permalink | Comments (0) | Comment RSSRSS comment feed

YesSQL! keynote at Community Day 2011

June 2, 2011 16:19 by scimore

We gave a keynote on 26th of May, on the state of ScimoreDB v.4 together with a live database scalling demo. Check out embedded video below (running ~40 min., demo begins at 20 min.)

 

 

 


Tags:
Categories:
Actions: E-mail | Permalink | Comments (0) | Comment RSSRSS comment feed

Distributed Database v4.0 wiki released

March 29, 2011 16:45 by scimore

A year ago a few of us in Scimore were kicking around ideas for what features we should add into the next version of distributed database. The result is v4.0 Scimore distributed, a simple to use, fault tolerant, SQL, ACID compliant database. There are a bunch of reasons why we think it’s cool, but our favorite is how it helps people to store a large amount of data using SQL.

We launched a beta version with a wiki. To download database, see http://www.scimore.com/wiki and refer to installation page.

We'd love to know what you think.


Tags:
Categories:
Actions: E-mail | Permalink | Comments (0) | Comment RSSRSS comment feed

A step by step example of v4.0 distributed/elastic database

December 27, 2010 14:58 by scimore

ScimoreDB is shared nothing distributed database system. It can run 100s' of nodes, and, is highly available with toleration to the hardware failures. Database horizontally distributes data among partitions. Each partition has multiple data nodes (scimoredb.exe), where data is replicated within nodes in the partition. The partition is also called partition group to indicate there are group of the nodes in a partition. If one or more node(s) become unavailable, the remaining partition nodes will vote to take them offline in the cluster (as long as voters can make the majority).  

The limitations: cluster can have maximum 1024 nodes; the partition can have max 32 nodes. The maximum partitions are limited only by max nodes. So, if partitions have only 1 node, there can be max 1024 partitions per cluster. 

How partitioning works? The cluster horizontally partition rows according partition columns' hash value (partition columns set in CREATE TABLE command). The cluster uses hash range to identify the partition owning the row. The maximum clusters' hash range is [0, 1024[, therefore hash value is additionally modulus divided by 1024. Each partition group is responsible for the particular hash range. For example, a cluster with a single partition group, the partition hash range will be [0, 1024[. With 2 partition groups, the ranges are [0,512[ and [512, 1024[. To see partition hash ranges, execute SQL: select * from system.sysinstances, columns v_lower,v_upper.   

 Cluster management

In order for cluster to work at least 1 partition must be created. To create partition, install scimoredb node(s) and then join to the cluster using “CREATE CLUSTER” SQL command.  

Here is a step by step example:

 

1. Installing GUI and DB nodes:

 Download the latest 4.* version MSI. Download path: http://www.scimore.com/download/release/4.0/latest choose either x86 or x64 platform MSI to install. During installation in server configuration dialog set your machine name and port 999. After installing, on your machine will be installed: a single node (scimoredb.exe runs as service), administrator/manager GUI and .NET provider. Next, install 5 more scimoredb nodes. To do so , create ”c:\data\db1”,”c:\data\db2”, ”c:\data\db3”, ”c:\data\db4”, ”c:\data\db5” folders.  Then, in command prompt (NOTE: for win7/vista command prompt run as administrator) run following commands:  

scimoredb.exe -startup/instance=1 -startup/console=1 -startup/mode=1 -db/syslog="C:\data\db1" -db/systables="C:\data\db1" -db/data="C:\data\db1" -net/endpoint="localhost:1000" 

scimoredb.exe -startup/instance=2 -startup/console=1 -startup/mode=1 -db/syslog="C:\data\db2" -db/systables="C:\data\db2" -db/data="C:\data\db2" -net/endpoint="localhost:1001" 

scimoredb.exe -startup/instance=3 -startup/console=1 -startup/mode=1 -db/syslog="C:\data\db3" -db/systables="C:\data\db3" -db/data="C:\data\db3" -net/endpoint="localhost:1002" 

scimoredb.exe -startup/instance=4 -startup/console=1 -startup/mode=1 -db/syslog="C:\data\db4" -db/systables="C:\data\db4" -db/data="C:\data\db4" -net/endpoint="localhost:1003" 

scimoredb.exe -startup/instance=5 -startup/console=1 -startup/mode=1 -db/syslog="C:\data\db5" -db/systables="C:\data\db5" -db/data="C:\data\db5" -net/endpoint="localhost:1004" 

The commands will create additional db nodes on the local machine. The ”-net/endpoint="localhost:xxx" parameter indicates the db nodes' connection attributes. When installing node usually connection parameter is ”machineName:port”, such any node from any machine can access it. In the current example, we use ”localhost:xxx” since all cluster nodes will be running on a single machine (which is only for demo purposes). Additionally, parameter –db/cachePage=x sets how many pages (page size is 8kb) database will cache. Default is 3000, which is low for big tables.For more how to setup cluster node and startup parameters check: http://www.scimore.com/blog/post/2010/02/Distributed-database-Installation.aspx  Scimoredb.exe requires ”scimore.data.scimorenativeclient.dll” and ”scimoreagent.exe”, so, before installing node on other machine, copy 3 binaries together. Next, start the new nodes, by executing in command prompt: 

net start scimoredb-1

net start scimoredb-2

net start scimoredb-3

net start scimoredb-4

net start scimoredb-5

2. Creating cluster.

Open managers ‘query window connected to ”localhost”, port ”999”. The managers' query window may return errors (.. node is in stand-by mode..), ignore it, since the most SQL command will fail, if node is not joined to partition group. In query window execute SQL command: 

create cluster endpoint('localhost:999')

go;

add partition group     

add endpoint('localhost:999'),     

add endpoint('localhost:1000'));

go;

commit cluster; 

The command will create a cluster with a single partition containing 2 nodes. To verify cluster state execute: 

select * from system.sysinstances; 

,or, 

select * from system.syscluster; syscluster table is a logging table, that shows all cluster changes. It can also   be used to track failover actions, by checking the most recent cluster against previous version.

3. Altering cluster

Allows adding new partitions, add/remove nodes or splitting existing partition.  a. For example, in a single SQL command, we add 2 new nodes to existing partition and, add new partition with another 2 nodes. Execute following SQL: 

alter cluster;

go;

add partition group repartition with 0

    

add endpoint('localhost:1001')   

,add endpoint('localhost:1002'));

alter partition group 0(      

add endpoint('localhost:1003')    

,add endpoint('localhost:1004'));

commit cluster; 

”add partition group...” will create a new partition (new id 1). ”...repartition with 0” indicates that partition (0) will split data with the new partition (1), so partitions' (0) hash range will change. b. Splitting partition.

In the current example, the partition (0) manages 4 nodes ( can be seen: select * from system.sysinstances): 

localhost:999

localhost:1000

localhost:1003

localhost:1004 

We can split the partition (0) to 2 partitions. Where nodes:

localhost:999

localhost:1000 , remains on partition (0). 

and,

localhost:1003

localhost:1004

nodes will belong to the new partition (2). Note, the command is instant, no data movement is involved. Simply, the nodes will not ”see” rows that don’t belong anymore to the altered partitions' hash range. The SQL command: 

alter cluster;

split partition group 0

commit cluster; 

4. Dropping a node from partition.

To remove a node from the cluster, execute drop node SQL command: 

alter cluster;

alter partition group 0 (     

drop endpoint('localhost:1000'));

commit cluster; 

Drop the node from partition (0). The dropped node goes to stand-by mode and will shut down shortly. If node should be added to cluster again, full node reinstall   is required. 

Creating databases and tables.

To create database, tables read: http://www.scimore.com/doc2/Data_Definition.html 

New feature - allow relation partitioning. This is used when parent table is partitioned on one column and linked with child table(s) partitioned on parents' auto identity column.

En example, execute SQL:

create database test_hierarchy_dist;

go;

use test_hierarchy_dist;

go;

create table t3(name char not null primary key partition,id autobigint not null unique );

go;

create table t3_1(id bigint not null partition,info char,uid autobigint not null primary key,constraint sk foreign key (id) references t3(id));

go;

create table t3_1_1(id bigint not null partition,info2 char,uid autobigint not null primary key,constraint sk foreign key (id) references t3_1(uid)) 

In the example above, T3 table partitioned by the column <name>. The ”AutoBigInt” column <id>, generates auto unique values, the value is is unique and, additionally, it encodes <name> columns' hash value. Therefore, child table T3_1 will be as well partitioned by T3(name), even if it is set to partition by ID field which is linked with T3(ID) ”AutoBigInt” field. For example, execute script to populate rows:

use test_hierarchy_dist;

go;

declare @i int, @name char, @r1 bigint, @r2 bigint 

begin tran

set @i = 0

while @i < 10000

begin

set @name = concat('m',@i)

insert into t3(name) values(@name)

set @r1 = scope_identity()

insert into t3_1(id,info) values(@r1,concat('v',@r1))

set @r2 = scope_identity()

insert into t3_1_1(id,info2) values(@r2,concat('z',@r2))

insert into t3_1_1(id,info2) values(@r2,concat('z',@r2))

set @i = @i +1

end

commit 

Next, verify SQL:  

use test_hierarchy_dist;

go;

declare @total int

set @total =  select count(*) from t3 join t3_1 on (t3.id = t3_1.id) join t3_1_1 on (t3_1.uid = t3_1_1.id)

if @total <> 20000 THEN RAISE EXCEPTION 'test failed' 

Third, check if the following query will be executed only on a single node (the partition calculated by the WHERE clause name=value condition). 

use test_hierarchy_dist;

go;

select t3.* from t3 join t3_1 on (t3.id = t3_1.id) join t3_1_1 on (t3_1.uid = t3_1_1.id)where t3.name = 'm1' 

.NET provider connection string.

Client may connect to any node in a cluster. Here is a link about connection strings: http://www.scimore.com/howto/getstarted/connection-strings/  (skip embedded DB part).  To make a proper and scalable connection, we recommend combining all top nodes from all partitions in connection string. For example, for cluster: 

create cluster endpoint('localhost:999');

go;

add partition group(

add endpoint('localhost:999')

,add endpoint('localhost:1000'))

;go;

add partition group(

add endpoint('localhost:1001')

,add endpoint('localhost:1002'));

go;

add partition group(

add endpoint('localhost:1003')

,add endpoint('localhost:1004'));

go;

commit cluster 

The connection string should be:”server=localhost:999&localhost:1001&localhost:1003; database=mydb” 

The .NET client will load balance the connections over 3 nodes of 6 node cluster.   


Tags:
Categories:
Actions: E-mail | Permalink | Comments (0) | Comment RSSRSS comment feed

The new distributed version (beta) is on the way.

September 24, 2010 13:39 by scimore


We are now working on version 4.0 – focusing on the ScimoreDB Distributed.

The main goals is to make ScimoreDB

  • Build in fault tolerance, no more shared storage and clustering software required.
  • Continuously running, dynamically add/remove machines while staying online
  • Massive scalable (100+ machine clusters)
  • User controlled clustering topology, enabling designing clusters for extreme safety, high writes and/or high reads.

With the improvements we want to maintain our current:

  • ACID, to ensure the safety of your data.
  • SQL / T-SQL, to keep low learning entry barrier.
  • ADO.NET interface
  • Upgrade from embedded/server to distributed by simple backup/restore

We are very interested in feedback and are looking for people who will try the early stages.

Please send us an email at support@scimore.com and we will reply with the details.


Tags:
Categories:
Actions: E-mail | Permalink | Comments (0) | Comment RSSRSS comment feed

Support stored procedure OUTPUT parameters

May 19, 2010 12:22 by scimore

Since version 3.*.*.1933 stored procedures can return output parameters. Use optional OUTPUT keyword to indicate that the parameter is a return parameter. The OUTPUT variable must be defined during procedure creation as well as during use of the parameter.

Example using output parameters:

-- create procedures
create procedure _output_inner_test1 ( @myparam1 int,@myparam2 int output,@myparam3 int output)
as
begin
set @myparam2 = @myparam1 + 100
set @myparam3 = @myparam1 + 1000
end;

create procedure output_test1
as
begin
declare @in int = 100, @p1 int,@p2 int
execute _output_inner_test1 @in,@p1 OUTPUT, @p2 OUTPUT
if @p1 <> 200   RAISE EXCEPTION 'incorrect output'
if @p2 <> 1100 RAISE EXCEPTION 'incorrect output'
select @p1,@p2
end;

--run test
exec output_test1


Tags:
Categories:
Actions: E-mail | Permalink | Comments (0) | Comment RSSRSS comment feed

Recursive Queries with CTE (Common Table Expressions)

May 10, 2010 15:20 by scimore

Recursive Queries with CTE (Common Table Expressions)

Since 3.*.*.1927 db supports recursive CTE queries. Using recursive CTE it is possible to perform SQL queries on hierarchical data. The SQL syntax is identical to SQL Server and there are many articles about it. For example:

http://sqlblog.com/blogs/linchi_shea/archive/2009/04/16/recursive-sql-queries-how-do-they-work.aspx
http://msdn.microsoft.com/en-us/library/ms186243.aspx

Hierarchical relations can be expressed as a recursive single tables' join. For example, consider table:

create table objects
(
id int not null primary key,
name varchar,
parent_id int
);

And insert:

insert into objects(id,name,parent_id) values(1,'Cars',null);
  insert into objects(id,name,parent_id) values(2,'Porsche',1);
    insert into objects(id,name,parent_id) values(3,'911',2);
    insert into objects(id,name,parent_id) values(4,'Boxster',2);
    insert into objects(id,name,parent_id) values(5,'Cayman',2);
    insert into objects(id,name,parent_id) values(6,'Cayenne',2);
    insert into objects(id,name,parent_id) values(7,'Panamera',2);
  insert into objects(id,name,parent_id) values(8,'Aston Martin',1);
    insert into objects(id,name,parent_id) values(9,'DB7',8);
    insert into objects(id,name,parent_id) values(10,'DB9',8);
    insert into objects(id,name,parent_id) values(11,'Vantage',8);
    insert into objects(id,name,parent_id) values(12,'One',8);

Here we have created hierarchy using parent_id field that refers to the parent row: "Cars" is the root (no parent), 2 brands (Porsche and Aston Martin) belongs to "Cars" and each brand has a list of models. Now, define the recursive CTE SQL to read all porsche models:

WITH Cars(id,parent_id,name,path,level) AS
(
   --initialization. read porsche root
   SELECT id,parent_id,name,name,0 as level
   FROM objects
   WHERE id = 2 -- porsche root
   UNION ALL
   --recursive execution
   SELECT o.id,o.parent_id,o.name,concat(c.path,'/', o.name), c.level+1
   FROM cars c
      INNER JOIN objects o ON c.id = o.parent_id
)
select * from cars;

The result:
id    Parent_id  Name        Path                     Level
----------------------------------------------------------------------------------------
2                1  Porsche     Porsche                      0
3                2  911           Porsche/911               1
4                2  Boxster     Porsche/Boxster          1
5                2  Cayman    Porsche/Cayman         1
6                2  Cayenne   Porsche/Cayenne         1
7                2  Panamera Porsche/Panamera       1

Example using parameterized query and ordering:

Declare @id int
set @id = select id from objects where name = 'porsche';

WITH Cars(id,parent_id,name,path,level) AS
(
   --initialization. read porsche root
   SELECT id,parent_id,name,name,0 as level
   FROM objects
   WHERE id = @id
   UNION ALL
   --recursive execution
   SELECT o.id,o.parent_id,o.name,concat(c.path,'/', o.name), c.level+1
   FROM cars c
      INNER JOIN objects o ON c.id = o.parent_id
)
select * from cars order by name;


Tags:
Categories:
Actions: E-mail | Permalink | Comments (0) | Comment RSSRSS comment feed

Distributed database: Installation

February 4, 2010 14:05 by scimore

Hi,

My name is Marius Slyzius and I will be posting a series of the blobs about distributed database:

1. Installing cluster.
2. Create schema. Replicated/Partitioned tables. Execute distributed queries. DQL query plans.
3. Designing "true" shared-nothing schema. Pros/Cons
4. Setup failover/failback

Today is about how to install the cluster.

ScimoreDB distributed is a database where data is partitioned or replicated among X nodes. The node is a database server what was joined to a cluster by SQL command "ALTER CLUSTER". The number of nodes can be 2^n-1 = {3,7,15,31,63,...}. This number of nodes needed to form the binary tree, where a  node can send requests to 2 its childs and the childs to 2 their childs. For example we have cluster of 7 nodes and the client sends the query to node 4. In such case, the node 4 will send the query to childs 3,5 , and, each child will send to its childs 3->1,2 and 5 ->6,7. Everything happens in parallel. It is  similar to map/reduce, except that the node 4 don't need to send the request to all nodes, only to 2 its childs. The advantage of what is that we can do  reduce in parallel as well. For example, "select count(*) from mytable" sent by client to the node 4. Node 4 will start counting rows and sends SQL to 3 and 5 nodes.  Node 3 will start counting rows and pass to 1 and 2 nodes to count. Nodes 1 and 2 will count rows , but don't send sql anymore, since there are no childs.  The same will happen with node 5. Now, when node finished counting rows, the aggregation phase starts. In parallel, node 3 and 5 will receive the  counted rows from their childs and sum it with local row count. Each node did aggregation of local rows in parallel, and aggregation of aggregated data from  (1,2,6,7) on 3 and 5 in parallel too. Finally 4 will aggregate its counted rows with the aggregations from 3 and 5 nodes.

ScimoreDB uses DQL language to define how distributed query is executed. SQL is always converted to DQL. Before executing DQL on distributed DB it 
will be optimized for distributed query, for example, select rows only from a single node or all, add merge aggregators, stream splitters, etc...
To see DQL for SQL, press show query plan button in query window in the manager.

So, how to create distributed DB? First, you need to install db servers. I have created 4 batch files (to install 3 servers, uninstall 3 servers , start 3,and stop 3). All DB servers will be managed on a single PC, however, it could be 3 different machines preferably. The batch files located in "c:\db" folder together with scimoredb.exe (install ScimoreDB version 3 on machine and copy exe from there).

Install.cmd:
mkdir node1
mkdir node2
mkdir node3
scimoredb.exe -startup/instance=0 -startup/mode=1 -startup/console=1 -net/endpoint="localhost:999"    -db/cachepages=5000 -db/systables="c:\db\node1" -db/syslog="c:\db\node1"  -db/data="c:\db\node1"
scimoredb.exe -startup/instance=1 -startup/mode=1 -startup/console=1 -net/endpoint="localhost:1000"  -db/cachepages=5000 -db/systables="c:\db\node2" -db/syslog="c:\db\node2"  -db/data="c:\db\node2"
scimoredb.exe -startup/instance=2 -startup/mode=1 -startup/console=1 -net/endpoint="localhost:1001"  -db/cachepages=5000 -db/systables="c:\db\node3" -db/syslog="c:\db\node3"  -db/data="c:\db\node3"

Parameters:
-startup/instance={0,1,2} - instance id is the registry key to separate startup parameters of each DB on the same pc. HKEY_LOCALMACHINE\Software\Scimore\Nodes\[0] ... [2]. Also, instance id is added to the end of DB service name. {scimoredb-0,scimoredb-1,scimoredb-2}. Note, instance id's has nothing to do with the nodes in the cluster. If you install databases on distinct pc's set instance=0 for all them.

-net/endpoint="localhost:999-1001" - for single DB server, used only port the database listens. "localhost" has no meaning yet. Database listens on all IP addresses. However, when you join databases to the cluster, each database will become node and connect to each using  net/endpoint host:port value. In my case all 3 runs on the same machine, so it’s fine. But, if nodes are on different machine you need to specify correct either IP:port or host:port. Use "host:port" if IP might changes. Using "IP:port" you can specify different IP for comm between nodes and client/node (f.x. infinity band over socket network) for faster communication between nodes than clients to node(s). If the node can't be accessed anymore you need to update [endpoint] value for this node on all nodes. The information is stored in "system.sysinstances" table(nothing to do with -startup/instance). Again, if at least one node inaccessible you will not be able to connect and update the table with new values. To resolve it, start each node with the following parameters (first kill/stop node) and from cmd:
>scimoredb.exe -startup/instance=0 -startup/console=1 -net/mode=1 -net/endpoint="localhost:999"
When node starts it will ignore cluster setup because of net/mode=1 parameter and start single server listening on 999 port. Then, connect with the manager and update table sysinstances in system catalog with correct IP/Host/Port values: update system.systables set [endpoint] = "newIP:port" where instanceid=x.  Here, " instanceid" is not instance we used to install, but it is the node ID. Repeat it for each node. When done, restart services and you should be able to connect again.

-db/cachepages=5000 - number of the data pages to cache. Each page is 8kb, more you add better it is. The value can be changed in the registry, followed by service restart.

Uninstall.cmd:
scimoredb.exe -startup/instance=0 -startup/mode=2 -startup/console=1
scimoredb.exe -startup/instance=1 -startup/mode=2 -startup/console=1
scimoredb.exe -startup/instance=2 -startup/mode=2 -startup/console=1

-startup/mode=2 - tells to uninstall and delete data files. Startup/mode=20 will uninstall, but leave data files. Note: systables will contain cluster info and later if you install on top new instance without data overwrite, you won't be able to connect to db, because it will "think" it is in the cluster and try to access to other nodes listed in sysinstances table.

Start.cmd: start 3 databases on local pc.
net start scimoredb-0
net start scimoredb-1
net start scimoredb-2

Stop.cmd:
net stop scimoredb-0
net stop scimoredb-1
net stop scimoredb-2

Lets make a cluster now:
1. Stop scimoredb-0 service. Run uninstall.cmd. Check if scimoredb.exe process is not running, if it does kill it and run script again. Because, I have installed scimoredb with MSI and it did installed scimoredb-0 (0-instance), so I uninstall it. I could just create new instances {1,2,and 3} and leave 0 running.
2. Run install.cmd. The script will create 3 databases and each database will place system tables under Node1/2/3 folder. Then install finished, verify if NodeX contains files and if services scimoredb-{0,1,2} created.
3. Run start.cmd. Check if 3 scimoredb.exe processes running. Start manager and verify the connection to each database.
4. With manager connect to localhost/ port 999(database 0 on my pc) and in query window execute SQL command:
alter cluster
(
add endpoint 'localhost:1000',
add endpoint 'localhost:1001'
)
NOTE: make sure endpoint 'localhost:1000' matches endpoint value used when installed databases (-net/endpoint="localhost:999...).

Restart the manager. Connect to any database. Under "Cluster" tree node in the manager you will see 3 servers for each database/node you connected. Your 3 databases now in a cluster and you are ready to create databases/tables.

If you have trouble to create cluster, ideas/suggestion please don't hesitate to contact me:

m a r i u s     a t     s c i m o r e     d o t     c o m