Guide - SQL Editor Commands
Introduction
The SQL user guide details the specific SQL syntax required to build the virtual data marts / warehouses queried through the Network Data
Platform.
The traditional approaches to data integration involve ETL or ELT processes, which incur side effects such as data duplication and extra
processing time. Before an enterprise can commence analytical processing, the data structure needs to be harmonised, data types resolved,
and relational modelling performed to resolve entities and relationships across the enterprise.
This approach is valid only for building limited, static data sources. Today, large amounts of data from heterogeneous data processing systems
need real-time processing, which makes this approach slow, expensive, and unresponsive to time-critical business needs.
The Zetaris Network Data Platform is developed to address the needs of the new world of big-fast data. Instead of running analytics after physical
data ingestion has occurred, the Zetaris Cloud Data Fabric ingests only metadata for the underlying data sources into its Schema Store. The
Zetaris Cloud Data Fabric provides various commands to ingest or manage this metadata where it can be arranged into virtual data structures
such as data warehouses or data marts in minutes.
This guide helps you to explore the data virtualization process, learn how to run queries in the Data Fabric, and improve query performance. Most
of the commands in this guide can be executed through the Data Platform interface.
Contents
Introduction
Contents
Building a Data Fabric
Register master data source
Add slave nodes for the registered data source (Option for a cluster-based database)
RDBMS Examples
MS SQL Server
My SQL
IBM DB2
Green Plum
Teradata
Amazon Aurora
Amazon Redshift
Register NoSQL data sources
Create Zetaris Cloud Data Fabric database for flat files in a file store (AWS S3, Azure Blob, local file system)
Ingest Metadata
Ingest all tables from the data source
Ingest a table from the data source
Update Schema
Ingest flat files in the file store
Ingest a RESTful service
Update description and materialised table for each relation in a data source
List the tables from the data source
Manage Schema Store
Data Source
Tables
View
Create data source view
Create schema store view
Delete view
Run Query
Materialisation and Cache
Statistics
Table-level statistics
Column-level statistics
Partitioning
User Management
Role-Based Access Control
Privileges
Predefined Roles
Building a Data Fabric
To build a data fabric, you need to identify the database to be connected. The Zetaris Cloud Data Fabric supports all known data source types,
such as RDBMS, NoSQL, REST APIs, CSV, and JSON.
Register master data source
Ensure that you provide the JDBC driver class, URL and connectivity credentials, and database parameters to register a master data source.
CREATE DATASOURCE ORCL [DESCRIBE BY "Oracle for Product Master"]
OPTIONS (
jdbcdriver "oracle.jdbc.OracleDriver",
jdbcurl "jdbc:oracle:thin:@oracle-master:1521:orcl",
username "scott",
password "tiger",
[schema "system",]
[schema_prepended_table "true",]
[key "value"]*)
Add slave nodes for the registered data source (Option for a cluster-based database)
You can register slave nodes if the registered database supports cluster-based computing such as Massively Parallel Processing (MPP). This
helps the Zetaris Network Data Platform to query slave nodes directly rather than running through a master node.
The following example shows the registration of one master node (oracle-master), and three slave nodes (oracle-slave1, oracle-slave2, and
oracle-slave3).
CREATE DATASOURCE FUSIONDB DESCRIBE BY "Zetaris MPP " OPTIONS (
jdbcdriver "org.postgresql.Driver",
jdbcurl "jdbc:postgresql://coordinator:5432/pgrs",
username "admin",
password "password")
ADD SLAVE DATASOURCE TO FUSIONDB OPTIONS (
jdbcdriver "org.postgresql.Driver",
jdbcurl "jdbc:postgresql://datanode1:5432/pgrs",
username "admin",
password "password")
ADD SLAVE DATASOURCE TO FUSIONDB OPTIONS (
jdbcdriver "org.postgresql.Driver",
jdbcurl "jdbc:postgresql://datanode2:5432/pgrs",
username "admin",
password "password")
ADD SLAVE DATASOURCE TO FUSIONDB OPTIONS (
jdbcdriver "org.postgresql.Driver",
jdbcurl "jdbc:postgresql://datanode3:5432/pgrs",
username "admin",
password "password")
To differentiate between table names across schemas, set schema_prepended_table to "true"; the ingested table is then registered under its
schema name.
If a schema is provided, the Zetaris Cloud Data Fabric ingests only the metadata from that schema.
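The effect of schema_prepended_table on ingested names can be illustrated with a toy sketch; the exact separator used by the platform is an assumption here:

```python
def registered_name(schema, table, schema_prepended_table=False):
    """Toy sketch: the name a table gets in the schema store. With
    schema_prepended_table "true", the schema name is prepended so that
    same-named tables from different schemas stay distinct. The dot
    separator is an assumption, not confirmed platform behaviour."""
    return f"{schema}.{table}" if schema_prepended_table else table

print(registered_name("DB2INST1", "EMPLOYEE", True))   # DB2INST1.EMPLOYEE
print(registered_name("DB2INST1", "EMPLOYEE", False))  # EMPLOYEE
```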
RDBMS Examples
MS SQL Server
CREATE DATASOURCE MSSQL DESCRIBE BY "MSSQL-2017-linux " OPTIONS (
jdbcdriver "com.microsoft.sqlserver.jdbc.SQLServerDriver",
jdbcurl "jdbc:sqlserver://localhost:1433 ",
databaseName "DemoData",
username "scott" ,
password "tiger",
schema "dbo"
)
My SQL
CREATE DATASOURCE MY_SQL DESCRIBE BY "MySQL " OPTIONS (
jdbcdriver "com.mysql.jdbc.Driver",
jdbcurl "jdbc:mysql://127.0.0.1/test_db?",
username "scott" ,
password "tiger"
)
IBM DB2
CREATE DATASOURCE DB2_DB2INST1 DESCRIBE BY "DB2 Sample DB Schema "
OPTIONS (
jdbcdriver "com.ibm.db2.jcc.DB2Driver",
jdbcurl "jdbc:db2://127.0.0.1:50000/db_name",
username "db2inst1" ,
password "db2inst1-pwd",
schema "DB2INST1",
schema_prepended_table "true"
)
Green Plum
CREATE DATASOURCE GREEN_PLUM DESCRIBE BY "GREEN_PLUM " OPTIONS (
jdbcdriver "org.postgresql.Driver",
jdbcurl "jdbc:postgresql://localhost:5432/postgres",
username "gpadmin" ,
password "pivotal",
schema "public"
)
Teradata
CREATE DATASOURCE TERA_DATA DESCRIBE BY "TERA_DATA " OPTIONS (
jdbcdriver "com.teradata.jdbc.TeraDriver",
jdbcurl "jdbc:teradata://10.128.87.16/DBS_PORT=1025",
username "dbc" ,
password "dbc",
schema "dbcmngr"
)
Amazon Aurora
CREATE DATASOURCE AWS_AURORA DESCRIBE BY "AWS_AURORA " OPTIONS (
jdbcdriver "com.mysql.jdbc.Driver",
jdbcurl "jdbc:mysql://zet-aurora-cluster.cluster-ckh4ncwbhsty.ap-
southeast-2.rds.amazonaws.com/your_db?",
username "your_db_account_name" ,
password "your_db_account_password"
)
Amazon Redshift
CREATE DATASOURCE REDSHIFT DESCRIBE BY "AWS RedShift" OPTIONS (
jdbcdriver "com.amazon.redshift.jdbc.Driver",
jdbcurl "jdbc:redshift://zetaris.cyzoanxzdpje.ap-southeast-2.redshift.
amazonaws.com:5439/your_db_name",
username "your_db_account_name",
password "your_db_account_password"
)
Register NoSQL data sources
The Zetaris Network Data Platform supports all known NoSQL stores. Contact [email protected] to get information about other data sources.
NoSQL data sources include:
MongoDB - For MongoDB, host, port, database name, user name, and password must be provided.
CREATE DATASOURCE MONGO DESCRIBE BY "MongoDB" OPTIONS (
lightning.datasource.mongodb.host "localhost",
lightning.datasource.mongodb.port "27017",
lightning.datasource.mongodb.database "lightning-demo",
lightning.datasource.mongodb.username "",
lightning.datasource.mongodb.password ""
)
Cassandra - For Cassandra, the only platform-specific parameter is the keyspace for the connection; the remaining options are standard Spark Cassandra connector settings.
CREATE DATASOURCE CSNDR DESCRIBE BY "Cassandra" OPTIONS (
spark.cassandra.connection.host "localhost",
spark.cassandra.connection.port "9042",
spark.cassandra.auth.username "cassandra",
spark.cassandra.auth.password "cassandra",
lightning.datasource.cassandra.keyspace "lightning_demo"
)
Amazon DynamoDB
The Zetaris Network Data Platform needs the access key ID and the secret access key to use AWS services.
CREATE DATASOURCE AWS_DYNAMODB DESCRIBE BY "AWS DynamoDB" OPTIONS (
accessKeyId "Your_aws_accessKeyId",
secretKey "Your_aws_SecretAccessKey" ,
region "ap-southeast-2"
)
Create Zetaris Cloud Data Fabric database for flat files in a file store (AWS S3, Azure Blob, local file system)
Files in the file store or RESTful API source are registered under this namespace.
CREATE LIGHTNING DATABASE AWS_S3 DESCRIBE BY "AWS S3 bucket" OPTIONS (
[key "value"]
)
Ingest Metadata
Once a data source is registered in the Zetaris Cloud Data Fabric, it ingests all table, column, and constraint metadata.
Ingest all tables from the data source
This command connects you to the ORCL database and ingests all metadata (tables, columns, foreign key, index, and all other constraints) into
the Schema Store.
REGISTER DATASOURCE TABLES FROM ORCL
Ingest a table from the data source
This command registers the user table, optionally under an alias.
REGISTER DATASOURCE TABLE "USER" [USER_ALIAS] FROM ORCL
Update Schema
When changes are made to the target data source, a user can reflect them using the update schema command.
UPDATE DATASOURCE SCHEMA ORCL
Ingest flat files in the file store
You can ingest flat files like CSV, JSON, ORC, and Parquet in a file store. For example, AWS S3, Azure Blob, or local (remote) file system.
CREATE LIGHTNING FILESTORE TABLE pref FROM HR FORMAT (CSV | JSON)
OPTIONS (path "file path", header "true", inferSchema "true",
[key "value"]*);
(AWS S3)
CREATE LIGHTNING FILESTORE TABLE customer FROM TPCH_S3 FORMAT CSV
OPTIONS (
PATH "s3n://zetaris-lightning-test/csv-data/tpc-h/customer.csv",
inferSchema "true",
AWSACCESSKEYID "your_aws_access_key_id",
AWSSECRETACCESSKEY "your_aws_secret_access_key"
)
(Azure Blob)
CREATE LIGHTNING FILESTORE TABLE customer FROM TPCH_AZBLB FORMAT CSV
OPTIONS (
PATH "wasb://your_container@zettesstorage.blob.core.windows.net/customer.csv",
inferSchema "true",
fs.azure.account.key.zettesstorage.blob.core.windows.net
"your_storage_account_key"
)
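Setting inferSchema to "true" asks the platform to sample rows and guess a type for each column. As a rough illustration only (this is not Zetaris code, just a toy sketch of the idea), such inference can be modelled as:

```python
def infer_type(values):
    """Guess a column type from sample string values: try int, then
    float, then fall back to string -- a toy version of what
    inferSchema "true" does when reading a CSV with no type metadata."""
    for cast, name in ((int, "int"), (float, "double")):
        try:
            for v in values:
                cast(v)
            return name
        except ValueError:
            continue
    return "string"

print(infer_type(["1", "2", "3"]))    # int
print(infer_type(["1.5", "2"]))       # double
print(infer_type(["alice", "bob"]))   # string
```

A real reader samples a bounded number of rows and widens the type when later values conflict; the fallback order (int, double, string) is the essential idea.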
Ingest a RESTful service
For the RESTful service, you need to provide an endpoint.
CREATE LIGHTNING REST TABLE SAFC_USERS FROM ORCL SCHEMA (
uid Long,
gender String,
age Integer,
job String,
ts String)
OPTIONS(
endpoint "http://localhost:9998/example/users",
method "GET",
requesttype "URLENCODED"
)
For other API parameters, like the security key, provide an option field.
Update description and materialised table for each relation in a data source
UPDATE DATASOURCE TABLE SET ORCL.movies OPTIONS (
description "some description",
materializedtable "FusionDB.movies"
)
List the tables from the data source
Before registering one or more tables from a database, you can list all tables that the data source contains.
LIST DATABASE TABLES OPTIONS (
jdbcdriver "oracle.jdbc.OracleDriver",
jdbcurl "jdbc:oracle:thin:@oracle-master:1521:orcl",
username "scott",
password "tiger",
[key "value"]*
)
Manage Schema Store
Zetaris Networked Data Platform stores all metadata for the external data sources in the Schema Store.
You can manage the schema store using the following commands:
Data Source
This command shows the registered data sources in the schema store.
SHOW DATASOURCES
This command drops the registered data sources and tables.
DROP DATASOURCE ORCL
This command describes the data source.
DESCRIBE DATASOURCE ORCL
This command describes the slave data source.
DESCRIBE SLAVE DATASOURCE ORCL
Tables
This command describes the data source table.
DESC ORCL.USERS
This command shows all tables.
SHOW TABLES
This command shows data source tables.
SHOW DATASOURCE TABLES ORCL
This command drops the table. It doesn't delete the table in the target data source; it only deletes the ingested metadata in the Networked
Data Platform schema store.
DROP TABLE ORCL.USERS
View
The Networked Data platform supports the view capability with query definition on a single data source or across multiple data sources.
Create data source view
CREATE DATASOURCE VIEW TEEN_AGER FROM ORCL AS
SELECT * FROM USERS WHERE AGE >= 13 AND AGE < 20
The TEEN_AGER view belongs to the ORCL data source.
With this capability, a user can create a view with a DBMS-native query:
CREATE DATASOURCE VIEW SALARY_RANK FROM ORCL AS
SELECT department_id, last_name, salary, RANK() OVER (PARTITION BY
department_id ORDER BY salary) RANK
FROM employees
WHERE department_id = 60
ORDER BY RANK, last_name
Create schema store view
CREATE DATASOURCE VIEW TOP10_MOVIES_FOR_TEENS AS
SELECT movies_from_oracle.title, user_rating.count, user_rating.min,
user_rating.max, user_rating.avg
FROM (
SELECT iid, count(*) count, min(pref) min, max(pref) max, avg(pref)
avg
FROM TRDT.ratings ratings_from_teradata, PGRS.users
users_from_postgres
WHERE users_from_postgres.age >= 13 AND users_from_postgres.age < 20
AND ratings_from_teradata.uid = users_from_postgres.uid
GROUP BY ratings_from_teradata.iid
ORDER BY avg DESC
LIMIT 20
) AS user_rating, ORCL.movies movies_from_oracle
WHERE movies_from_oracle.iid = user_rating.iid
This view can be queried like a normal table:
SELECT * FROM TOP10_MOVIES_FOR_TEENS
Delete view
DROP VIEW ORCL.TEEN_AGER
Run Query
The Networked Data Platform supports SQL:2003 and can run all 99 TPC-DS queries. As long as a data source is registered in the
schema store, a query can be built that spans all data sources.
For example, the following query joins across three different data sources (Teradata, Oracle, and Cassandra):
SELECT users_from_cassandra.age, users_from_cassandra.gender,
movies_from_oracle.title title, ratings_from_teradata.pref,
ratings_from_teradata.ts
FROM TRDT.ratings ratings_from_teradata, ORCL.movies
movies_from_oracle, CSNDR.users users_from_cassandra
WHERE users_from_cassandra.gender = 'F'
AND ratings_from_teradata.uid = users_from_cassandra.uid
AND movies_from_oracle.iid = ratings_from_teradata.iid
Materialisation and Cache
The following query materialises all data from the RESTful service into the USERS_FOR_COPY table in FusionDB.
INSERT INTO FUSIONDB.USERS_FOR_COPY
SELECT uid, gender, age, job, ts FROM SAFC.SAFC_USERS
You can load or unload all data into the main memory using the CACHE and UNCACHE commands.
CACHE TABLE AWS_S3.pref;
CACHE TABLE ORCL.movies;
The pref table in the AWS S3 bucket and the movies table in Oracle are now cached in memory. You can run the following query:
SELECT movies_from_oracle.title, hdfs_pref.count, hdfs_pref.min,
hdfs_pref.max, hdfs_pref.avg
FROM (
SELECT iid, count(*) count, min(pref) min, max(pref) max, avg(pref)
avg
FROM AWS_S3.pref
GROUP BY iid
) AS hdfs_pref, ORCL.movies movies_from_oracle
WHERE movies_from_oracle.iid = hdfs_pref.iid
And to uncache:
UNCACHE TABLE AWS_S3.pref
UNCACHE TABLE ORCL.movies
Statistics
The Networked Data Platform comes with a Cost-Based Optimiser (CBO) to reduce data shuffling across clusters and data sources. It keeps the
table-level statistics and column-level statistics for all data sources defined in its Schema Store.
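To illustrate how an optimiser might use such statistics (a toy cost rule, not the platform's actual logic), the size-in-bytes statistic can decide whether one join side is small enough to broadcast rather than shuffle:

```python
def choose_join_strategy(left_bytes, right_bytes,
                         broadcast_threshold=10 * 1024 * 1024):
    """Toy cost rule: if the smaller side's size-in-bytes statistic is
    under the threshold, ship (broadcast) that side to every node;
    otherwise shuffle both sides across the cluster."""
    small_size, side = min((left_bytes, "left"), (right_bytes, "right"))
    if small_size <= broadcast_threshold:
        return f"broadcast {side}"
    return "shuffle join"

# A 5 GB fact table joined to a 2 MB dimension table: broadcast the
# small side and avoid moving the large one.
print(choose_join_strategy(5 * 2**30, 2 * 2**20))  # broadcast right
```

Broadcasting the small side keeps the large relation where it lives, which is exactly the data-shuffle reduction the CBO aims for.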
Table-level statistics
ANALYZE DATASOURCE TABLE ORCL.MOVIES
This command shows the generated table statistics, such as size in bytes and cardinality for the table.
SHOW DATASOURCE TABLE STATISTICS ORCL.MOVIES
Column-level statistics
ANALYZE DATASOURCE TABLE ORCL.MOVIES COMPUTE STATISTICS FOR COLUMNS
(IID, TITLE)
This command shows the generated column statistics, such as cardinality, number of nulls, and the min, max, and average values.
SHOW DATASOURCE COLUMN STATISTICS ORCL.MOVIES
Partitioning
Partitioning tables improves query performance. It allows all records in a table to be split into multiple chunks and processed in parallel.
CREATE PARTITION ON ORCL.USERS OPTIONS (
COLUMN "UID",
COUNT "2",
LOWERBOUND "1",
UPPERBOUND "6040")
This command makes two partitions based on the "UID" column. If required, you can remove the partitioning by running the command:
DROP PARTITION ON ORCL.USERS
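The arithmetic behind this kind of range partitioning can be sketched as follows. This is a toy model assuming Spark-style stride splitting; the function and the predicates it emits are illustrative, not commands the platform runs:

```python
def partition_predicates(column, count, lower, upper):
    """Split [lower, upper) into `count` strides over a numeric column
    and emit one WHERE predicate per partition, so each chunk can be
    scanned in parallel (Spark-style JDBC range partitioning sketch)."""
    stride = (upper - lower) // count
    preds = []
    bound = lower
    for i in range(count):
        next_bound = bound + stride
        if i == 0:
            # First chunk also picks up NULL keys.
            preds.append(f"{column} < {next_bound} OR {column} IS NULL")
        elif i == count - 1:
            # Last chunk is open-ended so no rows are dropped.
            preds.append(f"{column} >= {bound}")
        else:
            preds.append(f"{column} >= {bound} AND {column} < {next_bound}")
        bound = next_bound
    return preds

print(partition_predicates("UID", 2, 1, 6040))
```

With COLUMN "UID", COUNT "2", LOWERBOUND "1", and UPPERBOUND "6040", the first chunk covers UIDs below 3020 and the second covers the rest, so the two halves of the table can be read concurrently.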
User Management
You can add or remove a user in an admin role by running the following commands:
To add a user:
ADD USER WITH (
name 'someone',
level 'general',
password '1234567'
)
To update the password for a user:
UPDATE USER user_id SET PASSWORD 'new_password'
To describe a user:
DESCRIBE USER user_id
To remove a user:
DROP USER user_id
To show all users:
SHOW USERS
Role-Based Access Control
The Zetaris Networked Data Platform provides Role-Based Access (RBA), which limits a user's access to a specific set of data. This is applied at
the data source level or at the table level within each data source. Admin users or equivalents can run these commands.
Privileges
SELECT privilege - Gives read access to a data source or relation
INSERT privilege - Gives insert access to a data source or relation
CACHE privilege - Gives cache access to a relation: (UN)CACHE DATASOURCE TABLE
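The scoping described above, where a privilege is granted at either the data-source level or the table level, can be modelled as a simple lookup. The grant store and role names below are hypothetical illustrations, not Zetaris syntax:

```python
# Toy RBAC model: each role maps to (privilege, datasource, table)
# grants; "*" as the table means the grant covers the whole data source.
GRANTS = {
    "analyst": {("SELECT", "ORCL", "*"), ("CACHE", "ORCL", "MOVIES")},
}

def allowed(role, privilege, datasource, table):
    """Check a table-level grant first, then fall back to a grant that
    covers the whole data source."""
    grants = GRANTS.get(role, set())
    return ((privilege, datasource, table) in grants
            or (privilege, datasource, "*") in grants)

print(allowed("analyst", "SELECT", "ORCL", "USERS"))   # True
print(allowed("analyst", "INSERT", "ORCL", "USERS"))   # False
```

The data-source-level wildcard is what lets one grant cover every relation ingested from that source.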
Predefined Roles
Role names are case insensitive.
admin
none
all
default
To create a role, run this command:
CREATE ROLE role_name [DESCRIBE BY "this is good~~~"]
To show roles, run this command:
SHOW ROLES
To assign a role to one or more users, run this command:
ASSIGN USER user_name [, user_name] ... TO ROLE role_name
To revoke the role of a user, run this command:
REVOKE USER user_name [, user_name] ... FROM ROLE role_name
To show all the roles assigned to a user, run this command:
SHOW ROLE ASSIGNED TO USER user_name