Guide - SQL Editor Commands
Introduction
The SQL user guide details the specific SQL syntax required to build the virtual data marts / warehouses queried through the Network Data
Platform.
The traditional approaches to data integration involve ETL or ELT processes, which incur side effects such as data duplication and extra
processing time. Before an enterprise can commence analytical processing, the data structure needs to be harmonised, data types resolved,
and relational modelling performed to resolve entities and relationships across the enterprise.
This approach is valid only for building limited, static data sources. Today, large amounts of data from heterogeneous data processing systems
need real-time processing, which makes this approach slow, expensive, and unresponsive to time-critical business needs.
The Zetaris Network Data Platform is developed to address the needs of the new world of big-fast data. Instead of running analytics after physical
data ingestion has occurred, the Zetaris Cloud Data Fabric ingests only metadata for the underlying data sources into its Schema Store. The
Zetaris Cloud Data Fabric provides various commands to ingest or manage this metadata where it can be arranged into virtual data structures
such as data warehouses or data marts in minutes.
This guide helps you to explore the data virtualization process, learn how to run queries in the Data Fabric, and improve query performance. Most
of the commands in this guide can be executed through the Data Platform interface.
Contents
Introduction
Contents
Building a Data Fabric
Register master data source
Add slave nodes for the registered data source (Option for a cluster-based database)
RDBMS Examples
MS SQL Server
My SQL
IBM DB2
Green Plum
Teradata
Amazon Aurora
Amazon Redshift
Register NoSQL data sources
Create Zetaris Cloud Data Fabric database for flat files in a file store (AWS S3, Azure Blob, local file system)
Ingest Metadata
Ingest all tables from the data source
Ingest a table from the data source
Update Schema
Ingest flat files in the file store
Ingest a RESTful service
Update description and materialised table for each relation in a data source
List the tables from the data source
Manage Schema Store
Data Source
Tables
View
Create data source view
Create schema store view
Delete view
Run Query
Materialisation and Cache
Statistics
Table-level statistics
Column-level statistics
Partitioning
User Management
Role-Based Access Control
Privileges
Predefined Roles
Building a Data Fabric
To build a data fabric, you need to identify the database to be connected. The Zetaris Cloud Data Fabric supports all known data source types,
such as RDBMS, NoSQL, REST APIs, CSV, and JSON.
Register master data source
Ensure that you provide the JDBC driver class, URL and connectivity credentials, and database parameters to register a master data source.
CREATE DATASOURCE ORCL [DESCRIBE BY "Oracle for Product Master"]
OPTIONS (
jdbcdriver "oracle.jdbc.OracleDriver",
jdbcurl "jdbc:oracle:thin:@oracle-master:1521:orcl",
username "scott",
password "tiger",
[schema "system",]
[schema_prepended_table "true",]
[key "value"]*)
Add slave nodes for the registered data source (Option for a cluster-based database)
You can register slave nodes if the registered database supports cluster-based computing such as Massively Parallel Processing (MPP). This
helps the Zetaris Network Data Platform to query slave nodes directly rather than running through a master node.
The following example shows the registration of one master node (oracle-master), and three slave nodes (oracle-slave1, oracle-slave2, and
oracle-slave3).
CREATE DATASOURCE FUSIONDB DESCRIBE BY "Zetaris MPP " OPTIONS (
jdbcdriver "org.postgresql.Driver",
jdbcurl "jdbc:postgresql://coordinator:5432/pgrs",
username "admin",
password "password")
ADD SLAVE DATASOURCE TO FUSIONDB OPTIONS (
jdbcdriver "org.postgresql.Driver",
jdbcurl "jdbc:postgresql://datanode1:5432/pgrs",
username "admin",
password "password")
ADD SLAVE DATASOURCE TO FUSIONDB OPTIONS (
jdbcdriver "org.postgresql.Driver",
jdbcurl "jdbc:postgresql://datanode2:5432/pgrs",
username "admin",
password "password")
ADD SLAVE DATASOURCE TO FUSIONDB OPTIONS (
jdbcdriver "org.postgresql.Driver",
jdbcurl "jdbc:postgresql://datanode3:5432/pgrs",
username "admin",
password "password")
To differentiate between table names across schemas, set schema_prepended_table to "true"; the ingested table is then registered under its
schema name.
If a schema is provided, the Zetaris Cloud Data Fabric ingests only the metadata from that schema.
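The effect of schema_prepended_table on ingested names can be illustrated with a toy sketch; the exact separator used by the platform is an assumption here:

```python
def registered_name(schema, table, schema_prepended_table=False):
    """Toy sketch: the name a table gets in the schema store. With
    schema_prepended_table "true", the schema name is prepended so that
    same-named tables from different schemas stay distinct. The dot
    separator is an assumption, not confirmed platform behaviour."""
    return f"{schema}.{table}" if schema_prepended_table else table

print(registered_name("DB2INST1", "EMPLOYEE", True))   # DB2INST1.EMPLOYEE
print(registered_name("DB2INST1", "EMPLOYEE", False))  # EMPLOYEE
```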
RDBMS Examples
MS SQL Server
CREATE DATASOURCE MSSQL DESCRIBE BY "MSSQL-2017-linux " OPTIONS (
jdbcdriver "com.microsoft.sqlserver.jdbc.SQLServerDriver",
jdbcurl "jdbc:sqlserver://localhost:1433 ",
databaseName "DemoData",
username "scott" ,
password "tiger",
schema "dbo"
)
My SQL
CREATE DATASOURCE MY_SQL DESCRIBE BY "MySQL " OPTIONS (
jdbcdriver "com.mysql.jdbc.Driver",
jdbcurl "jdbc:mysql://127.0.0.1/test_db?",
username "scott" ,
password "tiger"
)
IBM DB2
CREATE DATASOURCE DB2_DB2INST1 DESCRIBE BY "DB2 Sample DB Schema "
OPTIONS (
jdbcdriver "com.ibm.db2.jcc.DB2Driver",
jdbcurl "jdbc:db2://127.0.0.1:50000/db_name",
username "db2inst1" ,
password "db2inst1-pwd",
schema "DB2INST1",
schema_prepended_table "true"
)
Green Plum
CREATE DATASOURCE GREEN_PLUM DESCRIBE BY "GREEN_PLUM " OPTIONS (
jdbcdriver "org.postgresql.Driver",
jdbcurl "jdbc:postgresql://localhost:5432/postgres",
username "gpadmin" ,
password "pivotal",
schema "public"
)
Teradata
CREATE DATASOURCE TERA_DATA DESCRIBE BY "TERA_DATA " OPTIONS (
jdbcdriver "com.teradata.jdbc.TeraDriver",
jdbcurl "jdbc:teradata://10.128.87.16/DBS_PORT=1025",
username "dbc" ,
password "dbc",
schema "dbcmngr"
)
Amazon Aurora
CREATE DATASOURCE AWS_AURORA DESCRIBE BY "AWS_AURORA " OPTIONS (
jdbcdriver "com.mysql.jdbc.Driver",
jdbcurl "jdbc:mysql://zet-aurora-cluster.cluster-ckh4ncwbhsty.ap-
southeast-2.rds.amazonaws.com/your_db?",
username "your_db_account_name" ,
password "your_db_account_password"
)
Amazon Redshift
CREATE DATASOURCE REDSHIFT DESCRIBE BY "AWS RedShift" OPTIONS (
jdbcdriver "com.amazon.redshift.jdbc.Driver",
jdbcurl "jdbc:redshift://zetaris.cyzoanxzdpje.ap-southeast-2.redshift.
amazonaws.com:5439/your_db_name",
username "your_db_account_name",
password "your_db_account_password"
)
Register NoSQL data sources
The Zetaris Network Data Platform supports all known NoSQL stores. Contact [email protected] to get information about other data sources.
NoSQL data sources include:
MongoDB - For MongoDB, host, port, database name, user name, and password must be provided.
CREATE DATASOURCE MONGO DESCRIBE BY "MongoDB" OPTIONS (
lightning.datasource.mongodb.host "localhost",
lightning.datasource.mongodb.port "27017",
lightning.datasource.mongodb.database "lightning-demo",
lightning.datasource.mongodb.username "",
lightning.datasource.mongodb.password ""
)
Cassandra - For Cassandra, the only platform-specific parameter is the keyspace for the connection; the remaining options are standard Spark Cassandra connector settings.
CREATE DATASOURCE CSNDR DESCRIBE BY "Cassandra" OPTIONS (
spark.cassandra.connection.host "localhost",
spark.cassandra.connection.port "9042",
spark.cassandra.auth.username "cassandra",
spark.cassandra.auth.password "cassandra",
lightning.datasource.cassandra.keyspace "lightning_demo"
)
Amazon DynamoDB
The Zetaris Network Data Platform needs the access key ID and the secret access key to use AWS services.
CREATE DATASOURCE AWS_DYNAMODB DESCRIBE BY "AWS DynamoDB" OPTIONS (
accessKeyId "Your_aws_accessKeyId",
secretKey "Your_aws_SecretAccessKey" ,
region "ap-southeast-2"
)
Create Zetaris Cloud Data Fabric database for flat files in a file store (AWS S3, Azure Blob, local file system)
Files in the file store or RESTful API source are registered under this namespace.
CREATE LIGHTNING DATABASE AWS_S3 DESCRIBE BY "AWS S3 bucket" OPTIONS (
[key "value"]
)
Ingest Metadata
Once a data source is registered in the Zetaris Cloud Data Fabric, it ingests all table, column, and constraint metadata.
Ingest all tables from the data source
This command connects you to the ORCL database and ingests all metadata (tables, columns, foreign key, index, and all other constraints) into
the Schema Store.
REGISTER DATASOURCE TABLES FROM ORCL
Ingest a table from the data source
This command registers the user table, optionally under an alias.
REGISTER DATASOURCE TABLE "USER" [USER_ALIAS] FROM ORCL
Update Schema
When changes are made to the target data source, a user can reflect them using the update schema command.
UPDATE DATASOURCE SCHEMA ORCL
Ingest flat files in the file store
You can ingest flat files like CSV, JSON, ORC, and Parquet in a file store. For example, AWS S3, Azure Blob, or local (remote) file system.
CREATE LIGHTNING FILESTORE TABLE pref FROM HR FORMAT (CSV | JSON)
OPTIONS (path "file path", header "true", inferSchema "true",
[key "value"]*);
(AWS S3)
CREATE LIGHTNING FILESTORE TABLE customer FROM TPCH_S3 FORMAT CSV
OPTIONS (
PATH "s3n://zetaris-lightning-test/csv-data/tpc-h/customer.csv",
inferSchema "true",
AWSACCESSKEYID "your_aws_access_key_id",
AWSSECRETACCESSKEY "your_aws_secret_access_key"
)
(Azure Blob)
CREATE LIGHTNING FILESTORE TABLE customer FROM TPCH_AZBLB FORMAT CSV
OPTIONS (
PATH "wasb://your_container@zettesstorage.blob.core.windows.net/customer.csv",
inferSchema "true",
fs.azure.account.key.zettesstorage.blob.core.windows.net
"your_storage_account_key"
)
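Setting inferSchema to "true" asks the platform to sample rows and guess a type for each column. As a rough illustration only (this is not Zetaris code, just a toy sketch of the idea), such inference can be modelled as:

```python
def infer_type(values):
    """Guess a column type from sample string values: try int, then
    float, then fall back to string -- a toy version of what
    inferSchema "true" does when reading a CSV with no type metadata."""
    for cast, name in ((int, "int"), (float, "double")):
        try:
            for v in values:
                cast(v)
            return name
        except ValueError:
            continue
    return "string"

print(infer_type(["1", "2", "3"]))    # int
print(infer_type(["1.5", "2"]))       # double
print(infer_type(["alice", "bob"]))   # string
```

A real reader samples a bounded number of rows and widens the type when later values conflict; the fallback order (int, double, string) is the essential idea.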
Ingest a RESTful service
For the RESTful service, you need to provide an endpoint.
CREATE LIGHTNING REST TABLE SAFC_USERS FROM ORCL SCHEMA (
uid Long,
gender String,
age Integer,
job String,
ts String)
OPTIONS(
endpoint "http://localhost:9998/example/users",
method "GET",
requesttype "URLENCODED"
)
For other API parameters, like the security key, provide an option field.
Update description and materialised table for each relation in a data source
UPDATE DATASOURCE TABLE SET ORCL.movies OPTIONS (
description "some description",
materializedtable "FusionDB.movies"
)
List the tables from the data source
Before registering one or more tables from a database, you can list all tables that the data source contains.
LIST DATABASE TABLES OPTIONS (
jdbcdriver "oracle.jdbc.OracleDriver",
jdbcurl "jdbc:oracle:thin:@oracle-master:1521:orcl",
username "scott",
password "tiger",
[key "value"]*
)
Manage Schema Store
Zetaris Networked Data Platform stores all metadata for the external data sources in the Schema Store.
You can manage the schema store using the following commands:
Data Source
This command shows the registered data sources in the schema store.
SHOW DATASOURCES
This command drops the registered data sources and tables.
DROP DATASOURCE ORCL
This command describes the data source.
DESCRIBE DATASOURCE ORCL
This command describes the slave data source.
DESCRIBE SLAVE DATASOURCE ORCL
Tables
This command describes the data source table.
DESC ORCL.USERS
This command shows all tables.
SHOW TABLES
This command shows data source tables.
SHOW DATASOURCE TABLES ORCL
This command drops the table. It doesn't delete the table in the target data source; it only deletes the ingested metadata in the Networked
Data Platform schema store.
DROP TABLE ORCL.USERS
View
The Networked Data platform supports the view capability with query definition on a single data source or across multiple data sources.
Create data source view
CREATE DATASOURCE VIEW TEEN_AGER FROM ORCL AS
SELECT * FROM USERS WHERE AGE >= 13 AND AGE < 20
The TEEN_AGER view belongs to the ORCL data source.
With this capability, a user can create a view with a DBMS-native query:
CREATE DATASOURCE VIEW SALARY_RANK FROM ORCL AS
SELECT department_id, last_name, salary, RANK() OVER (PARTITION BY
department_id ORDER BY salary) RANK
FROM employees
WHERE department_id = 60
ORDER BY RANK, last_name
Create schema store view
CREATE DATASOURCE VIEW TOP10_MOVIES_FOR_TEENS AS
SELECT movies_from_oracle.title, user_rating.count, user_rating.min,
user_rating.max, user_rating.avg
FROM (
SELECT iid, count(*) count, min(pref) min, max(pref) max, avg(pref)
avg
FROM TRDT.ratings ratings_from_teradata, PGRS.users
users_from_postgres
WHERE users_from_postgres.age >= 13 AND users_from_postgres.age < 20
AND ratings_from_teradata.uid = users_from_postgres.uid
GROUP BY ratings_from_teradata.iid
ORDER BY avg DESC
LIMIT 20
) AS user_rating, ORCL.movies movies_from_oracle
WHERE movies_from_oracle.iid = user_rating.iid
This view can be queried like a normal table:
SELECT * FROM TOP10_MOVIES_FOR_TEENS
Delete view
DROP VIEW ORCL.TEEN_AGER
Run Query
The Networked Data Platform supports SQL:2003 and can run all 99 TPC-DS queries. As long as a data source is registered in the
schema store, a query can be built that spans all data sources.
For example, the following query joins across three different data sources (Teradata, Oracle, and Cassandra):
SELECT users_from_cassandra.age, users_from_cassandra.gender,
movies_from_oracle.title title, ratings_from_teradata.pref,
ratings_from_teradata.ts
FROM TRDT.ratings ratings_from_teradata, ORCL.movies
movies_from_oracle, CSNDR.users users_from_cassandra
WHERE users_from_cassandra.gender = 'F'
AND ratings_from_teradata.uid = users_from_cassandra.uid
AND movies_from_oracle.iid = ratings_from_teradata.iid
Materialisation and Cache
The following query materialises all data from the RESTful service into the USERS_FOR_COPY table in FusionDB.
INSERT INTO FUSIONDB.USERS_FOR_COPY
SELECT uid, gender, age, job, ts FROM SAFC.SAFC_USERS
You can load or unload all data into the main memory using the CACHE and UNCACHE commands.
CACHE TABLE AWS_S3.pref;
CACHE TABLE ORCL.movies;
The pref table in the AWS S3 bucket and the movies table in Oracle are now cached in memory. You can run the following query:
SELECT movies_from_oracle.title, hdfs_pref.count, hdfs_pref.min,
hdfs_pref.max, hdfs_pref.avg
FROM (
SELECT iid, count(*) count, min(pref) min, max(pref) max, avg(pref)
avg
FROM AWS_S3.pref
GROUP BY iid
) AS hdfs_pref, ORCL.movies movies_from_oracle
WHERE movies_from_oracle.iid = hdfs_pref.iid
And to uncache:
UNCACHE TABLE AWS_S3.pref
UNCACHE TABLE ORCL.movies
Statistics
The Networked Data Platform comes with a Cost-Based Optimiser (CBO) to reduce data shuffling across clusters and data sources. It keeps the
table-level statistics and column-level statistics for all data sources defined in its Schema Store.
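To illustrate how an optimiser might use such statistics (a toy cost rule, not the platform's actual logic), the size-in-bytes statistic can decide whether one join side is small enough to broadcast rather than shuffle:

```python
def choose_join_strategy(left_bytes, right_bytes,
                         broadcast_threshold=10 * 1024 * 1024):
    """Toy cost rule: if the smaller side's size-in-bytes statistic is
    under the threshold, ship (broadcast) that side to every node;
    otherwise shuffle both sides across the cluster."""
    small_size, side = min((left_bytes, "left"), (right_bytes, "right"))
    if small_size <= broadcast_threshold:
        return f"broadcast {side}"
    return "shuffle join"

# A 5 GB fact table joined to a 2 MB dimension table: broadcast the
# small side and avoid moving the large one.
print(choose_join_strategy(5 * 2**30, 2 * 2**20))  # broadcast right
```

Broadcasting the small side keeps the large relation where it lives, which is exactly the data-shuffle reduction the CBO aims for.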
Table-level statistics
ANALYZE DATASOURCE TABLE ORCL.MOVIES
This command shows the generated table statistics, such as size in bytes and cardinality for the table.
SHOW DATASOURCE TABLE STATISTICS ORCL.MOVIES
Column-level statistics
ANALYZE DATASOURCE TABLE ORCL.MOVIES COMPUTE STATISTICS FOR COLUMNS
(IID, TITLE)
This command shows the generated column statistics, such as cardinality, number of nulls, and the min, max, and average values.
SHOW DATASOURCE COLUMN STATISTICS ORCL.MOVIES
Partitioning
Partitioning tables improves query performance. It allows all records in a table to be split into multiple chunks and processed in parallel.
CREATE PARTITION ON ORCL.USERS OPTIONS (
COLUMN "UID",
COUNT "2",
LOWERBOUND "1",
UPPERBOUND "6040")
This command makes two partitions based on the "UID" column. If required, you can remove the partitioning by running the command:
DROP PARTITION ON ORCL.USERS
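The arithmetic behind this kind of range partitioning can be sketched as follows. This is a toy model assuming Spark-style stride splitting; the function and the predicates it emits are illustrative, not commands the platform runs:

```python
def partition_predicates(column, count, lower, upper):
    """Split [lower, upper) into `count` strides over a numeric column
    and emit one WHERE predicate per partition, so each chunk can be
    scanned in parallel (Spark-style JDBC range partitioning sketch)."""
    stride = (upper - lower) // count
    preds = []
    bound = lower
    for i in range(count):
        next_bound = bound + stride
        if i == 0:
            # First chunk also picks up NULL keys.
            preds.append(f"{column} < {next_bound} OR {column} IS NULL")
        elif i == count - 1:
            # Last chunk is open-ended so no rows are dropped.
            preds.append(f"{column} >= {bound}")
        else:
            preds.append(f"{column} >= {bound} AND {column} < {next_bound}")
        bound = next_bound
    return preds

print(partition_predicates("UID", 2, 1, 6040))
```

With COLUMN "UID", COUNT "2", LOWERBOUND "1", and UPPERBOUND "6040", the first chunk covers UIDs below 3020 and the second covers the rest, so the two halves of the table can be read concurrently.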
User Management
You can add or remove a user in an admin role by running the following commands:
To add a user:
ADD USER WITH (
name 'someone',
level 'general',
password '1234567'
)
To update the password for a user:
UPDATE USER user_id SET PASSWORD 'new_password'
To describe a user:
DESCRIBE USER user_id
To remove a user:
DROP USER user_id
To show all users:
SHOW USERS
Role-Based Access Control
The Zetaris Networked Data Platform provides Role-Based Access (RBA), which limits a user's access to a specific set of data. This is applied at
the data source level or at the table level within each data source. Admin users or equivalents can run these commands.
Privileges
SELECT privilege - Gives read access to a data source or relation
INSERT privilege - Gives insert access to a data source or relation
CACHE privilege - Gives cache access to a relation: (UN)CACHE DATASOURCE TABLE
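The scoping described above, where a privilege is granted at either the data-source level or the table level, can be modelled as a simple lookup. The grant store and role names below are hypothetical illustrations, not Zetaris syntax:

```python
# Toy RBAC model: each role maps to (privilege, datasource, table)
# grants; "*" as the table means the grant covers the whole data source.
GRANTS = {
    "analyst": {("SELECT", "ORCL", "*"), ("CACHE", "ORCL", "MOVIES")},
}

def allowed(role, privilege, datasource, table):
    """Check a table-level grant first, then fall back to a grant that
    covers the whole data source."""
    grants = GRANTS.get(role, set())
    return ((privilege, datasource, table) in grants
            or (privilege, datasource, "*") in grants)

print(allowed("analyst", "SELECT", "ORCL", "USERS"))   # True
print(allowed("analyst", "INSERT", "ORCL", "USERS"))   # False
```

The data-source-level wildcard is what lets one grant cover every relation ingested from that source.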
Predefined Roles
Role names are case insensitive.
admin
none
all
default
To create a role, run this command:
CREATE ROLE role_name [DESCRIBE BY "this is good~~~"]
To show roles, run this command:
SHOW ROLES
To assign a role to one or more users, run this command:
ASSIGN USER user_name [, user_name] ... TO ROLE role_name
To revoke the role of a user, run this command:
REVOKE USER user_name [, user_name] ... FROM ROLE role_name
To show all the roles assigned to a user, run this command:
SHOW ROLE ASSIGNED TO USER user_name