14 May 2015

Postgresql Advanced Data Types

NoSQL is buzz word in the database and computer world with its rapid growth. NoSQL databases range from key-value store to columunar databasees to document databases to graph databases.

RDBMS is best way to store data so that modren web application can make use of ORM to independently connect to database via objects. I am sure people who worked on RDBMS will agree that it is good to use advanced data types like ARRAY, HSTORE, JSON so that we an store data for better access and performance reasons.

Postgres has advanced data types so that we can use with relational database to store data, below are some advantages

They are easy to access
They have a lot of functionality around them
They are durable
They perform well (caution: we have to use it correctly)

Here are the list of data Types

ARRAY

Allows columns of the table to be defined as variable length multidimentional arrays. Arrays of any built-in tyoe or user defined type can be created.

CREATE TABLE bookmarks (
   id	SERIAL,
   link VARCHAR(512),
   tags VARCHAR(128)[]
);

Lets populate some data

INSERT INTO bookmarks VALUES (1, 'http://perl.org', '{"Perl", "C"}');
INSERT INTO bookmarks VALUES (1, 'http://php.net', '{"PHP", "C"}');
INSERT INTO bookmarks VALUES (1, 'http://amazon.com', '{"PHP", "C", "Perl"}');

SELECT * FROM bookmarks;
 id |       link        |     tags     
----+-------------------+--------------
  1 | http://perl.org   | {Perl,C}
  1 | http://php.net    | {PHP,C}
  1 | http://amazon.com | {PHP,C,Perl}
(3 rows)

Array functions

Postgres have a bunch of array functions that you can use for fitering based on the array value.

Bookmarks tagged with Perl

SELECT * FROM bookmarks WHERE ('Perl' = ANY(tags));
 id |       link        |     tags     
----+-------------------+--------------
  1 | http://perl.org   | {Perl,C}
  1 | http://amazon.com | {PHP,C,Perl}
(2 rows)

SELECT * FROM bookmarks WHERE tags @> '{Perl}';
 id |       link        |     tags     
----+-------------------+--------------
  1 | http://perl.org   | {Perl,C}
  1 | http://amazon.com | {PHP,C,Perl}
(2 rows)

Bookmarks not tagged with Perl

SELECT * FROM bookmarks WHERE NOT ('Perl' = ANY(tags));
 id |      link      |  tags   
----+----------------+---------
  1 | http://php.net | {PHP,C}
(1 row)

SELECT * FROM bookmarks WHERE ('Perl' != ALL(tags));
 id |      link      |  tags   
----+----------------+---------
  1 | http://php.net | {PHP,C}
(1 row)

Convert Array to String

SELECT array_to_string(ARRAY[1,2,NULL,4], ',', '*');
 array_to_string 
-----------------
 1,2,*,4
(1 row)

Convert Array to Rows

SELECT unnest(ARRAY[1,2,3]);
 unnest 
--------
      1
      2
      3
(3 rows)

Convert columnar value to Array value

SELECT array_agg(link) FROM bookmarks;
                     array_agg                      
----------------------------------------------------
 {http://perl.org,http://php.net,http://amazon.com}
(1 row)

INDEXING

You can index an array but you have to use the array operators and the GIN index type.

CREATE INDEX idx_bookmarks on bookmarks USING GIN (tags);

Below are the list of operators that can be used on indexed queries

<@
@>
=
&&

HSTORE

Posgtgres HSTORE is column type for storing key->value data (Schemaless data) sometimes called as documnet store. HStore is often compared to NoSQL document store.

Here is SQL statements to create table with HSTORE datatype, populating data etc..

ENABLE HSTORE Extension

CREATE EXTENSION hstore;

CREATE TABLE book (
    id SERIAL,
    name VARCHAR(512),
    author HSTORE
);

INSERT INTO book VALUES (1, 'First Book', 
	'first_name => "Bob", 
	last_name => "White", 
	title => "Mr"'
);
INSERT INTO book VALUES (1, 'Second Book', 
	'first_name => "Sally", 
	last_name => "White", 
	title => "Mrs"'
);

SELECT * FROM book WHERE author->'last_name' = 'White';

HSTORE functions

Refer here

INDEXING

Hstore has GiST and GIN index support for the @>, ?, ?& and ?

operators.

CREATE INDEX authoridx ON book USING GIST (author);

CREATE INDEX authoridx ON book USING GIN (author);

CREATE INDEX book_autor_first_name
	ON book USING GIN ( ((author->'first_name') );

JSON

The JSON datatype is meant for storing JSON-structured data. It will validate that the input JSON string is correct JSON.

JSON can be used when you need to support nested objects but not just text or number when compared with HSTORE.

CREATE TABLE events (
  name varchar(200),
  visitor_id varchar(200),
  properties json,
  browser json
);

INSERT INTO events VALUES (
  'pageview', '1',
  '{ "page": "/" }',
  '{ "name": "Chrome", "os": "Mac", "resolution": { "x": 1440, "y": 900 } }'
);
INSERT INTO events VALUES (
  'pageview', '2',
  '{ "page": "/" }',
  '{ "name": "Firefox", "os": "Windows", "resolution": { "x": 1920, "y": 1200 } }'
);
INSERT INTO events VALUES (
  'pageview', '1',
  '{ "page": "/account" }',
  '{ "name": "Chrome", "os": "Mac", "resolution": { "x": 1440, "y": 900 } }'
);
INSERT INTO events VALUES (
  'purchase', '5',
  '{ "amount": 10 }',
  '{ "name": "Firefox", "os": "Windows", "resolution": { "x": 1024, "y": 768 } }'
);
INSERT INTO events VALUES (
  'purchase', '15',
  '{ "amount": 200 }',
  '{ "name": "Firefox", "os": "Windows", "resolution": { "x": 1280, "y": 800 } }'
);
INSERT INTO events VALUES (
  'purchase', '15',
  '{ "amount": 500 }',
  '{ "name": "Firefox", "os": "Windows", "resolution": { "x": 1280, "y": 800 } }'
);

Some Example Queries

Browser Usage

SELECT browser->>'name' AS browser, count(browser)
FROM events
GROUP BY browser->>'name';

 browser | count 
---------+-------
 Firefox |     4
 Chrome  |     2
(2 rows)

Total Revenue per visitor

SELECT visitor_id, SUM(CAST(properties->>'amount' AS integer)) AS total
FROM events
WHERE CAST(properties->>'amount' AS integer) > 0
GROUP BY visitor_id;

 visitor_id | total 
------------+-------
 5          |    10
 15         |   700
(2 rows)

JSON functions

array_to_json

SELECT array_to_json(ARRAY[1,2,3]);
 array_to_json 
---------------
 [1,2,3]
(1 row)

row_to_json

SELECT row_to_json(bookmarks) from bookmarks;
                          row_to_json                          
---------------------------------------------------------------
 {"id":1,"link":"http://perl.org","tags":["Perl","C"]}
 {"id":1,"link":"http://php.net","tags":["PHP","C"]}
 {"id":1,"link":"http://amazon.com","tags":["PHP","C","Perl"]}
(3 rows)

json_extract_path

SELECT json_extract_path('{"a": 1, "b": 2, "c": [1,2,3]}'::json, 'c', '1');
 json_extract_path 
-------------------
 2
(1 row)

json_extract_path_text

SELECT json_extract_path_text('{"a": 1, "b": 2, "c": [1,2,3]}'::json, 'c', '1');
 json_extract_path_text 
------------------------
 2
(1 row)

For complete list of functions click here

INDEXING

CREATE INDEX events_browser_name_idx ON events (json_extract_path_text(browser, ‘name’));

JSONB

Since it is difficult to serach within JSON we have new data type called as JSONB (Binary) which gives more operators so that searching becomes effective.

JSONB gives us flexibility and speed.

JSONB functions

JSON and JSONB function are pretty much similar but some function has jsonb equivalent to work on binary json format.

Refer here for complete list

INDEXING

JSONB supports GIN indexing which gives better performance in SELECT statements

Which one should I use? ARRAY/HSTORE/JSON/JSONB?

It is often difficult to select a datatype for our production use but it all depends on your use case. Below is set of rules that you can follow to select an advanced data type.

ARRAY is straight forward selection if you just have array of values to be stored in a column. Eg: Hourly data in a day
HSTORE provides more structure and if you have just key-value store use HSTORE
JSON/JSONB is more versatile than HSTORE
use JSON, If you need any of the following
- Storage of validated JSON, without processing/indexing
- preservation of white space in json text
- Preservation of object key order
- preservation of duplicate object keys
- Maximum input/output speed
For any other cases, use JSONB