
Sub command: Custom

Faisal Ali edited this page Jul 15, 2021 · 9 revisions

Introduction

The Mock Data tool chooses mock values based on the datatype of each column; it cannot tell whether a column holds a name, an email address, and so on. The custom subcommand hands that control to the user, who decides how the data is mocked for each table, i.e.

  1. The user can pick which columns to skip and let the mock data tool decide the best data for them
  2. The user can control what kind of data goes into a column by feeding in a custom dataset (values are picked from it at random during mocking)
  3. The user can select from the list of supported realistic data keys

NOTE: For all the realistic keys, check out the page

The custom subcommand gives the user a file containing a plan of how the data will be loaded into each column; the file can then be modified and fed back to the tool to control the dataset used for mocking.

Short Hand: The shorthand of the custom subcommand is c

Preference Order

There are 3 ways to load data using the custom subcommand:

  1. User provided dataset
  2. Realistic dataset
  3. Random dataset

When two or more options are set for a column, the data source used to mock it is chosen in the order listed above, i.e. a user provided dataset is given preference over a realistic dataset, and a realistic dataset over a random one.
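The preference order can be illustrated with a short Python sketch. It is not the tool's implementation: `pick_value`, `realistic_generators` and `random_generator` are hypothetical names introduced here, and the column is assumed to be a dictionary with the same keys as a column entry in the plan file (`Name`, `Type`, `UserData`, `Realistic`).

```python
import random


def pick_value(column, realistic_generators, random_generator, rng=random):
    """Return one mock value for a column, following the preference order.

    column               -- dict with the keys Name, Type, UserData, Realistic
    realistic_generators -- hypothetical mapping of realistic key -> callable
    random_generator     -- hypothetical callable taking the column type
    """
    # 1. A user provided dataset wins whenever it is non-empty.
    user_data = column.get("UserData") or []
    if user_data:
        return rng.choice(user_data)

    # 2. Otherwise use the realistic key, if one is set.
    realistic_key = column.get("Realistic") or ""
    if realistic_key:
        return realistic_generators[realistic_key]()

    # 3. Fall back to random data based on the column datatype.
    return random_generator(column["Type"])


column = {
    "Name": "gender",
    "Type": "character varying",
    "UserData": ["M", "F", "O"],
    "Realistic": "NameGenderAbbrev",
}
# Both UserData and Realistic are set, so the value comes from UserData.
value = pick_value(column, {"NameGenderAbbrev": lambda: "?"}, lambda t: "random")
print(value in ["M", "F", "O"])  # True
```

Because the user provided list is checked first, setting a realistic key on the same column has no effect in this sketch until the list is emptied.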

Usage

The usage of the custom subcommand is

[gpadmin@gpdb-m ~]$ mock custom --help
Control the data being written to the tables

Usage:
  mock custom [flags]

Aliases:
  custom, c

Flags:
  -f, --file string         Mock the tables provided in the yaml file
  -h, --help                help for custom
  -t, --table-name string   Provide the table name whose skeleton need to be copied to the file

Global Flags:
  -a, --address string    Hostname where the postgres database lives
  -d, --database string   Database to mock the data (default "gpadmin")
  -q, --dont-prompt       Run without asking for confirmation
  -i, --ignore            Ignore checking and fixing constraints
  -w, --password string   Password for the user to connect to database
  -p, --port int          Port number of the postgres database (default 3000)
  -r, --rows int          Total rows to be faked or mocked (default 10)
  -u, --username string   Username to connect to the database
  -v, --verbose           Enable verbose or debug logging
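When the tool is driven from a script, the flags listed above can be assembled into an argument list. The following is a minimal Python sketch: `build_custom_command` is a hypothetical helper, it only uses long flags that appear in the help output, and it assumes the `mock` binary is on the `PATH` if the list is later passed to something like `subprocess.run`.

```python
def build_custom_command(table_name=None, plan_file=None, rows=None,
                         address=None, port=None, database=None,
                         username=None, dont_prompt=False):
    """Build the argument list for `mock custom` from optional settings."""
    cmd = ["mock", "custom"]
    if table_name:
        cmd += ["--table-name", table_name]   # generate a plan for this table
    if plan_file:
        cmd += ["--file", plan_file]          # load data using an edited plan
    if rows is not None:
        cmd += ["--rows", str(rows)]
    if address:
        cmd += ["--address", address]
    if port is not None:
        cmd += ["--port", str(port)]
    if database:
        cmd += ["--database", database]
    if username:
        cmd += ["--username", username]
    if dont_prompt:
        cmd.append("--dont-prompt")
    return cmd


print(" ".join(build_custom_command(table_name="sales", database="gpadmin")))
# prints: mock custom --table-name sales --database gpadmin
```

The password flag is left out of the sketch on purpose, so that credentials do not end up in a printed command line.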

Example

As indicated above, you have a choice of three ways to control the data loaded into a table; each one is walked through in its own section below

User Generated Dataset

  • Let's take an example of a table that has a check constraint (e.g. a partitioned table in Greenplum Database, or create your own Postgres table)
  • Now let's build a plan for this table
    mock custom --table-name sales
    -- OR --
    mock c -t sales
    

    NOTE:

    • If the table is not on the default public schema then use mock c -t <schema-name>.<table-name>
    • If you want to generate a plan for multiple tables then use mock c -t <schema-name1>.<table-name1>,<schema-name2>.<table-name2>...<schema-nameN>.<table-nameN>
  • Once the plan is generated, the location of the YAML file is reported at the end: The YAML is saved to file: <PATH>/<FILENAME>
  • Edit the generated file using any text editor of your choice
    • On the column you want to take control of, add under the UserData key an array of values for the mock data tool to pick from at random, e.g. we take control of the date column below
      Custom:
      - Schema: public
        Table: sales
        Column:
        - Name: id
          Type: integer
          UserData: []
          Realistic: ""
        - Name: date
          Type: date
          UserData: 
          - 2016-01-01
          - 2016-03-01
          - 2016-04-01
          Realistic: ""
        - Name: amt
          Type: numeric(10,2)
          UserData: []
          Realistic: ""
      
    • Continue this procedure for the rest of the columns you are interested in
  • Feed the customized plan (the YAML file) to the mock tool
    mock custom --file <filename or path/filename> 
    -- OR --
    mock c -f <filename or path/filename>
    
  • If you want more rows, use the rows flag
    mock custom --file <filename or path/filename> --rows <total rows number>
    -- OR --
    mock c -f <filename or path/filename> -r <total rows number>
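For plans with many columns, the edit described above can also be scripted. The sketch below works on the plan after it has been parsed into a Python dictionary with the same shape as the YAML (`Custom`, `Schema`, `Table`, `Column`, `Name`, `UserData`). Reading and writing the YAML file itself is left out and would need a YAML library, which is an assumption outside this page; `set_user_data` is a hypothetical helper, not part of the tool.

```python
def set_user_data(plan, schema, table, column, values):
    """Set the UserData list of one column in a parsed plan, in place."""
    for entry in plan["Custom"]:
        if entry["Schema"] == schema and entry["Table"] == table:
            for col in entry["Column"]:
                if col["Name"] == column:
                    col["UserData"] = list(values)
                    return plan
    raise KeyError(f"{schema}.{table}.{column} not found in plan")


# The generated plan for the sales table, before any edits.
plan = {"Custom": [{"Schema": "public", "Table": "sales", "Column": [
    {"Name": "id", "Type": "integer", "UserData": [], "Realistic": ""},
    {"Name": "date", "Type": "date", "UserData": [], "Realistic": ""},
    {"Name": "amt", "Type": "numeric(10,2)", "UserData": [], "Realistic": ""},
]}]}

# Take control of the date column only; id and amt stay untouched.
set_user_data(plan, "public", "sales", "date",
              ["2016-01-01", "2016-03-01", "2016-04-01"])
print(plan["Custom"][0]["Column"][1]["UserData"])
# prints: ['2016-01-01', '2016-03-01', '2016-04-01']
```

Raising an error for an unknown column is a design choice of the sketch: a silent no-op would leave the column on random data without any warning.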
    

Realistic Dataset

  • Let's create a table, e.g.
    CREATE TABLE employee
    (
       name    VARCHAR(100),
       email   VARCHAR(120),
       mobile  VARCHAR(50),
       gender  VARCHAR(2),
       address VARCHAR(500)
    ); 
    
  • Let's generate a plan for the table
    mock custom --table-name employee
    -- OR --
    mock c -t employee
    
  • Edit the YAML generated by the above command to include realistic keys as shown below; for the complete list of available realistic keys, check out this part of the code available here
    Custom:
    - Schema: public
      Table: employee
      Column:
      - Name: name
        Type: character varying(100)
        UserData: []
        Realistic: "NameFullName"
      - Name: email
        Type: character varying(120)
        UserData: []
        Realistic: "InternetEmail"
      - Name: mobile
        Type: character varying(50)
        UserData: []
        Realistic: "PhoneNumberString"
      - Name: gender
        Type: character varying(2)
        UserData: []
        Realistic: "NameGenderAbbrev"
      - Name: address
        Type: character varying(500)
        UserData: []
        Realistic: "AddressString"
    
  • Feed the customized plan (the YAML file) to the mock tool
    mock custom --file <filename or path/filename> 
    -- OR --
    mock c -f <filename or path/filename>
    

Random / User Generated / Realistic Dataset

If you combine all three, i.e. randomly generated data, user provided data and realistic data, you have many possible ways of loading the data. Let's take an example

  • Let us create a table

    CREATE TABLE employee
    (
       name         VARCHAR(100),
       password_hash VARCHAR(30),
       gender       VARCHAR
    ); 
    
  • Let's generate a plan for the table

    mock custom --table-name employee
    -- OR --
    mock c -t employee
    
  • Edit the YAML generated by the above command; here we will use

    • realistic data for the name column
    • data generated randomly by the tool for the password_hash column
    • a user provided dataset for the gender column

    so our YAML now looks like

    Custom:
    - Schema: public
      Table: employee
      Column:
      - Name: name
        Type: character varying(100)
        UserData: []
        Realistic: "NameFullName"
      - Name: password_hash
        Type: character varying(30)
        UserData: []
        Realistic: ""
      - Name: gender
        Type: character varying
        UserData: ["M", "F", "O"]
        Realistic: ""
    
  • Feed the customized plan (the YAML file) to the mock tool

    mock custom --file <filename or path/filename> 
    -- OR --
    mock c -f <filename or path/filename>
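Before loading a mixed plan like the one above, it can help to check which data source each column will end up using. The sketch below classifies every column of a parsed plan by applying the preference order from this page (user provided, then realistic, then random). `describe_sources` is a hypothetical helper and the plan is assumed to be already parsed into a Python dictionary.

```python
def describe_sources(plan):
    """Map 'schema.table.column' to 'user', 'realistic' or 'random'."""
    sources = {}
    for entry in plan["Custom"]:
        for col in entry["Column"]:
            if col.get("UserData"):        # non-empty list: user data wins
                source = "user"
            elif col.get("Realistic"):     # non-empty key: realistic data
                source = "realistic"
            else:                          # nothing set: random data
                source = "random"
            key = f'{entry["Schema"]}.{entry["Table"]}.{col["Name"]}'
            sources[key] = source
    return sources


plan = {"Custom": [{"Schema": "public", "Table": "employee", "Column": [
    {"Name": "name", "Type": "character varying(100)",
     "UserData": [], "Realistic": "NameFullName"},
    {"Name": "password_hash", "Type": "character varying(30)",
     "UserData": [], "Realistic": ""},
    {"Name": "gender", "Type": "character varying",
     "UserData": ["M", "F", "O"], "Realistic": ""},
]}]}

for column, source in describe_sources(plan).items():
    print(column, "->", source)
# prints:
# public.employee.name -> realistic
# public.employee.password_hash -> random
# public.employee.gender -> user
```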
    
