Inserting dummy data into Postgres table

492 Views Asked by At

I have got the following Postgres table:

create table test (
    id serial, 
    contract varchar, 
    amount0 int, 
    amount1 int, 
    price double precision
);

I would like to insert 100 rows of dummy data that conforms to the following:

– In column 'contract' there should be values out of 'abc', 'klm' and 'xyz'.

– In columns 'amount0' and 'amount1' there should be integer values out of 50, 60, 80, 100, 200.

– In column 'price' there should be values out of 1.5, 1.8, 2.1, 2.5.

What I have come up with so far is this:

INSERT INTO test (amount0, amount1, price)
SELECT
  (SELECT val FROM unnest(ARRAY[50, 60, 80, 100, 200]) AS val ORDER BY random() LIMIT 1),
  (SELECT val FROM unnest(ARRAY[50, 60, 80, 100, 200]) AS val ORDER BY random() LIMIT 1),
  (SELECT val FROM unnest(ARRAY[1.5, 1.8, 2.1, 2.5]) AS val ORDER BY random() LIMIT 1)
FROM generate_series(1, 100);

But this doesn't do the trick. This statement always only uses '100' for column 'amount0', '50' for column 'amount1' and '2.1' for column 'price'.

Can anyone help me out?

3

There are 3 best solutions below

6
Zegarek On BEST ANSWER

You can randomly subscript the array of choices: (demo)

INSERT INTO test (contract,amount0, amount1, price)
SELECT (ARRAY['abc','klm','xyz'])[(random()*2+1)::int],
       (ARRAY[50, 60, 80, 100, 200])[(random()*4+1)::int],
       (ARRAY[50, 60, 80, 100, 200])[(random()*4+1)::int],
       (ARRAY[1.5, 1.8, 2.1, 2.5])[(random()*3+1)::int]
FROM generate_series(1, 1e2, 1);

SELECT id, contract,amount0, amount1, price FROM test LIMIT 6;
id contract amount0 amount1 price
1 klm 50 100 2.1
2 xyz 100 100 1.5
3 abc 80 100 1.8
4 xyz 80 80 2.1
5 xyz 200 50 1.8
6 xyz 60 60 1.8

Note that this does what you intended to do, making each selection independently random. In effect, it's quite likely that you'll get duplicate entries. If you prefer to keep getting unique combinations until you run out, and only then start over, you can do something similar to Mike's suggestion: (demo2)

INSERT INTO test (contract,amount0, amount1, price)
SELECT contract,amount0, amount1, price
FROM (values (50),(60),(80),(100),(200)) a(amount0),
     (values (50),(60),(80),(100),(200)) b(amount1),
     (values (1.5), (1.8), (2.1), (2.5)) c(price),
     (values ('abc'),('klm'),('xyz')   ) d(contract),
     generate_series(1, 1e2, 1) duplicator(n)
order by n, random()
limit 100;

Here's a function if you find yourself in need of a weighted random, meaning that you want to define exactly how likely an option is, compared to the others.


The reason why your solution didn't work is that since your scalar subqueries don't depend in any way on the outer query, they were evaluated only once and re-used. You can check the plan by running it with explain analyze verbose. You could trick the planner into thinking they somehow rely on the outer query by adding an outside reference, even if it does nothing (demo3) but the above does the same, in less code.

1
Umut TEKİN On

The problem is everytime you run your command PostgreSQL took a seed number and send it to srand function. That is why with the same seed you get same values from your arrays. Therefore, you need call function as many times you want another value from your arrays, and you can do that creating functions. That will cause context switch, but that is the only thing that I can think of. Maybe someone else can correct me. Here is the code:

create table test (
    id serial, 
    contract varchar, 
    amount0 int, 
    amount1 int, 
    price double precision
);

CREATE OR REPLACE FUNCTION random_amount()
  RETURNS int
  LANGUAGE sql VOLATILE PARALLEL SAFE AS
$func$
  SELECT ('[0:4]={50, 60, 80, 100, 200}'::int[])[trunc(random()  * (4 - 1 + 1) + 1)::int];
$func$;

CREATE OR REPLACE FUNCTION random_price()
  RETURNS int
  LANGUAGE sql VOLATILE PARALLEL SAFE AS
$func$
  (SELECT ('[0:3]={1.5, 1.8, 2.1, 2.5}'::float[])[trunc(random()  * (3 - 1 + 1) + 1)::int]);
$func$;

INSERT INTO test (amount0, amount1, price)
SELECT
  random_amount(), random_amount(), random_price()
FROM generate_series(1, 100);

select * from test;

Fiddle.

0
Kihara On

Your query is on the right track for generating random values for the 'amount0', 'amount1', and 'price' columns. To also include the 'contract' column with values 'abc', 'klm', and 'xyz', you can modify your query as follows:

INSERT INTO test (contract, amount0, amount1, price)
SELECT
  CASE 
    WHEN random() < 0.33 THEN 'abc'
    WHEN random() < 0.66 THEN 'klm'
    ELSE 'xyz'
  END AS contract,
  (SELECT val FROM unnest(ARRAY[50, 60, 80, 100, 200]) AS val ORDER BY random() LIMIT 1),
  (SELECT val FROM unnest(ARRAY[50, 60, 80, 100, 200]) AS val ORDER BY random() LIMIT 1),
  (SELECT val FROM unnest(ARRAY[1.5, 1.8, 2.1, 2.5]) AS val ORDER BY random() LIMIT 1)
FROM generate_series(1, 100);

SELECT id, contract, amount0, amount1, price FROM test LIMIT 6;

This query uses the CASE statement to randomly select values for the 'contract' column based on probabilities. It assigns 'abc', 'klm', or 'xyz' with roughly equal probabilities to each row.