Inserting dummy data into Postgres table

Question

Inserting dummy data into Postgres table

492 Views Asked by Westcoaster At 06 October 2023 at 13:39

I have got the following Postgres table:

create table test (
    id serial, 
    contract varchar, 
    amount0 int, 
    amount1 int, 
    price double precision
);

I would like to insert 100 rows of dummy data that conforms to the following:

– In column 'contract' there should be values out of 'abc', 'klm' and 'xyz'.

– In columns 'amount0' and 'amount1' there should be integer values out of 50, 60, 80, 100, 200.

– In column 'price' there should be values out of 1.5, 1.8, 2.1, 2.5.

What I have come up with so far is this:

INSERT INTO test (amount0, amount1, price)
SELECT
  (SELECT val FROM unnest(ARRAY[50, 60, 80, 100, 200]) AS val ORDER BY random() LIMIT 1),
  (SELECT val FROM unnest(ARRAY[50, 60, 80, 100, 200]) AS val ORDER BY random() LIMIT 1),
  (SELECT val FROM unnest(ARRAY[1.5, 1.8, 2.1, 2.5]) AS val ORDER BY random() LIMIT 1)
FROM generate_series(1, 100);

But this doesn't do the trick. This statement always only uses '100' for column 'amount0', '50' for column 'amount1' and '2.1' for column 'price'.

Can anyone help me out?

Original Q&A

There are 3 best solutions below

Umut TEKİN On 06 October 2023 at 15:14

The problem is everytime you run your command PostgreSQL took a seed number and send it to srand function. That is why with the same seed you get same values from your arrays. Therefore, you need call function as many times you want another value from your arrays, and you can do that creating functions. That will cause context switch, but that is the only thing that I can think of. Maybe someone else can correct me. Here is the code:

create table test (
    id serial, 
    contract varchar, 
    amount0 int, 
    amount1 int, 
    price double precision
);

CREATE OR REPLACE FUNCTION random_amount()
  RETURNS int
  LANGUAGE sql VOLATILE PARALLEL SAFE AS
$func$
  SELECT ('[0:4]={50, 60, 80, 100, 200}'::int[])[trunc(random()  * (4 - 1 + 1) + 1)::int];
$func$;

CREATE OR REPLACE FUNCTION random_price()
  RETURNS int
  LANGUAGE sql VOLATILE PARALLEL SAFE AS
$func$
  (SELECT ('[0:3]={1.5, 1.8, 2.1, 2.5}'::float[])[trunc(random()  * (3 - 1 + 1) + 1)::int]);
$func$;

INSERT INTO test (amount0, amount1, price)
SELECT
  random_amount(), random_amount(), random_price()
FROM generate_series(1, 100);

select * from test;

Fiddle.

Kihara On 08 October 2023 at 03:33

Your query is on the right track for generating random values for the 'amount0', 'amount1', and 'price' columns. To also include the 'contract' column with values 'abc', 'klm', and 'xyz', you can modify your query as follows:

INSERT INTO test (contract, amount0, amount1, price)
SELECT
  CASE 
    WHEN random() < 0.33 THEN 'abc'
    WHEN random() < 0.66 THEN 'klm'
    ELSE 'xyz'
  END AS contract,
  (SELECT val FROM unnest(ARRAY[50, 60, 80, 100, 200]) AS val ORDER BY random() LIMIT 1),
  (SELECT val FROM unnest(ARRAY[50, 60, 80, 100, 200]) AS val ORDER BY random() LIMIT 1),
  (SELECT val FROM unnest(ARRAY[1.5, 1.8, 2.1, 2.5]) AS val ORDER BY random() LIMIT 1)
FROM generate_series(1, 100);

SELECT id, contract, amount0, amount1, price FROM test LIMIT 6;

This query uses the CASE statement to randomly select values for the 'contract' column based on probabilities. It assigns 'abc', 'klm', or 'xyz' with roughly equal probabilities to each row.

**Zegarek** · Accepted Answer · 2023-10-06T17:04:52.477000

You can randomly subscript the array of choices: (demo)

INSERT INTO test (contract,amount0, amount1, price)
SELECT (ARRAY['abc','klm','xyz'])[(random()*2+1)::int],
       (ARRAY[50, 60, 80, 100, 200])[(random()*4+1)::int],
       (ARRAY[50, 60, 80, 100, 200])[(random()*4+1)::int],
       (ARRAY[1.5, 1.8, 2.1, 2.5])[(random()*3+1)::int]
FROM generate_series(1, 1e2, 1);

SELECT id, contract,amount0, amount1, price FROM test LIMIT 6;

id	contract	amount0	amount1	price
1	klm	50	100	2.1
2	xyz	100	100	1.5
3	abc	80	100	1.8
4	xyz	80	80	2.1
5	xyz	200	50	1.8
6	xyz	60	60	1.8

Note that this does what you intended to do, making each selection independently random. In effect, it's quite likely that you'll get duplicate entries. If you prefer to keep getting unique combinations until you run out, and only then start over, you can do something similar to Mike's suggestion: (demo2)

INSERT INTO test (contract,amount0, amount1, price)
SELECT contract,amount0, amount1, price
FROM (values (50),(60),(80),(100),(200)) a(amount0),
     (values (50),(60),(80),(100),(200)) b(amount1),
     (values (1.5), (1.8), (2.1), (2.5)) c(price),
     (values ('abc'),('klm'),('xyz')   ) d(contract),
     generate_series(1, 1e2, 1) duplicator(n)
order by n, random()
limit 100;

Here's a function if you find yourself in need of a weighted random, meaning that you want to define exactly how likely an option is, compared to the others.

The reason why your solution didn't work is that since your scalar subqueries don't depend in any way on the outer query, they were evaluated only once and re-used. You can check the plan by running it with explain analyze verbose. You could trick the planner into thinking they somehow rely on the outer query by adding an outside reference, even if it does nothing (demo3) but the above does the same, in less code.

Inserting dummy data into Postgres table

There are 3 best solutions below

Related Questions in SQL

Related Questions in DATABASE

Related Questions in POSTGRESQL

Related Questions in DUMMY-DATA

Trending Questions

Popular # Hahtags

Popular Questions

id	contract	amount0	amount1	price
1	klm	50	100	2.1
2	xyz	100	100	1.5
3	abc	80	100	1.8
4	xyz	80	80	2.1
5	xyz	200	50	1.8
6	xyz	60	60	1.8

id	contract	amount0	amount1	price
1	klm	50	100	2.1
2	xyz	100	100	1.5
3	abc	80	100	1.8
4	xyz	80	80	2.1
5	xyz	200	50	1.8
6	xyz	60	60	1.8

id	contract	amount0	amount1	price
1	klm	50	100	2.1
2	xyz	100	100	1.5
3	abc	80	100	1.8
4	xyz	80	80	2.1
5	xyz	200	50	1.8
6	xyz	60	60	1.8