Convert incoming text timestamp from rsyslog to a timestamp for PostgreSQL


I have logs from various Linux servers being fed by rsyslog to a PostgreSQL database. The incoming timestamp is an RFC 3339-formatted time produced by rsyslog, like so: 2020-10-12T12:01:18.162329+02:00.

In the original test setup of the database logging table, I created that timestamp field as 'text'. Most things I need parsed are working right, so I was hoping to convert that timestamp table column from text to a timestamp datatype (and retain the subseconds and timezone if possible).

The end result should be a timestamp datatype so that I can do date-range queries using PostgreSQL data functions.
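For example, once converted I'd like to run range queries such as (a hypothetical example, using literal timestamps rather than any particular date function):

SELECT * FROM log
 WHERE "timestamp" >= '2020-10-12 00:00:00+02'
   AND "timestamp" <  '2020-10-13 00:00:00+02';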

Is this doable in PostgreSQL 11? Or is it just better to re-create the table with the correct timestamp column datatype to begin with?

Thanks in advance for any pointers, advice, places to look, or snippets of code.

Relevant rsyslog config:

$template CustomFormat,"%timegenerated:::date-rfc3339% %syslogseverity-text:::uppercase% %hostname% %syslogtag% %msg%\n"
$ActionFileDefaultTemplate CustomFormat

...

template(name="rsyslog" type="list" option.sql="on") {
  constant(value="INSERT INTO log (timestamp, severity, hostname, syslogtag, message)
    values ('")
    property(name="timegenerated" dateFormat="rfc3339")  constant(value="','")
    property(name="syslogseverity-text" caseConversion="upper")  constant(value="','")
    property(name="hostname")  constant(value="','")
    property(name="syslogtag")  constant(value="','")
    property(name="msg")  constant(value="')")
}

and the log table structure:

CREATE TABLE public.log
(
    id integer NOT NULL DEFAULT nextval('log_id_seq'::regclass),
    "timestamp" text COLLATE pg_catalog."default" DEFAULT timezone('UTC'::text, CURRENT_TIMESTAMP),
    severity character varying(10) COLLATE pg_catalog."default",
    hostname character varying(20) COLLATE pg_catalog."default",
    syslogtag character varying(24) COLLATE pg_catalog."default",
    program character varying(24) COLLATE pg_catalog."default",
    process text COLLATE pg_catalog."default",
    message text COLLATE pg_catalog."default",
    CONSTRAINT log_pkey PRIMARY KEY (id)
)

Some sample data already fed into the table (ignore the timestamps inside the message text; they come from an independent handmade logging system written by my predecessor): [screenshot of sample rows]

Accepted answer:

You can in theory convert the TEXT column to TIMESTAMP WITH TIME ZONE with ALTER TABLE .. ALTER COLUMN ... SET DATA TYPE ... USING, e.g.:

postgres=# CREATE TABLE tstest (tsval TEXT NOT NULL);
CREATE TABLE

postgres=# INSERT INTO tstest values('2020-10-12T12:01:18.162329+02:00');
INSERT 0 1

postgres=# ALTER TABLE tstest
             ALTER COLUMN tsval SET DATA TYPE TIMESTAMP WITH TIME ZONE
             USING tsval::TIMESTAMPTZ;
ALTER TABLE

postgres=# \d tstest
                       Table "public.tstest"
 Column |           Type           | Collation | Nullable | Default
--------+--------------------------+-----------+----------+---------
 tsval  | timestamp with time zone |           | not null |


postgres=# SELECT * FROM tstest ;
             tsval
-------------------------------
 2020-10-12 12:01:18.162329+02
(1 row)

PostgreSQL can parse the RFC3339 format, so subsequent inserts should just work:

postgres=# INSERT INTO tstest values('2020-10-12T12:01:18.162329+02:00');
INSERT 0 1

postgres=# SELECT * FROM tstest ;
             tsval
-------------------------------
 2020-10-12 12:01:18.162329+02
 2020-10-12 12:01:18.162329+02
(2 rows)

But note that any bad data in the table (i.e. values which cannot be parsed as timestamps) will cause the ALTER TABLE operation to fail, so you should consider verifying the values before converting the data. Something like SELECT "timestamp"::TIMESTAMPTZ FROM public.log would fail with an error like invalid input syntax for type timestamp with time zone: "somebadvalue".
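One way to locate bad rows up front (a sketch; the helper function name is my own, and the function simply traps the cast error) is something like:

CREATE FUNCTION is_valid_tstz(val TEXT) RETURNS BOOLEAN AS $$
BEGIN
  -- attempt the cast; discard the result
  PERFORM val::TIMESTAMPTZ;
  RETURN TRUE;
EXCEPTION WHEN invalid_datetime_format OR datetime_field_overflow THEN
  RETURN FALSE;
END;
$$ LANGUAGE plpgsql STABLE;

SELECT id, "timestamp"
  FROM public.log
 WHERE NOT is_valid_tstz("timestamp");

Any rows returned would need to be fixed (or discarded) before the ALTER TABLE can succeed. The function is marked STABLE rather than IMMUTABLE because the cast's behaviour depends on session settings such as TimeZone and DateStyle.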

Also bear in mind that this kind of ALTER TABLE requires a table rewrite, which may take some time to complete (depending on how large the table is), and it takes an ACCESS EXCLUSIVE lock, rendering the table inaccessible for the duration of the operation.

If you want to avoid a long-running ACCESS EXCLUSIVE lock, you could probably do something like this (not tested):

  • add a new TIMESTAMPTZ column (adding a column doesn't rewrite the table and is fairly cheap, provided you don't use a volatile default value)
  • create a trigger to copy any values inserted into the original column into the new one
  • backfill the existing rows with a bunch of batched updates (e.g. UPDATE public.log SET timestamp_tz = "timestamp"::TIMESTAMPTZ for a limited range of ids at a time)
  • (in a single transaction) drop the trigger and the original column, and rename the new column to the old name
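The steps above might be sketched as follows (not tested; the column, function, and trigger names are my own invention, and EXECUTE FUNCTION in the trigger definition assumes PostgreSQL 11 or later):

-- 1. add the new column (cheap, no table rewrite)
ALTER TABLE public.log ADD COLUMN timestamp_tz TIMESTAMPTZ;

-- 2. keep new rows in sync while the backfill runs
CREATE FUNCTION log_copy_ts() RETURNS trigger AS $$
BEGIN
  NEW.timestamp_tz := NEW."timestamp"::TIMESTAMPTZ;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER log_copy_ts_trg
  BEFORE INSERT ON public.log
  FOR EACH ROW EXECUTE FUNCTION log_copy_ts();

-- 3. backfill in batches (repeat with successive id ranges)
UPDATE public.log
   SET timestamp_tz = "timestamp"::TIMESTAMPTZ
 WHERE id BETWEEN 1 AND 100000
   AND timestamp_tz IS NULL;

-- 4. swap the columns in a single transaction
BEGIN;
DROP TRIGGER log_copy_ts_trg ON public.log;
DROP FUNCTION log_copy_ts();
ALTER TABLE public.log DROP COLUMN "timestamp";
ALTER TABLE public.log RENAME COLUMN timestamp_tz TO "timestamp";
COMMIT;

Each batched UPDATE only locks the rows it touches, so the table stays usable; only the final swap transaction needs a brief ACCESS EXCLUSIVE lock.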