Make tsvector tokenize by whitespace only


I need to create a tsvector that does not split its content by hyphens but ideally only by whitespace.

select to_tsvector('simple','7073-03-001-01 7072-05-003-06')

creates

'-001':3 '-003':7 '-01':4 '-03':2 '-05':6 '-06':8 '7072':5 '7073':1

whereas I would rather get

'7072-05-003-06':2 '7073-03-001-01':1

Is this possible somehow?


There is 1 solution below


There is a simple example parser called test_parser that seems to do what you want. It last appeared in the documentation for 9.4; after that it was documented only in the source tree. These test extensions aren't always installed, so you might need to take special steps to get it, depending on how you installed PostgreSQL, what your OS is, and whether you are really using an EOL version.

CREATE EXTENSION test_parser;
CREATE TEXT SEARCH CONFIGURATION test ( PARSER = testparser );
ALTER TEXT SEARCH CONFIGURATION test ADD MAPPING FOR word WITH simple;

SELECT * FROM to_tsvector('test', '7073-03-001-01 7072-05-003-06');
              to_tsvector              
---------------------------------------
 '7072-05-003-06':2 '7073-03-001-01':1
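
To check what the parser emits and to see the configuration used in an actual match, something like the following may help. It is only a sketch: it assumes the test_parser extension and the test configuration created above are already in place, and the temporary table parts and its column code are hypothetical names used for illustration.

```sql
-- Inspect the raw tokens the parser produces (token type id and token text).
SELECT * FROM ts_parse('testparser', '7073-03-001-01 7072-05-003-06');

-- Hypothetical table, used only to illustrate a search.
CREATE TEMP TABLE parts (id serial PRIMARY KEY, code text);
INSERT INTO parts (code) VALUES ('7073-03-001-01 7072-05-003-06');

-- Match a whole hyphenated code. The query text is run through the same
-- configuration, so it is not split on hyphens either.
SELECT id, code
FROM parts
WHERE to_tsvector('test', code) @@ plainto_tsquery('test', '7073-03-001-01');
```

The query side has to use the same configuration as the tsvector side; if the query went through the default parser, it would be split on the hyphens and would not match the unsplit lexemes.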