For the CsvCruncher project, I am loading a CSV file into HSQLDB.
CREATE TEXT TABLE concat_1 ( Op VARCHAR(255), id VARCHAR(255), uuid VARCHAR(255), session_id VARCHAR(255) )
SET TABLE concat_1 SOURCE '.../concat_1.csv;encoding=UTF-8;cache_rows=50000;cache_size=10240000;ignore_first=true;fs=,;qc=\quote'
At the time of creating the table and loading, I don't know anything about the column values.
To speed the SELECTs up, I am trying to convert the columns (after loading) to other types, relying on this HSQLDB feature:
"HyperSQL allows changing the type if all the existing values can be cast into the new type without string truncation or loss of significant digits."
ALTER TABLE concat_1 ALTER COLUMN id SET DATA TYPE BIGINT
But when I try that, I get:
operation is not allowed on text table with data in statement
Is this possible with HSQLDB without duplicating the TEXT table into a normal (native) table?
Here's the code, to give you an idea:
for (String colName : colNames) {
    for (String sqlType : new String[]{"TIMESTAMP", "UUID", "BIGINT", "INTEGER", "SMALLINT", "BOOLEAN"}) {
        String sqlCol = String.format("ALTER TABLE %s ALTER COLUMN %s SET DATA TYPE %s",
                tableName, colName, sqlType);
        log.info("Column change attempt SQL: " + sqlCol);
        try (Statement st = this.conn.createStatement()) {
            st.execute(sqlCol);
            log.info(String.format("Column %s.%s converted to %s", tableName, colName, sqlType));
        } catch (SQLException ex) {
            log.info(String.format("Column %s.%s values don't fit to %s.\n %s",
                    tableName, colName, sqlType, ex.getMessage()));
        }
    }
}
I figured it out. Although it's not documented, TEXT tables can't be altered while they are bound to a CSV file.
What I did:
1) Instead of trying ALTER with each type, I queried SELECT CAST(<col> AS <type>).
2) I collected all the types the column's values fit into and chose the most specific and smallest one.
3) Then I detached the table: SET TABLE <table> SOURCE OFF.
4) Then I did the ALTER COLUMN.
5) Lastly, I reattached the source: SET TABLE <table> SOURCE ON.
This way the table ends up with the most fitting type, and the caches and indexes work more optimally. A rough sketch of these steps is below.
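For illustration, here's a minimal, untested sketch of those five steps over plain JDBC. The method name, the candidate-type ordering, and the "first fitting type wins" shortcut are my own simplifications, not CsvCruncher's actual code:

import java.sql.*;
import java.util.List;

static void normalizeColumns(Connection conn, String tableName, List<String> colNames) throws SQLException {
    // Ordered most specific first, so the first type that fits is also the smallest.
    String[] candidates = {"BOOLEAN", "UUID", "TIMESTAMP", "SMALLINT", "INTEGER", "BIGINT"};
    for (String colName : colNames) {
        String bestType = null;
        for (String sqlType : candidates) {
            // Steps 1) and 2): probe with a CAST. The aggregate makes HSQLDB
            // evaluate the cast for every row, so it fails if any value doesn't fit.
            String probe = String.format("SELECT COUNT(CAST(%s AS %s)) FROM %s",
                    colName, sqlType, tableName);
            try (Statement st = conn.createStatement();
                 ResultSet rs = st.executeQuery(probe)) {
                rs.next(); // Force evaluation of the result.
                bestType = sqlType;
                break;
            } catch (SQLException ex) {
                // Values don't fit this type; try the next (wider) one.
            }
        }
        if (bestType == null)
            continue; // No candidate fits; leave the column as VARCHAR.

        try (Statement st = conn.createStatement()) {
            // Step 3): detach the CSV source; the TEXT table becomes empty and alterable.
            st.execute(String.format("SET TABLE %s SOURCE OFF", tableName));
            // Step 4): change the column type on the detached table.
            st.execute(String.format("ALTER TABLE %s ALTER COLUMN %s SET DATA TYPE %s",
                    tableName, colName, bestType));
            // Step 5): reattach; HSQLDB re-parses the CSV using the new column type.
            st.execute(String.format("SET TABLE %s SOURCE ON", tableName));
        }
    }
}

In practice you'd probably detach the source once, run all the ALTERs, and reattach once, rather than flipping the source off and on per column.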
For large tables, though, it could be worth flipping the resulting table into a native CACHED (disk-based) table. Code coming when I clean it up.
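In the meantime, something along these lines should do the copy; the concat_1_native name is made up, and you may want to double-check the CREATE TABLE ... AS syntax against your HSQLDB version:

-- Copy the now correctly typed data into a native, disk-based table.
CREATE CACHED TABLE concat_1_native AS (SELECT * FROM concat_1) WITH DATA;
-- Optionally drop the TEXT table once the copy is verified.
DROP TABLE concat_1;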