Reading an undelimited text file in Java via FlatPack


I want to read data from a text file in Java, but the file has no delimiter (such as a space or comma) between some of the fields. I was told this is possible with FlatPack.

How can I read the text, parse it as if it were delimited, and store the resulting values?

Example of the text file data:

"Prod Name" "City" "Price" "zipcode" "Date"

samsungA London 65001402110/07/2018  
samsungA California 35001202122/08/2018  
samsungA Delhi 44001202112/08/2018

I want to store the fields as:

Name in string  
City in string  
Price in int  
zipcode in int  
date as date

Any ideas on how to achieve this?


There are 3 best solutions below

Answer 1 (best answer)

You can use FlatPack's fixed-length parser together with an XML mapping that defines the length of each column, and then extract the fields by name. The catch is that every column must have a predefined length.

import java.io.ByteArrayInputStream;
import java.io.InputStream;
import net.sf.flatpack.DataSet;
import net.sf.flatpack.DefaultParserFactory;
import net.sf.flatpack.Parser;

public class FixedLengthExample {
    public static void main(String[] args) {
        String data = "samsungA500";
        // Column lengths must add up to the record length: "samsungA" is 8 characters, "500" is 3.
        String schema = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\r\n" +
                "<!-- DTD can be pulled from the Jar or over the web -->\r\n" +
                "<!DOCTYPE PZMAP SYSTEM  \"flatpack.dtd\" >\r\n" +
                "<!--<!DOCTYPE PZMAP SYSTEM \"http://flatpack.sourceforge.net/flatpack.dtd\"> -->\r\n" +
                "<PZMAP>\r\n" +
                "   <COLUMN name=\"std_name\" length=\"8\" />\r\n" +
                "   <COLUMN name=\"std_price\" length=\"3\" />\r\n" +
                "</PZMAP>";

        InputStream mapping = new ByteArrayInputStream(schema.getBytes());
        InputStream dataStream = new ByteArrayInputStream(data.getBytes());
        Parser pzparser = DefaultParserFactory.getInstance().newFixedLengthParser(mapping, dataStream);
        DataSet ds = pzparser.parse();
        // Each call to next() advances to the following record.
        while (ds.next()) {
            System.out.println(ds.getString("std_name"));
            System.out.println(ds.getInt("std_price"));
        }
    }
}

Answer 2

You can do this with a simple file reader. According to your example, the name and city are delimited by spaces and each row ends with a newline character.

The price, post code and date are packed together in the third piece of each row, so you just need a bit of arithmetic to calculate the indexes: the date is the last 10 characters, the post code is the 6 characters before it, and the price is whatever remains at the front.

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;

public class ReadPackedFile {
    public static void main(String... args) throws IOException {
        final File file = new File("/home/william/test.txt");
        final String delimiter = " ";
        final int dateStrLen = 10;
        final int postCodeLen = 6;

        // try-with-resources closes the reader even if parsing fails.
        // Assumes the file holds only data rows (no header) and that city names contain no spaces.
        try (BufferedReader br = new BufferedReader(new FileReader(file))) {
            String tmp;
            while ((tmp = br.readLine()) != null) {
                String[] values = tmp.trim().split(delimiter);

                String name = values[0];
                String city = values[1];
                // Work backwards from the end of the third piece: date, then post code, then price.
                int dateStartPos = values[2].length() - dateStrLen;
                int postCodeStartPos = dateStartPos - postCodeLen;

                String date = values[2].substring(dateStartPos);
                String postCode = values[2].substring(postCodeStartPos, dateStartPos);
                String price = values[2].substring(0, postCodeStartPos);
                // do something with the data
                // you could store it with a dto or in arrays, one for each "column"
                System.out.printf("name: %s; city: %s; price: %s; post-code: %s; date: %s%n", name, city, price, postCode, date);
            }
        }
    }
}
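
The question asks for the price and zip code as int and the date as a date, while the code above keeps everything as strings. A minimal sketch of the conversion follows. The class names `TypedRowParser` and `Row` are hypothetical, the 6-character post code width is the same assumption as above, and the `dd/MM/yyyy` pattern is inferred from sample values such as 22/08/2018.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

class TypedRowParser {
    // Hypothetical holder for one parsed row.
    static final class Row {
        final String name;
        final String city;
        final int price;
        final int postCode;
        final LocalDate date;

        Row(String name, String city, int price, int postCode, LocalDate date) {
            this.name = name;
            this.city = city;
            this.price = price;
            this.postCode = postCode;
            this.date = date;
        }
    }

    // Assumption: dates are day/month/year, e.g. "22/08/2018".
    private static final DateTimeFormatter DATE_FORMAT = DateTimeFormatter.ofPattern("dd/MM/yyyy");
    private static final int DATE_LEN = 10;
    private static final int POST_CODE_LEN = 6; // assumed width, as in the answer above

    static Row parse(String line) {
        String[] values = line.trim().split(" ");
        String packed = values[2]; // price + post code + date with no separators
        int dateStart = packed.length() - DATE_LEN;
        int postCodeStart = dateStart - POST_CODE_LEN;

        int price = Integer.parseInt(packed.substring(0, postCodeStart));
        int postCode = Integer.parseInt(packed.substring(postCodeStart, dateStart));
        LocalDate date = LocalDate.parse(packed.substring(dateStart), DATE_FORMAT);
        return new Row(values[0], values[1], price, postCode, date);
    }

    public static void main(String[] args) {
        Row row = parse("samsungA London 65001402110/07/2018");
        // prints "650 14021 2018-07-10"
        System.out.println(row.price + " " + row.postCode + " " + row.date);
    }
}
```

Note that storing the post code as an int drops any leading zero ("014021" becomes 14021), so a String may be the safer type if leading zeros matter.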

Answer 3

I don't think whether you use FlatPack is the real issue. If the file does not contain delimiters, you should treat it as a table built from fixed-width data columns and read each column by character position.

Count positions from the start of each line: the first character is position 0, the next character is position 1, then 2, and so on.

In every row, the characters at positions 0 through 7 inclusive are the "Prod Name", which returns samsungA.

Positions 9 through 18 (assuming 18 is the maximum position for that column) hold the "City".

So the prerequisite is knowing how many characters wide each data column is. For example, row 1 has "London" but row 2 has "California", and other rows could have wider names. You need to know, or work out, the maximum position at which the data ends for each column.

And you can do it without FlatPack.
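
The position-based reading described above can be sketched in plain Java with `String.substring`. This is only an illustration under stated assumptions: the class name `FixedWidthReader` and the column boundaries are hypothetical, the short city name is assumed to be padded with spaces to the full column width, and the 3/6/10 split of the trailing characters into price, zip code and date is a guess, since the sample does not show where the price ends and the zip code begins.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class FixedWidthReader {
    // Hypothetical layout: start index is inclusive, end index is exclusive.
    private static final String[] NAMES = {"name", "city", "price", "zipcode", "date"};
    private static final int[] STARTS = {0, 9, 20, 23, 29};
    private static final int[] ENDS = {8, 19, 23, 29, 39};

    static Map<String, String> parse(String line) {
        Map<String, String> row = new LinkedHashMap<>();
        for (int i = 0; i < NAMES.length; i++) {
            // Clamp to the line length so a short line yields empty values instead of an exception.
            int start = Math.min(STARTS[i], line.length());
            int end = Math.min(ENDS[i], line.length());
            // trim() removes the padding that fills out a narrow value.
            row.put(NAMES[i], line.substring(start, end).trim());
        }
        return row;
    }

    public static void main(String[] args) {
        String[] lines = {
            "samsungA London     65001402110/07/2018", // city padded to 10 characters
            "samsungA California 35001202122/08/2018"
        };
        for (String line : lines) {
            System.out.println(parse(line));
        }
    }
}
```

If the real file does not pad short values, the column positions shift from row to row and a purely positional layout like this one will not work for the city column.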