Read text file from specific position and store in two arrays


I have a text file that contains lines like this:

@relation SMILEfeatures
@attribute pcm_LOGenergy_sma_range numeric
@attribute pcm_LOGenergy_sma_maxPos numeric
@attribute pcm_LOGenergy_sma_minPos numeric...

There are about 6000 of these attribute lines. After the attributes, there are lines like this:

@data
1.283827e+01,3.800000e+01,2.000000e+00,5.331364e+00
1.850000e+02,4.054457e+01,4.500000e+01,3.200000e+01...

I need to separate these strings into two different arrays. So far I have only managed to store everything in one array.

Here is my code for storing everything in one array:

    var sb = new StringBuilder();

    using (var stream = new FileStream(filePath, FileMode.OpenOrCreate))
    {
        using (var sr = new StreamReader(stream))
        {
            String line;

            // Append every line of the file to a single buffer
            while ((line = sr.ReadLine()) != null)
            {
                sb.AppendLine(line);
            }
        }
        string allines = sb.ToString();
        Console.WriteLine(sb);
    }

There are 3 best solutions below

4
On BEST ANSWER

All lines that follow @relation SMILEfeatures and start with @attribute are collected first (in a StringBuilder here). All the values after @data are stored in a second list. The line numbers of both groups are recorded as well. Hope this is what you wanted.

        var relationLineNumbers = new List<int>();
        var dataLineNumbers = new List<int>();
        var relation = new StringBuilder();
        var data = new List<string>();

        // FileMode.Open fails fast if the file is missing instead of creating an empty one
        using (var stream = new FileStream(filepath, FileMode.Open))
        {
            using (var sr = new StreamReader(stream))
            {
                string line;
                bool isRelation = false;
                bool isData = false;

                int lineNumber = 0;
                while ((line = sr.ReadLine()) != null)
                {
                    lineNumber++;

                    if (line.StartsWith("@relation SMILEfeatures"))
                    {
                        isRelation = true;
                        isData = false;
                        continue;
                    }

                    if (line.StartsWith("@data"))
                    {
                        isData = true;
                        isRelation = false;
                        continue;
                    }

                    if (isRelation)
                    {
                        if (line.StartsWith("@attribute"))
                        {
                            // AppendLine keeps each attribute on its own line
                            relation.AppendLine(line);
                            relationLineNumbers.Add(lineNumber);
                        }
                    }

                    if (isData)
                    {
                        data.AddRange(line.Split(','));
                        dataLineNumbers.Add(lineNumber);
                    }
                }
            }

            Console.WriteLine("Relation");
            Console.WriteLine(relation.ToString());
            Console.WriteLine("Data");
            data.ForEach(Console.WriteLine);
        }
0
On

The question is not very clear, but my take is this: collect all lines that start with @relation or @attribute in one bucket, then collect all number lines in another bucket. I have chosen to ignore the @data lines, as they do not seem to contain any extra information.

Error checking can be added by making sure that the data lines (i.e. the number lines) contain comma-separated lists of parsable numeric values.

var dataLines = new List<string>();
var relAttLines = new List<string>();

foreach (var line in File.ReadAllLines(filePath)) // filePath: path of the input file, as in the question
{
    if (line.StartsWith("@relation") || line.StartsWith("@attribute"))
        relAttLines.Add(line);
    else if (line.StartsWith("@data"))
        //ignore these
        continue;
    else
        dataLines.Add(line);
}
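
The error checking mentioned above could look roughly like the sketch below. It is only an illustration: the class and method names (`DataLineParser`, `TryParseDataLine`) are made up for this example, and it assumes that every data line consists purely of comma-separated numbers written with `.` as the decimal separator.

```csharp
using System;
using System.Globalization;

static class DataLineParser
{
    // Hypothetical helper: returns true and the parsed values when every
    // comma-separated field on the line is a valid number.
    public static bool TryParseDataLine(string line, out double[] values)
    {
        values = Array.Empty<double>();
        if (string.IsNullOrWhiteSpace(line))
            return false;

        string[] fields = line.Split(',');
        var parsed = new double[fields.Length];

        for (int i = 0; i < fields.Length; i++)
        {
            // InvariantCulture keeps '.' as the decimal separator regardless
            // of the machine's regional settings; NumberStyles.Float accepts
            // exponent notation such as 1.283827e+01.
            if (!double.TryParse(fields[i], NumberStyles.Float,
                                 CultureInfo.InvariantCulture, out parsed[i]))
                return false;
        }

        values = parsed;
        return true;
    }
}
```

Inside the loop, a line would then go into the data bucket only when `TryParseDataLine` returns true. Anything else (blank lines, stray text) can be logged or skipped.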
1
On

All strings that start with @relation SMILEfeatures and contain @attribute should be stored in the first array. Numbers that start with @data should be stored in the second array.

Use string.Contains() and string.StartsWith() for checking.

Read every line and decide which array / list you want to put it in:

void ReadAndSortInArrays(string fileLocation)
{
    List<string> noData = new List<string>();
    List<string> Data = new List<string>();

    using(StreamReader sr = new StreamReader(fileLocation))
    {
        string line;

        while(!sr.EndOfStream)
        {
            line = sr.ReadLine();

            // Header lines start with either @relation or @attribute
            if(line.StartsWith("@relation") || line.StartsWith("@attribute"))
            {
                noData.Add(line);
            }
            else if(line.StartsWith("@data"))
            {
                Data.Add(line);
            }
            else
            {
                // This is strange
            }
        }
    }

    var noDataArray = noData.ToArray();
    var DataArray = Data.ToArray();
}

But I think that not every line begins with "@data".

So you may want to read the whole file and do something like this:

string allLines;
using(StreamReader sr = new StreamReader(yourfile))
{
    allLines = sr.ReadToEnd();
}


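// The string[] overload of Split is also available on .NET Framework
var arrays = allLines.Split(new[] { "@data" }, StringSplitOptions.None);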

// arrays[0] is the part before @data
// arrays[1] is the part after @data (the numbers)
// But arrays[1] does not contain @data
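
To get from the two text halves to two arrays of lines, each half can be split again on line breaks. The following is a minimal sketch under some assumptions: the names `ArffSplitter` and `SplitAtData` are invented for this example, and the text "@data" is assumed to appear only once in the file, as the marker line.

```csharp
using System;

static class ArffSplitter
{
    // Hypothetical helper: splits the file content at the first "@data"
    // marker and returns the header lines and the data lines as two arrays.
    public static void SplitAtData(string allLines, out string[] header, out string[] data)
    {
        // Handle both Windows and Unix line endings
        var separators = new[] { "\r\n", "\n" };

        int marker = allLines.IndexOf("@data", StringComparison.Ordinal);

        // Without a marker, everything is treated as header
        string before = marker < 0 ? allLines : allLines.Substring(0, marker);
        string after = marker < 0 ? "" : allLines.Substring(marker + "@data".Length);

        // RemoveEmptyEntries drops blank lines, including the remainder
        // of the marker line itself
        header = before.Split(separators, StringSplitOptions.RemoveEmptyEntries);
        data = after.Split(separators, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

Called as `ArffSplitter.SplitAtData(allLines, out var header, out var data);`, this leaves the @relation and @attribute lines in `header` and one string per row of numbers in `data`.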