ValueError, though check has already been performed for this

Question

89 Views Asked by iFunction At 13 December 2016 at 15:46

I'm getting a little stuck with NaN data. This program trawls through a folder on an external hard drive, loads each text file as a dataframe, and should read the very last value of the last column. As some of the last rows are incomplete for whatever reason, I have chosen to take the row before (or that's what I hope I have done). Here is the code; I have commented the lines that I think are giving the trouble:

#!/usr/bin/env python3

import glob
import math
import pandas as pd
import numpy as np

def get_avitime(vbo):
    try:
        df = pd.read_csv(vbo,
                         delim_whitespace=True,
                         header=90)
        row = next(df.iterrows())
        t = df.tail(2).avitime.values[0]
        return t
    except:
        pass

def human_time(seconds):
    secs = seconds/1000
    mins, secs = divmod(secs, 60)
    hours, mins = divmod(mins, 60)
    return '%02d:%02d:%02d' % (hours, mins, secs)
def main():
    path = 'Z:\\VBox_Backup\\**\\*.vbo'
    events = {}
    customers = {}

    for vbo_path in glob.glob(path, recursive=True):
        path_list = vbo_path.split('\\')
        event = path_list[2].upper()
        customer = path_list[3].title()
        avitime = get_avitime(vbo_path)
        if not avitime:             # this is to check there is a number
            continue
        else:
            if event not in events:
                events[event] = {customer:avitime}
                print(event)
            elif customer not in events[event]:
                events[event][last_customer] = human_time(events[event][last_customer])
                print(events[event][last_customer])
                events[event][customer] = avitime
            else:
                total_time = events[event][customer]
                total_time += avitime
                events[event][customer] = total_time
        last_customer = customer



    events[event][customer] = human_time(events[event][customer])
    df_events = pd.DataFrame(events)
    df_events.to_csv('event_track_times.csv')

main()

I put in a line to check for a value, but I am guessing that NaN is not a null value, hence it hasn't quite worked.

C:\Users\rob.kinsey\AppData\Local\Continuum\Anaconda3) c:\Users\rob.kinsey\Pro
ramming>python test_single.py
BARCELONA
03:52:42
02:38:31
03:21:02
00:16:35
00:59:00
00:17:45
01:31:42
03:03:03
03:16:43
01:08:03
01:59:54
00:09:03
COTA
04:38:42
02:42:34
sys:1: DtypeWarning: Columns (0) have mixed types. Specify dtype option on import or set low_memory=False.
04:01:13
01:19:47
03:09:31
02:37:32
03:37:34
02:14:42
04:53:01
LAGUNA_SECA
01:09:10
01:34:31
01:49:27
03:05:34
02:39:03
01:48:14
SILVERSTONE
04:39:31
01:52:21
02:53:42
02:10:44
02:11:17
02:37:11
01:19:12
04:32:21
05:06:43
SPA
Traceback (most recent call last):
  File "test_single.py", line 56, in <module>
    main()
  File "test_single.py", line 41, in main
    events[event][last_customer] = human_time(events[event][last_customer])
  File "test_single.py", line 23, in human_time

The output starts out correctly, apart from the sys:1 DtypeWarning, which at least lets the program carry on; the final error then stops it completely. How can I get past this NaN issue? All the variables I am working with should be floats, or should have been ignored. All values should be strings or floats until the time conversion, which produces integers.

**iFunction** · Answer 1 · 2016-12-14T11:54:06.003000

Ok, even though no one answered, I am compelled to answer my own question as I am not convinced I am the only person that has had this problem.

NaN can turn up in a data frame for several reasons. Many of them revolve around undefined arithmetic involving infinity or zero, such as subtracting inf from inf or dividing zero by zero in NumPy, both of which give NaN. The wiki page was the most helpful for me in solving this issue: https://en.wikipedia.org/wiki/NaN

One other important point about NaN is that it works a little like a virus: anything that touches it in a calculation also becomes NaN, so the problem spreads quickly through a program. What you are actually dealing with is missing data, and until you realize that, NaN is a frustrating thing to debug, because it is a valid float value rather than an error, yet any mathematical operation on it ends in NaN. BEWARE!!
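To make that behaviour concrete, here is a minimal sketch using only the standard library. It shows why a check like `if not avitime` lets NaN through, how NaN spreads through arithmetic, and where it finally fails when formatted as an integer. The helper name `is_usable` is made up for illustration and is not part of the original program.

```python
import math

nan = float("nan")

# NaN is a float, not None, and it is truthy, so `if not value` lets it through.
passes_truthiness_check = not (not nan)

# Any arithmetic that touches NaN yields NaN again.
total = 1500.0 + nan
total_is_nan = math.isnan(total)

# NaN never compares equal to anything, itself included, so == cannot detect it.
equals_itself = (nan == nan)

# Integer formatting is where it finally fails loudly with a ValueError.
try:
    "%02d" % nan
    format_error = None
except ValueError as exc:
    format_error = str(exc)


def is_usable(value):
    """Hypothetical guard: True only for numbers that are neither None nor NaN."""
    return value is not None and not math.isnan(value)


print(passes_truthiness_check, total_is_nan, equals_itself)
print(format_error)
print(is_usable(1500.0), is_usable(nan), is_usable(None))
```

A guard along the lines of `is_usable(avitime)` could stand in for the `if not avitime` test in the question, assuming `avitime` is always either `None` or a number.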

The reason on this occasion was that a fixed line number was used to pick up the headers when reading in the csv file. That worked for the majority of the files, but some of them had the headers I was after on a different line, so the headers imported into the data frame were either part of the data itself or a null value. As a result, trying to access a column in the data frame by header name resulted in NaN and, as discussed earlier, this proliferated through the program, causing a few problems that I had been using workarounds to combat. One of those workarounds was actually acceptable, which is to add this line:

df = df.fillna(0)

after the first definition of the df variable, in this case:

df = pd.read_csv(vbo,
                 delim_whitespace=True,
                 header=90)
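Since the root cause was a hard-coded header line, another option is to look for the header row instead of assuming it sits on line 90. The sketch below is only an illustration: `find_header_line` and the sample text are hypothetical, and the real layout of the .vbo files is not shown in the question, so the matching rule (a line containing the column name as a whitespace-separated token) is an assumption.

```python
import io


def find_header_line(lines, column_name):
    """Return the 0-based index of the first line that contains column_name
    as a whitespace-separated token, or None if no line does."""
    for index, line in enumerate(lines):
        if column_name in line.split():
            return index
    return None


# Two made-up files whose header row sits on different lines.
sample_a = "junk\nmore junk\ntime avitime\n1 1000\n2 2000\n"
sample_b = "junk\ntime avitime\n1 1000\n"

header_a = find_header_line(io.StringIO(sample_a), "avitime")
header_b = find_header_line(io.StringIO(sample_b), "avitime")
missing = find_header_line(io.StringIO("no header here\n"), "avitime")

print(header_a, header_b, missing)
```

The returned index could then be handed to `pd.read_csv` through its `skiprows` parameter so that the header row becomes the first line pandas parses, and a file returning `None` could be skipped outright. That wiring is not tested here.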

The bottom line is that if you are receiving this value, the best thing really is to work out why you are getting NaN in the first place; it is then easier to make an informed decision as to whether or not replacing NaN with 0 is a viable choice.
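As a rough illustration of that decision, the sketch below compares filling NaN with 0 against simply ignoring the missing rows. The data frame is made up for the example; only the column name `avitime` comes from the question.

```python
import pandas as pd

# Made-up data: the last row is incomplete, as described in the question.
df = pd.DataFrame({"avitime": [1000.0, 2000.0, float("nan")]})

# Option 1: replace NaN with 0. Sums are unaffected, but averages are pulled down.
filled = df.fillna(0)
mean_with_zeros = filled["avitime"].mean()    # 1000.0

# Option 2: leave NaN in place. pandas skips it by default in mean() and sum().
mean_skipping_nan = df["avitime"].mean()      # 1500.0

# Option 3: drop the missing rows and take the last value that actually exists.
valid = df["avitime"].dropna()
last_valid = valid.iloc[-1] if not valid.empty else None

print(mean_with_zeros, mean_skipping_nan, last_valid)
```

For a running total of track time, a 0 adds nothing, so filling is harmless there. For anything like an average, or for "the last value in the file", dropping the missing rows is probably the safer choice.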

I sincerely hope this helps anyone who finds it. Regards iFunction
