I am trying to query the pandas data frame with SQL syntax. I imported "import pandasql as ps", and loaded a dataframe. Then i entered this simple syntax:
ps.sqldf(
"""
SELECT department, first_name, AVG(salary) OVER (PARTITION BY department) AS average
FROM employee
GROUP BY department, first_name, salary
"""
).head()
it returned this error:
PandaSQLException Traceback (most recent call last)
<ipython-input-6-2ecbea651075> in <module>
----> 1 ps.sqldf(
2 """
3 SELECT department, first_name, AVG(salary) OVER (PARTITION BY department) AS average
4 FROM employee
5 GROUP BY department, first_name, salary
~/.pyenv/versions/3.9.0/lib/python3.9/site-packages/pandasql/sqldf.py in sqldf(query, env, db_uri)
154 >>> sqldf("select avg(x) from df;", locals())
155 """
--> 156 return PandaSQL(db_uri)(query, env)
~/.pyenv/versions/3.9.0/lib/python3.9/site-packages/pandasql/sqldf.py in __call__(self, query, env)
61 result = read_sql(query, conn)
62 except DatabaseError as ex:
---> 63 raise PandaSQLException(ex)
64 except ResourceClosedError:
65 # query returns nothing
PandaSQLException: (sqlite3.OperationalError) near "(": syntax error
[SQL:
SELECT department, first_name, AVG(salary) OVER (PARTITION BY department) AS average
FROM employee
GROUP BY department, first_name, salary
]
(Background on this error at: http://sqlalche.me/e/14/e3q8)
I tried rewriting the entire query, copy and paste again from notebook, but still can't fix the "(" problem. Would you guys please tell me how to fix ti?