I am unable to get VariantSpark 0.5.2 working in a Google Colab notebook running Python 3.9.16, with Hail 0.2.112 and Apache Spark 3.3.2: the install and the imports succeed, but initialization fails with a TypeError.
Here is the pip install and its output:
!pip install variant-spark
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting variant-spark
Downloading variant_spark-0.5.2-py2.py3-none-any.whl (65.0 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 65.0/65.0 MB 6.5 MB/s eta 0:00:00
Collecting typedecorator==0.0.5
Downloading typedecorator-0.0.5.tar.gz (5.9 kB)
Preparing metadata (setup.py) ... done
Building wheels for collected packages: typedecorator
Building wheel for typedecorator (setup.py) ... done
Created wheel for typedecorator: filename=typedecorator-0.0.5-py3-none-any.whl size=6189 sha256=1f412a09a88d820140a9a3e0de93860ee6868b15dbcae509507c9c4be5c0574f
Stored in directory: /root/.cache/pip/wheels/01/dc/6d/47993e6461d1198f57452fb57f750bfc83c831aa1603bf4433
Successfully built typedecorator
Installing collected packages: typedecorator, variant-spark
Successfully installed typedecorator-0.0.5 variant-spark-0.5.2
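For reference, the runtime versions can be double-checked from inside the notebook with something like this (a minimal sketch using only the standard library; I have not pasted its output here):
# Print the installed distribution versions for this Colab runtime
import sys
from importlib.metadata import version

print("python:", sys.version)
print("hail:", version("hail"))
print("pyspark:", version("pyspark"))
print("variant-spark:", version("variant-spark"))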
And this is the error I get when I import the packages and initialize VariantSpark:
import hail as hl
import varspark.hail as vshl
vshl.init()
using variant-spark jar at '/usr/local/lib/python3.9/dist-packages/varspark/jars/variant-spark_2.12-0.5.2-all.jar'
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-8-3d2ff0083f18> in <cell line: 3>()
1 import hail as hl
2 import varspark.hail as vshl
----> 3 vshl.init()
4 frames
<decorator-gen-1907> in init(sc, app_name, master, local, log, quiet, append, min_block_size, branching_factor, tmp_dir, default_reference, idempotent, global_seed, spark_conf, skip_logging_configuration, local_tmpdir, _optimizer_iterations, backend, driver_cores, driver_memory, worker_cores, worker_memory, gcs_requester_pays_configuration, regions)
<decorator-gen-1909> in init_spark(sc, app_name, master, local, log, quiet, append, min_block_size, branching_factor, tmp_dir, default_reference, idempotent, global_seed, spark_conf, skip_logging_configuration, local_tmpdir, _optimizer_iterations, gcs_requester_pays_configuration)
/usr/local/lib/python3.9/dist-packages/hail/context.py in init_spark(sc, app_name, master, local, log, quiet, append, min_block_size, branching_factor, tmp_dir, default_reference, idempotent, global_seed, spark_conf, skip_logging_configuration, local_tmpdir, _optimizer_iterations, gcs_requester_pays_configuration)
425 app_name = app_name or 'Hail'
426 gcs_requester_pays_project, gcs_requester_pays_buckets = convert_gcs_requester_pays_configuration_to_hadoop_conf_style(gcs_requester_pays_configuration)
--> 427 backend = SparkBackend(
428 idempotent, sc, spark_conf, app_name, master, local, log,
429 quiet, append, min_block_size, branching_factor, tmpdir, local_tmpdir,
TypeError: SparkBackend.__init__() got an unexpected keyword argument 'gcs_requester_pays_project'
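In case it helps with diagnosis, the keyword arguments that the installed Hail's SparkBackend constructor actually accepts can be listed like this (a sketch; the import path follows Hail's source layout as shown in the traceback above):
# Show which parameters SparkBackend.__init__ accepts in the installed Hail
import inspect
from hail.backend.spark_backend import SparkBackend

print(inspect.signature(SparkBackend.__init__))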
I also tried installing an older Apache Spark version (3.1.1), but it still didn't work:
# Install Java 8 and download/unpack a Spark 3.1.1 distribution
!apt-get install openjdk-8-jdk-headless -qq > /dev/null
!wget -q https://archive.apache.org/dist/spark/spark-3.1.1/spark-3.1.1-bin-hadoop2.7.tgz
!tar -xvf spark-3.1.1-bin-hadoop2.7.tgz
# findspark is meant to point the notebook at the unpacked Spark
!pip install -q findspark
!pip install pyspark
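I may also be missing the environment setup that Colab Spark tutorials usually include after unpacking the tarball, and I suspect the unpinned pip install pyspark above pulls a version newer than 3.1.1. This is my understanding of what that setup should look like (a sketch; the paths are assumptions based on Colab's default layout):
# Point the JVM and Spark at the copies installed above (paths are assumptions;
# adjust if Java or Spark were installed elsewhere)
import os
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"
os.environ["SPARK_HOME"] = "/content/spark-3.1.1-bin-hadoop2.7"

import findspark
findspark.init()  # makes the SPARK_HOME distribution the one pyspark uses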
I am new to coding in general and have no idea what is wrong. Could it be an error in VariantSpark's backend, or is my Apache Spark version still too new?
Sincerely,