I am unable to import VariantSpark 0.5.2 in a Google Colab notebook running Python 3.9.16, Hail 0.2.112, and Apache Spark 3.3.2.

Here is the pip install and its output:

pip install variant-spark
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting variant-spark
  Downloading variant_spark-0.5.2-py2.py3-none-any.whl (65.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 65.0/65.0 MB 6.5 MB/s eta 0:00:00
Collecting typedecorator==0.0.5
  Downloading typedecorator-0.0.5.tar.gz (5.9 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: typedecorator
  Building wheel for typedecorator (setup.py) ... done
  Created wheel for typedecorator: filename=typedecorator-0.0.5-py3-none-any.whl size=6189 sha256=1f412a09a88d820140a9a3e0de93860ee6868b15dbcae509507c9c4be5c0574f
  Stored in directory: /root/.cache/pip/wheels/01/dc/6d/47993e6461d1198f57452fb57f750bfc83c831aa1603bf4433
Successfully built typedecorator
Installing collected packages: typedecorator, variant-spark
Successfully installed typedecorator-0.0.5 variant-spark-0.5.2
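Before importing anything, I double-checked which versions pip actually resolved in the runtime (a quick sketch using the standard-library importlib.metadata; the distribution names match the install log above):

from importlib.metadata import version

# Print the versions pip resolved in this Colab runtime
for pkg in ("hail", "pyspark", "variant-spark"):
    print(pkg, version(pkg))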

And this is the error I get when I try to import it:

import hail as hl
import varspark.hail as vshl
vshl.init()
using variant-spark jar at '/usr/local/lib/python3.9/dist-packages/varspark/jars/variant-spark_2.12-0.5.2-all.jar'
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-8-3d2ff0083f18> in <cell line: 3>()
      1 import hail as hl
      2 import varspark.hail as vshl
----> 3 vshl.init()

4 frames
<decorator-gen-1907> in init(sc, app_name, master, local, log, quiet, append, min_block_size, branching_factor, tmp_dir, default_reference, idempotent, global_seed, spark_conf, skip_logging_configuration, local_tmpdir, _optimizer_iterations, backend, driver_cores, driver_memory, worker_cores, worker_memory, gcs_requester_pays_configuration, regions)

<decorator-gen-1909> in init_spark(sc, app_name, master, local, log, quiet, append, min_block_size, branching_factor, tmp_dir, default_reference, idempotent, global_seed, spark_conf, skip_logging_configuration, local_tmpdir, _optimizer_iterations, gcs_requester_pays_configuration)

/usr/local/lib/python3.9/dist-packages/hail/context.py in init_spark(sc, app_name, master, local, log, quiet, append, min_block_size, branching_factor, tmp_dir, default_reference, idempotent, global_seed, spark_conf, skip_logging_configuration, local_tmpdir, _optimizer_iterations, gcs_requester_pays_configuration)
    425     app_name = app_name or 'Hail'
    426     gcs_requester_pays_project, gcs_requester_pays_buckets = convert_gcs_requester_pays_configuration_to_hadoop_conf_style(gcs_requester_pays_configuration)
--> 427     backend = SparkBackend(
    428         idempotent, sc, spark_conf, app_name, master, local, log,
    429         quiet, append, min_block_size, branching_factor, tmpdir, local_tmpdir,

TypeError: SparkBackend.__init__() got an unexpected keyword argument 'gcs_requester_pays_project'
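My working theory is that the Hail release Colab resolved is newer than whatever VariantSpark 0.5.2 was built against, so the keyword arguments its wrapped init() forwards no longer line up with Hail's SparkBackend. If that's right, pinning an older Hail before installing variant-spark might avoid it; the exact version below is only a guess on my part, not something I've confirmed against VariantSpark's requirements:

!pip uninstall -y hail
!pip install hail==0.2.74  # assumed compatible release; the real pin should come from VariantSpark's docs
!pip install variant-spark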

I also tried installing an older Apache Spark version (3.1.1), but the import still failed:

!apt-get install openjdk-8-jdk-headless -qq > /dev/null  # Java 8 runtime for Spark
!wget -q https://archive.apache.org/dist/spark/spark-3.1.1/spark-3.1.1-bin-hadoop2.7.tgz
!tar -xvf spark-3.1.1-bin-hadoop2.7.tgz  # unpacks into the working directory
!pip install -q findspark
!pip install pyspark  # note: this installs the latest PySpark from PyPI
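In case it matters, after unpacking the tarball I pointed the session at it before importing pyspark, roughly like this (a minimal sketch; the paths assume Colab's default /content working directory and the Java 8 package installed above):

import os

# Tell findspark/pyspark where the Java and Spark installs from the cells above live
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"
os.environ["SPARK_HOME"] = "/content/spark-3.1.1-bin-hadoop2.7"

import findspark
findspark.init()  # prepends the unpacked Spark to sys.path

I'm also not sure whether the plain pip install pyspark in the last cell pulls in a newer PySpark that overrides the 3.1.1 download.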

I am new to coding in general and have no idea what is wrong. Could it be an error in VariantSpark's backend, or is my Apache Spark version still too new?

Sincerely,
