Wikipedia extractor problem ValueError: cannot find context for 'fork'

Question

3.6k Views Asked by Shurup At 27 July 2025 at 17:39

My aim is to get plain text (no links, tags, parameters or other clutter, only the article text) from Wikipedia XML dumps (https://dumps.wikimedia.org/backup-index.html). I found the WikiExtractor Python script on GitHub (https://github.com/attardi/wikiextractor). After downloading and installing it (I use the PyCharm IDE on Windows 10), I tried to run it with

wikiextractor -cb 250K -o extracted D:\Wiki_dumps\ruwiktionary-20211120-pages-articles-multistream.xml.bz2

but then (after preprocessing) I got the following error:

raise ValueError('cannot find context for %r' % method) from None
ValueError: cannot find context for 'fork'

I tried changing the argument in the following call from "fork" to "spawn" (advice found online):

Process = get_context("fork").Process

but this only leads to

TypeError: cannot pickle '_io.BufferedWriter' object

I have no idea how to fix it or what it might be related to.
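A likely explanation for the second error, sketched under an assumption: with the "spawn" start method, the arguments handed to a new Process are pickled so they can be sent to the child, and an open file handle cannot be pickled. The snippet below does not use WikiExtractor at all; `try_pickle_open_file` is a hypothetical helper that only reproduces the same kind of TypeError with a temporary file.

```python
import os
import pickle
import tempfile


def try_pickle_open_file():
    """Try to pickle an open binary file handle; return the error text, or None."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # open(..., "wb") returns an _io.BufferedWriter, the type named in the error.
        with open(path, "wb") as handle:
            try:
                pickle.dumps(handle)
            except TypeError as exc:
                return str(exc)
        return None
    finally:
        # Clean up the temporary file once the handle has been closed.
        os.remove(path)


if __name__ == "__main__":
    print(try_pickle_open_file())
```

If that assumption holds, swapping "fork" for "spawn" alone cannot work as long as an open output file is among the process arguments; each worker would have to open its own file instead.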

Here is the full stack trace:

INFO: Preprocessing 'D:\Wiki_dumps\ruwiktionary-20211120-pages-articles-multistream.xml.bz2' to collect template definitions: this may take some time.

INFO: Preprocessed 100000 pages

...

INFO: Preprocessed 2300000 pages

INFO: Loaded 36839 templates in 209.9s

INFO: Starting page extraction from D:\Wiki_dumps\ruwiktionary-20211120-pages-articles-multistream.xml.bz2.

Traceback (most recent call last):
  File "C:\Users\Shurup\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\Shurup\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\Shurup\PycharmProjects\pythonProject\venv\Scripts\wikiextractor.exe_main_.py", line 7, in
  File "c:\users\shurup\pycharmprojects\pythonproject\venv\lib\site-packages\wikiextractor\WikiExtractor.py", line 640, in main
    process_dump(input_file, args.templates, output_path, file_size,
  File "c:\users\shurup\pycharmprojects\pythonproject\venv\lib\site-packages\wikiextractor\WikiExtractor.py", line 359, in process_dump
    Process = get_context("fork").Process
  File "C:\Users\Shurup\AppData\Local\Programs\Python\Python310\lib\multiprocessing\context.py", line 239, in get_context
    return super().get_context(method)
  File "C:\Users\Shurup\AppData\Local\Programs\Python\Python310\lib\multiprocessing\context.py", line 193, in get_context
    raise ValueError('cannot find context for %r' % method) from None
ValueError: cannot find context for 'fork'
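The ValueError itself comes from the standard library: the "fork" start method is not available on Windows, so get_context("fork") has nothing to return. A minimal sketch for checking this on any machine follows; `pick_context` is a hypothetical helper written for illustration, not part of WikiExtractor.

```python
import multiprocessing as mp


def pick_context(preferred="fork"):
    """Return the preferred multiprocessing context, or the OS default if unavailable."""
    available = mp.get_all_start_methods()
    if preferred in available:
        return mp.get_context(preferred)
    # Called without an argument, get_context() returns the default context.
    return mp.get_context()


if __name__ == "__main__":
    # On Windows this list contains only "spawn".
    print("available start methods:", mp.get_all_start_methods())
    ctx = pick_context("fork")
    print("using:", ctx.get_start_method())
```

A fallback like this only avoids the ValueError. Code written with "fork" in mind may still fail under "spawn", which is what the TypeError above suggests happens here.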

Here is the stack trace with "spawn" instead of "fork":

"spawn" parameter stack trace

There are 2 best solutions below

**JKocerka** · Answer 1

You can run it in Docker. It works like a charm.

Dockerfile:

FROM python:slim

WORKDIR /app
RUN pip install wikiextractor

COPY Wikipedia-20211212095544.xml /app/

CMD python -m wikiextractor.WikiExtractor --output /app/output /app/Wikipedia-20211212095544.xml

Build: docker build --pull --rm -f "Dockerfile" -t wikiextractor:latest .
Run: docker run --rm -it --mount type=bind,source="$(PWD)\output",target=/app/output wikiextractor:latest

Make sure you have an output folder in your current working directory.

**Encore** · Answer 2

Encore On 13 September 2023 at 06:57

I first ran pip install wikiextractor locally, then pip install wikiextractor==0.1 to pin version 0.1, and after that the extraction ran normally.
