Installing PySpark on Linux Mint for PyCharm or VS Code Using pipx

By: Rich Dudley On: Mon 02 June 2025
In: data engineering
Tags: #data-engineering

At the time of writing I'm using Linux Mint 22.1 Cinnamon. PySpark 4.0.0 has just been released, and I work in either VS Code or PyCharm. The installation is not as straightforward as it seems.

Update: If you want to use pip instead of pipx, see my next post.

If you tried the standard install (pip install pyspark), you probably received the following error message:

error: externally-managed-environment

× This environment is externally managed
╰─> To install Python packages system-wide, try apt install
    python3-xyz, where xyz is the package you are trying to
    install.

    If you wish to install a non-Debian-packaged Python package,
    create a virtual environment using python3 -m venv path/to/venv.
    Then use path/to/venv/bin/python and path/to/venv/bin/pip. Make
    sure you have python3-full installed.

    If you wish to install a non-Debian packaged Python application,
    it may be easiest to use pipx install xyz, which will manage a
    virtual environment for you. Make sure you have pipx installed.

    See /usr/share/doc/python3.12/README.venv for more information.

note: If you believe this is a mistake, please contact your Python installation or OS distribution provider. You can override this, at the risk of breaking your Python installation or OS, by passing --break-system-packages.
hint: See PEP 668 for the detailed specification.

Long story short: Windows developers (and users) may remember "DLL hell", where one installer wrecks the environment created by another installer. Linux users often use both apt and pip to install packages, which in the past caused issues when one of them overwrote or removed dependencies managed by the other. There's a good explanation, with more links, in the first answer to the question pip install -r requirements.txt is failing: "This environment is externally managed".
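To see why pip refuses, it can help to look for the marker file that PEP 668 describes. The sketch below is my own approximation of that check, not pip's actual code: it looks for a file named EXTERNALLY-MANAGED in the interpreter's standard library directory, and treats the environment as protected only when you are not inside a virtual environment. The function names are made up for illustration.

```python
import sys
import sysconfig
from pathlib import Path


def externally_managed_marker() -> Path:
    """Path where PEP 668 says the marker file lives (stdlib directory)."""
    return Path(sysconfig.get_path("stdlib")) / "EXTERNALLY-MANAGED"


def in_virtualenv() -> bool:
    """True when the running interpreter belongs to a venv."""
    return sys.prefix != sys.base_prefix


def is_externally_managed() -> bool:
    """Approximation: marker present and we are NOT inside a venv."""
    return externally_managed_marker().is_file() and not in_virtualenv()


if __name__ == "__main__":
    print("marker file:", externally_managed_marker())
    print("inside a venv:", in_virtualenv())
    print("pip would likely refuse:", is_externally_managed())
```

Run with the system python3 on Mint, I would expect the last line to report True; run from inside any venv, it should report False.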

The easiest way to install PySpark (or any pip-installed package) on Mint is to install and use pipx, which creates a venv (virtual environment) for the PySpark installation. Two advantages of using pipx are that the virtual environment is created for you automatically, and you don't have to activate the environment to use it. Instead of activating, you just need to configure the Python interpreter in your IDE, which involves rooting around in a hidden folder. Do this once, though, and you're good to go.

Pipx can be installed either using Mint's package manager, or using the Ubuntu commands at https://pipx.pypa.io/stable/installation/#on-linux.

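Once pipx is installed, you just need to run pipx install pyspark to install PySpark in its own virtual environment. The nice thing about venvs is you can have multiple versions living side-by-side, so you can test your code against both 3.5.5 and 4.0.0. Note that pipx names the venv after the package, so a second copy needs a different name; pipx's --suffix option (marked experimental in pipx's help) handles this, for example pipx install pyspark==3.5.5 --suffix=_3.5.5.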
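If you keep more than one pipx venv around, a small script can report which PySpark version each one contains. This is a minimal sketch under a few assumptions: the venvs live under ~/.local/share/pipx/venvs unless the PIPX_HOME environment variable points elsewhere, and each venv has the usual lib/python*/site-packages layout. The helper names are hypothetical.

```python
import os
from importlib import metadata
from pathlib import Path


def pipx_venvs_dir() -> Path:
    """Assumed location of pipx venvs; PIPX_HOME overrides the default."""
    pipx_home = os.environ.get("PIPX_HOME")
    base = Path(pipx_home) if pipx_home else Path.home() / ".local" / "share" / "pipx"
    return base / "venvs"


def installed_pyspark_versions(venvs_dir: Path) -> dict[str, str]:
    """Map each venv name to the PySpark version found in its site-packages."""
    versions: dict[str, str] = {}
    if not venvs_dir.is_dir():
        return versions
    for venv in sorted(venvs_dir.iterdir()):
        for site_packages in venv.glob("lib/python*/site-packages"):
            # Read package metadata from that venv, not from the current interpreter.
            for dist in metadata.distributions(path=[str(site_packages)]):
                if (dist.metadata["Name"] or "").lower() == "pyspark":
                    versions[venv.name] = dist.version
    return versions


if __name__ == "__main__":
    for name, version in installed_pyspark_versions(pipx_venvs_dir()).items():
        print(f"{name}: PySpark {version}")
```

The script only reads metadata files, so it does not start Spark or import pyspark.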

You don't activate pipx-created venvs like you do traditional ones. Instead, you configure the venv's Python as the interpreter in your IDE. In PyCharm, create a new project and add a .py file with the following code:

from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()

print(spark.version)

You should see a warning about pyspark being a missing dependency, which it is as far as the default Python interpreter is concerned. In PyCharm, look in the lower right-hand corner for the Python version and click it. Then choose to add a local interpreter.

pycharm step 1

We're going to use an existing interpreter, and navigate to the pyspark venv.

pycharm step 2

Pipx creates virtual environments in a hidden folder (similar to how Windows uses AppData). Click the "eyeball" button to show hidden files.

pycharm step 3

Once the hidden folders are displayed, drill down to /home/{username}/.local/share/pipx/venvs/pyspark/bin, select the proper version of the Python interpreter, and OK back to the top. PySpark should no longer be flagged as a missing dependency, and you can run the little sample. If the PySpark version is printed, it's go time! If you need to, you can switch back to the default interpreter by clicking the selected one in the lower right-hand corner.

pycharm step 4
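If you'd rather paste the interpreter path than click through hidden folders, the path can be built in a few lines. This is a sketch that assumes the layout described above (venvs under ~/.local/share/pipx/venvs) and that the PIPX_HOME environment variable, if set, replaces the ~/.local/share/pipx part; pipx_venv_python is a hypothetical helper name.

```python
import os
from pathlib import Path
from typing import Optional


def pipx_venv_python(package: str = "pyspark", pipx_home: Optional[Path] = None) -> Path:
    """Build the expected interpreter path for a pipx-managed venv."""
    if pipx_home is None:
        env_home = os.environ.get("PIPX_HOME")
        pipx_home = Path(env_home) if env_home else Path.home() / ".local" / "share" / "pipx"
    return Path(pipx_home) / "venvs" / package / "bin" / "python"


if __name__ == "__main__":
    candidate = pipx_venv_python()
    status = "found" if candidate.exists() else "not found"
    print(f"{candidate} ({status})")
```

Run it with any Python on the machine and paste the printed path into the IDE's interpreter dialog.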

For VS Code, the process is similar. Open (or create) the Python file we used above in VS Code. Again, the current interpreter is shown in the lower right corner. The file's language mode is displayed close by, so it can get a little confusing.

vs code step 1

We'll choose to enter an interpreter path, and then browse to the venv.

vs code step 2

vs code step 3

A file window will open, and once again we need to show hidden files. Right-click in the whitespace, and select "Show hidden files". Then, navigate to the same folder as before, select the python interpreter and OK back to the top.

vs code step 4

You should now be able to run our sample code. If the version prints out, it's go time!
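If the sample doesn't run, a quick sanity check is to ask Python which interpreter the IDE is actually using and whether it can see pyspark. The sketch below uses only the standard library and doesn't start Spark; describe_interpreter is a name I made up for illustration.

```python
import importlib.util
import sys


def describe_interpreter() -> dict:
    """Report which interpreter is running and whether pyspark is importable."""
    return {
        "executable": sys.executable,
        "in_venv": sys.prefix != sys.base_prefix,
        "pyspark_found": importlib.util.find_spec("pyspark") is not None,
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

With the pipx venv selected, the executable should point into the .local/share/pipx/venvs/pyspark folder and pyspark_found should be True.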

