Friday, 17 March 2017

AI Quickstart - Getting the SciPy Stack for machine learning and AI

Hi guys, here's a quick guide to getting the necessary packages and dependencies up and running on your computers in order to begin working with the Iris Dataset in Python. Generally, what you'll need is the SciPy Stack, and a few other packages if you plan to follow the IIT AI Study group sessions.

Keep in mind I wrote this especially for members of the IIT AI Study Group, so not all the information on this page may apply to a wider audience, or even make sense if you haven't attended the session.

Your options, in order of complexity
  • Installing Python, then installing pip, then finally installing the necessary packages with pip.

    You can't beat the classics. If this is your first try, I fully recommend trying this out just to make sure it doesn't work for you. Linux and Mac users should have no problems; Windows users may run into a multitude of problems, depending on their Python version, system architecture, and graphics card.

    What you'll need to do is download and install Python (either version 2.7 or 3.6), then update pip to the latest version by opening the command line and typing:
    python -m pip install --upgrade pip

    then use pip install to install all the needed packages:
    pip install --user numpy scipy scikit-learn matplotlib pandas
  • Installing one of the recommended distributions (Windows)

    Try out one of the recommended distributions suggested by the SciPy development team. This option is recommended for Windows users, since most of the distributions come with their own executable installers. Unfortunately, most of the distributions are very heavy in terms of download size, ranging from 400MB to around 700MB. If you feel this is not for you, move on to the next step.

  • Downloading Miniconda and installing all the necessary packages using the command line

    One of the recommended distributions above is the Anaconda distribution, which clocks in at about 400MB to download, and comes with a trial period (meh) and a freemium business model (ew). It is based on Conda, which is an open-source, cross-platform, language-agnostic binary package manager. It's basically a piece of software that handles all your builds for you in contained spaces called environments.

    If you are a Windows user with a limited internet connection, instead of the Anaconda distro, you can get the smaller, completely free Miniconda, which offers all of the same functionality through the command line, and clocks in at around 50MB. Install it, then install all the necessary packages from the command line with conda install <package name> (which works similarly to pip install, from above), so you can feel like a hacker, you Windows plebeian.

    Note - when using conda, the library you import as sklearn is referred to as scikit-learn. Using conda install sklearn will not yield a result. The command is conda install scikit-learn.
  • Downloading the numpy+mkl wheel and then using pip.

    And here we get to the crux of the issue - why all the hate on Windows? Well, most of the problems Windows has with installing the SciPy stack have to do with the SciPy library itself, which is part of the SciPy stack, obviously. The SciPy lib depends on the numpy+mkl library, which isn't available through pip.

    If you've decided that you're too pro for all the above methods, then the only option left to you is obtaining the wheel (.whl) files for your particular Python version and system architecture. A wheel is basically an offline installer: a pre-built copy of the lib that pip can install straight from your local disk. The wheels can be found at Unofficial Windows Binaries for Python Extension Packages - simply pick the appropriate .whl file and download it. You'll need both the numpy+mkl wheel and the SciPy wheel, and then you'll need to use pip install to perform an offline install from the local files (numpy+mkl first, since SciPy depends on it).
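Whichever option you went with, it helps to know exactly which Python you're running (that's what decides which .whl file matches) and whether the packages actually import afterwards. Here's a rough sketch of a check script, not an official tool. The function names interpreter_info and check_packages are my own, and the package list is just the one from the pip command above, using import names (so scikit-learn shows up as sklearn).

```python
import importlib
import platform
import struct
import sys

# Import names for the packages this guide installs. scikit-learn is
# imported as "sklearn", whatever name the installer used for it.
PACKAGES = ["numpy", "scipy", "sklearn", "matplotlib", "pandas"]


def interpreter_info():
    """Return the details needed to pick a matching .whl file."""
    # Pointer size in bytes times 8 gives the bitness of this Python,
    # which is what matters for wheels (not the bitness of Windows).
    bits = struct.calcsize("P") * 8
    return {
        "version": platform.python_version(),
        # Wheel filenames carry a tag such as cp27 or cp36.
        "cp_tag": "cp{}{}".format(sys.version_info[0], sys.version_info[1]),
        "bits": bits,
    }


def check_packages(names):
    """Try to import each name; map it to its version, or None if missing."""
    results = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            results[name] = None
        else:
            results[name] = getattr(module, "__version__", "unknown")
    return results


if __name__ == "__main__":
    info = interpreter_info()
    print("Python {version} ({bits}-bit), wheel tag {cp_tag}".format(**info))
    for name, version in check_packages(PACKAGES).items():
        print("{:<12} {}".format(name, version or "NOT INSTALLED"))
```

Save it as something like check_stack.py (the filename is up to you) and run it with python check_stack.py. A 64-bit Python goes with the win_amd64 wheels and a 32-bit Python with the win32 ones.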
Finally, don't forget to get the Iris Dataset (iris.data) from http://archive.ics.uci.edu/ml/machine-learning-databases/iris/
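Once iris.data is sitting in your working folder, a quick way to confirm that both the file and your pandas install are fine is to load it. This is only a minimal sketch: the load_iris helper and the column names are my own choices (the file has no header row, so I named the columns after the attribute list in the iris.names file that sits next to it), and it assumes the file is saved as iris.data in the current folder.

```python
import os

import pandas as pd

# iris.data has no header row; these names are an assumption based on
# the attribute list in the accompanying iris.names file.
COLUMNS = ["sepal_length", "sepal_width", "petal_length", "petal_width", "species"]


def load_iris(path="iris.data"):
    """Read the comma-separated iris.data file into a DataFrame."""
    # Blank lines (the file ends with one) are skipped by default.
    return pd.read_csv(path, header=None, names=COLUMNS)


if __name__ == "__main__":
    if os.path.exists("iris.data"):
        iris = load_iris()
        print(iris.shape)                      # (rows, columns)
        print(iris["species"].value_counts())  # samples per class
    else:
        print("iris.data not found - download it first")
```

If the shape and the per-class counts print without errors, you should be ready for the session.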

If anyone has any problems, please leave a message on the IIT AISG WhatsApp Group, and I'll try to get back to you as soon as possible.