
    Easy parallel loops in Python, R, Matlab and Octave

    on August 7, 2014

    The Domino platform makes it trivial to run your analysis in the cloud on very powerful hardware (up to 32 cores and 250GB of memory), allowing massive performance increases through parallelism. In this post, we'll show you how to parallelize your code in a variety of languages to utilize multiple cores. This may sound intimidating, but Python, R, and Matlab have features that make it very simple.

    Read on to see how you can get over 3000% CPU utilization from a single machine.

    Perf stats from some parallelized Python code running on a single, 32-core machine

    Is my code parallelizable?

    For the purpose of this post, we assume a common analysis scenario: you need to perform some calculation on many items, and the calculation for one item does not depend on any other. More precisely:

    1. Your analysis processes a list of things, e.g., products, stores, files, people, species. Let's call this the inputs.
    2. You can structure your code such that you have a function which takes one such thing and returns a result you care about. Let's call this function processInput. (After this step, you can then combine your results however you want, e.g., aggregating them, saving them to a file — it doesn't matter for our purposes.)

    Normally you would loop over your items, processing each one (in pseudocode):

    for i in inputs
        results[i] = processInput(i)
    end
    // now do something with results

    Instead of processing your items in a normal loop, we'll show you how to process all of them in parallel, spreading the work across multiple cores.

    To make our examples below concrete, we use a list of numbers, and a function that squares the numbers. You would use your specific data and logic, of course.

    Let's get started!

    Python

    Python has a great package, joblib, that makes parallelism incredibly easy.

    [python]
    from joblib import Parallel, delayed
    import multiprocessing

    # what are your inputs, and what operation do you want to
    # perform on each input? For example...
    inputs = range(10)

    def processInput(i):
        return i * i

    num_cores = multiprocessing.cpu_count()

    # run processInput on every input, spreading the calls across all cores
    results = Parallel(n_jobs=num_cores)(delayed(processInput)(i) for i in inputs)
    [/python]

    results is now [0, 1, 4, 9, ..., 81]
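    If you'd rather not add a dependency, the same loop can be parallelized with the standard library's multiprocessing.Pool. Here's a minimal sketch (assuming Python 3; the __main__ guard matters on platforms that spawn rather than fork worker processes, such as Windows):

    [python]
    from multiprocessing import Pool, cpu_count

    inputs = range(10)

    def processInput(i):
        return i * i

    if __name__ == '__main__':
        # one worker process per core; map distributes the calls across them
        with Pool(cpu_count()) as pool:
            results = pool.map(processInput, inputs)
    [/python]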

    R

    Since version 2.14, R has included the parallel package, which makes this sort of task very easy.

    [code language="r"]
    library(parallel)

    # what are your inputs, and what operation do you want to
    # perform on each input? For example...
    inputs <- 1:10
    processInput <- function(i) {
        i * i
    }

    numCores <- detectCores()

    results <- mclapply(inputs, processInput, mc.cores = numCores)

    # the above won't work on Windows, but this will:
    cl <- makeCluster(numCores)
    results <- parLapply(cl, inputs, processInput)
    stopCluster(cl)
    [/code]

    You can find more information on the difference between mclapply and parLapply in this StackOverflow post.
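    In short, mclapply forks the current R session (so workers see your data for free, but it isn't available on Windows), while parLapply talks to separate worker processes that start with empty environments, so any objects your function relies on must be shipped over with clusterExport. A minimal sketch, with a purely illustrative scaleFactor variable:

    [code language="r"]
    library(parallel)

    scaleFactor <- 2   # illustrative object that processInput needs

    processInput <- function(i) {
        i * i * scaleFactor
    }

    cl <- makeCluster(detectCores())
    # socket workers start with empty environments, so ship the object over
    clusterExport(cl, varlist = c("scaleFactor"))
    results <- parLapply(cl, 1:10, processInput)
    stopCluster(cl)
    [/code]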

    As an alternative, you can use the foreach package, which gives you a familiar for-loop syntax while automatically parallelizing your code under the hood:

    [code language="r"]
    library(foreach)
    library(doParallel)
    library(parallel)

    numCores <- detectCores()
    cl <- makeCluster(numCores)
    registerDoParallel(cl)

    inputs <- 1:10
    processInput <- function(i) {
        i * i
    }

    results <- foreach(i = inputs) %dopar% {
        processInput(i)
    }
    [/code]

    Matlab

    Matlab's Parallel Computing Toolbox makes it trivial to use parallel for loops using the parfor construct. For example:

    [code language="matlab"]
    inputs = 1:10;
    results = [];

    % assumes that processInput is defined in a separate function file
    parfor i = inputs
        results(i) = processInput(i);
    end
    [/code]

    Note that the parfor loop variable must range over consecutive integers. If your inputs are not integers (e.g., they are file names or item identifiers), loop over indices instead and store the outputs in a cell array, as in the sketch below.
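    Here's a minimal sketch of that index-based pattern (the file names are purely illustrative, and processInput is assumed to accept one such item):

    [code language="matlab"]
    % illustrative, non-integer inputs
    inputs = {'a.csv', 'b.csv', 'c.csv'};
    results = cell(size(inputs));

    % parfor requires the loop variable to be consecutive integers,
    % so iterate over indices and slice into a cell array
    parfor i = 1:numel(inputs)
        results{i} = processInput(inputs{i});
    end
    [/code]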

    Octave

    Unfortunately, Octave doesn't have a nice parfor equivalent, but it does have its own parallel package (from Octave Forge). Here's how you can use it:

    [code language="matlab"]
    if exist('OCTAVE_VERSION') ~= 0
        % you'll need to run this once, to install the package:
        % pkg install -forge parallel
        pkg load parallel
    end

    inputs = 1:10;
    numCores = nproc();

    % assumes that processInput is defined in a separate function file
    [result] = pararrayfun(numCores, @processInput, inputs);
    [/code]

    Note that you can use the parcellfun function if your inputs are not numbers (e.g., if they are file names or product identifiers).
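    For example, here's a minimal sketch with parcellfun (the file names are purely illustrative, and processInput is assumed to take one such item and return a scalar):

    [code language="matlab"]
    pkg load parallel

    % illustrative, non-numeric inputs: one file name per item
    inputs = {'a.csv', 'b.csv', 'c.csv'};
    numCores = nproc();

    % assumes processInput takes one such item and returns a scalar
    results = parcellfun(numCores, @processInput, inputs);
    [/code]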

    Conclusion

    Modern statistical languages make it incredibly easy to parallelize your code across cores, and Domino makes it trivial to access very powerful machines with many cores. By using these techniques, we've seen users speed up their code by more than 32x while still using a single machine.

    If you'd like to try applying this approach to your analysis, please let us know; we're happy to help!
