Python: Multicore processing?

I've been reading about Python's multiprocessing module. I still don't think I have a very good understanding of what it can do.

Let's say I have a quadcore processor and I have a list with 1,000,000 integers and I want the sum of all the integers. I could simply do:

list_sum = sum(my_list)

But this only sends it to one core.

Is it possible, using the multiprocessing module, to divide the list up and have each core compute the sum of its part and return the value, so that the total sum may be computed?

Something like:

core1_sum = sum(my_list[0:500000])        # goes to core 1
core2_sum = sum(my_list[500000:1000000])  # goes to core 2
all_core_sum = core1_sum + core2_sum      # core 3 does final computation

Any help would be appreciated.

--------------Solutions-------------

Yes, it's possible to do this summation over several processes, very much like doing it with multiple threads:

from multiprocessing import Process, Queue

def do_sum(q, l):
    # Each worker puts its partial sum on the shared queue.
    q.put(sum(l))

def main():
    my_list = range(1000000)

    q = Queue()

    # Split the list in half and give each half to its own process.
    p1 = Process(target=do_sum, args=(q, my_list[:500000]))
    p2 = Process(target=do_sum, args=(q, my_list[500000:]))
    p1.start()
    p2.start()
    r1 = q.get()
    r2 = q.get()
    print r1 + r2

if __name__ == '__main__':
    main()

However, doing it with multiple processes is likely to be slower than doing it in a single process, because copying the data back and forth is more expensive than summing it right away.
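If you want to verify that claim on your own machine, here is a rough, self-contained timing sketch (the two_process_sum helper is my own illustration, not part of the answer above):

import time
from multiprocessing import Process, Queue

def do_sum(q, l):
    q.put(sum(l))

def two_process_sum(data):
    # Split the data in half and sum each half in its own process.
    q = Queue()
    mid = len(data) // 2
    ps = [Process(target=do_sum, args=(q, data[:mid])),
          Process(target=do_sum, args=(q, data[mid:]))]
    for p in ps:
        p.start()
    total = q.get() + q.get()   # drain the queue before joining
    for p in ps:
        p.join()
    return total

if __name__ == '__main__':
    data = range(1000000)
    t0 = time.time()
    sum(data)
    t1 = time.time()
    two_process_sum(data)
    t2 = time.time()
    print 'single process: %.3fs, two processes: %.3fs' % (t1 - t0, t2 - t1)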

Welcome to the world of concurrent programming.

What Python can (and can't) do depends on two things.

  1. What the OS can (and can't) do. Most OSs allocate processes to cores. To use four cores, you need to break your problem into four processes. This is easier than it sounds. Sometimes.
  2. What the underlying C libraries can (and can't) do. If the C libraries expose features of the OS AND the OS exposes features of the hardware, you're solid.

Breaking a problem into multiple processes is easy, especially on GNU/Linux: break it into a multi-step pipeline.

In the case of summing a million numbers, consider the following shell pipeline, assuming some hypothetical sum.py program that sums either a range of numbers (given as arguments) or a list of numbers on stdin.

( sum.py 0 500000 & sum.py 500000 1000000 ) | sum.py

This pipeline runs three concurrent processes: two of them each sum a large range of numbers, and the third sums the two partial results.

Since the GNU/Linux shells and the OS already handle some parts of concurrency for you, you can design simple (very, very simple) programs that read from stdin, write to stdout, and are designed to do small parts of a large job.
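For concreteness, here is one way that hypothetical sum.py might look; everything about it (the argument convention, the stdin format) is an assumption made for this sketch:

#!/usr/bin/env python
# Hypothetical sum.py: with two arguments it sums the integers in the
# half-open range [start, stop); with no arguments it sums the
# whitespace-separated numbers it reads from stdin.
import sys

if len(sys.argv) == 3:
    start, stop = int(sys.argv[1]), int(sys.argv[2])
    print sum(xrange(start, stop))
else:
    print sum(int(tok) for tok in sys.stdin.read().split())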

You can try to reduce the overhead by using subprocess to build the pipeline instead of handing the job to the shell. You may find, however, that the shell builds pipelines very, very quickly. (It was written directly in C and makes direct OS API calls for you.)
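A rough sketch of what building that same pipeline with subprocess might look like, assuming the hypothetical sum.py above:

from subprocess import Popen, PIPE

# Final stage: sums whatever numbers its upstream workers print.
final = Popen(['python', 'sum.py'], stdin=PIPE, stdout=PIPE)
# Two workers write their partial sums into the final stage's stdin.
w1 = Popen(['python', 'sum.py', '0', '500000'], stdout=final.stdin)
w2 = Popen(['python', 'sum.py', '500000', '1000000'], stdout=final.stdin)
w1.wait()
w2.wait()
final.stdin.close()   # EOF tells the final stage no more numbers are coming
print final.stdout.read().strip()
final.wait()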

Sure, for example:

from multiprocessing import Process, Queue

thelist = range(1000 * 1000)

def f(q, sublist):
    q.put(sum(sublist))

def main():
    start = 0
    chunk = 500 * 1000
    queue = Queue()
    NP = 0
    subprocesses = []
    while start < len(thelist):
        p = Process(target=f, args=(queue, thelist[start:start + chunk]))
        NP += 1
        print 'delegated %s:%s to subprocess %s' % (start, start + chunk, NP)
        p.start()
        start += chunk
        subprocesses.append(p)
    total = 0
    for i in range(NP):
        total += queue.get()
    print "total is", total, '=', sum(thelist)
    while subprocesses:
        subprocesses.pop().join()

if __name__ == '__main__':
    main()

results in:

$ python2.6 mup.py
delegated 0:500000 to subprocess 1
delegated 500000:1000000 to subprocess 2
total is 499999500000 = 499999500000

Note that this granularity is too fine to be worth spawning processes for: the overall summing task is small (which is why I can recompute the sum in main as a check ;-), and too much data is being moved back and forth (in fact, the subprocesses wouldn't need to get copies of the sublists they work on; indices would suffice). So it's a "toy example" where multiprocessing isn't really warranted. With a different architecture (use a pool of subprocesses that receive multiple tasks to perform from a queue, minimize data movement back and forth, etc.), and on less granular tasks, you could actually get benefits in terms of performance.
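As a sketch of that pool-based idea (my own illustration, not part of the original answer): with multiprocessing.Pool, workers can receive only (start, stop) index pairs and, on platforms that fork, inherit the global list from the parent instead of receiving copies of the sublists:

from multiprocessing import Pool

thelist = range(1000 * 1000)

def sum_range(bounds):
    # Workers get only an index pair; on fork-based platforms thelist
    # is inherited from the parent rather than copied through a pipe.
    start, stop = bounds
    return sum(thelist[start:stop])

if __name__ == '__main__':
    pool = Pool(4)  # e.g. one worker per core on a quad-core machine
    chunk = 250 * 1000
    bounds = [(i, i + chunk) for i in range(0, len(thelist), chunk)]
    print sum(pool.map(sum_range, bounds))
    pool.close()
    pool.join()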
