As far as I know, if you want to do heavy number crunching in Python, you generally implement it in C and then call it from Python. But cguy knows this area much better than me.
EDIT: Possibly a stupid question, but are you sure that you get identical results for identical input?
I’ve found Python to be great for number crunching as long as you don’t actually use any Python.
Also, in the above scenario NumPy is going to be using optimized vector instructions, and possibly multithreading too, depending on settings. Going from compiled to interpreted scalar code is usually a 15-70x slowdown, but going from compiled vector code to interpreted scalar code is on the order of 200x.
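If you want to see the gap on your own machine, here is a rough sketch. The function names and the sum-of-squares workload are just something I made up for illustration, and the ratio you get will depend heavily on your CPU, NumPy build and BLAS settings, so don't read too much into the exact number.

```python
import timeit

import numpy as np


def sum_squares_python(values):
    """Interpreted scalar loop: the interpreter handles every element."""
    total = 0.0
    for v in values:
        total += v * v
    return total


def sum_squares_numpy(arr):
    """One call into NumPy's compiled code for the whole array."""
    return float(np.dot(arr, arr))


n = 200_000
arr = np.random.default_rng(0).random(n)  # fixed seed for repeatable input
values = arr.tolist()                     # plain Python floats for the loop

# Time a few repetitions of each version on the same data.
t_py = timeit.timeit(lambda: sum_squares_python(values), number=5)
t_np = timeit.timeit(lambda: sum_squares_numpy(arr), number=5)

print(f"pure Python: {t_py:.4f} s")
print(f"NumPy:       {t_np:.4f} s")
print(f"ratio:       {t_py / t_np:.0f}x")
```

Note that the two versions add the numbers up in a different order, so the results can differ in the last few digits. Compare them with a tolerance (np.isclose) rather than ==.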