Python vs C#

As far as I know, if you want to do heavy number crunching in Python, you generally implement it in C and then call it from Python. But cguy knows this area much better than me.

EDIT: Possibly a stupid question, but are you sure that you get identical results for identical input?

I’ve found Python to be great for number crunching as long as you don’t actually use any Python. :) Same for R: if one can keep everything as NumPy calls (or vector ops in R), it’s efficient (because it’s effectively C, as you say). The moment you do a big loop in Python or R that does data movement or math, perf falls through the floor.

Also, in the above scenario NumPy is going to be using optimized vector instructions, possibly multithreading too, depending on settings. Going from compiled to interpreted scalar code is usually a 15-70x slowdown, but going from compiled vector code to interpreted scalar code is on the order of 200x slower.
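A rough way to see that gap on your own machine is sketched below. The array size and the helper names (`python_loop_sum`, `numpy_sum`) are my own choices for illustration, not anything from the thread, and the timings depend heavily on hardware, so none are quoted here.

```python
import timeit

import numpy as np

values = np.random.default_rng(0).random(200_000)


def python_loop_sum(arr):
    """Interpreted scalar loop: every element goes through the Python interpreter."""
    total = 0.0
    for x in arr:
        total += x * x
    return total


def numpy_sum(arr):
    """Same sum of squares as one vectorised call that runs in compiled code."""
    return float(np.dot(arr, arr))


loop_time = timeit.timeit(lambda: python_loop_sum(values), number=3)
vec_time = timeit.timeit(lambda: numpy_sum(values), number=3)
print(f"loop: {loop_time:.4f}s  vectorised: {vec_time:.4f}s")
```

Both functions compute the same number; only where the loop runs differs.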
 

Looking at timers now to identify the longest part of the code, it's in a section where I do a lot of the calcs.
There's a lot of fors and ifs, not sure how else to do it.


df['Flag1'] = df['I1'].apply(lambda x: 1 if x >= 20 else 0)

for i in range(df.index.size):
    if (i >= 2):
        if (df['I2'].iloc[i] >= df['I3'].iloc[i]):
            df['Flag2'].iloc[i] = 1

        minval = df['I4'].iloc[(i-3): (i+1)].min()
        if (minval < 3):
            df['Flag3'].iloc[i] = 1

        if (df['val'].iloc[i] < df['I5'].iloc[i]):
            df['Flag4'].iloc[i] = 1


* Note that the C# has the same calcs, so the 5 extra columns exist on both sides

** Recommendations on improvement are welcome.
 
One thing that pops out is that you could probably pull out your data frame references, e.g. I3 = df['I3'], and then reference I3 inside the loop. This way it's not searching for the column on every access.

Also, for the rest of it, are you really comparing and setting ilocs, or the contents of the columns?

If the latter, I would expect something like this to be much faster than a for loop and if:
df['Flag4'] = df['val'] < df['I5']
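Extending that idea to the other flags, here is a minimal sketch of what a fully vectorised version of the loop might look like. The column names come from the post, but the sample data is made up, and I'm assuming the flag columns start at 0 and that rows 0 and 1 are meant to stay 0 (the `i >= 2` check).

```python
import numpy as np
import pandas as pd

# Made-up sample data; only the column names follow the post.
rng = np.random.default_rng(1)
n = 1_000
df = pd.DataFrame({
    "I1": rng.integers(0, 40, n),
    "I2": rng.random(n),
    "I3": rng.random(n),
    "I4": rng.integers(0, 10, n),
    "I5": rng.random(n),
    "val": rng.random(n),
})

# Rows 0 and 1 are skipped by the original `if i >= 2` check.
active = np.arange(n) >= 2

df["Flag1"] = (df["I1"] >= 20).astype(int)
df["Flag2"] = ((df["I2"] >= df["I3"]) & active).astype(int)
# rolling(4).min() at row i covers rows i-3..i, like iloc[(i-3):(i+1)].min().
# The first three rows have an incomplete window, give NaN, and so flag 0.
df["Flag3"] = ((df["I4"].rolling(4).min() < 3) & active).astype(int)
df["Flag4"] = ((df["val"] < df["I5"]) & active).astype(int)

print(df[["Flag1", "Flag2", "Flag3", "Flag4"]].sum())
```

The `.astype(int)` keeps the 1/0 values of the original code; without it the columns would hold True/False.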
 
With a bit of tweaking and improving I was able to get it down to 36 minutes (from 52).
 
Wowzers, that result difference is crazy. I always thought Python was faster and that's why it's preferred in progrannubg compettions. I guess it depends on the use case.
 
Understandable that in an alternative universe with "progrannubg compettions" Python would be faster.

Most of the reason people use Python for stuff like this is that it's quicker to set up and write in Python, and all the heavy lifting is done in C++ libraries.
It might change now that C# is starting to treat scripting as a first-class citizen with C# 9: https://anthonygiretti.com/2020/06/21/introducing-c-9-top-level-programs/

In terms of actual execution time, Python is very, very slow; this one is Java vs Python: https://benchmarksgame-team.pages.debian.net/benchmarksgame/fastest/python3-java.html Again, don't take benchmarks as a holy grail: they're average stuff, and you need to pick what's good for your use case.
 
A lot of Python libraries, e.g. NumPy, are written in C, making them fast. In these cases using Python gives the advantages of the Python environment as well as C’s fast execution. This is why Python is very popular in the scientific community.

Always check if there are libraries that you can use and do a bit of research into those libraries. Chances are, this being 2020, there is a library you can use.
 
Numba might be able to help with the looping part, although you might need to pull the data out into a NumPy array for it to work well.
Python:
from numba import jit
import numpy as np

x = np.arange(100).reshape(10, 10)

@jit(nopython=True) # Set "nopython" mode for best performance, equivalent to @njit
def go_fast(a): # Function is compiled to machine code when called the first time
    trace = 0.0
    for i in range(a.shape[0]):   # Numba likes loops
        trace += np.tanh(a[i, i]) # Numba likes NumPy functions
    return a + trace              # Numba likes NumPy broadcasting

print(go_fast(x))


Python:
from numba import jit
import pandas as pd

x = {'a': [1, 2, 3], 'b': [20, 30, 40]}

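@jit(forceobj=True) # Object mode is needed here: since Numba 0.59 a bare @jit defaults to nopython and would fail on pandas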
def use_pandas(a): # Function will not benefit from Numba jit
    df = pd.DataFrame.from_dict(a) # Numba doesn't know about pd.DataFrame
    df += 1                        # Numba doesn't understand what this is
    return df.cov()                # or this!

print(use_pandas(x))

https://numba.readthedocs.io/en/stable/user/5minguide.html

Numba JIT-compiles the functions you decorate to machine code via LLVM, which is pretty much as fast as C.
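Applied to the flag loop from earlier in the thread, that could look roughly like the sketch below. The function name `compute_flags` and the random input arrays are my own assumptions; with a real DataFrame you would pass something like `df['I2'].to_numpy()` for each column. The `try`/`except` is only there so the sketch still runs (slowly) where Numba isn't installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Numba not installed: fall back to a no-op decorator so the sketch still runs.
    def njit(func):
        return func


@njit
def compute_flags(i2, i3, i4, i5, val):
    """Loop version of the flag calculations on plain NumPy arrays."""
    n = val.shape[0]
    flag2 = np.zeros(n, dtype=np.int64)
    flag3 = np.zeros(n, dtype=np.int64)
    flag4 = np.zeros(n, dtype=np.int64)
    for i in range(2, n):
        if i2[i] >= i3[i]:
            flag2[i] = 1
        # A full 4-row window (rows i-3..i) only exists from i = 3 onwards.
        if i >= 3 and i4[i - 3:i + 1].min() < 3:
            flag3[i] = 1
        if val[i] < i5[i]:
            flag4[i] = 1
    return flag2, flag3, flag4


# Made-up inputs standing in for the DataFrame columns.
rng = np.random.default_rng(2)
n = 10_000
i2, i3, i5, val = (rng.random(n) for _ in range(4))
i4 = rng.integers(0, 10, n).astype(np.float64)

flag2, flag3, flag4 = compute_flags(i2, i3, i4, i5, val)
print(flag2.sum(), flag3.sum(), flag4.sum())
```

The first call includes compilation time, so time a second call if you want a fair comparison.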
 
What did you do to get it down to 36 minutes?

One thing I'd do is change the loop so it doesn't need the i >= 2 check, because that gets run every iteration.

I'd also rewrite how you calculate the min value, to use a sliding window instead of calling the min function on a range in every iteration. A simple stack data structure with a fixed size would work for this - you just need to track when values go out of the window. In fact, the only slightly complicated thing is if the current min value drops out of the window, then you need to search through the current window to find a new one. But this is much less work than what you are currently doing.
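A minimal sketch of a sliding-window minimum follows. Note that it uses a double-ended queue of candidate indices rather than the stack-plus-rescan approach described above, so it never has to search the window again; `sliding_min` is a hypothetical helper name.

```python
from collections import deque


def sliding_min(values, window):
    """Return the minimum of each full window of `window` consecutive values."""
    candidates = deque()  # indices whose values increase from front to back
    result = []
    for i, v in enumerate(values):
        # Older values that are >= v can never be a window minimum again.
        while candidates and values[candidates[-1]] >= v:
            candidates.pop()
        candidates.append(i)
        # Drop the front candidate once it has slid out of the window.
        if candidates[0] <= i - window:
            candidates.popleft()
        # The front of the deque is always the current window's minimum.
        if i >= window - 1:
            result.append(values[candidates[0]])
    return result


print(sliding_min([4, 2, 5, 3, 6, 1, 7], 3))  # [2, 2, 3, 1, 1]
```

Each index is appended and removed at most once, so the whole pass is linear in the number of rows regardless of the window size.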
 
I saw one or two indentations weren't right and took some things out of a loop.
I then also added in the changes suggested by @cguy.

I see there are a few lower levels that also have calcs, so optimizing each one will take some time.
I appreciate the advice, will definitely go through it when I've got more time.
Running my C# again this morning, I do 2700 tests in 4 minutes (vs the 250 tests in 36 min), so due to time I will simply stick with C# atm.

The only pitfall for me with C# is that I write code (e.g. a new test to perform) and then each time I have to stop the app and recompile everything, so you'll always be in "debug" mode.
Then to include it I have to add a few lines to bring in the new code... nothing major, just a schlep.
Whereas with Python I create the test engine and then simply add new calcs or change calcs on the fly.

Not sure how to create an app that can dynamically include code in C#, but otherwise so far it's doing really well.
 
I thought I would give a bit of an update on this for those interested.
After researching a lot, I learned some techniques for Python coding.

1) C and Python are not the same, so replicating code exactly doesn't quite work
2) Python hates for loops - my code had a couple of these: for i in range(df.index.size):
rather use

3) Julia, Cython and NumPy help by declaring a series as a type... this improves performance dramatically


I wrote the RSI calculator using a DataFrame and looped through it - this proved to be the slowest technique possible.
Just converting the pandas DataFrame column to a NumPy array, the results were the following
(I'm sure Cython and Julia would do even better)

RSI Calc - 2.7 seconds (original code)
RSI Calc - 0.3 seconds (convert to numpy only)

Only changing
price = table['close']
netchng = pd.Series([0])
ttlchng = pd.Series([0])
ret1 = price.copy().fillna(0)
for i in range(price.index.size):
To
price = np.array(table['close'].values, dtype=float)
netchng = np.zeros((price.size,1))
ttlchng = np.zeros((price.size,1))
ret1 = np.zeros((price.size,1))
for i in range(price.size):
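The RSI code itself isn't shown, so the following is only a sketch of the general pattern on a made-up price series: extract the column once as a NumPy array, then either loop over the array or drop the loop entirely. The `table` and `close` names come from the post; treating `netchng` as the price change from the previous row is my assumption.

```python
import numpy as np
import pandas as pd

# Made-up prices; the real data would come from the existing table.
table = pd.DataFrame({"close": [10.0, 10.5, 10.2, 10.8, 11.0]})

# Pull the column out of pandas once, as a typed NumPy array.
price = table["close"].to_numpy(dtype=float)

# Loop over the NumPy array: already far cheaper than Series.iloc[i] per row.
netchng_loop = np.zeros(price.size)
for i in range(1, price.size):
    netchng_loop[i] = price[i] - price[i - 1]

# Vectorised version: no Python-level loop at all.
netchng_vec = np.zeros(price.size)
netchng_vec[1:] = np.diff(price)

print(netchng_vec)
```

Both arrays hold the same values, so the vectorised form can replace the loop wherever a row only depends on its neighbours in this simple way.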


Other tricks to improve performance
A strange one: putting code into a function (in your main app) rather than in the main body shaves off a few seconds.
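The likely explanation is that in CPython, variables inside a function are locals held in fast indexed slots, while variables in the main body are globals looked up by name in a dictionary on every access. A small sketch to measure it yourself is below; the loop, the helper names and the repeat count are arbitrary choices of mine, and no timings are quoted because they vary by machine and Python version.

```python
import time

# The same loop, compiled as module-level code (variables live in a dict)...
MODULE_SRC = """
total = 0
for i in range(200_000):
    total += i
"""
module_code = compile(MODULE_SRC, "<module-level>", "exec")


# ...and as a function body (variables are fast locals).
def run_in_function():
    total = 0
    for i in range(200_000):
        total += i
    return total


def best_of(func, repeats=5):
    """Return the fastest of several runs to reduce timing noise."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        best = min(best, time.perf_counter() - start)
    return best


namespace = {}
module_time = best_of(lambda: exec(module_code, namespace))
function_time = best_of(run_in_function)
print(f"module level: {module_time:.4f}s  in function: {function_time:.4f}s")
```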
 
Congrats.

BTW, one of the reasons that we use C/C++ for all our expensive number crunching, is that most of these "Oh, that's 1000x slower..." types of pitfalls are considerably less frequent. One learns the tricks and tips and pitfalls over time - almost always, the issue is of the form: using X which is very similar to Y will run the loop in C, while using Y will interpret the whole thing.
 
Can confirm... Python isn't a fan of HTTP requests and for loops in terms of speed - although it does the job great with only a few lines of code.

A few weeks ago, I had to loop through some data from JSON, make a request to an API based on each JSON value, and store the returned data in a Heroku-hosted PostgreSQL database... Took about 90 minutes to run through ~1600 requests.

To be fair, I used my Rain LTE connection... maybe that was the biggest bottleneck.
Hopefully fibre in a few weeks.
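For request-heavy loops like this, most of the time usually goes to waiting on the network rather than to Python itself, so running several requests at once can help. A minimal sketch with a thread pool is below; `fake_request` is a hypothetical stand-in that just sleeps (no real API or database is involved), and the worker count is an arbitrary choice.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_request(item_id):
    """Hypothetical stand-in for an API call: waits, then returns a payload."""
    time.sleep(0.05)
    return {"id": item_id, "status": "ok"}


item_ids = list(range(20))

# One request at a time: total time is roughly the sum of all the waits.
start = time.perf_counter()
sequential = [fake_request(i) for i in item_ids]
sequential_time = time.perf_counter() - start

# Several requests in flight at once: the waits overlap.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=10) as pool:
    concurrent = list(pool.map(fake_request, item_ids))  # results keep input order
concurrent_time = time.perf_counter() - start

print(f"sequential: {sequential_time:.2f}s  thread pool: {concurrent_time:.2f}s")
```

With a real API you would also need to respect its rate limits, which may cap how many workers are sensible.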
 