 # Python vs C#

#### cguy

##### Executive Member
As far as I know, if you want to do heavy number crunching in Python, you generally implement it in C and then call it from Python. But cguy knows this area much better than me.

EDIT: Possibly a stupid question, but are you sure that you get identical results for identical input?
I’ve found Python to be great for number crunching as long as you don’t actually use any Python. Same for R: if you can keep everything as NumPy calls (or vector ops in R), it’s efficient, because it’s effectively C as you say. The moment you do a big loop in Python or R that does data movement or math, perf falls through the floor.

Also, in the above scenario NumPy is going to be using optimized vector instructions, possibly multithreading too, depending on settings. Going from compiled to interpreted scalar code is usually a 15-70x slowdown, but going from compiled vector code to interpreted scalar code is on the order of 200x slower.
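The gap is easy to demonstrate. Below is a minimal sketch (the array sizes and the dot-product workload are my own choices, not from the thread) timing the same reduction once as an interpreted scalar loop and once as a single NumPy call:

```python
import time
import numpy as np

a = np.random.rand(1_000_000)
b = np.random.rand(1_000_000)

# Interpreted scalar loop: one bytecode dispatch (and boxing) per element
t0 = time.perf_counter()
total = 0.0
for i in range(len(a)):
    total += a[i] * b[i]
loop_s = time.perf_counter() - t0

# Vectorized call: the same work done inside one compiled routine
t0 = time.perf_counter()
total_vec = np.dot(a, b)
vec_s = time.perf_counter() - t0

print(f"loop: {loop_s:.3f}s  vectorized: {vec_s:.5f}s")
```

On a typical machine the vectorized call wins by a couple of orders of magnitude, roughly in line with the estimates above.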

• Ancalagon

#### bchip

##### Expert Member
The moment you do a big loop in Python or R that does data movement or Math, perf falls through the floor.
Looking at timers now to identify the longest part of the code; it's in a section where I do a lot of the calcs.
There are a lot of fors and ifs, not sure how else to do it.

```python
df['Flag1'] = df['I1'].apply(lambda x: 1 if x >= 20 else 0)

for i in range(df.index.size):
    if i >= 2:
        if df['I2'].iloc[i] >= df['I3'].iloc[i]:
            df['Flag2'].iloc[i] = 1

        minval = df['I4'].iloc[(i-3):(i+1)].min()
        if minval < 3:
            df['Flag3'].iloc[i] = 1

        if df['val'].iloc[i] < df['I5'].iloc[i]:
            df['Flag4'].iloc[i] = 1
```

* Note that the C# version has the same calcs, so the 5 extra columns exist on both sides.

** Recommendations on improvement are welcome.


#### cguy

##### Executive Member
Looking at timers now to identify the longest part of the code; it's in a section where I do a lot of the calcs.
There are a lot of fors and ifs, not sure how else to do it.

```python
df['Flag1'] = df['I1'].apply(lambda x: 1 if x >= 20 else 0)

for i in range(df.index.size):
    if i >= 2:
        if df['I2'].iloc[i] >= df['I3'].iloc[i]:
            df['Flag2'].iloc[i] = 1

        minval = df['I4'].iloc[(i-3):(i+1)].min()
        if minval < 3:
            df['Flag3'].iloc[i] = 1

        if df['val'].iloc[i] < df['I5'].iloc[i]:
            df['Flag4'].iloc[i] = 1
```

* Note that the C# version has the same calcs, so the 5 extra columns exist on both sides.

** Recommendations on improvement are welcome.
One thing that pops out is that you could probably pull out your data frame column references, e.g. i3 = df['I3'], and then reference i3 inside the loop. That way it isn't looking up the column on every access.

Also, for the rest of it, are you really comparing and setting ilocs, or the contents of the columns?

If the latter, I would expect something like this to be much faster than a for loop and an if:
df['Flag4'] = df['val'] < df['I5']
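In the same spirit, here is a sketch of the whole loop as column-wise operations. The toy data and RNG seed are mine; the column names come from the thread. Note it differs slightly from the original loop: every row is filled, with rolling() producing NaN (hence 0) for the first three rows instead of skipping them.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real data (values are made up)
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    'I1': rng.integers(0, 40, n),
    'I2': rng.random(n), 'I3': rng.random(n),
    'I4': rng.integers(0, 10, n),
    'I5': rng.random(n), 'val': rng.random(n),
})

# Each flag becomes one whole-column expression; no Python-level loop
df['Flag1'] = (df['I1'] >= 20).astype(int)
df['Flag2'] = (df['I2'] >= df['I3']).astype(int)
# rolling(4).min() covers the same (i-3)..i window as the original slice
df['Flag3'] = (df['I4'].rolling(4).min() < 3).astype(int)
df['Flag4'] = (df['val'] < df['I5']).astype(int)
```

Pandas pushes each of these expressions into compiled loops, so even at hundreds of thousands of rows this kind of thing typically runs in well under a second.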

#### bchip

##### Expert Member
With a bit of tweaking and improving I was able to get it down to 36 minutes (from 52)

• Ancalagon

#### theberg

##### Active Member
Wowzers, that result difference is crazy. I always thought Python was faster and that's why it's preferred in programming competitions. I guess it depends on the use case.

#### Johnatan56

##### Honorary Master
Wowzers, that result difference is crazy. I always thought Python was faster and that's why it's preferred in programming competitions. I guess it depends on the use case.
Understandable; in some alternative universe Python would be faster.

Most of the reason people use python for stuff like this is that it's quicker to set up and write, and all the heavy lifting is done in C++ libraries.
It might change now that C# is starting to treat scripting as a first-class citizen with C# 9: https://anthonygiretti.com/2020/06/21/introducing-c-9-top-level-programs/

In terms of actual execution time, python is very, very slow. This one is Java vs Python: https://benchmarksgame-team.pages.debian.net/benchmarksgame/fastest/python3-java.html Again, don't take benchmarks as a holy grail; they're rough averages, and you need to pick what's good for your use case.

#### zippy

##### Executive Member
A lot of Python libraries, e.g. NumPy, are written in C, making them fast. In these cases using Python gives you the advantages of the Python environment as well as C's execution speed. This is why Python is very popular in the scientific community.

Always check if there are libraries that you can use and do a bit of research into those libraries. Chances are, this being 2020, there is a library you can use.

#### konfab

##### Honorary Master
Looking at timers now to identify the longest part of the code; it's in a section where I do a lot of the calcs.
There are a lot of fors and ifs, not sure how else to do it.

```python
df['Flag1'] = df['I1'].apply(lambda x: 1 if x >= 20 else 0)

for i in range(df.index.size):
    if i >= 2:
        if df['I2'].iloc[i] >= df['I3'].iloc[i]:
            df['Flag2'].iloc[i] = 1

        minval = df['I4'].iloc[(i-3):(i+1)].min()
        if minval < 3:
            df['Flag3'].iloc[i] = 1

        if df['val'].iloc[i] < df['I5'].iloc[i]:
            df['Flag4'].iloc[i] = 1
```

* Note that the C# version has the same calcs, so the 5 extra columns exist on both sides.

** Recommendations on improvement are welcome.
Numba might be able to help with the looping part, although you might need to pull the data out into a NumPy array for it to work well.
Python:
```python
from numba import jit
import numpy as np

x = np.arange(100).reshape(10, 10)

@jit(nopython=True)  # Set "nopython" mode for best performance, equivalent to @njit
def go_fast(a):  # Function is compiled to machine code when called the first time
    trace = 0.0
    for i in range(a.shape[0]):    # Numba likes loops
        trace += np.tanh(a[i, i])  # Numba likes NumPy functions
    return a + trace               # Numba likes NumPy broadcasting

print(go_fast(x))
```

Python:
```python
from numba import jit
import pandas as pd

x = {'a': [1, 2, 3], 'b': [20, 30, 40]}

@jit
def use_pandas(a):  # Function will not benefit from Numba jit
    df = pd.DataFrame.from_dict(a)  # Numba doesn't know about pd.DataFrame
    df += 1                         # Numba doesn't understand what this is
    return df.cov()                 # or this!

print(use_pandas(x))
```

Numba does a JIT compilation of the functions you decorate, via LLVM, producing machine code that is pretty much as fast as C.
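Applied to the loop from earlier in the thread, that might look like the sketch below. compute_flags is a hypothetical helper name, the column names are taken from the thread, and the ImportError fallback is my addition so the code still runs (slowly) without Numba installed:

```python
import numpy as np

try:
    from numba import njit
except ImportError:          # fall back to plain Python if Numba isn't installed
    njit = lambda f: f

@njit
def compute_flags(i2, i3, i4, i5, val):
    n = i2.size
    flag2 = np.zeros(n, dtype=np.int64)
    flag3 = np.zeros(n, dtype=np.int64)
    flag4 = np.zeros(n, dtype=np.int64)
    for i in range(3, n):  # start once the 4-wide min window is full
        if i2[i] >= i3[i]:
            flag2[i] = 1
        if i4[i - 3:i + 1].min() < 3:
            flag3[i] = 1
        if val[i] < i5[i]:
            flag4[i] = 1
    return flag2, flag3, flag4

# Usage sketch with the thread's column names:
# f2, f3, f4 = compute_flags(df['I2'].to_numpy(), df['I3'].to_numpy(),
#                            df['I4'].to_numpy(), df['I5'].to_numpy(),
#                            df['val'].to_numpy())
```

The first call pays the compilation cost; after that, the loop body runs as native code over the NumPy arrays rather than as interpreted pandas indexing.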

• bchip

#### Ancalagon

##### Honorary Master
Looking at timers now to identify the longest part of the code; it's in a section where I do a lot of the calcs.
There are a lot of fors and ifs, not sure how else to do it.

```python
df['Flag1'] = df['I1'].apply(lambda x: 1 if x >= 20 else 0)

for i in range(df.index.size):
    if i >= 2:
        if df['I2'].iloc[i] >= df['I3'].iloc[i]:
            df['Flag2'].iloc[i] = 1

        minval = df['I4'].iloc[(i-3):(i+1)].min()
        if minval < 3:
            df['Flag3'].iloc[i] = 1

        if df['val'].iloc[i] < df['I5'].iloc[i]:
            df['Flag4'].iloc[i] = 1
```

* Note that the C# version has the same calcs, so the 5 extra columns exist on both sides.

** Recommendations on improvement are welcome.
What did you do to get it down to 36 minutes?

One thing I'd do is change the loop so it doesn't need the i >= 2 check, because that gets evaluated on every iteration.

I'd also rewrite how you calculate the min value to use a sliding window instead of calling the min function on a range in every iteration. A simple fixed-size queue would work for this - you just need to track when values fall out of the window. In fact, the only slightly complicated part is that when the current min value drops out of the window, you need to search through the current window to find a new one. But this is much less work than what you are currently doing.
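A refinement of that idea is the monotonic-deque version of sliding-window minimum, which avoids even the occasional re-search when the minimum leaves the window; this sketch (the function name is mine) does O(n) work in total:

```python
from collections import deque

def sliding_min(values, k):
    """Minimum of each k-wide window, via a monotonic deque."""
    dq = deque()   # holds indices; their values are increasing front to back
    out = []
    for i, v in enumerate(values):
        while dq and values[dq[-1]] >= v:
            dq.pop()       # drop entries that can never be the minimum again
        dq.append(i)
        if dq[0] <= i - k:
            dq.popleft()   # the front index has fallen out of the window
        if i >= k - 1:
            out.append(values[dq[0]])
    return out
```

For example, sliding_min([4, 2, 12, 11, -5], 3) returns [2, 2, -5], the minimum of each 3-wide window. Each index is pushed and popped at most once, which is where the O(n) bound comes from.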

• bchip

#### bchip

##### Expert Member
What did you do to get it down to 36 minutes?

One thing I'd do is change the loop so it doesn't need the i >= 2 check, because that gets evaluated on every iteration.

I'd also rewrite how you calculate the min value to use a sliding window instead of calling the min function on a range in every iteration. A simple fixed-size queue would work for this - you just need to track when values fall out of the window. In fact, the only slightly complicated part is that when the current min value drops out of the window, you need to search through the current window to find a new one. But this is much less work than what you are currently doing.
I saw one or two indentations weren't right, and took some things out of a loop.
I then also added in the changes suggested by @cguy

I see there are a few lower levels that also have calcs, so optimizing each one will take some time.
I appreciate the advice; will definitely go through it when I've got more time.
Running my C# again this morning, I do 2700 tests in 4 minutes (vs the 250 tests in 36 min), so due to time I'll simply stick with C# atm.

The only pitfall for me with C# is that each time I write code (e.g. a new test to perform) I have to stop the app and recompile everything, so you're always in "debug" mode.
Then to include it I have to add a few lines to bring in the new code... nothing major, just a schlep.
Whereas with python I create the test engine and then simply add or change calcs on the fly.

Not sure how to create an app that can dynamically include code in C#, but otherwise it's doing really well so far.

• Ancalagon

#### Hamster

##### Resident Rodent
Running my C# again this morning, I do 2700 tests in 4 minutes (vs the 250 tests in 36 min), so due to time I'll simply stick with C# atm.
Considered other languages? If it's raw speed you're after: https://www.gonum.org/

• cguy