Predictive Models for Crypto using Machine Learning

Piet_Pompies (Active Member, joined Aug 5, 2022)
I recently signed up for a Binance account and finally figured out how to use the API to download current, hourly data. I will now use this to attempt some predictive modelling and share my progress.
I will be using R with the xgboost and TTR packages.
I will use a cryptocurrency that has at least 6 months' worth of hourly data. As a start I will use Bitcoin (any suggestions welcome).

I use the python-binance module (https://pypi.org/project/python-binance/) on Linux (Ubuntu 20.04) to get data from Binance.
To get hourly Open, High, Low, Close, Volume (OHLCV) data you need the following:

In a folder create the following 2 files:
1) config.py
with the following 2 lines
API_KEY = 'YOUR API'
API_SECRET = 'YOUR SECRET API'

2) Download_Binance_Data.py:
import csv

import config  # config.py in the same folder holds API_KEY and API_SECRET
from binance.client import Client

client = Client(config.API_KEY, config.API_SECRET)

# Hourly BTC/USDT candles from 20 Jan 2022 up to now
candlesticks = client.get_historical_klines("BTCUSDT", Client.KLINE_INTERVAL_1HOUR, "20 Jan, 2022")

with open('FULL_LOCATION_OF_DOWNLOADED_FILE', 'w', newline='') as csvfile:
    candlestick_writer = csv.writer(csvfile, delimiter=',')
    for candlestick in candlesticks:
        # Binance returns the open time in milliseconds; store it as Unix seconds
        candlestick[0] = candlestick[0] / 1000
        candlestick_writer.writerow(candlestick)

Change to the folder and run:
python Download_Binance_Data.py
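Each row the script writes is one Binance kline with 12 fields. A minimal sketch for reading the file back in Python with only the standard library, assuming the documented Binance kline column order (open time, open, high, low, close, volume, close time, quote asset volume, number of trades, taker buy base volume, taker buy quote volume, ignore). The `load_candles` helper and the field names are hypothetical, and the sample row is made up.

```python
import csv
import io
from datetime import datetime, timezone

# Assumed column order of a Binance kline, as written by Download_Binance_Data.py
FIELDS = [
    "open_time", "open", "high", "low", "close", "volume",
    "close_time", "quote_volume", "trades",
    "taker_buy_base", "taker_buy_quote", "ignore",
]


def load_candles(fileobj):
    """Parse hourly rows into dicts with numeric OHLCV values and a UTC datetime."""
    candles = []
    for row in csv.reader(fileobj):
        rec = dict(zip(FIELDS, row))
        candles.append({
            # open_time was already divided by 1000, so it is in Unix seconds
            "time": datetime.fromtimestamp(float(rec["open_time"]), tz=timezone.utc),
            "open": float(rec["open"]),
            "high": float(rec["high"]),
            "low": float(rec["low"]),
            "close": float(rec["close"]),
            "volume": float(rec["volume"]),
        })
    return candles


# Made-up row; use open('FULL_LOCATION_OF_DOWNLOADED_FILE') for the real file
sample = io.StringIO(
    "1642636800.0,42000.0,42300.0,41800.0,42100.0,1500.5,1642640399999,"
    "63000000.0,30000,700.0,29400000.0,0\n"
)
candles = load_candles(sample)
print(candles[0]["time"].isoformat(), candles[0]["close"])
```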

UPDATES ON PROGRESS:
DATE | AMOUNT IN (USDT) | AMOUNT OUT (USDT) | PROFIT/LOSS (USDT)
17 Aug - 18 Aug (SOL) | 50 | 48.985 | -1.015
19 Aug - 19 Aug (AAVE) | 50 | 47.516 | -2.4833
19 Aug - 19 Aug (SOL) | 50 | 50.593 | +0.593
 
I'm keen to follow but I suspect there's way too much external data that you're not privy to that would be required to predict with any certainty...
Whales do what they like unfortunately.
I am hoping to predict maximums and minimums within a 48-hour window. That way major downswings caused by whales might be mitigated.
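A minimal sketch of how such 48-hour targets could be labelled, assuming the thresholds given later in the thread (buy = 1 when the highest price in the next 48 hours is at least 106% of the current close, sell = 1 when the lowest price falls below 95%). Using highs/lows rather than closes, and excluding the current hour from the window, are assumptions; the R code for this step is not shown. `label_targets` is a hypothetical name.

```python
def label_targets(highs, lows, closes, horizon=48, up=1.06, down=0.95):
    """Return (buy, sell) 0/1 labels; None where fewer than `horizon` future hours exist."""
    n = len(closes)
    buy, sell = [], []
    for i in range(n):
        if i + horizon >= n:
            # The most recent `horizon` hours cannot be labelled yet: their future is unknown
            buy.append(None)
            sell.append(None)
            continue
        future = range(i + 1, i + 1 + horizon)
        max_future = max(highs[j] for j in future)
        min_future = min(lows[j] for j in future)
        buy.append(1 if max_future >= up * closes[i] else 0)
        sell.append(1 if min_future < down * closes[i] else 0)
    return buy, sell


# Toy series with a 2-hour horizon so the result is easy to check by hand
prices = [100.0, 100.0, 107.0, 94.0, 100.0]
buy, sell = label_targets(prices, prices, prices, horizon=2)
print(buy, sell)
```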
 
# Load order kept as in the original script (later packages mask earlier ones);
# duplicate library(magrittr) and library(readr) calls removed
library(magrittr)
library(lubridate)
library(readr)
library(scales)
library(xts)
library(broom)
library(caret)
library(compareDF)
library(cowplot)
library(data.table)
library(dplyr)
library(e1071)
library(fs)
library(ggplot2)
library(Matrix)
library(pracma)
library(pROC)
library(Rcpp)
library(roll)
library(splus2R)
library(stringr)
library(tidyr)
library(TTR)
library(useful)
library(xgboost)
library(zoo)
 
So the full script (only its library calls are shown above) yields the following confusion matrix for buys (predictions of when to buy):

          Reference
Prediction   0   1
         0 146   8
         1   0  14

Accuracy : 0.9524
95% CI : (0.9083, 0.9792)
No Information Rate : 0.869
P-Value [Acc > NIR] : 0.0002948

Kappa : 0.7526

Mcnemar's Test P-Value : 0.0133283

Sensitivity : 1.0000
Specificity : 0.6364
Pos Pred Value : 0.9481
Neg Pred Value : 1.0000
Prevalence : 0.8690
Detection Rate : 0.8690
Detection Prevalence : 0.9167
Balanced Accuracy : 0.8182

'Positive' Class : 0

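If it predicts a 1, the maximum price within the next 48 hours should be at least 106% of the current price (a rise of 6% or more).
Note that caret treats class 0 as the 'Positive' class here, so the "Pos Pred Value" (0.9481) is the hit rate of the "don't buy" predictions. For the buy signals themselves the relevant figure is the "Neg Pred Value": all 14 predicted buys were correct (100%), although the model flagged only 14 of the 22 actual buy opportunities (Specificity 0.6364).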
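Which figure describes the buy signals depends on which class is called "positive". A short Python sketch that recomputes sensitivity, specificity, PPV and NPV from the counts in the two confusion matrices, for either choice of positive class; `metrics` is a hypothetical helper, not part of caret.

```python
def metrics(cm, positive):
    """cm[pred][ref] holds counts for classes 0/1; returns the four rates."""
    neg = 1 - positive
    tp = cm[positive][positive]  # predicted positive, actually positive
    fp = cm[positive][neg]       # predicted positive, actually negative
    fn = cm[neg][positive]       # predicted negative, actually positive
    tn = cm[neg][neg]            # predicted negative, actually negative
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Rows are predictions (0, 1), columns are reference values (0, 1)
buy_cm = [[146, 8],
          [0, 14]]
sell_cm = [[27, 0],
           [134, 7]]

for name, cm in (("buy", buy_cm), ("sell", sell_cm)):
    for pos in (0, 1):
        rates = {k: round(v, 4) for k, v in metrics(cm, pos).items()}
        print(name, "positive =", pos, rates)
```

With class 0 as positive the output reproduces caret's numbers; with class 1 as positive, the buy model's PPV is 1.0 while the sell model's is only about 0.0496.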


For sells we get a different picture:
          Reference
Prediction   0   1
         0  27   0
         1 134   7

Accuracy : 0.2024
95% CI : (0.1444, 0.2712)
No Information Rate : 0.9583
P-Value [Acc > NIR] : 1

Kappa : 0.0165

Mcnemar's Test P-Value : <2e-16

Sensitivity : 0.16770
Specificity : 1.00000
Pos Pred Value : 1.00000
Neg Pred Value : 0.04965
Prevalence : 0.95833
Detection Rate : 0.16071
Detection Prevalence : 0.16071
Balanced Accuracy : 0.58385

'Positive' Class : 0

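If it predicts a 1, the minimum price within the next 48 hours should be below 95% of the current price.
With class 0 as the 'Positive' class, the "Pos Pred Value" of 1.0 only says the "don't sell" predictions are reliable. The Sensitivity is far too low: only 27 of the 161 true "don't sell" hours are recognised, so 134 of the 141 sell signals are false alarms (Neg Pred Value 0.04965).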


So this model is a decent model to enter the market but a pretty poor predictor of when to exit. I suppose you can just exit when you reach 6% profit within 48h. I'll post some pictures for better visuals later.
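A minimal sketch of that exit rule: sell in the first hour whose high reaches +6% over the entry price, otherwise sell at the close 48 hours after entry. Filling exactly at the target price is an assumption (real fills can slip), fees are ignored, and `exit_trade` is a hypothetical name.

```python
def exit_trade(entry_price, highs, closes, target=0.06, horizon=48):
    """Return (hours_held, exit_price) for a take-profit-or-timeout exit.

    highs/closes are the hourly values *after* entry (at least one value expected).
    """
    target_price = entry_price * (1 + target)
    for hour in range(min(horizon, len(highs))):
        if highs[hour] >= target_price:
            # Assumed: a limit sell order fills exactly at the target price
            return hour + 1, target_price
    # Target never reached: exit at the last close inside the window
    last = min(horizon, len(closes)) - 1
    return last + 1, closes[last]


print(exit_trade(100.0, [101.0, 104.0, 107.0], [100.5, 103.0, 106.0], horizon=3))
print(exit_trade(100.0, [101.0, 102.0], [100.5, 101.5], horizon=2))
```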
 
Don't waste your time on crypto.

Elon farts and crypto breaks (everything is a "black swan"). If you seriously wanna mess with this kinda thing, focus on FX and gold; decades of data are available.
 
Nice.... I'd add that regular trading advice applies, including not risking more than 0.5% of your entire portfolio per trade and having a good stop-loss strategy as well as a scalping strategy.
Very nice! Ta for sharing.

I'd be surprised if most bots aren't using similar ML strategies with the more readily available ML tools out there these days.
What platform are you running this on?
Small correction. The sell model has too many false positives so it may be too safe.

I use Binance data and run the models in the R statistical package.

I am thinking of experimenting with $50 to see how the models perform.
I'll convert between various cryptos and USDT. More later.
 
With a few extra parameters added I get pretty decent buying and selling accuracies of 97% and 90% respectively. When you plot the predicted buy and sell signals it looks like this:
Top graph: buy signals. Middle graph: sell signals. Bottom graph: buy minus sell, which removes overlaps to leave the best buy signals. This is for Bitcoin, so there are no buy signals for now. Everything to the right of the black vertical line falls within the last 48 hours.
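The "buy minus sell" step can be sketched as an element-wise difference of two 0/1 signal series, keeping only the hours where buy = 1 and sell = 0. This is a reading of the description above, not the author's R code; `net_signals` is a hypothetical name.

```python
def net_signals(buy, sell):
    """Element-wise buy - sell: 1 = clean buy, 0 = no signal or overlap, -1 = sell only."""
    if len(buy) != len(sell):
        raise ValueError("signal series must have the same length")
    return [b - s for b, s in zip(buy, sell)]


net = net_signals([1, 1, 0, 0], [0, 1, 1, 0])
best_buy_hours = [i for i, v in enumerate(net) if v == 1]
print(net, best_buy_hours)  # [1, 0, -1, 0] [0]
```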
 

BTC.png
For Ethereum the buying and selling accuracies are 70% and 93% respectively.
ETH.png
 
For Solana the buying and selling accuracies are 79% and 95% respectively. The green lines mark a range of +5% and -5%:
SOL.png
 
Nice hardware... going to test on a 4/8 Intel and will report back.
Did you factor trade costs in anywhere, or are you just making sure bids are big enough to cover those?
Trade cost on Binance appears to be between 0.1% and 0.5%. So I am assuming a worst-case scenario in which it costs 0.75% to convert to a coin and another 0.75% to convert back to USDT, i.e. a 1.5% trade cost for a round trip. The model predicts buys whose price will peak at least 6% higher within 48h (realistically, let us say at least 3%).

So if the model is correct, every trade should at least make some profit, and hopefully 4% or more.
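A worked version of that arithmetic, assuming the fee is charged as a percentage of the amount converted on each leg, so the two legs compound rather than simply add. `net_return` is a hypothetical name.

```python
def net_return(gross_gain, fee_per_leg):
    """Fractional return after buying a coin and selling it back, each leg charged a fee."""
    return (1 + gross_gain) * (1 - fee_per_leg) ** 2 - 1


for gain in (0.06, 0.03):
    print(f"gain {gain:.0%}: net {net_return(gain, 0.0075):.2%}")
```

This prints a net 4.42% for a 6% move and 1.46% for a 3% move; the round trip itself costs about 1.49%, marginally less than the 1.5% obtained by adding the two legs.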
 
Aragon (ANT) is interesting because there appear to be specific hours in the week where it hits a minimum and a maximum:
ANTc.png
For example, on Tuesday and Saturday at 01:00 in the morning it tends to hit bottom, and within 48h it peaks again (the graph above shows the average for each "week hour" since 16 May this year).
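A sketch of how such a "week hour" profile could be computed, assuming candle open times in Unix seconds (as written by the download script) and SAST as a fixed UTC+2 offset. Numbering week hours from Monday 00:00 SAST (so Tuesday 01:00 is 25 and Saturday 01:00 is 121) is an arbitrary convention, and the function names are hypothetical. Averaging raw prices mixes in the overall trend; normalising each week first may be preferable.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

SAST = timezone(timedelta(hours=2))  # South Africa Standard Time, no daylight saving


def week_hour(unix_seconds):
    """0..167: hours since Monday 00:00 SAST for the given candle open time."""
    t = datetime.fromtimestamp(unix_seconds, tz=SAST)
    return t.weekday() * 24 + t.hour


def average_by_week_hour(times, closes):
    """Mean close price for each week hour present in the data."""
    sums, counts = defaultdict(float), defaultdict(int)
    for ts, price in zip(times, closes):
        wh = week_hour(ts)
        sums[wh] += price
        counts[wh] += 1
    return {wh: sums[wh] / counts[wh] for wh in sorted(sums)}


tue = datetime(2022, 8, 16, 1, tzinfo=SAST).timestamp()  # a Tuesday, 01:00 SAST
print(week_hour(tue))  # 25
```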

Here is its model:
The buying and selling accuracies are 66% and 81% respectively.
ANT.png
 
The regular spot trading fee is 0.1% for both maker and taker. If you buy and hold some BNB tokens you get 25% off all trading fees, so 0.075% per trade...
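The same discount applied to a full round trip, taking the 0.1% fee and the 25% BNB discount as quoted in the post (Binance's current fee schedule may differ):

```python
base_fee = 0.001     # 0.1% spot fee per trade, as quoted in the post
bnb_discount = 0.25  # 25% off when holding BNB, as quoted in the post
fee = base_fee * (1 - bnb_discount)
round_trip = 1 - (1 - fee) ** 2  # convert into a coin, then back to USDT
print(f"per trade {fee:.3%}, round trip {round_trip:.3%}")
```

That is about 0.15% for a round trip, roughly a tenth of the 1.5% worst case assumed earlier.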
 
Thanks. Do you need to trade with the BNB tokens or just have a few?
 
It sounds as though you are trying to predict exogenous events (impossible by definition), and are using ML to guess things similar to easily derived quantities (future variance, which is usually similar to past variance; the prediction aspect would only provide useful additional detail if it predicts changes in variance better than simple historical variance, which I doubt it does).

Generally speaking, there are much bigger players doing exactly this stuff, but with more information, more hardware and specialists in the field. If there is an opportunity, they will exploit it and trade it out of the market before you.
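The baseline this objection refers to is easy to compute: forecast the next 48 hours' volatility as the volatility of the previous 48 hours. A minimal sketch using hourly log returns and non-overlapping windows; the window length and function names are assumptions. A learned model only adds value here if its forecast errors are consistently smaller than this baseline's.

```python
import math
import statistics


def log_returns(closes):
    """Hourly log returns from a list of close prices."""
    return [math.log(b / a) for a, b in zip(closes, closes[1:])]


def historical_vol_forecast(returns, window=48):
    """Naive forecast: std of the last `window` returns predicts the next window's std.

    Returns (forecast, realised) pairs for consecutive non-overlapping windows.
    """
    pairs = []
    for start in range(window, len(returns) - window + 1, window):
        past = returns[start - window:start]
        future = returns[start:start + window]
        pairs.append((statistics.stdev(past), statistics.stdev(future)))
    return pairs


rets = [0.01, -0.01] * 72  # 144 synthetic hourly returns with constant volatility
print(len(historical_vol_forecast(rets)))  # 2
```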
 
My reasoning for trying this is as follows:
ML is just good at recognizing patterns.
Human behaviour follows patterns, e.g. day and night cycles, daily hormonal changes, weekend and mid-week spending patterns, etc.
Human behaviour to a large extent still dictates trading patterns (yes, bots also affect it).

With these models I find that the hours of the week are always in the top 10 by "Gain" when looking at the importance of predictive parameters. E.g. on Tuesday and Saturday at 01:00 SAST the price tends to hit a minimum. This corresponds to 07:00 on Hong Kong mornings (a lot of trading happens in Hong Kong). 24-48h after this minimum it tends to rebound. Why is this pattern there?
It may be that some crypto is converted to cash for the weekend (increasing supply, reducing demand and thus pushing the price down). Traders then buy at these low prices, driving the price up again. It may even be bots. Who cares, the patterns are there to be exploited and ML can help identify them.

And I agree, big players employ these techniques. But they cannot fully negate these patterns for me not to identify and use. Their bots might actually help to make patterns that can be identified.
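The offset between the two time zones is easy to check with fixed offsets (SAST is UTC+2 and Hong Kong Time is UTC+8; neither observes daylight saving), which puts 01:00 SAST at 07:00 in Hong Kong:

```python
from datetime import datetime, timedelta, timezone

SAST = timezone(timedelta(hours=2), "SAST")
HKT = timezone(timedelta(hours=8), "HKT")

# Tuesday 16 Aug 2022, 01:00 in South Africa
sast_time = datetime(2022, 8, 16, 1, 0, tzinfo=SAST)
hk_time = sast_time.astimezone(HKT)
print(hk_time.strftime("%A %H:%M %Z"))  # Tuesday 07:00 HKT
```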
 
Quoting Piet_Pompies: "E.g. on Tuesday and Saturday at 01:00 SAST the price tends to hit a minimum. [...] 24-48h after this minimum it tends to rebound."
I seriously doubt that you have discovered a directional pattern. Sure it may hit a minimum, but it also should hit a maximum. If it is just hitting a minimum and hits the maximum at some other point of the day, either you have outsmarted companies that have invested billions into such research, or you’re wrong.

Quoting Piet_Pompies: "And I agree, big players employ these techniques. But they cannot fully negate these patterns for me not to identify and use."
They can and do fully negate them for the vast majority of players. The interplay with existing bots is very complex, because they certainly detect each other's patterns, which then informs a new bot that affects the market in turn. You need to find the hardest-to-find patterns and react to changing conditions rapidly. This is what the successful players do very, very well.
 
It's just about increasing odds of getting a winning bid... given evolving data, you should almost always have an edge with short term trades IMO. As pointed out by piet_pompies, bots are rife and contribute to useful data for prediction.
There is enormous competition on the short term trades. What edge do you think there is?
 
Sign up to the MyBroadband newsletter