Use Machine Learning to Predict the Value of Bitcoin



The day has finally come when I can use machine learning and bitcoin in the same article. The only thing that would impress me more would be relating machine learning, bitcoin, and blockchain in the same article, but I digress. Anyway, the value of bitcoin is astonishing. It went from $900 to well over $14,000 in less than a year! So hey, let’s start riding this bubble and try to cash in before it’s too late. To that end, we can use machine learning to try to predict the future value of bitcoin.

Machine Learning and Bitcoin

Before we get started, let’s establish a couple of things. First, it’s nearly impossible to accurately predict the value of a stock (or a cryptocurrency) with a simple computer algorithm, because so many factors that we cannot account for affect the price. Think about it: for almost no apparent reason, the price of bitcoin surges and crashes. There is no single mathematical variable or equation that we can use to predict these rises and falls. Yes, there are some really advanced computer models for stocks that take many long-term factors into account, but nothing is going to give you 100% accuracy.

Now when it comes to machine learning, we are going to keep it slightly old school. We are going to use linear regression to predict the value of bitcoin. Linear regression is perhaps the oldest method of “machine learning” but it’s also the easiest to comprehend.

Linear Regression

The most classic linear regression example is predicting the price of a house. Say we have the following information about the house:

  1. Number of Rooms
  2. Price

In this example, the target value we want to predict is the price (the dependent variable). The only feature we have is the number of rooms (the independent variable). If we plot number of rooms vs. price, we will notice a positive correlation: as the number of rooms increases, so does the price of the house. Therefore, we can use a simple linear regression algorithm to predict the price of a house with X rooms. Fitting the model amounts to finding the line of best fit through the points.

[Figure: positive correlation between number of rooms and price]
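To make the house example concrete, here is a minimal sketch in Python using NumPy's polyfit to find the line of best fit. The listings are made-up numbers and the predict_price helper is a hypothetical name used only for illustration.

```python
import numpy as np

# Hypothetical listings: number of rooms and price in dollars (made-up values).
rooms = np.array([2, 3, 4, 5, 6], dtype=float)
prices = np.array([150_000, 200_000, 240_000, 310_000, 350_000], dtype=float)

# Least-squares line of best fit: price ~ slope * rooms + intercept
slope, intercept = np.polyfit(rooms, prices, deg=1)

def predict_price(n_rooms):
    """Predict a price from the fitted line (illustrative helper)."""
    return slope * n_rooms + intercept

print(f"slope: {slope:.0f} dollars per extra room")
print(f"predicted price for 7 rooms: {predict_price(7):.0f}")
```

A positive slope is the positive correlation described above: each extra room raises the predicted price by the same fixed amount.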

Recurrent Neural Network

Another option for using machine learning to predict the value of bitcoin is a recurrent neural network with long short-term memory (LSTM). Essentially, recurrent neural networks are like normal neural networks except that they “remember” past data: the outputs of one step are fed back in as inputs to the next, which gives the network a form of “memory”. This is ideal for working with time series data, such as stock prices or the value of bitcoin. Unfortunately, implementing a recurrent neural network isn’t as straightforward as implementing linear regression, so we will leave that as a future topic for this project.
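To give a feel for what “memory” means here, the sketch below runs a plain recurrent cell (not an LSTM, and not trained) over two short sequences with NumPy. The weights are random, and the sizes and the run_rnn name are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

hidden_size = 4
# Randomly initialised, untrained weights, purely for illustration.
W_x = rng.normal(scale=0.5, size=(hidden_size, 1))            # input -> hidden
W_h = rng.normal(scale=0.5, size=(hidden_size, hidden_size))  # hidden -> hidden
b = np.zeros((hidden_size, 1))

def run_rnn(sequence):
    """Feed a 1-D sequence through a plain recurrent cell; return the final state."""
    h = np.zeros((hidden_size, 1))
    for value in sequence:
        x = np.array([[value]])
        # The new state mixes the current input with the previous state.
        h = np.tanh(W_x @ x + W_h @ h + b)
    return h

# Two sequences that end with the same values but start differently.
state_a = run_rnn([0.1, 0.2, 0.3])
state_b = run_rnn([0.9, 0.2, 0.3])
print("final states identical:", np.allclose(state_a, state_b))
```

Because the hidden state is carried from step to step, the final state depends on the whole sequence and not just the last value. An LSTM adds gates on top of this idea to control what is kept and what is forgotten.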

Python

So let’s get started using machine learning to predict the value of bitcoin. I am going to use Python with the scikit-learn (sklearn) library to help us out. I uploaded a CSV and a JSON file of the daily bitcoin prices from April 2013 to January 2018. You can use whichever you like, but for this example, I am going to use the CSV file. Below is my Python program:

import csv
import numpy as np
from sklearn import linear_model
import matplotlib.pyplot as plt

dates = []
prices = []

def get_data(filename):
    # Read the day index (column 0) and the price (column 5) from each row
    with open(filename, 'r') as csvfile:
        csvFileReader = csv.reader(csvfile)
        next(csvFileReader)  # skipping column names
        for row in csvFileReader:
            dates.append(int(row[0]))
            prices.append(float(row[5]))

def show_plot(dates, prices, x):
    linear_mod = linear_model.LinearRegression()
    dates = np.reshape(dates, (len(dates), 1))  # converting to matrix of n X 1
    prices = np.reshape(prices, (len(prices), 1))
    linear_mod.fit(dates, prices)  # fitting the data points in the model
    plt.scatter(dates, prices, color='black')  # plotting the initial datapoints
    plt.plot(dates, linear_mod.predict(dates), color='blue', linewidth=3)  # plotting the line made by linear regression
    plt.show()
    # predict() expects a 2D array of shape (n_samples, n_features)
    predicted_price = linear_mod.predict(np.array([[x]]))
    return predicted_price[0][0]

get_data('price.csv')  # calling get_data method by passing the csv file to it

prices.reverse()

predicted_price = show_plot(dates, prices, 1715)

print("The predicted price is: $" + str(predicted_price))

The above program is not perfect (sorry I am a JS developer), but it gets the job done.

The Predicted Value of Bitcoin

If you run the Python program, you will notice that it predicts the value of bitcoin to be $3,418 the next day. Clearly, this is very inaccurate. So what went wrong? Well, let’s look at the graph:

See, until recently, bitcoin had a fairly steady increase in price, so its value could fit into a simple linear model. However, bitcoin had a huge spike at the end of last year, which caused its value to grow almost exponentially. A straight line fitted to the whole history is held down by the long, comparatively flat early period, so it badly underestimates the most recent prices. So while we could try to fit the data into a more accurate linear model, it will still not be ideal because bitcoin’s growth has exploded.
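The effect is easy to reproduce. The sketch below fits a straight line to a synthetic, exponentially growing series and compares the line's next-day prediction with the true value. The series is an assumption chosen to resemble the shape of the chart; it is not real bitcoin data.

```python
import numpy as np

# Synthetic stand-in for the price history: exponential growth from
# 100 to roughly 14,800 over 1,000 days. These are NOT real bitcoin prices.
days = np.arange(1000)
synthetic_prices = 100 * np.exp(0.005 * days)

# Fit a straight line to the whole history.
slope, intercept = np.polyfit(days, synthetic_prices, deg=1)

next_day = 1000
linear_prediction = slope * next_day + intercept
actual_next = 100 * np.exp(0.005 * next_day)

print(f"linear prediction: {linear_prediction:.0f}")
print(f"actual value:      {actual_next:.0f}")
```

On a curve that bends upward like this one, the fitted line sits below the data at the right-hand end, so extrapolating it one more day undershoots the actual value.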

Polynomial Regression

So we established that a linear model isn’t the most accurate representation of the value of bitcoin. Let’s try polynomial regression instead. To keep things simple, I decided to code this one in JavaScript using the same data:

const regression = require('regression');
const bPrices = require('./BitcoinPrice');

// Build [dayIndex, closingPrice] pairs, walking the file backwards so that
// day 0 is the last entry in the file
const data = [];
let j = 0;
for (let i = bPrices.length - 1; i >= 0; i--) {
  data.push([j, bPrices[i].Close]);
  j++;
}

// Fit a second-degree polynomial and predict the value for day 1715
const result = regression.polynomial(data, { order: 2 });
console.log(result.predict(1715));
If you run this program in Node.js, you will notice that the output is $18,592. The current value of bitcoin is around $16k. Again, this is not very accurate, but it’s a lot closer to the actual value than the linear regression’s prediction.
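For readers who would rather stay in Python, the same idea can be sketched with NumPy's polyfit and polyval. This is not a port of the regression package, just a second-degree least-squares fit. The predict_quadratic helper is a hypothetical name, and it assumes the prices are ordered oldest to newest with one value per day.

```python
import numpy as np

def predict_quadratic(prices, day):
    """Fit price = a*day^2 + b*day + c to a price history and evaluate it at `day`.

    `prices` is assumed to be ordered oldest to newest, one value per day,
    so the day index of each price is simply its position in the list.
    """
    days = np.arange(len(prices))
    coefficients = np.polyfit(days, prices, deg=2)  # highest power first
    return float(np.polyval(coefficients, day))

# Hypothetical check on data that is exactly quadratic: 2*d^2 + 3*d + 1
history = [2 * d**2 + 3 * d + 1 for d in range(10)]
print(predict_quadratic(history, 10))
```

With the real price list loaded from the CSV or JSON file, the call would be predict_quadratic(prices, 1715), mirroring the JavaScript program above.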

Conclusion

tl;dr It’s very hard to predict the value of stocks and cryptocurrency. A linear model works if the increase is steady. Unfortunately (or not), bitcoin’s value spiked almost exponentially, so we had to resort to a second-degree polynomial to get a better prediction. Also, without any known independent variables to correlate against, there is only so much we can do to model the data using regression. In the end, looking into recurrent neural networks would be the best approach.