Is there anyone here who has programmed in RStudio?
I'm doing an introductory course and just have a few simple questions.
I’ve used R, but not through R Studio. We also use a bunch of custom wrappers, but shoot anyway.
#install.packages("pacman")
#install.packages("caTools")
#install.packages("rsample")
#install.packages("dplyr")
#install.packages("caret")
#install.packages("e1071")
#install.packages("FNN")
#Load packages for use
library(pacman)
library("caTools")
library("rsample")
library("dplyr")
library("caret")
library("e1071")
library("FNN")
library(readr)
realEstate <- read_csv("C:/Users/User/Desktop/Notes/Predictive Analytics/Module3/Assignment/realEstate.csv")
dim(realEstate)
set.seed(101)
# Convert the numeric predictors into categorical variables (factors).
# Note: naiveBayes() from e1071 also accepts numeric predictors (it models them as Gaussian),
# so this conversion is a modelling choice rather than a requirement.
realEstate.df <- realEstate %>%
  mutate(
    # id = factor(id),  # ignored
    full_sq = factor(full_sq),
    life_sq = factor(life_sq),
    floor = factor(floor),
    max_floor = factor(max_floor),
    material = factor(material),
    build_year = factor(build_year),
    num_room = factor(num_room),
    kitch_sq = factor(kitch_sq)
  )
# train test split - select a random sample of rows - 80%
sample = sample.split(realEstate.df$priceClass, SplitRatio = 0.80)
train <- subset(realEstate.df, sample==TRUE) # training
test <- subset(realEstate.df, sample==FALSE) # test
# distribution of priceClass across the train & test sets
table(train$priceClass) %>% prop.table()
table(test$priceClass) %>% prop.table()
# We perform Naive Bayes on full_sq + life_sq + floor + max_floor + material + build_year + num_room + kitch_sq
# With data=train the formula can use the bare column name priceClass instead of train$priceClass
Naive_Bayes_Model_4 <- naiveBayes(priceClass ~ full_sq + life_sq + floor + max_floor + material + build_year + num_room + kitch_sq, data=train)
Naive_Bayes_Model_4$tables
# Accuracy on training set
# train[,-1] removes the target variable from the dataframe for prediction
train_predictions_4 = as.data.frame(predict(Naive_Bayes_Model_4, train[,-1], type="raw"))
fc1 <- as.factor(ifelse(train_predictions_4$High>0.2,'High',ifelse(train_predictions_4$Medium>0.2, 'Medium', 'Low')))
fc2 <- as.factor(train$priceClass)
confusionMatrix(fc1, fc2)
# Accuracy - 0.7
Posted the code and the data file. I've got the whole script working except for the last line.
confusionMatrix(as.factor(ifelse(train_predictions_4$High>0.2, 'Yes', 'No')), train$priceClass)
I don't understand this line at all (it came from a different assignment about something else).
I'm trying to adapt it to this code to get the accuracy, but I keep getting the error:
Error: `data` and `reference` should be factors with the same levels.
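For what it's worth, that error is most likely about the labels: the adapted line produces a factor with levels 'Yes' and 'No', while priceClass appears to hold values like High, Medium and Low (and read_csv loads text columns as character, not factor). A minimal sketch of lining the two up, using made-up toy data and hypothetical names (actual, probs, predicted), and picking the most probable class per row instead of the 0.2 threshold:

```r
# Hypothetical toy data standing in for train$priceClass and the
# data frame of predicted probabilities (type = "raw").
actual <- factor(c("High", "Low", "Medium", "High", "Low"),
                 levels = c("High", "Low", "Medium"))
probs <- data.frame(
  High   = c(0.7, 0.1, 0.2, 0.3, 0.2),
  Low    = c(0.1, 0.8, 0.3, 0.1, 0.6),
  Medium = c(0.2, 0.1, 0.5, 0.6, 0.2)
)

# For each row, take the name of the column with the highest probability.
predicted_labels <- colnames(probs)[max.col(as.matrix(probs), ties.method = "first")]

# Build the prediction factor with exactly the same levels as the reference.
predicted <- factor(predicted_labels, levels = levels(actual))

# Both factors now share their levels, which is what the error complains about.
same_levels <- identical(levels(predicted), levels(actual))

# Base R confusion table and accuracy (share of rows on the diagonal).
conf_table <- table(Predicted = predicted, Actual = actual)
accuracy <- sum(diag(conf_table)) / sum(conf_table)
print(conf_table)
print(accuracy)
```

Two factors built this way should also be accepted by caret's confusionMatrix(predicted, actual), since they share the same levels.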
What I also don't understand is why I have to refer to one of the dimensions (columns) whenever I call
a variable.
Most of the time the code refers to train$priceClass.
Why not just train?
"priceClass" is the target dimension (the one we are trying to predict).
For example, in the line: table(train$priceClass) %>% prop.table()
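On the train versus train$priceClass question, here is a small sketch with a made-up data frame (train_demo is a hypothetical stand-in for train). The data frame is the whole table, while `$` pulls out a single column as a vector, and functions like table() need that one column to count:

```r
# Hypothetical miniature version of the train data frame.
train_demo <- data.frame(
  priceClass = c("High", "Low", "Low", "Medium"),
  num_room   = c(3, 1, 2, 2),
  stringsAsFactors = FALSE
)

# The whole table versus one column of it.
whole_class  <- class(train_demo)             # the full data frame
column_class <- class(train_demo$priceClass)  # a single character vector

# table() counts how often each class occurs in that one column;
# prop.table() turns the counts into proportions.
class_counts <- table(train_demo$priceClass)
class_shares <- prop.table(class_counts)
print(class_shares)
```

Functions that take a data argument (such as a model formula with data=train) can use the bare column name, which is why `$` is not needed everywhere.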