R Studio

bchip

Is there anyone here who has programmed in RStudio?

I'm doing an introductory course and just have a few simple questions.
 
I’ve used R, but not through R Studio. We also use a bunch of custom wrappers, but shoot anyway.
 

Awesome, thanks. I will only be able to post in a few hours.
 
I have this code


Code:
#install.packages("pacman")
#install.packages("caTools")
#install.packages("rsample")
#install.packages("dplyr")
#install.packages("caret")
#install.packages("e1071")
#install.packages("FNN")

#Load packages for use
library(pacman)
library("caTools")
library("rsample")
library("dplyr")
library("caret")
library("e1071")
library("FNN")
library(readr)

realEstate <- read_csv("C:/Users/User/Desktop/Notes/Predictive Analytics/Module3/Assignment/realEstate.csv")
dim(realEstate)

set.seed(101)

realEstate.df <- realEstate %>%
  # Convert numeric variables into factors so Naive Bayes treats them as categories
  # (left numeric, e1071's naiveBayes would model them as Gaussian instead)
  mutate(
    #id = factor(id),           #ignore
    full_sq = factor(full_sq),
    life_sq = factor(life_sq),
    floor = factor(floor),
    max_floor = factor(max_floor),
    material = factor(material),
    build_year = factor(build_year),
    num_room = factor(num_room),
    kitch_sq = factor(kitch_sq)
  )

# train test split - select a random sample of rows - 80%
sample = sample.split(realEstate.df$priceClass, SplitRatio = 0.80)
train <- subset(realEstate.df, sample==TRUE) # training
test <- subset(realEstate.df, sample==FALSE) # test


# distribution of priceClass across train & test set
table(train$priceClass) %>% prop.table()
table(test$priceClass) %>% prop.table()


# We perform Naive Bayes on full_sq + life_sq + floor + max_floor + material + build_year + num_room + kitch_sq
# The formula names the column directly; data=train tells naiveBayes where to find it
Naive_Bayes_Model_4=naiveBayes(priceClass ~ full_sq + life_sq + floor + max_floor + material + build_year + num_room + kitch_sq, data=train)
Naive_Bayes_Model_4$tables

# Accuracy on training set
train_predictions_4 = as.data.frame(predict(Naive_Bayes_Model_4, train[,-1], type="raw"))
# removing target variable from dataframe for prediction

fc1 <- as.factor(ifelse(train_predictions_4$High>0.2,'High',ifelse(train_predictions_4$Medium>0.2, 'Medium', 'Low')))
fc2 <- as.factor(train$priceClass)

confusionMatrix(fc1, fc2) 
# Accuracy - 0.7
 


Posted the code and the data file. I've got the whole script working except for the last line.
confusionMatrix(as.factor(ifelse(train_predictions_4$High>0.2, 'Yes', 'No')), train$priceClass)

I don't understand this line at all (it came from a different assignment about something else).
I'm trying to adapt it to this code to get the accuracy, but I keep getting the error:

Error: `data` and `reference` should be factors with the same levels.
 
What I also don't understand is why I have to refer to one of the dimensions whenever I call
a variable.
Most of the time the code refers to train$priceClass.
Why not just train?

"priceClass" is the target dimension (which we are trying to predict)

Like with the line: table(train$priceClass) %>% prop.table()
 
Posted the code and the data file. I've got the whole script working except for the last line.
confusionMatrix(as.factor(ifelse(train_predictions_4$High>0.2, 'Yes', 'No')), train$priceClass)

I don't understand this line at all (it came from a different assignment about something else).
I'm trying to adapt it to this code to get the accuracy, but I keep getting the error:

Error: `data` and `reference` should be factors with the same levels.

This is complaining about non-conforming data types.

Specifically, each factor object has a levels attribute, and the two don't match for whatever reason. You should be able to print levels(train$priceClass) to see what the reference levels are. Then print levels() for the result of the as.factor call too, and try to understand the mismatch.
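
A minimal sketch of that check, using made-up vectors rather than the real realEstate data (pred, ref and pred_fixed are hypothetical names). It only uses base R, and it assumes the fix is to label predictions with the same values the reference column uses.

```r
# Toy stand-ins for the prediction and the reference column (hypothetical data)
pred <- as.factor(c("Yes", "No", "Yes", "No"))
ref  <- factor(c("High", "Low", "Medium", "Low"))

# Inspect both level sets; they differ, which is what triggers the error
levels(pred)
levels(ref)
identical(levels(pred), levels(ref))

# Label the predictions with the reference's own values and pin the level set
pred_fixed <- factor(c("High", "Low", "High", "Low"), levels = levels(ref))
same_levels <- identical(levels(pred_fixed), levels(ref))
```

Passing levels = levels(ref) keeps the two factors aligned even when a class never shows up in the predictions.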
 
What I also don't understand is why I have to refer to one of the dimensions whenever I call
a variable.
Most of the time the code refers to train$priceClass.
Why not just train?

"priceClass" is the target dimension (which we are trying to predict)

Like with the line: table(train$priceClass) %>% prop.table()

train is a container variable (perhaps a data frame), that has other values in it. If it is a data frame, then priceClass will refer to a column, but if it is a list, it can contain anything.

The operations using the $priceClass subobject are expecting values of the type of the subobject and not the parent container.

This is a design choice - it generally means that the operations are just more primitive, so they are less aware of your overall goal, but are more widely applicable.

You can always write a wrapper function that takes a container as a parameter, “knows” that, say, priceClass is a subobject of it, and assumes it is there.
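
A rough sketch of both points, with a toy data frame standing in for train (train_toy and class_proportions are hypothetical names, not from the assignment). The $ operator pulls one column out of the container, and the wrapper is one way to hide that step.

```r
# Toy data frame standing in for `train` (hypothetical values)
train_toy <- data.frame(
  priceClass = c("High", "Low", "Low", "Medium"),
  num_room   = c(3, 1, 2, 2)
)

# The container is a data frame; a single column pulled out with `$` is not
is.data.frame(train_toy)
is.data.frame(train_toy$priceClass)

# Hypothetical wrapper that takes the container and assumes the target column exists
class_proportions <- function(df, target = "priceClass") {
  prop.table(table(df[[target]]))
}

props <- class_proportions(train_toy)
```

Calling class_proportions(train_toy) gives the same result as table(train_toy$priceClass) %>% prop.table() would, without the caller naming the column each time.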
 
:D

Got it to work.
The factor built from train_predictions_4 had the levels 'Yes' & 'No', but train$priceClass has the levels 'High', 'Medium' and 'Low'.

Changed this:
confusionMatrix(as.factor(ifelse(train_predictions_4$High>0.2, 'Yes', 'No')), train$priceClass)

to this:
fc1 <- as.factor(ifelse(train_predictions_4$High>0.2,'High',ifelse(train_predictions_4$Medium>0.2, 'Medium', 'Low')))
fc2 <- as.factor(train$priceClass)

confusionMatrix(fc1, fc2)

@cguy Thanks!
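
One thing to keep in mind with the nested 0.2 thresholds: a row gets labelled 'High' whenever its High probability is above 0.2, even if another class is more likely. A possible alternative, not what the thread used, is to take the most probable class per row. The sketch below uses made-up probabilities shaped like the type="raw" output (post and actual are hypothetical) and base R only.

```r
# Hypothetical posterior probabilities, one column per class
post <- data.frame(
  High   = c(0.70, 0.10, 0.25, 0.30),
  Low    = c(0.10, 0.60, 0.25, 0.30),
  Medium = c(0.20, 0.30, 0.50, 0.40)
)
actual <- factor(c("High", "Low", "Medium", "Low"))

# Pick the column with the largest probability in each row
best <- max.col(as.matrix(post), ties.method = "first")
predicted <- factor(colnames(post)[best], levels = levels(actual))

# Cross-tabulate and take the share of rows on the diagonal as accuracy
cm <- table(Predicted = predicted, Actual = actual)
accuracy <- sum(diag(cm)) / sum(cm)
```

Because predicted is built with levels = levels(actual), the same two factors could also be passed to confusionMatrix without the levels error.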
 