Farms: Introduction to Data Science with R - Data Analysis Part 1
Part 1 of an in-depth, hands-on series of videos introducing the viewer to Data Science using R. The video series illustrates the complete Data Mining project lifecycle via Kaggle's Titanic 101 competition. All source code from the videos is available on GitHub at: https://github.com/EasyD/IntroToDataScience. NOTE - The data for the competition has changed since this video series was started. You can find the applicable .CSVs in the GitHub repo.
Comments
-
very simple and useful!
-
Hi Dave,
Great video.
I would like to ask: in a meta-analysis using R, how can I get Begg’s funnel plot and Egger’s linear
regression? -
The best tutorial ever. Thanks much.
-
very useful. Thanks.
-
ggplot(data.combined[1:891,],aes(x=title,fill=Survived))+
geom_histogram(binwidth = 0.5)+
facet_wrap("¬Pclass")+
ggtitle("Pclass")+
xlab("Title")+
ylab("Total Count")+
labs(fill="Survived")
How do I type the tilde symbol? Also, geom_bar is not working :-P -
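A minimal sketch for the question above, assuming the ggplot2 package is installed. The data frame toy is a made-up stand-in for data.combined[1:891,], not the real Titanic data. The faceting formula is written with the tilde character (~), which on a US keyboard layout is typed with Shift plus the backtick key, and geom_bar() is the usual layer for counting rows of a discrete x variable such as title.

```r
library(ggplot2)

# Made-up stand-in for data.combined[1:891, ]; values are for illustration only.
toy <- data.frame(
  title    = c("Mr.", "Mrs.", "Miss.", "Mr.", "Master.", "Mrs."),
  Pclass   = factor(c(1, 1, 2, 3, 3, 3)),
  Survived = factor(c(0, 1, 1, 0, 1, 0))
)

# geom_bar() counts rows per discrete x value, so no binwidth is needed.
# facet_wrap() takes a one-sided formula written with the tilde: ~Pclass.
p <- ggplot(toy, aes(x = title, fill = Survived)) +
  geom_bar() +
  facet_wrap(~Pclass) +
  ggtitle("Pclass") +
  xlab("Title") +
  ylab("Total Count") +
  labs(fill = "Survived")

print(p)
```

If the tilde is hard to reach on a given keyboard, copying it from any R help page and pasting it works just as well.
-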
This was amazing. Thank you so much for all the resources and a very stimulating class.
-
Thank you for posting this! You're a great teacher...the fact that you're calm and go through each step makes it easier for whoever is watching to feel more confident about actually learning R. Cheers!
-
Hello, thanks for this video, I have downloaded and used the data from github but when I tried to combine the test and train together using rbind, I get the following error:
Warning message:
In `[<-.factor`(`tmp`, ri, value = c(0L, 1L, 1L, 1L, 0L, 0L, 0L, :
invalid factor level, NA generated
My entire column of survived for train became NA -
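A possible workaround for the warning above, offered as a sketch rather than a confirmed diagnosis. The warning text suggests that integer values are being assigned into a factor column whose levels do not include them. Assuming that is the cause, keeping Survived as character in both data frames until after the rbind avoids it. The tiny train and test data frames below are made up for illustration.

```r
# Made-up miniature versions of train and test, for illustration only.
train <- data.frame(Survived = c(0L, 1L, 1L), Pclass = c(3L, 1L, 3L))
test  <- data.frame(Pclass = c(3L, 2L))

# Build the placeholder column as character, not factor.
# (Before R 4.0.0, data.frame() turned strings into factors by default.)
test.survived <- data.frame(
  Survived = rep("None", nrow(test)),
  test,
  stringsAsFactors = FALSE
)

# Make the train column character too, so both sides have the same type.
train$Survived <- as.character(train$Survived)

data.combined <- rbind(train, test.survived)

# Convert to a factor only once all values are present.
data.combined$Survived <- as.factor(data.combined$Survived)
print(table(data.combined$Survived))
```
-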
data.combined <- rbind(train, test.survived)
I am facing this error. Please help!
Error in match.names(clabs, names(xi)) :
names do not match previous names -
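One way to track down a "names do not match previous names" error is to compare the column names of the two data frames directly. The sketch below uses made-up miniature data frames and a hypothetical mistake (a lowercase survived column); the real cause in any given session may differ.

```r
# Made-up miniature data frames, for illustration only.
train <- data.frame(PassengerId = 1:2, Survived = c(0L, 1L), Pclass = c(3L, 1L))
test  <- data.frame(PassengerId = 3:4, Pclass = c(2L, 3L))

# Hypothetical mistake: the new column is spelled with a lowercase "s".
test.survived <- data.frame(survived = rep("None", nrow(test)), test)

# Columns that train has but test.survived lacks, and the reverse.
missing.in.test <- setdiff(names(train), names(test.survived))
extra.in.test   <- setdiff(names(test.survived), names(train))

print(missing.in.test)
print(extra.in.test)
```

If both results are empty, the column names agree and the problem lies elsewhere.
-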
great tutorial!
-
Thanks so much, that was a very good introduction.
-
the way this guy talks reminds me of Sam Harris
-
Great tutorial. Thanks David. It is not for beginners, but if you have a little prior knowledge it is fantastic.
-
Hi David, thank you for the helpful video. I got stuck in the video where I get the error message
data.combine <- rbind(train, test.survived)
Error in match.names(clabs, names(xi)) :
names do not match previous names -
At 18:32, you could write:
> tempTest <- test
> tempTest$Survived <- "NONE"
No need to call the nrow() and rep() functions.
Also, at 55:05 there is no need to use which() or even as.character() twice:
> dup.names <- data.combined$Name[duplicated(data.combined$Name)]
And to see whether the duplicate names belong to different people or not:
> data.combined[data.combined$Name %in% dup.names, ]
At 1:11:30, instead of all those if-else statements and the for loop, you could simply write:
> data.combined$titles <- "Other"
> data.combined[grep("Miss.", data.combined$Name, fixed = TRUE), "titles"] <- "Miss."
> data.combined[grep("Master.", data.combined$Name, fixed = TRUE), "titles"] <- "Master."
> data.combined[grep("Mrs.", data.combined$Name, fixed = TRUE), "titles"] <- "Mrs."
> data.combined[grep("Mr.", data.combined$Name, fixed = TRUE), "titles"] <- "Mr." -
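A note on grep-based title assignment like the one above, with a small sketch on made-up names in the dataset's "Surname, Title. Given names" format. By default grep() treats its pattern as a regular expression, where "." matches any character, so the pattern "Mr." also matches "Mrs". Passing fixed = TRUE makes grep() match the pattern literally.

```r
# Made-up example names, for illustration only.
names.toy <- c(
  "Smith, Mr. John",
  "Smith, Mrs. Jane",
  "Jones, Miss. Anna",
  "Jones, Master. Tom"
)

# As a regular expression, "Mr." matches "Mr." and also "Mrs".
regex.hits <- grep("Mr.", names.toy)

# With fixed = TRUE the dot is a literal dot, so only "Mr." matches.
fixed.hits <- grep("Mr.", names.toy, fixed = TRUE)

# Literal matching makes the order of the assignments irrelevant.
titles <- rep("Other", length(names.toy))
titles[grep("Miss.",   names.toy, fixed = TRUE)] <- "Miss."
titles[grep("Master.", names.toy, fixed = TRUE)] <- "Master."
titles[grep("Mrs.",    names.toy, fixed = TRUE)] <- "Mrs."
titles[grep("Mr.",     names.toy, fixed = TRUE)] <- "Mr."

print(regex.hits)
print(fixed.hits)
print(titles)
```

Without fixed = TRUE, the final "Mr." assignment would also overwrite every row already labeled "Mrs.".
-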
In the last histogram, how can I get the percentage or relative frequency instead of the total count on the y axis?
-
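On the relative-frequency question, one base R option is prop.table(), which turns a table of counts into proportions. The sketch below uses a made-up data frame and assumes the goal is the share of each Survived value within each title; other groupings work the same way with a different margin.

```r
# Made-up data, for illustration only.
toy <- data.frame(
  title    = c("Mr.", "Mr.", "Mrs.", "Miss.", "Miss."),
  Survived = c("0", "0", "1", "1", "0")
)

# Cross-tabulate counts: one row per title, one column per Survived value.
counts <- table(toy$title, toy$Survived)

# margin = 1 rescales each row (each title) so that it sums to 1.
rel.freq <- prop.table(counts, margin = 1)

# Show the proportions as percentages.
print(round(100 * rel.freq, 1))
```

In ggplot2, geom_bar(position = "fill") draws a comparable picture, with each bar scaled to a total height of 1.
-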
Leave a like if you're watching this in 2016 -
The video is great!!! I feel I love data science even more. Do you have the data set so I can practice with it? Could you share it with me?
-
For those having trouble combining the train and test data frames:
It is probably because you have to put the columns of both sets in the same order, and the added variable Survived should start with a capital letter.
I'm a newbie but figured it out by looking at the datasets.
So the creation of the test.survived data frame should be:
test.survived <- data.frame(Survived = rep("None", nrow(test)), test[,])
Next, the added variable Survived in test.survived has to be the 2nd column. You can reorder with this command:
test.survived <- test.survived[,c(2,1,3,4,5,6,7,8,9,10,11,12)] -
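A sketch of a less brittle variant of the reordering step above, selecting columns by name instead of by position. The miniature data frames are made up for illustration. As far as the base R documentation for rbind describes, its data frame method matches columns by name, so the spelling of Survived (including the capital letter) is likely the decisive part; reordering by name is still a tidy way to keep both data frames aligned.

```r
# Made-up miniature data frames, for illustration only.
train <- data.frame(
  PassengerId = 1:2,
  Survived    = c("0", "1"),
  Pclass      = c(3L, 1L),
  stringsAsFactors = FALSE
)
test <- data.frame(PassengerId = 3:4, Pclass = c(2L, 3L))

# Add the placeholder column with the same capitalization as in train.
test.survived <- data.frame(
  Survived = rep("None", nrow(test)),
  test,
  stringsAsFactors = FALSE
)

# Reorder by name, so the code keeps working if the column count changes.
test.survived <- test.survived[, names(train)]

data.combined <- rbind(train, test.survived)
print(data.combined)
```
-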
Great stuff indeed!