Adam On Analytics

~ Ramblings on analytics, business, statistics and anything else

R Function for Stratified Sampling

Tuesday, 10 Apr 2012

Posted by adamsanalytics in Uncategorized

Tags

R, stratified sampling

I was trying to draw 1000 random samples from 30 different groups within approximately 30k rows of data, and I came across this function:

http://news.mrdwab.com/2011/05/20/stratified-random-sampling-in-r-from-a-data-frame/

However, when I ran this function on my data, R reported that it had run out of memory. So I had to write my own stratified sampling function that would work for large data sets with many groups.

After some trial and error, the key turned out to be sorting the data by the desired groups and then computing the row count for each group. The procedure is extremely fast, taking only 0.18 seconds on a large data set. I welcome any feedback on how to improve it!
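To make the sort-then-count idea concrete, here is a minimal sketch on a made-up data frame. The column names `grp` and `val` and the helper variables `offsets` and `picked` are assumptions for illustration only. Once the rows are sorted, each group sits in one contiguous block, so a position drawn inside a group only needs to be shifted by the number of rows that come before that block.

```r
# Toy data: three groups of unequal size (made up for illustration)
df <- data.frame(
  grp = c("b", "a", "c", "a", "b", "a", "c", "b", "a"),
  val = 1:9,
  stringsAsFactors = FALSE
)

# Sort so that each group occupies one contiguous block of rows
df <- df[order(df$grp), ]

# Rows per group, in the same order as the sorted data
tab <- table(df$grp)
counts <- setNames(as.integer(tab), names(tab))

# Number of rows that precede each group's block
offsets <- cumsum(c(0, counts))[seq_along(counts)]

# Draw `size` positions inside each block, then shift by the block's offset
set.seed(1)
size <- 2
picked <- unlist(lapply(seq_along(counts), function(i) {
  unname(offsets[i]) + sample.int(counts[i], size)
}))

sampled <- df[picked, ]
print(table(sampled$grp))
```

Because only integer row numbers are generated, the data frame itself is indexed just once at the end, which is presumably why this approach stays light on memory.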

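stratified_sampling <- function(df, id, size) {
  # df is the data to sample from
  # id is the column to use for the groups to sample
  # size is the count you want to sample from each group

  # Order the data based on the groups, so each group is one contiguous block
  df <- df[order(df[, id], decreasing = FALSE), ]

  # Count the rows in each group (factor() drops unused levels).
  # The leading 0 lets the loop compute each group's offset as a running sum.
  group.counts <- c(0, table(factor(df[, id])))
  n.groups <- length(group.counts) - 1

  # matrix() keeps two dimensions even when there is only one group
  rows <- matrix(0, nrow = size, ncol = n.groups)

  # Generate matrix of sample rows for each group
  for (i in seq_len(n.groups)) {
    # Number of rows that come before group i
    offset <- sum(group.counts[1:i])
    # Draw from 1..n so every row of the group, including the first, can be picked.
    # This stops with an error if size exceeds the group's row count.
    samp <- sample.int(group.counts[i + 1], size, replace = FALSE)

    rows[, i] <- offset + samp
  }

  sample.rows <- as.vector(rows)
  df[sample.rows, ]
}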
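As a way to sanity-check results, here is an alternative sketch that uses base R's `split()` on row numbers rather than sorting. This is not the approach described above. The name `stratified_sampling_split` and the toy data are hypothetical, and I have not timed it against the function above.

```r
# Hypothetical helper (not from the original post): stratified sampling via split()
stratified_sampling_split <- function(df, id, size) {
  # Split the row numbers, not the data, so only integer vectors are copied
  idx_by_group <- split(seq_len(nrow(df)), df[[id]], drop = TRUE)

  picked <- lapply(idx_by_group, function(idx) {
    # Indexing with sample.int() avoids the sample() surprise when a group
    # consists of a single row number
    idx[sample.int(length(idx), size)]
  })

  df[unlist(picked, use.names = FALSE), ]
}

# Made-up data: 30 groups spread over 30,000 rows
set.seed(42)
toy <- data.frame(
  grp = sample(sprintf("g%02d", 1:30), 30000, replace = TRUE),
  x = rnorm(30000)
)

out <- stratified_sampling_split(toy, "grp", 100)
print(table(out$grp))
```

Every group should appear exactly `size` times in the result, and the same check can be run on the output of `stratified_sampling` to compare the two.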
