Data subsets with dynamic conditions in R

I have a dataset of 2500 rows, all bank loans. Every loan has an outstanding amount and a type of collateral (real estate, machines, etc.).

I need to draw a random selection from this dataset where, for example, the total amount outstanding is 2.5m (within a 5% margin) and at most 25% of the selected loans have the same asset class.

I found the optim function, but it asks for an objective function and looks to be built for problems like optimizing a portfolio of stocks, which is much more complicated than what I need. Is there an easier way to achieve this?

I've created a sample dataset that better illustrates my question:

dataset <- data.frame(balance=c(25000,50000,35000,40000,65000,10000,5000,2000,2500,5000)
                      ,Collateral=c("Real estate","Aeroplanes","Machine tools","Auto Vehicles","Real estate",
                                    "Machine tools","Office equipment","Machine tools","Real estate","Auto Vehicles"))

      

For example, suppose I want to get 5 loans from this dataset whose total remaining balance is 200,000 (with a 10 percent margin), with no more than 40% of the loans having the same type of collateral (so the maximum is 2 out of 5 in this example).

Please let me know if more information is needed. Thanks a lot, Tim



1 answer


Here is a function I wrote for this. It repeatedly draws a random sample until one satisfies both conditions:

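pick_records <- function(df, size, bal, collat, max.it) {
  # Draw random samples until one meets both conditions or max.it is reached
  for (j in seq_len(max.it)) {
    s_index <- sample(nrow(df), size)
    output <- df[s_index, ]
    total <- sum(output$balance)
    # Share of each collateral type in the sample; as.character() makes this
    # work whether Collateral is stored as a factor or as character strings
    shares <- table(as.character(output$Collateral)) / size
    # Total balance must lie within 10 percent of bal,
    # and no collateral type may exceed the share collat
    if (total > bal * 0.9 && total < bal * 1.1 && all(shares <= collat)) {
      return(output)
    }
  }
  # No valid sample was found within max.it attempts
  message("No solution found")
  invisible(NULL)
}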

> a <- pick_records(dataset,5,200000,0.4,20)
> a
  balance       Collateral
3   35000    Machine tools
7    5000 Office equipment
4   40000    Auto Vehicles
5   65000      Real estate
2   50000       Aeroplanes
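To confirm that a returned sample really meets both conditions, the checks can be pulled out into a small helper. This is only a sketch: check_sample and its margin argument are hypothetical names that are not part of the function above, and the sample dataset from the question is repeated so the snippet runs on its own.

```r
# Hypothetical helper (not part of the original answer): tests both conditions
check_sample <- function(sample_df, bal, collat, margin = 0.1) {
  total <- sum(sample_df$balance)
  # as.character() works whether Collateral is a factor or a character column
  shares <- table(as.character(sample_df$Collateral)) / nrow(sample_df)
  balance_ok <- total > bal * (1 - margin) && total < bal * (1 + margin)
  collateral_ok <- all(shares <= collat)
  balance_ok && collateral_ok
}

# Sample dataset from the question
dataset <- data.frame(
  balance = c(25000, 50000, 35000, 40000, 65000, 10000, 5000, 2000, 2500, 5000),
  Collateral = c("Real estate", "Aeroplanes", "Machine tools", "Auto Vehicles", "Real estate",
                 "Machine tools", "Office equipment", "Machine tools", "Real estate", "Auto Vehicles")
)

# Rows from the example output: total balance 195000, five distinct collateral types
a <- dataset[c(3, 7, 4, 5, 2), ]
check_sample(a, bal = 200000, collat = 0.4)  # TRUE
```

Because the sampling is random, a different run will usually return different rows; calling set.seed() beforehand makes a run reproducible.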

      



In this function, df is your data frame, size is the number of records required, bal is the target total balance (totals within 10 percent of it are accepted), collat is the maximum share of the sample allowed for any one collateral type, and max.it is the maximum number of attempts before the function gives up and reports that no solution was found. You can change them as you wish.
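Whether a given number of iterations is enough depends on how many subsets actually satisfy the conditions. For a dataset as small as the sample one, this can be estimated by enumerating every possible subset. The following is a sketch under that assumption: is_valid is a hypothetical helper, and the 10 percent margin and the limit of 20 attempts are taken from the example call.

```r
# Sample dataset from the question
dataset <- data.frame(
  balance = c(25000, 50000, 35000, 40000, 65000, 10000, 5000, 2000, 2500, 5000),
  Collateral = c("Real estate", "Aeroplanes", "Machine tools", "Auto Vehicles", "Real estate",
                 "Machine tools", "Office equipment", "Machine tools", "Real estate", "Auto Vehicles")
)
size <- 5
bal <- 200000
collat <- 0.4

# Hypothetical helper: TRUE if the rows in idx satisfy both conditions
is_valid <- function(idx) {
  total <- sum(dataset$balance[idx])
  shares <- table(as.character(dataset$Collateral[idx])) / length(idx)
  total > bal * 0.9 && total < bal * 1.1 && all(shares <= collat)
}

# Every possible subset of 5 rows, one column per subset
combos <- combn(nrow(dataset), size)
valid <- apply(combos, 2, is_valid)

n_valid <- sum(valid)          # number of subsets that satisfy both conditions
p <- mean(valid)               # chance that a single random draw succeeds
p_within_20 <- 1 - (1 - p)^20  # chance of at least one success in 20 draws
print(c(n_valid = n_valid, p = p, p_within_20 = p_within_20))
```

Enumeration like this is only practical for small inputs. With 2500 loans the number of subsets is far too large to list, so there the random search with a generous iteration limit is the more realistic option.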

Let me know if any part of it is unclear.
