Creating data frames
Feature engineering


Well done! You can use multiple columns in conditions. Consider this example: You want to create a column named clean_ownership in the houses data frame that indicates whether a particular property's paperwork is in order.

Here are the conditions we'll use for creating this column:

  • The column is populated with "Yes" if usage_permit, building_permit, and ownership_licence are all marked with "Y".
  • Otherwise, clean_ownership should be populated with "No".

First, we'll create the new clean_ownership column and populate it with an initial value of "No" for all rows:

houses$clean_ownership <- "No"

Afterwards, we'll overwrite "No" with "Yes" for rows that satisfy the clean ownership conditions:

houses[houses$usage_permit == "Y" &
       houses$building_permit == "Y" &
       houses$ownership_licence == "Y", ]$clean_ownership <- "Yes"
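
To see the two steps working together, here is a minimal sketch on a small, made-up version of the houses data frame. The four sample rows are an assumption for illustration only; the real data frame has its own rows and additional columns.

```r
# Hypothetical sample data: only the three paperwork columns, four rows.
houses <- data.frame(
  usage_permit      = c("Y", "Y", "N", "Y"),
  building_permit   = c("Y", "N", "Y", "Y"),
  ownership_licence = c("Y", "Y", "Y", "Y"),
  stringsAsFactors  = FALSE
)

# Step 1: give every row the default value "No".
houses$clean_ownership <- "No"

# Step 2: overwrite with "Yes" only where all three columns equal "Y".
houses[houses$usage_permit == "Y" &
       houses$building_permit == "Y" &
       houses$ownership_licence == "Y", ]$clean_ownership <- "Yes"

print(houses$clean_ownership)
# [1] "Yes" "No"  "No"  "Yes"
```

Only the first and last rows have "Y" in all three columns, so only those rows are switched to "Yes". Note that this form of assignment expects at least one row to match the condition.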


A client is searching for a house that has a decorated yard. We'll define a "decorated yard" as one that has a pool, a garden, or both. Create a new column named garden_decorated in the houses data frame. The column should be set to "Yes" if at least one of the columns has_pool or has_garden is populated with "Y", and to "No" otherwise.

Stuck? Here's a hint!

First, populate the garden_decorated column with "No" for all rows. Then overwrite it with "Yes" for the rows that satisfy the given condition:

houses[houses$has_pool == "Y" | houses$has_garden == "Y", ]$garden_decorated <- "Yes"
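
As a rough check of the "or" condition, the sketch below runs both steps on a made-up four-row data frame (the sample values are assumptions, not the course data). It also shows base R's ifelse() as a possible one-step alternative; that variant is offered for comparison only and is not the approach this lesson asks for.

```r
# Hypothetical sample data covering every pool/garden combination.
houses <- data.frame(
  has_pool         = c("Y", "N", "N", "Y"),
  has_garden       = c("N", "Y", "N", "Y"),
  stringsAsFactors = FALSE
)

# Step 1: default value for all rows.
houses$garden_decorated <- "No"

# Step 2: "Yes" where at least one of the two columns is "Y".
houses[houses$has_pool == "Y" | houses$has_garden == "Y", ]$garden_decorated <- "Yes"

# Alternative sketch: ifelse() picks "Yes" or "No" row by row in one step.
one_step <- ifelse(houses$has_pool == "Y" | houses$has_garden == "Y", "Yes", "No")

print(houses$garden_decorated)
# [1] "Yes" "Yes" "No"  "Yes"
```

The | operator returns TRUE when either side is TRUE, so only the third row, which has neither a pool nor a garden, keeps the default "No".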