Saturday, March 28, 2015

R - Multiple subsetting a large xyz file with several polygons


Lets say you have csv file of size more than 4 GB and you want to extract xyz points with a certain boundary. If you need to extract xyz points for multiple boundaries then it will require some steps in ArcGIS program. Here i am demonstrating an R code to extract xyz points for multiple boundaries at a time.

Here is an example,

I have combined 5 LiDAR files into 1 "input.csv" file and I would like to extract xyz points for 3 multiple boundaries for each of their individual output.csv file.



I have created a csv file having Xmin, Ymin, Xmax, Ymax for each of the boundary corner points.



rm(list=ls())

library(sp)

workDir <- "J:/Thumb drive/Technical/GIS Exercise/Chapters/Chapter 14 - R by example in Statistics & Geospatial Analysis/R/reneedyourhelpmann/Incomplete/Multiple Subsetting/"
setwd(workDir)

#read in points from csv
input <- read.csv("input.csv", header=F)
names(input) <- c("x","y","z")

# create spatial points object
pts <- SpatialPoints(input[,1:2])

# read in bounding box coords
boxes <- read.table("XY(min,max).txt", sep=",")
names(boxes) <- c("id", "xmin", "ymin","xmax", "ymax")

# expand to coordinates for bounding box rectangles
poly.coords <- cbind(boxes$xmin, boxes$ymin,
                 boxes$xmin, boxes$ymax,
                 boxes$xmax, boxes$ymax,
                 boxes$xmax, boxes$ymin,
                 boxes$xmin, boxes$ymin)
ID <- boxes$id

# create polygons for bounding box rectangles
polys <- SpatialPolygons(mapply(function(poly, id) {
  xy <- matrix(poly, ncol=2, byrow=TRUE)
  Polygons(list(Polygon(xy)), ID=id)
}, split(poly.coords, row(poly.coords)), ID))

polys.df <- SpatialPolygonsDataFrame(polys, data.frame(id=ID, row.names=ID))

# overlay points on polygons, get polygon ID and add to input csv
input <- cbind(input, over(pts, polys.df))

# write out csv only for points that fall within bounding box polygons
output <- input[!is.na(input$id),]
head(output)
write.csv(output, "output.csv", row.names=F)

>              x       y       z id
> 102423 6521081 2056150 2544.94 B1
> 102424 6521054 2056184 2544.61 B1
> 102425 6521030 2056193 2544.75 B1
> 102427 6521073 2056161 2544.91 B1
> 102429 6521035 2056142 2545.27 B1
> 102432 6521046 2056171 2544.81 B1

# write out three separate files, no quotes around the ID column
write.csv(output[output$id == "B1",], "outputB4.csv", row.names=F, quote=F)
write.csv(output[output$id == "B2",], "outputB5.csv", row.names=F, quote=F)
write.csv(output[output$id == "B3",], "outputB6.csv", row.names=F, quote=F)


# write out three separate files, no quotes around the ID column
for (i in unique(output$id)){
  write.csv(output[output$id == i,], paste0("output",i,".csv"), row.names=F, quote=F)
}

This code will generate an outputB1.CSV, outputB2.CSV, outputB3.CSV having xyz file for each of the boundaries.

No comments:

Post a Comment