Imagine this: a city is documented by millions of points acquired by an airborne laser scanner. Now, to make sense of all these points, we need to go through them one by one and classify each: does it belong to a tree? A building? A road? Here, we used a deep neural network to do the work for us!
Airborne LiDAR provides a vivid 3D digital representation of a scene by collecting massive, dense point clouds in a short time. The inherent information, however, includes only location (X, Y, Z) and echo attributes – and this is how computers see it: millions of bare data points. They cannot differentiate between points on a roof and points on the ground.
In order to extract meaningful information about the scanned city, we must give each point a meaning. That is to say, we need to perform a classification, a process also known as semantic segmentation.

Nowadays, most point cloud classification tasks are carried out using machine learning. In recent years, both academia and industry have come to prefer deep learning over classical “non-deep” learning because of its unprecedented classification results.
The deep in deep learning
Deep learning is a specialized machine learning method derived from the neural network approach. The “deep” refers to the depth, that is, the number of layers in the neural network. A neural network that consists of more than three hidden layers can be considered a “deep” network.
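For a concrete picture, here is a minimal sketch of such a network, assuming PyTorch; the input size, layer widths, and six output classes are illustrative choices:

```python
import torch
import torch.nn as nn

# Four hidden layers: more than the three-layer threshold mentioned above,
# so this small classifier already counts as "deep".
deep_net = nn.Sequential(
    nn.Linear(8, 64), nn.ReLU(),    # hidden layer 1 (8 input features, illustrative)
    nn.Linear(64, 64), nn.ReLU(),   # hidden layer 2
    nn.Linear(64, 64), nn.ReLU(),   # hidden layer 3
    nn.Linear(64, 64), nn.ReLU(),   # hidden layer 4
    nn.Linear(64, 6),               # output scores for 6 classes (illustrative)
)

scores = deep_net(torch.rand(32, 8))   # 32 samples in -> (32, 6) scores out
```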
The main difference between deep learning and classical machine learning lies in how each method learns. In the classical approach, a set of features that describes the phenomenon is designed and computed. Then, we teach the machine which feature values characterize which class. The features we use and their quality therefore affect the method’s performance immensely.
For airborne laser scans, the commonly used features are computed from the points within a local neighborhood around each point. The size of this local neighborhood needs to be determined carefully. A sketch of such a feature computation is shown below.
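To make this concrete, here is a minimal sketch of hand-crafted neighborhood features, assuming Python with NumPy and SciPy; the 1.0 m search radius and the eigenvalue-based feature set are illustrative choices, not the exact configuration of our experiments:

```python
import numpy as np
from scipy.spatial import cKDTree

def covariance_features(points, radius=1.0):
    """Eigenvalue-based shape features (linearity, planarity, sphericity)
    computed from each point's local neighborhood."""
    tree = cKDTree(points)                    # spatial index for neighbor search
    features = np.zeros((len(points), 3))
    for i, p in enumerate(points):
        idx = tree.query_ball_point(p, r=radius)
        if len(idx) < 3:                      # too few neighbors for a covariance
            continue
        cov = np.cov(points[idx].T)           # 3x3 covariance of the neighborhood
        lam = np.sort(np.linalg.eigvalsh(cov))[::-1]   # eigenvalues, descending
        lam = lam / (lam.sum() + 1e-12)
        if lam[0] <= 0:                       # degenerate neighborhood
            continue
        features[i] = [(lam[0] - lam[1]) / lam[0],     # linearity
                       (lam[1] - lam[2]) / lam[0],     # planarity
                       lam[2] / lam[0]]                # sphericity
    return features
```

Feature vectors like these, together with known labels, would then train a classical model such as a Random Forest.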
Deep learning, on the other hand, does not need such features. We feed the machine data that was classified in advance, and it does the rest. It is capable of ingesting the data in its raw form (e.g., pixel values of images and X, Y, Z coordinates of point clouds) and of deciding by itself which features describe each class.
So where is the challenge?
Unlike images, which are structured as grids and are easy to index, in point clouds the location of a point is expressed only through its coordinates. That is why even extracting each point’s neighbors is a task in itself, as the sketch below illustrates. This disorderliness of 3D point clouds makes the use of deep learning very challenging.
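The gap can be shown in a few lines, assuming Python with NumPy and SciPy and synthetic data:

```python
import numpy as np
from scipy.spatial import cKDTree

# Images: neighbors are implicit in the grid indices.
image = np.random.rand(512, 512)
window = image[99:102, 199:202]              # 3x3 neighborhood of pixel (100, 200)

# Point clouds: the points come in no particular order, so a spatial
# index must first be built before any neighbor can be found.
points = np.random.rand(1_000_000, 3) * 100.0       # synthetic X, Y, Z coordinates
tree = cKDTree(points)
dists, neighbor_idx = tree.query(points[0], k=16)   # 16 nearest neighbors
```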
Deep learning of point clouds
There are multiple existing networks for analyzing point clouds; we will focus on three:
PointNet++ builds on PointNet, which was the first network to apply deep learning to 3D point clouds directly. It uses the well-known multilayer perceptron (MLP) to learn the features of each point. However, a multilayer perceptron does not take the spatial arrangement of the data into account.
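As an illustration, here is a minimal per-point shared MLP in the spirit of PointNet, assuming PyTorch. Unlike the generic network sketched earlier, the same weights are applied to every point independently, so the point order does not matter. This is not the full PointNet++ hierarchy, and the layer widths and six classes are again illustrative:

```python
import torch
import torch.nn as nn

class SharedMLP(nn.Module):
    def __init__(self, num_classes=6):
        super().__init__()
        # 1x1 convolutions act as a per-point MLP on (batch, channels, points).
        self.net = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.ReLU(),
            nn.Conv1d(128, num_classes, 1),
        )

    def forward(self, xyz):          # xyz: (batch, 3, num_points)
        return self.net(xyz)         # per-point class scores

scores = SharedMLP()(torch.rand(2, 3, 1024))    # -> (2, 6, 1024)
```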
SparseCNN is designed exactly for that spatial awareness. Based on the powerful convolutional neural network (CNN), it exploits the spatial arrangement of the data by examining each element’s neighbors. For this, the point cloud has to be arranged in a structured manner: in SparseCNN, the point cloud is converted into a 3D grid of so-called voxels, and the convolution is carried out in three dimensions. The problem is that this rearrangement of the data causes a loss of resolution.
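The voxelization step can be sketched as follows, assuming Python with NumPy; the 0.5 m voxel size is an illustrative choice, not the setting used in the study:

```python
import numpy as np

def voxelize(points, voxel_size=0.5):
    """Map each point to an integer voxel coordinate. All points falling
    into the same voxel collapse into one cell, which is exactly where
    the resolution loss happens."""
    voxel_coords = np.floor(points / voxel_size).astype(np.int64)
    occupied, point_to_voxel = np.unique(voxel_coords, axis=0, return_inverse=True)
    return occupied, point_to_voxel

points = np.random.rand(100_000, 3) * 50.0   # synthetic X, Y, Z coordinates
occupied, point_to_voxel = voxelize(points)
print(len(points), "points ->", len(occupied), "occupied voxels")
```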
The KPConv method proposes to use the point cloud directly in order to avoid this resolution loss. To do so, it defines a group of points, the kernel points, which carry the weights of each layer. Instead of the 2D matrix-shaped kernel used in image convolution, KPConv uses a 3D sphere-shaped kernel in which the kernel points are spatially distributed around the center point. The problem with this approach is that the 3D kernelization requires very high computational resources.
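The core idea, weighting each neighbor by its proximity to the kernel points, can be sketched as follows; the random kernel placement and the influence radius sigma are illustrative stand-ins for the carefully arranged kernels of the actual method:

```python
import numpy as np

num_kernel_points, sigma = 15, 0.3
# Placeholder kernel points spread on a unit sphere around the center.
kernel_points = np.random.randn(num_kernel_points, 3)
kernel_points /= np.linalg.norm(kernel_points, axis=1, keepdims=True)

def kernel_influence(neighbors, center):
    """Linear influence of each kernel point on each neighbor,
    decaying with their spatial distance."""
    offsets = neighbors - center                                 # (k, 3)
    dists = np.linalg.norm(offsets[:, None, :] - kernel_points[None], axis=2)
    return np.maximum(0.0, 1.0 - dists / sigma)                  # (k, num_kernel_points)
```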
The classification of Vienna
We compared which of these methods classifies the point cloud of the city of Vienna (Austria) best. We used the classical machine learning model “Random Forest” as the baseline.
An interactive point cloud of the classified city of Vienna. Navigate your way to see the classification result by the Random Forest approach.
The three deep learning networks deliver smoother classification results than the classical approach. SparseCNN achieves the most visually appealing result.
An interactive point cloud of the classified city of Vienna. Navigate your way to see the classification result by SparseCNN.
PointNet++ tends to ignore small objects. For example, the meadow on the left goes undetected.
An interactive point cloud of the classified city of Vienna. Navigate your way to see the classification result by PointNet++.
KPConv fails to detect large objects like the roofs at the bottom. This, however, is due to hardware limitations, which allowed only the use of small kernels. It emphasizes the disadvantage of the method: it requires extremely high-end machines.
An interactive point cloud of the classified city of Vienna. Navigate your way to see the classification result by KPConv.
So what’s next?
Both classical machine learning and deep learning are supervised methods. That means they require sufficient training samples to learn from. Therefore, more and more researchers are focusing on efficient sample generation for training.
A popular method is “human in the loop”. In this scheme, the data is first labeled by a pre-trained deep learning model and then handed over to human operators, who correct the initial labels; this is much faster than labeling from scratch. However, we are still looking for smarter active learning methods that can generate the labels automatically. A sketch of the scheme follows below.
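In this sketch, `pretrained_model` and `operator_review` are hypothetical stand-ins for a trained network and a manual editing session:

```python
def human_in_the_loop(pretrained_model, unlabeled_tiles, operator_review):
    training_set = []
    for tile in unlabeled_tiles:
        predicted = pretrained_model.predict(tile)    # automatic pre-labels
        corrected = operator_review(tile, predicted)  # human fixes errors only
        training_set.append((tile, corrected))
    return training_set                               # ready for (re)training
```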
The role of point cloud classification
Point cloud classification is a basic step for many applications. For instance, classified ground points are used to generate high-resolution digital elevation models (DEMs), building points serve as input for 3D modeling, and vegetation points enable biomass computation and mapping. These, of course, are just a few examples, as classification essentially tells us what types of objects exist in our data. A sketch of the DEM use case follows below.
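The sketch assumes Python with NumPy, a ground class code of 2 (as in the ASPRS LAS convention), and an illustrative 1 m cell size:

```python
import numpy as np

def ground_points_to_dem(points, classes, cell_size=1.0, ground_class=2):
    """Grid the lowest ground return per cell into a simple DEM."""
    ground = points[classes == ground_class]         # keep only ground points
    cols = np.floor(ground[:, 0] / cell_size).astype(int)
    rows = np.floor(ground[:, 1] / cell_size).astype(int)
    cols -= cols.min(); rows -= rows.min()
    dem = np.full((rows.max() + 1, cols.max() + 1), np.nan)
    for r, c, z in zip(rows, cols, ground[:, 2]):
        if np.isnan(dem[r, c]) or z < dem[r, c]:     # keep the lowest return
            dem[r, c] = z
    return dem
```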
The SparseCNN is online
The pre-trained model for ALS point cloud classification is ready to use and available via Zenodo.
Technical Details
Acquisition by an airborne laser scanner
- Resolution after strip adjustment: >15 points/m² for 97% of the area.
Deep learning experiments were performed on a machine with:
- AMD Ryzen Threadripper 1900X (3.5 GHz) processor
- 512 GB RAM
- Nvidia RTX 2080 Ti with 11 GB RAM
Participants in Research
Further Reading
- Tiles of airborne laser scanning point clouds of Vienna, Austria (2016)
- Distinctive 2D and 3D features for automated large-scale scene analysis in urban areas
This post is based on the scientific paper(s)
It also refers to: