In this chapter, results of the lidar sensor modeling workgroup within ENABLE-S3 are presented. The main objective is to describe the commonly agreed general functional blocks and interfaces of lidar sensor systems for object detection. With the interfaces described in the following at hand, requirements for the generation of synthetic lidar data at specific interfaces can be formulated, and modeling as well as verification and validation of the simulation can be performed.

1 Motivation for Interface Definitions by Functional Decomposition of Lidar Sensor Systems

Lidar sensors are seen as one of the key components for automated driving (AD) as well as for highly automated driving functions (HAF), such as a highway pilot. They are active perception sensors that perceive a vehicle’s environment by emitting light and receiving its reflections. The resulting information is currently the subject of research for applications in obstacle detection and classification, simultaneous localization and mapping (SLAM), and also ego-motion estimation.

In this chapter, results accomplished by the lidar sensor modeling workgroup within ENABLE-S3 are presented. The partners forming the workgroup all individually work on lidar sensor models of different extent and complexity for different types of lidar sensors and output data. Therefore, the first action of the workgroup was to find a common understanding for describing lidar sensors.

The following sections present the definitions of the common interfaces and the functional blocks in between that were elaborated and agreed upon. Furthermore, the interfaces, which were jointly identified by functional decomposition of lidar sensor systems for object recognition, are listed and described. As the interface specification is derived from inspected, available hardware components and agreed upon within the ENABLE-S3 project, the authors propose it as an interface specification of lidar sensor systems for application in virtual and real verification and validation. Standardization of sensor model interfaces is in fact already ongoing, e.g. with the Open Simulation Interface (OSI) [1] originating from the German project PEGASUS [2]. An exchange with this related work has been established to ensure consistency.

Additionally, the definitions serve as a basis for the formulation of requirements, for implementation, and for the comparison, benchmarking and validation of sensor models that output data through one of the interfaces. The authors therefore propose this sequence of interfaces, obtained by performing functional decomposition exemplarily on the processing chain for object perception with lidar sensors. The overall goal of the partners forming the workgroup for lidar sensor modeling within ENABLE-S3 is, besides implementing different models, to develop measures for rating model quality and for benchmarking different modeling approaches. Based on the derived, commonly agreed interfaces, metric definition and model validation against real data, as well as benchmarking of different models, can now be performed.

The following section starts with a general introduction to lidar sensors and their physical measurement and signal processing principle. Then, an overview of the agreed interfaces is given. After this overview, the interfaces are described in more detail. Finally, after a conclusion, possible next steps are proposed.

2 Lidar Sensor Principle

Light detection and ranging (lidar) is an optical measuring technique for the localization and range measurement of surrounding objects. The active sensor emits light---usually from lasers---into the environment at a specific wavelength and measures the resulting echo. The time from transmission of the light (laser) impulse to the time of reception of the reflections is proportional to the radial range between the sensor and the detected object [3]. This principle is called “time of flight” measurement and the radial range r can be calculated by \( r=\frac{1}{2}c_0\tau_{\mathrm{tof}} \), where \(c_0\) is the velocity of light and \(\tau_{\mathrm{tof}}\) is the time of flight of the laser beam.
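As a brief illustration of this relation, the following minimal sketch computes the radial range from a measured time of flight; the input value is purely hypothetical:

```python
# Minimal sketch of the time-of-flight range calculation; the input value is
# purely illustrative.
C_0 = 299_792_458.0  # speed of light in vacuum [m/s]

def radial_range(tau_tof: float) -> float:
    """Radial range r = 1/2 * c_0 * tau_tof, with tau_tof in seconds."""
    return 0.5 * C_0 * tau_tof

# An echo received 1 microsecond after transmission corresponds to roughly 150 m.
print(radial_range(1e-6))  # ~149.9 m
```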

In principle, the operation of a lidar sensor is similar to other active environment sensors like pulse radar, but it uses infrared light instead of e.g. microwaves. This results in an increased angular resolution, but also a higher atmospheric attenuation, due to its shorter wavelength. The attenuation, the properties of the reflective surface, and the power and divergence of the emitted beam determine the ratio of received echo power to transmitted power. Depending on the sensitivity of the lidar, this results in a maximum range up to which an echo can be detected.

Furthermore, the view angle occupied by a target of a given size decreases as its distance from the lidar increases. Therefore, objects are less likely to be detected at longer distances. Besides, lidar sensors are sensitive to particles present between the sensor and the hit obstacle, such as rain, spray or snowfall. Still, automotive lidar sensor systems should be able to recognize such conditions and suppress their effect. Additionally, the emitted laser beam does not stay infinitesimally thin in reality, as illustrated in Fig. 1. For example, in the case of Ibeo 2010 sensors, the divergence is 0.8° in the vertical direction and 0.08° in the horizontal direction [5].

Fig. 1 Beam divergence and echo pulse width, based on [4]

Beam divergence causes the beam’s energy to be spread over a wider area at greater distances. This spread, together with the previously described generally smaller angular coverage of objects and the attenuation by the atmosphere, leads to a lower detection probability for objects at greater distances. Furthermore, from a modeling perspective, the number of different materials and the complexity of the shapes hit by a single beam rise at longer distances, which places higher requirements on the sensor model and the modeled environment if beam divergence is to be included.

Lidar uses pulse modulation, where the source of the light signal is a semiconductor diode that emits photons when subjected to an electrical current. The sensors can be grouped into two families depending on the way they spread this light over their field of view (FOV). At the moment, most existing lidars scan the environment by moving complete parts of the sensor system (e.g. Velodyne HDL-32E) or a mirror inside them (e.g. Valeo SCALA, Ibeo LUX, etc.) using a small electric motor [6]. Moving parts typically increase a sensor’s size and cost, while reducing its reliability and robustness for mobile applications.

The other group is formed by 3D-flash-lidar or solid-state-lidar sensors, which use non-mechanical scanning elements, such as optical phased arrays (OPAs). They allow for precise beamforming and steering of the light in order to scan the FOV of the sensor, but also for pointing at or tracking specific targets. OPAs can use liquid crystals [7], MEMS technology [8] or silicon photonic waveguides [9]. This results in a smaller, cheaper and more rugged package. From a modeling perspective, the chosen technical implementation does have an influence on effects like motion blur and rolling shutter, as described e.g. in [10].

An example of a signal received by a single photodiode, as it would be available after analog-to-digital conversion, is illustrated in Fig. 2. The first parameter to be considered here is the sampling rate of the signal converter, as it can lead to different signal shapes depending on the ratio of the dominant frequencies in the received signal to the sampling rate. Alternatively, the echoes can also be collected with just a comparator and a counter; with these, sampling is not needed, but intensity measurement is not possible. The counter or the sampling rate, depending on the method used to measure the echo, influences the radial resolution of the sensor, since resolution in time translates to resolution in range with the time-of-flight principle. In the obtained signal in the time domain, peaks are identified by a threshold (here \(U_{th}\)) that is applied to the signal. Usually, the threshold is variable in order to react to occurring noise and depends on the signal-to-noise ratio (SNR).

Fig. 2 Shape of the received signal in the time domain, based on [5]

Additionally, Fig. 2 shows another signal parameter, which is the echo pulse width (here \(w_A\), \(w_B\), \(w_C\)). It directly relates to the radial extent of the hit target, as reflections caused by the target reach the sensor throughout this time, as shown in Fig. 1. By summing up the sampled signal (here \(U(\tau)\)) over the duration of the echo pulse width, the signal intensity for the particular detection can be obtained. In Fig. 2, the time axis starts at the transmission of the beam, so \(\tau_A\), \(\tau_B\), \(\tau_C\) represent the different times of flight for the three targets A, B, C. In consequence, on the second abscissa, \(r_A\), \(r_B\), \(r_C\) are the corresponding ranges to the particular targets.

Finally, the number of echoes the sensor keeps has an influence on the provided raw scan, where a scan in this context refers to a completed sweep over the angular range. If the received signal for a single pulse contains three echoes, as in Fig. 2, and the sensor keeps all of them in the provided scan data, no decision has to be made. However, if the scan contains fewer echoes per pulse than are received, a rule is needed for which echoes to keep (e.g. the most intense or the nearest).
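To make the described signal processing more tangible, the following minimal sketch extracts echoes from an already sampled signal by applying a fixed threshold and derives time of flight, range, echo pulse width and intensity per echo. The signal shape, sampling rate and threshold are illustrative assumptions, not taken from a specific sensor:

```python
import numpy as np

# Minimal sketch of threshold-based echo extraction from a sampled received
# signal U(tau). Signal shape, sampling rate and threshold are illustrative.
C_0 = 299_792_458.0   # speed of light [m/s]
FS = 1e9              # assumed sampling rate of the A/D converter [Hz]
U_TH = 0.2            # detection threshold [V]

rng = np.random.default_rng(0)
t = np.arange(0.0, 2e-6, 1.0 / FS)                      # 2 us time window
signal = (0.8 * np.exp(-((t - 0.3e-6) / 5e-9) ** 2)     # echo A
          + 0.5 * np.exp(-((t - 0.9e-6) / 8e-9) ** 2)   # echo B
          + 0.02 * rng.standard_normal(t.size))         # noise floor

above = signal > U_TH
edges = np.diff(above.astype(int))        # rising (+1) and falling (-1) edges
starts = np.where(edges == 1)[0] + 1
ends = np.where(edges == -1)[0] + 1

for start, end in zip(starts, ends):
    tau = t[start]                        # time of flight at the rising edge
    print({
        "range_m": 0.5 * C_0 * tau,                 # r = 1/2 * c_0 * tau
        "pulse_width_s": t[end] - t[start],         # echo pulse width
        "intensity": signal[start:end].sum() / FS,  # integral of U over the pulse
    })
```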

3 Overview of Lidar Sensor Systems

The following schematic describes the different processing steps in a generic lidar sensor system (Fig. 3), resulting from a functional decomposition as introduced in the context of safety validation for automated driving in [11]. Within the lidar sensor simulation workgroup of ENABLE-S3, different interfaces have been defined, which are referred to by the abbreviation IF n. A typical object detection and classification process was chosen to serve as the basis for the functional decomposition. The interfaces and the functional blocks in between are described further in the following section.

Fig. 3 Overview of the functional blocks of a lidar sensor system

A generic lidar sensor system is divided into the front-end and a data processing unit. The former handles emission and reception of the laser beams, as described in the previous section about the lidar sensor principle.

Depending on the type of sensor system, the output of the front-end, the so-called raw laser scans, may be provided directly as an output to the user. To this end, the front-end outputs a raw scan of the environment as 3D points in spherical coordinates at interface IF 1. This allows fusion of low-level sensor data with data from other environment sensors, such as radar detections, and is currently the subject of research.

It would be possible to simulate the raw signal in the time domain, as shown in Fig. 2. However, as this is not very common at present and mostly not available in current simulation tools, the lidar workgroup within the ENABLE-S3 project decided not to include it as a specific interface of the simulated lidar sensor system.

While computing the raw scans, an intrinsic calibration of the calculated points is performed, taking into account the alignment of the diodes and their slightly different signal intensities, as well as the timing of the actual sending and receiving process. These data are then used to distinguish between different kinds of objects. Nevertheless, they can also be used for further functionalities, such as simultaneous localization and mapping (SLAM), which is not investigated further within the scope of this project.

Lidar sensor systems for object detection, as inspected here, include further data processing steps for handling the raw scans. In most cases, the raw scans captured at different moments in time by a single sensor are first aligned with each other and with the scans of other sensors mounted on the same vehicle, meaning extrinsic calibration is performed. This addresses two basic issues:

1. Registering laser scans in time is necessary for lidar sensor systems that incorporate a motor unit and a moving base.
2. Registering laser scans in space is necessary for multi-sensor systems, which are typically used to achieve extensive coverage of the environment. Temporal alignment is also important here.

The data of the sensors are then transformed and fused into a single coordinate frame, which results in a point cloud at IF 2. To this end, the intrinsic parameters of the lidar system (azimuth and elevation angle, …) are used to map the raw scans from a spherical coordinate system into a metric, locally referenced Cartesian frame. This includes the compensation of the movement of the car between different measurements during one scan.
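A minimal sketch of this mapping is given below: a raw scan in spherical sensor coordinates is converted into Cartesian coordinates and transformed into the vehicle frame using a simplified extrinsic calibration (translation and yaw only). The mounting pose and measurement values are illustrative:

```python
import numpy as np

# Minimal sketch of mapping a raw scan (IF 1, spherical sensor coordinates)
# into a Cartesian point cloud (IF 2) in the vehicle frame (ISO 8855).
# Mounting pose and measurement values are illustrative.

def spherical_to_cartesian(r, azimuth, elevation):
    """Convert range/azimuth/elevation (radians) to sensor-frame x, y, z."""
    x = r * np.cos(elevation) * np.cos(azimuth)
    y = r * np.cos(elevation) * np.sin(azimuth)
    z = r * np.sin(elevation)
    return np.stack([x, y, z], axis=-1)

def to_vehicle_frame(points_sensor, mounting_translation, mounting_yaw):
    """Apply a simplified extrinsic calibration (translation + yaw only)."""
    c, s = np.cos(mounting_yaw), np.sin(mounting_yaw)
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return points_sensor @ rot.T + mounting_translation

# Example: three detections of one scan.
ranges = np.array([12.0, 30.5, 55.2])
azimuths = np.radians([-10.0, 0.0, 15.0])
elevations = np.radians([0.0, 1.6, -0.8])

points = spherical_to_cartesian(ranges, azimuths, elevations)
cloud = to_vehicle_frame(points,
                         mounting_translation=np.array([3.5, 0.0, 1.2]),
                         mounting_yaw=0.0)
print(cloud)
```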

Based on the (fused) point cloud, reflected points from the same object in the environment are then grouped together into clusters, while others are discarded during segmentation. This is done based on the proximity of the points in the point cloud, as well as on historical data computed during previous measurement intervals. If available, such prior detections are good starting points for the search for clusters at the current time step.

This results in the addition of two feedback loops in the sensor architecture in Fig. 3. In order to extract the positions of the objects surrounding the ego-vehicle (e.g. vehicles, pedestrians, cyclists, etc.) from the point cloud, the reflections returned by the road surface are usually removed during a ground plane segmentation step [12].
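The following minimal sketch illustrates the two steps just described, ground removal and proximity-based grouping, on a handful of points. Real systems use more elaborate ground plane estimation [12] and clustering; all thresholds here are illustrative:

```python
import numpy as np

# Minimal sketch: remove ground points by a simple height threshold, then group
# the remaining points by proximity (naive single-linkage clustering).
# Thresholds are illustrative; real systems estimate the ground plane explicitly.

def remove_ground(cloud, z_ground=0.2):
    """Keep points clearly above the assumed ground level."""
    return cloud[cloud[:, 2] > z_ground]

def cluster_by_proximity(cloud, max_dist=0.7):
    """Assign a cluster id to each point; points closer than max_dist share an id."""
    labels = -np.ones(len(cloud), dtype=int)
    current = 0
    for i in range(len(cloud)):
        if labels[i] != -1:
            continue
        labels[i] = current
        stack = [i]
        while stack:                         # grow the cluster point by point
            j = stack.pop()
            near = np.linalg.norm(cloud - cloud[j], axis=1) < max_dist
            new = np.where(near & (labels == -1))[0]
            labels[new] = current
            stack.extend(new.tolist())
        current += 1
    return labels

cloud = np.array([[10.0, 0.0, 0.5], [10.3, 0.1, 0.6],   # object 1
                  [25.0, -4.0, 0.8], [25.2, -4.1, 1.0], # object 2
                  [15.0, 2.0, 0.05]])                   # ground return
obstacles = remove_ground(cloud)
print(cluster_by_proximity(obstacles))  # e.g. [0 0 1 1]
```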

Similar to the raw signal in the time domain, the interface that could possibly be defined at this stage, presumably under the name “cluster list”, is not considered by the working group to be a simulation interface. It can serve as a debug or inspection interface for function development, but is not expected to be served by actual simulation tools. Also, the data processing steps often cannot be separated easily, due to their complexity and their interweaving with further processing steps like tracking algorithms.

Tracking algorithms use state estimators in order to associate detections resulting from the same object over time. The combination of historical data and state estimation algorithms allows estimating quantities that are not directly measured by the lidar front-end, such as orientation, speed, acceleration or geometrical extents of a target [13, 14].

Finally, (tracked) clusters can be sorted into a finite set of possible classes (vehicle, bike, pedestrian, etc.), mostly rule-based or by means of machine learning [15]. Historical data can be helpful to improve the accuracy of the classification (e.g. the speed of a vehicle is generally higher than the speed of a pedestrian), but also the localization accuracy, since different parts of a target object may be visible at different times. All described functional blocks result in the last interface defined here, namely the object list at IF 3.

4 Descriptions of Lidar Sensor Interfaces

The previously mentioned functional blocks, as visible in Fig. 3, are not described in more detail here, as they are well described in the literature. This section concentrates on the agreed interfaces, as already mentioned in the previous section and listed in Table 1. The exemplary data and header fields within the interface descriptions in the following tables are aligned with [1] and are seen as basic contents, without being limited to the listed fields. In the case of raw scans (IF 1), there are separate scans for each sensor in the overall simulated setup. Point clouds (IF 2) and object lists (IF 3), in contrast, can be derived by fusing multiple sensors.

Table 1 Lidar sensor interfaces

4.1 Interface 1 (IF 1, Raw Scan)

The raw scans as defined here are usually described by vectors of ranges and intensities measured by a single lidar sensor. Each entry represents the processed signal for one specific scanning angle. This means the data result from uniform sampling in a spherical coordinate space. The following definition of the raw scan interface is closely related to the message definition for single-line lidars in the Robot Operating System (ROS) [16].

The general data packets consist of a header and the following data. The header of the raw scan interface contains the sensor ID, which identifies the sensor by which the data is perceived, as well as its mounting position and orientation relative to the vehicle’s coordinate frame, taken from ISO 8855:2011 [17]. This right-handed coordinate frame is centered at the middle of the rear axle of the vehicle. Its X-axis is aligned with the wheelbase, its Y-axis with the rear axle, and its Z-axis follows from the right-hand rule. These extrinsic calibration parameters enable subsequent processing and interpretation of the data.

To avoid misinterpretation of scan data caused by the minimum and maximum ranges that the scanner can detect, these parameters can also be included in the header of the raw scan, as shown in Table 2.

Table 2 Interface 1—raw scan header
Table 3 Interface 1—raw scan data

The actual measurements are encoded in a data field of the raw scan, which contains consecutive blocks of the measured ranges followed by the measured intensities; the latter are sometimes replaced by the previously described echo pulse widths, as listed in Table 3. The related scanning angles can be assigned by knowing the order in which the beams are transmitted and storing the received reflections in the same order. If the specific sensor is able to detect multiple echoes for a single transmitted pulse, the number of entries is simply multiplied by the number of possible echoes.
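As an illustration of how such a raw scan could be represented in code, the following sketch defines a header and data container; the field names and types are illustrative and do not claim to reproduce the agreed specification or the referenced ROS message definition:

```python
from dataclasses import dataclass, field
from typing import List

# Minimal sketch of a raw scan container (IF 1). Field names and types are
# illustrative only and do not reproduce the agreed specification.

@dataclass
class RawScanHeader:
    sensor_id: int                     # identifies the emitting/receiving sensor
    mounting_position: List[float]     # x, y, z in the vehicle frame (ISO 8855) [m]
    mounting_orientation: List[float]  # roll, pitch, yaw relative to the vehicle [rad]
    range_min: float                   # minimum detectable range [m]
    range_max: float                   # maximum detectable range [m]
    timestamp: float                   # start of the scan [s]

@dataclass
class RawScan:
    header: RawScanHeader
    ranges: List[float] = field(default_factory=list)       # one entry per beam/echo [m]
    intensities: List[float] = field(default_factory=list)  # or echo pulse widths

scan = RawScan(
    header=RawScanHeader(sensor_id=1, mounting_position=[3.5, 0.0, 1.2],
                         mounting_orientation=[0.0, 0.0, 0.0],
                         range_min=0.3, range_max=200.0, timestamp=0.0),
    ranges=[12.0, 30.5, 55.2],
    intensities=[0.8, 0.5, 0.3],
)
print(scan.header.sensor_id, len(scan.ranges))
```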

4.2 Interface 2 (IF 2, Point Cloud)

Point clouds are intuitive but generic data structures consisting of a list of three-dimensional vectors (points) [18]. Different definitions for point clouds and the interpretation of points of a cloud exist [18,19,20]. To store the points, Cartesian or angular coordinate systems can be chosen. Intensity values or echo pulse widths and laser scanner IDs can be added to the points’ coordinates. To compute the point cloud, multiple lidar sensors can be fused within the sensor system. Thus, it is important to have a semantically fixed interface. The lidar workgroup’s definition is based on the lidar interfaces of the Ibeo Lux [21] and Velodyne HDL-32E [22] and the interface definition for point cloud messages in ROS [19].

The general structure of the point cloud starts with a header that contains general information about the point cloud and the specific scan, as shown in Table 4. If only one sensor is used for point cloud determination, the header contains the extrinsic calibration of the laser scanner with respect to the vehicle’s base coordinate system (Fig. 4) combined with the sensor identification number, similar to the previously defined raw scan. In any case, the header contains information about the extrinsic calibration of the point cloud with respect to the vehicle’s base coordinate system (Fig. 4) and the amount of point data that follows the header.

Table 4 Interface 2—point cloud header
Fig. 4 Object list entries: geometrical and bounding box properties

A point itself is described by three-dimensional coordinates, mostly Cartesian, in the vehicle reference frame described in the header, and contains additional intensity and time information. Thus, points are five-dimensional vectors (see Table 5). The time information is needed since, within a single scan of a laser scanner, the different points may be related to different points in time. This information becomes more important with increasing angular velocity of the targets and the ego-vehicle.

Table 5 Interface 2—point cloud data
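As a minimal illustration of this five-dimensional point representation, one possible in-memory layout is a structured array, as sketched below; the field names are illustrative:

```python
import numpy as np

# Minimal sketch of a point cloud payload (IF 2): every point carries Cartesian
# coordinates, an intensity value and its individual measurement time.
# Field names are illustrative, not part of the agreed specification.
point_dtype = np.dtype([
    ("x", np.float32), ("y", np.float32), ("z", np.float32),  # vehicle frame [m]
    ("intensity", np.float32),                                # or echo pulse width
    ("t", np.float64),                                        # per-point timestamp [s]
])

points = np.array([(10.0, 0.0, 0.5, 0.8, 0.0012),
                   (25.0, -4.0, 0.8, 0.5, 0.0013)], dtype=point_dtype)
print(points["x"], points["t"])
```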

4.3 Interface 3 (IF 3, Object List)

The points within a point cloud represent a measurement. Some sensor systems provide additional processing steps and outputs, e.g. for object detection, as described here. In order to find related structures and objects within these points, segmentation and clustering methods are applied [18, 23]. To this end, assumptions about the environment (similar range or intensity) are made. The clusters generated in this way can then serve as object candidates. If an object is tracked over time, it keeps its ID in consecutive tracked object lists. The age of such an object, i.e. the number of measurements since its creation, is stored in the age property. The probability of existence gives the user information about the quality of the measurements and the tracking. It is usually computed directly inside the state estimation algorithm. Further outputs might be the covariances of the state estimation algorithm.

State estimation algorithms with typical movement models, such as constant velocity, if used during object tracking, allow estimating quantities that are not directly measured by the lidar sensor, such as the velocity and acceleration, but also the geometric extents of the objects. Both are provided in the sensor coordinate frame. Finally, state estimators usually provide a quality indicator per tracked state, which is stored in the corresponding measurement error fields.
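A minimal sketch of such a constant-velocity state estimator is given below: a linear Kalman filter over planar position and velocity, fed with position-only detections, so that the velocity is inferred rather than measured. All noise parameters are illustrative:

```python
import numpy as np

# Minimal sketch of a constant-velocity Kalman filter for object tracking: the
# state [x, y, vx, vy] is estimated from position-only measurements, so the
# velocity is inferred rather than measured. Noise parameters are illustrative.
dt = 0.1                                             # measurement interval [s]
F = np.array([[1, 0, dt, 0], [0, 1, 0, dt],
              [0, 0, 1, 0], [0, 0, 0, 1]], float)    # constant-velocity motion model
H = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], float)    # only x, y are observed
Q = 0.05 * np.eye(4)                                 # process noise
R = 0.20 * np.eye(2)                                 # measurement noise

x = np.zeros(4)            # initial state
P = np.eye(4)              # initial covariance

def step(x, P, z):
    # Predict with the motion model, then correct with the new detection z.
    x, P = F @ x, F @ P @ F.T + Q
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (z - H @ x)
    P = (np.eye(4) - K @ H) @ P
    return x, P

for z in ([10.0, 2.0], [10.5, 2.0], [11.0, 2.1]):    # cluster centers over time
    x, P = step(x, P, np.array(z))
print(x)   # estimated position and (inferred) velocity
```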

An object list is divided into a header and the list of objects. If only one sensor is used for object list determination, the header contains the extrinsic calibration of the laser scanner towards the vehicle’s base coordinate system (Fig. 4), combined with the sensor identification number, similar to the previously defined interfaces. If more than one sensor is used, these fields are not used. In all cases, the header contains the timestamp associated with the current measurement and the number of objects in the actual list at that time (Table 6).

Table 6 Interface 3—object list header

Each detected object corresponds to an entry in the list and is defined by a set of properties. The structure of the proposed object list entries (Table 7) is inspired by the work proposed in the Open Simulation Interface [1] and by object list descriptions used in industry [24]. The ID of the object describes its position in the list. The object’s class can be computed by applying classification algorithms, rule-based or based on machine learning, to the cluster associated with the considered detection. The intensity of an object is calculated from the intensities of the corresponding points in the point cloud, e.g. as their mean value.

The geometrical information of the detection is provided in the coordinate frame of the vehicle, which needs to be given. It conventionally refers to the center of the bounding box, which is described by its length, width and height. The bounding box can additionally be rotated around its center. Finally, the light intensity associated with the object can be added to the list. The principal properties associated with each object in the list are summarized in Fig. 4.

The bounding box is directly influenced by the selected tracking method. If it is computed from the actual extent of the associated points, it grows as the distance to the object decreases and shrinks as the distance increases. By using historical information, the size of the bounding box can, for example, be kept once the point cloud shrinks again.

Table 7 Interface 3—object list data
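As an illustration, one object list entry could be represented as sketched below; the fields follow the description above only loosely, and their names and types are illustrative rather than part of the agreed specification:

```python
from dataclasses import dataclass
from typing import List

# Minimal sketch of a single object list entry (IF 3). Field names and types are
# illustrative and only loosely follow the description above, not a fixed spec.

@dataclass
class TrackedObject:
    object_id: int                 # position of the object in the list / track id
    object_class: str              # e.g. "vehicle", "bike", "pedestrian"
    age: int                       # number of measurements since track creation
    existence_probability: float   # quality of measurement and tracking, 0..1
    position: List[float]          # bounding box center in the vehicle frame [m]
    dimensions: List[float]        # length, width, height of the bounding box [m]
    yaw: float                     # rotation of the bounding box around its center [rad]
    velocity: List[float]          # estimated, not directly measured [m/s]
    acceleration: List[float]      # estimated, not directly measured [m/s^2]
    intensity: float               # e.g. mean intensity of the associated points

obj = TrackedObject(object_id=7, object_class="vehicle", age=23,
                    existence_probability=0.97,
                    position=[25.1, -4.05, 0.9], dimensions=[4.3, 1.8, 1.5],
                    yaw=0.02, velocity=[13.2, 0.1, 0.0],
                    acceleration=[0.3, 0.0, 0.0], intensity=0.55)
print(obj.object_class, obj.existence_probability)
```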

5 Methods for Sensor Simulation

There are several methods to generate synthetic sensor data for the different interfaces. For raw scan and point cloud generation, ray tracing or ray casting are widely used in commercially available simulation tools. In addition, especially in the context of vehicle simulation, Open Source frameworks, such as GAZEBO, or 3D game engines, such as UNITY, can be used for lidar simulation. Ray tracing and ray casting are very efficient in computing multi-path propagation and reflections due to their ability to parallelize workloads.

Nevertheless, methods like beam tracing or the Z-buffer can be used as well. When it comes to beam divergence, methods other than ray tracing may even be more favorable. Beam tracing was originally invented for the simulation of diffraction. The Z-buffer is very efficient if multi-path propagation or multiple echoes are not of interest.
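The following minimal sketch illustrates the basic idea of ray casting for raw scan generation: one ray is cast per azimuth step and intersected with a single sphere standing in for scene geometry. Beam divergence, material properties and multi-path propagation are not modeled, and all values are illustrative:

```python
import numpy as np

# Minimal ray casting sketch: cast one ray per azimuth step and intersect it
# with a single sphere standing in for scene geometry. No beam divergence,
# material model or multi-path is included; all values are illustrative.
center = np.array([20.0, 0.0, 0.0])   # sphere (target) center in the sensor frame [m]
radius = 1.0                           # target radius [m]
max_range = 100.0                      # sensor maximum range [m]

def cast_ray(direction):
    """Return the range to the sphere along a unit ray from the origin, or max_range."""
    # Solve |t*d - c|^2 = r^2 for the smallest positive t.
    b = -2.0 * direction @ center
    c = center @ center - radius ** 2
    disc = b * b - 4.0 * c
    if disc < 0.0:
        return max_range
    t = (-b - np.sqrt(disc)) / 2.0
    return t if 0.0 < t < max_range else max_range

azimuths = np.radians(np.arange(-10.0, 10.0, 0.5))     # horizontal FOV sweep
directions = np.stack([np.cos(azimuths), np.sin(azimuths),
                       np.zeros_like(azimuths)], axis=-1)
ranges = np.array([cast_ray(d) for d in directions])
print(ranges.min(), (ranges < max_range).sum(), "hits")
```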

Object list simulation has the objective of accurately reproducing the existence, state and class uncertainties of moving objects. In most cases, these are simulated directly via probabilistic models with data-driven stochastic behavior for the three listed uncertainties and the resulting object states and parameters. Alternatively, it is also possible to simulate data at earlier interfaces, such as point clouds, and, if available, to apply the same algorithms as in the real sensor system until object lists are obtained, or to use custom algorithms and methods to derive them accordingly.
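A minimal sketch of such a direct, probabilistic object list simulation is given below: ground-truth object states are dropped according to an assumed detection probability, perturbed by state noise and occasionally misclassified. All distributions and probabilities are illustrative and not fitted to data:

```python
import random

# Minimal sketch of direct probabilistic object list simulation: ground-truth
# objects are dropped (existence uncertainty), perturbed (state uncertainty) and
# occasionally misclassified (class uncertainty). All parameters are illustrative.
random.seed(0)
P_DETECT = 0.95          # probability that an existing object appears in the list
POS_SIGMA = 0.15         # standard deviation of the position error [m]
P_MISCLASS = 0.05        # probability of assigning a wrong class
CLASSES = ["vehicle", "bike", "pedestrian"]

ground_truth = [
    {"id": 1, "x": 25.0, "y": -4.0, "cls": "vehicle"},
    {"id": 2, "x": 12.0, "y": 3.0, "cls": "pedestrian"},
]

object_list = []
for obj in ground_truth:
    if random.random() > P_DETECT:          # existence uncertainty: missed detection
        continue
    cls = obj["cls"]
    if random.random() < P_MISCLASS:        # class uncertainty
        cls = random.choice([c for c in CLASSES if c != cls])
    object_list.append({
        "id": obj["id"],
        "x": random.gauss(obj["x"], POS_SIGMA),   # state uncertainty
        "y": random.gauss(obj["y"], POS_SIGMA),
        "cls": cls,
        "existence_probability": P_DETECT,
    })
print(object_list)
```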

The authors propose the following strategy for modeling the functional blocks in order to serve the necessary interfaces in simulation:

First, the requirements for the generation of synthetic lidar sensor data need to be defined. To this end, the physical effects that should be simulated need to be selected, such as signal degradation, beam divergence, etc. The selection must be derived from the intended use of the synthetic data, which can be function development of further data processing steps, testing of such processing, or serving as input for safety validation of a complete automated driving system.

Second, with the requirements at hand, the previously mentioned simulation approaches like ray casting, ray tracing or the Z-buffer need to be evaluated, and the most suitable one with respect to the effects to be simulated and the computational effort should be selected.

Since the simulated environment serves as a resource from which the sensor simulation gains information about the objects hit by the active sensor, requirements should be derived for it as well. Of course, technical limitations come into play if a certain degree of fidelity is needed. For example, in the case of fine structures like fences or grass, approximations and merging have to be used.

6 Conclusion and Outlook

The length of this chapter does not allow describing the functional blocks, interfaces, or simulation methods in more detail. Still, the functional understanding and the commonly agreed definitions of interfaces described here can be seen as a major achievement and milestone within the ENABLE-S3 project. Furthermore, the work performed by the different partners regarding lidar system simulation covers different modeling methods and interfaces, as briefly described here.

A special task for all partners is the validation of the different models. One approach could be validation by injecting partly augmented sensor data into black-box ECUs and monitoring the automated driving function’s response to this stimulation, see [25]. There, the validity of the statement originates from the simulation and from the independent examination of the black-box system. This includes the formulation of requirements and the collection of parametrization and validation data. In addition, reference data needs to be collected during the measurements to be able to re-simulate the real driving maneuvers in a virtual environment, which also needs requirements of its own.

Another and more common approach for validation is the replay-to-sim method, where the real measurements are performed first while collecting reference data, such as GPS trajectories of all moving objects, with high accuracy. Afterwards, these reference data, together with the static scenery that needs to be modeled, serve as input for the simulation. In the end, several metrics for the comparison of real and synthetic data are applied. If the sample experiments are chosen such that the overall parameter space of the model is covered properly, e.g. by applying a sensitivity analysis beforehand, the sample validities from the single experiments build overall trust in the model [26].
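As an illustration of a metric on point cloud level, the sketch below computes a symmetric mean nearest-neighbor distance between a real and a synthetic cloud; this is only one of many conceivable metrics and the data are illustrative:

```python
import numpy as np

# Minimal sketch of one conceivable point-cloud-level metric for replay-to-sim
# validation: the symmetric mean nearest-neighbor distance (Chamfer-style)
# between a real and a synthetic point cloud. Data are illustrative.

def mean_nn_distance(a, b):
    """Mean distance from each point in a to its nearest neighbor in b."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return d.min(axis=1).mean()

def symmetric_cloud_distance(real, synthetic):
    return 0.5 * (mean_nn_distance(real, synthetic)
                  + mean_nn_distance(synthetic, real))

real = np.array([[10.0, 0.0, 0.5], [10.3, 0.1, 0.6], [25.0, -4.0, 0.8]])
synthetic = np.array([[10.1, 0.0, 0.5], [10.4, 0.2, 0.6], [24.8, -4.1, 0.9]])
print(symmetric_cloud_distance(real, synthetic))  # lower is better
```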

Therefore, to obtain valid sensor models with the described interfaces at hand, the next logical steps are to find metrics for benchmarking sensor models at the different interfaces and, for example, to compute metrics on data originating from the same interface after further processing, in order to correlate the metrics applied at those subsequent interfaces. In a first trial within ENABLE-S3, benchmarking of the different implemented approaches has been carried out at point cloud level, and these results will be included in scientific publications and project reports.