Clustering road sections with respect to the correlations of traffic flows

Shanshan Wang
shanshan.wang@uni-due.de
Dec. 31, 2021


1 Introduction

Road sections in a traffic network exhibit strongly correlated traffic flows, especially during rush hours. Because of these correlations, classifying road sections into groups helps to infer the traffic behavior of a road section from the behavior of one or several other road sections in the same group. The collective behavior of road sections in a group is important for implementing traffic planning and management. Therefore, this report aims at clustering road sections with respect to the correlations of traffic flows. To this end, we compute the correlation matrix from the open dataset of the Ministry of Transport of the State of North Rhine-Westphalia (NRW), Germany. After dimensionality reduction of the correlation matrix, four clustering methods, namely $k$-means clustering, hierarchical clustering, DBSCAN clustering and mean shift clustering, are applied to our dataset, and their clustering results are compared in terms of silhouette values, which measure cluster cohesion and separation. We summarize our results and give suggestions for the next steps of this study.

2 Data

2.1 Description of datasets

This report uses the open dataset from the Ministry of Transport of the State of North Rhine-Westphalia (NRW), Germany, released under the Data license Germany attribution 2.0. It lists the annual results of the traffic census in NRW for 2017. The attributes of the raw dataset are listed in Table 2, which gives the attributes' abbreviations, full names, units and data types. After data cleaning, the attributes used are listed in Table 3. The data are aggregated by counting station name (ZST_NAME) and by street class and number (STRASSE), respectively, resulting in the data matrices df3 and df4, which will be used for the analysis.

Table 1: the annual results of traffic census in NRW during 2017

Table 2: a summary of the dataset's attributes

2.2 Data cleaning

Data cleaning consists of converting the data types of some variables (i.e., attributes), grouping the data by counting station name (ZST_NAME) and by street class and number (STRASSE), respectively, and checking for and removing missing values.
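The three cleaning steps can be sketched with pandas as follows. This is only an illustration on a toy table, not the code used for this report: the key columns ZST_NAME and STRASSE are taken from the dataset, whereas the count columns KFZ and SV, all values, the helper name clean and the choice of the mean as aggregation function are assumptions made for the example.

```python
import pandas as pd

# Toy stand-in for the raw census table. ZST_NAME and STRASSE are real
# attribute names; KFZ, SV and all values are hypothetical.
raw = pd.DataFrame({
    "ZST_NAME": ["A-Stadt", "A-Stadt", "B-Dorf", "C-Heim"],
    "STRASSE": ["A 1", "A 1", "A 1", "B 8"],
    "KFZ": ["1000", "2000", "500", None],  # counts stored as strings
    "SV": ["100", "300", "50", "20"],
})


def clean(df, key, value_cols):
    """Convert types, aggregate by `key`, then drop incomplete groups."""
    out = df.copy()
    for col in value_cols:
        # Step 1: convert to numeric; unparsable entries become NaN.
        out[col] = pd.to_numeric(out[col], errors="coerce")
    # Step 2: aggregate per group (the mean is an assumption of this sketch).
    grouped = out.groupby(key)[value_cols].mean()
    # Step 3: remove groups that still contain missing values.
    return grouped.dropna()


value_cols = ["KFZ", "SV"]
by_station = clean(raw, "ZST_NAME", value_cols)  # plays the role of df3
by_street = clean(raw, "STRASSE", value_cols)    # plays the role of df4
print(by_station)
print(by_street)
```

Grouping before dropping missing values keeps a station whenever at least one valid record per attribute exists; dropping rows first would be the stricter alternative.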

Table 3: a summary of used data attributes

Table 4: data of road sections

Table 5: data of motorways

The data frames df3 (road sections) and df4 (motorways) are now cleaned and ready for use. In the following, we focus on the data matrix df3, with 18 attributes as columns and 323 counting stations, representing 323 road sections, as rows.

2.3 Feature engineering

Feature engineering is performed by visualizing the data matrix, displaying the relationships among the six vehicle types, examining the skewness values and logarithmically transforming the highly skewed variables.
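The skewness check and the logarithmic transformation can be illustrated with a minimal sketch on synthetic data. The column names, the threshold SKEW_LIMIT = 0.75 and the use of log(1 + x) instead of a plain logarithm (so that zero counts remain finite) are assumptions made for this example and are not taken from the analysis in this report.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for two attributes: one roughly symmetric,
# one strongly right-skewed (hypothetical names and values).
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "symmetric": rng.normal(100.0, 10.0, size=500),
    "right_skewed": rng.lognormal(mean=3.0, sigma=1.0, size=500),
})

SKEW_LIMIT = 0.75  # hypothetical cut-off for "highly skewed"

# Sample skewness of every column.
skew = toy.skew()

# Columns whose absolute skewness exceeds the cut-off.
skewed_cols = skew[skew.abs() > SKEW_LIMIT].index

# Apply log(1 + x) to the highly skewed columns only.
transformed = toy.copy()
transformed[skewed_cols] = np.log1p(transformed[skewed_cols])

print(skew)
print(transformed.skew())
```

Comparing the two printed skewness tables shows whether the transformation brought the skewed variable closer to a symmetric distribution.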