Ranking and Clustering Cities in North Rhine-Westphalia, Germany

-- A Project for Applied Data Science Capstone by IBM/Coursera

Shanshan Wang
shanshan.wang@uni-due.de
Feb. 9, 2021

1 Introduction

Strategic city planning helps a state government improve its citizens' economic and living standards. To this end, a better understanding of the cities in a state is important. In this project, we cluster and evaluate the cities in North Rhine-Westphalia, Germany, in six fields: working, education, living facilities, transportation, health care and leisure places. In this way, we can identify the top cities in each field as well as the bottom cities that need improvement in that field. Moreover, we can reveal the correlations among cities across these fields and examine how a change in one city affects the correlated cities. These correlations can support the coordinated development of multiple cities and are therefore useful for city planning.

To simplify the problem, we use the frequency with which venues of each category appear as an index of the level of development in each city field. Using $k$-means clustering and hierarchical clustering, we group the cities into five clusters. For each cluster, a correlation pattern among the different city fields is revealed. To give recommendations for traveling or suggestions for city planning, we rank the top five and bottom five cities in each field.

This report is organized as follows. In section 2, we describe the datasets used and the processing of the raw data. In section 3, we compute the frequencies of venues in each category and classify the cities by $k$-means clustering and hierarchical clustering. In section 4, we analyze and discuss the characteristics of the city clusters and select the best and worst cities in each field. We conclude in section 5.

2 Datasets

The project uses two datasets. The first is from Wikipedia, from which we downloaded a table listing the population rank, name, population in 2017, area in square kilometers and population density (inhabitants per square kilometer) of the ten largest cities in North Rhine-Westphalia (NRW). The city names are then used to find the cities' locations.

The second dataset is from Foursquare. For a given search query, i.e., a keyword, we search for relevant venues within a radius of 100,000 meters (100 km) around each central location. The central locations are the locations of the ten largest cities in NRW. In this way, the retrieved venues cover almost the whole state. The Foursquare location data include venue names, categories, addresses, latitudes, longitudes, distances, postal codes, city names, state names, countries and so on. We considered multiple search queries, namely Company, GmbH, Factory, Fabrik, Office, Restaurant, Supermarket, Shop, University, Universität, College, School, Hospital, Residence, Haus, Park and Transport, and recorded the search query of each venue as an additional column of the table.

In total, we downloaded 8321 data points for categorized venues from Foursquare, of which 6144 are located in NRW. They are visualized on maps by category. We group the search queries into six main categories: working (Company, GmbH, Factory, Fabrik and Office), education (University, Universität, College and School), living facilities (Restaurant, Supermarket, Shop, Residence and Haus), health care (Hospital), transportation (Transport) and leisure places (Park). In the following, we use these 6144 data points for our calculations.
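
The grouping of search queries into the six main categories can be written down as a small lookup table. The following is a minimal sketch in plain Python, not the notebook's original code; the names `CATEGORY_QUERIES`, `QUERY_TO_CATEGORY` and `count_by_category` are hypothetical, and the sample labels are made up for illustration.

```python
from collections import Counter

# Search queries grouped into the six main categories described in the text.
CATEGORY_QUERIES = {
    "working": ["Company", "GmbH", "Factory", "Fabrik", "Office"],
    "education": ["University", "Universität", "College", "School"],
    "living facilities": ["Restaurant", "Supermarket", "Shop", "Residence", "Haus"],
    "health care": ["Hospital"],
    "transportation": ["Transport"],
    "leisure places": ["Park"],
}

# Invert the grouping so that each search query maps to exactly one category.
QUERY_TO_CATEGORY = {
    query: category
    for category, queries in CATEGORY_QUERIES.items()
    for query in queries
}


def count_by_category(venue_queries):
    """Count venues per main category from an iterable of search-query labels."""
    return Counter(QUERY_TO_CATEGORY[q] for q in venue_queries)


# Hypothetical labels for illustration only, not real Foursquare results.
sample = ["GmbH", "Office", "Park", "Universität", "Shop", "Haus"]
counts = count_by_category(sample)
```

Keeping the grouping in one dictionary means that a query can be moved to another category in a single place, and the inverted lookup raises a `KeyError` for any label that was never assigned to a category.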

Import necessary libraries

Download the top ten largest cities in North Rhine-Westphalia as central locations

Define Foursquare Credentials and Version

Define a function for loading data from Foursquare around each central city within a given radius
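
As a rough illustration of this step, the sketch below only builds the request URL for one central city and one search query; it does not send the request. It assumes the Foursquare v2 venue search endpoint and its `ll`, `query`, `radius` (in meters), `limit` and `v` parameters. The function name `build_search_url`, the default `limit` of 50, the placeholder credentials and the approximate coordinates of Cologne are illustrative assumptions, not values taken from the project.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumed endpoint: the Foursquare v2 venue search API.
SEARCH_ENDPOINT = "https://api.foursquare.com/v2/venues/search"


def build_search_url(client_id, client_secret, version, lat, lng, query,
                     radius_m=100_000, limit=50):
    """Build a venue search URL around (lat, lng); radius_m is in meters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,              # API version date in YYYYMMDD form
        "ll": f"{lat},{lng}",      # central location as "latitude,longitude"
        "query": query,            # search keyword, e.g. "Fabrik"
        "radius": radius_m,
        "limit": limit,            # hypothetical default, adjust as needed
    }
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


# Placeholder credentials and approximate coordinates, for illustration only.
url = build_search_url("MY_ID", "MY_SECRET", "20210209",
                       50.9375, 6.9603, "Universität")

# Round-trip the query string to check that the umlaut is encoded correctly.
params_back = parse_qs(urlsplit(url).query)
```

Separating URL construction from the network call keeps the credentials and parameters easy to inspect before any request is made; the resulting URL could then be fetched once per city and per search query.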

Search and load the venue data based on the given keywords

Define information of interest and filter dataframe

Select the rows whose locations are in the state Nordrhein-Westfalen
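
The row selection can be sketched without a dataframe as a simple filter over venue records. This is a plain-Python illustration under the assumption that each record carries a `state` field spelled exactly "Nordrhein-Westfalen"; the function name `select_state` and the three sample records are hypothetical.

```python
def select_state(venues, state="Nordrhein-Westfalen"):
    """Keep only venue records whose 'state' field equals the given state."""
    return [v for v in venues if v.get("state") == state]


# Made-up records for illustration only, not real Foursquare results.
venues = [
    {"name": "Venue A", "city": "Köln", "state": "Nordrhein-Westfalen"},
    {"name": "Venue B", "city": "Mainz", "state": "Rheinland-Pfalz"},
    {"name": "Venue C", "city": "Essen"},  # state missing in this record
]
nrw_venues = select_state(venues)
```

Records without a `state` field are dropped by this filter. If the raw data spell the state in several ways, the labels would need to be normalized before filtering.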

Create map of Nordrhein-Westfalen (NRW) using latitude and longitude values and add markers to map

Company (GmbH), Factory (Fabrik), Office