Having a look at the restaurant options in Terrassa
Photo by Helena Lopes on Pexels

After some months working on data science and data analysis, this is my first publication. It is also my capstone project for the full course. I would have preferred it to contain much more information, but this way I still have plenty of room for improvement!

Let's get started. First, let me show you where Terrassa and its neighborhoods are:

[Image: map of Terrassa and its neighborhoods]

Now that you know the city, let's talk about the project itself:

1. Introduction

This project aims to help anyone planning to open a new restaurant in Terrassa, the city where I live.

It does so in two ways:

  • Firstly, by showing the distribution of restaurants in Terrassa
  • Secondly, by showing the most common restaurant categories in each neighborhood

These two pieces of information can help to minimize geographic competition from other restaurants and to choose the restaurant's cuisine: would you rather join a new and increasingly popular trend, or open a traditional restaurant in a neighborhood without competition?

2. Data required

To achieve the goals described in the introduction, some datasets are required, mainly:

  • Geographic data of Terrassa, i.e. its location, its neighborhoods and boroughs
  • Latitude and longitude of the restaurants of Terrassa
  • The category of each restaurant, e.g. Italian food, sushi place, etc.

Folium is used to generate the maps, Nominatim to look up the location of Terrassa, and Foursquare to retrieve the venues in the desired neighborhoods.

The venues then have to be filtered to those whose category name contains the word “Restaurant”.
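The filtering step can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the venue records, the field names (`name`, `category`, `neighborhood`) and the helper `keep_restaurants` are all hypothetical, and the match is made on the category field, as the Discussion section describes.

```python
# Invented venue records; the field names are assumptions, not the Foursquare schema.
venues = [
    {"name": "Can Example", "category": "Mediterranean Restaurant", "neighborhood": "A"},
    {"name": "Forn Example", "category": "Bakery", "neighborhood": "A"},
    {"name": "La Pizza Example", "category": "Pizza Place", "neighborhood": "B"},
    {"name": "Sushi Example", "category": "Sushi Restaurant", "neighborhood": "B"},
]


def keep_restaurants(records, keyword="restaurant"):
    """Return the records whose category contains the keyword, ignoring case."""
    keyword = keyword.lower()
    return [r for r in records if keyword in r["category"].lower()]


restaurants = keep_restaurants(venues)
restaurant_names = [r["name"] for r in restaurants]
print(restaurant_names)
```

Note that a plain substring match like this one drops the “Pizza Place” entry, which is exactly the limitation discussed later in the article.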

K-means then clusters the neighborhoods according to their most common restaurant categories.

3. Methodology

Once all the required data is available in dataframe format, the analysis can be done. The methodology focuses on clustering the different neighborhoods of Terrassa. The first step was to place every neighborhood on the map.

After that, I retrieved all the venues available in the city, filtered them to those containing the word “restaurant”, and plotted them on a new map, color-coding them by cuisine.

I also wanted to cross-reference this information with the wealth of each neighborhood, based on the average price per square meter (sqm) of the flats for sale or rent, but I could not get access to the required API. (I should have run the analysis directly on New York City instead, but I had run out of Watson Studio hours by then.)

The next step is to cluster the neighborhoods with K-means, using the “cuisines” available in each one and its number of restaurants. This clustering offers different views to the interested person and can suggest where to open a restaurant and which kind could be profitable in each case.

This project is rather visual, relying almost completely on maps. Although it is simple, it is very easy to understand, in my opinion.
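As a rough sketch of this clustering step, the example below runs scikit-learn's `KMeans` on a tiny invented dataset. The neighborhood labels, the cuisine shares and the restaurant counts are made up for illustration, and scaling the count before clustering is my own assumption rather than something taken from the notebook. The toy data only supports two clusters, whereas the article works with five.

```python
import numpy as np
from sklearn.cluster import KMeans

# Invented example data: one row per neighborhood.
neighborhoods = ["A", "B", "C", "D"]

# Share of each cuisine among a neighborhood's restaurants.
# Columns: Mediterranean, Italian, Vegetarian (each row sums to 1).
cuisine_share = np.array([
    [0.8, 0.2, 0.0],
    [0.7, 0.3, 0.0],
    [0.1, 0.1, 0.8],
    [0.0, 0.2, 0.8],
])
restaurant_count = np.array([20, 18, 3, 4])

# Assumption: scale the count to [0, 1] so it does not dominate the shares.
count_scaled = restaurant_count / restaurant_count.max()
features = np.column_stack([cuisine_share, count_scaled])

# n_clusters=2 only suits this toy data; random_state makes the run repeatable.
model = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = model.fit_predict(features)

cluster_of = dict(zip(neighborhoods, labels.tolist()))
print(cluster_of)
```

The cluster numbers themselves are arbitrary; what matters is which neighborhoods end up sharing a label.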

4. Results

The analysis shows that there are 93 restaurants in Terrassa, spread across 15 unique categories. The restaurants are displayed on a map, with colored labels showing the restaurant category (e.g. Mediterranean food in green, Italian food in blue, etc.).
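One simple way to build such a color coding is a lookup from category to color. The sketch below is only an illustration: the exact category strings, the extra palette and the helper `assign_colors` are assumptions, and only the green and blue assignments come from the text. The resulting color could then be passed to a map marker, for instance through the `color` argument of `folium.CircleMarker`.

```python
from itertools import cycle

# Colors named in the article; the exact category strings are assumed.
FIXED_COLORS = {
    "Mediterranean Restaurant": "green",
    "Italian Restaurant": "blue",
}
# Hypothetical palette for every other category, reused cyclically if needed.
PALETTE = ["red", "purple", "orange", "gray", "black"]


def assign_colors(categories):
    """Map each distinct category to a color, in alphabetical order."""
    spare = cycle(PALETTE)
    colors = {}
    for category in sorted(set(categories)):
        if category in FIXED_COLORS:
            colors[category] = FIXED_COLORS[category]
        else:
            colors[category] = next(spare)
    return colors


categories = [
    "Italian Restaurant",
    "Sushi Restaurant",
    "Mediterranean Restaurant",
    "Sushi Restaurant",
    "Tapas Restaurant",
]
category_colors = assign_colors(categories)
print(category_colors)
```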

Further analysis showed the 3 most common venue categories in each neighborhood. With this dataframe I used K-means to cluster the different neighborhoods. These results help the decision maker choose where to open a new restaurant and which kind of cuisine it should offer. Basic questions that this model could answer are:

  1. Is the neighborhood overcrowded with restaurants? Is the area missing restaurants?
  2. Which are the most common restaurants in the neighborhood? Do I want to bring a new trend, or join the existing one?
  3. Is a specific cuisine missing in a neighborhood from a specific cluster?
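The ranking of the most common categories per neighborhood, mentioned before the list of questions, can be sketched with the standard library alone. The data and the helper `top_categories` are invented for illustration; the notebook itself works with dataframes, so this is an equivalent in plain Python rather than the original code.

```python
from collections import Counter, defaultdict

# Invented (neighborhood, category) pairs, one per restaurant.
rows = (
    [("A", "Mediterranean Restaurant")] * 4
    + [("A", "Italian Restaurant")] * 3
    + [("A", "Vegetarian / Vegan Restaurant")] * 2
    + [("A", "Sushi Restaurant")] * 1
    + [("B", "Spanish Restaurant")] * 2
    + [("B", "Mediterranean Restaurant")] * 1
)


def top_categories(pairs, n=3):
    """Return the n most frequent categories for each neighborhood."""
    counts = defaultdict(Counter)
    for neighborhood, category in pairs:
        counts[neighborhood][category] += 1
    return {
        hood: [category for category, _ in counter.most_common(n)]
        for hood, counter in counts.items()
    }


top3 = top_categories(rows)
for hood, categories in top3.items():
    print(hood, categories)
```

A neighborhood with fewer than three categories simply gets a shorter list, which is a useful hint in itself that the area has few restaurants.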

To visualize how the restaurants are distributed in Terrassa, a picture is worth a thousand words:

[Image: map of the restaurants in Terrassa]

You can guess where the city center is.

5. Discussion

The first two questions are quite obvious. In my opinion, the most useful insight comes from the third question:

3. Is a specific cuisine missing in a neighborhood from a specific cluster?

In clusters containing many neighborhoods, it is normal for the neighborhoods to share the cluster's most popular restaurant categories. However, it is quite likely that some neighborhoods share only 1 or 2 of them. A good opportunity can therefore be to open a restaurant with a cuisine that is common in the cluster but not yet popular in a specific neighborhood.
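This idea can be sketched as a small set comparison. The cluster contents, the threshold `min_neighborhoods` and the helper `cluster_gaps` are all hypothetical; the sketch only illustrates the reasoning and is not taken from the notebook.

```python
from collections import Counter

# Invented top categories for the neighborhoods of a single cluster.
cluster_top = {
    "A": ["Mediterranean", "Spanish", "Vegetarian"],
    "B": ["Mediterranean", "Spanish", "Italian"],
    "C": ["Mediterranean", "Vegetarian", "Spanish"],
}


def cluster_gaps(top_by_neighborhood, min_neighborhoods=2):
    """List, per neighborhood, the cuisines that are common in the cluster
    (present in at least min_neighborhoods top lists) but absent from its own."""
    presence = Counter(
        cuisine
        for cuisines in top_by_neighborhood.values()
        for cuisine in set(cuisines)
    )
    common = {c for c, n in presence.items() if n >= min_neighborhoods}
    return {
        hood: sorted(common - set(cuisines))
        for hood, cuisines in top_by_neighborhood.items()
    }


gaps = cluster_gaps(cluster_top)
print(gaps)
```

In this toy cluster, vegetarian food appears in the top lists of two neighborhoods but not in that of neighborhood B, so B would be the candidate location for that cuisine.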

These are the clusters in Terrassa:

[Image: map of the neighborhood clusters in Terrassa]

Looking at the results, it will not surprise anybody that Mediterranean and Spanish cuisine are among the most common kinds of restaurants. However, I was surprised that many neighborhoods had vegetarian or vegan options among their top 3, which is certainly a new trend in the city.

Cluster 1, frequent in the outskirts of the city

[Image: Cluster 1]

Cluster 2

[Image: Cluster 2]

Cluster 3

[Image: Cluster 3]

Cluster 4

[Image: Cluster 4]

Cluster 5, including the city center and the neighborhoods closest to it

[Image: Cluster 5]


I would also like to mention some flaws I noticed while coding this notebook.

Firstly, as Terrassa is a smaller city than New York or Toronto, the dataset is much smaller and the analysis is accordingly less robust. It might be that some neighborhoods do not have enough data online, which may distort the results.

In addition, the “venue category” field in Foursquare can be tricky, since I cannot know how all the restaurants are classified in the database. In the notebook I filter all the venues using the “restaurant” string. However, some entries are categorized as “pizza place”; these could also be included in the analysis, but in this case they are ignored.

Last but not least, the “luxury/quality” of the restaurant is also ignored in this analysis. Comparing the price per sqm between neighborhoods can give insight into how expensive the area is and therefore how expensive the restaurants can be. The quality, which is independent of how expensive the restaurant is, is in my opinion the hardest parameter to quantify.

6. Conclusion

The notebook can help decision makers visualize the different restaurant options in Terrassa, checking their location and style. It offers insight for deciding whether to join a rising trend of new restaurants in the area or, if preferred, to open a traditional restaurant in a place without competition.

I would like to continue improving this notebook by solving the problems mentioned in the discussion, in order to achieve better accuracy and more representative results.

