Building a Data Pipeline with the Stad Gent Parking API, Python, PostgreSQL, and Metabase

In this tutorial, I’ll walk you through how I used the Stad Gent Open Data API to fetch parking garage data with Python, store it in a PostgreSQL database, and visualize it using Metabase.

The system is designed to run on Hetzner Cloud servers with Ubuntu and includes automated data aggregation and cleanup to manage database storage efficiently. My background is in business intelligence, so building data pipelines and dashboards is what I do for a living, but I wanted to use a couple of tools that I don't often encounter at my current clients to keep up my skills in a fast-evolving market. I have left out the large pieces of Python and SQL code to improve readability; if you want the full technical specs, you can find them here in the Medium article I wrote about this hobby project.

Overview of the System

  1. Data Collection: Python scripts fetch parking data from the Stad Gent API every 5 minutes and store it in a PostgreSQL database.
  2. Data Storage: PostgreSQL tables are created to store raw and aggregated data.
  3. Data Aggregation: A script aggregates data hourly and deletes detailed data older than 2 days.
  4. Visualization: Metabase is installed on a separate server, configured to run over HTTPS, and used to create dashboards for real-time and historical parking data.

Step 1: Setting Up the Hetzner Cloud Servers

Create Two Servers:

  • Server 1: For PostgreSQL and Python scripts.
  • Server 2: For Metabase.
  • Use Ubuntu 22.04 LTS for both servers

I use the smallest server type for this, the CX22: 2 vCPUs, 4 GB RAM, and a 40 GB SSD for €3.29 per month. I also take regular snapshots of my servers in Hetzner to make sure I have a recent version to restore when something goes wrong, which it did while trying to set up HTTPS on the Metabase server.

Why two servers, and not just one?

Using two servers instead of one is a design choice that offers several advantages, particularly in terms of scalability, security, and performance. However, it’s not strictly necessary, and you can absolutely run everything on a single server if your requirements are modest. Let’s explore the reasoning behind using two servers and when it might make sense to consolidate everything onto one.

The primary reason for using two servers is the separation of concerns. By running PostgreSQL and the Python scripts on one server, you keep the data processing and storage layer isolated from the visualization layer, which is handled by Metabase on the second server. This separation makes it easier to manage resources, optimize performance, and troubleshoot issues. For example, if Metabase is under heavy load due to multiple users accessing dashboards, it won’t impact the performance of the database or the Python scripts fetching and processing data.

Performance optimization is another key factor. Both PostgreSQL and Metabase can be resource-intensive, especially under load. Running them on separate servers avoids contention for CPU, memory, and disk I/O. This ensures that the database can handle data ingestion and queries efficiently, while Metabase can serve dashboards without slowing down the system.

Security is also a significant consideration. By isolating Metabase on a separate server, you reduce the attack surface. If Metabase is exposed to the internet for dashboard access, keeping it isolated from the database server minimizes the risk of unauthorized access to your data. You can configure stricter firewall rules for the database server, allowing access only from the Metabase server and the Python script server.

Scalability is another advantage of using two servers. If your system grows—for example, if you add more data sources, users, or complex queries—having separate servers makes it easier to scale horizontally. You can add more resources to the database server if storage or query performance becomes a bottleneck, or scale Metabase independently if more users start accessing the dashboards.

Finally, maintenance and updates are simpler with a two-server setup. Updating or restarting one service, such as Metabase, won’t affect the other, such as PostgreSQL. This is particularly useful during maintenance or troubleshooting, as you can address issues on one server without disrupting the entire system.

That said, there are scenarios where using a single server makes sense. If your workload is light, with small amounts of data and few users, a single server can handle everything without issues. Cost is another factor; running two servers doubles your infrastructure expenses, so if you’re on a tight budget, consolidating everything onto one server can save money. Additionally, managing a single server is simpler than managing two, as you don’t need to worry about networking, firewalls, or inter-server communication.

If you decide to use a single server, you can adapt the setup by installing all services—PostgreSQL, Python, and Metabase—on the same Ubuntu server. Adjust firewall rules to allow access to Metabase from the internet while restricting PostgreSQL to localhost or specific IPs. Monitor resource usage closely to ensure the server can handle the combined workload. You can also use a reverse proxy like Nginx to handle HTTPS termination and route traffic to Metabase, simplifying the setup.

In conclusion, using two servers provides better performance, security, and scalability, but it comes at the cost of increased complexity and higher infrastructure expenses. If your workload is light or you’re just starting out, a single server is a perfectly valid and cost-effective choice. As your system grows, you can always migrate to a two-server setup later.

Step 2: Install and Configure PostgreSQL

1. Install PostgreSQL

2. Create a Database and User

3. Install pgAdmin4

4. Create Tables

You can create tables from the command line as well, but we installed pgAdmin4 to manage our databases more conveniently.

Use pgAdmin4 to connect to your PostgreSQL database and create the tables needed to store the data coming from the API.
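As a rough illustration, the tables could be created from Python as well. The table and column names below are my own guesses for this sketch, not the exact schema from the project; the connection string is a placeholder.

```python
RAW_TABLE_DDL = """
CREATE TABLE IF NOT EXISTS parking_raw (
    id SERIAL PRIMARY KEY,
    garage_name TEXT NOT NULL,
    total_capacity INTEGER,
    available_spaces INTEGER,
    latitude DOUBLE PRECISION,
    longitude DOUBLE PRECISION,
    fetched_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
"""

HOURLY_TABLE_DDL = """
CREATE TABLE IF NOT EXISTS parking_hourly (
    garage_name TEXT NOT NULL,
    hour_start TIMESTAMPTZ NOT NULL,
    avg_available NUMERIC,
    trend TEXT,
    PRIMARY KEY (garage_name, hour_start)
);
"""

def create_tables(dsn):
    """Run both DDL statements; dsn is a placeholder connection string."""
    import psycopg2  # imported lazily so the DDL above can be inspected without a DB
    with psycopg2.connect(dsn) as conn:
        with conn.cursor() as cur:
            cur.execute(RAW_TABLE_DDL)
            cur.execute(HOURLY_TABLE_DDL)
```

Running the DDL through pgAdmin4's query tool gives the same result; the script form just makes the schema reproducible.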

Step 3: Writing the Python Scripts

1. Fetch Data from Stad Gent API

Install Required Python Libraries on the database server

pip install requests psycopg2-binary

2. Python Script to Fetch and Store Data

I added extensive logging to the script to identify some of the fields that weren't being fetched correctly. I decided to comment out the fields I could not easily get to work, since I didn't really need them for the dashboard I wanted to build.
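A minimal sketch of such a fetch-and-store script could look like this. The dataset id is taken from the Stad Gent open data portal, but the exact field names, table name, and connection string are assumptions for illustration, not the project's actual code.

```python
# Assumed dataset id on data.stad.gent; verify against the portal before use.
API_URL = (
    "https://data.stad.gent/api/explore/v2.1/catalog/datasets/"
    "bezetting-parkeergarages-real-time/records?limit=20"
)

def parse_record(record):
    """Turn one API record into a row tuple; return None if fields are missing.

    The field names here are assumptions about the API's response shape.
    """
    try:
        return (
            record["name"],
            int(record["totalcapacity"]),
            int(record["availablecapacity"]),
        )
    except (KeyError, TypeError, ValueError):
        return None  # skip records that don't parse cleanly

def fetch_and_store(dsn):
    """Fetch the latest readings and insert them into a hypothetical raw table."""
    import requests   # installed earlier with pip
    import psycopg2
    data = requests.get(API_URL, timeout=10).json()
    rows = [r for r in (parse_record(rec) for rec in data.get("results", []))
            if r is not None]
    with psycopg2.connect(dsn) as conn:
        with conn.cursor() as cur:
            cur.executemany(
                "INSERT INTO parking_raw (garage_name, total_capacity, available_spaces) "
                "VALUES (%s, %s, %s)",
                rows,
            )
```

Keeping the parsing in its own small function makes it easy to log and skip the records that fail, which matches the commenting-out approach described above.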

The data in the API is refreshed every 5 minutes. To make sure I get the latest updates, I use a cron job to run this script every 5 minutes.

Room for improvement

One improvement would be to figure out why some of the fields are not loading correctly. Another has to do with the way I schedule this job. While writing this tutorial I found out I could also have used the Python library schedule and run the script as a service. This is certainly something I will look at the next time I build a hobby project like this.
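The run-as-a-service idea can be sketched with nothing but the standard library: a small loop that calls the fetch function on a fixed interval (the schedule library wraps the same pattern in a friendlier API). The `max_runs` parameter is only there to make the loop testable; a real service would leave it at None.

```python
import time

def run_periodically(job, interval_seconds, max_runs=None):
    """Call `job` every `interval_seconds`; `max_runs` limits iterations (None = run forever)."""
    runs = 0
    while max_runs is None or runs < max_runs:
        job()
        runs += 1
        if max_runs is not None and runs >= max_runs:
            break
        time.sleep(interval_seconds)
    return runs

# A service would do something like:
#   run_periodically(lambda: fetch_and_store(dsn), interval_seconds=300)
# wrapped in a systemd unit so it restarts on failure.
```

Compared to cron, this keeps the schedule next to the code and avoids a missed run when the machine's crontab is misconfigured, at the cost of having to supervise the process yourself.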

3. Aggregation Script

To make it possible to analyse past trends without having to keep all the data that is created every five minutes, I aggregate the data per hour. I added a trend as well: are people arriving or leaving?
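In the database this is a GROUP BY on the hour, but the core logic can be shown as a pure function: average the 5-minute readings for one garage and one hour, and derive the trend from how availability changed over that hour. The trend labels and the tuple shape are my own choices for this sketch.

```python
from datetime import datetime

def aggregate_hour(readings):
    """Aggregate one garage's 5-minute readings for a single hour.

    `readings` is a list of (timestamp, available_spaces) tuples.
    Returns (avg_available, trend): 'arriving' when availability dropped
    over the hour, 'leaving' when it rose, 'stable' otherwise.
    """
    if not readings:
        return None
    readings = sorted(readings)               # order by timestamp
    values = [available for _, available in readings]
    avg = sum(values) / len(values)
    delta = values[-1] - values[0]            # change over the hour
    if delta < 0:
        trend = "arriving"                    # fewer free spaces -> cars coming in
    elif delta > 0:
        trend = "leaving"                     # more free spaces -> cars going out
    else:
        trend = "stable"
    return avg, trend
```

For example, readings of 100, 80, and 60 free spaces within one hour average out to 80 with trend "arriving".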

4. Delete Script

To make sure the server doesn't fill up too quickly, I only keep the full-detail data for the last 48 hours.
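The delete script boils down to computing a cutoff timestamp and issuing one DELETE. The table and column names below are the same hypothetical ones as in the earlier sketches.

```python
from datetime import datetime, timedelta, timezone

def retention_cutoff(hours=48, now=None):
    """Timestamp before which detailed rows should be deleted."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - timedelta(hours=hours)

def delete_old_rows(dsn, hours=48):
    """Remove raw rows older than the retention window."""
    import psycopg2  # imported lazily so retention_cutoff stays testable without a DB
    with psycopg2.connect(dsn) as conn:
        with conn.cursor() as cur:
            cur.execute(
                "DELETE FROM parking_raw WHERE fetched_at < %s",
                (retention_cutoff(hours),),
            )
```

Passing the cutoff as a bound parameter rather than string-formatting it into the SQL keeps the query safe and lets PostgreSQL handle the timestamp comparison.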

Step 4: Schedule the scripts using cron jobs

I use cron to run these scripts every five minutes, every hour, and every day.
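The crontab could look like the fragment below (edit it with `crontab -e`); the script paths are placeholders, and the daily cleanup time is an arbitrary choice.

```
# every 5 minutes: fetch the latest parking data
*/5 * * * * /usr/bin/python3 /home/user/parking/fetch_parking.py
# every hour, on the hour: aggregate the last hour's readings
0 * * * * /usr/bin/python3 /home/user/parking/aggregate_hourly.py
# every day at 03:00: delete detailed data older than 48 hours
0 3 * * * /usr/bin/python3 /home/user/parking/delete_old_rows.py
```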

Step 5: Configure the Postgres database to allow connections from the Metabase server

Add the IP address 11.22.33.44/32 of the Metabase server to the pg_hba.conf file on the database server to allow access to the data. Restart the PostgreSQL service for this change to take effect.
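The pg_hba.conf entry could look like this; the file path depends on your PostgreSQL version, and the database and user scope (`all all` here) should be tightened to your actual database and role.

```
# /etc/postgresql/14/main/pg_hba.conf  (version in the path may differ)
# TYPE  DATABASE  USER  ADDRESS          METHOD
host    all       all   11.22.33.44/32   scram-sha-256
```

PostgreSQL also has to listen on a non-local interface for this to work: check the `listen_addresses` setting in postgresql.conf, then restart with `sudo systemctl restart postgresql`.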

Step 6: Install and Configure Metabase

1. Install Metabase

2. Configure HTTPS

3. Convert the certificate to a Java KeyStore (JKS)
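A sketch of the conversion, assuming a Let's Encrypt-style certificate; all file paths, the alias, and the password are placeholders.

```
# bundle the certificate and private key into a PKCS12 file
openssl pkcs12 -export -in fullchain.pem -inkey privkey.pem \
  -out metabase.p12 -name metabase -passout pass:changeme

# import the PKCS12 bundle into a Java KeyStore for Metabase
keytool -importkeystore -srckeystore metabase.p12 -srcstoretype PKCS12 \
  -srcstorepass changeme -destkeystore metabase.jks -deststorepass changeme
```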

4. Run Metabase as a Service

Create a systemd service file in this location using the nano text editor: sudo nano /etc/systemd/system/metabase.service
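A minimal unit file could look like the following; the install path, user, keystore location, and password are placeholders, and the `MB_JETTY_SSL_*` environment variables are how Metabase picks up the keystore for HTTPS.

```
# /etc/systemd/system/metabase.service
[Unit]
Description=Metabase
After=network.target

[Service]
User=metabase
WorkingDirectory=/opt/metabase
ExecStart=/usr/bin/java -jar /opt/metabase/metabase.jar
Environment=MB_JETTY_SSL=true
Environment=MB_JETTY_SSL_PORT=8443
Environment=MB_JETTY_SSL_KEYSTORE=/opt/metabase/metabase.jks
Environment=MB_JETTY_SSL_KEYSTORE_PASSWORD=changeme
Restart=on-failure

[Install]
WantedBy=multi-user.target
```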

After modifying the service file, you need to reload the systemd manager configuration to apply the changes. Then you can start and enable the service.

Step 7: Build Dashboards in Metabase

1. Connect Metabase to PostgreSQL:

2. Set the url to HTTPS:

3. Build the objects and visualizations you want to use in your dashboard:

For example the hourly trend:

Metabase allows you to put data on a map if you have the coordinates in your data, which we have:

Conclusion

Of course I skipped over some of the details here and there, but this should give you a solid understanding of what it takes to build your own Metabase dashboard connected to an open dataset API. Now I have a fully automated system that fetches, stores, aggregates, and visualizes parking data from Stad Gent. So when my friends ask me where to park their car before we head out, I simply send them this link to my dashboard, and we're good to go! I hope this inspires you to explore your own data. Happy building!