Coding as Literacy

Creating Value Through Machine Learning and Publicly Available Data Streams


Vahid Moosavi


16th February 2017


Outline

About Me

Why Data Driven Modeling and what is Coding as Literacy?

Some Experimental Applications


About Me

I am a systems engineer with an interest in "Scientific Modeling" as a generic process.


I see Modeling as "Alchemy", where models are ideally gold, made out of other, lower-value elements.


Theoretically interested in the idea of "Representation and Idealization" in scientific modeling

My PhD thesis: Pre-Specific Modeling


How do I position myself in academia?

And of course, this was the original vision of General Systems Theory back in the 1950s (Unity Through Diversity)


Why Data Driven Modeling and what is Coding as Literacy?


Data Driven Modeling causes an inversion from "knowing the answers" to "finding good questions": with data and Machine Learning we can answer any question, so what matters most is the question itself.

Coding as literacy means a new way of communicating and of looking at the world

Applications


I have tried many application domains

  • Supply Chains and manufacturing systems
  • Transportation Dynamics
  • Air Pollution Modeling
  • Water Flow Modeling
  • Real Estate Market
  • Urban Design
  • Economic Networks
  • Natural Language Modeling

In terms of techniques

  • In general, any ML algorithm, thanks to Scikit-learn and, more recently, TensorFlow
  • Mostly focused on Self Organizing Maps and Markov Chains
  • Web Applications: Flask, D3, Leaflet
  • Scraping (BS4, lxml) and APIs in general for data collection (a minimal scraping sketch follows this list)
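As a rough illustration of the scraping side mentioned above, here is a minimal sketch with requests and BeautifulSoup; the URL and the <h2> selector are hypothetical placeholders, not the crawlers actually used in these projects.

In [ ]:
# Minimal scraping sketch: fetch one page and pull out headline text.
# The URL and the <h2> selector are hypothetical placeholders.
import requests
from bs4 import BeautifulSoup

def fetch_headlines(url):
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "lxml")
    # Assumes headlines sit in <h2> tags; adjust the selector per site.
    return [h2.get_text(strip=True) for h2 in soup.find_all("h2")]

for title in fetch_headlines("https://example.com/news"):
    print(title)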

Today I will present 3 applications

1- Personalized Newspapers (Natural Language Processing)

2- Real Estate Market as a Media Business

3- Urban Morphology Meets Big Data and Deep Learning


1- Personalized Newspaper

The Basic Idea

  • Issue of Privacy in Social Media and Centralized Servers
  • Toward a network of Private Servers
  • Or: how to turn everything upside down

Main Requirements

  • A dedicated private server (we started with a Raspberry Pi, about 80 CHF in total; now mini PCs for around 100 CHF from Alibaba!)
  • Continuous collection of news through web crawling and APIs (e.g. around 10K items per night; be a super user for Twitter!)
  • Applying ML (Unsupervised) to cluster the news (mainly text); a minimal clustering sketch follows this list
  • Interactive ML (RL or Supervised) to learn what the user dislikes! (Twitter sees you following many accounts, but privately you unfollow them)
  • Or: how to turn everything upside down
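A minimal sketch of the unsupervised clustering step, using TF-IDF features and k-means from scikit-learn; the sample texts and the number of clusters are placeholders, and the actual pipeline may differ.

In [ ]:
# Sketch of the unsupervised clustering step: TF-IDF features + k-means.
# The documents and the number of clusters are illustrative only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

news_texts = [
    "central bank raises interest rates again",
    "new tram line opens in zurich",
    "football club wins the national championship",
    "stock markets react to the rate decision",
]

vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(news_texts)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)
print(labels)  # cluster id per news item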

Main Techniques and Frameworks


Live Demo (localhost)

2- Real Estate Market as a Media Business

The Basic Idea

  • Initially: to see how easy it is to predict real estate property values, BUT THERE WAS NO DATA AVAILABLE!
  • So, we started crawling publicly available data in Switzerland and Germany + open geodata
  • Prediction was easily possible with ML (94% accuracy (ARE) for rental price estimations in Switzerland).
  • This took us more into this application

Main Elements

  • Continuous Collection of online ads through web crawling
  • Geo-Coding: Google API (a minimal geocoding sketch follows this list)
  • Collecting any other Open Source Data: OSM, Geoadmin.ch
  • Applying ML (Unsupervised) to fill in missing fields
  • Automated Evaluation Model on a server
  • Interactive web application
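A hedged sketch of the geocoding step listed above, querying the Google Geocoding API with requests; the API key and the sample address are placeholders.

In [ ]:
# Sketch of the geocoding step via the Google Geocoding API.
# GOOGLE_API_KEY and the sample address are placeholders.
import requests

GOOGLE_API_KEY = "YOUR_API_KEY"

def geocode(address):
    resp = requests.get(
        "https://maps.googleapis.com/maps/api/geocode/json",
        params={"address": address, "key": GOOGLE_API_KEY},
        timeout=10,
    )
    resp.raise_for_status()
    results = resp.json().get("results", [])
    if not results:
        return None
    loc = results[0]["geometry"]["location"]
    return loc["lat"], loc["lng"]

print(geocode("Bahnhofstrasse 1, 8001 Zurich"))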

Main Techniques and Frameworks

  • Self Organizing Maps for multidimensional probabilistic models
  • Ensemble models such as Random Forests (Scikit-learn); a minimal regression sketch follows below
  • Flask as web framework
  • Leaflet for mapping application
  • Mapbox Layers
  • D3 for interactive visualizations

  • Online Version
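A minimal sketch of the ensemble-model step: a Random Forest regressor from scikit-learn fitted on synthetic placeholder data; the features and the error measure are illustrative, not the production model behind the 94% figure.

In [ ]:
# Sketch of the ensemble-model step: a Random Forest regressor on
# synthetic placeholder data standing in for the crawled ads.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)
# Placeholder features: living area, rooms, distance to station, year built
X = rng.rand(500, 4)
y = 1500 + 2000 * X[:, 0] + 300 * X[:, 1] + rng.normal(0, 100, 500)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# Mean absolute percentage error as a rough quality measure
pred = model.predict(X_test)
print("MAPE: {:.1%}".format(np.mean(np.abs(pred - y_test) / y_test)))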


3- Urban Morphology Meets Big Data and Deep Learning

The Basic Idea

  • Urban Morphology is the study of urban forms and patterns
  • But currently limited to theoretical and abstract models, which are based on limited observations
  • In terms of ML, urban planners work with "A-priori" Rules

  • What if we collect the data for thousands of cities and use ML to study them in a more data-driven way, then we can answer questions such as:
  • What are the clusters of emergent urban forms at the global scale?
  • What are the characteristics of each cluster?
  • Or predict quantitative features of cities (e.g. road pollution) by looking at their urban forms: streets, buildings, satellite images

Main Elements

  • Collecting images of street networks from OSM via styled maps from Mapbox (a minimal download sketch follows the video below)
  • It is also possible to get the geometric information of road networks and buildings; there are around 150M digital buildings in OSM format
In [1]:
from IPython.display import YouTubeVideo
YouTubeVideo('QFF5IezOdaU',width=700, height=600)
Out[1]:
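A hedged sketch of the image-collection step: downloading a styled street-network image from the Mapbox Static Images API (URL format as I assume it here). The access token, style id, and coordinates are placeholders, not the project's actual setup.

In [ ]:
# Sketch of the image-collection step: download a styled street-network
# image from the Mapbox Static Images API (URL format as assumed here).
# MAPBOX_TOKEN, the style id, and the coordinates are placeholders.
import requests

MAPBOX_TOKEN = "YOUR_MAPBOX_TOKEN"
STYLE = "mapbox/streets-v11"  # in practice a custom minimal style

def fetch_city_image(lon, lat, zoom=12, size=512, out_path="city.png"):
    url = ("https://api.mapbox.com/styles/v1/{style}/static/"
           "{lon},{lat},{zoom}/{size}x{size}").format(
        style=STYLE, lon=lon, lat=lat, zoom=zoom, size=size)
    resp = requests.get(url, params={"access_token": MAPBOX_TOKEN}, timeout=30)
    resp.raise_for_status()
    with open(out_path, "wb") as f:
        f.write(resp.content)

fetch_city_image(8.5417, 47.3769)  # Zurich city centre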

Main Elements

  • How to compare city maps?
  • To learn a dense representation of each city

Main Techniques and Frameworks

  • (Unsupervised) Convolutional Auto-Encoders to learn the dense representations of each city (a minimal sketch follows)
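A minimal sketch of such a convolutional auto-encoder in tf.keras, run on placeholder 128x128 grayscale images; the architecture, sizes, and framework choice are illustrative, not necessarily what was used in the project.

In [ ]:
# Sketch of a convolutional auto-encoder for 128x128 grayscale map
# images (tf.keras). Architecture, sizes, and framework are illustrative.
import numpy as np
from tensorflow.keras import layers, Model

inputs = layers.Input(shape=(128, 128, 1))
x = layers.Conv2D(16, 3, activation="relu", padding="same")(inputs)
x = layers.MaxPooling2D(2)(x)
x = layers.Conv2D(8, 3, activation="relu", padding="same")(x)
encoded = layers.MaxPooling2D(2)(x)  # 32x32x8 dense representation

x = layers.Conv2D(8, 3, activation="relu", padding="same")(encoded)
x = layers.UpSampling2D(2)(x)
x = layers.Conv2D(16, 3, activation="relu", padding="same")(x)
x = layers.UpSampling2D(2)(x)
decoded = layers.Conv2D(1, 3, activation="sigmoid", padding="same")(x)

autoencoder = Model(inputs, decoded)
encoder = Model(inputs, encoded)
autoencoder.compile(optimizer="adam", loss="binary_crossentropy")

# Placeholder data standing in for the collected city map images
images = np.random.rand(32, 128, 128, 1).astype("float32")
autoencoder.fit(images, images, epochs=1, batch_size=8, verbose=0)
embeddings = encoder.predict(images).reshape(len(images), -1)
print(embeddings.shape)  # one dense vector per city image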


Some Initial Results

  • Using KNN on the learned dense vectors for each city (a minimal sketch follows)
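A minimal sketch of the nearest-neighbour lookup with scikit-learn; the city names and embedding vectors below are random placeholders, not the learned representations.

In [ ]:
# Sketch: nearest-neighbour lookup on the learned dense vectors.
# City names and embeddings below are random placeholders.
import numpy as np
from sklearn.neighbors import NearestNeighbors

city_names = ["Zurich", "Geneva", "Munich", "Milan", "Lyon"]
embeddings = np.random.rand(len(city_names), 64)  # placeholder vectors

nn = NearestNeighbors(n_neighbors=3).fit(embeddings)
distances, indices = nn.kneighbors(embeddings[:1])  # query: the first city
print([city_names[i] for i in indices[0]])  # most similar cities first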

Online Version

  • Further, Self Organizing Maps for dimensionality reduction and visualization
In [2]:
from IPython.display import YouTubeVideo
YouTubeVideo('j0mrOhPyhRI',width=700, height=600)
Out[2]:

Conclusions and Questions to Discuss

Plug and Play Machine Learning

What happens to the classical notion of "Domain Expertise"?

Big Data and Machine Learning are causing an inversion from "knowing the answers" to "finding good questions": with data and Machine Learning we can answer any question, so what matters most is the question itself.

