Modeling With Big Data and Machine Learning


vahid moosavi


17th March 2017


Outline

What is Data Driven Modeling using Machine Learning?

Some Experimental Applications


In [9]:
import warnings
warnings.filterwarnings("ignore")
import datetime

import pandas as pd
import numpy as np
import matplotlib
from matplotlib import pyplot as plt
import sys
import sompylib.sompy as SOM
pd.__version__
%matplotlib inline

A bit of my background

Landscape of computational urban modeling from the vision of General Systems Theory


Life and death of computational (Urban) modeling concepts

Data From Google Ngram Service

What is happening?

A major shift:

Knowing vs. Learning (Theory-Driven vs. Data-Driven): beyond domain expertise

The primary role of Data in Data Driven Modeling

Machine Learning and Data Driven Modeling together have turned the classical notion of expertise from

“Having the Answers to the Known Questions”

to

“Learning to Ask Good Questions”

This means we expect that soon all the classical application domains will be "revisited" or "redefined"!


My personal research agenda:

How to become literate in these new fields and new ways of looking at the world?

How to dive into different application domains (with the classical vision of systems)?


However, these new fields are also rapidly evolving!

Then what is Machine Learning and Data Driven Modeling?


In Terms of Daily Use of ML: We need to be able to speak in this new "LANGUAGE"!


But from an abstract point of view:

The main elements of data driven modeling (in my opinion) are:

1- Main approaches to represent the "data" and objects of the study

2- The architecture of the "models" and their storage capacity in dealing with different levels of complexity


1- How to represent the objects via data?


1 - Representation of objects based on a priori given features

Main priors:

  • Independence of the objects
  • Notion of abstract object

Examples:

  • Representing a house based on its features
  • A student based on their grades
  • A country based on its GDP
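As a sketch of this first approach, each object is simply one row of a priori chosen features, independent of all the other rows (the houses and numbers below are made up for illustration):

```python
import pandas as pd

# Hypothetical feature-based representation: each row is one
# independent, abstract object described by a priori given features.
houses = pd.DataFrame({
    "area_m2":    [85, 120, 64],
    "rooms":      [3, 4, 2],
    "year_built": [1978, 2005, 1990],
})

# Each house *is* its feature vector; the other rows play no role
print(houses.loc[0].values)
```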


2- Representation of objects based on other concrete objects: Relational Representation

Main priors:

  • Dependence of the objects on each other

Examples:

  • A word in a text
  • A pixel of an image
  • A chemical element in a molecule
  • A specific ingredient in a food recipe
  • A building in its neighborhood
  • A Person in a social network

The second approach is essentially Data Driven!

Main reference:

  • Markov 1906
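The relational idea can be sketched with a first-order Markov chain over words, where a word is characterized only by the words around it rather than by a priori features; the toy corpus below is invented for illustration:

```python
import numpy as np

# Toy corpus: each word is represented only through its neighbours
words = "the cat sat on the mat the cat ran".split()
vocab = sorted(set(words))
idx = {w: i for i, w in enumerate(vocab)}

# First-order Markov transition counts (Markov, 1906),
# normalized into P(next word | current word)
T = np.zeros((len(vocab), len(vocab)))
for a, b in zip(words[:-1], words[1:]):
    T[idx[a], idx[b]] += 1
row_sums = T.sum(axis=1, keepdims=True)
T = np.divide(T, row_sums, out=np.zeros_like(T), where=row_sums > 0)

# In this corpus "the" is followed by "cat" two times out of three
print(T[idx["the"], idx["cat"]])
```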

2- Architecture of different modeling approaches and their Storage Capacity

1- Classical regression (Least Square Method started from 1800)

In [10]:
N = 500
noise = 0.1

x1 = np.random.rand(N) * 10 - 5   # uniform inputs in [-5, 5)
x1 = np.sort(x1)
x1 = x1[:, np.newaxis]

def f(x):
    x = x.ravel()
    return np.exp(-x ** 2) + 1. * np.exp(-(x - 1) ** 2)

y = f(x1) + np.random.normal(0.0, noise, N)
y = y[:, np.newaxis]

def polynomial_regr(degree=1):
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn import linear_model

    X_tr = x1[:].astype(float)
    y_tr = y[:].astype(float)

    poly = PolynomialFeatures(degree=degree)
    X_tr_ = poly.fit_transform(X_tr)

    regr = linear_model.LinearRegression()
    regr.fit(X_tr_, y_tr)

    # Predict and plot against the training data
    y_pred_tr = regr.predict(X_tr_)
    plt.plot(X_tr, y_tr, '.b', markersize=6, alpha=.4)
    plt.plot(X_tr, y_pred_tr, '-r', alpha=1)
In [11]:
from ipywidgets import interact, HTML, FloatSlider
interact(polynomial_regr,degree=(1,160,1));

This is where we are looking for "one optimum descriptive function"

I show this as a single abstract point

In [12]:
# plt.xlim(2,2)
# plt.ylim(2,2)
for i in range(150):
#     x = 4*np.random.rand()
    x = 2 + np.random.randn()*.04
    y = 2 + np.random.randn()*.04
#     y = 4*np.random.rand()
    plt.plot(x,y,'ow',markersize=16,alpha=.1);
    plt.plot(x,y,'or',markersize=6,alpha=.05);

plt.plot(2,2,'ow',markersize=16);
plt.plot(2,2,'or',markersize=6);
# plt.axis('equal');
plt.axis('off');
plt.title("space of potential functions and the optimum function");

This is a "central memory" and a "global prototype"

  • "Central memory" means that this single abstract point can generate a complete instance of the objects!

Unfortunately, it is not powerful enough, and conceptually it doesn't fit data driven modeling.

Usually, the aim is a simple, explainable generalization of the phenomenon under study!



What if we forget about the single descriptive model?

2- Centralized Memory and Distributed Prototypes and Multiple Overall Views

In [5]:
for i in range(5, 6, 1):
    for j in np.linspace(1, 10, num=10):
        plt.plot(i, j, 'ow', markersize=16)
        plt.plot(i, j, 'or', markersize=6)
plt.xlim(0, 10)
plt.ylim(0, 11)
plt.axis('off')

Higher capacity for dealing with complexity

Key issue: How to orchestrate these abstract points

  • By looking at different features of the objects
  • By focusing on different areas of the state space
  • Many powerful algorithms are in this category

This is also called "Manifold Learning"

Space of potentials

Most real-world applications can be solved with the second approach!

However, the capacity of the model grows only linearly as a function of the number of points!
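A minimal sketch of this second architecture, using k-means as an example of distributed prototypes, each summarizing one region of the state space (the synthetic data and cluster count are made up; this is not one of the talk's actual models):

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic 2-D data drawn around three hypothetical regions
np.random.seed(0)
X = np.vstack([np.random.randn(200, 2) + c
               for c in [(0, 0), (5, 5), (0, 5)]])

# Ten local prototypes instead of one global descriptive function;
# each prototype "stores" the objects in its own region
km = KMeans(n_clusters=10, n_init=10, random_state=0).fit(X)

# Capacity grows linearly: 10 prototypes, each a point in data space
print(km.cluster_centers_.shape)
```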



3- Distributed Memory and No Explicit Prototypes and No Overall View

In [6]:
from matplotlib.patches import Rectangle

currentAxis = plt.gca()
for i in np.linspace(1, 10, num=5):
    # draw a box ("layer") around each column of prototypes
    currentAxis.add_patch(Rectangle((i - .6, .5), 1.2, 10.1, fill=None, alpha=1))
    for j in np.linspace(1, 10, num=10):
        plt.plot(i, j, 'ow', markersize=16)
        plt.plot(i, j, 'or', markersize=6)

plt.xlim(0, 12)
plt.ylim(0, 11)
plt.axis('off')

Each point in each layer contributes only a new dimension

At this level, the combination of points in the vertical and horizontal directions creates the full object.

Therefore, the memory is distributed, and by adding a new layer the capacity of the model grows exponentially!

This is, in my opinion, the main feature of the so-called Deep Networks and Deep Learning methods

However, each node has no overall view!

Imagine if the current architecture of academia were like this.
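This combinatorial growth can be sketched in plain NumPy, assuming only random weights: each unit of a layer alone is one dimension, but the joint on/off code across units is combinatorial, so a few extra units multiply the number of distinct representations:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 10))   # random 10-d inputs

def n_activation_patterns(n_hidden):
    """Count distinct joint on/off codes of one random linear layer."""
    W = rng.normal(size=(10, n_hidden))
    return len({tuple(row) for row in (X @ W > 0)})

# A few units yield a handful of distinct codes; a few more yield
# thousands, because the inputs are encoded combinatorially
print(n_activation_patterns(3), n_activation_patterns(12))
```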


In [7]:
from matplotlib.patches import Rectangle

currentAxis = plt.gca()
for i in np.linspace(1, 10, num=5):
    currentAxis.add_patch(Rectangle((i - .6, .5), 1.2, 10.1, fill=None, alpha=1))
    for j in np.linspace(1, 10, num=10):
        if np.random.rand() > .3:
            plt.plot(i, j, 'ow', markersize=16)
            plt.plot(i, j, 'or', markersize=6)
        else:
            plt.plot(i, j, 'ow', markersize=16)
            plt.plot(i, j, 'og', markersize=6)
            currentAxis.arrow(i + .6, j, .7, 0, head_width=0.15,
                              head_length=.1, fc='g', ec='g')

plt.xlim(0, 12)
plt.ylim(0, 11)
plt.axis('off')
plt.title("combinatorial representation of objects")
print("Red nodes and arrows are activated prototypes for a specific input.")
Red nodes and arrows are activated prototypes for a specific input.


Hierarchical Representation Learning and Compositionality

  • In addition, in this architecture the models also learn a hierarchical representation of the objects, unlike the previous architectures, where the representation of the data is fixed in advance!
  • This is of great importance in more complex applications, where objects are composed of lower-level features:
  • A floorplan as the composition of its elements
  • A sentence as an ordered collection of words, and a word as an ordered collection of characters
  • A city as a composition of its road segments and building patterns
  • Biological systems, ...

An example of hierarchical representation learning using Convolutional Neural Nets
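A minimal sketch of the convolution step such nets are built from, in plain NumPy (the tiny image and edge filter are invented for illustration; real ConvNets learn the filters):

```python
import numpy as np

def conv2d(image, kernel):
    """Slide a small filter over the image (valid convolution)."""
    kh, kw = kernel.shape
    H, W = image.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.zeros((8, 8))
image[:, 4:] = 1.0                    # a vertical edge in a toy image
edge_filter = np.array([[-1., 1.]])   # responds to left-to-right steps

# The filter fires only along the edge; stacking such layers composes
# edges into motifs, motifs into parts, parts into whole objects
response = conv2d(image, edge_filter)
print(response.max(), response.min())
```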



Applications

  • In principle, Machine Learning and data offer a universal way to solve (dissolve) many real-world problems that are hard to address with classical expert-based or theoretical approaches
  • In principle, from the viewpoint of ML, different domain-specific problems become similar to each other

Some of the projects in different application domains:

  • Supply Chains and manufacturing systems
  • Transportation Dynamics
  • Air Pollution Modeling
  • Water Flow Modeling
  • Real Estate Market
  • Urban Modeling
  • Economic Networks
  • Natural Language Modeling
  • ...

1- Urban Morphology Meets Big Data and Deep Learning

Toward indexing all the cities in the world?

The Basic Idea

  • Urban Morphology is the study of urban forms and patterns
  • But it is currently limited to theoretical and abstract models, which are based on limited observations
  • In terms of ML, urban planners work with "a priori" rules

Or focusing on the evolution of city networks

  • What if we collect data for thousands of cities and use ML to study them in a more data-driven way? Then we can answer questions such as:
  • What are the clusters of emergent urban forms at the global scale?
  • What are the characteristics of each cluster?
  • Or predict quantitative features of cities (e.g. road pollution) by looking at their urban forms: streets, buildings, satellite images

Data Collection

  • Collecting images of street networks from OSM via styled maps from Mapbox
  • It is also possible to get the geometric information of road networks and buildings. There are around 150M digital buildings in OSM format!

location of 68K cities


In [8]:
from IPython.display import YouTubeVideo
YouTubeVideo('QFF5IezOdaU',width=700, height=600)
Out[8]:

Main Elements

  • How to compare city maps?
  • To learn a dense representation of each city

Main Techniques and Frameworks

  • Convolutional Auto-Encoders to learn the dense representations of each city

  • We used TensorFlow; the code will be here


Some initial results

  • Using KNN on the learned dense vectors for each city

  • Further, Self Organizing Maps for dimensionality reduction and visualization
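The KNN step can be sketched as follows, assuming the learned codes are just dense vectors; random vectors stand in here for the real city embeddings, and the sizes are only indicative:

```python
import numpy as np

# Stand-in for learned dense representations: e.g. 68K cities, 32-d codes
rng = np.random.default_rng(0)
city_vectors = rng.normal(size=(68000, 32))

def knn(query_idx, k=5):
    """Indices of the k cities closest to the query in embedding space."""
    d = np.linalg.norm(city_vectors - city_vectors[query_idx], axis=1)
    return np.argsort(d)[1:k + 1]   # skip the query city itself

neighbours = knn(0)
print(neighbours)
```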

GitHub Repo. of the project


Next Steps

  • (Supervised) Convolutional Nets to predict air pollution (not finished yet!)

2- Real Estate Market

The Basic Idea

  • Initially: to see how easy it is to predict real estate property values. BUT THERE WAS NO DATA AVAILABLE!
  • So, we started crawling publicly available data in Switzerland and Germany, plus open geodata
  • Prediction was easily possible with ML (94% accuracy (ARE) for rental price estimation in Switzerland).
  • This drew us further into this application
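A hedged sketch of this kind of prediction on synthetic rental data; the real pipeline used crawled ads and open geodata, and all features, coefficients, and sizes below are made up for illustration:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Invented rent-generating process: rent driven by area, rooms,
# and distance to the city centre, plus noise
rng = np.random.RandomState(0)
N = 2000
area = rng.uniform(30, 200, N)
rooms = rng.randint(1, 7, N)
dist_center_km = rng.uniform(0, 20, N)
rent = 15 * area + 200 * rooms - 40 * dist_center_km + rng.normal(0, 100, N)

X = np.column_stack([area, rooms, dist_center_km])
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, rent)

# In-sample fit only; an honest accuracy figure needs held-out data
print(round(rf.score(X, rent), 3))
```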

Main Elements

  • Continuous Collection of online ads through web crawling
  • Geo-Coding: Google API
  • Collecting any other Open Source Data: OSM, Geoadmin.ch
  • Applying ML (unsupervised) to fill in the empty fields
  • Automated Evaluation Model on a server
  • Interactive web application

Main Techniques and Frameworks

  • Self Organizing Maps for multidimensional probabilistic models
  • Ensemble models such as Random Forests (Scikit-learn)
  • Flask as web framework
  • Leaflet for mapping application
  • Mapbox Layers
  • D3 for interactive visualizations

Live Demo


Conclusions and Questions to Discuss

Plug and Play Machine Learning

What happens to the classical notion of "Domain Expertise"?

Big Data and Machine Learning are causing an inversion from "knowing the answers" to "finding good questions". With data and Machine Learning we can answer many questions, but what matters is the question itself.