
Spatial embedding

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by KnowledgeDevourer (talk | contribs) at 19:45, 19 January 2021 (Added image to graph data type). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Spatial embedding is one of the feature learning techniques used in spatial analysis, in which points, lines, polygons or other spatial data types representing geographic locations are mapped to vectors of real numbers. Conceptually, it involves a mathematical embedding from a space with many dimensions per geographic object to a continuous vector space of much lower dimension.

Embedded data types

Geographic data can take many forms: text, images, graphs, trajectories and polygons. Depending on the task, it may be necessary to combine multimodal data from different sources. The following sections describe examples of different data types and their uses.

Text

Geolocated posts on social media can be used to build a collection of documents bound to a given place, which can later be transformed into embedding vectors using word embedding techniques.
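The idea above can be illustrated with a minimal sketch that averages word vectors over all posts bound to a place. The word vectors, place names and the `embed_places` helper are hypothetical and chosen only for illustration; a real pipeline would load vectors produced by a trained word embedding model, and averaging is only one simple way of combining them.

```python
from collections import defaultdict

# Hypothetical 3-dimensional word vectors (assumption for illustration);
# real vectors would come from a trained word embedding model.
WORD_VECTORS = {
    "coffee": [0.9, 0.1, 0.0],
    "espresso": [0.8, 0.2, 0.0],
    "beach": [0.0, 0.1, 0.9],
    "surf": [0.1, 0.0, 0.8],
}

def embed_places(posts):
    """Average the vectors of known words in all posts bound to each place.

    Places whose posts contain no known word are left out of the result.
    """
    collected = defaultdict(list)
    for place, text in posts:
        for token in text.lower().split():
            if token in WORD_VECTORS:
                collected[place].append(WORD_VECTORS[token])
    return {
        place: [sum(component) / len(vectors) for component in zip(*vectors)]
        for place, vectors in collected.items()
    }

# Hypothetical geolocated posts as (place, text) pairs.
posts = [
    ("cafe_district", "Best espresso and coffee in town"),
    ("seafront", "Surf lessons at the beach"),
    ("seafront", "beach day"),
]
place_vectors = embed_places(posts)
```

Places with similar vocabulary end up with nearby vectors, which is the property later analysis steps would rely on.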

Image

Satellites and aircraft collect digital spatial data in the form of remotely sensed images, which can be used in machine learning. Such images are sometimes hard to analyse using basic image analysis methods, so convolutional neural networks can be used to obtain an embedding of the images bound to a given geographical object or region.

Example of a satellite image of the city of Seattle, acquired using remote sensing methods.

Point

A single point of interest (POI) can be assigned multiple features that can be used in machine learning. These could be, for example, demographic, transportation, meteorological or economic data. When embedding single points, it is common to consider the entire set of available points as nodes in a graph.
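One simple way to treat a set of points as nodes in a graph is to connect each point to its nearest neighbours. The sketch below assumes planar coordinates and Euclidean distance; the POI names, coordinates and the `knn_graph` helper are hypothetical, and other graph constructions (for example distance thresholds) are equally plausible.

```python
import math

def knn_graph(points, k):
    """Connect every point to its k nearest neighbours (Euclidean distance)."""
    graph = {}
    for name, (x, y) in points.items():
        others = [
            (math.hypot(x - ox, y - oy), other)
            for other, (ox, oy) in points.items()
            if other != name
        ]
        # Sort by distance (ties broken by name) and keep the k closest.
        graph[name] = [other for _, other in sorted(others)[:k]]
    return graph

# Hypothetical POIs with planar coordinates, e.g. in a projected
# coordinate system (assumption for illustration).
pois = {
    "bakery": (0.0, 0.0),
    "school": (1.0, 0.0),
    "station": (5.0, 0.0),
    "museum": (6.0, 1.0),
}
poi_graph = knn_graph(pois, k=1)
```

The resulting adjacency lists can then be passed to a graph embedding method together with the features assigned to each point.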

Example of a map of points of interest from OpenPOIMap.

Line / Multiline

Motion trajectories, among other things, are represented as lines (multilines). Individual trajectories are embedded taking into account travel time, distances and the features of points visited along the way. Embedding trajectories can improve performance on tasks such as clustering and categorization.
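As a rough illustration of the quantities mentioned above, the following sketch computes the total distance and travel time of one trajectory. It assumes a trajectory is a time-ordered list of (timestamp, latitude, longitude) fixes and uses the haversine great-circle distance; the sample coordinates and the `trajectory_features` helper are hypothetical. These are hand-crafted input features, not a learned embedding.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def trajectory_features(fixes):
    """Return [total distance in km, travel time in minutes, number of fixes]."""
    distance = sum(
        haversine_km(a[1], a[2], b[1], b[2]) for a, b in zip(fixes, fixes[1:])
    )
    minutes = (fixes[-1][0] - fixes[0][0]).total_seconds() / 60
    return [distance, minutes, len(fixes)]

# Hypothetical GPS fixes (assumption for illustration).
fixes = [
    (datetime(2021, 1, 19, 8, 0), 39.90, 116.40),
    (datetime(2021, 1, 19, 8, 10), 39.91, 116.42),
    (datetime(2021, 1, 19, 8, 25), 39.93, 116.45),
]
features = trajectory_features(fixes)
```

Features of the points visited along the way could be appended to the same vector before it is passed to an embedding model.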

Example of mobility trajectories from the GeoLife dataset (Beijing, China), plotted on a black and white map of the city.

Polygon

The geographic areas analyzed in machine learning are defined either by administrative boundaries or by a top-down division into grids of regular shapes, such as rectangles. Both types are represented as polygons and, like points, can be assigned demographic, transportation or economic features. A polygon can also have features related to the size or shape of the area it represents.
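The size and shape features mentioned above can be sketched as follows for a simple polygon given in planar coordinates. The area uses the shoelace formula, and compactness is measured here with the Polsby-Popper score 4πA/P², which is one possible shape descriptor among many. The `grid_cell` helper shows a top-down division into square cells; both function names and the sample values are assumptions for illustration.

```python
import math

def polygon_features(vertices):
    """Return [area, perimeter, compactness] for a simple planar polygon.

    `vertices` is a list of (x, y) tuples in order, without repeating
    the first vertex at the end.
    """
    edges = list(zip(vertices, vertices[1:] + vertices[:1]))
    # Shoelace formula for the area of a simple polygon.
    area = abs(sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in edges)) / 2
    perimeter = sum(math.hypot(x2 - x1, y2 - y1) for (x1, y1), (x2, y2) in edges)
    # Polsby-Popper score: 1.0 for a circle, smaller for elongated shapes.
    compactness = 4 * math.pi * area / perimeter ** 2
    return [area, perimeter, compactness]

def grid_cell(x, y, cell_size):
    """Index (column, row) of the square grid cell containing the point (x, y)."""
    return (math.floor(x / cell_size), math.floor(y / cell_size))

square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
square_features = polygon_features(square)
cell = grid_cell(2.5, -0.5, cell_size=1.0)
```

Demographic or economic attributes of a region can be appended to these geometric features to form the input of an embedding model.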

Example of a regular hexagonal tiling (19 hexagons) used to divide the San Francisco Bay Area using Uber's H3 library.
Map of San Francisco administrative districts.

Graph

One example domain where a graph representation is used is the street layout of a city, where vertices can be intersections and edges can be roads. The vertices can also be destination points, such as public transport stops or important points in the city, with edges representing the flow between them. Embedding whole graphs or single vertices can improve the accuracy of analysis methods in which the geographical domain under study can be represented as a network.
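To make the idea of embedding single vertices concrete, the sketch below represents each vertex by its hop distances to a few chosen landmark vertices, computed with breadth-first search. This is a deliberately simple stand-in and not one of the learned graph embedding methods; the toy network, the landmark choice and the function names are hypothetical, and the graph is assumed to be connected and undirected.

```python
from collections import deque

def hop_distances(graph, source):
    """Breadth-first search: number of edges from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        vertex = queue.popleft()
        for neighbour in graph[vertex]:
            if neighbour not in dist:
                dist[neighbour] = dist[vertex] + 1
                queue.append(neighbour)
    return dist

def landmark_embedding(graph, landmarks):
    """Represent each vertex by its hop distances to the chosen landmarks.

    Assumes every vertex is reachable from every landmark.
    """
    per_landmark = [hop_distances(graph, landmark) for landmark in landmarks]
    return {vertex: [dist[vertex] for dist in per_landmark] for vertex in graph}

# Hypothetical network of five stops with one branch at stop C
# (assumption for illustration).
metro = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D", "E"],
    "D": ["C"],
    "E": ["C"],
}
vectors = landmark_embedding(metro, landmarks=["A", "D"])
```

Vertices that occupy similar positions in the network receive similar vectors, which is the property that learned vertex embeddings also aim for.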

Example of a city network: the Rennes Metro (French: Métro de Rennes). In this example, metro stops are vertices and the tracks between them are edges.

Usage

Temporal aspect

Some of the analyzed data has a timestamp associated with it. In some analyses this information is omitted; in others it is used to divide the data set into groups. The most common divisions separate weekdays from weekends or split the data by hour of the day. This is particularly important in the analysis of mobility data, because mobility characteristics differ considerably between weekdays and weekends and between times of day. Division into longer periods, for example individual months, can be used in the analysis of tourism in a given region. To take such a split into account, embedding methods either handle the timestamp explicitly or develop separate versions of the model for different subgroups of the analyzed set.
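The division described above can be sketched as a grouping step applied before any model is trained. The example assumes records are (timestamp, payload) pairs and groups them by day type and hour of the day; the record contents and function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

def time_group(timestamp):
    """Label a timestamp as (weekday or weekend, hour of the day)."""
    # datetime.weekday() returns 0 for Monday through 6 for Sunday.
    day_type = "weekend" if timestamp.weekday() >= 5 else "weekday"
    return (day_type, timestamp.hour)

def split_by_time(records):
    """Group record payloads by the time group of their timestamp."""
    groups = defaultdict(list)
    for timestamp, payload in records:
        groups[time_group(timestamp)].append(payload)
    return dict(groups)

# Hypothetical mobility records (assumption for illustration).
records = [
    (datetime(2021, 1, 18, 8, 15), "trip-1"),  # Monday morning
    (datetime(2021, 1, 18, 8, 50), "trip-2"),  # Monday morning
    (datetime(2021, 1, 23, 14, 5), "trip-3"),  # Saturday afternoon
]
groups = split_by_time(records)
```

Each group could then be used to train a separate version of a model, or the group label could be supplied to a single model as an additional input.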

