Pipelining Business Data for Commercial Real Estate

Feeding a Fortune 500 REST API

PROJECT GOALS

A Fortune 500 commercial real estate services company was building a service-oriented web application platform in Django for business market data analysis. The platform needed large quantities of data from multiple internal data silos to be ingested, processed, and loaded into its API databases efficiently and programmatically.

The source data lived in less-than-desirable conditions, including:

  • Very unkempt PeopleSoft databases
  • Excel spreadsheets on an FTP site
  • Tab-delimited files in email inboxes

Further, the data, while related, was maintained by entirely different business units in the organization and was therefore inconsistent. Records would need unit transformations and extensive mappings, and relationships between them would have to be built in the final dataset.
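
As a rough illustration of the kind of unit transformation and mapping described above, here is a minimal Python sketch. The field names, the alias table, and the Listing schema are all hypothetical; the actual source schemas and mapping rules are not described in this case study.

```python
from dataclasses import dataclass

# Hypothetical lookup data; the real mappings are not part of this write-up.
SQM_PER_SQFT = 0.09290304
MARKET_ALIASES = {"NYC": "New York", "N.Y.": "New York", "SF": "San Francisco"}


@dataclass(frozen=True)
class Listing:
    """Assumed shared schema that every source row is mapped onto."""
    source_id: str
    market: str
    area_sqft: float


def normalize(raw: dict) -> Listing:
    """Map one raw source row onto the shared schema."""
    area = float(raw["area"])
    # Convert metric areas so every record ends up in square feet.
    if raw.get("area_unit", "sqft").lower() in ("sqm", "m2"):
        area = area / SQM_PER_SQFT
    market = raw["market"].strip()
    return Listing(
        source_id=str(raw["id"]).strip(),
        # Fall back to the cleaned source value when no alias is known.
        market=MARKET_ALIASES.get(market, market),
        area_sqft=round(area, 2),
    )
```

Keeping each rule in a small pure function like this makes it easy to test a business unit's quirks in isolation before records are related to one another.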

Finally, much of the data was sensitive and confidential to the organization, so bulk pushes to the cloud were a non-starter. Relevant information would have to be extracted explicitly.


OUR SOLUTION

Lofty Labs built a distributed data pipeline using Python and Celery.

Because the target application was written in Python and Django, building the pipeline in Python let it reuse the application's existing validation and APIs, which was a massive boost to time-to-market.

ETL Architecture

The pipeline takes advantage of a host of optimizations, including smart batch processing, asynchronous task grouping, and post-processing jobs that cleanly and idempotently move data into the cloud.
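
The general shape of batching, grouped asynchronous work, and an idempotent load step can be sketched with the Python standard library alone. This is not the production pipeline: a thread pool stands in for Celery workers (in a Celery deployment each batch would more likely be a task in a group), an in-memory SQLite table stands in for the target API databases, and every function and table name below is hypothetical.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def make_store():
    """Create an in-memory stand-in for the target store (assumption)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE listings (id TEXT PRIMARY KEY, market TEXT)")
    return conn


def batched(rows, size):
    """Yield lists of at most `size` rows."""
    it = iter(rows)
    while chunk := list(islice(it, size)):
        yield chunk


def transform_batch(batch):
    """Stand-in for a per-batch worker task."""
    return [(r["id"], r["market"].strip().title()) for r in batch]


def load(conn, rows):
    """Idempotent load: rows are keyed on id, so a re-run overwrites
    existing rows instead of duplicating them."""
    conn.executemany(
        "INSERT OR REPLACE INTO listings (id, market) VALUES (?, ?)", rows
    )
    conn.commit()


def run_pipeline(conn, rows, batch_size=2):
    """Transform batches concurrently, then load each result in order."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        for transformed in pool.map(transform_batch, batched(rows, batch_size)):
            # Loading happens on the calling thread, one batch at a time.
            load(conn, transformed)
```

Because the load is keyed on a stable identifier, a failed or repeated run can simply be retried: running the same input twice leaves the table in the same state as running it once.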


ENGAGEMENT RESULTS

The organization now has a pipeline capable of moving and manipulating tens of millions of records per day into its market statistics application APIs, powering multiple dashboards and analysis applications.

This pipeline is supported by an extensible API that can be augmented to include many data sources and many new target stores, making it a potential standard for pipelining live data across the entire organization.

CBRE Brokers

Now let's work on your project.