Op-Ed: From Big Data to Humanitarian In The Loop Algorithms

This article was written by Miguel Luengo Oroz, Chief Data Scientist at United Nations Global Pulse, an innovation initiative of the Executive Office of the United Nations Secretary-General with a mission to harness Big Data and Artificial Intelligence safely and responsibly as a public good. Here, Luengo Oroz describes an ongoing project which focuses on understanding how social media data can shed light on host communities' perceptions of refugees and migrants fleeing conflict-affected areas across international borders. This op-ed was initially published in the UNHCR Year in Review 2017. Views are the author's own.

From Big Data to Humanitarian 'In-The-Loop' Algorithms
By Miguel Luengo Oroz

The Data Revolution is no longer a new topic but a reality trying to catch up with the expectations it has generated. The private sector is investing billions in new start-ups and technology companies that can ingest the vast amounts of data generated by citizens and which use artificial intelligence algorithms to predict when, how and what people are more likely to buy. In contrast, humanitarian organisations today have just begun exploiting the potential of big data to improve decision-making. Measuring the impact of these data-driven decisions will help make the case for further investment in big data innovations. Once humanitarian practitioners understand the return on investment of big data innovations, we can start measuring the costs (financial and human) of not using these data, and we can begin to streamline scaling and adoption mechanisms.

One of the factors contributing to the slow institutional uptake of big data and analytics within the humanitarian sector is a lack of knowledge and capacity to apply these instruments in operational settings. In general, humanitarian and data experts do not speak the same language; they do not share a common vocabulary or context, and often cannot align their goals. This challenge is not a new one, and for me it has become a sort of “déjà vu.” Fifteen years ago I started working in developmental biology, where AI and data experts were helping to “revolutionise” the field the same way data scientists are trying to impact sustainable development and humanitarian efforts today. New microscopes taking high-resolution images of tissues and organs were viewed the same way that satellite imagery showing the impact of, and recovery from, natural disasters is viewed today. Similarly, just as fluorescent markers allowed the tracking of millions of cells migrating in the body, today we can track the movements of people fleeing conflict using aggregated mobile phone data. It took years for the field to mature while a new generation of researchers, technicians and biologists mutated into multidisciplinary profiles. The same is true of humanitarian organisations, which need to create hybrid profiles, i.e. data translators who both understand operational humanitarian contexts and have data intuition. Such translators know what can and cannot be done with data, and how to interpret and visualise data and algorithms to provide information for real impact.

At the beginning of this year, UN Global Pulse worked with UNHCR on a project to use real-time information on human perceptions to identify opportunities that can inform the organisation’s efforts on the ground and, more broadly, its humanitarian strategy. The project combined UNHCR’s expertise in the field of humanitarian action with UN Global Pulse’s years of innovation work leveraging big data for social good, to understand how social media data can shed light on host communities’ perceptions of refugees and migrants fleeing conflict-affected areas across international borders.

Using new data for insights into humanitarian contexts is a multi-step process. Before we can test any innovation project in an ongoing emergency, we need to select a realistic retrospective scenario, or a simulation, to understand the value of the data. This is exactly what we did together with UNHCR, where we explored the viability and validity of Twitter data in the Europe Refugee Emergency crisis. Our goal was to see how we can bring more data-driven evidence into decision-making processes and advocacy efforts, particularly to help UNHCR develop an institutional policy against xenophobia, discrimination and racism towards migrants and refugees. For that purpose, we partnered with Crimson Hexagon, an analytics tool provider, and used their tools to access and analyse social media posts. The findings of the exploration can be accessed in the paper “Social Media and Forced Displacement: Big Data Analytics & Machine Learning.” The project has now entered a second phase, in which the aim is to create a real-time situation awareness tool. Doing so will require finding the right balance when introducing a new approach into existing workflows and operations, respecting the unique strains on staff and responders during an emergency. The co-creation of prototypes with users on the ground is key to generating useful tools. This is why identifying the right partner, with the right complementary skills, is important.

Once you have created the right team and identified the right questions, the next step is data access and analysis. From UN Global Pulse’s experience working with many sources of data, from social media to radio feeds, mobile surveys, vessel tracks, postal traffic and so on, we have learned that clear and proven algorithms and analysis methodologies are crucial to distilling insights from raw data. There is no silver bullet, and recent hype oversimplifies what can and cannot be done with big data and artificial intelligence. Data characteristics such as sampling, demographics, completeness and inherent bias differ from one source to another, hence analysis must always be put into context sooner rather than later.

When talking about machine learning and the new neural network architectures that have revolutionised AI in the past few years (aka deep learning), it is important to remember that the machine will be as biased as the data that is used to train it. Though current real-world applications are mostly limited to internet business, digital marketing, playing board games or self-driving cars, there is a wealth of opportunities for AI methods to perform tasks where certain patterns are repeated. One of the critical issues is the need for ethical principles that can govern how artificial intelligence methods are developed and used, and how and to what extent AI should be regulated. The use of autonomous weapons or viruses targeted at individuals with a particular trait in their DNA are clear examples of data-driven threats. We also need to develop privacy protection principles on the use of data and agree on frameworks for the way in which these data are processed by algorithms. The principles of responsibility, explainability, accuracy, auditability, and fairness can guide how algorithms and AI programmes work. And although one size won’t fit all, especially in humanitarian situations, we can ask what expectations we should have in critical humanitarian scenarios where the well-being of vulnerable populations is at stake. Certainly, the benefits will depend on the nature of the crisis (a medical emergency is not the same as a natural disaster or a conflict-affected area), as will the potential risks and harms. While in certain situations the harm comes from not using the available data, in others, insights distilled from these data could be used to target populations and cause more harm than good.

So what will the future of big data analysis and AI bring for the humanitarian field? In my view, we should imagine a future where we have understood how to augment (and not replace) the human condition by leveraging technology. Data-driven benefits can certainly help reduce inequality. This will require a new research agenda where scientists and technology companies work to solve problems that apply to a wider range of social groups and that include the 17 global goals we have vowed to achieve by 2030. To serve humanitarian practitioners, the current deep learning revolution should pay increased attention to methodologies that can work in data-scarce environments, that can learn quickly from few examples (e.g. “one-shot learning”) and in unknown crisis scenarios, and that are able to work with incomplete or missing data.

In humanitarian contexts, we could consider an extension of the “society-in-the-loop” algorithm concept (embedding the general will into an algorithmic social contract), where both humanitarian responders and affected populations understand and oversee the algorithmic decision-making that affects them. Before 2030, technology should allow us to know everything from everyone to ensure no one is left behind. For example, there will be nanosatellites imaging every corner of the earth, allowing us to generate almost immediate insights into humanitarian crises. Progress will depend only on our actions and political will. What I also foresee is a not too distant future where data and AI can be used to empower citizens and affected communities in humanitarian crises. The digital revolution can help refugees protect their rights and their identities and even create jobs. Imagine a future where refugees could be granted digital asylum in other countries, for which they can do digital work and contribute to the growth of that economy. From both public and private sector perspectives, we are living through a unique moment in history with regard to shaping how algorithms and AI will impact society. What we need to make sure is that the data we produce is ultimately used to benefit all of us.

Read the UNHCR Year in Review 2017 at: http://www.unhcr.org/innovation/year-review-2017/.

(Photo courtesy of UN Global Pulse)
