11.04.2018

Polish researchers analyse Big Data and predict election results

Photo: Fotolia

Analysis of publicly available online data - blogs, internet forums, articles - makes it possible to predict the results of democratic elections more accurately than traditional polls, according to research conducted by scientists from the University of Warsaw.

There are now over 27 million internet users in Poland alone. They post huge amounts of comments, blog articles, files and documents every day. If you know how to search these data for specific information, you get an unprecedented insight into what people think, what their views are, what is important to them, and what they consider less important.

Analysing publicly available online content makes it possible, for example, to predict the results of parliamentary and presidential elections a few days in advance and with greater accuracy than traditional polling offers. This has been demonstrated by scientists from the Faculty of Journalism, Information and Book Studies of the University of Warsaw. Researchers are also trying to predict economic trends or directions of technology development by analysing Big Data.

"Our way of finding out what people think is completely legal" - comments Dr. Wiesław Cetera, who participates in the research project. He emphasizes that all the data used in research are available to the public, anyone can access it. Researchers only collect and "process" this information.

"Internet search engines display less than 0.1 percent of digital information available online, the rest is overlooked for various reasons" - says the research project leader Prof. Włodzimierz Gogołek. That is why his team uses their own bots - programs that scour the predefined information sources and search for specific words. "Before the 2015 presidential election, we sent bots to all Polish information sources that wrote about the election. This included newspapers, books, forums, blogs, public posts on Facebook" - the researcher says. The bots searched for pages that contained words related to the election, for example: "election" "Duda" and "Komorowski."

In the collected texts - after processing - researchers searched for so-called sentiments. Simply put, these are emotions associated with keywords. Positive sentiments included the words "amaze", "goal", "like", "sure", "win". Negative sentiments were, for example, "guilty", "destruction", "fall". Researchers checked which sentiments surrounded the words "Duda" and "Komorowski", and calculated the support for each candidate on this basis. "We knew who would win one week before the election. Our results were more accurate than those provided by the Public Opinion Research Center" - brags Prof. Gogołek. The system was tested during the parliamentary elections in 2011 and in 2015, and the presidential election in 2015.
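
A rough Python sketch of how sentiment words surrounding a candidate's name might be counted is given below. The word lists are the examples quoted above. The five-word window, the scoring rule (positive hits minus negative hits) and all function names are assumptions; the article does not say how the researchers converted sentiments into support figures, so the sketch stops at a raw score.

```python
import re
from collections import Counter

# Example sentiment words quoted in the article (the full lists are not given).
POSITIVE = {"amaze", "goal", "like", "sure", "win"}
NEGATIVE = {"guilty", "destruction", "fall"}


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def sentiment_near(tokens, target, window=5):
    """Positive minus negative word hits within `window` tokens of each mention of `target`.

    The window size is an assumed parameter, not a value from the article.
    """
    target = target.lower()
    score = 0
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        start, end = max(0, i - window), i + window + 1
        for neighbour in tokens[start:end]:
            if neighbour in POSITIVE:
                score += 1
            elif neighbour in NEGATIVE:
                score -= 1
    return score


def score_candidates(texts, candidates, window=5):
    """Sum the net sentiment around each candidate's name over all texts."""
    totals = Counter()
    for text in texts:
        tokens = tokenize(text)
        for name in candidates:
            totals[name] += sentiment_near(tokens, name, window)
    return dict(totals)
```

For two made-up sentences, `score_candidates(["I am sure Duda will win", "Komorowski is guilty of the fall"], ["Duda", "Komorowski"])` gives a score of 2 for "Duda" and -2 for "Komorowski".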

Dr. Wiesław Cetera explains that polling centres use samples of 1-2 thousand people. The data sets used in Big Data analysis are much, much larger. "The sample is so large that it cannot be achieved in traditional polls, and we know that the greater the sample, the more accurate the results. Even if the sample contains low-value opinions" - says the scientist. He adds that even if individual parties hired trolls (people paid to write comments) before the vote, this did not have a major impact on the results of the researchers' analysis.
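
The link between sample size and accuracy can be illustrated with the textbook margin-of-error formula for a proportion. The sketch below assumes a simple random sample and a 95 percent confidence level (z = 1.96), assumptions that come from standard polling practice and not from the article. The sample size of 100,000 is a hypothetical figure chosen only to show the trend.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate margin of error for a proportion `p` estimated from `n` random responses.

    Assumes simple random sampling; p=0.5 is the worst case, z=1.96 is the 95% level.
    """
    return z * math.sqrt(p * (1 - p) / n)


# 1,000 and 2,000 match the poll sizes mentioned in the text; 100,000 is hypothetical.
for n in (1000, 2000, 100000):
    print(f"n={n}: +/-{100 * margin_of_error(n):.1f} percentage points")
```

Under these assumptions the margin shrinks with the square root of the sample size: roughly 3.1 percentage points for 1,000 respondents, 2.2 for 2,000 and 0.3 for 100,000.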

Predicting election results is just one example of how Big Data can be used. Researchers from the University of Warsaw also help experts from the National Centre for Research and Development identify the directions of innovative activities that are worth investing in.

Prof. Gogołek says that one of the tasks his team received from the National Centre for Research and Development was to attempt to determine the development prospects of cloud computing - processing information in the cloud. "If there is money for cloud computing, then we can tell which direction is worth taking, and which is a dead end" - he says.

The researchers analysed huge data sets, including the results of public tenders. "We started to check who is interested in cloud computing in Poland" - says Prof. Gogołek. In these studies they also analysed sentiments - emotions that accompanied information related to cloud computing. According to the researcher's summary of the results, large companies with stable in-house IT capabilities are reluctant to use cloud computing solutions and consider them too expensive. Interest in this type of solution can be observed among smaller companies, mainly start-ups, and large companies that have not invested in their own IT capabilities.

Wiesław Cetera had another idea for Big Data analysis. He analysed the occurrence of words related to terrorism on the website of the Arabic TV station Al Jazeera. "If terrorism was not on the news for a few days, a new terrorist attack would follow" - he says. In addition, according to the researcher, the occurrences of words related to terrorism were associated with... crude oil prices on the New York Stock Exchange. Whether or not terrorism was in the news influenced oil prices to some extent.

In turn, Prof. Gogołek's students used Big Data analysis to try to predict the stock prices of Polish listed companies.

"We are not reinventing the wheel. Such systems of analysis of large information resources have existed for many years and have been used mainly by economists, especially bankers. We show further examples of the use of these solutions, for example in the humanities research, previously dominated by qualitative studies" - concludes Prof. Gogołek.

PAP - Science in Poland

Author: Ludwika Tomala

lt/ ekr/ kap/

tr. RL

Copyright © Foundation PAP 2020