Breakthroughs from Research #3

Author: Luigi Muzii.
Submitted by dorogayaanna, 07.09.2016.


In the last few years, the Internet and data have been the engines of change, affecting global communications in every area, including the translation industry.

Big data and the IoT

A few weeks ago, at the International Consumer Electronics Show in Las Vegas, 2015 was designated the year of connected devices: from toothbrushes that can schedule check-ups with dentists and yoga mats that can analyze āsana in real time, to collar-mounted trackers that help owners locate their runaway pets.

This is the Internet of Things (IoT): everyday objects with integrated network connectivity, which Gartner predicts will number over 25 billion by 2020.

These devices will produce exabytes of data every day, and the ability to process, analyze, and leverage that data in real time is becoming a core capability requirement.

Right now, big data is central to many areas because of the unparalleled amount of data produced every day. Most research projects require a massive data-crunching and machine-learning approach.

Recently, the American Association for the Advancement of Science observed that traditional university career paths are a poor fit for the experts needed to build tools for analyzing the vast amounts of data now abundant in every field. Big data experts are already sought after by industry and needed in academia, e.g. to process gene sequences or cosmological data.

Achievements in statistical machine translation are likewise due to a paradigm shift made possible by the availability of an unmatched amount of language data.

Kenneth Cukier, co-author of Big Data: A Revolution That Will Transform How We Live, Work, and Think, explained this brilliantly in a TED talk last June.

Computer scientists changed the nature of the problem from trying to explain to the machine how to translate to instructing it to figure out what a translation is from a huge amount of data around it. They call it machine learning.
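This data-driven approach can be sketched in miniature: rather than hand-writing translation rules, the machine scores translation candidates by counts gathered from a parallel corpus. The snippet below is an illustrative toy (the corpus and the `translate_word` helper are invented for this sketch), a crude stand-in for real word-alignment models such as IBM Model 1:

```python
from collections import Counter, defaultdict

# Toy parallel corpus (English -> French). Illustrative data only.
corpus = [
    ("the house", "la maison"),
    ("the car", "la voiture"),
    ("a house", "une maison"),
]

# Count how often each source word co-occurs with each target word.
cooc = defaultdict(Counter)
for src, tgt in corpus:
    for s in src.split():
        for t in tgt.split():
            cooc[s][t] += 1

def translate_word(word):
    """Pick the target word that co-occurred most often with `word`."""
    return cooc[word].most_common(1)[0][0]

print(translate_word("house"))  # -> "maison" (co-occurs twice)
print(translate_word("the"))    # -> "la" (co-occurs twice)
```

The machine never "knows" French; it merely infers that "house" and "maison" keep appearing together, which is exactly the statistical leap Cukier describes.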

With the emergence of the IoT, data is changing status from static to dynamic and is being leveraged for uses never imagined when it was collected, with translation becoming ubiquitous and more and more a big data issue.

The case against machine translation

As Thomas H. Davenport suggests in the Wall Street Journal, the value that analytics adds to data stems from the human brain's systematic biases in appraising it. Computers can see more in data and do things that humans cannot, especially with small data.

Fundamentally, the hostility against machine translation comes from the anthropomorphization of computers, which makes people assume that computers process data just like humans do, albeit more quickly and more accurately.

In reality, a big data system looks at a historical set of behavioral data and statistically infers a probable behavior under similar circumstances. The same happens with statistical machine translation.

Many people worry that increasingly smart machines will disrupt the labor market and threaten humans. Even some technology optimists, such as Vinod Khosla, say that as computers and robots become more proficient at everything, skilled jobs will quickly vanish.

Others, like Marc Andreessen, think those worries are nonsense, since technological advances have always improved productivity and created new jobs. Andreessen also names machine learning as a trend to watch in 2015.

The latest evolution in artificial intelligence and machine learning consists of deep neural networks (DNNs), biologically inspired computing paradigms modeled loosely on the human brain that enable computers to learn through observation.

At the beginning of the last decade, building DNN-based systems proved hard, and many researchers turned to other solutions with more near-term promise. Now, thanks to big data, new DNN-based models can learn as they go, building ever-larger and more complex bodies of knowledge from and about the datasets they are trained on. Machine translation is a promising field for the application of DNNs.
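At its core, "learning through observation" means adjusting numeric weights until outputs match examples. The sketch below trains a single perceptron, the simplest trainable unit of the kind deep networks stack by the millions, on the logical AND function. It is a minimal illustration, not a DNN, and all names and data in it are invented for this example:

```python
import random

# Training data for logical AND: (inputs, label)
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

random.seed(0)
w = [random.uniform(-1, 1) for _ in range(2)]  # two weights
b = 0.0                                        # bias
lr = 0.1                                       # learning rate

def predict(x):
    """Fire (1) if the weighted sum of inputs exceeds zero."""
    s = w[0] * x[0] + w[1] * x[1] + b
    return 1 if s > 0 else 0

# Perceptron learning rule: nudge weights toward the correct label.
for _ in range(50):
    for x, y in data:
        err = y - predict(x)
        w[0] += lr * err * x[0]
        w[1] += lr * err * x[1]
        b += lr * err

print([predict(x) for x, _ in data])  # [0, 0, 0, 1]
```

No rule for AND was ever written down; the behavior emerges from repeated exposure to examples, which is the same principle, scaled up enormously, behind DNN-based translation.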

The uberization of work

According to Farhad Manjoo, new technologies have the potential to chop up a broad array of traditional jobs into discrete tasks to be assigned to people when needed. Wages could be set by a dynamic measurement of supply and demand, and a worker's performance could be tracked and continually rated by customer satisfaction.
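A dynamic, supply-and-demand wage could be as simple as scaling a base rate by the ratio of open tasks to available workers. The function below is a hypothetical sketch; its name, parameters, and clamping band are assumptions for illustration, not any platform's actual pricing formula:

```python
def dynamic_rate(base_rate, open_tasks, available_workers,
                 floor=1.0, ceiling=3.0):
    """Scale a base rate by the demand/supply ratio, clamped to a band."""
    if available_workers == 0:
        return base_rate * ceiling
    multiplier = open_tasks / available_workers
    multiplier = max(floor, min(ceiling, multiplier))
    return base_rate * multiplier

print(dynamic_rate(20.0, 150, 100))  # demand 1.5x supply -> 30.0
print(dynamic_rate(20.0, 50, 100))   # slack market stays at floor -> 20.0
```

The clamp keeps rates from collapsing or spiking, a design choice any real marketplace would tune far more carefully.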

Manjoo calls this the uberization of work, after what Uber is doing to the taxi business, with the key perks of an Uber job being flexibility, working 1-15 hours a week, and easy additional income. Also, Uber drivers do not need any particular ability other than driving.

This is exactly what has been happening for decades in the localization industry, where freelancers have been experiencing this kind of ‘novelty,’ called moonlighting. On the other hand, in The Internet is Not the Answer, Andrew Keen uses Uber as an example of the exploitation of the openness of the Internet to take control of existing industries.

The prevalence of on-demand jobs in the immediate future is the real novelty, while for the localization industry uberization could represent disintermediation.

The invisibility of translation

Ray Kurzweil predicted many of the most important innovations of the last twenty years. His predictions for the next 25 years could seem mind-boggling, but also obvious.

Four years ago, Kurzweil predicted that spoken language translation would be common by the year 2019, and that machines would reach human levels of translation quality by the year 2029.

Kurzweil’s predictions seem realistic also because we have been in the second half of the chessboard for a few years now, and the cost of storing and analyzing ever-vaster amounts of information keeps steadily decreasing. Big data and the IoT will make translation even more central than in the past, but also definitely invisible.
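The "second half of the chessboard" refers to the old wheat-and-chessboard legend: place one grain on the first square and double it on each subsequent square, and the second half of the board dwarfs everything that came before. A quick calculation makes the point:

```python
# Wheat-on-the-chessboard: 1 grain on square 1, doubling each square.
first_half = sum(2 ** i for i in range(32))    # squares 1-32
whole_board = sum(2 ** i for i in range(64))   # squares 1-64

print(first_half)                 # 4,294,967,295 (~4.3 billion grains)
print(whole_board // first_half)  # ~4.3 billion times the first half
```

Once doubling enters the second half of the board, each step adds more than the entire accumulated history before it, which is why exponential change feels sudden.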

The problem with exponential growth is that, contrary to previous technological revolutions, the shift is happening too fast to provide new opportunities to successive generations of workers.

Only a quarter of a century ago, young people could comfortably plan their future on a five-year span, and retraining for displaced workers was a viable solution. Today, retraining is viable only if it is quick enough, and no competence can be divorced from data and its manipulation. This is true even for the language industry, if language is just a technology, as Mark Changizi, Director of Human Cognition at 2AI Lab, has suggested.

In the age of the IoT and big data, the pervasiveness and centrality of translation will put a premium on applied knowledge, with the ability to produce, use, and manipulate data being essential.

Quantity may be quality

Today, as General Electric CEO Jeff Immelt has noted and the success of Douglas Hubbard’s book How to Measure Anything testifies, almost everything can be measured, and the physical and analytical worlds are no longer separate.

Often we do not notice the big data aspect of our daily encounters with technology. And yet, despite the often comical renditions, the autocorrect feature on many a device is not only helpful but daunting: it is the result of sifting an enormous number of combinations, a matter of big data.
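A bare-bones autocorrect can be built from exactly these ingredients: generate every string within one edit of the typed word, then keep the candidate that is most frequent in a corpus. The toy below (its word list and counts are invented) follows the well-known noisy-channel spelling-corrector pattern popularized by Peter Norvig:

```python
def edits1(word):
    """All strings one edit away (deletes, replaces, inserts)."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [L + R[1:] for L, R in splits if R]
    replaces = [L + c + R[1:] for L, R in splits if R for c in letters]
    inserts = [L + c + R for L, R in splits for c in letters]
    return set(deletes + replaces + inserts)

# Word frequencies would normally come from a huge corpus; toy counts here.
freq = {"the": 1000, "they": 200, "then": 150, "hello": 50}

def correct(word):
    """Return the most frequent known word within one edit, else the word."""
    candidates = [w for w in edits1(word) | {word} if w in freq]
    return max(candidates, key=freq.get) if candidates else word

print(correct("thw"))   # -> "the"
print(correct("helo"))  # -> "hello"
```

The quality of the correction depends almost entirely on the size and freshness of the frequency table, not on the algorithm, which is precisely the big data point.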

The task is challenging and tricky, as frequent errors in data can become viral, but big data and machine learning will improve context-based functions for an improved experience across (connected) devices.

When new technologies make bold promises, discerning the hype from what’s commercially viable is a problem. As Alon Halevy, Peter Norvig, and Fernando Pereira from Google noticed six years ago, the promise of gaining more insights from the more data collected could be labeled as “the unreasonable effectiveness of data.”

In reality, the problem is one of validity: With so much data and so many different tools to analyze it, how can one be sure results are correct? Good statistical modeling requires stable input, at least a few cycles of historical data, and a predicted range of outcomes.

Following Gartner’s Hype Cycle, big data may have just crested the wave of inflated expectations and be barreling towards the trough of disillusionment, but this means it could be approaching the maturity stage, when technologies recover to reach a plateau of productivity.

In the end, the question is still the same, whether (translation) machines will render the work of (trained) humans obsolete. Wait: whether or when?