Note: This article was first published in GovernmentCIO magazine
In the 21st century, there is no job sexier than that of the Data Scientist, according to Harvard Business Review. Thousands of Data Scientists are currently hard at work in both big business and start-ups. All over the country, companies are utilizing the high-powered, data-driven, enthusiastic minds of Data Scientists to explore the world of Big Data. In our modern world, there are sources of information that have never existed before; companies are able to utilize an ever-increasing amount of data in order to better predict market trends and customer values.
Part analyst, part artist. One company that is currently leading the push for more data scientists is IBM, which says the ideal data scientist “represents an evolution from the business or data analyst role. The formal training is similar, with a solid foundation typically in computer science and applications, modeling, statistics, analytics and math. What sets the data scientist apart is strong business acumen, coupled with the ability to communicate findings to both business and IT leaders in a way that can influence how an organization approaches a business challenge. Good data scientists will not just address business problems, they will pick the right problems that have the most value to the organization.” Essentially, a Data Scientist is an individual who thinks to ask questions no one else is asking, and then goes after the answers through as many independent sources as possible.
Data Science as an idea goes all the way back to the famous American mathematician John W. Tukey, who wrote that data analysis needed to take on the characteristics of a science rather than of mathematics because of its mostly empirical nature. Over the following half century, Tukey’s idea took on a life of its own: Data Scientists are now called Renaissance men and compared to the computer programmers of the 1990s. In fact, Forbes just listed Data Science as the 8th highest-paying job of 2015; quite a leap from the mind of a mild-mannered mathematician in 1962.
When you hear someone describe a Data Scientist, it might sound like they are describing a unicorn. How can someone be both a computer programmer and a statistician, as well as possess strong business acumen? Not only should your data scientist be a technical and statistical maestro with a strong head for business, they should also be able to take incredibly complex ideas and explain them to a layperson. Yet that is exactly what it means to be a Data Scientist, and business and IT leaders are turning to them in greater numbers every year because of the value they provide.
Big Data is a rather nebulous term for the collection of all the data generated by society, and includes everything from user demographics and customer data to pictures and video, text messages and emails, and even GPS locations. According to IBM, the human race is generating 2,500 petabytes of data per day.
To put this into perspective, here are a few comparisons to help contextualize what 100 petabytes looks like:
One quarter as much data as Facebook stores today for its 1+ billion users.
11,415 years of HD video, watched 24×7.
$51,600,000 spent annually to store this much data on Amazon S3.
33 billion songs, or every song on iTunes 1,270 times over.
Since 2,500 petabytes is 25 times that amount, every day the human race is generating:
6.25 times as much data as Facebook stores
285,375 years of HD video
$1,290,000,000 worth of annual storage space on Amazon servers
825 billion songs, or the entire iTunes library 31,750 times over.
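The daily figures follow from scaling each 100-petabyte comparison by the ratio of the two sizes. A minimal Python sketch of that arithmetic, using only the numbers quoted in this article (the dictionary keys and variable names are illustrative, not from any source):

```python
from fractions import Fraction

DAILY_PB = 2500      # petabytes generated per day (the IBM figure quoted above)
REFERENCE_PB = 100   # size the comparisons above are stated for

# The 100-petabyte comparisons quoted above (key names are illustrative).
per_100_pb = {
    "multiples_of_facebook_storage": Fraction(1, 4),
    "years_of_hd_video": 11_415,
    "s3_dollars_per_year": 51_600_000,
    "songs": 33_000_000_000,
    "itunes_libraries": 1_270,
}

# Scale factor between the daily volume and the reference size.
scale = Fraction(DAILY_PB, REFERENCE_PB)

# Multiply every comparison by the same factor.
per_day = {name: value * scale for name, value in per_100_pb.items()}

for name, value in per_day.items():
    print(f"{name}: {float(value):,.2f}")
```

Using exact fractions keeps the one-quarter figure from picking up floating-point noise before it is printed.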
In the last two years, we have created more data than in the entire history of the world before them. And this is Big Data. These data sets are so large that traditional methods of processing and analyzing do not work. This is what Data Scientists venture out to tame, to interpret and to understand.
Perhaps we can look at this another way. You may be familiar with George R.R. Martin’s A Song of Ice and Fire, the basis for HBO’s award-winning show Game of Thrones. A Song of Ice and Fire is a fantasy epic, second only to Tolkien’s The Lord of the Rings, currently topping out at 4,272 pages. What you may not know is that 4,272 pages of text translates to only 21.7 megabytes of data.
Based on research done by Cloudera and UC Berkeley, companies today are using Data Scientists to extract information from data sets that are anywhere from 80 terabytes to 8 petabytes. This means Data Scientists are looking for answers in data sets large enough to hold A Song of Ice and Fire millions of times over: at 21.7 megabytes per copy, even an 80-terabyte data set has room for roughly 3.7 million copies.
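The comparison comes down to dividing the size of a data set by the size of the books. A rough Python sketch, assuming decimal units (1 terabyte = one million megabytes, 1 petabyte = one billion megabytes); the helper name `copies_that_fit` is hypothetical, and binary units would give slightly larger counts:

```python
BOOK_MB = 21.7  # size of the series text in megabytes, as quoted above

# Assumption: decimal (SI) units rather than binary ones.
MB_PER_TB = 10**6
MB_PER_PB = 10**9


def copies_that_fit(dataset_mb: float, item_mb: float = BOOK_MB) -> int:
    """Return how many whole copies of an item fit in a data set."""
    return int(dataset_mb // item_mb)


low = copies_that_fit(80 * MB_PER_TB)   # smallest data set in the range
high = copies_that_fit(8 * MB_PER_PB)   # largest data set in the range

print(f"80 TB holds about {low:,} copies")
print(f"8 PB holds about {high:,} copies")
```

Under these assumptions the low end of the range works out to a few million copies and the high end to a few hundred million.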
But Data Science is not just about reading large data sets. The ability to take those large data sets and synthesize them into something meaningful has real-world application and value. The Fathom Knowledge Center states that bad data and poor-quality data together cost American businesses $600 billion annually. However, this is a true two-way street, as data-quality best practices have been shown to boost a company’s revenue by up to 66%. In the case of Fortune 1000 companies, Fathom even asserts that a 10% increase in data utilization can result in an increase in revenue of over $2 billion.
Avanade, an internationally recognized technology consulting firm, just finished an international survey to assess the impact of big data on the information technology (IT) industry. Surveying managers and C-level executives from seventeen different countries, the firm found that 56% of business and IT executives are overwhelmed by the amount of data they now have to work with, 43% are dissatisfied with the data management systems and tools currently in place, and 46% report having made a bad business decision based on incorrect or outdated data. Data Scientists exist to alleviate this burden and protect businesses from bad business decisions.
Around the world, Data Scientists are helping to create a culture of data management and utilization that will allow businesses and organizations to make more accurate and timely decisions while increasing revenue. If your organization is stuck in the same routine as usual, struggling to make the best decisions, or just in need of a change, consider hiring a Data Scientist to help you see things through another lens: Big Data.