Data Science exploded in popularity a few years ago and shows no signs of cooling down. Whilst this popularity is relatively new, the term “data science” and the concepts it encompasses have been around for much longer. The tools and models have changed along the way, but the core skills have remained the same: data manipulation and problem solving.
Even so, the popularity of Data Science, combined with catchy phrases like “Data is the new oil”, has been effective at convincing organisations to invest in Data Science, hire teams of data scientists, and set up expansive data lakes, all in the hope that this will lead to faster and better decision making. The reality, however, is that data lake initiatives can take years to show results and, if not carefully managed, can easily become “data swamps” that consume resources without producing insight.
Domain Knowledge is Key
Perhaps more fundamentally, extracting patterns and insights from data is only useful if it helps solve a problem. Knowing that 37% of your customers get their hair cut on a Saturday might be useful if you sell hair care products, but is unlikely to help a pet food business. Domain knowledge is just as important as data, and no amount of regression can replace an understanding of how a business works.
Before starting a data lake or hiring teams of data scientists, organisations looking to make better data-driven decisions should start by looking at what they already have. It might not involve neural networks or terabytes of data, but someone in the organisation is almost certainly solving problems with data, probably with nothing more than Excel. Sales forecasting, risk monitoring, asset health tracking, and even complex simulations: more Data Science and Business Intelligence happens in Excel than in any other tool.
The problem with Data Science in Excel is that it’s impossible to deploy: it’s stuck in a spreadsheet, on a laptop, and almost always out of date. Versions of it might get emailed around or used to generate reports, but old data can be worse than no data, and the whole process relies on the person who created the spreadsheet to update and share the “master copy” at regular intervals.
Keep it Simple
These Excel-bound Data Science projects are low-hanging fruit that make fantastic starting points for a Data Science journey. By beginning with projects already known to offer value, organisations can reduce risk and focus on the many problems that come with trying to deploy (or “productionize”) Data Science projects, such as ingesting data, deploying the analysis, and visualising the results.
By investing in tools like Lumen that make deploying Data Science easy, organisations can empower their existing employees to build analytics and share insights that are accessible and always up to date. Many engineers are already skilled in Data Science languages such as R, MATLAB, and Python (or are eager to learn if given the opportunity). Alternatively, converting Excel analysis into Python analytics can serve as a useful introduction to the business’s problems for newly hired data scientists, letting them deliver value early before getting stuck into more complicated problems and solutions.
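As a small illustration of what such a conversion can look like, the sketch below reimplements a simple moving-average sales forecast, the kind of calculation often written in Excel as `=AVERAGE(B10:B12)`, as a plain Python function. It is a hypothetical example rather than anything taken from Lumen: the function name `moving_average_forecast`, the three-period window, and the sales figures are all assumptions chosen for illustration.

```python
from statistics import mean


def moving_average_forecast(sales, window=3):
    """Forecast the next period as the mean of the last `window` periods.

    Hypothetical helper mirroring an Excel formula such as =AVERAGE(B10:B12).
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    if len(sales) < window:
        raise ValueError("need at least `window` observations")
    # Average only the most recent `window` values, like a sliding cell range.
    return mean(sales[-window:])


# Hypothetical monthly sales figures standing in for a spreadsheet column.
monthly_sales = [120, 135, 150, 160, 155, 165]
print(moving_average_forecast(monthly_sales))
```

Once the logic lives in a function rather than a cell range, it can be run on a schedule against fresh data instead of waiting for someone to update the master copy.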
Data Science is a science and, much like in the physical sciences, good engineering and quality tooling can make it easier but do not guarantee success. By starting small, keeping it simple, and iterating quickly, organisations can unlock value from what they already have. Data Science doesn’t need to be complicated, but it is a skill that needs to be learned. Organisations should dip a toe in, test the water, and learn what works best for them before taking a dive into the data lake.