What do Data Scientists do?

SearchTeam asked Data Science Expert Dan Hnyk what Data Scientists actually do and how to enter the data science field.

What do you do as a Data Scientist?

  • clean the data and prepare it for analysis
  • analyze data sets of various sizes from a data quality assurance perspective (missing values, errors, statistics)
  • visualize the data (charts of distributions, tables, relations…)
  • model the data
  • transform the data
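
To make the data quality step a bit more concrete, here is a minimal sketch using only the Python standard library (in practice Pandas would be the more typical choice). The sample data, the column names, and the `profile` helper are all made up for illustration; they are not from any real project.

```python
import csv
import io
import statistics

# Hypothetical sample data: the empty field and "n/a" stand in for
# the missing values and errors mentioned in the list above.
RAW = """user_id,age,spend
1,34,120.5
2,,80.0
3,29,n/a
4,41,95.25
"""

def profile(rows, column):
    """Count missing and unparseable cells in a column and summarize the rest."""
    values, missing, errors = [], 0, 0
    for row in rows:
        cell = (row.get(column) or "").strip()
        if cell == "":
            missing += 1          # empty cell: a missing value
            continue
        try:
            values.append(float(cell))
        except ValueError:
            errors += 1           # non-numeric cell: a data error
    return {
        "missing": missing,
        "errors": errors,
        "count": len(values),
        "mean": statistics.mean(values) if values else None,
    }

rows = list(csv.DictReader(io.StringIO(RAW)))
age_report = profile(rows, "age")
spend_report = profile(rows, "spend")
print(age_report)
print(spend_report)
```

A report like this is usually the first thing worth looking at, before any charts or models.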

How can somebody become a Data Scientist?

  • learn an open source stack for working with data sheets: Python (with the Pandas library) is preferred, or R; beyond that, it depends on the size of the data sets and the domain
  • have strong domain knowledge of the problem (or work with someone who has it)
  • knowing maths and statistics helps

What are the traits of an excellent Data Scientist?

  • like every engineer: focuses on the product and knows what the goal is
  • his work is 100% reproducible and transparent
  • considers data sacred: never applies a transformation without thinking through the possible side effects
  • is never dishonest about the analysis and never hides anything
  • never relies on intuition alone, always validates it against reality
  • is able to handle whatever input source of data he encounters

Do I have to know some programming language to become a Data Scientist?

  • yes, Python or R; these two rule the world
  • Excel, SPSS, Tableau, etc. sooner or later limit what you can do (automation, running on a headless Linux server, size of the data set, …). Plus, they are expensive without adding value

What kinds of companies have Data Scientist positions?

  • tech companies, not surprisingly… Any company that collects data and is big enough to hire a dedicated data scientist (more than 40 people is my humble guess). The data come from non-data products (user behavior, acquisition, various performance figures, pricing models…) or from data products (data research such as surveys, evaluating the performance of processes, web analytics, …)

What is the best way to get an entry-level data science job?

  • know an open source stack for the job (so the company doesn’t have to invest in you)
  • have hands-on experience with real data, e.g. through a Kaggle competition
  • if you can program (knowing Python is an advantage), it’s usually much easier, since you can also act on the analysis

What are the main challenges of a Data Science job?

  • having the data: real-world data sets are small, messy, full of empty values and errors, and lack proper description or documentation. It’s often more important to propose a better data acquisition mechanism and provide a simple analysis than to run a robust analysis on garbage data (the saying “Garbage in, garbage out” is true). It’s often not necessary to collect everything; a good subset to concentrate on is enough
  • predicting how long the work will take: it’s a dark art in DS, but all of us must meet deadlines

What do you like most about the Data Science job?

  • its variety (non-routine): no two data sets are the same, there are always surprises, and you never know what’s going to come
  • an infinite number of approaches to solving a given problem
  • rapidly developing tech stack

What tools do Data Scientists use?

  • a PC/laptop with enough memory is all you need from a hardware perspective (depending on the size of the data set)
  • a programming stack of his choice
  • various input sources of the data/storage (e.g. SQL, CSV, HDF, Hadoop…)
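
As a rough illustration of moving between two of these sources, the sketch below reads CSV text and loads it into an in-memory SQLite database using only the Python standard library. The table name, the column names, and the `load_csv_into_sqlite` helper are assumptions made up for this example; a real project would read from files or a production database instead.

```python
import csv
import io
import sqlite3

# Hypothetical CSV export; in practice this would be a file on disk.
CSV_TEXT = "city,n_orders\nPrague,12\nBrno,7\nPrague,5\n"

def load_csv_into_sqlite(csv_text, conn):
    """Copy rows from CSV text into an SQLite table named 'orders'."""
    conn.execute("CREATE TABLE orders (city TEXT, n_orders INTEGER)")
    reader = csv.DictReader(io.StringIO(csv_text))
    conn.executemany(
        "INSERT INTO orders (city, n_orders) VALUES (?, ?)",
        [(row["city"], int(row["n_orders"])) for row in reader],
    )
    conn.commit()

conn = sqlite3.connect(":memory:")  # throwaway database, nothing is written to disk
load_csv_into_sqlite(CSV_TEXT, conn)

# Once the data is in SQL, aggregation is a single query.
totals = conn.execute(
    "SELECT city, SUM(n_orders) FROM orders GROUP BY city ORDER BY city"
).fetchall()
print(totals)
```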

What, in your opinion, is the future of data science?

  • automation of data insights – data science as an automated service 

Are there any good data science courses? Are there any data science certifications?

  • yes, Coursera and Udacity offer great, high-quality courses; the choice depends on one’s target area