Big Data 30 September 2022

From Mathematician to Product Data Scientist

"This article was originally published at bigdata.go.th"

 

When I first joined the data science industry, everyone was really surprised because I had studied only mathematics. Mathematics is actually a very broad subject, though, and to be honest, I had focused only on integration, the kind of problem that amounts to finding the area under a graph. None of it involved data, machine learning, or AI at all.

 

I suppose this is why I was asked to write an article on how I became a data scientist despite having a mathematics degree.

 

Business understanding: Whoa! What is that?

My interest in data science began with the matrix (not the movie, but a box with numbers inside). I was amazed that such a strange definition of multiplication had so many applications. We can write a system of linear equations as one tidy formula using matrices, and solve for the line of best fit by making use of a matrix inverse. Everything seemed so cool then.

Figure 1 An example of a matrix
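
To make the matrix idea concrete, here is a minimal sketch in plain Python, assuming a single input variable. It solves the normal equations (X^T X) b = X^T y by writing out the inverse of the 2x2 matrix X^T X by hand. The function name fit_line and the sample points are made up for illustration.

```python
def fit_line(xs, ys):
    """Fit y = intercept + slope * x by solving (X^T X) b = X^T y.

    Each row of X is [1, x], so X^T X is a 2x2 matrix that can be
    inverted with the closed-form 2x2 inverse formula.
    """
    n = len(xs)
    # Entries of X^T X = [[n, sx], [sx, sxx]].
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    # Entries of X^T y = [sy, sxy].
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # Determinant of X^T X; zero means every x is the same value.
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("all x values are identical, so X^T X has no inverse")
    # b = (X^T X)^{-1} X^T y, with the inverse (1/det) * [[sxx, -sx], [-sx, n]].
    intercept = (sxx * sy - sx * sxy) / det
    slope = (n * sxy - sx * sy) / det
    return intercept, slope


# Illustrative points that lie exactly on y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
intercept, slope = fit_line(xs, ys)
print(intercept, slope)
```

With more than one input variable the same formula applies, but the matrix is larger and a numerical library would normally do the inversion.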

That was around the same time the AI trend was regaining popularity, as AlphaGo had managed to defeat a Go champion. Everyone was astonished, myself included. So I took a beginner’s course on machine learning and data mining, and I found it interesting. A lot of the ideas made sense, and the beauty of mathematics was behind every model and every step of data analysis. However, I was too busy appreciating all this to pay attention to coding or to actually using the models. In the end, there was still not much I could do.


Data understanding: Wait, so how is it going now?

When I graduated (my thesis was still about integration; if you are interested, you can read my paper at https://www.mdpi.com/2076-3417/9/11/2301), I had more free time. I had a chance for self-reflection, and I found that my interest in data science was still there and had already outgrown my interest in integration. So I thought about changing my own future. Instead of doing a postdoc, I decided to become a data scientist. The catch was that I did not have any of the skills a data scientist needs. What a sad life.

But honestly, the ideas in data science are not that new from a mathematical perspective. For example, finding the line of best fit is what data science calls fitting a linear regression model. So once you get familiar with the terms, it becomes much easier to understand what is going on in the data science realm.
 

Figure 2 An example of the line of best fit (source: https://statology.org/line-of-best-fit-python)

Another point is that the math in data science is not that complex. It doesn’t care much whether a sequence converges, or whether the records selected during model initialization were ok. What data science primarily needs from math is probably statistics and experimental design. Other areas that make the appreciation more profound, if you include them, are linear algebra, calculus (excluding integration), and graph theory.

If you ask me whether I had these skills, I can tell you now that I didn’t. LOL. And that took a toll on me.

Data preparation: (How to) get prepared

My situation was like a model that needs some features that are somehow not in the dataset being analyzed. The only solution was to create the features. So I had to take learning more seriously. Not only did I have to understand how each model worked, but I also had to learn to write code to make use of it. That’s when I got to know Coursera. I took courses ranging from writing Python to deep learning. Coursera was pretty much my bestie.

But this time was a total disaster. Libraries had been developed to make coding models easier, so all the math was already packed into model.fit. The first time I saw this code, I nearly cried. What I had been searching for the whole time was not linear algebra, calculus, or graph theory. It was only model.fit and model.predict.
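
As a rough illustration of how much math can hide behind those two calls, here is a minimal sketch of a toy model with fit and predict methods, written in plain Python. The class TinyLinearRegression is hypothetical and is not the API of any real library; it only handles one input variable, and the sample numbers are made up.

```python
class TinyLinearRegression:
    """Hypothetical one-feature model that mimics the fit/predict style."""

    def fit(self, xs, ys):
        n = len(xs)
        mean_x = sum(xs) / n
        mean_y = sum(ys) / n
        # Least-squares slope: co-variation of x and y over variation of x.
        sxx = sum((x - mean_x) ** 2 for x in xs)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        if sxx == 0:
            raise ValueError("all x values are identical, so no slope can be fitted")
        self.slope_ = sxy / sxx
        # The fitted line always passes through the point (mean_x, mean_y).
        self.intercept_ = mean_y - self.slope_ * mean_x
        return self

    def predict(self, xs):
        # Apply the fitted line to each new input.
        return [self.intercept_ + self.slope_ * x for x in xs]


# Usage: the whole derivation collapses into two calls.
model = TinyLinearRegression()
model.fit([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
predictions = model.predict([4.0])
print(predictions)
```

The caller never sees the sums and divisions, which is roughly why the calculus and linear algebra seemed to disappear.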

I admit that back then I thought becoming a data scientist was not difficult. Writing some simple code would make it easy-peasy, and picking metrics to measure the model’s performance would be all there was to it. I thought that way because of a misconception formed by example datasets such as Iris or Titanic, which didn’t need much treatment before they went into the model and had no business impact to be concerned about. So I was a bit overconfident, and when I actually started working as a data scientist, everything was a much bigger disaster than expected.


Modeling: Remember, remember, remember, and learn

After I got a chance to work in the field with 0 years of experience, I ran into real data and suddenly felt so pathetic. There were tables inside tables, such as merged cells, and tables where data was being collected but report-like sections suddenly appeared in the middle. I started to ask myself why I was born and what harm I had caused someone in my past life. Life was so sad.
 

But in hindsight, that tragedy was a positive experience. I acquired the skills to handle any type of data (except PDF). I am grateful to the colleagues and seniors across almost the entire organization who helped me survive and become better today.

Apart from managing data, methods of splitting data, ways of measuring model performance, and model interpretation continue to excite me. I learned how to adapt methods and add features that capture the key information needed to build models. It was so much fun. You can call it a truly epic era of learning.

Evaluation: Is this even real?

I memorized techniques and ideas from other people’s work, expecting that when it was time to do it myself, I would get similarly good results. I remember the first model I built using actual data: it was a regression model that was not at all accurate. Then, somewhere in that bumpy process, the performance suddenly got better, like shockingly better. And what was even more shocking was that, when I reviewed the code, I found I had used the target (the thing to be predicted) as one of the features (the data used for prediction). So cooooooool!
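
Here is a minimal sketch of why that mistake looks so good on paper, in plain Python with made-up numbers. It fits a one-feature regression twice, once with an ordinary feature and once with a copy of the target as the feature, and compares R^2 (the share of variation in the target that the fitted line explains). The helper r_squared and all the data are hypothetical.

```python
def r_squared(xs, ys):
    """R^2 of a one-feature least-squares line fitted to (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Squared error left over after fitting the line.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    # Total squared variation of the target around its mean.
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


target = [3.0, 7.0, 4.0, 9.0, 6.0]          # the thing to be predicted
honest_feature = [1.0, 2.0, 3.0, 4.0, 5.0]  # only loosely related to the target
leaky_feature = list(target)                # the mistake: the target used as a feature

honest_score = r_squared(honest_feature, target)
leaky_score = r_squared(leaky_feature, target)
print(honest_score, leaky_score)
```

The leaky version scores essentially perfectly because the model is handed the answer, and that answer will not be available when real predictions are needed.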

That was one of my mistakes. If I were to list them all, the 1,000 words of this article would not suffice. I think making mistakes is not always a bad thing as long as you learn from them (though it’s best to avoid them in the first place). So the challenge for me is to do whatever it takes to make the loop of trial, error, and learning as short as possible, so that I can go through it many times.

To be able to do that, though, feedback is mandatory. No matter how short the learning loop is, if you don’t understand what you already do well and what your mistakes are, everything you have learned can lead to disaster. Fortunately, I’ve always had useful feedback. That’s why I have survived as a data scientist to this day.


Deployment: It’s time to debut

Only recently have I dared to proudly call myself a data scientist. But that doesn’t mean the learning is over. It’s strange: I think I know quite a lot more than I did (back when all I knew was integration), yet there is still a big data’s worth of knowledge waiting for me to learn.

I don’t know how long it will take to learn all of it, as I am getting older each day. The rest of my time will go to learning and keeping myself up to date, like the iterative process that inspired the section names in this article: the Cross-Industry Standard Process for Data Mining (CRISP-DM). It might be fun to develop myself and my models simultaneously.

But it would be boring for me to share my stories one-sidedly. Now it’s your turn. If you have time, please share the story of how you became interested in data science.
