🤑 How to Get Rich in Data Science
Hey, Zenia! Happy Birthday!
🥳🎂🎁🎉
It would've been nice if I could send some dried flowers 🌼 along with this, but I guess this will do for now.
Welcome to a work-in-progress document to give you everything you need to further break into data as a career, and succeed at it. 😤
You are one of the strongest people I've ever met, and your tenacity, drive, and ambition have amazed me every day. I truly believe that you will be able to achieve everything you've set your mind to, and I'll happily help you however I can along the way. ❤️ I can't wait to see where you go from here.
See ya at the Circuit Gilles Villeneuve, MTL. 🏎️
- With love, Journey
Some Background on What Data Work Even Means
Data science is a super broad field, with some jobs requiring highly technical knowledge, high-level mathematical ability, a developer background, or a strong understanding of business. I've found being a little bit of a generalist to be a good thing in this field. I might not be the best at anything, but I can usually synthesize these skills better than most, and I have a feeling you'll be very capable of the same thing. Don't be afraid to use that to your advantage and really highlight it on your resume/during interviews.
While going through this next list, consider your personal strengths and what kind of tasks you would find interesting and fulfilling. A major mistake many people make is to get very good at tasks that they hate doing, and then find themselves pigeonholed in their careers. Don't be afraid to experiment, but make sure you always target what you truly want, not just what is available. Never be afraid to self-advocate.
Some common roles/tasks in data careers:
- Dashboarding/KPI Metrics
- Most businesses have no clue what they are doing. Putting some numbers on a dashboard can genuinely provide a huge amount of value for very little effort. Key here is understanding business needs, stakeholder requirements, etc.
- Ad-hoc information
- In cases where data/analysis is not readily available to business units, they may ask you to pull that information for them. E.g., "What was our renewal rate for this specific demographic last year, and how did it compare to the average for the 5 years prior?"
- Note: In cases like this it's tempting to simply provide the data, but it's always better to get additional context, and confirm this data will truly answer the underlying question they have.
- E.g., in the above case it might be that they are trying to determine whether or not to continue marketing to a demographic and they've made a lot of unfounded assumptions about the profitability of this segment, the importance of renewals, etc.
- Deeper analysis
- Occasionally you might be asked to produce a deep analysis for particular business areas or concepts, essentially to confirm assumptions, produce better understanding for managers, and ideally find low-hanging fruit to improve business performance.
- This analysis usually doesn't need high-level mathematical modeling. What matters more is a deep understanding of the data and the business, plus a creative mind.
- Higher level modeling
- This gets to what most people imagine when they hear "Data Science." These roles are few, well-paid, difficult, and highly mathematical. They do not necessarily require a Master's/PhD, but it helps. Alternatively, portfolio work/competitions can get your foot in the door here.
- Automation/development
- Maintaining scripts, data pipelines, dashboards, automatic reporting, etc.
- DBA work
- Usually very SQL focused. This is developer work, but highly specialized to database work. Performance is key and having a deep understanding of SQL, execution plans, etc. can be highly valuable to a company. (Imagine taking a reporting script that takes 8 hours to run and re-writing it to run in 30 seconds.)
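To make the ad-hoc example above a bit more concrete, here's a rough sketch of what pulling that renewal rate could look like. Everything in it is made up for illustration: the `policies` table, its columns (`year`, `demographic`, `renewed`), the "18-25" segment, and the numbers are all assumptions, not a real schema. It uses Python's built-in `sqlite3` module so it runs without installing anything.

```python
import sqlite3

# Hypothetical table: one row per customer per year, renewed = 1 or 0.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE policies (year INTEGER, demographic TEXT, renewed INTEGER)"
)

# Made-up data: 10 customers per year in one segment.
# 6 of 10 renew in each of 2018-2022, and 8 of 10 renew in 2023.
rows = []
for year in range(2018, 2024):
    renewals = 8 if year == 2023 else 6
    for i in range(10):
        rows.append((year, "18-25", 1 if i < renewals else 0))
conn.executemany("INSERT INTO policies VALUES (?, ?, ?)", rows)


def renewal_rate(conn, demographic, first_year, last_year):
    """Pooled share of rows with renewed = 1 for a segment over a year range."""
    (rate,) = conn.execute(
        """
        SELECT AVG(renewed)
        FROM policies
        WHERE demographic = ? AND year BETWEEN ? AND ?
        """,
        (demographic, first_year, last_year),
    ).fetchone()
    return rate


last_year_rate = renewal_rate(conn, "18-25", 2023, 2023)
prior_five_rate = renewal_rate(conn, "18-25", 2018, 2022)
print(f"Last year: {last_year_rate:.0%}, prior 5 years: {prior_five_rate:.0%}")
```

Note that this pools all five prior years together. Averaging five separate yearly rates can give a different answer when the yearly customer counts differ, which is exactly the kind of thing worth confirming with whoever asked.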
A lot of businesses think they want "higher level modeling" without even having the capability to provide standard KPI dashboards. It's good to know what you're actually getting into instead of what the company dreams they will be able to achieve.
Good to know that in some businesses the data area basically acts as its own business, with the core business teams acting as clients. They might come to you for help, and you'll need to come up with a way to provide them with value. Sometimes you might even feel like an internal consultant providing thoughts on how to better operate the business.
In general I think there is a big gap in the data world for people who are able to communicate well, understand a business, and use that knowledge to actually provide value to it. I've seen a lot of people start modeling/analyzing without understanding:
- Where did this data come from?
- What process put it there?
- Does this actually mean what I think it means?
- Are there any null values and is that expected?
- How does it relate to the profitability of the business?
- Are the key assumptions people have about the business holding true?
- What is it that stakeholders want, and is that actually the right thing to provide or just what they know how to ask for?
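A couple of those questions (nulls, and whether the data means what you think it means) can be partly checked with a few lines of code before any analysis starts. This is only a minimal sketch: the `records` list and its column names are invented sample data, and the two helper functions are hypothetical names for illustration, not from any library.

```python
# Hypothetical rows, as they might come out of a CSV file or a query.
records = [
    {"customer_id": 1, "signup_date": "2023-01-05", "region": "QC"},
    {"customer_id": 2, "signup_date": None, "region": "ON"},
    {"customer_id": 3, "signup_date": "2023-02-11", "region": None},
    {"customer_id": 3, "signup_date": "2023-02-11", "region": None},
]


def null_counts(rows):
    """Count missing values (None or empty string) per column."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            counts.setdefault(column, 0)
            if value is None or value == "":
                counts[column] += 1
    return counts


def duplicate_keys(rows, key):
    """Return the key values that show up on more than one row."""
    seen, dupes = set(), set()
    for row in rows:
        if row[key] in seen:
            dupes.add(row[key])
        seen.add(row[key])
    return sorted(dupes)


print(null_counts(records))
print(duplicate_keys(records, "customer_id"))
```

The code only tells you that nulls and duplicates exist. Whether they are expected is still a question for the people who own the process that created the data.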
If you are able to show people that you are detail-oriented, business focused, a strong communicator, and easy to get along with, you will never lack for work. (This holds true for basically every industry, not just data.)
Getting There as Fast As Possible
While I'll provide more in-depth resources later on, the reality is you're smart and can probably learn whatever you'd need quickly while on the job. So here's a little info up front about how you might be able to shortcut all that and get more interesting job prospects fast. Can't say this will work for sure, but these are at least some things that helped me.
The goal here is to learn a few key skills that companies need, produce some sample work in a GitHub portfolio, and understand a few deeper topics for conversation during an interview. Honestly, some of the insights I've written above will already help show potential interviewers that you have some conception of the field and are prepared to operate within it beyond what most new grads would understand.
- Get any prior data work you've done polished and uploaded somewhere as portfolio work you can speak to.
- I'm happy to provide feedback on anything you've done to make sure it's a good show for getting hired.
- Learn some basic SQL
- difference between SQL the language and specific database implementations (Oracle, PostgreSQL, SQLite, etc.)
- select from
- sub queries vs. temp tables
- all the joins
- Advanced topics to impress:
- Learn some basic Excel
- VLOOKUP, XLOOKUP (the better version of V/HLOOKUP), INDEX & MATCH
- pivot tables
- when to use a named table
- charting and pivot charts
- basics of reconciliation (using Excel to quickly validate datasets)
- Advanced topics to impress:
- hotkeys for common things like clearing filters, auto sums, navigating cells and sheets
- Learn some basic PowerBI
- Be able to produce some dynamic visuals from a multi-table dataset
- Difference between "Power Query with the M language" and "DAX"
- Advanced topics to impress:
- Being able to produce drill-through reports
- Learn some basic business knowledge
- Profit equations, churn, common KPIs
- The 5 Whys and how to use them
- Just be smart like you are
- Use these skills in combination with some reasonable datasets from Kaggle to produce portfolio pieces
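For the SQL basics in the list above (select from, joins, sub queries vs. temp tables), here's a small sketch you can run without installing a database. It uses Python's built-in `sqlite3` module with an in-memory SQLite database. The `customers` and `orders` tables and all their values are made up for the example, and other databases may use slightly different syntax for temp tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
    INSERT INTO orders VALUES (1, 1, 50.0), (2, 1, 25.0), (3, 2, 10.0);
    """
)

# INNER JOIN: only customers that have at least one order.
inner = conn.execute(
    """
    SELECT c.name, SUM(o.amount)
    FROM customers c
    JOIN orders o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()

# LEFT JOIN: every customer; the total is NULL (None in Python) with no orders.
left = conn.execute(
    """
    SELECT c.name, SUM(o.amount)
    FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()

# Subquery: the "customers who spent more than 20" set is computed inline.
via_subquery = conn.execute(
    """
    SELECT name FROM customers
    WHERE id IN (
        SELECT customer_id FROM orders
        GROUP BY customer_id
        HAVING SUM(amount) > 20
    )
    """
).fetchall()

# Temp table: the same set is stored once, then reused by later queries.
conn.execute(
    """
    CREATE TEMP TABLE big_spenders AS
    SELECT customer_id FROM orders
    GROUP BY customer_id
    HAVING SUM(amount) > 20
    """
)
via_temp = conn.execute(
    "SELECT name FROM customers WHERE id IN (SELECT customer_id FROM big_spenders)"
).fetchall()

print(inner)
print(left)
print(via_subquery, via_temp)
```

Both the subquery and the temp table give the same answer here. As a rough rule of thumb, a temp table starts to pay off when the intermediate result is expensive to compute and gets reused several times.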
TODO: Add specific explanations or resources
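Until the TODO above gets filled in, here's a tiny sketch of the "profit equations, churn" idea from the business knowledge bullet. The formulas are the simplest common versions (companies often define churn differently, so always ask which definition they use), and the numbers are invented.

```python
def churn_rate(customers_at_start, customers_lost):
    """Fraction of the starting customers who left during the period."""
    return customers_lost / customers_at_start


def profit(revenue, costs):
    """Profit is what's left of revenue after costs."""
    return revenue - costs


def profit_margin(revenue, costs):
    """Profit as a fraction of revenue."""
    return profit(revenue, costs) / revenue


# Made-up numbers: 200 customers at the start of the month, 30 of them left.
monthly_churn = churn_rate(200, 30)

# Made-up numbers: 50,000 in revenue against 42,000 in costs.
monthly_profit = profit(50_000, 42_000)
monthly_margin = profit_margin(50_000, 42_000)

print(f"Churn: {monthly_churn:.1%}")
print(f"Profit: {monthly_profit}, margin: {monthly_margin:.1%}")
```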
Getting There as Slow As Possible
If you want to go deeper, and truly develop not just the skillset, but a full understanding and intuition—really excel at this stuff—then these resources will get you there.
- Math basics
- How to Solve It by George Pólya
- pdf download
- This was recommended to me when I got my first data job.
- Honestly the key element imo is the first principle: "Do you understand the problem?"
- "Do you understand all the words used in stating the problem?"
- "Can you restate the problem in your own words?"
- "Is there enough information to enable you to find a solution?"
- Asking yourself those questions during the information/requirements gathering phase of a project will save you endless future heartache.
- TODO: Sort through and find best materials for statistics
- Data science basics
- Understand common ways to lie with data (to avoid/use at your pleasure)
- TODO: Data cleaning (how to vet a dataset)
- Which visualizations to use when
- TODO: Presentation skills
- Some inspiration:
- SQL
- Getting good with SQL will change the way you think about certain problems (it's a declarative language, whereas most people work with, and think in terms of, imperative languages)
- Very useful skill, also bridges well into development, very transferable overall
- It will also continue to be used for a very long time. It's like learning C instead of Ruby: you'll just be set.
- SQL Murder Mystery
- Other SQL learning games:
- Download a SQL Tool
- DBeaver is a personal fave, download the open source community edition, create a local SQLite connection, start messing around
- Understand a little bit about different implementations:
- Python
- Python Data Science Handbook
- This book covers all the basics of modern Python data work: IPython/Jupyter Notebook, NumPy, Matplotlib
- Note: The machine learning section includes some useful basic modeling (that probably shouldn't be called machine learning)
- Requests
- Best library for working with HTTP APIs (when there isn't a specialized library)
- DB libraries:
- Excel
- R
- PowerBI
- TODO
- I hate it, but having some exposure is useful
- AI
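Going back to the SQL point above about declarative vs. imperative thinking, here's a small side-by-side sketch of the same task done both ways. The `sales` data is made up, and the SQL runs through Python's built-in `sqlite3` module against an in-memory database.

```python
import sqlite3

# Made-up sales rows: (region, amount).
sales = [("QC", 100.0), ("ON", 40.0), ("QC", 60.0), ("ON", 10.0)]

# Imperative: spell out HOW to build the totals, one step at a time.
totals_imperative = {}
for region, amount in sales:
    totals_imperative[region] = totals_imperative.get(region, 0.0) + amount

# Declarative: describe WHAT result you want and let the database decide how.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", sales)
totals_declarative = dict(
    conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
)

print(totals_imperative)
print(totals_declarative)
```

Both versions produce the same totals per region. The difference is that the SQL version never says how to loop or where to store intermediate sums, which is the shift in thinking the SQL bullets are getting at.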
Work Product Examples
- TODO: Add some examples of requirements gathering, follow-up questions/refinement of requirements, process docs, and final deliverables