
Most data science courses end with a capstone project that requires learners to apply the concepts covered along the way. The project is also a great addition to a resume, so it is an important step for both learning and career prospects. This post covers three aspects of capstone projects:
- Selecting the topic
- Getting the dataset
- Presenting the capstone project in your resume
The topic
Capstone projects are often executed by teams of learners. If your team was formed on the basis of similar backgrounds and career interests, topic selection is simpler. However, data science programs attract learners from diverse backgrounds, and for such mixed teams a mainstream corporate topic is usually the better choice. If you have a special interest in topics like movies, songs, video games, anime or sports and wish to pursue a career in a related field, then these are great topics. For someone interested in a career in cricket analytics, working on an IPL dataset is a great way of learning as well as signalling aspirations. Your ability to select features, fit models and communicate the findings is crucial, so select a topic that you already understand or can easily learn about. Finally, select a topic that everyone in the team will be comfortable carrying on their resumes. A topic related to retail sales is great if you are keen on a career in that industry, and it does not harm your prospects of landing a job in the healthcare or financial sector. Employers in those sectors will understand.
Dataset
Once the topic is selected, the next challenge is data. After all, without a dataset there is no project. Popular sources of datasets include Kaggle, Data Science Dojo, Data World, university sites and open government sites. Selecting a dataset from these sources is the easiest way forward. Often, notebooks are already available for these datasets, demonstrating specific sets of algorithms. Potential employers know that too, and it can dilute the value of the project on your resume. See the last section of this post for how to address this challenge.
There is a more exciting way of getting a dataset: create it. Creating your own dataset is cool. The usual ways of doing this are web scraping, surveys or synthetic generation. Apart from the higher effort, this brings the same challenge that real-life projects face: uncertainty of outcomes. The popular datasets online have one thing in common: models fitted on them perform well, which is not guaranteed for data you gather yourself. The advantages of creating your own dataset are that it allows flexibility in topic selection, makes a great impression in a recruitment interview, and teaches you the issues involved in gathering data for such projects.
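As a rough illustration of the synthetic route, here is a minimal Python sketch that generates a small retail-style dataset. Everything in it is an assumption made for the example: the function name `make_synthetic_sales`, the column names, the value ranges and the relationship between price, discount and units sold are all hypothetical and should be replaced with whatever fits your topic.

```python
import random


def make_synthetic_sales(n_rows, seed=42):
    """Generate hypothetical retail records with a known, made-up demand rule."""
    rng = random.Random(seed)  # fixed seed keeps the dataset reproducible
    regions = ["north", "south", "east", "west"]
    rows = []
    for i in range(n_rows):
        price = round(rng.uniform(5.0, 50.0), 2)
        discount = rng.choice([0.0, 0.1, 0.2])
        # Assumed ground truth: demand falls with price, rises with discount,
        # plus Gaussian noise so a fitted model cannot be perfect.
        units = max(0, round(120 - 2.0 * price + 150 * discount + rng.gauss(0, 8)))
        rows.append({
            "store_id": i % 10,
            "region": rng.choice(regions),
            "price": price,
            "discount": discount,
            "units_sold": units,
        })
    return rows


sales = make_synthetic_sales(500)
print(sales[0])
```

Because you choose the rule that generates the data, you can check whether your model recovers it. That is useful for learning, but it is not a substitute for the messiness of scraped or surveyed data.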
There are two types of datasets: time series and cross-sectional. A time series dataset includes time and/or date features; daily sales data from a retail chain and airline ticket prices are examples. A cross-sectional dataset records observations at a single point in time. Beginners who have not learnt time series methods should avoid time series datasets.
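A quick way to screen a candidate dataset is to check whether any of its columns consist entirely of dates. The sketch below is one simple way to do that in Python, assuming the rows have already been loaded as dictionaries of strings and that dates are written in ISO format such as 2024-01-31. The helper name `find_date_columns` and the sample rows are hypothetical, and other date formats would need their own parsing.

```python
from datetime import date


def find_date_columns(rows):
    """Return the column names whose values all parse as ISO dates."""
    date_columns = []
    for column in rows[0]:
        try:
            for row in rows:
                date.fromisoformat(str(row[column]))
        except ValueError:
            continue  # at least one value is not a date, so skip this column
        date_columns.append(column)
    return date_columns


# Hypothetical daily sales rows: the "date" column marks this as a time series.
daily_sales = [
    {"date": "2024-01-01", "store": "A", "sales": "120.5"},
    {"date": "2024-01-02", "store": "A", "sales": "98.0"},
]
# Hypothetical customer snapshot with no date feature: cross-sectional.
customers = [
    {"age": "34", "income": "52000"},
    {"age": "41", "income": "61000"},
]

print(find_date_columns(daily_sales))
print(find_date_columns(customers))
```

A non-empty result is only a hint. A dataset can carry a date column, such as a signup date, and still be analysed cross-sectionally, so treat this as a first check and not as a classification.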
Presenting the capstone project in your resume
Include a brief overview of the project covering the topic, what was interesting about it, the source of the dataset, the challenges faced and what you learnt from the project. After all, capstone projects are meant to help you learn through application. If you selected a dataset from a popular source like Kaggle, make sure to survey the solutions posted by others and mention if and how you approached it differently. Aspects of the project that you can highlight:
- Business problem addressed
- Algorithms used, models generated and interpretations
- Issues faced in preparing data and other challenges like performance
- Learning from the project
Wish you all the best in your project and the career ahead! Leave a comment or, even better, subscribe to stay in touch with posts from playfully Serious.