Kaggle is where good Data Scientists showcase their skills.
Data is now a vital business asset, and operationally it is becoming easier to manage vast amounts of it. Off-the-shelf cloud computing and machine learning algorithms have made the operational side of data management and data science more comfortable. Finding great Data Scientists, however, remains a challenge.
The chart above is from a Reddit post showing the shortage of Data Scientists at the world's top 2,500 corporations. Faced with this competition, you should ask yourself: do I stand a chance of hiring these prospective Data Scientists if I take the usual HR approaches?
Why are Data Scientists needed?
Data is abundant today. Each visit by a prospect to a store, physical or virtual, leaves a trail of data that companies can easily access. A 2018 Forbes article estimated that 90% of the world's data had been generated in the preceding two years (2016 to 2018).
To find valuable trends and insights, you need experts, "data scientists", who can handle data with care. This kind of expertise is hard to find, keep, and grow. So how are some corporations staying ahead of these issues? Many of them use platforms like Kaggle.
So what is Kaggle?
It is the world's largest data science community. It was founded in 2010 and acquired by Google in 2017.
On the platform, companies post anonymized datasets and invite a worldwide community of data scientists and machine learning practitioners to compete in contests by proposing the best-performing algorithms or analyses.
In this way, companies can discover and leverage the best scientists and researchers available to bring them insights. Kaggle competitions have also played a part in the rise of open-source software such as Keras (a neural network library written in Python), XGBoost (a software library that provides a gradient boosting framework for multiple programming languages), and LightGBM (a gradient boosting framework that uses tree-based learning algorithms), all of which gained wide adoption through work done on Kaggle.
How does Kaggle work?
Kaggle is organized into five main sections.
The main focus of Kaggle is, of course, the contests space, where companies make anonymized real-world datasets available on the platform. Each dataset is split into two parts: a training set and a test set. Algorithms learn from the training set, and the public leaderboard ranks entrants by how well their predictions perform on the test set.
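To make the training/test mechanics concrete, here is a minimal, hypothetical Python sketch of how a leaderboard could score submissions. It assumes a binary classification task scored by accuracy. The names `accuracy`, `leaderboard`, and the sample data are illustrative only, and real competitions define their own evaluation metrics. On Kaggle itself, the public leaderboard is computed on only part of the test set, with the remainder held back for the final private ranking.

```python
# Hypothetical sketch: score each team's test-set predictions against
# hidden labels and rank the teams, as a leaderboard would.
# Accuracy is an assumed metric; real competitions define their own.


def accuracy(predictions, labels):
    """Return the fraction of predictions that match the hidden test labels."""
    if len(predictions) != len(labels):
        raise ValueError("submission length must match the test set")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def leaderboard(submissions, labels):
    """Return (team, score) pairs sorted from best to worst score."""
    scores = {team: accuracy(preds, labels) for team, preds in submissions.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Illustrative data: test-set answers are never shown to entrants.
hidden_labels = [1, 0, 1, 1, 0]
submissions = {
    "team_a": [1, 0, 1, 0, 0],
    "team_b": [1, 0, 1, 1, 0],
    "team_c": [0, 0, 1, 1, 1],
}

for rank, (team, score) in enumerate(leaderboard(submissions, hidden_labels), start=1):
    print(rank, team, score)
```

With this sample data, team_b ranks first with a perfect score, followed by team_a (0.8) and team_c (0.6). Entrants only ever see their scores, never the hidden labels. This is why the test set can fairly compare approaches.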
We can think of "notebooks" as code editors. Using notebooks, data scientists can share and run machine learning code written in R, Python, Julia, or SQLite.
Notebooks do more than give Data Scientists a place to comment on their approach. They also store the settings of the cloud computing environment, which makes data science work reproducible and collaborative. This helps organizations estimate the physical computing resources they would need to apply a data scientist's approach at scale.
The "Discussion" space is split into several forums where data scientists can exchange opinions on any topic of interest to the Community.
Kaggle also has a Datasets area, where data scientists can collaborate to build dataset collections on Kaggle's public data platform and share them with other enthusiasts.
Regular contributors here are the kind of Data Scientists who benefit corporations whose cross-functional teams rely on one another and collaborate to move forward.
Data Science Education
Another critical section is "Faster Data Science Education". Here, Data Science enthusiasts can learn data skills through free micro-courses and then practice and improve them.
So is mining Kaggle for potential hires worth it?
Undoubtedly, it is. Kaggle suits data scientists at every level. The key is not to focus solely on those who win contests.
One caveat is that Kaggle participants focus more on machine learning activities than on defining the problem and collecting the data. These activities include some data cleaning, exploratory data analysis, preprocessing, and modeling.
Problem definition and data collection are critical tasks, because most data exists in a variety of raw formats and comes from different environments. Competition datasets, by contrast, arrive mostly formatted. This reduces the data cleaning needed before applying machine learning algorithms, so one of the hardest parts of a Data Scientist's job has already been done.
So how do companies reach out to interested job applicants on Kaggle?
Unicorns like Airbnb and established brick-and-mortar corporations like Walmart have opened samples of their data to the Data Science community. Along with bragging rights, Kaggle contests also come with monetary prizes. A current list of contests is available at https://www.kaggle.com/competitions.
Once a competition is over, the hosting company can ask the entrants for their resumes. Candidates can also list their Kaggle profiles as a key part of their resumes when applying for a job. Any company or individual with a Kaggle profile can view the rankings at https://www.kaggle.com/rankings and reach out to participants from within the platform.
Happy Data Science sourcing at Kaggle!