Introduction To Kaggle
If you have ever trained a neural network on a fairly large dataset on your PC or laptop, you know how painful it is. Training a model locally takes a huge amount of time, and on top of that your system may run into heating issues. The time taken is justified: a neural network learns to map inputs to outputs on a training dataset by searching for a set of weights that works well. This process is iterative, and on every iteration a small change (update) is made to the model weights to improve prediction accuracy. All in all, training a neural network model on a local system is not a great idea.
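To make the idea of "small updates on every iteration" concrete, here is a minimal sketch in plain Python. It is a toy linear model fitted with gradient descent, not a real neural network, and the data, learning rate, and step count are all made up for illustration.

```python
# Toy illustration of iterative weight updates (gradient descent).
# The data follows y = 2x + 1, so training should find w close to 2 and b close to 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]


def mean_squared_error(w, b):
    """Average squared difference between predictions and targets."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train(steps=2000, learning_rate=0.05):
    """Start from zero weights and nudge them a little on every iteration."""
    w, b = 0.0, 0.0
    history = []
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / len(xs)
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / len(xs)
        # The "small change (update)" made to the weights.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
        history.append(mean_squared_error(w, b))
    return w, b, history


w, b, history = train()
print(f"w={w:.3f}, b={b:.3f}, final loss={history[-1]:.6f}")
```

Even this tiny model needs a couple of thousand iterations over five data points. A real network repeats the same kind of loop over millions of weights and examples, which is where the training time goes.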
So, what's the solution?
The solution is quite simple. Rather than training our model locally, we can train it on online GPUs/TPUs provided by many platforms. These platforms do all the heavy lifting and reduce the training time considerably. They also eliminate the need for a local setup, as they provide almost everything we need to train our model (a Python environment, datasets, GitHub support, etc.).
What are the various platforms?
So now we know that using an online GPU can make our life a lot easier, but which platforms can we use? Below is a list of them:
Google Colab
Kaggle
Jupyter Notebook on GCP
There are many other platforms that we can use, but these are the major ones. Of these, Google Colab and Kaggle are the two major players. Google Colab and Kaggle have similar features, but I personally prefer Kaggle, and below are the reasons why.
Kaggle has a great community and it hosts many competitions.
Kaggle has a lot of datasets that we can use directly.
Kaggle creates a nice history of our notebook commits that we can review later.
So now that we have seen the advantages of Kaggle, let's dive deeper and learn how to actually use it.
Getting Started with Kaggle
First, visit https://www.kaggle.com/ and register yourself. After registering, you will see the main Kaggle interface.
Here we have many options such as competitions, datasets, communities, etc. You can explore these on your own. In this blog, we will focus on setting up the environment so that you are ready to work on your first machine learning project!
On this page, click on the Code tab in the side navigation bar. This will take you to the Code page.
Here we have two options: create a notebook, and "Your Work". The "Your Work" section shows the work we have already done. For now, we need to create a notebook, so click on "New Notebook".
Kaggle will now set up your notebook and show a "session started" prompt, after which the notebook environment opens.
The environment is quite similar to Jupyter Notebook, so if you have worked with Jupyter before, working with Kaggle should be a cakewalk.
You can start writing code by adding a new code block. For this, you can click on the plus icon in the top left corner, or you can click on "+ Code" below an existing block.
To run your code, just click on the play button in the cell, or, if you want to run all the cells, click on "Run All".
To make your notebook more readable, you can also add markdown cells with instructions and information.
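As a first code block, it can be handy to check what the ready-made Python environment actually contains. The sketch below is only a suggestion: the helper name installed_packages and the list of libraries are my own picks for illustration, not something Kaggle prescribes.

```python
import importlib.util
import sys


def installed_packages(names):
    """Map each top-level package name to True if it can be imported here."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


print("Python", sys.version.split()[0])

# Illustrative list of common ML libraries; edit it to match your project.
for name, present in installed_packages(
    ["numpy", "pandas", "sklearn", "torch", "tensorflow"]
).items():
    print(f"{name:12s} {'available' if present else 'missing'}")
```

Paste it into a new code block and hit the play button. Anything reported as missing would have to be installed in the session before you import it.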
Using GPU and TPU
To enable a GPU or TPU in your notebook, you will first need to verify your mobile number.
So click on Settings on the right, click on "Requires phone verification", and verify your phone number.
Now, in the menu on the right side, you can see the Accelerator option. From it, we can select either GPU or TPU. You will have a weekly GPU quota of 42 hrs and a TPU quota of 30 hrs, so it's wise to switch off the accelerator when not in use.
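Once an accelerator is selected, a quick check from inside the notebook confirms whether the session really sees a GPU. This is a rough sketch that assumes PyTorch is installed in the session and falls back to looking for the nvidia-smi tool if it is not. The helper name describe_accelerator is hypothetical, and the check covers GPUs only, not TPUs.

```python
import shutil


def describe_accelerator():
    """Return a short, human-readable note about the GPU visible to this session."""
    try:
        import torch  # assumption: PyTorch is installed in the session
    except ImportError:
        torch = None

    if torch is not None and torch.cuda.is_available():
        return f"GPU via PyTorch: {torch.cuda.get_device_name(0)}"

    # Fallback: the NVIDIA driver tool is usually on PATH when a GPU is attached.
    if shutil.which("nvidia-smi"):
        return "nvidia-smi is on PATH, but PyTorch is missing or sees no GPU"

    return "No GPU detected; running on CPU"


print(describe_accelerator())
```

If this reports CPU while you expected a GPU, the accelerator is probably still switched off in the settings menu. That is also a useful reminder that the session is not eating into your weekly quota.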
Adding a dataset
Now comes one of the best features of Kaggle: datasets. Kaggle has a huge number of datasets that we can use directly without downloading them. Just click on "Add data" on the right side and the dataset search screen will appear.
Here, simply search for the dataset you need and add it to your notebook. You can now use the dataset directly. You can also click on a particular dataset and explore it. Many people have already used these datasets to build projects, and you can leverage their work to get a head start.
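To see where an added dataset ends up, you can list its files from a code block. The sketch below assumes that attached datasets are mounted under /kaggle/input, which is where Kaggle notebooks normally place them. The helper name list_input_files is my own.

```python
import os


def list_input_files(root="/kaggle/input"):
    """Return every file path found under `root`, sorted for stable output."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            paths.append(os.path.join(dirpath, filename))
    return sorted(paths)


# On Kaggle this prints one line per file of every attached dataset.
# Outside Kaggle the folder usually does not exist, so nothing is printed.
for path in list_input_files():
    print(path)
```

The printed paths can then be passed straight to whatever loader you use, for example a CSV reader, without downloading anything to your own machine.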
If you are ever stuck somewhere during your machine learning journey or need help with your code, you can easily get it from the Kaggle community. Kaggle has a great community where you can find many solutions to your problems posted by other users. It's also a great place to discover content and engage in discussions around topics that you’re interested in.
Just go to the community section by clicking on the comment icon in the left navigation bar and explore.
If you are a beginner in the field of ML, then Kaggle is the best place to learn. Along with a great community, it has many free courses to get you started.
Just go to the courses section and discover a range of courses, from the basics of Python to some advanced concepts.
Competitions are a great way to learn and apply new skills. We know about big competitive programming platforms like CodeChef, HackerRank, SPOJ, and Codeforces. In the same way, Kaggle is for competitive machine learning and data science. On Kaggle you can work on many real problems posted by companies and gain experience. There are always many active competitions going on, and on top of that, you can also win rewards. On Kaggle you can also earn medals for your performance. To know more about the progression system, visit: https://www.kaggle.com/progression
Just go to the competitions section and participate in any competition you want.
So that's it, now you are ready to make your first machine learning project!
In the upcoming blog, I will be making a small project on computer vision using Kaggle so stay tuned...