Soil Fertility Prediction Using Gradient Boosting Technique

Agriculture is one of the most important inventions of humankind. However, urbanization and industrialization are steadily reducing the amount of cultivable land. There is therefore a pressing need to boost agricultural production sustainably while ensuring that the techniques used do not harm the environment.

Key takeaways from this blog

  1. Why do we need Machine Learning in the agricultural domain?
  2. What are the different areas in agriculture where machine learning applications can help?
  3. What are the different algorithms that are mostly used in this area?
  4. Complete implementation steps of the gradient boosting algorithm to predict soil fertility.
  5. Which major companies are contributing to this area?
  6. Possible interview questions on this topic.

With technology advancing in every area, modern technologies in the agricultural domain can become a boon for all of us. This is the idea behind Smart Farming, where farming is managed with modern technologies to increase the quantity and quality of the produce. Using recent technologies like Machine Learning and Data Science in farming can increase both quantity and quality while making the farming process considerably easier. In farming, Machine Learning can be used in

  • Species recognition and breeding
  • Soil and water condition management
  • Prediction of yield quantity and quality
  • Disease or weed detection
  • Livestock Management

how machine learning helps in agriculture

In this article, we will get hands-on with one of these ML applications: soil condition management.

Problem Background 

Machine Learning has become a tool used in almost every task that requires estimation. Sectors traditionally seen as non-technical, such as agriculture, have also benefited from these techniques. One example is predicting soil fertility from specific soil properties that vary from region to region. Traditional methods such as crop rotation, incorporating cover crops, and adding organic matter to soil have definitely helped this sector yield more crops. Still, they do not make full use of the available domain knowledge.

There are some critical, yet unanswered, questions like:

  • Are we doing enough? Are we utilizing our knowledge enough to do the best for the agricultural sector?
  • Do we know the limit up to which traditional methods such as crop rotation will help?

All these questions can be answered by predicting the fertility of the soil. Soil fertility prediction does not promise more yield, but it does tell us how far the methods we have adopted have improved the soil quality. It can reduce the difficulties faced by farmers and give agriculturalists the evidence they need to get a better yield.

The task of predicting soil fertility is not new to Machine Learning. Experts and consultants have tried several machine learning methods to predict it. Classification algorithms have proven to provide sufficient accuracy for this problem. ML algorithms such as k-NN, Decision Trees, SVM, and Random Forests have been used in different case studies on soil fertility.
In this article, we will be using Gradient Boosting, and we will explain why.

Gradient Boosting Classifier for Soil Fertility Prediction

Gradient Boosting falls under the category of Ensemble Methods. Ensemble methods combine a team of classifiers and aggregate their predictions, for example by voting. Averaging-style ensembles such as bagging mainly reduce the variance of the classifier, while boosting mainly reduces its bias. The main advantage of ensembling is that it is unlikely for all the classifiers to make the same error. In fact, with majority voting, as long as every error is made by a minority of the classifiers, the correct classification is still obtained.
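The voting argument can be illustrated with a tiny sketch. The labels and the three classifiers' predictions below are made-up values, chosen so that each mistake is made by only one classifier; they are not taken from the soil dataset.

```python
import numpy as np

# Hypothetical ground truth and predictions from three classifiers.
y_true = np.array([1, 0, 1, 1, 0, 0])
preds = np.array([
    [1, 0, 1, 0, 0, 0],  # classifier A: wrong on sample 3
    [1, 1, 1, 1, 0, 0],  # classifier B: wrong on sample 1
    [0, 0, 1, 1, 0, 0],  # classifier C: wrong on sample 0
])

# Majority vote: predict 1 when more than half of the classifiers say 1.
majority = (preds.sum(axis=0) > preds.shape[0] / 2).astype(int)

individual_acc = (preds == y_true).mean(axis=1)  # accuracy of each classifier
ensemble_acc = (majority == y_true).mean()       # accuracy of the vote

print("individual:", individual_acc)
print("ensemble:  ", ensemble_acc)
```

Each classifier makes one mistake, but no two classifiers make it on the same sample, so the vote recovers every label.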

Gradient Boosting Connection-Tree

In Gradient Boosting, each predictor tries to improve on its predecessor by reducing its errors. Instead of fitting a fresh model to the data at each iteration, Gradient Boosting fits a new model to the residual errors of the previous model. During training, it estimates the residual for every instance in terms of the log of odds. It then builds a new Decision Tree that tries to predict the residuals left by the previous estimator. This is the main difference between Gradient Boosting Methods (GBM) and Random Forests (RF).
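A minimal sketch of this residual-fitting loop for binary classification is shown below. It is a simplification under stated assumptions: the data is synthetic (from make_classification, standing in for soil measurements), the tree depth and learning rate are arbitrary, and the leaf values are used as-is, whereas library implementations typically refine them further.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeRegressor


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def log_loss(y, p):
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))


# Synthetic binary data; an assumption, not the article's dataset.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

learning_rate = 0.1
n_trees = 20

# Start from a constant prediction: the log of odds of the positive class.
p0 = y.mean()
F = np.full(len(y), np.log(p0 / (1 - p0)))

losses = [log_loss(y, sigmoid(F))]
trees = []
for _ in range(n_trees):
    residuals = y - sigmoid(F)             # errors of the current ensemble
    tree = DecisionTreeRegressor(max_depth=2, random_state=0)
    tree.fit(X, residuals)                 # the new tree predicts the residuals
    F += learning_rate * tree.predict(X)   # shrink and add to the log-odds
    trees.append(tree)
    losses.append(log_loss(y, sigmoid(F)))

print("log loss before boosting:", round(losses[0], 4))
print("log loss after boosting: ", round(losses[-1], 4))
```

The training loss shrinks as trees are added, because each tree only has to correct what the ensemble so far still gets wrong.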

Single decision tree vs random forest vs gradient boosting

Source: Rosaria Silipo

GBMs build an ensemble of shallow, weak trees sequentially, with each tree learning from and improving on the previous one, while random forests build an ensemble of independent trees. Empirical results suggest that the accuracy of Gradient Boosting can be hard to beat, although the approach is computationally expensive.
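The sequential nature of boosting can be observed with scikit-learn's staged_predict, which yields the ensemble's prediction after each added tree. The sketch below uses synthetic data and arbitrary settings (50 trees, depth 2), so the numbers it prints say nothing about the soil dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic data with 15 features; an assumption for illustration only.
X, y = make_classification(n_samples=500, n_features=15, n_informative=6,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Boosting: shallow trees added one after another.
gbm = GradientBoostingClassifier(n_estimators=50, max_depth=2,
                                 random_state=0).fit(X_train, y_train)
# Random forest: independently grown trees whose votes are combined.
rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)

# Test accuracy of the boosted ensemble after 1, 2, ..., 50 trees.
staged_acc = [accuracy_score(y_test, pred) for pred in gbm.staged_predict(X_test)]
rf_acc = rf.score(X_test, y_test)

print("GBM after 1 tree:  ", staged_acc[0])
print("GBM after 50 trees:", staged_acc[-1])
print("Random forest:     ", rf_acc)
```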

Let’s move to the core part of this article, where we will guide you through building your own ML model that can predict the fertility of the soil.

Steps to implement the solution:

Step 1: Data Collection

Various soil datasets can easily be found on the internet, and government bodies also release their datasets in the public domain. State-wise data for India can be downloaded from the farmer’s portal.
A sample dataset based on the above portal can be used from here. We will be using this dataset in this article.

Step 2: Dataset explanation

The dataset contains sixteen different attributes. An explanation of every attribute is given in the table below.

Attribute Explanation

Sampled instances of the dataset

A snippet of the data can be seen by printing the dataframe sample.
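As a rough sketch of that step, the snippet below builds a small stand-in dataframe and prints a random sample. The column names follow the attributes mentioned in this article, but the values are random, and the "Output" column name and its two labels are hypothetical; with the real dataset the dataframe would be loaded from the downloaded file instead.

```python
import numpy as np
import pandas as pd

# Attribute names as mentioned in the article.
columns = ["pH", "EC", "OC", "OM", "N", "P", "K", "Zn", "Fe", "Cu", "Mn",
           "Sand", "Silt", "Clay", "CaCO3", "CEC"]

# Random placeholder values; not real soil measurements.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((20, len(columns))), columns=columns)

# Hypothetical name and labels for the decision (last) column.
df["Output"] = rng.choice(["Fertile", "Non Fertile"], size=len(df))

print(df.sample(5, random_state=0))  # five randomly chosen rows
print(df.shape)                      # (rows, columns)
```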

Step 3: Data Pre-processing

With the seaborn library's help, we can plot the correlation between the dataset attributes. In the plot below, the attributes OC and OM are highly correlated, so only one of them needs to be kept in the final set of features used for training the model.

Seaborn relation
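The same observation can be checked numerically. In the sketch below the data is synthetic: OM is generated as a noisy multiple of OC purely to mimic the strong correlation described above, and the 0.9 threshold is an arbitrary choice.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200
oc = rng.uniform(0.2, 1.5, n)
df = pd.DataFrame({
    "pH": rng.uniform(5.5, 8.5, n),
    "EC": rng.uniform(0.1, 1.0, n),
    "OC": oc,
    # Assumption: OM tracks OC closely, as the article's plot suggests.
    "OM": 1.7 * oc + rng.normal(0, 0.02, n),
    "N": rng.uniform(100, 400, n),
})

corr = df.corr()  # pairwise Pearson correlation matrix

# Collect attribute pairs whose absolute correlation exceeds the threshold.
threshold = 0.9
high_pairs = [
    (a, b)
    for i, a in enumerate(corr.columns)
    for b in corr.columns[i + 1:]
    if abs(corr.loc[a, b]) > threshold
]
print(high_pairs)

# Keep OC and drop OM, matching the feature list used for training.
features = df.drop(columns=["OM"])
```

The matrix in corr is also what would typically be passed to seaborn's heatmap function to draw the plot.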

We can have these input features in the final set: pH, EC, OC, N, P, K, Zn, Fe, Cu, Mn, Sand, Silt, Clay, CaCO3, CEC. The output vector is the decision vector formed from the last column. These features have different ranges, so MinMaxScaler() from sklearn’s preprocessing module can be used to scale all the attributes to the [0, 1] range. Alternatively, MaxAbsScaler() can be used to scale them to the [-1, 1] range. Using LabelEncoder(), the decision column (last column) of the dataset can be converted into numerical values (0 and 1). The dataset can then be split into two sets:

  1. Train Data
  2. Validation Data

The validation set will be used to tune the hyperparameters.
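These pre-processing steps can be sketched as follows. The data is random placeholder data, the "Output" column name and its labels are hypothetical, and the 80/20 split ratio is an assumption. Fitting the scaler on the training split only is a design choice made here so that no information from the validation set leaks into training.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, MinMaxScaler

feature_names = ["pH", "EC", "OC", "N", "P", "K", "Zn", "Fe", "Cu", "Mn",
                 "Sand", "Silt", "Clay", "CaCO3", "CEC"]

# Random placeholder data; not real soil measurements.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.uniform(0, 100, (120, len(feature_names))),
                  columns=feature_names)
df["Output"] = rng.choice(["Fertile", "Non Fertile"], size=len(df))

X = df[feature_names]
y = LabelEncoder().fit_transform(df["Output"])  # string labels -> 0 / 1

# Hold out 20% of the rows as the validation set.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

# Learn the min/max on the training split, then apply it to both splits.
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_val_scaled = scaler.transform(X_val)

print(X_train_scaled.shape, X_val_scaled.shape)
```

Because the scaler only sees the training split, validation values can fall slightly outside [0, 1].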

Step 4: Model Formation

Setting up the algorithm 

  • From the module sklearn.ensemble, import the algorithm GradientBoostingClassifier() to build a model. Certain hyperparameters need to be tuned carefully.
  • Take the learning rate to be higher than usual (say 0.2-0.25 instead of 0.1).
  • Choose a maximum depth (play around with values between 5–10), and similarly experiment with the regularization parameter (alpha: values between 1e-3 and 1e-2; in GradientBoostingClassifier the closest parameter is ccp_alpha).
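from sklearn.ensemble import GradientBoostingClassifier
# Build and fit the model; the hyperparameter values are starting points to tune.
model = GradientBoostingClassifier(n_estimators=100, learning_rate=1.0,
    max_depth=1, random_state=0).fit(X_train, y_train)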

Training and fitting: the algorithm is fitted on the training dataset using model.fit(input, output). In the snippet above, this is the .fit(X_train, y_train) call.
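Tuning on the validation set can be sketched as a small manual search. The data is synthetic, and the candidate values are only examples loosely based on the ranges suggested above; a real search would usually cover more values.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the scaled soil features and encoded labels.
X, y = make_classification(n_samples=400, n_features=15, n_informative=6,
                           random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0)

best_score, best_params = -1.0, None
for learning_rate in (0.1, 0.2, 0.25):
    for max_depth in (1, 5, 7):
        model = GradientBoostingClassifier(
            n_estimators=50, learning_rate=learning_rate,
            max_depth=max_depth, random_state=0,
        ).fit(X_train, y_train)
        score = model.score(X_val, y_val)  # mean accuracy on validation data
        if score > best_score:
            best_score = score
            best_params = {"learning_rate": learning_rate,
                           "max_depth": max_depth}

print(best_params, best_score)
```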

Step 5: Evaluation of developed model

Model Score: This gives the mean accuracy on the given test data and labels. For the above model, model.score(testX, testy) returns the model's mean accuracy.
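A short sketch showing that the score is simply the accuracy of the predictions. The data is synthetic, and the split names (trainX, testX, trainy, testy) are chosen to match the call above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic data standing in for the soil dataset.
X, y = make_classification(n_samples=300, n_features=15, random_state=0)
trainX, testX, trainy, testy = train_test_split(
    X, y, test_size=0.25, random_state=0)

model = GradientBoostingClassifier(n_estimators=100, learning_rate=1.0,
                                   max_depth=1, random_state=0)
model.fit(trainX, trainy)

mean_accuracy = model.score(testX, testy)
# The same number, computed explicitly from the predictions.
manual_accuracy = accuracy_score(testy, model.predict(testX))

print(mean_accuracy, manual_accuracy)
```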

Confusion Matrix: The confusion matrix can be imported from the metrics module of the sklearn library. The test set can be used to compare the predicted output and the ground truth.
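A minimal sketch of computing the confusion matrix, again on synthetic data. In scikit-learn's convention, rows correspond to the true labels and columns to the predicted labels.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic data standing in for the soil dataset.
X, y = make_classification(n_samples=300, n_features=15, random_state=0)
trainX, testX, trainy, testy = train_test_split(
    X, y, test_size=0.25, random_state=0)

model = GradientBoostingClassifier(random_state=0).fit(trainX, trainy)

cm = confusion_matrix(testy, model.predict(testX))
# For a binary problem the four cells unpack in this order.
tn, fp, fn, tp = cm.ravel()

print(cm)
print("accuracy from the matrix:", (tn + tp) / cm.sum())
```

Unlike a single accuracy number, the matrix shows which kind of error the model makes: fertile soil predicted as non-fertile, or the reverse.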

Companies Case Studies

The use of Machine Learning technology in agriculture can revolutionize the economy. With the agricultural area consistently decreasing, the prime need is to maximize yield from the limited land. Big corporations such as Mineral, Cool Planet, and AgSolver are working actively on improving soil health by considering various factors and developing soil test kits and strategies to deploy them.

Google’s Mineral

Alphabet Agriculture

Source: Alphabet

Alphabet’s X lab, a former Google division that launched the Waymo self-driving car unit and other ambitious projects, has taken the wraps off its latest “moonshot”: a computational agriculture project called Mineral.

According to Mineral’s lead, Elliott Grant, “Mineral project is focused on sustainable food production and farming with the help of advanced technologies of artificial intelligence, robotics, machine learning and simulations at an immense scale.”

Cool Planet

Coolplanet pic

Source: Coolplanet

Cool Planet is an agricultural technology company mainly focused on soil health solutions. It uses advanced technologies for predicting the fertility of the soil. It has raised $20.3 million in Series A funding. The company’s two largest existing investors, Agustín Coppel and North Bridge Venture Partners, led the investment.

Possible Interview Questions

  1. Is soil fertility prediction a multi-class classification problem? If not, can we convert it into one?
  2. Which algorithms did you try along with the boosting algorithm? On what basis did you select this algorithm?
  3. How are boosting algorithms related to random forests?
  4. Do you think your results are the best possible? What improvements can you suggest?
  5. Explain the boosting algorithm.


Machine Learning helps farmers make the farming process easier and improves both the quality and quantity of the yield. Farmers can rely on ML to detect plant diseases and weeds automatically and to support livestock management. Many big corporations are trying to improve yield using satellite data and ground-measured data. Soil fertility prediction is one technique for identifying whether a piece of land is suitable for a particular crop. In this article, we built our own soil fertility prediction model that estimates soil fertility from attribute measurements. Based on this prediction, farmers can decide whether to choose a piece of land for agricultural purposes. I hope you have enjoyed this use case.

Enjoy Learning! Enjoy Thinking!
