Agriculture is one of the most important inventions of humankind. However, urbanization and industrialization are reducing the amount of cultivable land day by day. Hence, there is a pressing need to boost agricultural production sustainably while ensuring that the techniques used do not harm the environment.
With technology advancing in every area, modern technologies can become a boon for agriculture as well. To this end, the concept of Smart Farming has been coined, in which farming is managed with modern technologies to increase the quantity and quality of the produce. Recent technologies such as Machine Learning and Data Science can increase both quantity and quality while easing the process of farming by a huge margin. In farming, machine learning can be used in several areas.
In this article, we will get hands-on with one such ML application: soil condition management.
Machine Learning has become a tool for almost every task that requires estimation. Non-technical sectors such as agriculture have also benefited from these techniques, for example in predicting soil fertility from specific soil properties that vary from region to region. Traditional methods such as crop rotation, incorporating cover crops, and adding organic matter to the soil have definitely helped this sector yield more crops. Still, they do not make full use of the available domain knowledge.
There are some critical, yet unanswered, questions, such as:
All these questions can be answered by predicting the fertility of the soil. Soil fertility prediction does not promise more yield, but it does show the extent to which the methods we have adopted have improved soil quality. It can reduce the difficulties farmers face and give agriculturalists the evidence they need to get a better yield.
The task of predicting soil fertility is not new to Machine Learning. Experts and consultants have tried several machine learning methods to predict it, and classification algorithms have proven to provide sufficient accuracy for this problem. Algorithms such as k-NN, decision trees (DTs), SVMs, and Random Forests have been used in different case studies on soil fertility.
In this article, we will use Gradient Boosting and explain why we chose it.
Gradient Boosting falls under the category of Ensemble Methods. Ensemble methods combine a team of classifiers and aggregate their predictions, for example by voting. These methods usually reduce the variance of the classifier. The main advantage of ensembling is that all the classifiers are unlikely to make the same error. In fact, as long as every error is made by a minority of the classifiers, the majority vote still yields the correct classification.
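The voting argument can be illustrated with a minimal sketch. The three prediction arrays and the `majority_vote` helper below are hypothetical, made up only for illustration: each classifier is wrong on a different sample, so every error is made by a minority and the vote recovers the ground truth.

```python
import numpy as np


def majority_vote(predictions):
    """Combine binary predictions (one row per classifier) by majority vote."""
    predictions = np.asarray(predictions)
    # A sample is labelled 1 when more than half of the classifiers say 1.
    return (predictions.sum(axis=0) * 2 > predictions.shape[0]).astype(int)


# Hypothetical ground truth and three imperfect classifiers (illustrative values).
y_true = np.array([1, 0, 1, 1, 0, 0])
clf_a = np.array([0, 0, 1, 1, 0, 0])  # wrong on sample 0
clf_b = np.array([1, 1, 1, 1, 0, 0])  # wrong on sample 1
clf_c = np.array([1, 0, 1, 0, 0, 0])  # wrong on sample 3

ensemble = majority_vote([clf_a, clf_b, clf_c])
print("Ensemble matches ground truth:", bool((ensemble == y_true).all()))
```

Note that this shows plain voting only. Gradient Boosting combines its trees additively rather than by voting, as the following section describes.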
Gradient Boosting Connection-Tree
In Gradient Boosting, each predictor tries to improve on its predecessor by reducing its errors. Instead of fitting a model to the data at each iteration, Gradient Boosting fits each new model to the residual errors of the previous model. During training, it estimates the residual for every instance in terms of the log of odds. It then builds a new Decision Tree that tries to predict the residuals left by the previous estimator. This is the main difference between Gradient Boosting Methods (GBM) and Random Forests (RF).
Source: Rosaria Silipo
GBMs build an ensemble of shallow, weak trees sequentially, with each tree learning from and improving on the previous one, while random forests ensemble independent trees. Empirical results suggest that the accuracy of Gradient Boosting can be hard to beat, although the approach is computationally expensive.
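The residual-fitting loop described above can be sketched in a few lines. This is a simplified illustration, not the exact procedure sklearn uses: it starts from the log of odds, fits a shallow regression tree to the residuals (label minus predicted probability) in each round, and adds the scaled tree output to the running score. The function names, the synthetic data, and the hyperparameter values are all assumptions chosen for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeRegressor


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def log_loss(y, p):
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))


def fit_boosted_trees(X, y, n_rounds=50, learning_rate=0.1, max_depth=1):
    """Simplified gradient boosting for binary labels (0/1)."""
    # Initial prediction: the log of odds of the positive class.
    p = np.clip(y.mean(), 1e-6, 1 - 1e-6)
    init = float(np.log(p / (1 - p)))
    scores = np.full(len(y), init)
    trees = []
    for _ in range(n_rounds):
        # Residuals of the current ensemble on every training instance.
        residuals = y - sigmoid(scores)
        # The next shallow tree is trained to predict those residuals.
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
        tree.fit(X, residuals)
        scores += learning_rate * tree.predict(X)
        trees.append(tree)
    return init, learning_rate, trees


def predict_proba(X, model):
    init, learning_rate, trees = model
    scores = np.full(X.shape[0], init)
    for tree in trees:
        scores += learning_rate * tree.predict(X)
    return sigmoid(scores)


# Synthetic data, used only to exercise the sketch.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
model = fit_boosted_trees(X, y)

initial_loss = log_loss(y, np.full(len(y), sigmoid(model[0])))
final_loss = log_loss(y, predict_proba(X, model))
print("training log loss before and after boosting:", initial_loss, final_loss)
```

Each tree here depends on the scores left by all earlier trees, which is what makes the ensemble sequential. A random forest would instead fit every tree to the labels independently.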
Let’s quickly move to the core part of this article, where we will guide you through building your own ML model that can predict the fertility of the soil.
Various datasets can easily be found on the internet. Government agencies also release their datasets in the public domain. State-wise data for India can be downloaded from the farmer’s portal.
A sample dataset based on the above portal is available here, and we will use it in this article.
The dataset contains sixteen different attributes. An explanation of every attribute is given in the table below.
Sampled instances of the dataset
A snippet of the data can be seen by printing a sample of the dataframe.
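As a minimal sketch of that step, the snippet below builds a small stand-in dataframe and prints a random sample of its rows. The values are random placeholders, the column subset is illustrative, and the label column name `Output` and its class names are assumptions, since the real file is not reproduced here. With the actual dataset you would load it with `pd.read_csv(...)` instead.

```python
import numpy as np
import pandas as pd

# Placeholder data: random numbers standing in for real soil measurements.
rng = np.random.default_rng(0)
columns = ["pH", "EC", "OC", "N", "P", "K"]
df = pd.DataFrame(rng.random((20, len(columns))), columns=columns)
# "Output" and its class names are hypothetical names for the decision column.
df["Output"] = rng.choice(["Fertile", "Non Fertile"], size=len(df))

# Print five randomly chosen rows; random_state makes the sample repeatable.
preview = df.sample(n=5, random_state=0)
print(preview)
```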
With the help of the seaborn library, we can plot the correlation between the dataset attributes. In the plot below, the attributes OC and OM are highly correlated, so only one of them needs to be kept in the final set of features used for training the model.
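A sketch of this step follows, under the assumption that the data is in a pandas dataframe. The data is synthetic: `OM` is generated as a noisy multiple of `OC` so that the two columns correlate strongly, mimicking the relationship described above. The helpers `plot_correlation` and `drop_correlated` and the 0.9 threshold are illustrative choices, not part of the original workflow.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data; OM is built from OC so the pair is highly correlated.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "pH": rng.normal(7.0, 0.5, 100),
    "OC": rng.normal(0.6, 0.1, 100),
})
df["OM"] = 1.7 * df["OC"] + rng.normal(0.0, 0.01, 100)

corr = df.corr()
print(corr.round(2))


def plot_correlation(corr_matrix):
    """Draw the correlation matrix as an annotated heatmap."""
    # Imported here so the correlation itself can be computed without plotting libraries.
    import matplotlib.pyplot as plt
    import seaborn as sns

    ax = sns.heatmap(corr_matrix, annot=True, cmap="coolwarm", vmin=-1, vmax=1)
    ax.set_title("Attribute correlation")
    plt.tight_layout()
    plt.show()


def drop_correlated(frame, threshold=0.9):
    """Keep the first column of every pair whose absolute correlation exceeds threshold."""
    corr_abs = frame.corr().abs()
    cols = list(corr_abs.columns)
    to_drop = set()
    for i, first in enumerate(cols):
        if first in to_drop:
            continue
        for second in cols[i + 1:]:
            if second not in to_drop and corr_abs.loc[first, second] > threshold:
                to_drop.add(second)
    return frame.drop(columns=sorted(to_drop))


reduced = drop_correlated(df)
print("Columns kept:", list(reduced.columns))
```

Calling `plot_correlation(corr)` draws the heatmap. Which column of a correlated pair to keep is a judgment call; this sketch simply keeps the one that appears first.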
The final set of input features can be: pH, EC, OC, N, P, K, Zn, Fe, Cu, Mn, Sand, Silt, Clay, CaCO3, CEC. The output vector is the decision vector formed from the last column. These features have different ranges, so sklearn’s preprocessing module can be used to rescale them: MinMaxScaler() scales every attribute to the [0, 1] range, and MaxAbsScaler() is an alternative that scales to the [-1, 1] range. LabelEncoder() converts the decision column (the last column) of the dataset into numerical values (0 and 1). The dataset can then be split into two sets, a training set and a test set.
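The preprocessing steps above might look like the following sketch. The dataframe is filled with random placeholder values, the label column name `Output`, its class names, and the 80/20 split ratio are assumptions, and `train_test_split` is one common way to do the split rather than necessarily the one used for the original results.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, MinMaxScaler

features = ["pH", "EC", "OC", "N", "P", "K", "Zn", "Fe", "Cu", "Mn",
            "Sand", "Silt", "Clay", "CaCO3", "CEC"]

# Placeholder data standing in for the real soil dataset.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((60, len(features))) * 100, columns=features)
df["Output"] = rng.choice(["Fertile", "Non Fertile"], size=len(df))  # hypothetical labels

X = df[features].to_numpy()
# Encode the two class names as 0 and 1.
y = LabelEncoder().fit_transform(df["Output"])

# Hold out 20% of the rows for testing (the ratio is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Scale every feature to [0, 1] using statistics from the training rows only.
scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

print(X_train.shape, X_test.shape)
```

Fitting the scaler on the training rows alone is a design choice made here to keep test-set information out of training. Scaling the whole dataset before splitting, as the text describes, also works but lets the test rows influence the scaling.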
Setting up the algorithm
from sklearn.ensemble import GradientBoostingClassifier
model = GradientBoostingClassifier(n_estimators=100, learning_rate=1.0, max_depth=1, random_state=0).fit(X_train, y_train)
Training and fitting
The model can be fitted to the training dataset using model.fit(train_feature, output).
Model Score
This gives the mean accuracy on the given test data and labels. For the above model, model.score(testX, testy) returns the mean accuracy on the test set.
Confusion Matrix
The confusion_matrix function can be imported from the metrics module of the sklearn library. It compares the predicted output for the test set against the ground truth.
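Putting the training, scoring, and confusion-matrix steps together, a minimal end-to-end sketch could look like this. It uses synthetic data from `make_classification` in place of the soil dataset, so the printed numbers say nothing about real soil fertility. The classifier settings mirror the ones shown earlier.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic binary-classification data with 15 features (stand-in for the soil data).
X, y = make_classification(n_samples=300, n_features=15, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Train the gradient boosting model on the training split.
model = GradientBoostingClassifier(
    n_estimators=100, learning_rate=1.0, max_depth=1, random_state=0
)
model.fit(X_train, y_train)

# Mean accuracy on the held-out test split.
accuracy = model.score(X_test, y_test)

# Rows are true classes, columns are predicted classes.
y_pred = model.predict(X_test)
cm = confusion_matrix(y_test, y_pred)

print("Accuracy:", accuracy)
print("Confusion matrix:")
print(cm)
```

The diagonal of the confusion matrix counts correct predictions, so its sum divided by the total equals the accuracy returned by `model.score`.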
The use of Machine Learning technology in agriculture can revolutionize the economy. With agricultural area consistently decreasing, the prime need is to maximize yield on the limited land available. Big corporations such as Mineral, Cool Planet, and AgSolver are working actively on improving soil health by considering various factors and by developing soil test kits and strategies to deploy them.
Alphabet’s X lab, a former Google division that launched the Waymo self-driving car unit and other ambitious projects, has taken the wraps off its latest “moonshot”: a computational agriculture project called Mineral.
According to Mineral’s lead, Elliott Grant, “Mineral project is focused on sustainable food production and farming with the help of advanced technologies of artificial intelligence, robotics, machine learning and simulations at an immense scale.”
Cool Planet is an agricultural technology company focused mainly on soil health solutions. It uses advanced technologies to predict the fertility of the soil. It has raised $20.3 million in Series A funding, in a round led by the company’s two largest existing investors, Agustín Coppel and North Bridge Venture Partners.
Machine Learning technology helps farmers by making the farming process easier and improving the quality and quantity of the yield. Farmers mainly rely on ML for increasing the quality and quantity of yield, automatically detecting plant diseases, automatically detecting the presence of weeds, and managing livestock. Many big corporations are trying to improve yield using satellite data and ground-measured data. Soil fertility prediction is one of the techniques that identifies whether a piece of land is suitable for a particular crop. In this article, we have built our own soil fertility prediction model that estimates soil fertility from attribute measurements. Based on this prediction, farmers can decide whether to use a piece of land for agricultural purposes. I hope you have enjoyed this use case.