When we start learning data science and machine learning, one of the first questions that comes to mind is "Which programming language should I know as a data scientist?" or "What's the best programming language for machine learning?" The Internet is flooded with these questions. Frankly, there is no single "best programming language" in any computer science field; it all depends on the task at hand. For the tasks common in machine learning and data science, however, Python is the most widely preferred language. But have we ever wondered why?
The reason is that a large community in these domains prefers Python over other languages, so we can easily find solutions to the roadblocks we hit while developing machine learning solutions.
After going through this blog, we will be able to understand the following things:
Let's start our journey towards knowing the fundamental concepts of Python.
Python is an object-oriented, high-level programming language that works across multiple platforms such as Windows, Linux, macOS, and even Raspberry Pi devices. Its development began in the late 1980s, and it was first released in 1991. It can be used for various tasks, mainly
Our article will mainly focus on the basics required for developing machine learning and data science applications.
In Python, data types are classes, and every value belongs to a particular data type. In this language, everything is an object: each value is an instance of its data type's class, and variables are names that refer to these objects. Some popular data types in Python are:
Python numbers come in three types: integers, floating-point numbers, and complex numbers. Python 2.x had an additional type named "long" for storing larger integer values, but it was removed in Python 3.x, where "int" handles integers of any size.
The first data type is the integer, represented as "int". In Python 3.x, integers have no fixed upper limit; the largest value we can store depends only on our system's available memory. We can use the built-in Python functions "type()" or "isinstance()" to find out the data type of any value or variable. For example:
>>> a = 10
>>> type(a)
<class 'int'>
>>> type(10)
<class 'int'>
>>> isinstance(10, int)
True
>>> isinstance(7.11, int)
False
The isinstance() function checks whether a value or variable is an instance of the given type (here, int) and returns True or False accordingly.
We represent floating-point numbers as "float". The difference between float and int is that a float can take values between two integers, and the decimal point is what identifies a float value. Scientific notation can also be used when a number is very large or very small: the character "e" or "E" followed by a positive or negative integer specifies the power of 10 by which the number is multiplied.
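As a small sketch of the two points above (unbounded integers and type checks), the snippet below uses an illustrative variable name, `big`, that is our own choice. It also shows that isinstance() can take a tuple of types as its second argument.

```python
# Python 3 integers grow as needed; there is no fixed maximum value.
big = 2 ** 100
print(big)            # 1267650600228229401496703205376
print(type(big))      # <class 'int'>

# isinstance() accepts a single type or a tuple of types.
print(isinstance(big, int))             # True
print(isinstance(7.11, (int, float)))   # True, because 7.11 is a float
```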
>>> a = 10.0
>>> type(a)
<class 'float'>
>>> b = 7.11
>>> type(b)
<class 'float'>
>>> isinstance(a, float)
True
>>> isinstance(a, int)
False
### Scientific Notation example
>>> 7.11e11
711000000000.0
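A quick sketch of the other forms of scientific notation mentioned above, namely an upper-case "E" and a negative exponent. The variable names are only illustrative.

```python
# Upper-case "E" works the same way as lower-case "e".
one_and_a_half_thousand = 1.5E3
print(one_and_a_half_thousand)   # 1500.0

# A negative exponent shifts the decimal point to the left.
small = 7.11e-3
print(small)                     # 0.00711
```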
The maximum value of a floating-point number is approximately 1.8 x 10³⁰⁸. Python treats float literals beyond that as infinity.
>>> 1.79e308
1.79e+308
>>> 1.8e308
inf
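If we want to look up the limit instead of remembering it, the standard library exposes it. The sketch below assumes a typical platform where floats are IEEE 754 double-precision values, which is the common case but not something the language guarantees.

```python
import math
import sys

# Largest finite float on this platform
# (1.7976931348623157e+308 for IEEE 754 doubles).
largest = sys.float_info.max
print(largest)

# A literal beyond the finite range becomes infinity.
overflowed = 1.8e308
print(math.isinf(overflowed))        # True
print(overflowed == float("inf"))    # True
```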
Complex numbers are written as a real component plus an imaginary component, with the letter "j" marking the imaginary part.
>>> a = 7 + 11j
>>> type(a)
<class 'complex'>
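The two components can be read back from a complex number through its "real" and "imag" attributes. A minimal sketch, using the same value as above:

```python
z = 7 + 11j

# Both components are stored as floats.
print(z.real)          # 7.0
print(z.imag)          # 11.0

# conjugate() flips the sign of the imaginary part.
print(z.conjugate())   # (7-11j)
```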
In Python, we represent a sequence of characters as a string, denoted by "str". A string is delimited by a pair of either single quotes or double quotes.
>>> a = 'Single quote string'
>>> b = "Double quote string"
>>> type(a)
<class 'str'>
>>> type(b)
<class 'str'>
A string can hold as many characters as our system's memory allows. It can also be empty. To represent an empty string,
>>> ''
''
But what if a single quote appears as a character inside the string? For example:
>>> 'We represent a single quote using ' as a character'
SyntaxError: invalid syntax
As shown, this produces a syntax error: the opening single quote is paired with the single quote just before the word "as", and the characters after it are left outside the string. There are two ways to avoid this error: enclose the string in double quotes, or place a backslash in front of the single quote.
>>> "We represent a single quote using ' as a character"
"We represent a single quote using ' as a character"
Placing a backslash in front of the quote character makes Python treat it as an ordinary character rather than a string delimiter. A backslash also changes the meaning of several other characters inside strings; these combinations are called escape sequences. For example:
>>> print("anb")
anb
### Placing backslash before n, makes it a newline character
>>> print("a\nb")
a
b
### Placing backslash before t, makes it a tab character
>>> print("a\tb")
a       b
### Placing backslash before backslash removes the special meaning of backslash
>>> print("a\\nb")
a\nb
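Coming back to the single-quote problem, the backslash fix can be sketched as follows. The variable names are only illustrative; the point is that the escaped version and the double-quoted version produce exactly the same string.

```python
# Fix 1: escape the inner single quote with a backslash.
escaped = 'We represent a single quote using \' as a character'

# Fix 2: wrap the whole string in double quotes instead.
double_quoted = "We represent a single quote using ' as a character"

print(escaped)
print(escaped == double_quoted)   # True
```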
In Python 3, we have a boolean data type, "bool", that can take one of two values: True (with a capital T) or False (with a capital F). We can check the type as follows:
>>> type(True)
<class 'bool'>
>>> type(False)
<class 'bool'>
This data type represents the truth value of a statement. In Python, a single "=" assigns a value to a variable, while a double "==" checks whether two values are equal and returns a boolean.
>>> a = 5
>>> a == 5
True
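A short sketch showing that the result of a comparison is itself a bool value that we can store and inspect. The name `is_five` is just an illustrative choice.

```python
a = 5                  # single "=" assigns

is_five = (a == 5)     # double "==" compares and yields a bool
print(is_five)         # True
print(type(is_five))   # <class 'bool'>

print(a == 6)          # False
print(a != 6)          # True, "!=" checks for inequality
```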
In Python, we can convert values or variables from one data type to another, but only when the conversion is valid. For example, 2.0 is a floating-point number, and we can convert it into an integer like this:
## Float to int conversion
>>> a = int(2.0)
>>> type(a)
<class 'int'>
## Int to float conversion
>>> a = float(2)
>>> type(a)
<class 'float'>
## Float to int conversion example
>>> a = int(7.11)
>>> a
7
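To see exactly what int() does with the fractional part, it helps to try a negative number too. The sketch below compares int() with math.floor() and round() from the standard library; these comparisons are our own addition for illustration.

```python
import math

# int() drops the fractional part, i.e. it truncates toward zero.
print(int(7.9))           # 7
print(int(-7.9))          # -7

# math.floor() rounds down, so it differs for negative values.
print(math.floor(-7.9))   # -8

# round() goes to the nearest integer instead.
print(round(7.9))         # 8
```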
Note the second float-to-int example, int(7.11): the conversion simply drops the fractional part (it truncates toward zero) rather than rounding, so the answer would still be 7 even for 7.9. Similar conversions are possible for strings, but only when the string represents a valid number. Let's see some examples:
>>> a = '2022'
>>> type(a)
<class 'str'>
>>> int(a)
2022
>>> float(a)
2022.0
But when the string does not represent a valid number, the conversion raises a ValueError like this:
>>> a = '1 1'
>>> int(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '1 1'
That means this conversion is not allowed. In short, every int or float value can be converted to a string, but not every string can be converted to an int or float.
Now that we know the basic data types in Python, let's look at expressions and variables:
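When we are not sure whether a string holds a valid number, one common pattern is to catch the ValueError. The helper below, `to_int_or_none`, is a hypothetical function written only for this illustration; it is not part of Python.

```python
def to_int_or_none(text):
    """Hypothetical helper: return int(text), or None if text is not a valid integer."""
    try:
        return int(text)
    except ValueError:
        return None

print(to_int_or_none("2022"))   # 2022
print(to_int_or_none("1 1"))    # None

# Going the other way always works: numbers convert to strings.
print(str(7.11))                # 7.11
print(str(2022))                # 2022
```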
Expressions are the operations that we want our computers to perform. For example, basic arithmetic operations like
# Numbers are operands and mathematical symbols are operators
>>> 5 + 4.99
9.99
>>> 12*7
84
## Division of integer data types results in a float value
>>> 12/6
2.0
>>> 10 - 5
5
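Since "/" always returns a float, it is worth knowing that Python also has a floor division operator, "//", which keeps the result an int when both operands are ints. A minimal sketch, with variable names chosen only for illustration:

```python
true_division = 12 / 6     # always a float
floor_division = 12 // 6   # an int when both operands are ints

print(true_division)       # 2.0
print(floor_division)      # 2

print(7 / 2)               # 3.5
print(7 // 2)              # 3
```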
Python follows standard mathematical conventions for the order of operations, so multiplication is performed before addition:
>>> 2*3+7
13
>>> 2+3*7
23
Similarly, expressions inside parentheses are evaluated first.
A variable is like a labeled bucket in which we can store a value. For example, in the code snippet below, "temp_variable" is a variable that stores the value 6. We can now use this variable elsewhere in the code, and it will carry the value 6 with it.
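A quick sketch of how parentheses change the result of the same three numbers (the variable names are illustrative):

```python
without_parens = 2 + 3 * 7     # multiplication first: 2 + 21
with_parens = (2 + 3) * 7      # parentheses first: 5 * 7

print(without_parens)          # 23
print(with_parens)             # 35
```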
temp_variable = 6
To see why variables are useful, suppose we want to change a value from 6 to 7. If we had written 6 directly everywhere, we would have to change it at every place in the code. But if we had stored 6 in a variable and used that variable everywhere else, we could change its value once, and the change would be reflected everywhere.
Note: It is always good practice to use sensible names for variables, because code-bases in the machine learning and data science domains tend to be large. Tracking a variable and its usage is much easier when its name is meaningful.
In this introductory article on Python, we learned about data types, covering Python's primary ones: numbers, strings, and booleans. We also looked at expressions and variables. In the next part, we will discuss Python's different data structures like lists, arrays, and tuples. So stay tuned and enjoy learning.
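The idea can be sketched with a small example. The names `item_count`, `unit_price`, `total_price`, and `items_per_box` are hypothetical and chosen only to show how meaningful names make the code easier to follow.

```python
# Change this one line to 7 and every calculation below follows along.
item_count = 6
unit_price = 2.5

total_price = item_count * unit_price
items_per_box = item_count / 2

print(total_price)      # 15.0
print(items_per_box)    # 3.0
```

Compare this with writing 6 directly in both calculations: the code would still work, but a reader could not tell that both 6s mean the same thing.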
Next Blog: Lists and Tuples in Python