Python Basics for Data Science and Machine Learning: Part 1

When we start learning Data Science and Machine Learning, the first questions that come to mind are "Which programming language should I know as a Data Scientist?" and "What's the best programming language for machine learning?" The Internet is flooded with these questions. Frankly, there is no such thing as the "best programming language" in any computer science field; it all depends on the task at hand. For the tasks in the Machine Learning and Data Science domain, however, Python is the most preferred programming language. But have we ever wondered why?

The reason is that a large community in these domains prefers Python over other languages, so we can easily find solutions to the various roadblocks we hit while developing machine learning solutions.

Key takeaways from this blog 

After going through this blog, we will be able to understand the following things:

  1. What is Python?
  2. What are the various data types in Python?
  3. What are expressions and variables in Python?
  4. What is a String, and what operations can we perform on strings?

Let's start our journey towards knowing the fundamental concepts of Python.

What is Python?

Python is an object-oriented, high-level programming language that works across multiple platforms such as Windows, Linux, and Mac, and even on Raspberry Pi devices. Its development began in the late 1980s, and it was first released in 1991. It can be used for various tasks, mainly:

  • Artificial Intelligence development
  • Software development
  • Web development
  • System scripting

Our article will mainly focus on the basics required for developing machine learning and data science applications.

What are data types in Python?

In Python, data types are classes, and every value belongs to a particular data type. Everything in this language is an object: each value is an instance (object) of its data type's class, and a variable is a name that refers to such an object. Some popular data types in Python are:

Python Numbers

There are three data types in the category of Python numbers: integers, floating-point numbers, and complex numbers. Python 2.x had another data type named "long" to store longer integer values, but it was removed in Python 3.x.

Integers

The first data type is the integer, represented using "int". In Python 3.x, integers have no upper limit; the largest value depends only on our system's memory. We can use the built-in Python functions "type()" or "isinstance()" to find out the data type of any value or variable. For example:

>>> a = 10
>>> type(a)
<class 'int'>

>>> type(10)
<class 'int'>

>>> isinstance(10, int)
True

>>> isinstance(7.11, int)
False

The isinstance function checks whether a value or variable belongs to the given data type (int in the examples above) and returns True or False accordingly.
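
As a small sketch of the two points above (no fixed upper limit on integers, and isinstance() checking against a given type), the snippet below uses an illustrative variable named big, which is not part of the original examples. It also relies on the fact that isinstance() accepts a tuple of types as its second argument.

```python
# Python 3 integers grow as needed; there is no fixed 32-bit or 64-bit limit.
big = 2 ** 100
print(big)        # 1267650600228229401496703205376
print(type(big))  # <class 'int'>

# isinstance() takes a type (or a tuple of types) as its second argument.
print(isinstance(big, int))             # True
print(isinstance(7.11, (int, float)))   # True, because 7.11 is a float
print(isinstance("7", (int, float)))    # False, because "7" is a string
```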

Floating Numbers

We represent floating-point numbers as "float". The difference between float and int is that a float can take values between two integers. A decimal point identifies a float value. Scientific notation can also be used, which is convenient when a number is very large or very small: the character "e" or "E" followed by a positive or negative integer specifies the power of 10.

>>> a = 10.0
>>> type(a)
<class 'float'>

>>> b = 7.11
>>> type(b)
<class 'float'>

>>> isinstance(a, float)
True

>>> isinstance(a, int)
False

### Scientific Notation example
>>> 7.11e11
711000000000.0
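
To complement the example above, here is a short sketch of scientific notation with a negative exponent and with an uppercase "E". The variable names large and small are only illustrative.

```python
large = 7.11e11   # 7.11 x 10^11
small = 7.11e-3   # 7.11 x 10^-3

print(large)      # 711000000000.0
print(small)      # 0.00711

# Uppercase "E" works the same way, and the result is always a float.
print(1E3)        # 1000.0
print(type(1E3))  # <class 'float'>
```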

The maximum value of a floating-point number is approximately 1.8 x 10³⁰⁸. Python treats larger literals as infinity.

>>> 1.79e308
1.79e+308

>>> 1.8e308
inf
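
The exact limit can be inspected with the standard library. The sketch below assumes a platform with the usual 64-bit IEEE 754 floats, which is what most systems use. The names largest and overflow are illustrative.

```python
import math
import sys

# Largest finite float on this platform.
largest = sys.float_info.max
print(largest)                    # 1.7976931348623157e+308

# A literal beyond that limit becomes infinity.
overflow = 1.8e308
print(overflow)                   # inf
print(math.isinf(overflow))       # True
print(overflow == float("inf"))   # True
```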

Complex Numbers

Complex numbers are represented using their real component and the imaginary component along with the letter "j".

>>> a = 7 + 11j
>>> type(a)
<class 'complex'>
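
As a brief sketch, the real and imaginary components can be read back through the real and imag attributes, and the built-in complex() function builds the same value from two numbers. The variable b is illustrative.

```python
a = 7 + 11j

# Both components are stored as floats.
print(a.real)   # 7.0
print(a.imag)   # 11.0

# complex(real, imag) builds the same number.
b = complex(7, 11)
print(a == b)   # True

# Arithmetic follows the usual rules for complex numbers.
print(a * a)    # (-72+154j)
```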

Strings

In Python, we represent a sequence of characters as strings, denoted by "str". Boundaries of any string data type are defined by either a single quote or a double quote.

>>> a = 'Single quote string'
>>> b = "Double quote string"

>>> type(a)
<class 'str'>

>>> type(b)
<class 'str'>

A string can hold as many characters as our system's memory allows. It can be empty as well. An empty string is represented as:

>>> ''
''

But what if a single quote is present as a character in the string? For example: 'We represent a single quote using ' as a character'.

>>> 'We represent a single quote using ' as a character'
SyntaxError: invalid syntax

As shown, this produces a syntax error: the opening single quote is paired with the single quote that appears just before the word "as", so the characters after it are no longer inside the string. There are two fixes for this:

  • If a single quote is present as a character, use double quotes to define the boundaries.
>>> "We represent a single quote using ' as a character"
"We represent a single quote using ' as a character"
  • Use "Escape sequences" in strings.

Placing a backslash in front of the quote character makes Python treat it as a normal character and forget its special meaning. Escape sequences also work in the other direction, where a backslash gives a normal character a special meaning, like

>>> print("anb")
anb

### Placing backslash before n, makes it a newline character
>>> print("a\nb")
a
b

### Placing backslash before t, makes it a tab character
>>> print("a\tb")
a       b

### Placing backslash before backslash removes the special meaning
### of backslash
>>> print("a\\nb")
a\nb
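
Returning to the quote problem from earlier, the backslash fix can be sketched as follows. The variable names escaped and double_quoted are illustrative.

```python
# Fix 1: escape the inner single quote with a backslash.
escaped = 'We represent a single quote using \' as a character'

# Fix 2: use double quotes as the string boundaries.
double_quoted = "We represent a single quote using ' as a character"

print(escaped)                    # We represent a single quote using ' as a character
print(escaped == double_quoted)   # True, both fixes give the same string

# The same trick works for a double quote inside a double-quoted string.
print("She said \"hello\"")       # She said "hello"
```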

Boolean

In Python 3, we have a boolean data type that can take either of the two values, True (with capital T) or False (with capital F). We can check the type of the variable as

>>> type(True)
<class 'bool'>

>>> type(False)
<class 'bool'>

This data type is used to represent the truth of a statement. In Python, we use a single "=" to assign a value to a variable and a double "==" to check whether two values are equal.

>>> a = 5
>>> a == 5
True
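
Because a comparison produces a boolean value, its result can be stored in a variable like any other value. A minimal sketch, with illustrative names is_five and is_six:

```python
a = 5                   # single "=" assigns the value 5 to a

is_five = (a == 5)      # double "==" compares and yields a boolean
is_six = (a == 6)

print(is_five)          # True
print(is_six)           # False
print(type(is_five))    # <class 'bool'>

# Other comparison operators also return booleans.
print(a != 6)           # True
print(a > 3)            # True
```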

Converting Data Types

In Python, we have the flexibility to change the datatype of values or variables, but only when the conversion is valid. For example, 2.0 is a floating-point, and we can convert it into an integer like this,

## Float to int conversion
>>> a = int(2.0)
>>> type(a)
<class 'int'>

## Int to float conversion
>>> a = float(2)
>>> type(a)
<class 'float'>

## Float to int conversion example
>>> a = int(7.11)
>>> a
7

Note the second float-to-int example, int(7.11): the conversion simply drops the fractional part (it truncates toward zero) rather than rounding. The answer would still be 7 for 7.9. For positive numbers this matches the greatest integer (floor) function, but for negative numbers it does not: int(-7.9) is -7. These conversions are also possible for strings, but only when the string represents a valid number. Let's see some examples,

>>> a = '2022'
>>> type(a)
<class 'str'>

>>> int(a)
2022

>>> float(a)
2022.0

But when the string is not a valid number, the conversion raises a ValueError like this,

>>> a = '1 1'

>>> int(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '1 1'

That means this type of conversion is not allowed. In short, every float or int value can be converted to a string, but not every string can be converted to an int or a float.
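
The conversion rules above can be sketched in one place. The helper to_int_or_none is hypothetical (it is not a built-in function) and shows one possible way to handle strings that may not be valid integers. It uses a function and try/except, which are not covered in this part.

```python
import math

# int() drops the fractional part (truncates toward zero).
print(int(7.9))            # 7
print(int(-7.9))           # -7
print(math.floor(-7.9))    # -8, floor differs from int() for negative numbers

# Any int or float can be converted to a string.
print(str(7.11))           # 7.11


def to_int_or_none(text):
    """Hypothetical helper: return int(text), or None if text is not a valid integer."""
    try:
        return int(text)
    except ValueError:
        return None


print(to_int_or_none("2022"))   # 2022
print(to_int_or_none("1 1"))    # None
print(to_int_or_none("7.11"))   # None, int() does not accept a decimal string

# A decimal string can go through float() first.
print(int(float("7.11")))       # 7
```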

Now that we know the basic data types present in Python, let's understand the concepts of expressions and variables:

What are Expressions and Variables in Python?

Expressions

Expressions are the operations that we want our computers to perform. For example, basic arithmetic operations like

# Numbers are operands and mathematical symbols are operators
>>> 5 + 4.99
9.99

>>> 12*7
84

## Division of integer data types results in a float value
>>> 12/6
2.0

>>> 10 - 5
5

Python follows the standard mathematical order of operations, so multiplication and division are evaluated before addition and subtraction, like

>>> 2*3+7
13

>>> 2+3*7
23

Similarly, expressions inside parentheses are evaluated first.
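
A short sketch of how parentheses change the result. The variable names are illustrative.

```python
without_parens = 2 + 3 * 7        # multiplication first: 2 + 21
with_parens = (2 + 3) * 7         # parentheses first: 5 * 7
nested = 2 * (3 + (7 - 5))        # innermost parentheses first: 2 * (3 + 2)

print(without_parens)   # 23
print(with_parens)      # 35
print(nested)           # 10
```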

Variables

A variable is a kind of bucket in which we can store a value. For example, in the code snippet below, "temp_variable" is a variable that stores the value 6. This variable can now be used elsewhere in the code, carrying the value 6 with it.

temp_variable = 6

Variables are useful when a value needs to change. Suppose we want to change 6 to 7: if we have written 6 everywhere, we have to edit every occurrence in the code. But if we stored 6 in a variable and used that variable everywhere else, we can change the value once, and the change is reflected everywhere.
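
As a minimal sketch of this idea, the snippet below uses a made-up example of a square whose side length is stored once in a variable named side. The names side, perimeter, and area are illustrative.

```python
side = 6                # change only this line to use a different value

perimeter = 4 * side    # the variable is reused here
area = side * side      # and here

print(perimeter)        # 24
print(area)             # 36
```

If side is changed to 7, the same two expressions produce 28 and 49 without any other edit.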

Note: It is always good practice to use sensible names for variables because code-bases in the Machine Learning and Data Science domains tend to be large. Tracking a variable and its usage is much easier when its name is meaningful.

Conclusion

In this introductory article on Python, we learned about the concepts of data types, expressions, and variables. We covered Python's primary data types: numbers, strings, and booleans. In the next part, we will discuss Python's different data structures like lists, arrays, and tuples. So stay tuned and enjoy learning.

© 2022 Code Algorithms Pvt. Ltd.

All rights reserved.