Data Types and Variables in Python Programming

What are the data types in Python?

The concept of data types is about the various classes or categories of data in any programming language. In Python, every data value belongs to one particular data type. For example, 1 belongs to the int data type, 1.5 belongs to the float data type, and so on.

In Python, everything is an object, and every object is an instance of some class (its data type). So 1 is an instance of the int class, and 1.5 is an instance of the float class. A variable is simply a name that refers to such an object. Note: We will learn about variables later.

Python Numbers

In the category of numbers in Python, there are three data types: integers, floating-point numbers, and complex numbers. Earlier versions of Python (Python 2.x) had an additional data type called "long" for integers too large for the fixed-size "int", but Python 3.x removed it and merged both into a single int type.

Integers

The first data type is the integer, represented by the built-in type "int" (a type name, not a keyword). In Python 3.x, there is no fixed upper limit on the value of an integer; the limit depends only on the system's available memory. The larger the system's memory, the higher the integer value can be. For example, if one system has 10 kb of space left and another system has 100 kb of space, the second system will be able to hold a larger integer value.
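As a quick illustration of this unbounded range, the sketch below computes a value far beyond what a 64-bit machine integer could hold. The variable name big is only illustrative.

```python
# Python 3 integers grow as needed; there is no fixed 32- or 64-bit ceiling.
big = 2 ** 200
print(big)               # a 61-digit integer
print(big.bit_length())  # 201 bits are needed to represent this value
print(type(big))         # <class 'int'>
```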

We can use the built-in Python functions type() or isinstance() to determine the data type of any value or variable. The isinstance function checks whether a value or variable belongs to the given data type and returns True or False accordingly. For example:

>>> a = 10
>>> type(a)
<class 'int'>

>>> type(10)
<class 'int'>

>>> isinstance(10, int)
True

>>> isinstance(7.11, int)
False

In Python, integer objects are immutable. This means that once an integer object is created, its value cannot be changed. Instead, a new integer object is created whenever a mathematical operation is performed on an integer. Here's an example code to demonstrate this:

# create an integer object
x = 5

# print the id of the object
print(id(x))  
# output: the identity of the object 'x' refers to (its memory address in CPython)

# perform a mathematical operation on the object
x = x + 1

# print the id of the object again
print(id(x))  
# output: a different memory address

# bind another name to the value 5
y = 5

# print the id of the object that y refers to
print(id(y))
# output: in CPython, the same id that x had at first, because small integers are cached and reused

Here are some other critical insights about Python integers

Python allows for easy conversion between integers and other data types such as float, string, and boolean. This flexibility helps us in data processing and manipulation.
Python integers support bitwise operations such as AND, OR, XOR, and NOT. These operations are useful in machine learning applications such as image processing and computer vision.
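The two points above can be sketched in a few lines. The values and variable names are chosen only for illustration.

```python
# Conversions between int and other built-in types
n = int("42")                 # str -> int
f = float(n)                  # int -> float
s = str(n)                    # int -> str
flags = (bool(0), bool(7))    # zero is False, any non-zero int is True
print(n, f, s, flags)         # 42 42.0 42 (False, True)

# Bitwise operations on 6 (0b110) and 3 (0b011)
a, b = 6, 3
print(a & b)   # 2  (AND)
print(a | b)   # 7  (OR)
print(a ^ b)   # 5  (XOR)
print(~a)      # -7 (NOT, equal to -(a + 1))
print(a << 1)  # 12 (left shift by one bit)
```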

Some common use cases of Integers in machine learning

Integer arithmetic is often computationally cheaper than arithmetic on decimal (floating-point) values. So, to reduce computation, algorithms sometimes rely on integer-based operations.
The integer data type is often used for indexing in machine learning code. For example, when accessing elements in an array or matrix, integer indices specify the row and column positions.
In many machine learning algorithms, it is necessary to count the occurrences of certain events or values. The integer data type is used for this purpose. Similarly, we can use the integer data type to specify the number of iterations or loops.
The integer data type is used to specify integer hyperparameters such as the number of layers in a neural network, the number of decision trees in a random forest, or the number of clusters in a k-means algorithm.

Floating Numbers

Floating-point numbers are represented by the type "float". The key difference between "float" and "int" is that a float can hold values that fall between two integers, like 2.7. The presence of a decimal point distinguishes this data type. Very large or very small numbers can be expressed using scientific notation. This notation uses the letter "e" or "E" followed by a positive or negative integer that gives the power of 10 by which the number is multiplied.

>>> a = 10.0
>>> type(a)
<class 'float'>

>>> b = 7.11
>>> type(b)
<class 'float'>

>>> isinstance(a, float)
True

>>> isinstance(a, int)
False

### Scientific Notation example
>>> 7.11e11
711000000000.0

The maximum value for any floating-point number is approximately 1.8 x 10³⁰⁸. Python treats float literals beyond this maximum as infinity (inf).

>>> 1.79e308
1.79e+308

>>> 1.8e308
inf

Here are some critical insights about Python floating-point numbers

Floating-point numbers have limited precision in Python: they are stored in binary with a fixed number of bits (roughly 15 to 17 significant decimal digits), so many decimal values, such as 0.1, cannot be represented exactly. This can lead to rounding errors and inaccuracies in mathematical operations.
Python allows for easy conversion between floating-point numbers and other data types such as integers and strings.
Python floating-point numbers can also represent special values such as infinity and NaN (not a number). These values can arise in certain mathematical operations and need to be handled properly to avoid errors.
Python supports a wide range of arithmetic operations for floating-point numbers, including addition, subtraction, multiplication, division, and more. But we must take care to handle rounding errors and other issues that can arise.
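A short sketch of the rounding and special-value behaviour described above. It uses math.isclose from the standard library as one reasonable way to compare floats, not the only one.

```python
import math

total = 0.1 + 0.2
print(total)                     # 0.30000000000000004
print(total == 0.3)              # False, because of binary rounding error
print(math.isclose(total, 0.3))  # True, compares within a tolerance

inf = float("inf")
nan = float("nan")
print(inf > 1.79e308)   # True
print(nan == nan)       # False, NaN is not equal to anything, including itself
print(math.isnan(nan))  # True, the reliable way to detect NaN
```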

Some common use cases of the float data type in machine learning

Whenever we perform true division with the / operator in Python, it returns a float value, even when both operands are integers.
When working with datasets in machine learning, it is common to encounter decimal numbers. To perform calculations on these numbers, we store them as the float data type.
In some machine learning algorithms, such as neural networks, it is important to scale the input features to a range that is appropriate for the algorithm. This typically involves converting the features to float data type and scaling them to a range between 0 and 1 or -1 and 1.
When making predictions with machine learning models, the output is often a float value that represents a probability or a continuous variable.
Many machine learning algorithms use loss functions that involve float data types. For example, mean squared error, a common loss function used in regression problems, involves the use of float data types.
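To make two of these use cases concrete, here is a minimal sketch of min-max scaling to the range 0 to 1 and of mean squared error, written in plain Python. The sample values are made up for illustration; real projects would typically use a numerical library instead.

```python
# Min-max scaling: map each feature value into the range [0, 1]
raw = [2, 4, 6, 10]
lo, hi = min(raw), max(raw)
scaled = [(x - lo) / (hi - lo) for x in raw]
print(scaled)  # [0.0, 0.25, 0.5, 1.0]

# Mean squared error between true and predicted values
y_true = [3.0, 5.0]
y_pred = [2.0, 7.0]
mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
print(mse)  # 2.5
```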

Complex Numbers

Python has built-in support for complex numbers, which are represented using the complex type. A complex value consists of two floating-point numbers, the real part and the imaginary part, and is written as the real part plus the imaginary part followed by the letter "j" (for example, 3 + 4j). The letter "j" represents the square root of negative one, which defines the imaginary component of a complex number.

>>> a = 3 + 4j
>>> type(a)
<class 'complex'>

Python provides several built-in tools for working with complex numbers: the abs function, the real and imag attributes, and the conjugate method.

The abs function returns the magnitude (or absolute value) of a complex number.
The real attribute holds the real part of a complex number.
The imag attribute holds the imaginary part of a complex number.
The conjugate method returns the complex conjugate of a complex number.

Here are some examples of using these functions:

z = 3 + 4j
print(abs(z))        # prints 5.0
print(z.real)        # prints 3.0
print(z.imag)        # prints 4.0
print(z.conjugate()) # prints (3-4j)

Python also provides operators for working with complex numbers, including addition, subtraction, multiplication, division, and exponentiation. These operators work just like their counterparts for real numbers, but they take into account the imaginary part of the numbers. Here's an example of using the addition operator with complex numbers:

z1 = 3 + 4j
z2 = 1 - 2j
z3 = z1 + z2
print(z3) # prints (4+2j)

Python's cmath module provides additional functions for working with complex numbers, such as sqrt, exp, and log. These functions can be useful for more advanced calculations involving complex numbers. We would suggest exploring some examples of this.
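As a starting point for that exploration, here is a brief sketch using three standard cmath functions (sqrt, polar, and exp). The variable names are illustrative.

```python
import cmath
import math

root = cmath.sqrt(-1)            # math.sqrt(-1) would raise ValueError instead
print(root)                      # 1j

magnitude, phase = cmath.polar(3 + 4j)   # polar form: (magnitude, angle in radians)
print(magnitude)                 # 5.0

euler = cmath.exp(1j * math.pi)  # Euler's identity: approximately -1
print(abs(euler + 1) < 1e-12)    # True, equal to -1 up to rounding error
```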

Use of complex numbers in machine learning

Complex numbers are not commonly used in machine learning and data science because the majority of the data and algorithms used in these fields involve real numbers. But there are some areas where complex numbers can be useful. One such area is signal processing, particularly in the analysis of audio and image data.

In signal processing, complex numbers represent the Fourier transform of a signal, a common technique for analyzing the signal's frequency components. Each complex value encodes both the amplitude and the phase of a frequency component.
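The following is a minimal sketch of a discrete Fourier transform written directly from its definition, only to show where complex numbers appear. The function name dft and the four-sample signal are assumptions for illustration; in practice a library FFT routine would be used instead of this O(N²) loop.

```python
import cmath

def dft(signal):
    """Naive discrete Fourier transform of a list of numbers."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n)
    ]

samples = [1, 0, -1, 0]          # one cycle of a cosine sampled at 4 points
spectrum = dft(samples)          # list of complex coefficients

# abs() gives the amplitude and cmath.phase() the phase of each component
amplitudes = [round(abs(c), 6) for c in spectrum]
phases = [cmath.phase(c) for c in spectrum]
print(amplitudes)  # [0.0, 2.0, 0.0, 2.0]
```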

Strings Data Type in Python

In Python, we represent a sequence of characters as a string, denoted by the type "str". The boundaries of a string are defined by either single quotes or double quotes (' or "). A string can hold as many characters as our system's memory allows, and it can also be empty.

>>> a = 'Single quote string'
>>> b = "Double quote string"

>>> type(a)
<class 'str'>

>>> type(b)
<class 'str'>

### empty string
>>> ''
''

But what if we have a single quote present as a character? For example: 'We represent a single quote using ' as a character'.

>>> 'We represent a single quote using ' as a character'
SyntaxError: invalid syntax

As shown, this produces a syntax error: the opening single quote gets paired with the single quote just before the word "as", so the characters after it fall outside the string. There are two ways to avoid this error:

If a single quote appears as a character in the string, use double quotes to define the boundaries.

>>> "We represent a single quote using ' as a character"
"We represent a single quote using ' as a character"

Use "Escape sequences" in strings.

Placing a backslash in front of the quote character makes Python treat it as normal and forget its special meaning. There are several other examples as well, where the escape sequence changes the behaviour of the normal/special characters in the strings, like:

>>> print("anb")
anb

### Placing backslash before n, makes it a newline character
>>> print("a\nb")
a
b

### Placing backslash before t, makes it a tab character
>>> print("a\tb")
a       b

### Placing backslash before backslash removes the special meaning of backslash
>>> print("a\\nb")
a\nb
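Applied to the quote problem from earlier, the same escape mechanism looks like the sketch below. The variable names are illustrative.

```python
# A backslash before the inner quote keeps it from ending the string
escaped = 'We represent a single quote using \' as a character'
double_quoted = "We represent a single quote using ' as a character"

print(escaped)
print(escaped == double_quoted)  # True, both fixes produce the same string
```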

Here are some critical insights about Python strings

Python strings are immutable, which means once a string is created, it cannot be modified. But we can create a new string by concatenating two or more strings using the + operator.

first_name = 'Enjoy'
last_name = 'Algorithms'
full_name = first_name + ' ' + last_name
print(full_name)  # prints "Enjoy Algorithms"
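To see the immutability itself, the short sketch below tries to change one character in place and then builds a new string instead. The names are illustrative.

```python
greeting = "hello"

try:
    greeting[0] = "H"    # item assignment is not supported on str
except TypeError as error:
    print("TypeError:", error)

capitalized = "H" + greeting[1:]   # build a new string instead
print(capitalized)  # Hello
print(greeting)     # hello (the original string is unchanged)
```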

Python provides the built-in function len and several string methods, including lower, upper, strip, split, and join. These allow you to perform common string operations, such as getting the length of a string, converting a string to lowercase or uppercase, removing whitespace from the beginning and end of a string, splitting a string into a list of substrings, and joining a list of strings into a single string.

Here are some examples:

my_string = "Hello, World!"
print(len(my_string))          # prints 13
print(my_string.lower())       # prints "hello, world!"
print(my_string.strip())       # prints "Hello, World!" (no surrounding whitespace to remove)
print(my_string.split(','))    # prints ['Hello', ' World!']
print(' '.join(['Hello', 'World']))  # prints "Hello World"

Python provides support for regular expressions, which are powerful tools for working with strings. The re module provides functions for searching, replacing, and manipulating strings using regular expressions. Note: We would suggest exploring the use cases of regular expressions.
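As one possible starting point, here is a small sketch of three commonly used re functions (findall, sub, and search). The sample text is made up for illustration.

```python
import re

text = "Order 66 shipped 3 items on 2022-05-04"

# findall returns every non-overlapping match as a list of strings
numbers = re.findall(r"\d+", text)
print(numbers)  # ['66', '3', '2022', '05', '04']

# sub replaces every match; here runs of whitespace collapse to one space
cleaned = re.sub(r"\s+", " ", "too    many   spaces")
print(cleaned)  # too many spaces

# search finds the first match; groups capture parts of the pattern
match = re.search(r"(\d{4})-(\d{2})-(\d{2})", text)
year = match.group(1)
print(year)  # 2022
```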

Use of String data type in machine learning

In machine learning and data science, strings are often used to represent text data, which is a common type of data in natural language processing (NLP) tasks such as sentiment analysis, language translation, and text classification.

NLP models are designed to learn the complex patterns in textual data, which is stored in string format. For example, predicting the sentiment of a user by reading their tweets.
In NLP, text data is often preprocessed by converting it into a numerical representation that can be used by machine learning algorithms. One common technique is to represent each word in the text as a vector of numbers, known as a word embedding. Python has several libraries for working with word embeddings, such as Gensim and spaCy.
Strings are also used in data cleaning and preprocessing tasks, such as removing stopwords (common words that do not carry much meaning, such as "the" and "and") and performing text normalization (e.g., converting all text to lowercase). Python has several libraries for performing these tasks, such as NLTK and Scikit-learn.
Strings are often used to represent categorical data in machine learning models. For example, in a classification task where we want to predict the type of a product based on its description, we might represent the type of the product as a string and convert it to a numerical representation.
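The preprocessing and encoding steps above can be sketched in plain Python. The stopword set, the helper name preprocess, and the sample labels are all hypothetical; libraries such as NLTK or Scikit-learn offer more complete versions of both steps.

```python
STOPWORDS = {"the", "and", "is", "a"}   # hypothetical, deliberately tiny list

def preprocess(sentence):
    """Lowercase the text, split on whitespace, and drop stopwords."""
    return [word for word in sentence.lower().split() if word not in STOPWORDS]

tokens = preprocess("The movie is a delight and the cast shines")
print(tokens)  # ['movie', 'delight', 'cast', 'shines']

# Encode string categories as integers (a simple label encoding)
labels = ["book", "toy", "book", "food"]
label_to_id = {name: i for i, name in enumerate(sorted(set(labels)))}
encoded = [label_to_id[name] for name in labels]
print(label_to_id)  # {'book': 0, 'food': 1, 'toy': 2}
print(encoded)      # [0, 2, 0, 1]
```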

Boolean Data Type in Python

In Python 3, we have a boolean data type that can take either of two values: True (with capital T) or False (with capital F). We can check the type of the variable as:

>>> type(True)
<class 'bool'>

>>> type(False)
<class 'bool'>

This data type is used to represent the truth value of a statement. In Python, we use a single "=" to assign a value to a variable and a double "==" to test whether two values are equal.

>>> a = 5
>>> a == 5
True

In Python, bool is a subclass of int, and the boolean values True and False behave like the integers 1 and 0, respectively. This means you can perform arithmetic operations on boolean values, such as adding or multiplying them:

print(True + True)  # prints 2
print(False * 10)  # prints 0

Python provides several operators for working with boolean values, including and, or, and not. These operators allow you to combine boolean expressions and create more complex conditions:

x = 5
y = 10
z = 15
print(x < y and y < z)  # prints True
print(x < y or y > z)   # prints True
print(not(x == y))      # prints True

Python also provides several functions for working with boolean values, such as all and any, which allow you to check if all or any of the elements in an iterable are true:

my_list = [True, False, True]
print(all(my_list))  # prints False
print(any(my_list))  # prints True

Use of Boolean Data Types in Machine Learning

In machine learning and data science, boolean values are used to represent the outcomes of binary classification tasks. For example, if we are building a model to predict whether a customer will make a purchase or not, we can use True or False to represent the two possible outcomes.
We also use boolean values in the evaluation of machine learning models, particularly in metrics like accuracy, precision, recall, and F1 score. These metrics are based on the comparison of predicted and actual values, which are often represented as boolean values.
We can use the boolean data type to create masks and filters in data analysis. For example, if we have a dataset of customer information and want to keep only the customers who have made a purchase, we can create a boolean mask that filters out rows where the purchase status is False.
Boolean logic is also important in the design of decision trees and other models that use binary splits to make predictions.
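A minimal sketch of the masking and evaluation ideas above, using plain lists. The customer names and predictions are made up, and data analysis libraries provide the same pattern for whole tables.

```python
customers = ["Ana", "Ben", "Cy", "Dee"]
purchased = [True, False, True, True]      # boolean mask: one flag per customer

# Keep only the rows where the mask is True
buyers = [name for name, keep in zip(customers, purchased) if keep]
print(buyers)  # ['Ana', 'Cy', 'Dee']

# Accuracy: fraction of predictions that match the actual values
predicted = [True, False, False, True]
correct = [p == a for p, a in zip(predicted, purchased)]
accuracy = sum(correct) / len(correct)     # True counts as 1, False as 0
print(accuracy)  # 0.75
```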

Converting Data Types

In Python, we have the flexibility to change the datatype of values or variables, but only when the conversion is valid. For example, 2.0 is a floating-point, and we can convert it into an integer like this:

## Float to int conversion
>>> a = int(2.0)
>>> type(a)
<class 'int'>

## Int to float conversion
>>> a = float(2)
>>> type(a)
<class 'float'>

## Float to int conversion example
>>> a = int(7.11)
>>> a
7

In the third example of float-to-int conversion, we used int(7.11) to convert a floating-point number to an integer. The int function discards the fractional part, truncating toward zero, so the result is 7. The result would be the same whether the float value was 7.11 or 7.9. For negative values, truncation moves up rather than down: int(-7.9) is -7.
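For positive values, discarding the fraction and rounding down give the same answer, but for negative values they differ. The sketch below compares int with math.floor and round so the distinction is visible; the variable names are illustrative.

```python
import math

truncated_pos = int(7.9)        # fractional part is dropped
truncated_neg = int(-7.9)       # truncates toward zero, not downward
floored_neg = math.floor(-7.9)  # always rounds down
nearest = round(7.9)            # rounds to the nearest integer

print(truncated_pos)  # 7
print(truncated_neg)  # -7
print(floored_neg)    # -8
print(nearest)        # 8
```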

This type of conversion is also possible for strings, but only when the string contains numerical characters. In other words, strings representing numbers can be converted to integers or floating-point numbers. Let's see some examples.

>>> a = '2022'
>>> type(a)
<class 'str'>

>>> int(a)
2022

>>> float(a)
2022.0

But when the string does not represent a valid number, the conversion raises a ValueError like this:

>>> a = '1 1'
>>> int(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '1 1'

That means such a conversion is not allowed. In other words, every int or float value can be converted to a string, but not every string can be converted to an int or a float.
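When the input may or may not be numeric, one common pattern is to attempt the conversion and catch the ValueError. The helper name to_number below is hypothetical, and returning None for invalid input is just one possible design choice.

```python
def to_number(text):
    """Return an int or float parsed from text, or None if it is not numeric."""
    for convert in (int, float):   # try the stricter conversion first
        try:
            return convert(text)
        except ValueError:
            continue
    return None

print(to_number("2022"))  # 2022
print(to_number("7.11"))  # 7.11
print(to_number("1 1"))   # None
print(str(7.11))          # 7.11 (numbers always convert to strings)
```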

These conversions are very helpful in machine learning and deep learning.

Developers use type conversion to replace floating-point operations with integer operations and reduce computing requirements.
Numeric data stored as strings needs to be converted into an int or float data type before algorithms can perform calculations on it.

Now that we know the basic data types in Python, let's understand the concept of expressions and variables.

What are Expressions and Variables in Python?

Expressions

Expressions in Python are combinations of values, variables, and operators that Python evaluates to produce a value. In other words, they are the building blocks of a Python program and specify the computation that the computer should perform. Expressions can include arithmetic operations, logical operations, bitwise operations, etc. The result of an expression is a value that can be used in further computations or stored for later use.

For example, let's look at some basic arithmetic operations.

# Numbers are operands and mathematical symbols are operators
>>> 5 + 4.99
9.99

>>> 12*7
84

## Division of integer data types results in a float value
>>> 12/6
2.0

>>> 10 - 5
5

Operator Precedence in Python

In Python, operator precedence determines the order in which operations are performed in an expression. This order is based on standard mathematical conventions, such as the PEMDAS rule (Parentheses, Exponents, Multiplication and Division, and Addition and Subtraction).

When evaluating an expression using the PEMDAS rule, operations inside parentheses are done first, followed by exponents, then multiplication and division (from left to right), and finally addition and subtraction (from left to right). This ensures that expressions are evaluated in a predictable and consistent manner, just as in mathematics.

In the case of a tie between two operators with the same precedence in an expression, the associativity rule is used to resolve the tie. The associativity rule states that all operators, except for exponentiation (**), follow a left-to-right associativity. This means that the expression will be evaluated from left to right.
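A small sketch of the associativity rule, with illustrative variable names: subtraction groups from the left, while exponentiation groups from the right.

```python
left_assoc = 10 - 4 - 3      # evaluated as (10 - 4) - 3
right_assoc = 2 ** 3 ** 2    # evaluated as 2 ** (3 ** 2)
forced_left = (2 ** 3) ** 2  # parentheses override the default grouping

print(left_assoc)   # 3
print(right_assoc)  # 512
print(forced_left)  # 64
```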

For example, the expression (4 + 3) - 3**2 + 6/2 * 7 is evaluated as follows, in the order defined by Python's operator precedence:

First, the operation within parentheses is evaluated: 4 + 3 = 7
Next, the exponentiation 3**2 is evaluated: 3**2 = 9
Then, the division 6/2 is evaluated: 6/2 = 3.0
Then, the multiplication 3.0 * 7 is evaluated: 3.0 * 7 = 21.0
Finally, the expression 7 - 9 + 21.0 is evaluated from left to right: 7 - 9 + 21.0 = 19.0

So the final result of the expression (4 + 3) - 3**2 + 6/2 * 7 is 19.0 (a float, because the / operator always returns a float).
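The walkthrough can be checked directly in Python. In this sketch, result uses the expression as written and stepwise adds explicit parentheses that mirror the evaluation order. Note that Python prints 19.0 rather than 19, because the / operator always returns a float.

```python
result = (4 + 3) - 3 ** 2 + 6 / 2 * 7
stepwise = ((4 + 3) - (3 ** 2)) + ((6 / 2) * 7)

print(result)              # 19.0
print(result == stepwise)  # True

# Exponentiation binds tighter than unary minus
negated_square = -3 ** 2   # evaluated as -(3 ** 2)
print(negated_square)      # -9
```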

The following is the operator precedence table in Python (Increasing precedence from top to bottom).

:= (Assignment expression)
lambda (Lambda expression)
if-else (Conditional expression)
or (Boolean OR)
and (Boolean AND)
not x (Boolean NOT)
in, not in, is, is not, <, <=, >, >=, !=, == (Membership tests, identity tests, and comparisons; all share the same precedence)
| (Bitwise OR)
^ (Bitwise XOR)
& (Bitwise AND)
<<, >> (Left and right Shifts)
+, - (Addition and subtraction)
*, @, /, //, % (Multiplication, matrix multiplication, division, floor division, remainder)
+x, -x, ~x (Unary plus, Unary minus, bitwise NOT)
** (Exponentiation)
await x (Await expression)
x[index], x(arguments…), x.attribute (Subscription, slicing, call, attribute reference)
() Parentheses (Highest precedence)

Variables

A variable is a name that refers to a stored value. For example, in the following code snippet, "temp_variable" is a variable that refers to the value 6, which has the int data type. This variable can then be used elsewhere in the program, carrying the value of 6.

temp_variable = 6

Variables are useful because they allow us to change a value in one place and reflect that change throughout the code. For example, if we want to change the value 6 to 7 and we have used the number 6 directly in many places in the code, we would have to change it in all of those places. However, if we have used a variable with a value of 6 and then used that variable in many places, we could change the variable's value once, and it would update in all places where the variable is used. This makes it easier to maintain and update our code.
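A brief sketch of this idea follows. The names num_epochs and steps_per_epoch, and their values, are hypothetical and chosen only to illustrate the point.

```python
num_epochs = 6          # change this one line to 7 to update every use below
steps_per_epoch = 100   # hypothetical value for illustration

total_steps = num_epochs * steps_per_epoch
summary = f"Training for {num_epochs} epochs ({total_steps} steps)"
print(summary)  # Training for 6 epochs (600 steps)

# A variable is a name bound to an object; rebinding it leaves other names alone
alias = num_epochs
num_epochs = 7
print(alias, num_epochs)  # 6 7
```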

Key Note: It is considered good practice to use descriptive and meaningful names for variables, especially in Machine Learning and Data Science domains where codebases become large. This makes it easier for us to understand and track the usage of each variable, making the code more organized and maintainable.

Conclusion

In this introductory blog on Python, we covered the basics of data types. We explored three primary data types (numbers, booleans, and strings), expressions and variables in Python and learned about their usage in Machine Learning and Data Science.

If you have any queries/doubts/feedback, please write us at contact@enjoyalgorithms.com. Enjoy learning, Enjoy algorithms!
