Mastering Python Functions for Data Analysis



Welcome to this tutorial on Python functions in the context of data science. It walks you through writing your own Python functions and using them to solve real-world data science problems. By the end, you will be able to write functions that accept multiple parameters and return multiple values, an essential skill for any data scientist.


Let's dive in!


1. Introduction to User-Defined Functions


In Python, a function is a block of reusable code that performs a specific task. Python provides a variety of built-in functions like print(), len(), type(), and so on. However, in many situations, especially in data science, we often need to perform tasks that are specific to our needs. That's where user-defined functions come in.


User-defined functions are functions that are defined by the users themselves to carry out tasks tailor-made to their specific requirements.

Think of it like cooking a meal. Python's built-in functions are like ready-to-eat meals: quick and easy, but not always exactly what you want. User-defined functions, on the other hand, are like cooking a meal from scratch: they take more effort, but you get exactly what you want.


Defining a Function


Let's start by defining a simple Python function:


def greet():
    print("Hello, there!")

In the above code snippet, greet is a user-defined function that prints out the string "Hello, there!".


To call or invoke this function, you simply need to type its name followed by parentheses:


greet()

Output:

Hello, there!
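To illustrate the "reusable" part of the definition, here is a minimal sketch that calls the same greet function several times. The loop is only an illustration and is not part of the original example.

```python
def greet():
    print("Hello, there!")

# Call the same function three times without repeating its body.
for _ in range(3):
    greet()
```

Each call runs the same function body, so a change made inside greet takes effect everywhere it is called.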


2. Defining a Function with Parameters


Let's move on to slightly more complex functions: functions with parameters. A parameter is a name listed in the function definition. The value passed in for it when the function is called (the argument) can change what the function does.


Think of parameters like the ingredients in a recipe. Depending on the ingredients you add, the final dish can be different even with the same cooking process.

Let's modify our greet function to greet a specific person:

def greet(name):
    print("Hello, " + name + "!")

Now our function accepts one parameter, name. We can pass different names to get personalized greetings:

greet("Alice")

greet("Bob")

Output:

Hello, Alice!
Hello, Bob!
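The sketch below shows two ways of supplying the argument, and what happens when a required argument is left out. The variable missing_argument_error is a hypothetical name introduced only for this illustration.

```python
def greet(name):
    print("Hello, " + name + "!")

greet("Alice")      # positional argument
greet(name="Bob")   # keyword argument: the same parameter, named explicitly

# Hypothetical variable, used only to show the error that Python raises.
missing_argument_error = None
try:
    greet()         # no argument supplied for the required parameter
except TypeError as error:
    missing_argument_error = error
    print("TypeError:", error)
```

Because name has no default value, Python refuses to call greet without it rather than guessing.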


3. Return Values from Functions


Instead of just printing out a value, often we want our function to compute a value and give it back to us. This is known as returning a value.


Consider our function to be like a vending machine. We give it some input (insert coins, press buttons), and it gives us an output (returns a delicious snack).

Let's create a function that squares a number and returns the result:

def square(number):
    return number ** 2

We can now call this function with a number, and it will return the square of that number:

print(square(4))
print(square(5))

Output:

16
25
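The difference between printing and returning is easy to miss, so here is a small sketch contrasting the two. The function square_and_print is a hypothetical counterpart written only for this comparison.

```python
def square_and_print(number):
    # Hypothetical variant: prints the result but does not return it.
    print(number ** 2)

def square(number):
    return number ** 2

printed = square_and_print(4)   # prints 16
returned = square(4)

print(printed)    # None: the function had no return statement
print(returned)   # 16

# A returned value can be reused in further calculations.
total = square(3) + square(4)
print(total)      # 25
```

A function without a return statement hands back None, so only the returning version can feed its result into other expressions.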


4. Docstrings in Functions


Docstrings, or documentation strings, are important as they describe what your function does. They are placed immediately after the function header and are written between triple quotation marks.


Think of docstrings as the label on a medicine bottle. It tells you what the medicine is, what it's used for, and how to use it.

Let's add a docstring to our square function:

def square(number):
    """Return the square of a number."""
    return number ** 2

When you write large programs or libraries, other people (or even future you) can read your docstrings to understand what your function does, without needing to read through all the code.
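As a brief sketch of how a docstring can be read back without opening the source code, Python stores it on the function's __doc__ attribute:

```python
def square(number):
    """Return the square of a number."""
    return number ** 2

# The docstring is stored on the function object itself.
print(square.__doc__)   # Return the square of a number.

# In an interactive session, help(square) shows the same text
# together with the function signature.
```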


5. Multiple Parameters and Return Values


In many situations, we might need a function to work with more than one input. Python functions can easily handle multiple parameters.


Consider our function as a fancy blender. You can put in multiple fruits (inputs) and get a mixed fruit juice (output).

Let's write a function that raises one number to the power of another:

def power(base, exponent):
    """Return base raised to the power of exponent."""
    return base ** exponent

Now our function takes two parameters, base and exponent:

print(power(2, 3))
print(power(5, 2))

Output:

8
25
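With more than one parameter, the order of the arguments matters. The sketch below reuses the power function to show this, and also shows keyword arguments, which make the order irrelevant.

```python
def power(base, exponent):
    """Return base raised to the power of exponent."""
    return base ** exponent

# Positional arguments are matched to parameters in order.
print(power(2, 3))   # 8
print(power(3, 2))   # 9

# Keyword arguments are matched by name, so the order is free.
print(power(exponent=3, base=2))   # 8
```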

Functions can also return multiple values. When we want several outputs from a function, we can return them together as a tuple. Section 8 covers this in detail, after a short introduction to tuples.


6. Introduction to Tuples


A tuple in Python is a type of data structure that can hold zero or more values. Tuples are similar to lists in Python, but unlike lists, tuples are immutable, i.e., they can't be changed after they are created.


Think of a tuple as a bar of chocolate with its individual pieces. Each piece of chocolate (an item) is part of the bar (the tuple), and once the bar is made, you can't change the individual pieces.


Here's how you can create a tuple in Python:

my_tuple = (1, 2, 3)
print(my_tuple)

Output:

(1, 2, 3)
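To make the immutability point concrete, the following sketch tries to change an item in a tuple and in a list. The variable names other than my_tuple are chosen only for this illustration.

```python
my_tuple = (1, 2, 3)
my_list = [1, 2, 3]

# Lists can be changed in place.
my_list[0] = 10
print(my_list)    # [10, 2, 3]

# Tuples cannot: item assignment raises a TypeError.
try:
    my_tuple[0] = 10
except TypeError as error:
    print("TypeError:", error)
print(my_tuple)   # (1, 2, 3)

# A tuple can hold zero values or a single value.
empty = ()
single = (1,)     # the trailing comma is what makes this a tuple
print(len(empty), len(single))   # 0 1
```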


7. Unpacking and Accessing Tuples


Unpacking a tuple means extracting the individual values and assigning them to variables.


Think of it like opening a packaged gift box. You "unpack" the items inside and hold each one separately in your hands.


Here's an example of tuple unpacking:

my_tuple = (1, 2, 3)
a, b, c = my_tuple

print(a)
print(b)
print(c)

Output:

1
2
3

You can also access individual elements in a tuple using their indices, similar to lists:

my_tuple = (1, 2, 3)
print(my_tuple[0])  # Indexing starts at 0

Output:

1
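A few related behaviors are worth seeing once. The sketch below shows that unpacking requires the number of variables to match the number of items, and that negative indices and slices work on tuples as they do on lists. The variable names are illustrative only.

```python
my_tuple = (1, 2, 3)

# Unpacking into too few variables raises a ValueError.
unpack_error = None
try:
    a, b = my_tuple
except ValueError as error:
    unpack_error = error
    print("ValueError:", error)

# A starred name collects the remaining items into a list.
first, *rest = my_tuple
print(first, rest)    # 1 [2, 3]

# Negative indices count from the end; slices return a new tuple.
last = my_tuple[-1]
first_two = my_tuple[:2]
print(last)           # 3
print(first_two)      # (1, 2)
```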


8. Returning Multiple Values from a Function


As mentioned earlier, Python functions can return multiple values. This is done by returning a tuple of the values.


Let's modify our power function to also return the base and the exponent along with the result:

def power(base, exponent):
    """Return base, exponent, and base raised to the power of exponent."""
    result = base ** exponent
    return base, exponent, result

Now when we call this function, it returns a tuple:

print(power(2, 3))

Output:

(2, 3, 8)

We can also unpack the returned tuple into individual variables:

base, exponent, result = power(2, 3)
print("Base:", base)
print("Exponent:", exponent)
print("Result:", result)

Output:

Base: 2
Exponent: 3
Result: 8
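The same pattern is common in data analysis, where one pass over the data yields several summary numbers. Here is a minimal sketch, assuming a non-empty list of numbers. The function summarize and the sample values are hypothetical and not part of the original tutorial.

```python
def summarize(values):
    """Return the minimum, maximum, and mean of a non-empty list of numbers."""
    return min(values), max(values), sum(values) / len(values)

# Unpack all three results at once.
lowest, highest, average = summarize([3, 7, 8])
print(lowest, highest, average)   # 3 8 6.0

# By convention, an underscore marks values we do not need.
_, highest_only, _ = summarize([3, 7, 8])
print(highest_only)               # 8
```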


9. Bringing It All Together


Let's now consider a practical application of everything we've learned. Imagine we have a DataFrame of Twitter data, and we want to count the number of occurrences of each language used in a collection of tweets.


First, let's create a hypothetical DataFrame using pandas:

import pandas as pd

# Our hypothetical Twitter data
data = {
    "tweet": [
        "Hello, world!",
        "Bonjour, le monde!",
        "Hallo, Welt!",
        "¡Hola, mundo!",
        "Ciao, mondo!",
    ],
    "language": ["English", "French", "German", "Spanish", "Italian"],
}

df = pd.DataFrame(data)
print(df)

Output:

                tweet language
0       Hello, world!  English
1  Bonjour, le monde!   French
2        Hallo, Welt!   German
3       ¡Hola, mundo!  Spanish
4        Ciao, mondo!  Italian

Now, let's define a function that accepts a DataFrame and a column name, and returns a dictionary with the count of each unique value in that column:

def count_values(df, column):
    """Return a dictionary with counts of unique values in the specified DataFrame column."""
    counts = df[column].value_counts().to_dict()
    return counts

We can use this function to count the number of occurrences of each language in our DataFrame:

language_counts = count_values(df, "language")
print(language_counts)

Output:

{'Italian': 1, 'Spanish': 1, 'German': 1, 'French': 1, 'English': 1}


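In this case, each language appears only once, so every count is 1. Because all the counts are tied, the order of the keys in the printed dictionary may differ from the one shown here, depending on your pandas version. With a larger dataset, the counts would show a more informative distribution.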
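To see what such a distribution could look like, and to combine counting with returning multiple values, here is a pandas-free sketch of the same idea. The function count_entries and the sample list of labels are hypothetical. The loop only approximates what value_counts() computes and is not how pandas implements it.

```python
def count_entries(labels):
    """Return a dictionary of counts and the most frequent label."""
    counts = {}
    for label in labels:
        # Start at 0 for a label not seen before, then add 1.
        counts[label] = counts.get(label, 0) + 1
    most_common = max(counts, key=counts.get)
    return counts, most_common

# Hypothetical language labels with repeats.
languages = ["English", "French", "English", "Spanish", "English", "French"]

language_counts, top_language = count_entries(languages)
print(language_counts)   # {'English': 3, 'French': 2, 'Spanish': 1}
print(top_language)      # English
```

The function returns a tuple of two values, which the caller unpacks into language_counts and top_language in a single assignment.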


In this tutorial, you've learned how to define your own functions in Python, accept input parameters, return values, and document your functions with docstrings. You've also learned about tuples, how to unpack them, and how to use them to return multiple values from your functions. Finally, we applied a user-defined function to analyze a DataFrame of Twitter data.


Understanding these fundamentals will serve as a strong foundation for your journey in Python and data science. With a bit of practice, you'll soon find yourself writing functions to automate complex tasks, analyze large datasets, and extract valuable insights from them. Happy coding!
