Python Essentials

Scopes

Python has closures, similar to JavaScript, since functions are first-class objects. But unlike JavaScript, there are some subtle gotchas when working with function scopes.

nonlocal vs. global

The keyword nonlocal is used when an inner function wants to rebind a variable in the enclosing function's scope. nonlocal was introduced in Python 3.

Python 2.x has no nonlocal keyword. The usual workaround is to wrap the value in a mutable container in the outer function, since mutating a container does not require rebinding the name. For example, if fcn1 defines x = [0], a nested fcn2 can update it with x[0] = 1. Another workaround is to store the value as an attribute on the outer function object itself, e.g. fcn1.x = 1.

The keyword global is used when a function or a class wants to rebind a variable in the module's global scope. (Reading a global variable needs no keyword; global is required only when you assign to it.)
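For example, here is a minimal sketch of global in action (counter is just an illustrative variable name):

```python
counter = 0  # module-level (global) variable

def increment():
    global counter  # rebind the module-level name instead of creating a new local
    counter += 1

increment()
increment()
print(counter)  # 2
```

Without the global declaration, `counter += 1` would raise UnboundLocalError, because the assignment makes counter a local name inside increment.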

Here is an example of the nonlocal keyword being used:

def find_max_min_path(M):
    n = len(M)     # number of rows
    m = len(M[0])  # number of columns
    result = -float('inf')

    def _find_max_min(i, j, min_so_far):
        # `result` lives in find_max_min_path's scope;
        # nonlocal lets this inner function rebind it
        nonlocal result

        if i >= m or j >= n:
            return

        curr_min = min(M[j][i], min_so_far)

        if i == m - 1 and j == n - 1:
            result = max(result, curr_min)
            return

        _find_max_min(i + 1, j, curr_min)
        _find_max_min(i, j + 1, curr_min)

    _find_max_min(0, 0, float('inf'))
    return result

Threading, GIL, and Processes

There is a common misconception that Python is a single-threaded language. That, however, is wrong.

You can make multiple threads in Python.

However, there is a caveat: the GIL (global interpreter lock) ensures that within a single Python process, only one thread can execute Python bytecode at a time.

This means that if you were to make 10 threads in a single Python process, only one of those threads will actually be executing Python code at any given moment (the CPython interpreter executing compiled Python bytecode, to be exact), while the other 9 threads can still do useful work that releases the GIL (blocking I/O operations, network calls, etc.)

This also means that if all 10 threads are running pure Python code, CPU usage-wise you are never going to exceed 100% CPU utilization (100% indicates max utilization of one CPU core). This is the GIL limitation: only one thread at a time can execute Python code. You can delegate execution to different threads, but this is simply context switching; the other threads remain blocked until the thread currently holding the GIL releases it.

This is very different from multi-threading in other languages like C++ or Java, where compiled code can actually execute in parallel across multiple CPU cores.
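To see where Python threads do help, here is a minimal sketch in which a 0.1-second sleep stands in for a blocking I/O call (sleeping, like real I/O, releases the GIL, so the threads overlap):

```python
import threading
import time

results = []

def fake_io(task_id):
    # Simulate an I/O-bound call; time.sleep releases the GIL,
    # so the other threads can run in the meantime
    time.sleep(0.1)
    results.append(task_id)

start = time.perf_counter()
threads = [threading.Thread(target=fake_io, args=(i,)) for i in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

print(len(results))   # 10
print(elapsed < 0.9)  # True: roughly 0.1s total, not 10 * 0.1s
```

If fake_io instead did pure Python computation, the 10 threads would take about as long as running the work serially, because only one thread can hold the GIL at a time.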

Working around GIL

To get around GIL limitations with multi-threading and single-CPU core usage, there are a few options available:

  • Assuming you are using CPython (the reference implementation of Python), you can parallelize CPU-intensive work in C extensions, which are able to release the GIL while running C code
  • You can use the multiprocessing library to spin up separate Python subprocesses (not threads) for intensive CPU demands. This allows you to efficiently parallelize work across multiple CPU cores, since each subprocess has its own interpreter and its own GIL.
  • You can use asyncio (async/await syntax since Python 3.5) to handle concurrent operations with a single-threaded event loop. This does not bypass the GIL, but it is generally useful for network-bound operations where most of the time is spent waiting on responses.

Pickle

In Python, you can use the pickle module to serialize (marshal) Python objects into byte streams and deserialize them back to Python objects. The API for pickling is very similar to that of the json module, which allows you to serialize Python objects into strings and deserialize strings back to Python objects.

A few use cases for pickling:

  • You want to convert executable logic in Python into a byte stream and vice versa. You can pickle functions (even recursive ones) and classes in Python. Un-pickling such objects can execute arbitrary code, which is a security concern: an attacker can pack dangerous payloads into pickled objects. Prefer json if you need to guarantee no code runs at deserialization time.
  • Portability across different Python environments. Note that pickled objects are Python-specific and cannot be read by other languages.
  • Need to transfer Python data over the network. Ray, a distributed computing framework built on top of Python and C++, uses pickling to transfer Python classes and functions over to different worker nodes.
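A minimal round-trip sketch with pickle.dumps and pickle.loads (the data dictionary is just an illustrative value):

```python
import pickle

data = {"name": "example", "values": [1, 2, 3]}

# Serialize the object into a byte stream
blob = pickle.dumps(data)
print(type(blob))  # <class 'bytes'>

# Deserialize the byte stream back into an equivalent object
restored = pickle.loads(blob)
print(restored == data)  # True
```

pickle.dump and pickle.load are the file-object counterparts, mirroring json.dump and json.load.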

Generators

Generators allow you to achieve lazy iteration by yielding values from a function. Using the yield keyword in a function turns it into a generator. Each time the generator yields a value, that value is handed to whoever is iterating over it. When the next iteration starts, the generator resumes execution right after the last yield and repeats the process.

The yield keyword acts as a bookmark for where the function resumes execution. This is useful when you are iterating through large sets of data (e.g. a list of billions of objects) that would take up more memory than is available on the machine running the Python code.
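A small sketch of this in action (countdown is an illustrative example function):

```python
def countdown(n):
    # Execution pauses at each yield and resumes from that
    # point on the next iteration
    while n > 0:
        yield n
        n -= 1

gen = countdown(3)
first = next(gen)   # 3
second = next(gen)  # 2
rest = list(gen)    # [1]
print(first, second, rest)
```

Only one value exists at a time; the full sequence is never materialized in memory, which is the whole point of lazy iteration.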

Duck Typing

TODO


Data Structures


Empty array

lst = []  # avoid the name "list", which shadows the built-in

Array with values

lst = [1, 2, 3]

Empty hash map

hashmap = {}  

Hash map with values

hashmap = {'a': 'b'}  

Empty set

s = set()  

Set with values

s = {1, 2, 3}  # set literal; set([1, 2, 3]) also works

Min heap

import heapq  
q = []  
heapq.heappush(q, 1)  
heapq.heappush(q, 2)  
heapq.heappush(q, 3)  
heapq.heappop(q)  # 1  

Max heap

import heapq  
q = []  
# heapq only provides a min heap, so invert the values when inserting
heapq.heappush(q, -1)  
heapq.heappush(q, -2)  
heapq.heappush(q, -3)  
# Invert the values again after fetching
heapq.heappop(q) * -1  # 3  

Pair/tuple

some_pair = (1, 3)  
print(some_pair[0]) # 1  
print(some_pair[1]) # 3  

Queue

from collections import deque

# A deque works as a queue; popleft() is O(1), while a list's pop(0) is O(n)
queue = deque([1, 2, 3])

# Dequeue
queue.popleft()  # 1

# Enqueue
queue.append(4)

print(queue)  # deque([2, 3, 4])  

Stack

# Lists are the easiest way to represent stacks, since .append() is synonymous
# with a stack push and .pop() is synonymous with a stack pop
stack = []  
stack.append(3)  
stack.append(6)

stack.pop()  # 6

stack.append(9)  
print(stack)  # [3, 9]