Scopes
Python has closures, similar to Javascript, since functions are first class objects. But unlike Javascript, there are some subtle gotchas in regards to working with function scopes.
nonlocal vs. global
The keyword nonlocal
is used when a inner function wants to access a variable in the outer function. nonlocal
is introduced in Python 3.
For Python 2.x, you can access the outer function variable by referring to the variable preceded with the dot operator and the outer function name. For example, if x
is defined in fcn1
and you want to assign x = 1
in a nested inner function called fcn2
, simply do fcn1.x = 1
The keyword global
is used when a function or a class wants to access a variable in the global scope.
Here is an example of the nonlocal
keyword being used:
def find_max_min_path(M):
n = len(M)
m = len(M[0])
result = -float('inf')
def _find_max_min(i, j, min_so_far):
if i >= m or j >= n:
return
curr_min = min(M[j][i], min_so_far)
nonlocal result
if i == m-1 and j == n-1:
result = max(result, curr_min)
return
_find_max_min(i+1, j, curr_min)
_find_max_min(i, j+1, curr_min)
Threading, GIL, and Processes
In Python, there is a misconception that people regard it as a single-threaded language. That however, is wrong.
You can make multiple threads in Python.
However, there is a caveat. GIL (global interpreter lock) says that in a single Python process, you can only execute Python code in one thread.
This means that in a single Python process (note: not the same as a Linux process), if you were to make 10 threads, only one of those threads will actually execute Python code (the CPython interpreter executing compiled Python bytecode, to be exact), while the other 9 threads can do other useful things (I/O operations, network calls, etc.)
This also means that if all you are running is Python code across 10 threads, then CPU usage-wise, you are never going to use more than 100% CPU utilization (100% indicates max CPU utilization of one CPU core). This is due to the GIL limitation - only one thread at a time can execute Python code. You can delegate the execution to different threads, but this is simply just context switching, meaning that other threads will simply be blocked until the main thread that handles Python code has finished executing.
This is very different from multi-threading in other languages like C++ or Java, where compiled code can actually be executed in parallel among various CPU threads.
Working around GIL
To get around GIL limitations with multi-threading and single-CPU core usage, there are a few options available:
- Assuming you are using Cython implementation of Python, you could handle parallelizing CPU intensive work at the C extensions level
- You can use the multiprocessing library to spin up separate Python subprocesses (not threads), for intensive CPU demands. This allows you to efficiently parallelize work across multiple CPU cores, since GIL does not apply to Python subprocesses.
- You can use asyncio (Python 3.7+) to handle concurrent operations with a single-threaded event-loop. This is generally useful for network-bound operations where waiting on the response of a network call results in delays.
Pickle
In Python, you can use the pickle
module to serialize, or marshall Python objects into bytecode (also known as byte streams) and deserialize them back to Python objects. The API for pickling is very similar to the json
module, which allows you to serialize Python objects into strings and deserialize strings back to Python objects.
Few use cases for pickling:
- You want to convert executable logic in Python into a byte stream and vice-versa. You can pickle functions (i.e. recursive) and classes in Python. When these objects are un-pickled, it might also execute some random code. Note that this can be a security concern, where hackers can maliciously pack dangerous scripts into pickled objects; json is more preferable if you are worried about not running dangerous scripts at un-pickling.
- Interoperability and portability with different Python environments; pickled objects are not understandable by other languages.
- Need to transfer Python data over the network. Ray, a distributed computing framework built on top of Python and C++, uses pickling to transfer Python classes and functions over to different worker nodes.
Generators
Generators allow you to achieve lazy iteration by yielding values in a function. By using the yield
keyword in a function, it becomes a generator. Each time you yield a value in a generator, that value is returned as the generator is being iterated. When the next iteration starts, the generator then resumes the code execution from the last yield
keyword and repeats the process again.
The yield
keyword acts as a bookmark for where the function resumes code execution. This is useful when you are iterating through large sets of data (i.e. a list of billions of objects) which takes up more memory than you have available on your machine running the Python code.
Duck Typing
TODO
Data Structures
Empty array
list = []
Array with values
list = [1, 2, 3]
Empty hash map
hashmap = {}
Hash map with values
hashmap = {'a': 'b'}
Empty set
s = set()
Set with values
s = set([1, 2, 3])
Min heap
import heapq
q = []
heapq.heappush(q, 1)
heapq.heappush(q, 2)
heapq.heappush(q, 3)
heapq.heappop(q) # 1
Max heap
import heapq
q = []
# Invert the values when inserting
heapq._heappush(q, -1)
heapq._heappush(q, -2)
heapq._heappush(q, -3)
# Invert the values again after fetching
heapq.heappop(q) * -1 # 3
Pair/tuple
some_pair = (1, 3)
print(some_pair[0]) # 1
print(some_pair[1]) # 3
Queue
from collections import deque
# A deque works as a queue, since popleft is quicker than a List's remove(0)
queue = deque([1, 2, 3])
# Dequeue
queue.popleft() # 1
# Enqueue
queue.append(4)
print(queue) # deque([2, 3, 4])
Stack
# Lists are the easiest way to represent stacks, since .append() is synonymous with a stack push
# and .pop() is synonymous with a stack pop
stack = []
stack.push(3)
stack.push(6)
# Pop 6
stack.pop()
stack.push(9)
print(stack) # [3, 9]