
Useful Tools#

Introduction#

In this section, we will explore some essential tools that are widely used in software development and data science. These tools include:

  • Bash: A Unix shell and command language that provides powerful text processing and scripting capabilities. Bash is essential for automating repetitive tasks, managing system operations, and handling text processing efficiently.

  • Git: A distributed version control system that helps you track changes in your code and collaborate with others. Git is crucial for maintaining a history of your project, enabling collaboration among team members, and managing different versions of your codebase.

  • Python: A versatile programming language that is popular for its simplicity and readability, and is extensively used in data analysis, machine learning, and web development. Python’s extensive libraries and frameworks make it a go-to choice for everything from short scripts to complex applications.

By mastering these tools, you will be able to streamline your workflow, manage your projects more effectively, and enhance your productivity. This section assumes that you have a basic understanding of programming concepts and are familiar with using the command line.

Bash#

Bash (Bourne Again SHell) is a Unix shell and command language that is widely used for system administration, scripting, and text processing. It provides a powerful set of tools for automating tasks and managing system operations.

Basic Commands#

Here are some basic Bash commands that you should know:

  • ls: List directory contents. Example: ls -l lists files in long format. Example: ls -a lists all files, including hidden files.

ls -l
ls -a
  • cd: Change the current directory. Example: cd /path/to/directory changes to the specified directory.

cd /path/to/directory
  • pwd: Print the current working directory. Example: pwd displays the full path of the current directory.

pwd
  • cp: Copy files and directories (copying a directory requires the -r option). Example: cp source.txt destination.txt copies the source file to the destination.

cp source.txt destination.txt
  • mv: Move or rename files and directories. Example: mv oldname.txt newname.txt renames the file.

mv oldname.txt newname.txt
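  • rm: Remove files or, with the -r option, directories. Removed files are not moved to a trash folder, so double-check the arguments first. Example: rm file.txt deletes the specified file.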

rm file.txt
  • echo: Display a line of text. Example: echo "Hello, World!" prints the text to the terminal.

echo "Hello, World!"
  • cat: Concatenate and display file content. Example: cat file.txt displays the content of the file.

cat file.txt
  • mkdir: Create a new directory. Example: mkdir new_directory creates a new directory with the specified name.

mkdir new_directory
  • mkdir -p: Create a new directory and any necessary parent directories. Example: mkdir -p /path/to/new_directory creates the specified directory along with any missing parent directories.

mkdir -p /path/to/new_directory
  • touch: Create a new empty file or update the timestamp of an existing file. Example: touch newfile.txt creates an empty file with the specified name.

touch newfile.txt

touch accepts several filenames as well as full paths, so you can create multiple files at once or create a file in a specific directory. Here are some examples:

  • Create multiple files:

touch file1.txt file2.txt file3.txt
  • Create a file in a specific directory:

touch /path/to/directory/newfile.txt
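If you later need the same file operations inside a Python program, the standard library offers close counterparts. The sketch below is only an illustration: it runs in a temporary directory, and the directory and file names (project, data, file1.txt, copy.txt, renamed.txt) are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path

# Work inside a throwaway directory so nothing on the real system is touched.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)

    # mkdir -p project/data
    data_dir = root / "project" / "data"
    data_dir.mkdir(parents=True, exist_ok=True)

    # touch project/data/file1.txt
    source = data_dir / "file1.txt"
    source.touch()

    # cp file1.txt copy.txt
    copy = data_dir / "copy.txt"
    shutil.copy(source, copy)

    # mv copy.txt renamed.txt
    renamed = copy.rename(data_dir / "renamed.txt")

    # rm file1.txt
    source.unlink()

    # ls project/data
    listing = sorted(p.name for p in data_dir.iterdir())
    print(listing)
```

After the copy, rename, and delete steps, only renamed.txt should remain in the directory.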

Scripting#

Bash scripting allows you to automate repetitive tasks by writing scripts. A Bash script is simply a text file containing a series of commands. Here is an example of a simple Bash script:

#!/bin/bash
echo "Hello, World!"

To run the script, save it to a file (e.g., hello.sh), make it executable (chmod +x hello.sh), and then execute it (./hello.sh).

Text Processing#

Bash provides powerful text processing capabilities through commands like grep, awk, and sed. These tools allow you to search, filter, and transform text data efficiently. Here are some examples:

  • grep: Search for patterns in files. Example: grep 'pattern' file.txt searches for ‘pattern’ in the file.

grep 'pattern' file.txt
  • awk: A programming language for text processing. Example: awk '{print $1}' file.txt prints the first column of each line in the file.

awk '{print $1}' file.txt
  • sed: Stream editor for filtering and transforming text. Example: sed 's/old/new/g' file.txt prints the content of the file with every ‘old’ replaced by ‘new’. The file itself is not modified unless you use the -i option.

sed 's/old/new/g' file.txt
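To make the three operations concrete, here is a rough Python sketch of what each command above does to its input. It assumes a small hypothetical sample text in place of file.txt, and it uses Python's re module, whose pattern syntax differs in some details from the one grep and sed use.

```python
import re

# Hypothetical sample text standing in for the contents of file.txt.
text = """alpha old value
beta new value
gamma old old"""
lines = text.splitlines()

# grep 'old' file.txt: keep only the lines that contain the pattern
grep_result = [line for line in lines if re.search(r"old", line)]

# awk '{print $1}' file.txt: first whitespace-separated field of each line
# (blank lines are skipped in this sketch)
awk_result = [line.split()[0] for line in lines if line.split()]

# sed 's/old/new/g' file.txt: replace every match on every line
sed_result = [re.sub(r"old", "new", line) for line in lines]

print(grep_result)
print(awk_result)
print(sed_result)
```

Like sed without -i, the last step builds new lines and leaves the original text untouched.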

Using Wildcards#

Wildcards are special characters that can be used to match patterns in filenames. Here are some common wildcards:

  • *: Matches any number of characters. Example: ls *.txt lists all files with a .txt extension.

ls *.txt
  • ?: Matches a single character. Example: ls file?.txt matches file1.txt, file2.txt, etc.

ls file?.txt
  • []: Matches any one of the characters inside the brackets. Example: ls file[12].txt matches file1.txt and file2.txt.

ls file[12].txt
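One way to experiment with these patterns without creating any files is Python's fnmatch module, which implements the same three wildcards. The filenames below are hypothetical. One known difference: the shell's * does not match hidden files (names starting with a dot) by default, while fnmatch does.

```python
from fnmatch import fnmatchcase

# Hypothetical filenames used only for illustration.
names = ["file1.txt", "file2.txt", "file3.txt", "file10.txt", "notes.md"]

def matching(pattern):
    """Return the names that match a shell-style wildcard pattern."""
    return [name for name in names if fnmatchcase(name, pattern)]

print(matching("*.txt"))         # every name ending in .txt
print(matching("file?.txt"))     # exactly one character after "file"
print(matching("file[12].txt"))  # "1" or "2" after "file"
```

Note that file?.txt does not match file10.txt, because ? stands for exactly one character.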

Tips and Tricks for Basic Terminal Use#

Here are some tips and tricks to enhance your terminal usage:

  • Use Tab for auto-completion of commands and filenames.

  • Use Ctrl + R to search through your command history.

  • Use Ctrl + C to cancel the current command.

  • Use Ctrl + L to clear the terminal screen.

  • Use !! to repeat the last command.

  • Use !<command> to repeat the most recent command that starts with the given text. Example: !ls repeats the last command that began with ls.

File Permissions and Ownership#

Managing file permissions and ownership is crucial for system security and proper access control. Here are some commands related to file permissions and ownership:

  • chmod: Change file permissions. Example: chmod 755 file.txt sets the file permissions to read, write, and execute for the owner, and read and execute for the group and for others.

chmod 755 file.txt
  • chown: Change file ownership. Example: chown user:group file.txt changes the owner and group of the file.

chown user:group file.txt
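Each digit of a numeric mode such as 755 is the sum of read (4), write (2), and execute (1), given in the order owner, group, others. As a small sketch, Python's stat module can translate such a mode into the permission string that ls -l would show for a regular file. The helper name describe_mode is made up for this example.

```python
import stat

def describe_mode(octal_mode):
    """Return the ls -l style permission string for a regular file."""
    return stat.filemode(stat.S_IFREG | octal_mode)

print(describe_mode(0o755))  # -rwxr-xr-x
print(describe_mode(0o644))  # -rw-r--r--
```

Reading the result in groups of three characters after the leading dash gives the owner, group, and others permissions.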

Viewing Running Processes#

You can view and manage running processes using the following commands:

  • ps: Display information about running processes. Example: ps aux shows detailed information about all running processes.

ps aux
  • top: Display real-time information about running processes. Example: top shows a dynamic view of system processes.

top

Creating Aliases#

Aliases allow you to create shortcuts for frequently used commands. Here are some examples:

  • Create an alias for ls -la:

alias ll='ls -la'
  • Make the alias permanent by adding it to your .bashrc or .bash_profile file:

echo "alias ll='ls -la'" >> ~/.bashrc
source ~/.bashrc

Further Documentation#

For more information and advanced usage, refer to the official Bash documentation, or read the manual page of any command directly in the terminal:

man ls

By mastering Bash, you can significantly enhance your productivity and streamline your workflow.

Git#

Git is a distributed version control system that helps you track changes in your code and collaborate with others. It is crucial for maintaining a history of your project, enabling collaboration among team members, and managing different versions of your codebase.

Basic Commands#

Here are some basic Git commands that you should know:

  • git init: Initialize a new Git repository.

git init
  • git clone: Clone an existing repository.

git clone https://github.com/user/repo.git
  • git status: Show the working directory status.

git status
  • git add: Add files to the staging area.

git add filename
  • git commit: Commit changes to the repository.

git commit -m "Commit message"
  • git push: Push changes to a remote repository.

git push origin branchname
  • git pull: Pull changes from a remote repository.

git pull origin branchname

Branching and Merging#

Git allows you to create branches to work on different features or fixes independently. Here are some commands related to branching and merging:

  • git branch: List, create, or delete branches.

git branch
git branch new-branch
git branch -d old-branch
  • git checkout: Switch to a different branch.

git checkout branchname
  • git merge: Merge changes from one branch into another.

git checkout main
git merge branchname

Resolving Merge Conflicts#

Merge conflicts occur when changes from different branches conflict with each other. To resolve merge conflicts:

  • Identify the conflicting files using git status.

git status
  • Open the conflicting files and manually resolve the conflicts.

  • Mark the conflicts as resolved by adding the resolved files to the staging area.

git add resolved_file.txt
  • Commit the resolved changes.

git commit -m "Resolved merge conflicts"
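Git marks each conflicting region in a file with lines that start with <<<<<<<, =======, and >>>>>>>. Before staging a file, it can help to confirm that none of these markers are left. The following is a minimal sketch of such a check, not a Git feature. The function name and the sample content are hypothetical, and a line of equals signs in ordinary text would be reported as a false positive.

```python
def find_conflict_markers(text):
    """Return (line_number, line) pairs for lines that start with a conflict marker."""
    markers = ("<<<<<<<", "=======", ">>>>>>>")
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if line.startswith(markers)
    ]

# Hypothetical file content with one unresolved conflict.
conflicted = """def greet():
<<<<<<< HEAD
    return "hello"
=======
    return "hi"
>>>>>>> feature
"""

# The same file after keeping one side and deleting the markers.
resolved = '''def greet():
    return "hello"
'''

print(find_conflict_markers(conflicted))
print(find_conflict_markers(resolved))
```

An empty list for the resolved version indicates that the file is ready to be staged with git add.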

Using .gitignore#

The .gitignore file specifies which files and directories to ignore in a Git repository. Here is an example of a .gitignore file:

# Ignore all .log files
*.log

# Ignore the node_modules directory
node_modules/

# Ignore all files in the temp directory
temp/

Working with Remote Repositories#

You can manage remote repositories using the following commands:

  • git remote: Manage remote repositories. Example: git remote add origin https://github.com/user/repo.git adds a remote repository.

git remote add origin https://github.com/user/repo.git
  • git fetch: Fetch changes from a remote repository without merging them.

git fetch origin
  • git pull: Fetch and merge changes from a remote repository.

git pull origin branchname
  • git push: Push changes to a remote repository.

git push origin branchname

Advanced Features#

Here are some advanced Git features that can help you manage your codebase more effectively:

  • git rebase: Reapply commits on top of another base tip. Example: git rebase main rebases the current branch onto the main branch.

git rebase main
  • git stash: Stash changes in a dirty working directory. Example: git stash stashes the current changes.

git stash
  • git stash pop: Apply stashed changes and remove them from the stash list.

git stash pop

Further Documentation#

For more information and advanced usage, refer to the official Git documentation.

By mastering Git, you can effectively manage your codebase, collaborate with others, and maintain a history of your project.

Python#

Python is a versatile programming language that is popular for its simplicity and readability. It is widely used for data analysis, machine learning, web development, scripting, and more.

Virtual Environments#

Virtual environments isolate project dependencies, preventing conflicts between different projects and ensuring a consistent environment. Here is a brief comparison of venv vs. conda:

venv:

  • Included with Python (no separate install).

  • Lightweight, straightforward.

  • Activates a virtual environment in the current shell.

Installation and usage:

# Create a virtual environment
python -m venv myenv
# Activate it (Linux/macOS)
source myenv/bin/activate
# Activate it (Windows)
myenv\Scripts\activate

conda:

  • Requires Anaconda or Miniconda installation.

  • Manages Python versions and packages efficiently.

  • Provides extensive data science libraries and environment management.

Installation and usage:

# Create a conda environment
conda create --name myenv python=3.12
# Activate the environment
conda activate myenv
# Deactivate the environment
conda deactivate

You can find flavors of conda via Anaconda and Miniconda.

Installing Packages in Virtual Environments#

When a virtual environment is active, you can install packages with pip. This works both in a venv and inside a conda environment:

pip install requests

For conda environments:

conda install numpy

This ensures packages are isolated within your environment.
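If you are unsure which interpreter a script is using, a quick check from within Python can help. The sketch below relies on the fact that inside a venv, sys.prefix differs from sys.base_prefix. The function name is made up for this example. A conda environment is a complete Python installation rather than a venv, so this check is expected to report False there.

```python
import sys

def in_virtual_environment():
    """Return True when the interpreter runs inside a venv-style virtual environment."""
    # In a venv, sys.prefix points at the environment, while
    # sys.base_prefix points at the Python installation it was created from.
    return sys.prefix != sys.base_prefix

print("Interpreter:", sys.executable)
print("Inside a venv:", in_virtual_environment())
```

The printed interpreter path should point into your environment directory (for example myenv) when the environment is active.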

Installing Packages from requirements.txt#

If you have a requirements.txt file listing the dependencies:

requests==2.28.1
numpy==1.24.0

Install them in your active environment with:

pip install -r requirements.txt

For conda (convert dependencies to conda format if you can, or install pip-based packages in a conda environment):

conda install --file requirements.txt

This helps manage project dependencies in a single file.
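To illustrate the structure of such a file, here is a deliberately simplified parser for exact pins of the form name==version. It is only a sketch: the function name is hypothetical, and real requirements files support much more (version ranges, extras, environment markers, URLs), which pip handles for you.

```python
def parse_pinned_requirements(text):
    """Map package names to versions for simple 'name==version' lines."""
    pins = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or "==" not in line:
            continue  # skip blanks and anything that is not an exact pin
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins

# Hypothetical file content, mirroring the example above.
requirements = """
requests==2.28.1
numpy==1.24.0  # numerical arrays
"""

print(parse_pinned_requirements(requirements))
```

Pinning exact versions like this makes an environment reproducible, at the cost of having to update the pins by hand.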

Creating Packages with pyproject.toml#

Modern Python packaging often uses pyproject.toml files, following PEP 518 (which introduced the [build-system] table), PEP 621 (which defines the [project] metadata table), and related standards. A simple pyproject.toml might look like:

[project]
name = "my_package"
version = "0.1.0"
description = "A sample Python package"
authors = [
    { name="Your Name", email="your_email@example.com" }
]
dependencies = [
  "requests~=2.28.0",
  "numpy~=1.24.0"
]

readme = "README.md"
license = { text = "MIT" }
requires-python = ">=3.8"

[build-system]
requires = ["setuptools", "wheel"]
build-backend = "setuptools.build_meta"

For more in-depth guidance on writing pyproject.toml, the PEP standards behind it, and Python packaging in general, see the official Python Packaging documentation.

Editable Installs#

If you have a pyproject.toml for your project, you can install it in editable mode, allowing local changes to be immediately reflected:

pip install -e .

Writing and Running Tests#

Testing is essential for ensuring the correctness and stability of your code. One popular testing framework is pytest. A typical setup:

  1. Install pytest:

pip install pytest
  2. Create a test file, for example test_example.py:

def test_addition():
    assert 1 + 1 == 2
  3. Run tests:

pytest
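By default, pytest collects files named test_*.py and runs every function in them whose name starts with test_. The sketch below shows a slightly larger test file under that convention. The file name test_geometry.py and the function rectangle_area are hypothetical; in a real project the function under test would be imported from your package rather than defined in the test file.

```python
# test_geometry.py (hypothetical file name)

def rectangle_area(width, height):
    """Function under test; normally imported from your own package."""
    if width < 0 or height < 0:
        raise ValueError("dimensions must be non-negative")
    return width * height

def test_area_of_regular_rectangle():
    assert rectangle_area(3, 4) == 12

def test_area_with_zero_side():
    assert rectangle_area(0, 5) == 0

def test_negative_dimension_is_rejected():
    # Plain-Python check that an exception is raised.
    try:
        rectangle_area(-1, 2)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")
```

The last test is written without importing pytest so the file stays dependency-free. With pytest installed, the more idiomatic form is a with pytest.raises(ValueError): block.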

Used with continuous integration tools, testing helps maintain a reliable codebase. By mastering virtual environments, packaging with pyproject.toml, adhering to PEP standards, and writing thorough tests, you will build robust and maintainable Python projects.

Python Classes#

In Python, classes are used to create user-defined data structures. They allow you to bundle data and functionality together. Creating a new class creates a new type of object, allowing new instances of that type to be made. Each class instance can have attributes attached to it for maintaining its state. Class instances can also have methods (defined by their class) for modifying that state.

Defining a Class#

Here is an example of a simple class definition:

class Shape:
    def __init__(self, color):
        self.color = color

    def describe(self):
        return f"This shape is {self.color}."

The __init__ method is a special method that is called automatically when an instance of the class is created. It initializes the new instance with the given arguments.

The self parameter is a reference to the current instance of the class. It is used to access the variables and methods associated with that instance.

Creating an Instance#

You can create an instance of the class by calling the class name and passing the required arguments:

my_shape = Shape("red")
print(my_shape.describe())  # Output: This shape is red.

Inheritance#

Classes can inherit from other classes. This allows you to create a hierarchy of classes that share a set of attributes and methods. Here is an example of a Rectangle class that inherits from Shape:

class Rectangle(Shape):
    def __init__(self, color, width, height):
        super().__init__(color)
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height

    def describe(self):
        return f"This rectangle is {self.color}, with an area of {self.area()} square units."

Using the Subclass#

You can create an instance of the subclass and use its methods:

my_rectangle = Rectangle("blue", 3, 4)
print(my_rectangle.describe())  # Output: This rectangle is blue, with an area of 12 square units.
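Because Rectangle overrides describe, code that only knows it is dealing with a Shape can call the same method on either class and get the appropriate behaviour. The sketch below repeats condensed versions of the two classes so that it runs on its own. The area is computed inline here purely to keep the example short.

```python
class Shape:
    def __init__(self, color):
        self.color = color

    def describe(self):
        return f"This shape is {self.color}."


class Rectangle(Shape):
    def __init__(self, color, width, height):
        super().__init__(color)
        self.width = width
        self.height = height

    def describe(self):
        area = self.width * self.height
        return f"This rectangle is {self.color}, with an area of {area} square units."


# The same call works on every object in the list.
shapes = [Shape("red"), Rectangle("blue", 3, 4)]
descriptions = [shape.describe() for shape in shapes]
for text in descriptions:
    print(text)

# A Rectangle is also a Shape, but not the other way around.
print(isinstance(shapes[1], Shape))
print(isinstance(shapes[0], Rectangle))
```

The two isinstance checks print True and False respectively, which reflects the direction of the inheritance relationship.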

The __call__ Method#

The __call__ method allows an instance of the class to be called as a function. Here is an example:

class Circle(Shape):
    def __init__(self, color, radius):
        super().__init__(color)
        self.radius = radius

    def area(self):
        return 3.14 * self.radius * self.radius

    def __call__(self):
        return f"This circle is {self.color}, with an area of {self.area()} square units."

Using the __call__ Method#

You can create an instance of the class and call it as a function:

my_circle = Circle("green", 5)
print(my_circle())  # Output: This circle is green, with an area of 78.5 square units.
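A callable instance can be used wherever Python expects a function, which is handy when the "function" needs to remember some configuration. Here is a minimal sketch with a hypothetical Scaler class that is not part of the shape hierarchy above.

```python
class Scaler:
    """Hypothetical callable object that multiplies its input by a fixed factor."""

    def __init__(self, factor):
        self.factor = factor

    def __call__(self, value):
        return value * self.factor


double = Scaler(2)

print(callable(double))              # the instance behaves like a function
print(double(21))                    # calls Scaler.__call__(double, 21)
print(list(map(double, [1, 2, 3])))  # usable anywhere a function is expected
```

The built-in callable function returns True for any object whose class defines __call__.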

Further Documentation#

For more information on Python classes and object-oriented programming, refer to the official Python documentation.