Data and API Requests

Common File Formats

Tabular data:

Format   Description
.csv     Comma-Separated Values
.tsv     Tab-Separated Values
.xlsx    Excel spreadsheets

Non-tabular data:

Format   Description
.txt     Plain text
.rtf     Rich Text Format
.xml     Markup-based structured data

Images and Binary:

  • .png, .jpg, .tif — image files
  • .dat — generic data files (format depends on source)
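
Image and .dat files are usually binary, so they can't be read as plain text. A minimal sketch of reading raw bytes, assuming a local file named photo.png (a placeholder name):

# Open in binary mode ('rb') so read() returns bytes instead of decoded text
with open('photo.png', 'rb') as file:
    header = file.read(8)    # first 8 bytes; for PNGs this is the file signature
print(header)                # b'\x89PNG\r\n\x1a\n' for a valid PNG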

Reading Data from Files

Python's built-in open() function returns a file object you can read from or write to.

with open('file.txt') as file:
    contents = file.read()
print(contents)

  • with open(...) handles file closing automatically.
  • 'r' = read (default), 'w' = write, 'a' = append.

# Reading line by line
with open('file.txt') as file:
    for line in file:
        print(line.strip())

Writing to a file:

with open('newfile.txt', 'w') as file:
    file.write("Hello, world!")

Reading and writing CSV files:

import csv

# Reading
with open('file.csv', newline='') as f:
    reader = csv.DictReader(f)
    for row in reader:
        print(row['column_name'])

# Writing
with open('output.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['name', 'age'])
    writer.writerow(['Adam', 34])
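
If your rows are dictionaries (the shape DictReader produces), csv.DictWriter is the mirror image; a minimal sketch:

import csv

# Writing dictionaries instead of lists
with open('output.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=['name', 'age'])
    writer.writeheader()                          # header row from fieldnames
    writer.writerow({'name': 'Adam', 'age': 34})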
Syntax                        Description
with open('file.txt', 'w')    Opens the file for writing (creates it, or overwrites existing contents)
with open('file.txt', 'a')    'a' for append; writes are added to the end of the file
with open('file.txt', 'r')    'r' is the default, so there is no need to specify it; opens the file for reading
.read()                       Reads the whole document into a single string
.readlines()                  Reads the whole file into a list of strings, one per line
.readline()                   Reads one line at a time; each call returns the next line
.seek()                       Moves the file cursor to a given position in the file
.write("string")              Writes the string to a file opened in 'w' or 'a' mode
for line in file:             Iterates through the lines in the file
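
A short sketch tying several of these together, assuming a notes.txt file (a placeholder name):

# Append a line, then read the file back two different ways
with open('notes.txt', 'a') as file:
    file.write("one more line\n")

with open('notes.txt') as file:
    first = file.readline()       # just the first line
    file.seek(0)                  # jump back to the start of the file
    all_lines = file.readlines()  # every line, as a list of strings
print(first, all_lines)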

Working with APIs Using Requests

Python's requests library is your go-to tool for fetching web data. Install it with:

pip install requests

Make GET requests with:

import requests

r = requests.get("https://api.example.com/data")
print(r.text)       # Raw response
print(r.json())     # Parsed JSON data
# Saving the parsed JSON data as a variable
data = requests.get("https://api.example.com/data").json()
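
Real requests can fail, so check the response before parsing it; a minimal sketch using requests' built-in error handling:

import requests

r = requests.get("https://api.example.com/data")
r.raise_for_status()      # raises requests.HTTPError on 4xx/5xx responses
print(r.status_code)      # 200 on success
data = r.json()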

Example API query (US Census):

url = "https://api.census.gov/data/2020/acs/acs5?get=NAME,B08303_001E&for=state:*"
response = requests.get(url)
print(response.json())

API Query Structure:

  • After ? are query parameters
  • get=... defines which variables to retrieve
  • for=state:* means "for all states"
  • List multiple values for one parameter with commas (e.g. &for=state:06,49); separate different parameters with &
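
Rather than building the query string by hand, you can let requests assemble it via the params argument; a sketch of the same Census query (requests URL-encodes the values for you):

import requests

# Same query as above, with requests building the ?get=...&for=... part
params = {
    "get": "NAME,B08303_001E",   # variables to retrieve
    "for": "state:*",            # all states
}
response = requests.get("https://api.census.gov/data/2020/acs/acs5", params=params)
print(response.json())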

Data Collection

  • Primary: collected by you (e.g. surveys, scraping, simulations)
  • Secondary: collected by others and made public (e.g. government databases, open APIs)

When collecting data, ask:

  • What data is needed?
  • How much is enough?
  • Where can it be found?
  • Are there legal or privacy concerns?

Useful resources for scraping and datasets:

Google Sheets as a Data Source

You can treat Google Sheets like a cloud-based database using the gspread library together with the Google Sheets and Drive APIs.

Setup Steps:

  1. Enable the Google Sheets API at https://console.cloud.google.com/
  2. Download the JSON credentials and rename it creds.json
  3. Add creds.json to your .gitignore (it contains sensitive data!)
  4. Share your spreadsheet with the client_email in the creds file (Editor access)

Install required libraries:

pip install gspread google-auth

Example usage:

import gspread
from google.oauth2.service_account import Credentials

# OAuth scopes for the current Sheets v4 / Drive APIs
scope = ["https://www.googleapis.com/auth/spreadsheets", "https://www.googleapis.com/auth/drive"]
creds = Credentials.from_service_account_file("creds.json", scopes=scope)
client = gspread.authorize(creds)

sheet = client.open("MySpreadsheet").sheet1
data = sheet.get_all_records()
print(data)
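
gspread can write back to the sheet as well; a minimal sketch, reusing the sheet object from above:

# Append a new row below the existing data
sheet.append_row(["Adam", 34])

# Update a single cell by row and column number (here: row 1, column 2)
sheet.update_cell(1, 2, "age")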