Course schedule¶
Here is the current week-by-week schedule 📅 . We may adjust as we go along. To get started, we’re going to create the calendar of weeks for the course programmatically rather than manually!
## import modules
import pandas as pd
import re
import numpy as np
## tell python to display output and print multiple objects
from IPython.display import display, HTML
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
## create range b/t start and end date
## of course
start_date = pd.to_datetime("2021-03-29")
end_date = pd.to_datetime("2021-06-02")
st_alldates = pd.date_range(start_date, end_date)
## subset to days in that range equal to Tuesday or Thursday
st_tuth = st_alldates[st_alldates.day_name().isin(['Tuesday', 'Thursday'])]
## create data frame with that information
st_dates = [re.sub("2021\\-", "", str(day.date())) for day in st_tuth]
course_sched = pd.DataFrame({'dow': st_tuth.day_name(), 'st_tuth': st_dates})
course_sched['date_toprint'] = course_sched.dow.astype(str) + " " + \
course_sched.st_tuth.astype(str)
course_sched = course_sched['date_toprint']
## display the resulting date sequence
display(course_sched)
## next block of code creates the
## actual content; can click "show"
## to see the underlying code
0 Tuesday 03-30
1 Thursday 04-01
2 Tuesday 04-06
3 Thursday 04-08
4 Tuesday 04-13
5 Thursday 04-15
6 Tuesday 04-20
7 Thursday 04-22
8 Tuesday 04-27
9 Thursday 04-29
10 Tuesday 05-04
11 Thursday 05-06
12 Tuesday 05-11
13 Thursday 05-13
14 Tuesday 05-18
15 Thursday 05-20
16 Tuesday 05-25
17 Thursday 05-27
18 Tuesday 06-01
Name: date_toprint, dtype: object
## create the actual content
### list of concepts
concepts = ["Course intro. and checking software setup",
"Workflow basics: command line, Github workflow, basic LaTeX syntax, pre-analysis plans",
"Workflow basics (continued)",
"Python pandas review: aggregation, joins, lambda and user-defined functions",
"Python pandas review: aggregation, joins, lambda and user-defined functions",
"Problem set one work session",
"Intro to merging",
"Regex",
"Probabilistic matching: part one",
"Probabilistic matching: part two",
"Text as data: part one",
"Text as data: part two",
"Problem set two work session and intro to APIs",
"APIs continued",
"APIs continued; SQL part one",
"SQL part two",
"Final project work session",
"Scraping + final project work session",
"Final presentations"]
len(course_sched)
len(concepts)
## combine
course_sched_concepts = pd.DataFrame({'Week': course_sched,
'Concepts': concepts})
df = course_sched_concepts.copy()
print(df)
19
19
Week Concepts
0 Tuesday 03-30 Course intro. and checking software setup
1 Thursday 04-01 Workflow basics: command line, Github workflow...
2 Tuesday 04-06 Workflow basics (continued)
3 Thursday 04-08 Python pandas review: aggregation, joins, lamb...
4 Tuesday 04-13 Python pandas review: aggregation, joins, lamb...
5 Thursday 04-15 Problem set one work session
6 Tuesday 04-20 Intro to merging
7 Thursday 04-22 Regex
8 Tuesday 04-27 Probabilistic matching: part one
9 Thursday 04-29 Probabilistic matching: part two
10 Tuesday 05-04 Text as data: part one
11 Thursday 05-06 Text as data: part two
12 Tuesday 05-11 Problem set two work session and intro to APIs
13 Thursday 05-13 APIs continued
14 Tuesday 05-18 APIs continued; SQL part one
15 Thursday 05-20 SQL part two
16 Tuesday 05-25 Final project work session
17 Thursday 05-27 Scraping + final project work session
18 Tuesday 06-01 Final presentations
## add datacamp modules conditionally
col = "Concepts"
### older code on more exhaustive modules
# topics = [df[col] == "Python basic data wrangling: data structures (vectors; lists; dataframes; matrices), control flow, and loops",
# df[col] == "Python basic data wrangling: basic regular expressions and text mining",
# df[col] == "Python basic data wrangling: combining data (row binds, column binds, joins); aggregation",
# df[col] == "Review of visualization: ggplot; plotnine",
# df[col] == "Python: writing your own functions",
# df[col] == "Python: text data using nltk and gensim",
# df[col] == "SQL: reading data from a database and basic SQL (postgres) syntax",
# df[col] == "SQL: more advanced SQL syntax (subqueries; window functions)",
# df[col] == "Python: reading data from APIs and basic web scraping"]
# datacamp_modules = ["Python basics; python lists; Pandas: extracting and transforming data; Intermediate python for data science (loops)",
# "First three modules of regular expressions in Python",
# "Merging DataFrames with Pandas",
# "Introduction to Data Visualization with ggplot2",
# "Python data science toolbox (Part one): user-written functions, default args, lambda functions and error handling",
# "Natural language processing fundamentals in Python",
# "Introduction to databases in Python",
# "Intermediate SQL",
# "Importing JSON data and working with APIs; Importing data from the Internet"]
topics_trunc = [df[col] == "Workflow basics (continued)",
df[col] == "Merging (continued) and PSET 1 review"]
datacamp_modules_trunc = ["Data manipulation with Pandas",
"Regular expressions for pattern matching"]
df["DataCamp module(s) (if any)"] = np.select(topics_trunc,
datacamp_modules_trunc,
default = "")
date_col = "Week"
due_dates = [df[date_col] == "Tuesday 04-20",
df[date_col] == "Tuesday 04-27",
df[date_col] == "Thursday 05-13",
df[date_col] == "Tuesday 05-18",
df[date_col] == "Thursday 05-20",
df[date_col] == "Tuesday 06-01"]
assig = ["Problem set one",
"Final project step 1",
"Problem set two: part one",
"Problem set two: part two",
"Final project step 2 (due Sunday 05.23 at 11:59 PM EST)",
"Slides for final presentation (due Tuesday 06.01 at 5 PM EST)"]
df["Due (11:59 PM EST unless otherwise specified)"] = np.select(due_dates,
assig,
default = "")
## add slides or tutorial link
# df['Link to slides or tutorial'] = np.select([df["Concepts"] == "Course intro. and checking software setup",
# df["Concepts"] == "Workflow basics: command line, Github workflow, basic LaTeX syntax, pre-analysis plans"],
# ["https://github.com/rebeccajohnson88/qss20_slides_activities/blob/main/slides/qss20_s21_class1.pdf",
# "https://github.com/rebeccajohnson88/qss20_slides_activities/blob/main/slides/qss20_s21_class2.pdf"],
# default = "")
# df['Link to slides or tutorial'] = np.where(df['Link to slides or tutorial'] != "",
# '<a target="_blank" href=' + df['Link to slides or tutorial'] + '><div>' + "Link" + '</div></a>',
# "")
# df['Link to activity (blank)'] = np.select([df["Concepts"] == "Workflow basics: command line, Github workflow, basic LaTeX syntax, pre-analysis plans"],
# ["https://github.com/rebeccajohnson88/qss20_slides_activities/blob/main/activities/00_latex_output_examples.ipynb"],
# default = "")
# df['Link to activity (blank)'] = np.where(df['Link to activity (blank)'] != "",
# '<a target="_blank" href=' + df['Link to activity (blank)'] + '><div>' + "Link" + '</div></a>',
# "")
HTML(df.to_html(index=False, escape = False))
Week | Concepts | DataCamp module(s) (if any) | Due (11:59 PM EST unless otherwise specified) |
---|---|---|---|
Tuesday 03-30 | Course intro. and checking software setup | ||
Thursday 04-01 | Workflow basics: command line, Github workflow, basic LaTeX syntax, pre-analysis plans | ||
Tuesday 04-06 | Workflow basics (continued) | Data manipulation with Pandas | |
Thursday 04-08 | Python pandas review: aggregation, joins, lambda and user-defined functions | ||
Tuesday 04-13 | Python pandas review: aggregation, joins, lambda and user-defined functions | ||
Thursday 04-15 | Problem set one work session | ||
Tuesday 04-20 | Intro to merging | Problem set one | |
Thursday 04-22 | Regex | ||
Tuesday 04-27 | Probabilistic matching: part one | Final project step 1 | |
Thursday 04-29 | Probabilistic matching: part two | ||
Tuesday 05-04 | Text as data: part one | ||
Thursday 05-06 | Text as data: part two | ||
Tuesday 05-11 | Problem set two work session and intro to APIs | ||
Thursday 05-13 | APIs continued | Problem set two: part one | |
Tuesday 05-18 | APIs continued; SQL part one | Problem set two: part two | |
Thursday 05-20 | SQL part two | Final project step 2 (due Sunday 05.23 at 11:59 PM EST) | |
Tuesday 05-25 | Final project work session | ||
Thursday 05-27 | Scraping + final project work session | ||
Tuesday 06-01 | Final presentations | Slides for final presentation (due Tuesday 06.01 at 5 PM EST) |