University Scraping

Hey Mia!
We need your help to automate 5050 sourcing and find all the awesome scientists (PhDs and postdocs) out there. This is the flow that will help us find them:
Given a university → we identify the deeptech departments → we identify the labs in these departments → we identify scientists in these labs → we collect scientists info → we outreach to these scientists to invite them into 5050.
You’ll help us with the steps in blue. Usually, we would just copy and paste the links for all the labs in each deep tech department. If you think that is simpler and easier, we can just add all the lab links for each department by hand. However, Kia (one of my coworkers) has built an automation workflow that finds info for all scientists once we have the lab websites.
It’s important to find all scientists so that we reach out to every potential candidate for 5050, so let’s do a thorough search of these departments and labs so we don’t miss anyone :).

Detailed Steps:

1. Identify deep tech departments

For every university in the “Target University List”, gather all the labs that work in deep tech departments. The list of Target Universities is in the spreadsheet we’ll work with!
Look for PhD programs in science, biology, engineering, technology, and climate.
Let’s make sure these programs focus on applied science, hard tech/creating innovations. Avoid programs focusing on theory such as studying Health Anthropology or Climate Economics. Does that make sense?
Refer to the “Deeptech” sheet, where I listed deeptech areas. For example, for MIT, focus on finding labs within Aeronautics and Astronautics, Biological Engineering, Biology, Chemical Engineering, etc., as shown by the “keep” tag I added on that sheet.
When in doubt, just tag me in the spreadsheet!
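
A rough first pass over program names could look like the sketch below. The keyword lists are assumptions taken from the examples above, and `triage` is a made-up helper name, so treat the “Deeptech” sheet and its “keep” tags as the source of truth.

```python
# Illustrative keyword lists, assumed from the examples in this doc.
# They are not an official rule set.
INCLUDE = ("science", "biolog", "engineering", "technology", "climate", "aeronautics")
EXCLUDE = ("anthropology", "economics")  # theory-focused examples from this doc


def triage(program: str) -> str:
    """Return 'keep', 'skip', or 'ask' for a PhD program name."""
    name = program.lower()
    # Exclusions win, so "Climate Economics" is skipped despite "climate".
    if any(keyword in name for keyword in EXCLUDE):
        return "skip"
    if any(keyword in name for keyword in INCLUDE):
        return "keep"
    # No signal either way: flag it for a human to decide.
    return "ask"


if __name__ == "__main__":
    for program in ("Chemical Engineering", "Climate Economics", "Music"):
        print(program, "->", triage(program))
```

Keyword matching is crude (a name like “Political Science” would pass because of “science”), so anything it keeps still needs a quick manual check, and anything marked “ask” is a good candidate for tagging in the spreadsheet.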

Let’s start with Universities on Priority 1 list:
Berkeley, Caltech, Carnegie Mellon, Columbia, Harvard, MIT, Northwestern, Penn, Princeton, Stanford, UCSF, University of Texas at Austin, University of Washington

2. Find the best link to collect lab websites.

Once you find the target departments for a given university, find the best website URL that lists all the lab websites. Use one of the approaches below. You only need to fill out one of these columns per department. The best data source might differ from department to department, which is why I added different options.

Department Page with Lab Links: If the department’s website links directly to all lab websites, use this.
Example: Stanford’s Chemical Engineering department.
Kia can use this website to scrape all lab websites.
Enter this link in column A of the spreadsheet.
Faculty Directory with Lab Links: If the department’s website lists faculty members and each profile links to their lab, use this.
Example: MIT’s Bioengineering department link.
Kia can use this to scrape all lab websites.
Enter this link in column E of the spreadsheet.
PhDs and Postdocs Directory per Department: Some departments, like Berkeley Plant Biology, give us the full list of PhDs and postdocs in the department. If so, use this.
Enter this link in column F of the spreadsheet.
Disorganized Department Website → Add individual lab links one by one: If the department website is disorganized, find each lab website individually.
Enter each lab link in column G of the spreadsheet.
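
To make the "fill out only one column" rule concrete, here is a minimal sketch of how one spreadsheet row could be built. The column letters come from the list above. The names `SourceType` and `build_row` and the choice to put several lab links into one cell separated by newlines are assumptions for illustration, not how the real spreadsheet has to work.

```python
from enum import Enum


class SourceType(Enum):
    """The four options above, mapped to their spreadsheet columns."""
    DEPARTMENT_PAGE = "A"     # department page with lab links
    FACULTY_DIRECTORY = "E"   # faculty directory with lab links
    PEOPLE_DIRECTORY = "F"    # PhDs and postdocs directory
    INDIVIDUAL_LABS = "G"     # lab links added one by one


def build_row(university, department, source, links):
    """Build one row with exactly one of columns A, E, F, G filled in."""
    if not links:
        raise ValueError("at least one link is required")
    if source is not SourceType.INDIVIDUAL_LABS and len(links) != 1:
        raise ValueError("only column G takes more than one link")
    row = {"university": university, "department": department}
    for option in SourceType:
        row[option.value] = ""
    # Assumption: several lab links share one cell, one link per line.
    row[source.value] = "\n".join(links)
    return row


if __name__ == "__main__":
    # The URL is a placeholder, not a real department page.
    print(build_row("Stanford", "Chemical Engineering",
                    SourceType.DEPARTMENT_PAGE,
                    ["https://dept.example.edu/labs"]))
```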

Example Workflow

MIT Chemical Engineering:
Main page: (not useful for scraping).
Specific research area: (we can use this to scrape; enter it in the matching column of the spreadsheet).
Faculty profile:
Lab website:
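
For a sense of why a page that links straight to the labs is so useful, here is a minimal sketch of pulling every link out of such a page with Python's standard library. This is not Kia's actual workflow. The sample HTML, the example.edu URLs, and the names `LinkCollector` and `extract_links` are all placeholders made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against the page URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url):
    """Return unique http(s) links from the page, in page order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    seen, result = set(), []
    for link in parser.links:
        # Drop mailto: and similar links, and skip repeats.
        if urlparse(link).scheme in ("http", "https") and link not in seen:
            seen.add(link)
            result.append(link)
    return result


# Placeholder page content standing in for a department's lab list.
SAMPLE_HTML = """
<ul>
  <li><a href="https://lab-one.example.edu/">Lab One</a></li>
  <li><a href="/research/lab-two">Lab Two</a></li>
  <li><a href="mailto:office@example.edu">Contact</a></li>
  <li><a href="https://lab-one.example.edu/">Lab One (again)</a></li>
</ul>
"""

if __name__ == "__main__":
    for link in extract_links(SAMPLE_HTML, "https://dept.example.edu/research/"):
        print(link)
```

A page like the main department page usually mixes lab links with navigation and news links, which is why a focused research-area or lab-list page makes a better scraping target.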

(There might be more ways to do this. Let me know if you find other strategies!)

