How to Scrape AngelList for Investors in a Particular City using Import.io

Run this checklist to scrape AngelList investors in a particular city using Import.io
1. Introduction
2. Browse to AngelList and pick the city you want to scrape
3. Download Import.io
4. Configure Import.io crawler for AngelList
5. Extract source code from AngelList for URL list
6. Parse source code with BuzzStream Tool
7. Format list in Excel
8. Start Scrape
9. Enjoy your freshly minted list of investors

Introduction

Fundraising is not easy. It can be a long and hard journey.

But whatever your approach to fundraising is, the first step is usually getting a list of target investors to approach. 

We all know AngelList is the best place to find investors, but who wants to trawl through thousands of profile pages to figure out which investors invest in your location, model, or industry?

Wouldn’t it be easier to just have them all in an Excel spreadsheet you could skim through in a few minutes? 

In this checklist, I’m going to show you exactly how to do just that. 

Check out step 2 to get started.

If you want to hand this off to a Virtual Assistant, make sure to create a Process Street account so you can collaborate with them on scraping the whole investor world! 

Browse to AngelList and pick the city you want to scrape

Our first step is to browse AngelList for the city you wish to search. Remember to store data from this step in the form fields below so you can easily reference it in the future.




Head over to AngelList

The first step is identifying the city you want to scrape; for this example, I chose Boston. 

Type "boston" in the AngelList search, then click on Investors.

It should bring you to this URL: https://angel.co/boston/investors

Here you can see there are currently 847 investors who reside in Boston.
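
Quick aside: the investors page just hangs off the city’s slug, so if you’re comfortable with a little code you can build the URL for any city the same way. Here’s a tiny sketch (it assumes AngelList keeps this /city/investors pattern for other cities, so double-check the slug via the search box):

    def investors_url(city_slug: str) -> str:
        """Build the AngelList investors page URL for a given city slug."""
        return f"https://angel.co/{city_slug}/investors"

    print(investors_url("boston"))    # -> https://angel.co/boston/investors
    print(investors_url("new-york"))  # assumed slug, confirm it via the search box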

Download Import.io

Next, you need to grab Import.io, which will serve as our scraping tool.

It’s currently free to use and available on Mac and PC.

Configure Import.io crawler for AngelList

This step can take a little while, but you must now configure Import.io to crawl through your AngelList results correctly. Start by recording the names of the top five investors in your search results and their page URLs, then move on to see how to configure Import.io step-by-step.






Open Import.io and start a new "Crawler" campaign

Browse to Boston Investors and select the first investor. 

Select the "single product page" option.

Add a new column

Name the column "name"

Select the investor’s name on the page (it might be white, but you can hover over it to see it), then select "Train"

Complete the rest of the fields you want. The fields I use in this example are listed below, with a rough sketch of a finished row after the list:

  1. name
  2. summary
  3. twitter (note you need to select the link option for twitter URLs as shown below)
  4. locations
  5. markets
  6. previous investments
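
For reference, here’s a sketch of what one finished row could look like once the crawl exports these columns. Only the column names come from the list above; the values (and the file name) are made up for illustration:

    import csv

    # Column names match the Import.io fields listed above; the row itself
    # is purely illustrative, the real values come from the crawl.
    columns = ["name", "summary", "twitter", "locations", "markets",
               "previous investments"]

    example_row = {
        "name": "Jane Example",                          # hypothetical investor
        "summary": "Angel investor focused on B2B SaaS",
        "twitter": "https://twitter.com/janeexample",
        "locations": "Boston",
        "markets": "SaaS, Enterprise Software",
        "previous investments": "Acme Analytics, FooPay",
    }

    with open("investors_sample.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=columns)
        writer.writeheader()
        writer.writerow(example_row)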

You now need to repeat this 4 more times on different investor profile pages. Click "Add another page"

Browse to the second investor in the list of Boston investors.

Import.io will automatically try to source the data for you, but usually elements on a page move around depending on what fields the investor decided to complete. Here you can see the Twitter URL didn’t work.

Fixing this is easy: just select the column and re-train it. 

Continue this process 3 more times until you have completed 5 pages.

Upload your campaign to Import.io

Once finished, run the crawler.

Extract source code from AngelList for URL list

Now it’s time to get the list of URLs you want to scrape. AngelList is a HUGE site with a ton of pages that are not logically connected via their URL structure.

So, the only way I found to actually scrape all the investors from a specific location is to pull all their profile URLs first, then import them into Import.io.

This is how you do it. 

First, open your normal browser (Firefox/Chrome) and browse to the Boston investors. 

Next, scroll to the bottom and click "MORE"

Keep clicking "MORE" until you get to the BOTTOM of the list. Make sure all investors are loaded; this is very important.

Once at the bottom of the list, right-click on the page and click "inspect element"

Select the opening HTML tag from the code, right-click, and select "Edit HTML".

Select all (cmd+a) and copy (cmd+c) all the code to your clipboard.

Save a copy of this code in the form field below, to save yourself a headache if you want to see this again later.
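
If you’d rather not click "MORE" by hand, here’s a minimal sketch of the same idea using Selenium in Python. It’s an optional alternative, not part of the checklist proper, and the button label and element type are assumptions about how the page is rendered, so expect to tweak them:

    import time
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.common.exceptions import NoSuchElementException

    driver = webdriver.Chrome()
    driver.get("https://angel.co/boston/investors")

    # Keep clicking the "More" control until it no longer appears,
    # i.e. every investor has been loaded onto the page.
    while True:
        try:
            more_button = driver.find_element(By.LINK_TEXT, "More")  # assumed label/element
        except NoSuchElementException:
            break
        more_button.click()
        time.sleep(2)  # give the next batch of profiles time to load

    # Save the fully expanded page source instead of copying it by hand.
    with open("boston_investors.html", "w", encoding="utf-8") as f:
        f.write(driver.page_source)

    driver.quit()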


Parse source code with BuzzStream Tool

Now that you have the source code from the investor page, you need to create a .csv from it using BuzzStream’s tool. Once you have created this file, save a copy of it with the form field below.

Head over to BuzzStream’s URL extractor tool

http://tools.buzzstream.com/link-building-extract-urls

Paste in the code you just copied. 

Click "Create CSV"
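
If the web tool is down or you’d rather keep things local, the same extraction can be done with a short Python script. This is just a sketch; it assumes you saved the copied source code to a file called boston_investors.html, and the angel.co filter is an assumption about which links you want to keep:

    import csv
    import re

    # Read the page source you copied out of the browser's inspector.
    with open("boston_investors.html", encoding="utf-8") as f:
        html = f.read()

    # Pull every href out of the markup and keep only links pointing at
    # angel.co, since those are the profile pages we care about.
    hrefs = re.findall(r'href="([^"]+)"', html)
    angel_links = [url for url in hrefs if "angel.co" in url]

    # Write one URL per row, much like the CSV the BuzzStream tool produces.
    with open("extracted_urls.csv", "w", newline="") as f:
        writer = csv.writer(f)
        for url in angel_links:
            writer.writerow([url])

    print(f"Extracted {len(angel_links)} URLs")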

Format list in Excel

Now you need to format your new CSV in Excel. Once you have done this (see below for details), upload a copy of the file to the form field below.

Open the CSV you made in Excel. 

Delete the top few rows until you get to the start of the investors.

Delete the bottom rows after the investors finish.

De-duplicate the URL list as shown below.

Finally, delete columns B and C, so you are left with a list like the one below. The list should be between 50% and 100% bigger than your investor count in AngelList.

Here you can see my investor results were 847 and my spreadsheet is 1370 rows after formatting. This is a good list to be left with.
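
If you’d rather do this cleanup in code than in Excel, here’s a minimal sketch using pandas. It assumes the CSV from the previous step is saved as extracted_urls.csv with the URLs in column A; the angel.co filter stands in for the "delete the top and bottom rows" steps, so adjust it to your data:

    import pandas as pd

    # Load the extracted CSV; it has no header row.
    df = pd.read_csv("extracted_urls.csv", header=None)

    urls = (
        df[0]                                                      # column A: the URLs
        .dropna()
        .loc[lambda s: s.str.contains("angel.co", regex=False)]   # drop junk rows
        .drop_duplicates()                                         # de-duplicate the list
        .reset_index(drop=True)
    )

    urls.to_csv("investor_urls_clean.csv", index=False, header=False)
    print(f"{len(urls)} unique URLs left after cleanup")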

Copy this list to your clipboard and head back over to Import.io.

Start Scrape

Now it’s finally time to start scraping!

Head back over to Import.io, paste in the URLs from the spreadsheet and click "Go".

Enjoy your freshly minted list of investors

Once your scrape is complete, download a copy of the spreadsheet and go get ’em! Before you do, however, take a moment to upload a copy of the final spreadsheet to the form field below for safe keeping.

For reference, whilst your scrape is running you should see a screen like the one below. 

Check out my finished spreadsheet below:
