From nycsubway.org
Welcome to our guide to web scraping with R, a collection of articles and tutorials that walk you through automating the retrieval of data from the web and unpacking it into a data frame. The first step is to look at the source you want to scrape: open the developer tools in your web browser and inspect the page.

FMiner is a tool for web scraping, web data extraction, screen scraping, web harvesting, web crawling, and web macro support on Windows and Mac OS X. It pairs these features with a visual project design tool.

Web Scraping with R

There are several R packages that can download web pages and then extract data from them. In general, you’ll want to download files first and process them later. It’s easy to make a mistake in processing, so you’ll want to work from local copies of the files.
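The download-first, process-later advice can be sketched as follows. This is a minimal illustration, not a prescribed workflow: `fetch_local()` is a hypothetical helper, the URL is a placeholder, and a small made-up HTML file stands in for a previously downloaded page so the sketch runs without network access.

```r
library(rvest)

# Hypothetical helper: download a page only if no local copy exists yet,
# then return the path of the local copy.
fetch_local <- function(url, path) {
  if (!file.exists(path)) {
    download.file(url, destfile = path, quiet = TRUE)
  }
  path
}

# Stand-in for a page that was downloaded earlier (made-up content).
local_copy <- tempfile(fileext = ".html")
writeLines("<html><body><h1>Roster</h1></body></html>", local_copy)

# The file already exists, so no request is made; the placeholder URL is unused.
page <- read_html(fetch_local("https://example.com/roster.html", local_copy))

# All processing works on the local copy and can be rerun freely.
heading <- html_text(html_node(page, "h1"))
```

If the processing code has a bug, only the last two lines need to be rerun; the remote site is not contacted again.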
Roster
Designation | Unit Numbers | Manufacturer | Years |
---|---|---|---|
R-27 | 8020-8249 | St. Louis Car | 1960 |
R-30 | 8250-8351 | St. Louis Car | 1961 |
R-30 | 8412-8569 | St. Louis Car | 1961 |
R-30A | 8352-8411 | St. Louis Car | 1961 |
- Cab Arrangement: Half-width driving cab at 'A' end, half-width conductor control cab at 'B' end
- Coupling/Numbering Arrangement: All married pairs.
- The last of the R-30 cars were retired from passenger service in 1993. The Electric Railroaders' Association sponsored a Farewell to the R-30 fan trip.
Car Notes
- Green: Preserved, saved for preservation, or exists in some state
- Yellow: Converted to work service (and later scrapped or still in use)
- Red: Wrecked/damaged in accident (and possibly repaired), or scrapped prior to the bulk of the type
Number | Notes |
---|---|
8027, 8396 | Equipped early with individual guard lights for each door leaf. |
8120 | Paired for a time with R-16 6318; mate 8121 had been involved in a fire. |
8145, 8246 | Was at Pitkin Yard, used as a school car. Scrapped October 2013. 8246 was also retained as a work car along with 8145, but survived only a short time. |
8176-8177 | Heavily damaged in accident in Jamaica Yard in 1963. Repaired and returned to service by 11/67 (opening of Chrystie St.). It is believed one car had a storm door from an R-16. |
8202-8203, 8236-8237, 8512 | Destroyed/damaged by fire at Metropolitan Ave., 1976. At least 8236 was repaired and returned to service. |
8217 | In accident at Coney Island Yard with BMT Standard 2761. Car repaired and returned to service by using the end of R-16 6494. |
8265-8336 | Was at Concourse Yard, used as a school car. Scrapped by 2013. |
8275, 8394, 8397, 8401, 8408 | Several R-30 cars went west to Hollywood where they were used in the films Die Hard with a Vengeance, Blade, and Money Train. 8408 appeared in all three films. Car 8394 returned to New York City in 2014 and was installed in an ASICS athletic gear shop in Times Square. |
8289-8290 | Had been installed at Coney Island Yard, used as a police training facility. In July 2007, these cars were moved to the SBK yard for asbestos abatement and scrapping. Left property by barge January 2008 for 'reefing'. |
8293, 8392, 8521 | Delivered with rough exterior paint surface. |
8337 | Was installed at the East New York High School of Transit Technology. To be replaced in 2009 with a newer car (R-42 4737). |
8392-8401 | Were at Coney Island Yard, used as fire training school cars. They were replaced in July 2004 with R-110B 3004 and 3006. 8392 and 8401 were moved to the SBK yard for asbestos abatement and scrapping, July 2007. |
8424-8425 | Was at Coney Island Yard, used as school car. Scrapped October 2013. |
8429, 8558 | Converted to Rail Adhesion Cars. |
8463 | Was at 36th St. Yard (yard office?). Scrapped October 2013. |
8506 | New York Transit Museum collection. |
8507, 8545 | Collision on Astoria Line at 31st St., 5/22/1975. 8507's mate (8506) went to Transit Museum. 8507 scrapped, 8545 repaired and returned to service. |
8522, 8481 | Used as office space in 207th St. Yard. |
rvest is a new package that makes it easy to scrape (or harvest) data from HTML web pages, inspired by libraries like Beautiful Soup. It is designed to work with magrittr so that you can express complex operations as elegant pipelines composed of simple, easily understood pieces. Install it with:
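A minimal sketch of the installation step, assuming the package comes from CRAN via the standard `install.packages()` call. The `requireNamespace()` guard is an addition here, so that the snippet does not reinstall a package that is already present.

```r
# Install rvest from CRAN only if it is not already available.
if (!requireNamespace("rvest", quietly = TRUE)) {
  install.packages("rvest")
}

# Attach the package; this also makes the %>% pipe available.
library(rvest)
```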
rvest in action
To see rvest in action, imagine we’d like to scrape some information about The Lego Movie from IMDB. We start by downloading and parsing the file with `html()`:
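The parsing step might look like the sketch below, under two assumptions. First, it calls `read_html()`, which to my knowledge replaced `html()` in later rvest releases. Second, it parses a small made-up HTML string rather than the real IMDB page, so that it runs without network access; in a real run we would pass the page URL instead.

```r
library(rvest)

# Made-up stand-in markup, not the real IMDB page structure.
lego_movie <- read_html('
  <html><body>
    <h1>The Lego Movie</h1>
    <strong><span>7.8</span></strong>
  </body></html>')

# The result is a parsed document that the html_*() functions can query.
page_title <- html_text(html_node(lego_movie, "h1"))
```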
To extract the rating, we start with selectorgadget to figure out which CSS selector matches the data we want: `strong span`. (If you haven’t heard of selectorgadget, make sure to read `vignette('selectorgadget')` - it’s the easiest way to determine which selector extracts the data that you’re interested in.) We use `html_node()` to find the first node that matches that selector, extract its contents with `html_text()`, and convert it to numeric with `as.numeric()`:
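As a sketch, the rating pipeline could be written like this. The markup is a made-up stand-in (the rating value is illustrative only), and `read_html()` is assumed as the parsing function; the `strong span` selector and the three-step pipeline follow the description above.

```r
library(rvest)

# Made-up stand-in markup containing a rating inside <strong><span>.
lego_movie <- read_html('
  <html><body>
    <strong><span>7.8</span></strong>
  </body></html>')

rating <- lego_movie %>%
  html_node("strong span") %>%  # first node matching the selector
  html_text() %>%               # its text content, as a string
  as.numeric()                  # converted to a number
```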
We use a similar process to extract the cast, using `html_nodes()` to find all nodes that match the selector:
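A sketch of the multi-node case follows. The selector `.cast span`, the markup, and the placeholder names are all hypothetical, since the selector used for the real page is not given here; the point is only that `html_nodes()` returns every match rather than the first.

```r
library(rvest)

# Made-up stand-in markup with placeholder cast names.
page <- read_html('
  <html><body>
    <ul class="cast">
      <li><span>Actor One</span></li>
      <li><span>Actor Two</span></li>
      <li><span>Actor Three</span></li>
    </ul>
  </body></html>')

cast <- page %>%
  html_nodes(".cast span") %>%  # all matching nodes, not just the first
  html_text()                   # one string per node
```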
The titles and authors of recent message board postings are stored in the third table on the page. We can use `html_nodes()` and `[[` to find it, then coerce it to a data frame with `html_table()`:
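The table step can be sketched as below, again on made-up markup: three tables, of which the third holds placeholder post titles and authors. The column names and rows are illustrative assumptions, not data from the real page.

```r
library(rvest)

# Made-up stand-in page with three tables; only the third is of interest.
page <- read_html('
  <html><body>
    <table><tr><td>first</td></tr></table>
    <table><tr><td>second</td></tr></table>
    <table>
      <tr><th>Title</th><th>Author</th></tr>
      <tr><td>Example post A</td><td>user_a</td></tr>
      <tr><td>Example post B</td><td>user_b</td></tr>
    </table>
  </body></html>')

tables <- html_nodes(page, "table")  # every table on the page
posts <- html_table(tables[[3]])     # third table, coerced to a data frame
```

Selecting with `[[` on the full node set is what makes the position explicit; `html_node()` on its own would only ever return the first table.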
Other important functions
- If you prefer, you can use XPath selectors instead of CSS: `html_nodes(doc, xpath = '//table//td')`.
- Extract the tag names with `html_tag()`, text with `html_text()`, a single attribute with `html_attr()` or all attributes with `html_attrs()`.
- Detect and repair text encoding problems with `guess_encoding()` and `repair_encoding()`.
- Navigate around a website as if you’re in a browser with `html_session()`, `jump_to()`, `follow_link()`, `back()`, and `forward()`. Extract, modify and submit forms with `html_form()`, `set_values()` and `submit_form()`. (This is still a work in progress, so I’d love your feedback.)
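The first two items in the list above can be sketched on a small made-up document. One assumption to flag: the sketch uses `html_name()` for tag names, which I believe is the name later rvest releases use in place of `html_tag()`. The markup and link targets are placeholders.

```r
library(rvest)

# Made-up stand-in markup: a table whose cells contain links.
doc <- read_html('
  <table>
    <tr><td><a href="/cars/8506" title="Museum car">8506</a></td></tr>
    <tr><td><a href="/cars/8394">8394</a></td></tr>
  </table>')

# XPath selection instead of a CSS selector.
cells <- html_nodes(doc, xpath = "//table//td")
cell_text <- html_text(cells)

links <- html_nodes(doc, "a")
tag_names <- html_name(links)          # tag name of each node
hrefs <- html_attr(links, "href")      # one attribute per node
first_attrs <- html_attrs(links[[1]])  # all attributes of the first link
```

The session and form functions need a live site to talk to, so they are left out of this offline sketch.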
To see these functions in action, check out package demos with `demo(package = 'rvest')`.