A good starting place would be your local public or academic library. Several publishers offer text-searchable collections of historical newspapers, which means the pages have been scanned and OCRed.
One such collection is Newspaper Archive, possibly available at your library. It is not limited to the 1800s, but it covers the geographic and chronological parameters you listed.
As an example of an excellent 1800s-specific source, the Gale Group, a Cengage Learning company, publishes a primary source collection titled Nineteenth Century U.S. Newspapers. Access is by subscription only; however, your local public or academic library may have a subscription that they make available to library cardholders.
Gale's catalog of primary sources includes quite a number of 19th-century English-language titles. I chose this one because I'm familiar with it, having used it before, and my public library provides access.
The description of Nineteenth Century U.S. Newspapers reads:
As a new American nation emerged in the 1800s, the first draft of history was written by those who experienced it and recorded it in newspaper pages from coast to coast. Nineteenth Century U.S. Newspapers provides an as-it-happened window on events, culture, and daily life in nineteenth-century America that is of interest to both professional and general researchers. With 1.8 million pages available, the collection features publications of all kinds, from the political party newspapers at the beginning of the nineteenth century to the mammoth dailies that shaped the nation at the century's end. Major newspapers stand alongside those published by African Americans, Native Americans, women's rights groups, labor groups, and the Confederacy. Titles were selected by leading scholars of the nineteenth-century American press, and headnotes have been included for the individual titles.
The collection has been OCRed and the text of the collection is searchable. Researchers can also browse by title and/or publication date or use the advanced search or other research tools. Readers are able to view a document image with search text highlighted, the OCRed plain text, or both side-by-side.
For example, a search for "base ball" finds the 1859 New York Herald article "Cricket versus Base Ball", which can be displayed alongside its OCRed plain text.
I recommend that you start your search at the library. Many libraries have a Virtual Reference Service, AKA chat with a librarian, or you could email them with what you're trying to do.
Note: there may be copyright or other legal restrictions involved in attempting to form a corpus by mass downloading of text. Sampling may be a viable alternative.
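If you go the sampling route, one way to keep it unbiased and repeatable is to draw a seeded random set of issue dates and then look up only those issues through the library's interface. The sketch below is just an illustration in Python. The function name, the date range, and the sample size are my own assumptions, not anything provided by Gale or Newspaper Archive, and it does not download anything.

```python
import random
from datetime import date, timedelta


def sample_issue_dates(start, end, k, seed=0):
    """Return k distinct dates drawn uniformly from [start, end], sorted.

    Hypothetical helper: the dates are only a reading list of issues
    to look up by hand; nothing here talks to any newspaper database.
    """
    span = (end - start).days + 1  # number of days in the inclusive range
    if k > span:
        raise ValueError("k exceeds the number of days in the range")
    rng = random.Random(seed)  # fixed seed makes the sample reproducible
    offsets = rng.sample(range(span), k)  # sampling without replacement
    return sorted(start + timedelta(days=o) for o in offsets)


# Example (assumed values): 20 issue dates from the 1850s.
dates = sample_issue_dates(date(1850, 1, 1), date(1859, 12, 31), k=20, seed=42)
for d in dates:
    print(d.isoformat())
```

Using the same seed gives the same list of dates, so the sample can be documented and reproduced later. A librarian can also tell you whether looking up issues this way fits within the terms of the library's subscription.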