This tutorial walks you through preparing data and creating a big data file share. A big data file share is an item created in your portal that references feature data (tables, points, polylines, and polygons) in a location available to your GeoAnalytics Server. The big data file share item in your portal enables you to manage and browse your registered data so that you can run GeoAnalytics Tools on your datasets. Once you create a big data file share, you'll consume the data using the Aggregate Points tool. In this tutorial, you will download a dataset of taxi cab drop-off and pick-up locations and use GeoAnalytics Tools to determine where taxi drop-offs occur most frequently.
Be sure your ArcGIS Enterprise administrator has configured GeoAnalytics Server. To learn more, see Set up ArcGIS GeoAnalytics Server.
Prepare the data
To download and prepare the data used in this example, follow these steps:
- Create a folder named BigDataExample in a location available to your GeoAnalytics Server. Within the BigDataExample folder, create a folder named NYCTaxi.
- Go to https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page and download Yellow Taxi data from January and February 2014 to the BigDataExample > NYCTaxi folder.
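The folder layout from the steps above can also be created and checked with a short script. This is a minimal sketch, not part of the product: the function names are hypothetical, and `shared_root` stands for whatever shared location your GeoAnalytics Server machines can reach.

```python
from pathlib import Path


def create_tutorial_folders(shared_root):
    """Create BigDataExample/NYCTaxi under shared_root and return the dataset folder."""
    dataset_dir = Path(shared_root) / "BigDataExample" / "NYCTaxi"
    # parents=True also creates BigDataExample; exist_ok=True makes reruns harmless.
    dataset_dir.mkdir(parents=True, exist_ok=True)
    return dataset_dir


def list_downloaded_files(dataset_dir):
    """Return the sorted names of the files saved in the dataset folder."""
    return sorted(p.name for p in Path(dataset_dir).iterdir() if p.is_file())
```

After downloading the January and February 2014 files, calling `list_downloaded_files` on the NYCTaxi folder is a quick way to confirm that both files landed in the expected place.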
Create a big data file share
Once you save the data in a location accessible to all GeoAnalytics Server machines, register it with your GeoAnalytics Server as a big data file share through your portal. A big data file share creates a big data catalog service, which can be consumed in GeoAnalytics Server tools. To create the big data file share, follow these steps:
- Sign in to your ArcGIS Enterprise portal.
The URL is in the format https://webadaptorhost.domain.com/arcgis/home, where arcgis is the name of the web adaptor registered with your portal.
- Browse to Content > New item and select Data store.
- Type a name for the big data file share in the Title field.
- Select the Big Data File Share option. Click Next to move to Step 2: Configure connection.
- Choose the first option for File Share and click Next.
- In the Path field, type the file path to your BigDataExample folder.
For example, for a folder named BigDataExample on Microsoft Windows in a directory named sharedLocation, type \\sharedLocation\BigDataExample. For the same folder path on Linux, type /sharedLocation/BigDataExample.
- Click Next to move to Step 3: Configure servers.
- Wait for the GeoAnalytics Server to validate.
- When a green check mark appears in the Status column, click Add data store.
This creates two items: a big data file share item and a data store item. The big data file share item exposes your datasets so you can review and update properties such as the schema, geometry, and time. The big data file share item corresponds with an underlying big data catalog service available through a URL in the following format:
In the example URL above, FileShareName is the title you specified for the data store when you registered it with the GeoAnalytics Server.
Edit a big data file share
In this tutorial, the big data file share contains one dataset, NYCTaxi, named after the folder in your big data file share.
This dataset has multiple date and time fields. Inspect the dataset to make sure that you're using the correct fields. To edit and view the datasets in the big data file share, browse to your new Big Data File Share portal content item, go to the Datasets page, and click the edit button next to the dataset. When the big data file share is created, the geometry and time parameters are set to use the pick-up information. For this tutorial, you are interested in running analysis on the drop-off locations.
When the big data file share is created, a best guess is applied to find fields used to represent geometry and time.
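Before changing any properties, it can help to look at the field names in one of the downloaded files. The sketch below assumes the files are delimited text with a header row; the grouping is a simple name-based guess written for illustration and is not the logic the portal uses. Both function names are hypothetical.

```python
import csv


def header_fields(csv_path):
    """Return the field names from the first row of a delimited text file."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        first_row = next(csv.reader(f), [])
    # Header names sometimes carry stray spaces, so trim them.
    return [name.strip() for name in first_row]


def candidate_fields(fields):
    """Group field names that look like time or coordinate fields (name-based guess only)."""
    groups = {"time": [], "x": [], "y": []}
    for name in fields:
        lowered = name.lower()
        if "datetime" in lowered:
            groups["time"].append(name)
        elif "longitude" in lowered:
            groups["x"].append(name)
        elif "latitude" in lowered:
            groups["y"].append(name)
    return groups
```

If the result lists both pick-up and drop-off fields under each group, the dataset has more than one valid choice for geometry and time, which is why the defaults are worth reviewing.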
In this tutorial, you will modify the dataset properties to use the drop-off times and drop-off locations. This means that the analysis will aggregate the drop-off locations instead of the pick-up locations. Either set of geometry (pick-up or drop-off) can be used for analysis. The correct one to use depends on what you are trying to solve. These changes will be made on the Edit Dataset Properties dialog box for your big data file share dataset.
This can also be completed by downloading the manifest, editing, and uploading the edited manifest. To learn more about editing the manifest itself, see Understanding a big data file share manifest.
- In your big data file share item page, browse to the Datasets tab.
- Click the edit button next to your NYCTaxi dataset to edit its properties.
The Edit Dataset Properties dialog box appears.
- The Geometry tab shows that the fields currently used to represent x- and y-values are pickup_longitude and pickup_latitude. Change the values as follows:
- Change the X field value from pickup_longitude to dropoff_longitude.
- Change the Y field value from pickup_latitude to dropoff_latitude.
- The Time tab shows that the field currently used to represent time values is pickup_datetime with the format yyyy-MM-dd HH:mm:ss. Change the time Field setting from pickup_datetime to dropoff_datetime.
- Click the Save button to save the changes to your big data file share dataset.
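If you prefer the manifest route mentioned earlier, the same three changes can be applied to a downloaded manifest with a script. This is a hedged sketch: it assumes the manifest is JSON in which each dataset has `geometry` and `time` sections, each with a `fields` list of objects keyed by `name` and `formats`. `EXAMPLE_MANIFEST` is a trimmed, hypothetical stand-in, so compare it with your own downloaded manifest before relying on it.

```python
import copy

# Hypothetical, trimmed manifest; a real manifest contains more properties.
EXAMPLE_MANIFEST = {
    "datasets": [
        {
            "name": "NYCTaxi",
            "geometry": {
                "fields": [
                    {"name": "pickup_longitude", "formats": ["x"]},
                    {"name": "pickup_latitude", "formats": ["y"]},
                ]
            },
            "time": {
                "fields": [
                    {"name": "pickup_datetime", "formats": ["yyyy-MM-dd HH:mm:ss"]}
                ]
            },
        }
    ]
}


def use_dropoff_fields(manifest, dataset_name="NYCTaxi"):
    """Return a copy of the manifest with geometry and time pointed at drop-off fields."""
    replacements = {
        "pickup_longitude": "dropoff_longitude",
        "pickup_latitude": "dropoff_latitude",
        "pickup_datetime": "dropoff_datetime",
    }
    updated = copy.deepcopy(manifest)  # leave the caller's manifest untouched
    for dataset in updated.get("datasets", []):
        if dataset.get("name") != dataset_name:
            continue
        for section in ("geometry", "time"):
            for field in dataset.get(section, {}).get("fields", []):
                field["name"] = replacements.get(field["name"], field["name"])
    return updated
```

The function only renames the fields referenced by the geometry and time sections; formats such as `yyyy-MM-dd HH:mm:ss` are left as they were.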
Run analysis on your taxi data in the ArcGIS Enterprise portal
After you create the data and the big data file share item, browse to the big data file share item in your portal organization to access your datasets. You can use these datasets to run GeoAnalytics Server tools.
Data that's registered with your GeoAnalytics Server is not uploaded to your server; it's only registered with the GeoAnalytics Server and uses a manifest to define the schema.
- In the portal, click Map to go to Map Viewer Classic.
- Click the Analysis button.
If you have both feature and raster analysis available, click Feature Analysis, and click GeoAnalytics Tools > Summarize Data > Aggregate Points.
- Type New York in the Find address or place search bar and click Search.
Your map zooms to the extent of New York City.
- To add the New York City taxi cab dataset as the layer to aggregate into, select Choose Analysis Layer for the first tool parameter. On the dialog box that appears, choose Content and browse to your big data file share. Choose the New York City taxi cab layer and click Select.
- Aggregate into square bins with a size of 1 kilometer.
- Because the data is time-enabled, you can apply time stepping. The download covers two months of data, and in this tutorial you examine the first week of each month. To do so, set Time step interval to 1 week, set How often to repeat the time step to 1 month, and set the Time to align time steps to parameter to January 1, 2017, at 12:00 AM. Although the test data is for 2014, the Aggregate Points tool allows you to align analysis both forward and backward in time.
- Select statistics of interest; some examples are the Mean value of total_amount or the Variance value of trip_distance.
- Set the spatial reference to a local New York projection using the following steps:
- Click the settings button to access the analysis settings.
- Choose As specified from the Processing coordinate system drop-down list.
- Click the globe and browse to UTM Zone 18N by clicking Spatial References > PCS > UTM > WGS 1984 UTM Zone 18 N.
- Click OK and click Apply.
- Zoom to the New York City region, ensure that Use current map extent is checked on the Aggregate Points tool, and run the analysis.
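To see why a 2017 alignment date still produces steps over 2014 data, the time-stepping settings above can be sketched in plain Python. This is an illustration of one reading of the settings (a 1-week window repeated every calendar month, counted forward or backward from the alignment date), not the tool's implementation. The function names are hypothetical, time zones are ignored, and the alignment day is assumed to exist in every month.

```python
from datetime import datetime, timedelta


def add_months(moment, months):
    """Shift a datetime by whole months; assumes the day exists in the target month."""
    month_index = moment.year * 12 + (moment.month - 1) + months
    return moment.replace(year=month_index // 12, month=month_index % 12 + 1)


def time_steps(reference, data_start, data_end,
               interval=timedelta(weeks=1), repeat_months=1):
    """List (start, end) windows aligned to reference that overlap [data_start, data_end)."""
    months_diff = (data_start.year - reference.year) * 12 + (data_start.month - reference.month)
    # Start one repeat before the data so a window straddling data_start is not missed.
    k = months_diff // repeat_months - 1
    steps = []
    while True:
        start = add_months(reference, k * repeat_months)
        if start >= data_end:
            break
        end = start + interval
        if end > data_start:
            steps.append((start, end))
        k += 1
    return steps
```

Calling `time_steps(datetime(2017, 1, 1), datetime(2014, 1, 1), datetime(2014, 3, 1))` yields two windows, one starting on January 1, 2014, and one on February 1, 2014, each one week long.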
The analysis runs on the machines in your GeoAnalytics Server. When the analysis is complete, results are added to your map. Results will be square polygons representing the count of taxi drop-off locations in each polygon as well as the additional statistics you calculated. Your results will have approximately 3,500 to 4,000 features. Results will vary based on the extent of the map on your screen and your time zone.
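The idea behind the square-bin result can be sketched as follows. The example assumes point coordinates are already in a projected system measured in meters (such as the UTM zone chosen above) and that the grid is anchored at the coordinate origin; the tool's actual grid placement may differ, and the function names are hypothetical.

```python
import math
from collections import Counter


def bin_index(x, y, bin_size=1000.0):
    """Return the (column, row) of the square bin containing a projected point."""
    return (math.floor(x / bin_size), math.floor(y / bin_size))


def count_points_per_bin(points, bin_size=1000.0):
    """Count how many (x, y) points fall in each square bin."""
    return Counter(bin_index(x, y, bin_size) for x, y in points)
```

Each key in the returned counter corresponds to one 1-kilometer square, and its value plays the role of the count attribute on the matching polygon in the result layer.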