Configuration

Entwine provides the following sub-commands for indexing point cloud data:

Command	Description
build	Generate an EPT dataset from point cloud data
info	Gather information about point clouds before building
merge	Merge datasets built as subsets

These commands are invoked via the command line as:

entwine <command> <arguments>

Most options to entwine commands are configurable via the command line, and each command also accepts configuration via JSON. A configuration file may be specified with the -c command line argument.

Command line argument settings are applied in order, so earlier settings can be overwritten by later-specified settings. This includes configuration file arguments, allowing them to be used as templates for common settings that may be overwritten with command line options.
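The override order can be modelled as a left-to-right dictionary merge. The following is a minimal Python sketch of that idea, not Entwine's actual implementation: the function name resolve_settings and the sample values are hypothetical, and the merge is shallow, so how Entwine treats nested objects is not covered here.

```python
import json

def resolve_settings(sources):
    """Merge settings left to right; later sources overwrite earlier keys."""
    merged = {}
    for source in sources:
        merged.update(source)
    return merged

# Hypothetical template contents, followed by command line overrides.
template = json.loads('{"threads": 4, "output": "~/entwine/template"}')
cli_overrides = {"output": "~/entwine/chicago"}

settings = resolve_settings([template, cli_overrides])
print(json.dumps(settings, sort_keys=True))
```

Reversing the order of the two sources would leave the template's output value in place, which is why a template is normally given first.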

Internally, Entwine CLI invocation builds a JSON configuration that is passed along to the corresponding sub-command, so CLI arguments and their equivalent JSON configuration formats will be described for each command. For example, with configuration file config.json:

{
    "input": "~/data/chicago.laz",
    "output": "~/entwine/chicago"
}

The following Entwine invocations are equivalent:

entwine build -i ~/data/chicago.laz -o ~/entwine/chicago
entwine build -c config.json

Throughout Entwine, a wide variety of point cloud formats are supported as input data, as any PDAL-readable format may be indexed. Paths need not be local filesystem paths: they may be local, S3, GCS, Dropbox, or any other Arbiter-readable path.

Each command accepts some common options, detailed under Common.

Build

The build command generates an Entwine Point Tile (EPT) dataset from point cloud data.

entwine build (<options>)

Options

Key	Description
input	Input file(s) or directories to include in the build
output	Output directory for the resulting EPT dataset
config	Optional configuration file for templating common options
tmp	Directory for temporary files
srs	Set the SRS metadata entry of the output
reprojection	Reproject input data to a different SRS
hammer	Force use of user-supplied input SRS, overriding file headers
threads	Number of parallel threads
force	Overwrite an existing build instead of continuing it
dataType	Data encoding type for serialized output (`laszip`, `zstandard`, `binary`)
span	Number of voxels in each spatial dimension for data nodes
noOriginId	Disable OriginId tracking for point source files
bounds	Explicit spatial bounds for filtering points
deep	Force full file reads during analysis instead of header-only reads
absolute	Use absolute double-precision XYZ values instead of scaled integers
scale	Set coordinate scale factor
limit	Limit number of files to insert in this build session
subset	Specify a portion of a parallel/subset build
maxNodeSize	Maximum number of points in a node before overflow
minNodeSize	Minimum number of overflowed points before new node creation
cacheSize	Number of nodes cached in memory before serialization
hierarchyStep	Step size for hierarchy file splitting (testing only)
sleepCount	Count per thread after which idle nodes are serialized
progress	Interval (seconds) for progress logging (0 disables)
laz_14	Write LAZ 1.4 content encoding
profile	AWS CLI profile name for S3 access
sse	Enable AWS server-side encryption
requester-pays	Enable AWS S3 requester-pays flag
allow-instance-profile	Allow EC2 instance profile credentials for S3 access

input

The point cloud data paths to be indexed. This may be a string, as in:

{ "input": "~/data/autzen.laz" }

This string may be:

a file path: ~/data/autzen.laz or s3://entwine.io/sample-data/red-rocks.laz
a directory (non-recursive): ~/data or ~/data/*
a recursive directory: ~/data/**
an info directory path: ~/entwine/info/autzen-files/
an info output file: ~/entwine/info/autzen-files/1.json

This field may also be a JSON array of multiples of each of the above strings:

{ "input": ["autzen.laz", "~/data/"] }

Paths that do not contain PDAL-readable file extensions will be silently ignored.
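As a rough illustration of the accepted shapes, the sketch below normalizes the input field to a list and classifies each entry by its glob suffix. The helper names normalize_input and glob_kind are hypothetical, and the classification is purely syntactic: it does not check the filesystem, so a bare directory such as ~/data is reported as "plain".

```python
def normalize_input(value):
    """Return the input field as a list of path strings."""
    if isinstance(value, str):
        return [value]
    return list(value)

def glob_kind(path):
    """Classify a path by its glob suffix only; the filesystem is not consulted."""
    if path.endswith("/**"):
        return "recursive"
    if path.endswith("/*"):
        return "non-recursive"
    return "plain"

paths = normalize_input(["autzen.laz", "~/data/*", "~/data/**"])
for path in paths:
    print(path, "->", glob_kind(path))
```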

output

A directory for Entwine to write its EPT output. May be local or remote.

config

Path to a JSON configuration file for templating common parameters.
Command-line arguments override configuration file values.

--config template.json -i in.laz -o out

tmp

A local directory for Entwine’s temporary data.

--tmp /tmp/entwine

srs

Specification for the output coordinate system. Setting this value does not invoke a reprojection; it simply sets the srs field in the resulting EPT metadata.

If input files have coordinate systems specified (and they all match), then this will typically be inferred from the files themselves.

reprojection

Coordinate system reprojection specification. Specified as a JSON object with up to 3 keys.

If only the output projection is specified, then the input coordinate system will be inferred from the file headers. If no coordinate system information can be found for a given file, then this file will not be inserted.

--reprojection EPSG:3857
--reprojection EPSG:26915 EPSG:3857

JSON form:

{ "reprojection": { "in": "EPSG:26915", "out": "EPSG:3857" } }

An input SRS may also be specified, which will be overridden by SRS information determined from file headers.

{
    "reprojection": {
        "in": "EPSG:26915",
        "out": "EPSG:3857"
    }
}

To force an input SRS that overrides any file header information, the hammer key should be set to true.

{
    "reprojection": {
        "in": "EPSG:26915",
        "out": "EPSG:3857",
        "hammer": true
    }
}

When using this option, the output value will be set as the coordinate system in the resulting EPT metadata, so the srs option does not need to be specified.
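The precedence between file headers, the in key, and hammer can be summarized in a few lines. This is a sketch of the rules as described in this section, not Entwine's code; the function name resolve_input_srs is hypothetical, and a return value of None stands for a file with no usable coordinate system.

```python
def resolve_input_srs(header_srs, reprojection):
    """Pick the input SRS for one file; None means no SRS is known."""
    configured = reprojection.get("in")
    if reprojection.get("hammer") and configured:
        return configured   # hammer: the configured SRS overrides the header
    if header_srs:
        return header_srs   # header information overrides the configured "in"
    return configured       # fallback when the header has no SRS (may be None)

reprojection = {"in": "EPSG:26915", "out": "EPSG:3857"}
print(resolve_input_srs("EPSG:26910", reprojection))
print(resolve_input_srs("EPSG:26910", dict(reprojection, hammer=True)))
print(resolve_input_srs(None, {"out": "EPSG:3857"}))
```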

threads

Number of threads for parallelization. By default, a third of these threads will be allocated to point insertion and the rest will perform serialization work.

--threads 12

{ "threads": 9 }

This field may also be an array of two numbers explicitly setting the number of worker threads and serialization threads, with the worker threads specified first.

{ "threads": [2, 7] }
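To make the two forms concrete, here is a small sketch of how a threads value might be split into worker and serialization counts. The function name split_threads is hypothetical, and the rounding (round down, with at least one thread of each kind) is an assumption; the text above only states that a third go to point insertion.

```python
def split_threads(threads):
    """Return (worker_threads, serialization_threads)."""
    if isinstance(threads, (list, tuple)):
        workers, serializers = threads   # explicit form: workers first
        return int(workers), int(serializers)
    total = int(threads)
    workers = max(1, total // 3)         # assumption: round down, minimum of 1
    serializers = max(1, total - workers)
    return workers, serializers

print(split_threads(9))
print(split_threads([2, 7]))
```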

force

By default, if an Entwine index already exists at the output path, any new files from the input will be added to the existing index. To force a new index instead, this field may be set to true.

--force

{ "force": true }

dataType

Specification for the output storage type for point cloud data. Currently acceptable values are laszip, zstandard, and binary. For a binary selection, data is laid out according to the schema. Zstandard data consists of binary data according to the schema that is then compressed with Zstandard compression.

--dataType laszip

{ "dataType": "laszip" }

span

Number of voxels in each spatial dimension which defines the grid size of the octree. For example, a span value of 256 results in a 256 * 256 * 256 cubic resolution.

--span 128
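Because the grid is cubic, the voxel count grows with the cube of the span. The arithmetic is shown below; grid_cells is a hypothetical helper name used only for this illustration.

```python
def grid_cells(span):
    """Number of voxels in a span * span * span grid."""
    return span ** 3

for span in (128, 256):
    print(span, grid_cells(span))
```

Halving the span from 256 to 128 therefore reduces the number of voxels per node by a factor of 8.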

noOriginId

Disable insertion of the OriginId dimension, which tracks the original source file for each point.

--noOriginId

bounds

Total bounds for all points to be indexed. These bounds are final, in that they may not be expanded later after indexing has begun. Typically this field does not need to be supplied as it will be inferred from the data itself. This field is specified as an array of the format [xmin, ymin, zmin, xmax, ymax, zmax].

--bounds 0 0 0 100 100 100
--bounds "[0,0,0,100,100,100]"

{ "bounds": [0, 500, 30, 800, 1300, 50] }
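The array layout is easy to get wrong, so a quick check before starting a build can help. The sketch below validates the six-value format and tests whether a point falls inside; parse_bounds and contains are hypothetical helpers, and treating the bounds as inclusive on every side is an assumption of this example.

```python
def parse_bounds(values):
    """Validate [xmin, ymin, zmin, xmax, ymax, zmax] and split into corners."""
    if len(values) != 6:
        raise ValueError("bounds must contain exactly 6 numbers")
    lo = tuple(float(v) for v in values[:3])
    hi = tuple(float(v) for v in values[3:])
    if any(a > b for a, b in zip(lo, hi)):
        raise ValueError("each minimum must not exceed its maximum")
    return lo, hi

def contains(bounds, point):
    """True if the point lies within the bounds (inclusive on all sides)."""
    lo, hi = bounds
    return all(a <= p <= b for a, p, b in zip(lo, point, hi))

bounds = parse_bounds([0, 500, 30, 800, 1300, 50])
print(contains(bounds, (400, 900, 40)))
print(contains(bounds, (400, 900, 60)))
```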

deep

By default, file headers for point cloud formats that contain information like number of points and bounds are considered trustworthy. If file headers are known to be incorrect, this value can be set to true to require a deep scan of all the points in each file.

absolute

Scaled values at a fixed precision are preferred by Entwine (and required for the laszip dataType). To use absolute double-precision values for XYZ instead, this value may be set to true.

scale

A scale factor for the spatial coordinates of the output. An offset will be determined automatically. May be a number like 0.01, or a 3-length array of numbers for non-uniform scaling.

--scale 0.1
--scale "[0.1, 0.1, 0.025]"

{ "scale": 0.01 }

{ "scale": [0.01, 0.01, 0.025] }
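For readers unfamiliar with scaled storage, the sketch below shows the usual LAS-style convention of storing each coordinate as an integer number of scale-sized steps from an offset. It is an illustration only: the helper names are hypothetical, the offsets are made up for the example, and how Entwine chooses its offset is not described here.

```python
def to_scaled(value, scale, offset):
    """Encode a coordinate as an integer count of scale-sized steps from offset."""
    return round((value - offset) / scale)

def from_scaled(stored, scale, offset):
    """Decode a scaled integer back to a floating-point coordinate."""
    return stored * scale + offset

scale = [0.01, 0.01, 0.025]        # non-uniform scale, as in the array form
offset = [100.0, 200.0, 0.0]       # hypothetical offsets chosen for the example
point = [123.456, 234.567, 12.3]

stored = [to_scaled(v, s, o) for v, s, o in zip(point, scale, offset)]
decoded = [from_scaled(i, s, o) for i, s, o in zip(stored, scale, offset)]
print(stored)
print(decoded)
```

Each decoded value differs from the original by at most half the scale for that axis, which is the precision traded away by a coarser scale.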

limit

If a build should not run to completion of all input files, a limit may be specified to insert at most a fixed number of files in this session. The build may be continued by providing the same output value to a later build.

--limit 20

{ "limit": 25 }
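As a back-of-the-envelope aid, the sketch below estimates how many build sessions a given limit implies for a known file count. The function plan_sessions is hypothetical and assumes every session inserts exactly up to the limit until the input is exhausted.

```python
def plan_sessions(total_files, limit):
    """Return the number of files inserted by each successive build session."""
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    sessions = []
    remaining = total_files
    while remaining > 0:
        batch = min(limit, remaining)
        sessions.append(batch)
        remaining -= batch
    return sessions

print(plan_sessions(65, 25))
```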

subset

Entwine builds may be split into multiple subset tasks, and then be merged later with the merge command. Subset builds must contain exactly the same configuration aside from this subset field.

Subsets are specified with a 1-based id for the task ID and an of key for the total number of tasks. The total number of tasks must be a power of 4.

--subset 1 4

{ "subset": { "id": 1, "of": 16 } }
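Since every subset must share the same configuration apart from the subset field, generating the configurations from one base object avoids drift between tasks. The following is a minimal sketch under stated assumptions: the helper names and paths are hypothetical, and rejecting a total of 1 is a choice made for this example rather than documented behavior.

```python
def is_power_of_four(n):
    """True for 4, 16, 64, ...; rejecting 1 is an assumption of this sketch."""
    if n < 4:
        return False
    while n % 4 == 0:
        n //= 4
    return n == 1

def subset_configs(base, of):
    """Return one configuration per task, identical except for the subset field."""
    if not is_power_of_four(of):
        raise ValueError("total number of tasks must be a power of 4")
    return [dict(base, subset={"id": task, "of": of}) for task in range(1, of + 1)]

base = {"input": "~/data/**", "output": "~/entwine/out"}  # hypothetical paths
configs = subset_configs(base, 4)
print([c["subset"]["id"] for c in configs])
```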

maxNodeSize

A soft limit on the maximum number of points that may be stored in a data node. This limit is only applicable to points that are “overflow” for a node - so points that fit natively in the span * span * span grid can grow beyond this size.

minNodeSize

A limit on the minimum number of points that may reside in a dedicated node. Would-be nodes containing fewer points than this are grouped in with their parent node.

cacheSize

When data nodes have not been touched recently during point insertion, they are eligible for serialization. This parameter specifies the number of unused nodes that may be held in memory before serialization, so that if they are used again soon enough they won’t need to be serialized and then reawakened from remote storage.
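The trade-off can be pictured as a bounded pool of idle nodes. The sketch below is a conceptual model only, not Entwine's cache: the class name IdleNodeCache is hypothetical, and flushing the least recently released node first is an assumed policy.

```python
from collections import OrderedDict

class IdleNodeCache:
    def __init__(self, cache_size):
        self.cache_size = cache_size
        self.idle = OrderedDict()   # node key -> node data, oldest first
        self.serialized = []        # keys flushed to storage, in flush order

    def release(self, key, node):
        """Mark a node as idle; flush the oldest idle nodes beyond the limit."""
        self.idle[key] = node
        self.idle.move_to_end(key)
        while len(self.idle) > self.cache_size:
            old_key, _ = self.idle.popitem(last=False)
            self.serialized.append(old_key)

    def reuse(self, key):
        """Return an idle node without a storage round trip, or None if flushed."""
        return self.idle.pop(key, None)

cache = IdleNodeCache(cache_size=2)
for key in ("a", "b", "c"):
    cache.release(key, {"points": []})
print(cache.serialized)
print(cache.reuse("b") is not None, cache.reuse("a") is not None)
```

A larger cacheSize keeps more idle nodes available for cheap reuse at the cost of memory.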

hierarchyStep

For large datasets with lots of data files, the hierarchy describing the octree layout is split up to avoid large downloads. This value describes the depth modulo at which hierarchy files are split up into child files. In general, this should be set only for testing purposes as Entwine will heuristically determine a value if the output hierarchy is large enough to warrant splitting.

sleepCount

Serialization frequency for idle nodes (per-thread count before flushing).

progress

Progress logging interval in seconds.
Set to 0 to disable (default: 10).

laz_14

By default, laszip-encoded output will be written as LAS 1.2. Set laz_14 to true to write LAS 1.4 data instead.

profile

Specify an AWS CLI profile to use for S3 access.

--profile john

sse

Enable AWS Server-Side Encryption (SSE) for S3 writes.

requester-pays

Enable S3 requester-pays mode.

allow-instance-profile

Allow EC2 instance profile credentials for S3 access.

Info

The info command is used to aggregate information about unindexed point cloud data prior to building an Entwine Point Tile dataset.

Most options here are common to build and perform exactly the same function in the info command, aside from output, described below.

Key	Description
input	Path(s) to build
output	Output directory
tmp	Temporary directory
srs	Output coordinate system
reprojection	Coordinate system reprojection
threads	Number of parallel threads
deep	Specify whether file headers are trustworthy

output (info)

The output is a directory path to write detailed per-file metadata. This directory may then be used as the input for a build command.

Merge

The merge command is used to combine subset builds into a full Entwine Point Tile dataset. All subsets must be completed.

Note: This command is not used to merge unrelated EPT datasets.

Key	Description
output	Output directory of subsets
tmp	Temporary directory
threads	Number of parallel threads

output (merge)

The output path must be a directory containing n completed subset builds, where n is the of value from the subset specification.
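Putting subset builds and the final merge together, the sequence of invocations can be planned as below. This is a sketch under stated assumptions: the function name, the paths, and the configuration file names (subset-1.json, merge.json, and so on) are hypothetical, and the commands are only assembled as argument lists, not executed.

```python
import json

def subset_workflow(base, of):
    """Return (argv, config) pairs: one build per subset, then a single merge."""
    steps = []
    for task in range(1, of + 1):
        config = dict(base, subset={"id": task, "of": of})
        steps.append((["entwine", "build", "-c", f"subset-{task}.json"], config))
    # The merge only needs the shared output directory of the subset builds.
    merge_config = {"output": base["output"]}
    steps.append((["entwine", "merge", "-c", "merge.json"], merge_config))
    return steps

base = {"input": "~/data/**", "output": "~/entwine/out"}  # hypothetical paths
steps = subset_workflow(base, 4)
for argv, config in steps:
    print(" ".join(argv), json.dumps(config, sort_keys=True))
```

Each configuration would be written to the named file before running the corresponding command, and the merge step should only run once all four builds have completed.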

Common

Key	Description
verbose	Enable verbose output
arbiter	Remote file access settings for S3, GCS, Dropbox, etc.

verbose

Defaults to false. Setting this to true enables more verbose output to STDOUT.

arbiter

This value may be set to an object representing settings for remote file access. Amazon S3, Google Cloud Storage, and Dropbox settings can be placed here to be passed along to Arbiter. Some examples follow.

Enable Amazon S3 server-side encryption for the default profile:

{ "arbiter": {
    "s3": {
        "sse": true
    }
} }

Enable IO between multiple S3 buckets with different authentication settings. Profiles other than default must use prefixed paths of the form profile@s3://<path>, for example second@s3://lidar-data/usa:

{ "arbiter": {
    "s3": [
        {
            "profile": "default",
            "access": "<access key here>",
            "secret": "<secret key here>"
        },
        {
            "profile": "second",
            "access": "<access key here>",
            "secret": "<secret key here>",
            "region": "eu-central-1",
            "sse": true
        }
    ]
} }

Setting the S3 profile is also accessible via command line with --profile <profile>, and server-side encryption can be enabled by using --sse.
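The prefixed path form can be illustrated with a small parser. This sketch only demonstrates the profile@s3://<path> syntax described above and is not Arbiter's parsing code; the function name split_profile is hypothetical, and paths without a prefix are simply reported under "default".

```python
def split_profile(path):
    """Split 'profile@s3://bucket/key' into (profile, 's3://bucket/key')."""
    scheme_at = path.find("://")
    prefix = path[:scheme_at] if scheme_at != -1 else ""
    if "@" in prefix:
        profile, scheme = prefix.split("@", 1)
        return profile, scheme + path[scheme_at:]
    return "default", path   # no prefix: the default profile applies

print(split_profile("second@s3://lidar-data/usa"))
print(split_profile("s3://entwine.io/sample-data/red-rocks.laz"))
```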

Miscellaneous

S3

Entwine can read and write S3 paths. The simplest way to make use of this functionality is to install the AWS CLI and run aws configure, which will write credentials to ~/.aws.

If you’re using Docker, you’ll need to map that directory as a volume. Entwine’s Docker container runs as user root, so that mapping is as simple as adding -v ~/.aws:/root/.aws to your docker run invocation.

Cesium

Creating 3D Tiles point cloud datasets for display in Cesium is a two-step process.

First, an Entwine Point Tile dataset must be created with an output projection of earth-centered earth-fixed, i.e. EPSG:4978:

mkdir ~/entwine
docker run -it -v ~/entwine:/entwine connormanning/entwine build \
    -i https://entwine.io/sample-data/autzen.laz \
    -o /entwine/autzen-ecef \
    -r EPSG:4978

Then, entwine convert must be run to create a 3D Tiles tileset:

docker run -it -v ~/entwine:/entwine connormanning/entwine convert \
    -i /entwine/autzen-ecef \
    -o /entwine/cesium/autzen

Statically serve the tileset locally:

docker run -it -v ~/entwine/cesium:/var/www -p 8080:8080 \
    connormanning/http-server

And browse the tileset with Cesium.