Configuration

Entwine provides 4 sub-commands for working with point cloud data:

Command  Description
build    Generate an EPT dataset from point cloud data
info     Gather information about point clouds before building
merge    Merge datasets built as subsets
convert  Convert an EPT dataset to a different format, e.g. Cesium 3D Tiles

These commands are invoked via the command line as:

entwine <command> <arguments>

Although most options to entwine commands are configurable via the command line, each command also accepts configuration via JSON. A configuration file may be specified with the -c command line argument.

Command line arguments are applied in order, so earlier settings may be overwritten by later ones. This includes the configuration file argument, which allows a configuration file to act as a template of common settings that may then be overridden by subsequent command line options.

Internally, Entwine CLI invocation builds a JSON configuration that is passed along to the corresponding sub-command, so CLI arguments and their equivalent JSON configuration formats will be described for each command. For example, with configuration file config.json:

{
    "input": "~/data/chicago.laz",
    "output": "~/entwine/chicago"
}

The following Entwine invocations are equivalent:

entwine build -i ~/data/chicago.laz -o ~/entwine/chicago
entwine build -c config.json

Throughout Entwine, a wide variety of point cloud formats are supported as input data, since any PDAL-readable format may be indexed. Paths are not required to be local filesystem paths: they may be local, S3, GCS, Dropbox, or any other Arbiter-readable paths.

Each command accepts some common options, detailed in the Common section below.

Build

The build command is used to generate an Entwine Point Tile (EPT) dataset from point cloud data.

Key                Description
input              Path(s) to build
output             Output directory
tmp                Temporary directory
srs                Output coordinate system
reprojection       Coordinate system reprojection
threads            Number of parallel threads
force              Force a new build at this output
dataType           Point cloud data storage type
hierarchyType      Hierarchy storage type
span               Voxel resolution in one dimension
allowOriginId      Specify per-point source file tracking
bounds             Dataset bounds
schema             Attributes to store
trustHeaders       Specify whether file headers are trustworthy
absolute           Set double-precision spatial coordinates
scale              Scaling factor for scaled integral coordinates
run                Insert a fixed number of files
subset             Run a subset portion of a larger build
overflowDepth      Depth at which nodes may contain overflow
overflowThreshold  Point count at which overflowing nodes split
maxNodeSize        Soft point count at which nodes may overflow
minNodeSize        Soft minimum on the point count of nodes
cacheSize          Number of recently-unused nodes to hold in reserve
hierarchyStep      Step size at which to split hierarchy files
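
As an illustration of how these keys combine, a minimal configuration file for a build might look like the following sketch (all paths and values here are hypothetical):

{
    "input": "~/data/chicago/**",
    "output": "~/entwine/chicago",
    "tmp": "/tmp/entwine",
    "threads": 9,
    "dataType": "laszip",
    "span": 256
}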

input

The point cloud data paths to be indexed. This may be a string, as in:

{ "input": "~/data/autzen.laz" }

This string may be:

  • a file path: ~/data/autzen.laz or s3://entwine.io/sample-data/red-rocks.laz

  • a directory (non-recursive): ~/data or ~/data/*

  • a recursive directory: ~/data/**

  • an info directory path: ~/entwine/info/autzen-files/

  • an info output file: ~/entwine/info/autzen-files/1.json

This field may also be a JSON array containing any combination of the above string types:

{ "input": ["autzen.laz", "~/data/"] }

Paths that do not contain PDAL-readable file extensions will be silently ignored.

output

A directory for Entwine to write its EPT output. May be local or remote.
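
For example, with a hypothetical S3 bucket:

{ "output": "s3://my-bucket/entwine/chicago" }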

tmp

A local directory for Entwine’s temporary data.
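
For example, with an illustrative local path:

{ "tmp": "/tmp/entwine" }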

srs

Specification for the output coordinate system. Setting this value does not invoke a reprojection; it simply sets the srs field in the resulting EPT metadata.

If input files have coordinate systems specified (and they all match), then this will typically be inferred from the files themselves.
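
For example, assuming the string form of an EPSG code is accepted here, to record the output coordinate system as web mercator without reprojecting:

{ "srs": "EPSG:3857" }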

reprojection

Coordinate system reprojection specification. Specified as a JSON object with up to 3 keys.

If only the output projection is specified, then the input coordinate system will be inferred from the file headers. If no coordinate system information can be found for a given file, that file will not be inserted.

{ "reprojection": { "out": "EPSG:3857" } }

An input SRS may also be specified, which will be overridden by SRS information determined from file headers.

{
    "reprojection": {
        "in": "EPSG:26915",
        "out": "EPSG:3857"
    }
}

To force an input SRS that overrides any file header information, the hammer key should be set to true.

{
    "reprojection": {
        "in": "EPSG:26915",
        "out": "EPSG:3857" ,
        "hammer": true
    }
}

When using this option, the output value will be set as the coordinate system in the resulting EPT metadata, so the srs option does not need to be specified.

threads

Number of threads for parallelization. By default, a third of these threads will be allocated to point insertion and the rest will perform serialization work.

{ "threads": 9 }

This field may also be an array of two numbers explicitly setting the number of worker threads and serialization threads, with the worker threads specified first.

{ "threads": [2, 7] }

force

By default, if an Entwine index already exists at the output path, any new files from the input will be added to the existing index. To force a new index instead, this field may be set to true.

{ "force": true }

dataType

Specification for the output storage type for point cloud data. Currently acceptable values are laszip, zstandard, and binary. For a binary selection, data is laid out according to the schema. Zstandard data consists of binary data according to the schema that is then compressed with Zstandard compression.

{ "dataType": "laszip" }

hierarchyType

Specification for the hierarchy storage format. Hierarchy information is always stored as JSON, but this field may indicate compression. Currently acceptable values are json and gzip.

{ "hierarchyType": "json" }

span

Number of voxels in each spatial dimension, which defines the grid size of the octree. For example, a span value of 256 results in a 256 * 256 * 256 cubic resolution.
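
For example:

{ "span": 256 }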

allowOriginId

For lossless capability, Entwine inserts an OriginId dimension tracking each point back to its original source file. If this value is present and set to false, this behavior will be disabled.
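
For example, to disable OriginId tracking:

{ "allowOriginId": false }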

bounds

Total bounds for all points to be indexed. These bounds are final; they may not be expanded after indexing has begun. Typically this field does not need to be supplied, as it will be inferred from the data itself. This field is specified as an array of the form [xmin, ymin, zmin, xmax, ymax, zmax].

{ "bounds": [0, 500, 30, 800, 1300, 50] }

schema

An array of objects representing the dimensions to be stored in the output. Each dimension is specified with a string name, a string type, and an integer size. Typically this field does not need to be specified, as it will be inferred from the data itself.

Valid type values are: signed, unsigned, and float.

Size values are the number of bytes used for each dimension. For example, an unsigned type with size 2 is capable of storing any uint16 value. Likewise, an unsigned type with size 4 is capable of storing any uint32 value.

{
    "schema": [
        { "name": "X", "type": "unsigned", "size": 4 },
        { "name": "Y", "type": "unsigned", "size": 4 },
        { "name": "Z", "type": "unsigned", "size": 4 },
        { "name": "Intensity", "type": "int8", "size": 1 }
    ]
}

trustHeaders

By default, file headers for point cloud formats that contain information like number of points and bounds are considered trustworthy. If file headers are known to be incorrect, this value can be set to false to require a deep scan of all the points in each file.
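
For example, to force a deep scan:

{ "trustHeaders": false }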

absolute

Scaled values at a fixed precision are preferred by Entwine (and required for the laszip dataType). To use absolute double-precision values for XYZ instead, this value may be set to true.
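
For example:

{ "absolute": true }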

scale

A scale factor for the spatial coordinates of the output. An offset will be determined automatically. May be a single number like 0.01, or a three-element array of numbers for non-uniform scaling.

{ "scale": 0.01 }
{ "scale": [0.01, 0.01, 0.025] }

run

If a build should not run to completion over all input files, a run count may be specified to insert a fixed maximum number of files. The build may be continued by providing the same output value to a later build.

{ "run": 25 }

subset

Entwine builds may be split into multiple subset tasks, which may then be combined with the merge command. Subset builds must use exactly the same configuration aside from this subset field.

Subsets are specified with a 1-based id key for the task number and an of key for the total number of tasks. The total number of tasks must be a power of 4.

{ "subset": { "id": 1, "of": 16 } }

overflowDepth

There may be performance benefits to disallowing overflow in nodes near the top of the octree. This parameter specifies the depth at which overflow may begin.

overflowThreshold

For nodes at depths of at least the overflowDepth, this parameter specifies the point count threshold at which they will split into bisected child nodes.
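
Illustrative settings for these two keys (the values are arbitrary examples, not recommended defaults):

{
    "overflowDepth": 0,
    "overflowThreshold": 65536
}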

maxNodeSize

A soft limit on the maximum number of points that may be stored in a data node. This limit applies only to points that are “overflow” for a node, so nodes whose points fit natively in the span * span * span grid may grow beyond this size.
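
For example, with an illustrative value:

{ "maxNodeSize": 65536 }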

minNodeSize

A limit on the minimum number of points that may reside in a dedicated node. Would-be nodes containing fewer than this number of points will be grouped in with their parent node.
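
An illustrative value:

{ "minNodeSize": 4096 }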

cacheSize

When data nodes have not been touched recently during point insertion, they are eligible for serialization. This parameter specifies the number of unused nodes that may be held in memory before serialization, so that if they are used again soon enough they won’t need to be serialized and then reawakened from remote storage.
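
An illustrative value:

{ "cacheSize": 64 }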

hierarchyStep

For large datasets with lots of data files, the hierarchy describing the octree layout is split up to avoid large downloads. This value describes the depth modulo at which hierarchy files are split up into child files. In general, this should be set only for testing purposes as Entwine will heuristically determine a value if the output hierarchy is large enough to warrant splitting.
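
An illustrative setting, for testing purposes only:

{ "hierarchyStep": 4 }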

Info

The info command is used to aggregate information about unindexed point cloud data prior to building an Entwine Point Tile dataset.

Most options here are common to build and perform exactly the same function in the info command, aside from output, described below.

Key           Description
input         Path(s) to build
output        Output directory
tmp           Temporary directory
srs           Output coordinate system
reprojection  Coordinate system reprojection
threads       Number of parallel threads
trustHeaders  Specify whether file headers are trustworthy

output (info)

The output is a directory path where detailed per-file metadata will be written. This directory may then be used as the input for a build command.
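
For example, a sketch of gathering information about a directory of files and then building from the resulting metadata (paths are illustrative):

entwine info -i ~/data/autzen -o ~/entwine/info/autzen-files
entwine build -i ~/entwine/info/autzen-files -o ~/entwine/autzen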

Merge

The merge command is used to combine subset builds into a full Entwine Point Tile dataset. All subsets must be completed.

Note: This command is not used to merge unrelated EPT datasets.

Key      Description
output   Output directory of subsets
tmp      Temporary directory
threads  Number of parallel threads

output (merge)

The output path must be a directory containing n completed subset builds, where n is the of value from the subset specification.
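
For example, with a configuration file merge.json containing the illustrative output below, the merge may be invoked as:

{ "output": "~/entwine/chicago" }

entwine merge -c merge.json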

Common

Key      Description
verbose  Enable verbose output
arbiter  Remote file access settings for S3, GCS, Dropbox, etc.

verbose

Defaults to false; setting this to true enables more verbose output to STDOUT.
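
For example:

{ "verbose": true }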

arbiter

This value may be set to an object representing settings for remote file access. Amazon S3, Google Cloud Storage, and Dropbox settings can be placed here to be passed along to Arbiter. Some examples follow.

Enable Amazon S3 server-side encryption for the default profile:

{ "arbiter": {
    "s3": {
        "sse": true
    }
} }

Enable IO between multiple S3 buckets with different authentication settings. Profiles other than default must use prefixed paths of the form profile@s3://<path>, for example second@s3://lidar-data/usa:

{ "arbiter": {
    "s3": [
        {
            "profile": "default",
            "access": "<access key here>",
            "secret": "<secret key here>"
        },
        {
            "profile": "second",
            "access": "<access key here>",
            "secret": "<secret key here>",
            "region": "eu-central-1",
            "sse": true
        }
    ]
} }

The S3 profile may also be set via the command line with --profile <profile>, and server-side encryption can be enabled with --sse.

Miscellaneous

S3

Entwine can read and write S3 paths. The simplest way to make use of this functionality is to install AWSCLI and run aws configure, which will write credentials to ~/.aws.

If you’re using Docker, you’ll need to map that directory as a volume. Entwine’s Docker container runs as user root, so that mapping is as simple as adding -v ~/.aws:/root/.aws to your docker run invocation.
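
For example, a sketch of an S3-to-S3 build run via Docker (the bucket names are hypothetical):

docker run -it -v ~/.aws:/root/.aws connormanning/entwine build \
    -i s3://my-data-bucket/chicago.laz \
    -o s3://my-ept-bucket/entwine/chicago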

Cesium

Creating 3D Tiles point cloud datasets for display in Cesium is a two-step process.

First, an Entwine Point Tile dataset must be created with an output projection of earth-centered, earth-fixed, i.e. EPSG:4978:

mkdir ~/entwine
docker run -it -v ~/entwine:/entwine connormanning/entwine build \
    -i https://entwine.io/sample-data/autzen.laz \
    -o /entwine/autzen-ecef \
    -r EPSG:4978

Then, entwine convert must be run to create a 3D Tiles tileset:

docker run -it -v ~/entwine:/entwine connormanning/entwine convert \
    -i /entwine/autzen-ecef \
    -o /entwine/cesium/autzen

Statically serve the tileset locally:

docker run -it -v ~/entwine/cesium:/var/www -p 8080:8080 \
    connormanning/http-server

And browse the tileset with Cesium.