Configuration
Entwine provides 4 sub-commands for indexing point cloud data:
Command | Description
---|---
build | Generate an EPT dataset from point cloud data
info | Gather information about point clouds before building
merge | Merge datasets built as subsets
convert | Convert an EPT dataset to Cesium 3D Tiles
These commands are invoked via the command line as:
```
entwine <command> <arguments>
```
Although most options to Entwine commands are configurable via the command line, each command also accepts configuration via JSON. A configuration file may be specified with the `-c` command line argument.
Command line argument settings are applied in order, so earlier settings can be overwritten by later-specified settings. This includes configuration file arguments, allowing them to be used as templates for common settings that may be overwritten with command line options.
Internally, an Entwine CLI invocation builds a JSON configuration that is passed along to the corresponding sub-command, so CLI arguments and their equivalent JSON configuration formats will be described for each command. For example, with configuration file `config.json`:

```json
{
    "input": "~/data/chicago.laz",
    "output": "~/entwine/chicago"
}
```
The following Entwine invocations are equivalent:
```
entwine build -i ~/data/chicago.laz -o ~/entwine/chicago
entwine build -c config.json
```
Throughout Entwine, a wide variety of point cloud formats are supported as input data, since any PDAL-readable format may be indexed. Paths are not required to be local filesystem paths - they may be local, S3, GCS, Dropbox, or any other Arbiter-readable paths.
Each command accepts some common options, detailed in the Common section below.
Build
The `build` command is used to generate an Entwine Point Tile (EPT) dataset from point cloud data.
Key | Description
---|---
input | Path(s) to build
output | Output directory
tmp | Temporary directory
srs | Output coordinate system
reprojection | Coordinate system reprojection
threads | Number of parallel threads
force | Force a new build at this output
dataType | Point cloud data storage type
hierarchyType | Hierarchy storage type
span | Voxel resolution in one dimension
allowOriginId | Specify per-point source file tracking
bounds | Dataset bounds
schema | Attributes to store
trustHeaders | Specify whether file headers are trustworthy
absolute | Set double precision spatial coordinates
scale | Scaling factor for scaled integral coordinates
run | Insert a fixed number of files
subset | Run a subset portion of a larger build
overflowDepth | Depth at which nodes may contain overflow
overflowThreshold | Threshold at which overflowing nodes split into children
maxNodeSize | Soft point count at which nodes may overflow
minNodeSize | Soft minimum on the point count of nodes
cacheSize | Number of recently-unused nodes to hold in reserve
hierarchyStep | Step size at which to split hierarchy files
input
The point cloud data paths to be indexed. This may be a string, as in:

```json
{ "input": "~/data/autzen.laz" }
```
This string may be:

- a file path: `~/data/autzen.laz` or `s3://entwine.io/sample-data/red-rocks.laz`
- a directory (non-recursive): `~/data` or `~/data/*`
- a recursive directory: `~/data/**`
- an info directory path: `~/entwine/info/autzen-files/`
- an info output file: `~/entwine/info/autzen-files/1.json`
This field may also be a JSON array containing any combination of the above string forms:

```json
{ "input": ["autzen.laz", "~/data/"] }
```
Paths that do not contain PDAL-readable file extensions will be silently ignored.
output
A directory for Entwine to write its EPT output. May be local or remote.
tmp
A local directory for Entwine’s temporary data.
srs
Specification for the output coordinate system. Setting this value does not invoke a reprojection; it simply sets the `srs` field in the resulting EPT metadata.
If input files have coordinate systems specified (and they all match), then this will typically be inferred from the files themselves.
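For example, to set the output coordinate system label directly (the specific spatial reference code here is illustrative):

```json
{ "srs": "EPSG:3857" }
```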
reprojection
Coordinate system reprojection specification. Specified as a JSON object with up to 3 keys.
If only the output projection is specified, then the input coordinate system will be inferred from the file headers. If no coordinate system information can be found for a given file, then this file will not be inserted.
{ "reprojection": { "out": "EPSG:3857" } }
An input SRS may also be specified, which will be overridden by SRS information determined from file headers.
```json
{
    "reprojection": {
        "in": "EPSG:26915",
        "out": "EPSG:3857"
    }
}
```
To force an input SRS that overrides any file header information, the `hammer` key should be set to `true`.

```json
{
    "reprojection": {
        "in": "EPSG:26915",
        "out": "EPSG:3857",
        "hammer": true
    }
}
```
When using this option, the `out` value will be set as the coordinate system in the resulting EPT metadata, so the `srs` option does not need to be specified.
threads
Number of threads for parallelization. By default, a third of these threads will be allocated to point insertion and the rest will perform serialization work.

```json
{ "threads": 9 }
```
This field may also be an array of two numbers explicitly setting the number of worker threads and serialization threads, with the worker threads specified first.
{ "threads": [2, 7] }
force
By default, if an Entwine index already exists at the `output` path, any new files from the `input` will be added to the existing index. To force a new index instead, this field may be set to `true`.

```json
{ "force": true }
```
dataType
Specification for the output storage type for point cloud data. Currently acceptable values are `laszip`, `zstandard`, and `binary`. For a `binary` selection, data is laid out according to the schema. `zstandard` data consists of the same binary layout compressed with Zstandard compression.

```json
{ "dataType": "laszip" }
```
hierarchyType
Specification for the hierarchy storage format. Hierarchy information is always stored as JSON, but this field may indicate compression. Currently acceptable values are `json` and `gzip`.

```json
{ "hierarchyType": "json" }
```
span
Number of voxels in each spatial dimension, which defines the grid size of the octree. For example, a `span` value of `256` results in a `256 * 256 * 256` cubic resolution.
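In configuration form, using the value from the example above:

```json
{ "span": 256 }
```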
allowOriginId
For lossless capability, Entwine inserts an `OriginId` dimension tracking each point back to its original source file. If this value is present and set to `false`, this behavior will be disabled.
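For example, to disable this tracking:

```json
{ "allowOriginId": false }
```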
bounds
Total bounds for all points to be indexed. These bounds are final, in that they may not be expanded after indexing has begun. Typically this field does not need to be supplied, as it will be inferred from the data itself. This field is specified as an array of the format `[xmin, ymin, zmin, xmax, ymax, zmax]`.

```json
{ "bounds": [0, 500, 30, 800, 1300, 50] }
```
schema
An array of objects representing the dimensions to be stored in the output. Each dimension is specified with a string `name` and a string `type`. Typically this field does not need to be specified, as it will be inferred from the data itself.
Valid `type` values are: `int8`, `int16`, `int32`, `int64`, `uint8`, `uint16`, `uint32`, `uint64`, `float`, and `double`.
```json
{
    "schema": [
        { "name": "X", "type": "uint32" },
        { "name": "Y", "type": "uint32" },
        { "name": "Z", "type": "uint32" },
        { "name": "Intensity", "type": "int8" }
    ]
}
```
trustHeaders
By default, file headers for point cloud formats that contain information like number of points and bounds are considered trustworthy. If file headers are known to be incorrect, this value can be set to `false` to require a deep scan of all the points in each file.
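For example, to force a deep scan:

```json
{ "trustHeaders": false }
```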
absolute
Scaled values at a fixed precision are preferred by Entwine (and required for the `laszip` dataType). To use absolute double-precision values for XYZ instead, this value may be set to `true`.
scale
A scale factor for the spatial coordinates of the output. An offset will be determined automatically. May be a number like `0.01`, or a 3-length array of numbers for non-uniform scaling.

```json
{ "scale": 0.01 }
{ "scale": [0.01, 0.01, 0.025] }
```
run
If a build should not run to completion of all input files, a `run` count may be specified to run a fixed maximum number of files. The build may be continued by providing the same `output` value to a later build.

```json
{ "run": 25 }
```
subset
Entwine builds may be split into multiple subset tasks and merged later with the merge command. Subset builds must contain exactly the same configuration aside from this `subset` field.
Subsets are specified with a 1-based `id` for the task ID and an `of` key for the total number of tasks. The total number of tasks must be a power of 4.

```json
{ "subset": { "id": 1, "of": 16 } }
```
overflowDepth
There may be performance benefits to not allowing nodes near the top of the octree to contain overflow. The depth at which overflow may begin is specified by this parameter.
overflowThreshold
For nodes at depths of at least the `overflowDepth`, this parameter specifies the threshold at which they will split into bisected child nodes.
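For example (both values here are hypothetical, for illustration only):

```json
{ "overflowDepth": 2, "overflowThreshold": 16384 }
```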
maxNodeSize
A soft limit on the maximum number of points that may be stored in a data node. This limit is only applicable to points that are "overflow" for a node, so a node whose points fit natively in the `span * span * span` grid may grow beyond this size.
minNodeSize
A limit on the minimum number of points that may reside in a dedicated node. Would-be nodes containing fewer points than this will be grouped in with their parent node.
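For example (hypothetical values for illustration):

```json
{ "maxNodeSize": 65536, "minNodeSize": 4096 }
```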
cacheSize
When data nodes have not been touched recently during point insertion, they are eligible for serialization. This parameter specifies the number of unused nodes that may be held in memory before serialization, so that if they are used again soon enough they won’t need to be serialized and then reawakened from remote storage.
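For example (a hypothetical value):

```json
{ "cacheSize": 64 }
```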
hierarchyStep
For large datasets with lots of data files, the hierarchy describing the octree layout is split up to avoid large downloads. This value describes the depth modulo at which hierarchy files are split up into child files. In general, this should be set only for testing purposes as Entwine will heuristically determine a value if the output hierarchy is large enough to warrant splitting.
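If set explicitly for testing, it might look like the following (the value is hypothetical):

```json
{ "hierarchyStep": 4 }
```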
Info
The `info` command is used to aggregate information about unindexed point cloud data prior to building an Entwine Point Tile dataset.
Most options here are common to `build` and perform exactly the same function in the `info` command, aside from `output`, described below.
Key | Description
---|---
input | Path(s) to build
output | Output directory
tmp | Temporary directory
srs | Output coordinate system
reprojection | Coordinate system reprojection
threads | Number of parallel threads
trustHeaders | Specify whether file headers are trustworthy
output (info)
The `output` is a directory path to write detailed per-file metadata. This directory may then be used as the `input` for a build command.
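As a sketch (the paths are illustrative), an `info` run followed by a `build` consuming its output might look like:

```
entwine info -i ~/data -o ~/entwine/info/autzen-files
entwine build -i ~/entwine/info/autzen-files -o ~/entwine/autzen
```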
Merge
The `merge` command is used to combine subset builds into a full Entwine Point Tile dataset. All subsets must be completed.
Note: this command is not used to merge unrelated EPT datasets.
Key | Description
---|---
output | Output directory of subsets
tmp | Temporary directory
threads | Number of parallel threads
output (merge)
The output path must be a directory containing `n` completed subset builds, where `n` is the `of` value from the subset specification.
Common
Key | Description
---|---
verbose | Enable verbose output
arbiter | Remote file access settings for S3, GCS, Dropbox, etc.
verbose
Defaults to `false`; setting to `true` will enable more verbose output to STDOUT.
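For example:

```json
{ "verbose": true }
```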
arbiter
This value may be set to an object representing settings for remote file access. Amazon S3, Google Cloud Storage, and Dropbox settings can be placed here to be passed along to Arbiter. Some examples follow.
Enable Amazon S3 server-side encryption for the default profile:

```json
{
    "arbiter": {
        "s3": { "sse": true }
    }
}
```
Enable IO between multiple S3 buckets with different authentication settings. Profiles other than `default` must use prefixed paths of the form `profile@s3://<path>`, for example `second@s3://lidar-data/usa`:

```json
{
    "arbiter": {
        "s3": [
            {
                "profile": "default",
                "access": "<access key here>",
                "secret": "<secret key here>"
            },
            {
                "profile": "second",
                "access": "<access key here>",
                "secret": "<secret key here>",
                "region": "eu-central-1",
                "sse": true
            }
        ]
    }
}
```
Setting the S3 profile is also accessible via command line with `--profile <profile>`, and server-side encryption can be enabled with `--sse`.
Miscellaneous
S3
Entwine can read and write S3 paths. The simplest way to make use of this functionality is to install AWSCLI and run `aws configure`, which will write credentials to `~/.aws`.
If you're using Docker, you'll need to map that directory as a volume. Entwine's Docker container runs as user `root`, so that mapping is as simple as adding `-v ~/.aws:/root/.aws` to your `docker run` invocation.
Cesium
Creating 3D Tiles point cloud datasets for display in Cesium is a two-step process. First, an Entwine Point Tile dataset must be created with an output projection of earth-centered earth-fixed, i.e. `EPSG:4978`:
```
mkdir ~/entwine
docker run -it -v ~/entwine:/entwine connormanning/entwine build \
    -i https://entwine.io/sample-data/autzen.laz \
    -o /entwine/autzen-ecef \
    -r EPSG:4978
```
Then, `entwine convert` must be run to create a 3D Tiles tileset:

```
docker run -it -v ~/entwine:/entwine connormanning/entwine convert \
    -i /entwine/autzen-ecef \
    -o /entwine/cesium/autzen
```
Statically serve the tileset locally:

```
docker run -it -v ~/entwine/cesium:/var/www -p 8080:8080 \
    connormanning/http-server
```
And browse the tileset with Cesium.