DATA FORMAT
For those of you who will be
inserting data from the first census, please bring the
following files to the database workshop. They should be tab-delimited text
files with a header using the column names outlined below. Avoid special
characters and quotes (single or double).
The following is a detailed explanation of the format for each of these files. The files may contain more columns of data, but the sections below describe the variables that the files should contain.
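As a quick pre-workshop check, a file can be loaded and scanned for quote characters. The following is a minimal sketch, not part of the workshop tooling: the function names `read_tab_file` and `find_quotes` and the sample rows are hypothetical, and it assumes the file content has already been read into a string.

```python
import csv
import io

def read_tab_file(text):
    """Parse tab-delimited text with a header row into a list of dicts."""
    # QUOTE_NONE keeps quote characters literal so they can be reported.
    reader = csv.DictReader(io.StringIO(text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    return list(reader)

def find_quotes(rows):
    """Return (line_number, column) pairs whose value contains ' or "."""
    problems = []
    for line_number, row in enumerate(rows, start=2):  # line 1 is the header
        for column, value in row.items():
            if value and ("'" in value or '"' in value):
                problems.append((line_number, column))
    return problems

# Hypothetical sample data, for illustration only.
sample = (
    "spcode\tgenus\tspecies\tidlevel\n"
    "AAAA01\tGenusa\tspeciesa\tspecies\n"
    'XXXX01\tUnidentified\t"sp1"\tnone\n'
)
rows = read_tab_file(sample)
print(find_quotes(rows))
```

Each reported pair gives the line number in the file and the column holding the offending value, so the quote can be removed by hand before the workshop.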
1. Species file
The columns spcode, genus, species, and idlevel are required; the other two (family and authority) are optional.
spcode - a code used in the field to identify the species of the tree
genus - the taxonomic genus name (in case of unknown genus, please use Unidentified)
species - the taxonomic species name
family - the taxonomic family name
authority (optional) - author of the species name
idlevel - to indicate whether the species is a valid species or a morphospecies, and if the latter, identified to what level. The possibilities are:
species
subspecies
genus
family
none
multiple
The value multiple indicates that the species code may refer to more than one morphospecies.
Please check:
a. that this is a complete list of the species codes used in the datasets.
b. that there are no misspellings.
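The completeness check in (a) and the idlevel values can be verified mechanically. The sketch below is one possible way to do it, assuming each table has already been loaded as a list of dictionaries keyed by column name. The function name `check_species` and all sample codes are hypothetical. Misspellings in genus or species names still need a manual review.

```python
REQUIRED_COLUMNS = ["spcode", "genus", "species", "idlevel"]
IDLEVELS = {"species", "subspecies", "genus", "family", "none", "multiple"}

def check_species(species_rows, census_spcodes):
    """Return a list of problem messages for the species table."""
    problems = []
    # Required columns are checked against the first row's keys.
    if species_rows:
        for column in REQUIRED_COLUMNS:
            if column not in species_rows[0]:
                problems.append(f"missing column: {column}")
    # Every idlevel must be one of the six allowed values.
    for row in species_rows:
        if row.get("idlevel") not in IDLEVELS:
            problems.append(
                f"{row.get('spcode')}: invalid idlevel {row.get('idlevel')!r}")
    # Every species code used in a census must appear in the species file.
    known = {row.get("spcode") for row in species_rows}
    for code in sorted(set(census_spcodes) - known):
        problems.append(f"{code}: used in census but not in species file")
    return problems

# Hypothetical sample data, for illustration only.
species_rows = [
    {"spcode": "AAAA01", "genus": "Genusa", "species": "speciesa",
     "idlevel": "species"},
    {"spcode": "BBBB02", "genus": "Genusb", "species": "speciesb",
     "idlevel": "sp"},
]
census_spcodes = ["AAAA01", "CCCC03", "AAAA01"]
for message in check_species(species_rows, census_spcodes):
    print(message)
```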
2. Tree measurement codes
The following two fields are required.
code - one or more letter codes used in the codes field in the census tables
description - a brief description of what the code means
Please check that this is a
complete list of codes used in all the census datasets.
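One way to run this check is to collect every code that appears in the census codes fields and compare the result with the codes file. The sketch below assumes the codes are delimited with semicolons, as described for the codes field in the census data section. The function names and the sample codes are hypothetical.

```python
def split_codes(field):
    """Split a census codes field on semicolons, ignoring blanks."""
    return [code.strip() for code in field.split(";") if code.strip()]

def undefined_codes(census_code_fields, code_table):
    """Return codes used in census rows that the codes file does not define."""
    defined = {row["code"] for row in code_table}
    used = set()
    for field in census_code_fields:
        used.update(split_codes(field))
    return sorted(used - defined)

# Hypothetical sample data, for illustration only.
code_table = [
    {"code": "D", "description": "dead"},
    {"code": "MS", "description": "main stem"},
]
census_code_fields = ["", "MS", "D;X", "MS; B"]
print(undefined_codes(census_code_fields, code_table))
```

An empty result means every code used in the census files is described in the codes file.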
The following descriptions
are required in your list (the code you decide to use is up to you):
For datasets with more than one census:
3. Quadrat file
This contains a complete list
of the quadrat names used in the plot. The following
fields are required.
quadrat - the name of the quadrat
startx - the x coordinate of the lower left corner of the quadrat
starty - the y coordinate of the lower left corner of the quadrat
dimx - the x dimension of the quadrat
dimy - the y dimension of the quadrat
The x and y coordinates (startx and starty) refer to the
distance in meters of the quadrat from the lower left
corner of the entire plot.
Most of the sites have 20m x 20m quadrats and use the same naming system (0000, 0001, ..., 0024, 0100, 0101, ..., 4900, 4901, ..., 4924) for a 1000m x 500m plot.
However, some sites may use 10m x 10m quadrats or name their quadrats starting from 0101 (no 00 used). There is at least one site where quadrat 0000 refers to a quadrat in the center of the plot.
This table should reflect the names used in your plot, as long as the startx and starty of the quadrat indicate where this quadrat is in relation to the lower left corner of the entire plot, assuming that this lower left corner has x,y coordinates 0,0.
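For a site that follows the common naming system, the quadrat table can be generated rather than typed. The sketch below assumes that the first two digits of the name count quadrats along x and the last two along y, which is an inference from the 1000m x 500m example and should be confirmed for your site. The function name `standard_quadrats` is hypothetical, and sites with other naming schemes need their own table.

```python
def standard_quadrats(plot_x=1000, plot_y=500, size=20):
    """Build quadrat rows for a plot that starts at 0,0 in the lower left."""
    rows = []
    for col in range(plot_x // size):        # position along x
        for row in range(plot_y // size):    # position along y
            rows.append({
                "quadrat": f"{col:02d}{row:02d}",  # assumed naming scheme
                "startx": col * size,
                "starty": row * size,
                "dimx": size,
                "dimy": size,
            })
    return rows

quadrats = standard_quadrats()
print(len(quadrats))
print(quadrats[0])
print(quadrats[-1])
```

With the default arguments this yields 50 x 25 = 1250 quadrats, from 0000 at 0,0 to 4924 at 980,480.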
4. Personnel file
This file contains the names
of the people who are or were involved with the plot, and the role that they
played. The first and last names should be separate. The personnel include the
field technicians, the data entry technicians, the supervisors, students,
volunteers, as well as the principal investigator, among others.
firstname - the first name of the person
lastname - the last name of the person
role - the role the person played in the census. This should match exactly one of the descriptions in the RoleReference table.
If a person has more than one
role (for example, he was a field technician in one census, then promoted to
field supervisor in a later census), then that name should be entered twice.
5. Census data
Bring the tree data from each
census in a separate file. Each file must have the fields listed below. The
columns can be in any order. You may have extra fields in the dataset, but they
will not be uploaded into the tables of the database.
tag - the tag of the tree (should be unique)
stemtag - the tag of the stem. If your site does not use stem tags, you may leave this column blank. The header, however, should include this variable name.
spcode - the species code of the tree. All species codes should appear in the species file.
quadrat - the name of the quadrat the tree is located in
lx - the x coordinate in meters of the tree within its quadrat.
ly - the y coordinate in meters of the tree within its quadrat.
These coordinates are the coordinates that result from digitizing.
dbh - the diameter of the tree. If there is no diameter measurement (missing, dead, or resprout), please put NA or NULL.
codes - tree or measurement codes. If there is more than one code, they must be delimited with semicolons. This allows for codes with more than one letter. Each and every code should be accounted for in the Codes table in (2) above. The codes field may be left blank if there are no codes.
hom - height (in meters) where the diameter was measured, if different from 1.3 m. You may leave this field blank if the stem was measured at 1.3 m, and just fill it in when the hom is different from 1.3 m.
date - date the stem was measured. This date should be in yyyy-mm-dd format, for example 2011-02-24.
Note that all the multiple stems should be included in these files; you may indicate in the codes field which one is the main stem. If the tree only has one stem, you do not have to include the main stem code. The rest of the information should be repeated for each multiple stem. Make sure that the information (species code, date, etc.) is the same for all multiple stems of the same tree.
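Some of these rules lend themselves to an automated check before the workshop. The sketch below covers three of them: the yyyy-mm-dd date format, the NA or NULL convention for missing dbh values, and agreement of species code and date across the stems of one tree. It assumes each census row is a dictionary of strings. The function names and the sample rows are hypothetical.

```python
import re
from datetime import datetime

def valid_date(text):
    """True if text is a real calendar date written as yyyy-mm-dd."""
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", text):
        return False
    try:
        datetime.strptime(text, "%Y-%m-%d")
    except ValueError:
        return False
    return True

def check_census_row(row):
    """Return a list of problems found in one census row."""
    problems = []
    if not valid_date(row["date"]):
        problems.append(f"bad date: {row['date']!r}")
    if row["dbh"] not in ("NA", "NULL"):
        try:
            float(row["dbh"])
        except ValueError:
            problems.append(f"bad dbh: {row['dbh']!r}")
    return problems

def inconsistent_tags(rows, fields=("spcode", "date")):
    """Return tags whose stems disagree on any of the given fields."""
    first_seen = {}
    bad = set()
    for row in rows:
        key = tuple(row[field] for field in fields)
        if first_seen.setdefault(row["tag"], key) != key:
            bad.add(row["tag"])
    return sorted(bad)

# Hypothetical sample data, for illustration only.
rows = [
    {"tag": "000001", "stemtag": "1", "spcode": "AAAA01", "dbh": "152",
     "date": "2011-02-24"},
    {"tag": "000001", "stemtag": "2", "spcode": "AAAA01", "dbh": "NA",
     "date": "2011-02-25"},
    {"tag": "000002", "stemtag": "", "spcode": "BBBB02", "dbh": "12,5",
     "date": "24/02/2011"},
]
for row in rows:
    print(row["tag"], check_census_row(row))
print(inconsistent_tags(rows))
```

In the sample, tree 000001 is flagged because its two stems carry different dates, and tree 000002 is flagged for both its date format and its dbh value.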
The dataset for the first
census should only contain trees and stems from that census. The dataset for
subsequent censuses should contain stems from the previous census, including
those that died, and new recruits. All dead trees must have the code dead in the codes field.
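The requirement that a later census carries forward every stem from the previous one can be checked by comparing tag and stemtag pairs. This is a minimal sketch, assuming the tag and stemtag together identify a stem and are written identically in both files. The function names and the sample tags are hypothetical.

```python
def stem_keys(rows):
    """Return the set of (tag, stemtag) pairs in a census."""
    return {(row["tag"], row["stemtag"]) for row in rows}

def missing_stems(previous_rows, current_rows):
    """Stems in the previous census that are absent from the current one."""
    return sorted(stem_keys(previous_rows) - stem_keys(current_rows))

def new_stems(previous_rows, current_rows):
    """Stems that first appear in the current census."""
    return sorted(stem_keys(current_rows) - stem_keys(previous_rows))

# Hypothetical sample data, for illustration only.
census1 = [
    {"tag": "000001", "stemtag": ""},
    {"tag": "000002", "stemtag": ""},
]
census2 = [
    {"tag": "000001", "stemtag": ""},
    {"tag": "000003", "stemtag": ""},
]
print(missing_stems(census1, census2))
print(new_stems(census1, census2))
```

Any stem reported by `missing_stems` should be added back to the later file, with the dead code if the tree died.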