DATA FORMAT

 

For those of you who will be inserting data from the first census, please bring the following files to the database workshop. They should be tab-delimited text files with a header using the column names outlined below. Avoid special characters and quotes (single or double).

 

  1. A species file
  2. A list of all tree measurement codes used in the censuses
  3. A quadrat file
  4. A file with the names of the personnel involved with the census
  5. If you have already completed a census, one file with the tree and stem measurements for each census; for example, 3 censuses require 3 separate files
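Before the workshop, the basic layout of each file can be checked with a short script. The following is a minimal sketch in Python, not an official validator: the function name check_tab_file is hypothetical, and it assumes that "special characters" means any non-ASCII character.

```python
import csv
import io


def check_tab_file(text, required):
    """Return a list of problems found in tab-delimited text with a header row."""
    problems = []
    # QUOTE_NONE keeps quote characters literal so that they can be reported.
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = list(reader)
    if not rows:
        return ["file is empty"]
    header = rows[0]
    for name in required:
        if name not in header:
            problems.append(f"missing column: {name}")
    for lineno, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(
                f"line {lineno}: expected {len(header)} fields, found {len(row)}"
            )
        for value in row:
            if '"' in value or "'" in value:
                problems.append(f"line {lineno}: quote character in {value!r}")
            # Assumption: "special characters" means non-ASCII characters.
            if not value.isascii():
                problems.append(f"line {lineno}: non-ASCII character in {value!r}")
    return problems
```

To check a file on disk, read it into a string first (for example with open(path, encoding="utf-8").read()) and pass the required column names for that file.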

 

The format of each of these files is explained in detail below. The files may contain additional columns, but they must contain the variables described here.

 

 

1. Species file

 

The columns spcode, genus, species, and idlevel are required; the other two (family and authority) are optional.

 

spcode – a code used in the field to identify the species of the tree

genus – the taxonomic genus name (in case of unknown genus, please use “Unidentified”)

species – the taxonomic species name

family – the taxonomic family name

authority – (optional) author of the species name

idlevel – to indicate whether the species is a valid species or a morphospecies, and if the latter, identified to what level. The possibilities are:

 

species

subspecies

genus

family

none

multiple

 

The last value, multiple, indicates that the species code may refer to more than one morphospecies.

 

Please check:

a. that this is a complete list of the species codes used in the datasets.

b. that there are no misspellings.
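The idlevel values and the completeness of the species list can be checked mechanically. This is a sketch only; the function name check_species and the representation of rows as dictionaries keyed by column name are assumptions made for illustration. Misspellings in taxonomic names still need to be checked by eye.

```python
# The six idlevel values listed above.
VALID_IDLEVELS = {"species", "subspecies", "genus", "family", "none", "multiple"}


def check_species(species_rows, census_spcodes):
    """species_rows: list of dicts keyed by column name.
    census_spcodes: every spcode that appears in the census files."""
    problems = []
    known = set()
    for row in species_rows:
        code = row["spcode"]
        known.add(code)
        if row["idlevel"] not in VALID_IDLEVELS:
            problems.append(f"{code}: unknown idlevel {row['idlevel']!r}")
    # Codes used in a census but absent from the species file.
    for code in sorted(set(census_spcodes) - known):
        problems.append(f"spcode used in census but not in species file: {code}")
    return problems
```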

 

 

2. Tree measurement codes

 

The following two fields are required.

 

code – a code of one or more letters, as used in the codes field of the census tables

description – a brief description of what the code means

 

Please check that this is a complete list of codes used in all the census datasets.

 

The following descriptions are required in your list (the code you decide to use is up to you):

 

  1. main – to indicate the main stem in multiple-stemmed trees; that is, one of the codes should have the description main

 

     For datasets with more than one census:

 

  1. dead – this code should indicate that the entire tree is dead. Do not use the word dead in the description of any other code. If only one stem is “dead” and the tree has other live stems, use the following code (stem lost) or some other description.
  2. stem lost – to indicate that a stem that was previously 1 cm dbh or above broke off and is still alive but is now < 1 cm dbh
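The required descriptions can be verified with a few lines of code. The sketch below assumes that descriptions are compared exactly, ignoring case and surrounding spaces; the function name check_code_descriptions is hypothetical.

```python
def check_code_descriptions(code_rows, multiple_censuses):
    """code_rows: list of dicts with the keys 'code' and 'description'."""
    problems = []
    # Assumption: descriptions match exactly, ignoring case and outer spaces.
    descriptions = [row["description"].strip().lower() for row in code_rows]
    required = ["main"]
    if multiple_censuses:
        required += ["dead", "stem lost"]
    for name in required:
        if name not in descriptions:
            problems.append(f"no code with description {name!r}")
    if multiple_censuses:
        # The word "dead" is reserved for the whole-tree code.
        for row, desc in zip(code_rows, descriptions):
            if desc != "dead" and "dead" in desc.split():
                problems.append(
                    f"code {row['code']}: description {desc!r} uses the word dead"
                )
    return problems
```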

 

 

3. Quadrat file

 

This contains a complete list of the quadrat names used in the plot. The following fields are required.

 

quadrat – the name of the quadrat

startx – the x coordinate of the lower left corner of the quadrat

starty – the y coordinate of the lower left corner of the quadrat

dimx – the x dimension of the quadrat

dimy – the y dimension of the quadrat

 

The x and y coordinates (startx and starty) refer to the distance in meters of the quadrat from the lower left corner of the entire plot.

 

Most of the sites have 20m x 20m quadrats and use the same naming system (0000, 0001, ….0024, 0100, 0101, ….4900, 4901, ….4924) for a 1000m x 500m plot.

 

However, some sites may use 10m x 10m quadrats or name their quadrats starting from 0101 (no 00 used). There is at least one site where quadrat 0000 refers to a quadrat in the center of the plot.

 

This table should use the quadrat names used in your plot, whatever the naming scheme, provided that the startx and starty of each quadrat give its position relative to the lower left corner of the entire plot, which is assumed to have x,y coordinates 0,0.
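For sites that follow the common 20m x 20m naming system, the quadrat table can be generated rather than typed. The sketch below assumes that the first two digits of the name count quadrats along x and the last two along y, which is consistent with the sequence 0000 to 4924 for a 1000m x 500m plot; sites with other naming schemes need to build the table differently. The function name quadrat_grid is hypothetical.

```python
def quadrat_grid(plot_x=1000, plot_y=500, size=20):
    """Build quadrat rows for a plot with square quadrats (all units in meters)."""
    rows = []
    for col in range(plot_x // size):        # index along x: 00..49 by default
        for row in range(plot_y // size):    # index along y: 00..24 by default
            rows.append({
                "quadrat": f"{col:02d}{row:02d}",
                "startx": col * size,
                "starty": row * size,
                "dimx": size,
                "dimy": size,
            })
    return rows
```

With the default arguments this produces 1250 rows, from quadrat 0000 at (0, 0) to quadrat 4924 at (980, 480).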

 

 

4. Personnel file

 

This file contains the names of the people who are or were involved with the plot, and the role that they played. The first and last names should be separate. The personnel include the field technicians, the data entry technicians, the supervisors, students, volunteers, as well as the principal investigator, among others.

 

firstname – the first name of the person

lastname – the last name of the person

role – the role the person played in the census. This should match exactly one of the descriptions in the RoleReference table.

 

If a person has more than one role (for example, a field technician in one census who was promoted to field supervisor in a later census), then that name should be entered once for each role.

 

 

5. Census data

 

Bring the tree data from each census in a separate file. Each file must have the fields listed below. The columns can be in any order. You may have extra fields in the dataset, but they will not be uploaded into the tables of the database.

 

tag – the tag of the tree (should be unique)

 

stemtag – the tag of the stem. If your site does not use stem tags, you may leave this column blank. The header, however, should still include this variable name.

 

spcode – the species code of the tree. All species codes should appear in the species file.

 

quadrat – the name of the quadrat the tree is located in

 

lx – the x coordinate in meters of the tree within its quadrat.

ly – the y coordinate in meters of the tree within its quadrat.

These are the coordinates that result from digitizing.
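Because lx and ly are measured within the quadrat, the position of a tree in the whole plot can be derived from the quadrat file. The sketch below assumes that the plot coordinate is simply startx + lx (and starty + ly), and that a local coordinate must lie between 0 and the quadrat dimension inclusive; both are assumptions made for illustration, as is the function name plot_coordinates.

```python
def plot_coordinates(quadrat, lx, ly):
    """quadrat: dict with startx, starty, dimx, dimy (meters).
    Returns the (x, y) position of the tree relative to the plot origin."""
    # Assumption: local coordinates may touch the quadrat edge (inclusive bounds).
    if not (0 <= lx <= quadrat["dimx"] and 0 <= ly <= quadrat["dimy"]):
        raise ValueError(f"local coordinates ({lx}, {ly}) fall outside the quadrat")
    return quadrat["startx"] + lx, quadrat["starty"] + ly
```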

 

dbh – the diameter of the tree. If there is no diameter measurement (missing, dead, or resprout), enter NA or NULL.

 

codes – tree or measurement codes. If there is more than one code, they must be separated by semicolons. This allows for codes with more than one letter. Every code should be accounted for in the tree measurement codes file in (2) above. The codes field may be left blank if there are no codes.
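Splitting the codes field on semicolons and comparing each code against the codes file can be sketched as follows. The function names are hypothetical, and stripping spaces around each code is an assumption.

```python
def parse_codes(field):
    """Split a codes field on semicolons; a blank field yields an empty list."""
    return [code.strip() for code in field.split(";") if code.strip()]


def unknown_codes(field, known_codes):
    """Return the codes in the field that are not listed in the codes file."""
    return [code for code in parse_codes(field) if code not in known_codes]
```

For example, with the hypothetical codes M and DS, parse_codes("M;DS") gives two codes, whereas an unseparated "MDS" would be read as a single code.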

 

hom – height (in meters) at which the diameter was measured, if different from 1.3 m. You may leave this field blank if the stem was measured at 1.3 m, and fill it in only when the hom differs from 1.3 m.

 

date – date the stem was measured. This date should be in yyyy-mm-dd format. For example, 2011-02-24.
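The rules for dbh, hom, and date can be expressed as three small parsing functions. This is a sketch under stated assumptions: the function names are hypothetical, a blank hom is read as 1.3 m, and the date must be zero-padded exactly as in the example above.

```python
import re
from datetime import datetime


def parse_dbh(value):
    """Return the diameter as a float, or None when it is NA or NULL."""
    value = value.strip()
    if value in ("NA", "NULL"):
        return None
    return float(value)


def parse_hom(value, default=1.3):
    """Return the height of measurement in meters; blank means 1.3 m."""
    value = value.strip()
    return default if value == "" else float(value)


def parse_date(value):
    """Return a date object; raise ValueError unless the format is yyyy-mm-dd."""
    value = value.strip()
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        raise ValueError(f"date not in yyyy-mm-dd format: {value!r}")
    # strptime also rejects impossible dates such as 2011-02-30.
    return datetime.strptime(value, "%Y-%m-%d").date()
```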

 

Note that all stems of a multiple-stemmed tree should be included in these files; you may indicate in the codes field which one is the main stem. If the tree has only one stem, you do not have to include the main stem code. The rest of the information should be repeated for each stem. Make sure that the information (species code, date, etc.) is the same for all stems of the same tree.
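Consistency across the stems of one tree can be checked by grouping rows by tag. In the sketch below, the main stem code "M" is a placeholder (each site chooses its own code), the set of fields that must agree (spcode, quadrat, date) is an assumption based on "species code, date, etc.", and the function name check_multi_stem is hypothetical.

```python
from collections import defaultdict


def check_multi_stem(rows, main_code="M", shared=("spcode", "quadrat", "date")):
    """rows: list of dicts for one census, one row per stem."""
    problems = []
    by_tag = defaultdict(list)
    for row in rows:
        by_tag[row["tag"]].append(row)
    for tag, stems in by_tag.items():
        # Fields in `shared` must be identical for all stems of the tree.
        for field in shared:
            values = {stem[field] for stem in stems}
            if len(values) > 1:
                problems.append(
                    f"tree {tag}: {field} differs between stems: {sorted(values)}"
                )
        # At most one stem of a tree should carry the main stem code.
        mains = sum(main_code in stem["codes"].split(";") for stem in stems)
        if mains > 1:
            problems.append(f"tree {tag}: {mains} stems carry the main stem code")
    return problems
```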

 

The dataset for the first census should contain only trees and stems from that census. The dataset for each subsequent census should contain the stems from the previous census (including those that died) as well as new recruits. All dead trees must have the code dead in the codes field.
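Whether a later census carries over every stem from the previous one can be checked by comparing the two files. The sketch below identifies a stem by the pair (tag, stemtag), which is an assumption, and the function name compare_censuses is hypothetical. It does not verify that dead trees carry the dead code, since that cannot be determined from the files alone.

```python
def compare_censuses(previous, current):
    """previous, current: lists of dicts, one row per stem.
    Returns stems missing from the later census and stems new to it."""
    # Assumption: a stem is identified by its tree tag plus its stem tag.
    prev_keys = {(row["tag"], row["stemtag"]) for row in previous}
    curr_keys = {(row["tag"], row["stemtag"]) for row in current}
    return {
        "missing": sorted(prev_keys - curr_keys),   # should normally be empty
        "recruits": sorted(curr_keys - prev_keys),  # new in the later census
    }
```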