The Clostridium thermocellum F1 celJ gene, encoding endoglucanase J (CelJ), consists of an open reading frame (ORF) of 4,803 nucleotides and encodes a protein of 1,601 amino acids with a molecular weight of 178,055. The ORF was confirmed as celJ by comparison with the N-terminal sequence of a truncated CelJ derivative. CelJ is a modular enzyme composed of an N-terminal signal peptide and six domains in the following order: an S-layer homology domain, a domain of unknown function (UD-1), a subfamily E1 endoglucanase domain, a family J endoglucanase domain, a docking domain, and another domain of unknown function (UD-2). UD-1 has no significant similarity to UD-2. CelJ hydrolyzed carboxymethylcellulose and xylan, and the xylanase activity was ascribed to the family J domain. Antiserum raised against the truncated CelJ cross-reacted with proteins in the cellulosome of C. thermocellum F1. These results strongly suggest that CelJ is equivalent to S2, which was identified as the largest catalytic component in the cellulosome of C. thermocellum YS. A second, incomplete ORF, encoding an enzyme classified as a subfamily E2 endoglucanase, was located downstream of celJ.