The last few months have seen an evolution in the COBOL compiler. Until now it could use either CP1252/ASCII or CP1140/EBCDIC to represent alphanumeric variables and numeric types that are stored as character strings. With these changes, those types can be represented in many other single-byte encodings, as well as in UTF-16 and UTF-32.

This required extensive changes:

1) The initial parsing has to handle the extended capabilities.
2) Each run-time variable designates its character set.
3) The run-time code has to be able to handle wide characters.

Because the development took place over several months, other changes crept in. In particular, more bindings make certain POSIX functions available to the COBOL programmer, and gcobol makes wider use of the GCC diagnostic framework.

Co-Authored-By: Robert Dubner <rdubner@symas.com>
Co-Authored-By: James K. Lowden <jklowden@cobolworx.com>

gcc/cobol/ChangeLog:

	* cbldiag.h (struct cbl_loc_t): Diagnostics.
	(enum cbl_diag_id_t): Diagnostics.
	* cdf.y: Includes.
	* cobol1.cc (cobol_warning_suppress): Diagnostics.
	(cobol_langhook_handle_option): Implement -fexec-charset.
	Expand the use of diagnostics.
	* gcobc: Expand options and warnings.
	* gcobol.1: Documentation.
	* genapi.cc (level_88_helper): Charsets.
	(get_level_88_domain): Charsets.
	(get_class_condition_string): Charsets.
	(function_pointer_from_name): Charsets.
	(initialize_variable_internal): Charsets.
	(parser_initialize): Charsets.
	(get_binary_value_from_float): Charsets.
	(get_bytes_needed): Charsets.
	(cobol_compare): Charsets.
	(move_tree): Eliminate function.
	(move_tree_to_field): Eliminate function.
	(get_string_from): Eliminate function.
	(parser_init_list): Charsets.
	(psa_FldLiteralN): Charsets.
	(parser_accept_date_yymmdd): Charsets.
	(parser_accept_date_yyyymmdd): Charsets.
	(parser_accept_date_yyddd): Charsets.
	(parser_accept_date_yyyyddd): Charsets.
	(parser_accept_date_dow): Charsets.
	(parser_accept_date_hhmmssff): Charsets.
	(parser_alphabet): Charsets.
	(parser_alphabet_use): Charsets.
	(parser_display_internal): Charsets.
	(get_literalN_value): Charsets.
	(tree_type_from_field_type): Charsets.
	(program_end_stuff): Charsets.
	(walk_initialization): Charsets.
	(parser_xml_parse): Charsets.
	(initialize_the_data): Charsets.
	(establish_using): Charsets.
	(parser_setop): Charsets.
	(parser_set_conditional88): Charsets.
	(parser_file_add): Charsets.
	(get_the_filename): Eliminate function.
	(parser_file_open): Charsets.
	(parser_file_delete_file): Charsets.
	(parser_file_start): Charsets.
	(parser_module_name): Charsets.
	(parser_intrinsic_find_string): New function.
	(parser_intrinsic_numval_c): Charsets.
	(parser_intrinsic_convert): New function.
	(parser_intrinsic_call_1): Charsets.
	(create_and_call): Charsets.
	(mh_identical): Charsets.
	(mh_source_is_literalN): Charsets.
	(float_type_of): Charsets.
	(mh_dest_is_float): Charsets.
	(mh_numeric_display): Charsets.
	(mh_little_endian): Charsets.
	(mh_source_is_group): Charsets.
	(mh_source_is_literalA): Charsets.
	(move_helper): Charsets.
	(binary_initial): Eliminate function.
	(digits_from_int128): Eliminate function.
	(digits_from_float128): Eliminate function.
	(initial_from_initial): Eliminate function.
	(convert_data_initial): New function.
	(actually_create_the_static_field): Charsets.
	(psa_new_var_decl): Charsets.
	(psa_FldLiteralA): Charsets.
	(parser_local_add): Charsets.
	(parser_symbol_add): Charsets.
	* genapi.h (parser_intrinsic_convert): New function.
	(parser_intrinsic_find_string): New function.
	* genmath.cc (arithmetic_operation): Charsets.
	(largest_binary_term): Charsets.
	(fast_add): Charsets.
	(fast_subtract): Charsets.
	(fast_multiply): Charsets.
	(fast_divide): Charsets.
	(parser_subtract): Fix subtract float from float.
	* genutil.cc (get_any_capacity): Charsets.
	(get_and_check_refstart_and_reflen): Charsets.
	(get_data_offset): Charsets.
	(get_binary_value): Charsets.
	(tree_type_from_field): Charsets.
	(copy_little_endian_into_place): Charsets.
	(get_literal_string): Charsets.
	(refer_is_clean): Charsets.
	(refer_fill_depends): Charsets.
	(refer_size_source): Comment.
	* lang-specs.h: Charsets.
	* lang.opt: Charsets.
	* lexio.cc (parse_copy_directive): Diagnostics.
	* messages.cc (cbl_diagnostic_kind): Diagnostics.
	(cobol_warning_suppress): Diagnostics.
	* parse.y: Many changes for charsets and diagnostics.
	* parse_ante.h (MAXLENGTH_FORMATTED_DATE): Charsets.
	(MAXLENGTH_FORMATTED_TIME): Charsets.
	(MAXLENGTH_CALENDAR_DATE): Charsets.
	(MAXLENGTH_FORMATTED_DATETIME): Charsets.
	(consistent_encoding_check): Charsets.
	(enum data_clause_t): Charsets.
	(new_alphanumeric): Charsets.
	(name_of): Charsets.
	(class eval_subject_t): Charsets.
	(struct domain_t): Charsets.
	(struct file_list_t): Charsets.
	(current_encoding): Charsets.
	(new_tempnumeric): Charsets.
	(is_integer_literal): Charsets.
	(new_literal): Charsets.
	(new_constant): Charsets.
	(conditional_set): Charsets.
	(field_find): Charsets.
	(valid_redefine): Charsets.
	(field_value_all): Charsets.
	(parent_has_picture): Charsets.
	(parent_has_value): Charsets.
	(blank_pad_initial): Charsets.
	(blankit): Charsets.
	(cbl_field_t::blank_initial): Charsets.
	(value_encoding_check): Charsets.
	(cbl_field_t::set_initial): Charsets.
	(field_alloc): Charsets.
	(parser_move_carefully): Charsets.
	(data_division_ready): Charsets.
	(anybody_redefines): Charsets.
	(procedure_division_ready): Charsets.
	(file_section_parent_set): Charsets.
	(field_binary_usage): Charsets.
	(goodnight_gracie): Formatting.
	* scan.l: Charsets.
	* scan_ante.h (numstr_of): Charsets.
	(typed_name): Charsets.
	* show_parse.h: Charsets.
	* structs.cc (create_cblc_file_t): Charsets.
	* symbols.cc (symbol_table_extend): Charsets.
	(WARNING_FIELD): Diagnostics.
	(constq): Charsets.
	(elementize): Charsets.
	(field_size): Charsets.
	(cbl_field_t::set_attr): Eliminate run-time component.
	(cbl_field_t::clear_attr): Eliminate run-time component.
	(field_memsize): Charsets.
	(cbl_encoding_str): Charsets.
	(symbols_dump): Charsets.
	(is_variable_length): Formatting.
	(field_str): Charsets.
	(extend_66_capacity): Charsets.
	(operator<<): Charsets.
	(symbols_update): Charsets.
	(symbol_field_parent_set): Charsets.
	(symbol_table_init): Charsets.
	(numeric_group_attrs): Charsets.
	(symbol_field_add): Charsets.
	(symbol_field_alias): Charsets.
	(fd_record_size_cmp): Charsets.
	(symbol_file_record_sizes): Charsets.
	(cbl_alphabet_t::reencode): Charsets.
	(symbol_temporary_location): Charsets.
	(new_literal_2): Charsets.
	(new_alphanumeric): Charsets.
	(standard_internal): Charsets.
	(cbl_field_t::codeset_t::stride): Charsets.
	(cobol_alpha_encoding): Charsets.
	(cobol_national_encoding): Charsets.
	(new_temporary): Charsets.
	(new_literal_float): Charsets.
	(cbl_field_t::is_ascii): Charsets.
	(cbl_field_t::internalize): Eliminate function.
	(cbl_field_t::source_code_check): Charsets.
	(iconv_cd): Charsets.
	(cbl_field_t::encode): New function for charsets.
	(cbl_field_t::set_capacity): Charsets.
	(cbl_field_t::add_capacity): Charsets.
	(cbl_field_t::char_capacity): Charsets.
	(symbol_label_section_exists): Charsets.
	(size): Charsets.
	(validate_numeric_edited): Charsets.
	* symbols.h (cobol_alpha_encoding): Charsets.
	(cobol_national_encoding): Charsets.
	(consistent_encoding_check): Charsets.
	(class cbl_domain_elem_t): Charsets.
	(struct cbl_domain_t): Charsets.
	(struct cbl_field_data_t): Charsets.
	(class cbl_field_data_t): Charsets.
	(struct cbl_subtable_t): Charsets.
	(struct cbl_field_t): Charsets.
	(new_literal_float): Charsets.
	(new_temporary): Charsets.
	(new_literal_2): Charsets.
	(symbol_temporary_location): Charsets.
	(class temporaries_t): Charsets.
	(struct symbol_elem_t): Charsets.
	(symbol_elem_of): Charsets.
	(symbol_unique_index): Charsets.
	(cbl_field_type_name): Charsets.
	(validate_numeric_edited): Charsets.
	* token_names.h: Charsets.
	* util.cc (cdf_literalize): Charsets.
	(cbl_field_type_name): Charsets.
	(determine_intermediate_type): Charsets.
	(is_alpha_edited): Charsets.
	(cbl_field_data_t::is_alpha_edited): Charsets.
	(symbol_field_type_update): Charsets.
	(redefine_field): Charsets.
	(FIXED_WIDE_INT): Charsets.
	(dirty_to_binary): Charsets.
	(digits_from_int128): Charsets.
	(binary_initial): Charsets.
	(cbl_field_t::encode_numeric): Charsets.
	(FOR_JIM): Temporary conditional demonstration code.
	(parse_error_inc): Diagnostics.
	(parse_error_count): Diagnostics.
	(cbl_field_t::report_invalid_initial_value): Diagnostics.
	(valid_move): Diagnostics.
	(type_capacity): Charsets.
	(symbol_unique_index): New function.
	(cbl_unimplementedw): Formatting.

libgcobol/ChangeLog:

	* charmaps.cc (__gg__encoding_iconv_name): Charsets.
	(__gg__encoding_iconv_valid): Charsets.
	(__gg__encoding_iconv_type): Charsets.
	(encoding_descr): Charsets.
	(__gg__encoding_iconv_descr): Charsets.
	(__gg__iconverter): Charsets.
	(__gg__miconverter): Charsets.
	* charmaps.h (NOT_A_CHARACTER): Charsets.
	(ascii_nul): Charsets.
	(ascii_bang): Charsets.
	(__gg__encoding_iconv_type): Charsets.
	(__gg__iconverter): Charsets.
	(__gg__miconverter): Charsets.
	(DEFAULT_32_ENCODING): Charsets.
	(class charmap_t): Charsets.
	(__gg__get_charmap): Charsets.
	* common-defs.h (enum cbl_field_attr_t):
	(enum cbl_figconst_t): Formatting.
	(LOW_VALUE_E): Handle enum arithmetic.
	(ZERO_VALUE_E): Handle enum arithmetic.
	(SPACE_VALUE_E): Handle enum arithmetic.
	(QUOTE_VALUE_E): Handle enum arithmetic.
	(HIGH_VALUE_E): Handle enum arithmetic.
	(enum convert_type_t): Enum for new FUNCTION CONVERT.
	(struct cbl_declarative_t): Formatting.
	* encodings.h (struct encodings_t): Charsets.
	* gcobolio.h: Charsets.
	* gfileio.cc (get_filename): Rename to establish_filename.
	(establish_filename): Renamed from get_filename.
	(relative_file_delete): Charsets.
	(__io__file_remove): Moved.
	(trim_in_place): Charsets.
	(relative_file_start): Charsets.
	(relative_file_rewrite): Charsets.
	(relative_file_write): Charsets.
	(sequential_file_write): Charsets.
	(line_sequential_file_read): Charsets.
	(sequential_file_read): Charsets.
	(relative_file_read): Charsets.
	(__gg__file_reopen): Charsets.
	(__io__file_open): Charsets.
	(__io__file_close): Charsets.
	(gcobol_fileops): Charsets.
	(__gg__file_open): Charsets.
	(__gg__file_remove): Charsets.
	* gfileio.h (__gg__file_open): Charsets.
	* gmath.cc (__gg__subtractf1_float_phase2): Comment.
	(__gg__subtractf2_float_phase1): Comment.
	(__gg__multiplyf1_phase2): Comment.
	* intrinsic.cc (is_zulu_format): Charsets.
	(string_to_dest): Charsets.
	(get_all_time): Charsets.
	(ftime_replace): Charsets.
	(__gg__char): Charsets.
	(__gg__current_date): Charsets.
	(__gg__formatted_current_date): Charsets.
	(__gg__formatted_date): Charsets.
	(__gg__formatted_datetime): Charsets.
	(__gg__formatted_time): Charsets.
	(change_case): Charsets.
	(__gg__upper_case): Charsets.
	(numval): Charsets.
	(numval_c): Charsets.
	(__gg__trim): Charsets.
	(__gg__reverse): Charsets.
	(fill_cobol_tm): Charsets.
	(__gg__seconds_from_formatted_time): Charsets.
	(__gg__hex_of): Charsets.
	(__gg__numval_f): Charsets.
	(__gg__test_numval_f): Charsets.
	(__gg__locale_date): Charsets.
	(__gg__locale_time): Charsets.
	(__gg__locale_time_from_seconds): Charsets.
	* libgcobol.cc (NO_RDIGITS): Alias for (0).
	(__gg__move): Forward reference.
	(struct program_state): Charsets.
	(cstrncmp): Charsets.
	(__gg__init_program_state): Charsets.
	(edited_to_binary): Charsets.
	(var_is_refmod): Comment.
	(__gg__power_of_ten): Reworked data initialization.
	(__gg__scale_by_power_of_ten_1): Likewise.
	(__gg__scale_by_power_of_ten_2): Likewise.
	(value_is_too_big): Likewise.
	(binary_to_big_endian): Likewise.
	(binary_to_little_endian): Likewise.
	(int128_to_int128_rounded): Likewise.
	(get_binary_value_local): Likewise.
	(get_init_value): Likewise.
	(f128_to_i128_rounded): Likewise.
	(__gg__initialization_values): Likewise.
	(int128_to_field): Likewise.
	(__gg__get_date_yymmdd): Charsets.
	(__gg__field_from_string): Charsets.
	(field_from_ascii): Charsets.
	(__gg__get_date_yyyymmdd): Charsets.
	(__gg__get_date_yyddd): Charsets.
	(__gg__get_yyyyddd): Charsets.
	(__gg__get_date_dow): Charsets.
	(__gg__get_date_hhmmssff): Charsets.
	(collation_position): Charsets.
	(uber_compare): Charsets.
	(__gg__dirty_to_binary): Charsets.
	(__gg__dirty_to_float): Charsets.
	(format_for_display_internal): Charsets.
	(compare_88): Charsets.
	(get_float128): Reworked.
	(compare_field_class): Charsets.
	(interconvert): Charsets.
	(compare_strings): Charsets.
	(__gg__compare_2): Charsets.
	(compare_two_records): Charsets.
	(__gg__sort_table): Charsets.
	(init_var_both): Charsets.
	(__gg__initialize_variable_clean): Charsets.
	(alpha_to_alpha_move_from_location): Charsets.
	(__gg__memdup): New function.
	(alpha_to_alpha_move): Charsets.
	(__gg__sort_workfile): Charsets.
	(__gg__merge_files): Charsets.
	(funky_find_wide): Charsets.
	(funky_find_wide_backward): Charsets.
	(normalize_id): Charsets.
	(match_lengths): Charsets.
	(the_alpha_and_omega): Charsets.
	(the_alpha_and_omega_backward): Charsets.
	(inspect_backward_format_1): Charsets.
	(__gg__inspect_format_1): Charsets.
	(inspect_backward_format_2): Charsets.
	(__gg__inspect_format_2): Charsets.
	(normalize_for_inspect_format_4): Charsets.
	(__gg__inspect_format_4): Charsets.
	(move_string): Charsets.
	(brute_force_trim): Charsets.
	(__gg__string): Charsets.
	(display_both): Charsets.
	(__gg__display_string): Charsets.
	(__gg__bitwise_op): Charsets.
	(is_numeric_display_numeric): Charsets.
	(is_alpha_a_number): Charsets.
	(classify_numeric_type): Charsets.
	(classify_alphabetic_type): Charsets.
	(__gg__classify): Charsets.
	(__gg__convert_encoding): Charsets.
	(accept_envar): Charsets.
	(__gg__accept_envar): Charsets.
	(__gg__get_argc): Charsets.
	(__gg__get_argv): Charsets.
	(__gg__get_command_line): Charsets.
	(__gg__parser_set_conditional): Charsets.
	(__gg__literaln_alpha_compare): Charsets.
	(string_in): Charsets.
	(__gg__unstring): Charsets.
	(__gg__integer_from_float128): Charsets.
	(__gg__adjust_dest_size): Charsets.
	(__gg__just_mangle_name): Charsets.
	(__gg__function_handle_from_name): Charsets.
	(get_the_byte): Charsets.
	(__gg__refer_from_string): Charsets.
	(__gg__refer_from_psz): Charsets.
	(__gg__find_string): Charsets.
	(convert_for_convert): Charsets.
	(__gg__convert): Charsets.
	* libgcobol.h (__gg__compare_2): Charsets.
	(__gg__field_from_string): Charsets.
	(__gg__memdup): Charsets.
	* posix/bin/Makefile: Posix bindings.
	* posix/bin/scrape.awk: Posix bindings.
	* posix/bin/udf-gen: Posix bindings.
	* posix/udf/posix-lseek.cbl: Posix bindings.
	* posix/udf/posix-unlink.cbl: Posix bindings.
	* stringbin.cc (__gg__binary_to_string_encoded): Charsets.
	(__gg__numeric_display_to_binary): Charsets.
	* stringbin.h (__gg__binary_to_string_encoded): Charsets.
	* valconv.cc (__gg__string_to_numeric_edited): Charsets.
	* posix/cpy/psx-lseek.cpy: New file.
	* posix/shim/lseek.cc: New file.

gcc/testsuite/ChangeLog:

	* cobol.dg/group2/CHAR_and_ORD_with_COLLATING_sequence_-_EBCDIC.cob:
	Change diagnostics message.
	* cobol.dg/group2/Multi-target_MOVE_with_subscript_re-evaluation.cob:
	Change diagnostics message.
	* cobol.dg/group2/floating-point_SUBTRACT_FORMAT_2.out:
	Change diagnostics message.
	* cobol.dg/group2/floating-point_literals.out:
	Change diagnostics message.
GCC COBOL POSIX Functions and Adapter
Purpose
GCC COBOL provides COBOL bindings for some POSIX functions. Feel free to contribute more. Insofar as possible, the functions take the same parameters and return the same values as defined by POSIX. Among others, they are used by the COBOL compatibility library (see libgcobol/compat/lib/gnu). They are installed in source form. The user may choose to compile them to a library.
ISO COBOL does not specify any relationship to any particular
operating system, and does not reference POSIX. The raw capability is
there, of course, via the CALL statement. But that's not very
convenient, and offers no parameter validation.
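As a rough illustration of what a binding adds over a raw CALL, here is a C sketch. A COBOL alphanumeric field is fixed-length and padded with trailing spaces rather than NUL-terminated, so something has to trim and terminate a file name before it reaches unlink(2), and reject values a C string cannot carry. The names cobol_field_to_cstr and cobol_unlink are hypothetical; the actual GCC COBOL bindings are COBOL UDFs and may perform these steps differently.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: copy a fixed-length, space-padded COBOL
   alphanumeric field into a NUL-terminated C string.
   Returns 0 on success; on failure returns -1 and sets errno to
   EINVAL (embedded NUL) or ENAMETOOLONG (result does not fit). */
int cobol_field_to_cstr(const char *field, size_t len, char *out, size_t outlen)
{
    /* Trailing spaces are padding, not part of the value. */
    while (len > 0 && field[len - 1] == ' ')
        len--;

    /* A C string cannot carry an embedded NUL. */
    if (memchr(field, '\0', len) != NULL) {
        errno = EINVAL;
        return -1;
    }
    if (len >= outlen) {          /* need room for the terminator */
        errno = ENAMETOOLONG;
        return -1;
    }
    memcpy(out, field, len);
    out[len] = '\0';
    return 0;
}

/* Hypothetical wrapper: validate and convert the COBOL field, then make
   the same call a raw CALL "unlink" would make. Returns unlink's result. */
int cobol_unlink(const char *field, size_t len)
{
    char path[4096];
    if (cobol_field_to_cstr(field, len, path, sizeof path) != 0)
        return -1;
    return unlink(path);
}
```

With a raw CALL, each program would have to repeat this trimming and checking itself; a binding does it once.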
For simple functions, e.g. unlink(2), the UDFs simply call the
underlying C library. More complex functions, though,
e.g. stat(2), pass or return a buffer. POSIX specifies which members
that buffer must have, but leaves its exact layout up to the C
implementation, where it is defined by the C header files, which are not
parsed by GCC COBOL. Consequently we do not know, at the COBOL level,
how to define the struct stat buffer required by stat(2). For
such functions, we use a C "shim" function that accepts a buffer
defined by GCC COBOL. That buffer has the members defined by POSIX
and a layout defined by GCC COBOL. The COBOL application calls the
COBOL POSIX binding, which uses the shim function to call the C
library.
To take stat(2) as an example:

    COBOL program uses
        COPY posix-stat.

        01 stat-buf.
            COPY posix-statbuf.  *> gcc/cobol/posix/cpy

        FUNCTION POSIX-STAT(filename, stat-buf)

    libgcobol/posix/udf/posix-stat.cbl
        passes stat-buf to posix_stat in libgcobol

    posix_stat calls stat(2)
        and copies the returned values to its input buffer
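The shim step in the example above can be sketched in C. Everything here is hypothetical: struct cobol_statbuf and posix_stat_shim are invented names, and the member widths and order are assumptions, not the layout of the posix-statbuf copybook or of posix_stat in libgcobol. The sketch shows only the pattern: call stat(2) with the C library's own struct stat, then copy each member into a buffer whose layout is fixed by the binding rather than by /usr/include.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical fixed-layout buffer. Member names echo struct stat, but
   widths and order are chosen here, not by the C library: five 8-byte
   members, then four 4-byte members, so common ABIs add no implicit
   padding (56 bytes in total). */
struct cobol_statbuf {
    uint64_t dev;
    uint64_t ino;
    uint64_t nlink;
    int64_t  size;
    int64_t  mtime;     /* seconds since the Epoch */
    uint32_t mode;
    uint32_t uid;
    uint32_t gid;
    uint32_t reserved;  /* explicit filler keeps the size a multiple of 8 */
};

/* Hypothetical shim: call stat(2) with the C library's struct stat,
   then copy each member into the caller's fixed-layout buffer.
   Returns 0 on success, or -1 with errno set by stat(2). */
int posix_stat_shim(const char *path, struct cobol_statbuf *out)
{
    struct stat sb;
    if (stat(path, &sb) != 0)
        return -1;

    memset(out, 0, sizeof *out);
    out->dev   = (uint64_t)sb.st_dev;
    out->ino   = (uint64_t)sb.st_ino;
    out->nlink = (uint64_t)sb.st_nlink;
    out->size  = (int64_t)sb.st_size;
    out->mtime = (int64_t)sb.st_mtime;
    out->mode  = (uint32_t)sb.st_mode;
    out->uid   = (uint32_t)sb.st_uid;
    out->gid   = (uint32_t)sb.st_gid;
    return 0;
}
```

Because every member has an explicit width and position, a copybook can describe the same bytes with fixed-size binary items; the remaining detail to get right on the COBOL side is native byte order.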
Contents
The installed POSIX bindings and associated copybooks are in cpy and udf:
    cpy/   copybooks used by functions in udf
    udf/   COBOL POSIX bindings
    t/     simple tests demonstrating use of functions in udf
Any buffer shared between the COBOL application and a COBOL POSIX
function is defined in cpy/. While these buffers meet the POSIX
descriptions -- meaning they have members matching the standard --
they probably do not match the buffer defined by the C library in
/usr/include. GCC COBOL does not parse C, including C header files,
so it has no access to those C buffer definitions.
The machine-shop tools are in bin/.
    bin/          developer tools to aid creation of POSIX bindings
      scrape.awk  extracts function prototypes from the SYNOPSIS of a man page
      udf-gen     reads function declarations and, for each one, produces a
                  COBOL User Defined Function (UDF) that calls the function
Finally,
    shim/   C support for POSIX bindings, incorporated in libgcobol
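To make the input to udf-gen concrete, here is a toy C sketch that splits a single-line prototype, of the kind scrape.awk extracts from a SYNOPSIS, into return type, function name, and parameter list. It is not how udf-gen works: udf-gen uses pycparser, a real C parser. The names struct proto and parse_proto are hypothetical, and function-pointer parameters and multi-line declarations are deliberately out of scope.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical parsed form of a prototype such as
   "int unlink(const char *pathname);" */
struct proto {
    char ret[64];     /* return type, e.g. "int" */
    char name[64];    /* function name, e.g. "unlink" */
    char params[256]; /* text between the parentheses */
};

/* Copy n bytes of src into dst (capacity cap), trimming surrounding blanks.
   Returns 0 on success, -1 if the trimmed text does not fit. */
static int copy_trimmed(char *dst, size_t cap, const char *src, size_t n)
{
    while (n > 0 && isspace((unsigned char)*src)) { src++; n--; }
    while (n > 0 && isspace((unsigned char)src[n - 1])) n--;
    if (n >= cap)
        return -1;
    memcpy(dst, src, n);
    dst[n] = '\0';
    return 0;
}

/* Split a single-line C prototype into its parts.
   Returns 0 on success, -1 if the line does not look like a prototype. */
int parse_proto(const char *line, struct proto *p)
{
    const char *open = strchr(line, '(');
    const char *close = strrchr(line, ')');
    if (open == NULL || close == NULL || close < open)
        return -1;

    /* The name is the identifier immediately before '('. */
    const char *end = open;
    while (end > line && isspace((unsigned char)end[-1]))
        end--;
    const char *start = end;
    while (start > line && (isalnum((unsigned char)start[-1]) || start[-1] == '_'))
        start--;
    if (start == end)
        return -1;

    if (copy_trimmed(p->ret, sizeof p->ret, line, (size_t)(start - line)) != 0 ||
        copy_trimmed(p->name, sizeof p->name, start, (size_t)(end - start)) != 0 ||
        copy_trimmed(p->params, sizeof p->params, open + 1,
                     (size_t)(close - open - 1)) != 0)
        return -1;
    return 0;
}
```

From the three parts a generator has what it needs to emit a UDF: the name for the CALL, the parameters for the LINKAGE SECTION, and the return type for the RETURNING item.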
Prerequisites
for developers, to generate COBOL POSIX bindings
To use the POSIX bindings, just use the COPY statement.
To create new ones, use udf-gen. udf-gen is a Python program that
imports the pycparser module (a C parser built on PLY),
which must be installed.
udf-gen is lightly documented; use udf-gen --help. It can be a
little tedious to set up the first time, but if you want to use more than
a few functions, it will be faster than doing the work by hand.
Limitations
udf-gen does not
- generate a working UDF for function parameters of type struct, such as
  is used by stat(2). This is because the struct's definition is not
  available in a standardized way in the SYNOPSIS of a man page.
- define helpful Level 88 values for "magic" numbers, such as permission
  bits in chmod(2).
None of this is particularly difficult; it's just a matter of time and
need. The scrape.awk script finds 560 functions in the Ubuntu 22.04 LTS
manual. Which of those are important is for users to decide.
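The second limitation, Level 88 values for magic numbers, is mechanical enough to sketch. The C code below is hypothetical and not part of udf-gen (level88_line and emit_mode_bits are invented names): it takes the permission-bit constants from <sys/stat.h> and formats each as a COBOL level-88 entry, turning underscores into hyphens.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Format one COBOL level-88 entry for a C constant, e.g.
   ("S_IRUSR", 0400) -> "88 S-IRUSR VALUE 256."
   Underscores become hyphens, the conventional COBOL word separator.
   Returns the length written, or -1 if the name or result does not fit. */
int level88_line(char *out, size_t n, const char *cname, unsigned long value)
{
    char name[64];
    size_t len = strlen(cname);
    if (len >= sizeof name)
        return -1;
    for (size_t i = 0; i <= len; i++)   /* copies the terminating NUL too */
        name[i] = (cname[i] == '_') ? '-' : cname[i];

    int w = snprintf(out, n, "88 %s VALUE %lu.", name, value);
    return (w < 0 || (size_t)w >= n) ? -1 : w;
}

/* Write level-88 entries for the nine chmod(2) permission bits,
   indented to start in Area B (column 12) of fixed-form source. */
void emit_mode_bits(FILE *fp)
{
    static const struct { const char *name; unsigned long value; } bits[] = {
        { "S_IRUSR", S_IRUSR }, { "S_IWUSR", S_IWUSR }, { "S_IXUSR", S_IXUSR },
        { "S_IRGRP", S_IRGRP }, { "S_IWGRP", S_IWGRP }, { "S_IXGRP", S_IXGRP },
        { "S_IROTH", S_IROTH }, { "S_IWOTH", S_IWOTH }, { "S_IXOTH", S_IXOTH },
    };
    char line[96];
    for (size_t i = 0; i < sizeof bits / sizeof bits[0]; i++)
        if (level88_line(line, sizeof line, bits[i].name, bits[i].value) > 0)
            fprintf(fp, "%11s%s\n", "", line);
}
```

A level-88 condition compares the whole data item, so entries like these name single values (SET S-IRUSR TO TRUE stores 256); testing one bit within a combined mode still needs arithmetic.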
Other Options
IBM and Micro Focus both supply routines for interfacing with the OS, each in its own way. GnuCOBOL implements some of those routines.
Portability
The UDFs produced by udf-gen are pure ISO COBOL, so they should be
compilable by any ISO COBOL compiler.