Robert Dubner 2c95d4c742 cobol: Support National characters and Unicode runtime encoding.
The last few months have seen an evolution in the COBOL compiler.  Up
until now it could use either CP1252/ASCII or CP1140/EBCDIC to represent
alphanumeric variables and numeric types that are stored as character
strings.  With these changes, those types can be represented in many
other single-byte encodings, as well as UTF16 and UTF32 encodings.

Supporting these encodings required extensive changes.

1) The initial parsing has to handle the extended capabilities.

2) Each run-time variable designates its character set.

3) The run-time code has to be able to handle wide characters.

Since the development took place over a period of time, other changes
crept in. In particular, there is an expansion of bindings making
certain POSIX functions available to the COBOL programmer.

There has also been an expansion of gcobol's use of the GCC diagnostic
framework.

Co-Authored-By: Robert Dubner <rdubner@symas.com>
Co-Authored-By: James K. Lowden <jklowden@cobolworx.com>

gcc/cobol/ChangeLog:

	* cbldiag.h (struct cbl_loc_t): Diagnostics.
	(enum cbl_diag_id_t): Diagnostics.
	* cdf.y: Includes.
	* cobol1.cc (cobol_warning_suppress): Diagnostics.
	(cobol_langhook_handle_option): Implement -fexec-charset.  Expand
	the use of diagnostics.
	* gcobc: Expand options and warnings.
	* gcobol.1: Documentation.
	* genapi.cc (level_88_helper): Charsets.
	(get_level_88_domain): Charsets.
	(get_class_condition_string): Charsets.
	(function_pointer_from_name): Charsets.
	(initialize_variable_internal):  Charsets.
	(parser_initialize): Charsets.
	(get_binary_value_from_float): Charsets.
	(get_bytes_needed): Charsets.
	(cobol_compare): Charsets.
	(move_tree): Eliminate function.
	(move_tree_to_field): Eliminate function.
	(get_string_from): Eliminate function.
	(parser_init_list): Charsets.
	(psa_FldLiteralN): Charsets.
	(parser_accept_date_yymmdd): Charsets.
	(parser_accept_date_yyyymmdd): Charsets.
	(parser_accept_date_yyddd): Charsets.
	(parser_accept_date_yyyyddd): Charsets.
	(parser_accept_date_dow): Charsets.
	(parser_accept_date_hhmmssff): Charsets.
	(parser_alphabet): Charsets.
	(parser_alphabet_use): Charsets.
	(parser_display_internal): Charsets.
	(get_literalN_value): Charsets.
	(tree_type_from_field_type): Charsets.
	(program_end_stuff): Charsets.
	(walk_initialization): Charsets.
	(parser_xml_parse): Charsets.
	(initialize_the_data): Charsets.
	(establish_using): Charsets.
	(parser_setop): Charsets.
	(parser_set_conditional88): Charsets.
	(parser_file_add): Charsets.
	(get_the_filename): Eliminate function.
	(parser_file_open): Charsets.
	(parser_file_delete_file): Charsets.
	(parser_file_start): Charsets.
	(parser_module_name): Charsets.
	(parser_intrinsic_find_string): New function.
	(parser_intrinsic_numval_c): Charsets.
	(parser_intrinsic_convert): New function.
	(parser_intrinsic_call_1): Charsets.
	(create_and_call): Charsets.
	(mh_identical): Charsets.
	(mh_source_is_literalN): Charsets.
	(float_type_of): Charsets.
	(mh_dest_is_float): Charsets.
	(mh_numeric_display): Charsets.
	(mh_little_endian): Charsets.
	(mh_source_is_group): Charsets.
	(mh_source_is_literalA): Charsets.
	(move_helper): Charsets.
	(binary_initial): Eliminate function.
	(digits_from_int128): Eliminate function.
	(digits_from_float128): Eliminate function.
	(initial_from_initial):  Eliminate function.
	(convert_data_initial): New function.
	(actually_create_the_static_field): Charsets.
	(psa_new_var_decl): Charsets.
	(psa_FldLiteralA): Charsets.
	(parser_local_add): Charsets.
	(parser_symbol_add): Charsets.
	* genapi.h (parser_intrinsic_convert): New function.
	(parser_intrinsic_find_string): New function.
	* genmath.cc (arithmetic_operation): Charsets.
	(largest_binary_term): Charsets.
	(fast_add): Charsets.
	(fast_subtract): Charsets.
	(fast_multiply): Charsets.
	(fast_divide): Charsets.
	(parser_subtract): Fix subtract float from float.
	* genutil.cc (get_any_capacity): Charsets.
	(get_and_check_refstart_and_reflen): Charsets.
	(get_data_offset): Charsets.
	(get_binary_value): Charsets.
	(tree_type_from_field): Charsets.
	(copy_little_endian_into_place): Charsets.
	(get_literal_string): Charsets.
	(refer_is_clean): Charsets.
	(refer_fill_depends): Charsets.
	(refer_size_source): Comment.
	* lang-specs.h: Charsets.
	* lang.opt: Charsets.
	* lexio.cc (parse_copy_directive): Diagnostics.
	* messages.cc (cbl_diagnostic_kind): Diagnostics.
	(cobol_warning_suppress): Diagnostics.
	* parse.y: Many changes for charsets and diagnostics.
	* parse_ante.h (MAXLENGTH_FORMATTED_DATE): Charsets.
	(MAXLENGTH_FORMATTED_TIME): Charsets.
	(MAXLENGTH_CALENDAR_DATE): Charsets.
	(MAXLENGTH_FORMATTED_DATETIME): Charsets.
	(consistent_encoding_check): Charsets.
	(enum data_clause_t): Charsets.
	(new_alphanumeric): Charsets.
	(name_of): Charsets.
	(class eval_subject_t): Charsets.
	(struct domain_t): Charsets.
	(struct file_list_t): Charsets.
	(current_encoding): Charsets.
	(new_tempnumeric): Charsets.
	(is_integer_literal): Charsets.
	(new_literal): Charsets.
	(new_constant): Charsets.
	(conditional_set): Charsets.
	(field_find): Charsets.
	(valid_redefine): Charsets.
	(field_value_all): Charsets.
	(parent_has_picture): Charsets.
	(parent_has_value): Charsets.
	(blank_pad_initial): Charsets.
	(blankit): Charsets.
	(cbl_field_t::blank_initial): Charsets.
	(value_encoding_check): Charsets.
	(cbl_field_t::set_initial): Charsets.
	(field_alloc): Charsets.
	(parser_move_carefully): Charsets.
	(data_division_ready): Charsets.
	(anybody_redefines): Charsets.
	(procedure_division_ready): Charsets.
	(file_section_parent_set): Charsets.
	(field_binary_usage): Charsets.
	(goodnight_gracie): Formatting.
	* scan.l: Charsets.
	* scan_ante.h (numstr_of): Charsets.
	(typed_name): Charsets.
	* show_parse.h: Charsets.
	* structs.cc (create_cblc_file_t): Charsets.
	* symbols.cc (symbol_table_extend): Charsets.
	(WARNING_FIELD): Diagnostics.
	(constq): Charsets.
	(elementize): Charsets.
	(field_size): Charsets.
	(cbl_field_t::set_attr): Eliminate run-time component.
	(cbl_field_t::clear_attr): Eliminate run-time component.
	(field_memsize): Charsets.
	(cbl_encoding_str): Charsets.
	(symbols_dump): Charsets.
	(is_variable_length): Formatting.
	(field_str): Charsets.
	(extend_66_capacity): Charsets.
	(operator<<): Charsets.
	(symbols_update): Charsets.
	(symbol_field_parent_set): Charsets.
	(symbol_table_init): Charsets.
	(numeric_group_attrs): Charsets.
	(symbol_field_add): Charsets.
	(symbol_field_alias): Charsets.
	(fd_record_size_cmp): Charsets.
	(symbol_file_record_sizes): Charsets.
	(cbl_alphabet_t::reencode): Charsets.
	(symbol_temporary_location): Charsets.
	(new_literal_2): Charsets.
	(new_alphanumeric): Charsets.
	(standard_internal): Charsets.
	(cbl_field_t::codeset_t::stride): Charsets.
	(cobol_alpha_encoding): Charsets.
	(cobol_national_encoding): Charsets.
	(new_temporary): Charsets.
	(new_literal_float): Charsets.
	(cbl_field_t::is_ascii): Charsets.
	(cbl_field_t::internalize): Eliminate function.
	(cbl_field_t::source_code_check): Charsets.
	(iconv_cd): Charsets.
	(cbl_field_t::encode): New function for charsets.
	(cbl_field_t::set_capacity): Charsets.
	(cbl_field_t::add_capacity): Charsets.
	(cbl_field_t::char_capacity): Charsets.
	(symbol_label_section_exists): Charsets.
	(size): Charsets.
	(validate_numeric_edited): Charsets.
	* symbols.h (cobol_alpha_encoding): Charsets.
	(cobol_national_encoding): Charsets.
	(consistent_encoding_check): Charsets.
	(class cbl_domain_elem_t): Charsets.
	(struct cbl_domain_t): Charsets.
	(struct cbl_field_data_t): Charsets.
	(class cbl_field_data_t): Charsets.
	(struct cbl_subtable_t): Charsets.
	(struct cbl_field_t): Charsets.
	(new_literal_float): Charsets.
	(new_temporary): Charsets.
	(new_literal_2): Charsets.
	(symbol_temporary_location): Charsets.
	(class temporaries_t): Charsets.
	(struct symbol_elem_t): Charsets.
	(symbol_elem_of): Charsets.
	(symbol_unique_index): Charsets.
	(cbl_field_type_name): Charsets.
	(validate_numeric_edited): Charsets.
	* token_names.h: Charsets.
	* util.cc (cdf_literalize): Charsets.
	(cbl_field_type_name): Charsets.
	(determine_intermediate_type): Charsets.
	(is_alpha_edited): Charsets.
	(cbl_field_data_t::is_alpha_edited): Charsets.
	(symbol_field_type_update): Charsets.
	(redefine_field): Charsets.
	(FIXED_WIDE_INT): Charsets.
	(dirty_to_binary): Charsets.
	(digits_from_int128): Charsets.
	(binary_initial): Charsets.
	(cbl_field_t::encode_numeric): Charsets.
	(FOR_JIM): Temporary conditional demonstration code.
	(parse_error_inc): Diagnostics.
	(parse_error_count): Diagnostics.
	(cbl_field_t::report_invalid_initial_value): Diagnostics.
	(valid_move): Diagnostics.
	(type_capacity): Charsets.
	(symbol_unique_index): New function.
	(cbl_unimplementedw): Formatting.

libgcobol/ChangeLog:

	* charmaps.cc (__gg__encoding_iconv_name): Charsets.
	(__gg__encoding_iconv_valid): Charsets.
	(__gg__encoding_iconv_type): Charsets.
	(encoding_descr): Charsets.
	(__gg__encoding_iconv_descr): Charsets.
	(__gg__iconverter): Charsets.
	(__gg__miconverter): Charsets.
	* charmaps.h (NOT_A_CHARACTER): Charsets.
	(ascii_nul): Charsets.
	(ascii_bang): Charsets.
	(__gg__encoding_iconv_type): Charsets.
	(__gg__iconverter): Charsets.
	(__gg__miconverter): Charsets.
	(DEFAULT_32_ENCODING): Charsets.
	(class charmap_t): Charsets.
	(__gg__get_charmap): Charsets.
	* common-defs.h (enum cbl_field_attr_t):
	(enum cbl_figconst_t): Formatting.
	(LOW_VALUE_E): Handle enum arithmetic.
	(ZERO_VALUE_E): Handle enum arithmetic.
	(SPACE_VALUE_E): Handle enum arithmetic.
	(QUOTE_VALUE_E): Handle enum arithmetic.
	(HIGH_VALUE_E): Handle enum arithmetic.
	(enum convert_type_t): Enum for new FUNCTION CONVERT.
	(struct cbl_declarative_t): Formatting.
	* encodings.h (struct encodings_t): Charsets.
	* gcobolio.h: Charsets.
	* gfileio.cc (get_filename): Rename to establish_filename.
	(establish_filename): Renamed from get_filename.
	(relative_file_delete):  Charsets.
	(__io__file_remove): Moved.
	(trim_in_place): Charsets.
	(relative_file_start): Charsets.
	(relative_file_rewrite): Charsets.
	(relative_file_write): Charsets.
	(sequential_file_write): Charsets.
	(line_sequential_file_read): Charsets.
	(sequential_file_read): Charsets.
	(relative_file_read): Charsets.
	(__gg__file_reopen): Charsets.
	(__io__file_open): Charsets.
	(__io__file_close): Charsets.
	(gcobol_fileops): Charsets.
	(__gg__file_open): Charsets.
	(__gg__file_remove): Charsets.
	* gfileio.h (__gg__file_open): Charsets.
	* gmath.cc (__gg__subtractf1_float_phase2): Comment.
	(__gg__subtractf2_float_phase1): Comment.
	(__gg__multiplyf1_phase2): Comment.
	* intrinsic.cc (is_zulu_format): Charsets.
	(string_to_dest): Charsets.
	(get_all_time): Charsets.
	(ftime_replace): Charsets.
	(__gg__char): Charsets.
	(__gg__current_date): Charsets.
	(__gg__formatted_current_date): Charsets.
	(__gg__formatted_date): Charsets.
	(__gg__formatted_datetime): Charsets.
	(__gg__formatted_time): Charsets.
	(change_case): Charsets.
	(__gg__upper_case): Charsets.
	(numval): Charsets.
	(numval_c): Charsets.
	(__gg__trim): Charsets.
	(__gg__reverse): Charsets.
	(fill_cobol_tm): Charsets.
	(__gg__seconds_from_formatted_time): Charsets.
	(__gg__hex_of): Charsets.
	(__gg__numval_f): Charsets.
	(__gg__test_numval_f): Charsets.
	(__gg__locale_date): Charsets.
	(__gg__locale_time): Charsets.
	(__gg__locale_time_from_seconds): Charsets.
	* libgcobol.cc (NO_RDIGITS): Alias for (0).
	(__gg__move): Forward reference.
	(struct program_state): Charsets.
	(cstrncmp): Charsets.
	(__gg__init_program_state): Charsets.
	(edited_to_binary): Charsets.
	(var_is_refmod): Comment.
	(__gg__power_of_ten): Reworked data initialization.
	(__gg__scale_by_power_of_ten_1): Likewise.
	(__gg__scale_by_power_of_ten_2): Likewise.
	(value_is_too_big): Likewise.
	(binary_to_big_endian): Likewise.
	(binary_to_little_endian): Likewise.
	(int128_to_int128_rounded): Likewise.
	(get_binary_value_local): Likewise.
	(get_init_value): Likewise.
	(f128_to_i128_rounded): Likewise.
	(__gg__initialization_values): Likewise.
	(int128_to_field): Likewise.
	(__gg__get_date_yymmdd): Charsets.
	(__gg__field_from_string): Charsets.
	(field_from_ascii): Charsets.
	(__gg__get_date_yyyymmdd): Charsets.
	(__gg__get_date_yyddd): Charsets.
	(__gg__get_yyyyddd): Charsets.
	(__gg__get_date_dow): Charsets.
	(__gg__get_date_hhmmssff): Charsets.
	(collation_position): Charsets.
	(uber_compare): Charsets.
	(__gg__dirty_to_binary): Charsets.
	(__gg__dirty_to_float): Charsets.
	(format_for_display_internal): Charsets.
	(compare_88): Charsets.
	(get_float128): Reworked.
	(compare_field_class): Charsets.
	(interconvert): Charsets.
	(compare_strings): Charsets.
	(__gg__compare_2): Charsets.
	(compare_two_records): Charsets.
	(__gg__sort_table): Charsets.
	(init_var_both): Charsets.
	(__gg__initialize_variable_clean): Charsets.
	(alpha_to_alpha_move_from_location): Charsets.
	(__gg__memdup): New function.
	(alpha_to_alpha_move): Charsets.
	(__gg__sort_workfile): Charsets.
	(__gg__merge_files): Charsets.
	(funky_find_wide): Charsets.
	(funky_find_wide_backward): Charsets.
	(normalize_id): Charsets.
	(match_lengths): Charsets.
	(the_alpha_and_omega): Charsets.
	(the_alpha_and_omega_backward): Charsets.
	(inspect_backward_format_1): Charsets.
	(__gg__inspect_format_1): Charsets.
	(inspect_backward_format_2): Charsets.
	(__gg__inspect_format_2): Charsets.
	(normalize_for_inspect_format_4): Charsets.
	(__gg__inspect_format_4): Charsets.
	(move_string): Charsets.
	(brute_force_trim): Charsets.
	(__gg__string): Charsets.
	(display_both): Charsets.
	(__gg__display_string): Charsets.
	(__gg__bitwise_op): Charsets.
	(is_numeric_display_numeric): Charsets.
	(is_alpha_a_number): Charsets.
	(classify_numeric_type): Charsets.
	(classify_alphabetic_type): Charsets.
	(__gg__classify): Charsets.
	(__gg__convert_encoding): Charsets.
	(accept_envar): Charsets.
	(__gg__accept_envar): Charsets.
	(__gg__get_argc): Charsets.
	(__gg__get_argv): Charsets.
	(__gg__get_command_line): Charsets.
	(__gg__parser_set_conditional): Charsets.
	(__gg__literaln_alpha_compare): Charsets.
	(string_in): Charsets.
	(__gg__unstring): Charsets.
	(__gg__integer_from_float128): Charsets.
	(__gg__adjust_dest_size): Charsets.
	(__gg__just_mangle_name): Charsets.
	(__gg__function_handle_from_name): Charsets.
	(get_the_byte): Charsets.
	(__gg__refer_from_string): Charsets.
	(__gg__refer_from_psz): Charsets.
	(__gg__find_string): Charsets.
	(convert_for_convert): Charsets.
	(__gg__convert): Charsets.
	* libgcobol.h (__gg__compare_2): Charsets.
	(__gg__field_from_string): Charsets.
	(__gg__memdup): Charsets.
	* posix/bin/Makefile: Posix bindings.
	* posix/bin/scrape.awk: Posix bindings.
	* posix/bin/udf-gen: Posix bindings.
	* posix/udf/posix-lseek.cbl: Posix bindings.
	* posix/udf/posix-unlink.cbl: Posix bindings.
	* stringbin.cc (__gg__binary_to_string_encoded): Charsets.
	(__gg__numeric_display_to_binary): Charsets.
	* stringbin.h (__gg__binary_to_string_encoded): Charsets.
	* valconv.cc (__gg__string_to_numeric_edited): Charsets.
	* posix/cpy/psx-lseek.cpy: New file.
	* posix/shim/lseek.cc: New file.

gcc/testsuite/ChangeLog:

	* cobol.dg/group2/CHAR_and_ORD_with_COLLATING_sequence_-_EBCDIC.cob:
	Change diagnostics message.
	* cobol.dg/group2/Multi-target_MOVE_with_subscript_re-evaluation.cob:
	Change diagnostics message.
	* cobol.dg/group2/floating-point_SUBTRACT_FORMAT_2.out:
	Change diagnostics message.
	* cobol.dg/group2/floating-point_literals.out:
	Change diagnostics message.
2026-01-16 22:09:57 -05:00

GCC COBOL Posix Functions and Adapter

Purpose

GCC COBOL provides COBOL bindings for some POSIX functions. Feel free to contribute more. Insofar as possible, the functions take the same parameters and return the same values as defined by POSIX. Among others, they are used by the COBOL compatibility library (see libgcobol/compat/lib/gnu). They are installed in source form. The user may choose to compile them to a library.

ISO COBOL does not specify any relationship to any particular operating system, and does not reference POSIX. The raw capability is there, of course, via the CALL statement. But that's not very convenient, and offers no parameter validation.

For simple functions, e.g. unlink(2), the UDFs simply call the underlying C library. More complex functions, though, e.g. stat(2), pass or return a buffer. That buffer is normally defined by what members must exist, but its exact layout is left up to the C implementation and defined by the C header files, which are not parsed by GCC COBOL. Consequently we do not know, at the COBOL level, how to define the struct stat buffer required by stat(2). For such functions, we use a C "shim" function that accepts a buffer defined by GCC COBOL. That buffer has the members defined by POSIX and a layout defined by GCC COBOL. The COBOL application calls the COBOL POSIX binding, which uses the shim function to call the C library.

To take stat(2) as an example,

COBOL program uses
    COPY posix-stat.
    01 stat-buf.
        COPY posix-statbuf. *> gcc/cobol/posix/cpy
    FUNCTION POSIX-STAT(filename, stat-buf)
libgcobol/posix/udf/posix-stat.cbl
    passes stat-buf to
    posix_stat in libgcobol
posix_stat calls stat(2),
    and copies the returned values to its input buffer
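
The shim step in that chain can be sketched in C. This is only an illustration of the idea, not the code in libgcobol: the names `sketch_statbuf` and `sketch_posix_stat`, and the choice and order of members, are assumptions made for the example. The real buffer layout is whatever the copybook in cpy/ defines.

```c
#include <stdint.h>
#include <sys/stat.h>

/* Hypothetical fixed-layout buffer. Every member has an explicit width,
   so a COBOL copybook could describe it without reading C headers.
   The layout actually used by GCC COBOL is defined in cpy/. */
struct sketch_statbuf {
    int64_t  size;    /* from st_size  */
    int64_t  mtime;   /* from st_mtime, seconds since the epoch */
    uint64_t inode;   /* from st_ino   */
    uint32_t mode;    /* from st_mode  */
    uint32_t uid;     /* from st_uid   */
    uint32_t gid;     /* from st_gid   */
    uint32_t nlink;   /* from st_nlink */
};

/* Hypothetical shim: call stat(2) using the C library's own struct stat,
   then copy member by member into the caller's fixed-layout buffer.
   Returns what stat(2) returns: 0 on success, -1 on error. */
int sketch_posix_stat(const char *pathname, struct sketch_statbuf *out)
{
    struct stat sb;
    int rc = stat(pathname, &sb);

    if (rc == 0) {
        out->size  = (int64_t)sb.st_size;
        out->mtime = (int64_t)sb.st_mtime;
        out->inode = (uint64_t)sb.st_ino;
        out->mode  = (uint32_t)sb.st_mode;
        out->uid   = (uint32_t)sb.st_uid;
        out->gid   = (uint32_t)sb.st_gid;
        out->nlink = (uint32_t)sb.st_nlink;
    }
    return rc;
}
```

Because only the shim touches `struct stat`, the caller never depends on how a particular C library arranges that structure. Calling `sketch_posix_stat(".", &buf)` on a directory, for example, should return 0 and leave the directory's type bits in `buf.mode`.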

Contents

The installed POSIX bindings and associated copybooks are in cpy and udf:

  • cpy/ copybooks used by functions in udf
  • udf/ COBOL POSIX bindings
  • t/ simple tests demonstrating use of functions in udf

Any buffer shared between the COBOL application and a COBOL POSIX function is defined in cpy/. While these buffers meet the POSIX descriptions -- meaning they have members matching the standard -- they probably do not match the buffer defined by the C library in /usr/include. GCC COBOL does not parse C, and therefore does not parse C header files, and so has no access to those C buffer definitions.

The machine-shop tools are in bin/.

  • bin/ developer tools to aid creation of POSIX bindings
    • scrape.awk extracts function prototypes from the SYNOPSIS of a man page.
    • udf-gen reads function declarations and, for each one, produces a COBOL User Defined Function (UDF) that calls the function.

Finally,

  • shim/ C support for POSIX bindings, incorporated in libgcobol

Prerequisites

for developers, to generate COBOL POSIX bindings

To use the POSIX bindings, just use the COPY statement.

To create new ones, use udf-gen. udf-gen is a Python program that imports the PLY pycparser module, which must be installed.

udf-gen is lightly documented; use udf-gen --help. It can be a little tedious to set up the first time, but if you want to use more than a few functions, it will be faster than doing the work by hand.

Limitations

udf-gen does not

  • generate a working UDF for function parameters of type struct, such as is used by stat(2). This is because the information is not available in a standardized way in the SYNOPSIS of a man page.
  • define helpful Level 88 values for "magic" numbers, such as permission bits in chmod(2).

None of this is particularly difficult; it's just a matter of time and need. The scrape.awk script finds 560 functions in the Ubuntu LTS 22.04 manual. Which of those is important is for users to decide.

Other Options

IBM and MicroFocus both supply intrinsic functions to interface with the OS, each in their own way. GnuCOBOL implements some of those functions.

Portability

The UDF produced by udf-gen is pure ISO COBOL. The code should be compilable by any ISO COBOL compiler.