mirror of
https://forge.sourceware.org/marek/gcc.git
synced 2026-02-22 12:00:11 -05:00
The following patch implements the C23 N3017 "#embed - a scannable,
tooling-friendly binary resource inclusion mechanism" paper.
The implementation is intentionally dumb, in that it doesn't significantly
speed up compilation of larger initializers and doesn't make it possible
to use huge #embeds (e.g. several gigabytes large; those are still
infeasible in terms of compile time and memory).
There are two reasons for this. First, I think the way it is implemented
in the patch now is how we should handle smaller #embed sizes, below some
boundary I haven't settled on yet (32 bytes, 64, or something like that).
Handling the single-byte case this way is certainly desirable, since it can
appear anywhere in the source where a constant integer literal can appear;
for a few bytes it isn't worth coming up with something smarter, and users
would also like to see the result readably in -E output (perhaps the slow
vs. fast boundary should be set by a command-line option). Second, it makes
it easier to find behavior regressions caused by the optimizations later,
since we will have something in git to go back to and compare against.
I'm definitely willing to work on the optimizations, but would like to do
that incrementally. They would likely introduce a new CPP_* token type
referring to a range of libcpp-owned memory (start + size), and similarly
some tree which can do the same and can at any time be split, e.g. into two
subparts with an INTEGER_CST in between, if needed for something like
const unsigned char d[] = {
#embed "2GB.dat" prefix (0, 0, ) suffix (, [0x40000000] = 42)
};
still without having to copy around huge amounts of data (STRING_CST owns
the memory it points to and can only be up to 2GB in size).
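To see why such a representation has to be splittable, here is a scaled-down plain-C sketch of the same situation: a designator after a run of initializers overrides one element in the middle of that run. The byte values 10, 20, 30, 40 and the index 3 are made up to stand in for 2GB.dat and 0x40000000; no #embed is involved.

```c
/* Hypothetical scaled-down stand-in for the 2GB.dat example above.
   In both cases the designated index falls inside the embedded run,
   so the element there must be replaced in place.  */
static const unsigned char d[] = {
  0, 0,           /* prefix (0, 0, ) */
  10, 20, 30, 40, /* embedded data, as plain integer constants */
  [3] = 42        /* suffix (, [3] = 42): overrides the 20 */
};
```

The array keeps its length of 6; only d[3] changes. GCC reports this kind of override with -Woverride-init (enabled by -Wextra).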
I would also like to first add some extensions not included in this patch,
such as a gnu::offset (off) parameter to skip a constant number of bytes at
the start of the file, plus a gnu::base64 ("base64_encoded_data") parameter
to store large amounts of #embed data more efficiently in preprocessed
source.
I've also been cross-checking all the tests against the LLVM implementation
https://github.com/llvm/llvm-project/pull/68620
which was even committed to LLVM trunk for a few hours but reverted
afterwards. LLVM now has the support committed, and I admit I haven't
rechecked whether the behavior in the spots mentioned below has been fixed
there yet.
The patch uses the --embed-dir= option that clang plans to add (per the pull
request above) and doesn't support other variants of search directories yet;
there are also, at least for the time being, no default directories in which
to search for embed files. So #embed "..." works if the file is found in
the current file's directory (or relative to it), #embed "/..." and
#embed </...> always work, but a relative #embed <...> doesn't work unless
at least one --embed-dir= is specified. There is no reason to differentiate
between system and non-system directories, so we don't need an -isystem-like
counterpart; perhaps an -iquote-like counterpart could be useful in the
future, not sure what else. --embed-directory=dir and --embed-directory dir
are accepted as aliases.
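The lookup rules above can be modeled roughly as follows. This is a hypothetical sketch, not libcpp's search_path_head/_cpp_find_file logic: the file system is faked with a list of existing paths so the model can be tested without real files, and the names fake_fs and resolve_embed are made up. The paragraph doesn't say whether the quoted form falls back to the --embed-dir= directories, so the sketch doesn't model that.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the file system: the set of existing paths.  */
struct fake_fs
{
  const char *const *paths;
  size_t n;
};

static bool
fake_exists (const struct fake_fs *fs, const char *path)
{
  for (size_t i = 0; i < fs->n; i++)
    if (strcmp (fs->paths[i], path) == 0)
      return true;
  return false;
}

/* Absolute names are used as-is, "name" is tried relative to CUR_DIR
   (the including file's directory), and <name> is tried in each
   --embed-dir= directory in order.  On success the resolved path is
   left in OUT.  */
static bool
resolve_embed (const struct fake_fs *fs, const char *name, bool angled,
               const char *cur_dir, const char *const *embed_dirs,
               size_t n_dirs, char *out, size_t out_len)
{
  if (name[0] == '/')
    {
      snprintf (out, out_len, "%s", name);
      return fake_exists (fs, out);
    }
  if (!angled)
    {
      snprintf (out, out_len, "%s/%s", cur_dir, name);
      return fake_exists (fs, out);
    }
  /* Relative <name>: there are no default directories, so with no
     --embed-dir= (N_DIRS == 0) this always fails.  */
  for (size_t i = 0; i < n_dirs; i++)
    {
      snprintf (out, out_len, "%s/%s", embed_dirs[i], name);
      if (fake_exists (fs, out))
        return true;
    }
  return false;
}
```

Passing the search chain explicitly mirrors how the patch adds a separate embed_include chain rather than reusing the #include ones.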
There are some differences from clang beyond its ICEs, so I'd like to point
them out to make sure there is agreement on the choices in the patch. They
are also mentioned in the comments of the LLVM pull request.
The most important one is that the GCC patch (as well as the original
thephd.dev LLVM branch on godbolt) expands #embed (or acts as if it were
expanded) into a mere sequence of numbers like 123,2,35,26, rather than what
clang effectively treats as (unsigned char)123,(unsigned char)2,(unsigned
char)35,(unsigned char)26; clang does the latter only with the integrated
preprocessor, not with -save-temps, where it behaves like GCC.
JeanHeyd, the original author, agrees that the former is how it is
currently worded in C23.
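Both readings give the same values; only the type of each element differs. A hand-written sketch (no #embed involved; GCC_BYTE, CLANG_BYTE and ELEM_TYPE are hypothetical names) shows where that becomes observable, e.g. when a prefix makes an embedded byte the operand of sizeof or _Generic:

```c
#include <stddef.h>

/* Classify an expression's type; _Generic (C11) selects on the type of
   its controlling expression without applying integer promotions.  */
enum elem_type { ELEM_OTHER, ELEM_INT, ELEM_UCHAR };
#define ELEM_TYPE(x)                                          \
  _Generic ((x), int: ELEM_INT, unsigned char: ELEM_UCHAR,    \
            default: ELEM_OTHER)

/* One embedded byte with value 123 under each reading.  */
#define GCC_BYTE 123                     /* GCC, clang -save-temps */
#define CLANG_BYTE ((unsigned char) 123) /* clang integrated cpp */

static const size_t gcc_size = sizeof GCC_BYTE;     /* sizeof (int) */
static const size_t clang_size = sizeof CLANG_BYTE; /* 1 */
```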
Another difference (not covered by the testsuite: I'm not sure how to check
for an effective target with /dev/urandom, nor whether it is desirable to
check that during testing) is how to treat character devices, named pipes
etc. (block devices are rejected with an error). The original paper uses
/dev/urandom in various examples and seems to assume that, unlike regular
files, devices aren't cached, so
#embed </dev/urandom> limit(1) prefix(int a = ) suffix(;)
#embed </dev/urandom> limit(1) prefix(int b = ) suffix(;)
usually results in a != b. That is what the godbolt thephd.dev branch
implements and what this patch does as well, but clang actually seems
to just go by st.st_size == 0, conclude that it must be a zero-sized
resource, and so just copy over if_empty if present. What to do about
character devices/named pipes with __has_embed is really questionable.
For regular files the patch doesn't read anything; it relies on
st.st_size + limit to decide whether the resource is empty or non-empty.
But I don't know of a way to check whether a read on, say, a character
device would return anything (the </dev/null> limit (1) vs. </dev/zero>
limit (1) cases), and if we did read something, it would have to be cached
for later, because a later #embed reading again could get no further data
even though the first read got something. So the patch currently always
returns 2 (__STDC_EMBED_EMPTY__) from __has_embed for non-regular files,
like the thephd.dev branch and the clang pull request do.
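The __has_embed decision just described can be written as a pure function. This is a hypothetical sketch, not libcpp's builtin_has_embed; it assumes the C23 values 0, 1 and 2 for __STDC_EMBED_NOT_FOUND__, __STDC_EMBED_FOUND__ and __STDC_EMBED_EMPTY__, and it doesn't model block devices, which are rejected with an error before this point.

```c
#include <stdbool.h>

/* C23 values of the predefined __STDC_EMBED_*__ macros.  */
enum has_embed_result
{
  EMBED_NOT_FOUND = 0, /* __STDC_EMBED_NOT_FOUND__ */
  EMBED_FOUND = 1,     /* __STDC_EMBED_FOUND__ */
  EMBED_EMPTY = 2      /* __STDC_EMBED_EMPTY__ */
};

/* Nothing is read from the file: regular files are classified from
   st_size and the limit alone, and non-regular files (character
   devices, named pipes) always yield 2.  HAS_LIMIT is false when no
   limit parameter was given.  */
static enum has_embed_result
has_embed_result (bool found, bool is_regular, long long st_size,
                  bool has_limit, long long limit)
{
  if (!found)
    return EMBED_NOT_FOUND;
  if (!is_regular)
    return EMBED_EMPTY;
  if (st_size == 0 || (has_limit && limit == 0))
    return EMBED_EMPTY;
  return EMBED_FOUND;
}
```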
Another question is what to do about gnu::offset on non-regular files even
for #embed: those aren't seekable, so do we want to just read and throw
away the offset bytes each time it is used?
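gnu::offset isn't part of this patch, so the following is only a hypothetical sketch (the name skip_embed_offset is made up) of the "read and throw away" fallback, using standard C stdio. The caller passes whether the file is regular (e.g. from S_ISREG on the stat result), because some character devices accept a seek without actually skipping anything.

```c
#include <stdbool.h>
#include <stdio.h>

/* Position F just past the next OFF bytes.  Regular files are seeked;
   anything else is read and the bytes discarded.  Returns the number of
   bytes skipped, which is less than OFF if EOF is reached first (a seek
   past EOF on a regular file simply leaves nothing to embed).  */
static long
skip_embed_offset (FILE *f, long off, bool is_regular)
{
  if (is_regular && fseek (f, off, SEEK_CUR) == 0)
    return off;
  long skipped = 0;
  while (skipped < off && getc (f) != EOF)
    skipped++;
  return skipped;
}
```

The read-and-discard path costs O(off) on every use, which is exactly the trade-off the question raises.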
clang also chokes on the
#if __has_embed (__FILE__ __limit__ (1) __prefix__ () suffix (1 / 0) \
__if_empty__ ((({{[0[0{0{0(0(0)1)1}1}]]}})))) != __STDC_EMBED_FOUND__
#error "__has_embed fail"
#endif
in embed-1.c, but the thephd.dev branch accepts it, and I don't see why
it shouldn't: (({{[0[0{0{0(0(0)1)1}1}]]}})) is a balanced token
sequence and the file isn't empty, so it should just be parsed and
discarded.
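What "balanced" means here can be checked with a simple bracket stack. This character-level sketch (is_balanced is a hypothetical name) ignores string and character literals, which a real preprocessor would handle by working on tokens rather than characters:

```c
#include <stdbool.h>
#include <stddef.h>

/* Every (, [ and { must be closed by the matching ), ] or } in LIFO
   order.  Brackets inside literals are not special-cased here.  */
static bool
is_balanced (const char *s)
{
  char stack[256];
  size_t depth = 0;
  for (; *s; s++)
    {
      char want = 0;
      switch (*s)
        {
        case '(': case '[': case '{':
          if (depth == sizeof stack)
            return false; /* Too deep for this sketch.  */
          stack[depth++] = *s;
          continue;
        case ')': want = '('; break;
        case ']': want = '['; break;
        case '}': want = '{'; break;
        default: continue;
        }
      if (depth == 0 || stack[--depth] != want)
        return false;
    }
  return depth == 0;
}
```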
clang also IMHO mishandles
const unsigned char w[] = {
#embed __FILE__ prefix([0] = 42, [15] =) limit(32)
};
but again only without -save-temps; it seems to
treat it as
[0] = 42, [15] = (99,111,110,115,116,32,117,110,115,105,103,110,101,100,
32,99,104,97,114,32,119,91,93,32,61,32,123,10,35,101,109,98)
rather than
[0] = 42, [15] = 99,111,110,115,116,32,117,110,115,105,103,110,101,100,
32,99,104,97,114,32,119,91,93,32,61,32,123,10,35,101,109,98
and warns on it with -Wunused-value and just compiles it as
[0] = 42, [15] = 98
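The two readings, written out by hand with the 32 byte values listed above (the array names w_gcc and w_clang are hypothetical), give arrays of quite different lengths:

```c
/* GCC's reading: the 32 embedded bytes are 32 separate initializers,
   the first of them designated [15], so elements 15..46 are filled.  */
static const unsigned char w_gcc[] = {
  [0] = 42, [15] = 99, 111, 110, 115, 116, 32, 117, 110, 115, 105, 103,
  110, 101, 100, 32, 99, 104, 97, 114, 32, 119, 91, 93, 32, 61, 32, 123,
  10, 35, 101, 109, 98
};

/* clang's integrated-preprocessor reading as described above: the
   parenthesized list is a comma expression whose value is its last
   operand, so only [15] = 98 remains.  */
static const unsigned char w_clang[] = { [0] = 42, [15] = 98 };
```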
And also
void foo (int, int, int, int);
void bar (void) { foo (
#embed __FILE__ limit (4) prefix (172 + ) suffix (+ 2)
); }
is treated as
172 + (118, 111, 105, 100) + 2
rather than
172 + 118, 111, 105, 100 + 2
which is how clang -save-temps and GCC treat it, so it results
in just one argument being passed rather than 4.
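Written out by hand for the first four bytes of the file, "void" = 118, 111, 105, 100, the two readings differ as follows. The recording foo body and the names bar_gcc and clang_single_value are hypothetical; the (void) casts only silence -Wunused-value and don't change the value.

```c
/* Hypothetical foo matching the prototype above; it records its
   arguments so the two readings can be compared.  */
static int foo_args[4];

static void
foo (int a, int b, int c, int d)
{
  foo_args[0] = a;
  foo_args[1] = b;
  foo_args[2] = c;
  foo_args[3] = d;
}

/* GCC's (and clang -save-temps') reading: the prefix and suffix attach
   to the first and last byte only, giving four arguments.  */
static void
bar_gcc (void)
{
  foo (172 + 118, 111, 105, 100 + 2);
}

/* clang's integrated-preprocessor reading: one parenthesized comma
   expression, i.e. a single value 172 + 100 + 2.  */
static int
clang_single_value (void)
{
  return 172 + ((void) 118, (void) 111, (void) 105, 100) + 2;
}
```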
if (!strstr ((const char *) magna_carta, "imprisonétur")) abort ();
in the testcase fails as well, although calling strstr on the same data in gdb succeeds:
p ((char *(*)(char *, char *))__strstr_sse2) (magna_carta, "imprisonétur")
$2 = 0x555555558d3c <magna_carta+11564> "imprisonétur aut disseisiátur"...
so I guess they are trying to constant-evaluate strstr and doing it
incorrectly.
They started by making the optimizations part of the initial patch set,
so they don't have the luxury of checking whether a problem comes from the
optimization they are trying to do or from how the feature itself works for
them, at least unless they use -save-temps for now.
There is also different behavior between clang and GCC for -M and other
dependency-generating options. It seems clang includes the files searched
by __has_embed in the dependencies, while my patch doesn't. But clang does
the same for __has_include, and GCC doesn't. Emitting a hard dependency
on some header just because there was a __has_include/__has_embed for it
seems wrong to me: (at least when properly written) the source likely
doesn't mind if the file is missing and will do something else, so a hard
error from make because of it doesn't seem right. Does make have some
weaker kind of dependency, such that a file is remade if it can be, but
its absence isn't fatal?
I wonder whether #embed <non-existent-file> really needs to be fatal,
or whether, after diagnosing it, we could simply pretend the file exists
and is empty. For #include I think fatal errors make tons of sense,
but for #embed, which is more localized, we might get better error
reporting if we didn't bail out immediately. Note that both GCC and clang
currently treat those as fatal errors.
clang also added a -dE option which, together with -E, keeps #embed
directives as is instead of preprocessing them, but then the preprocessed
source isn't self-contained. That option looks more harmful than useful to
me.
Also, it isn't clear to me from C23 whether
__has_include/__has_c_attribute/__has_embed expressions may appear inside
the limit argument of #embed/__has_embed.
6.10.3.2/2 says that defined should not appear there (the patch
diagnoses it and the testsuite tests that), but for __has_include/__has_embed
etc. 6.10.1/11 says:
"The identifiers __has_include, __has_embed, and __has_c_attribute
shall not appear in any context not mentioned in this subclause."
If "this subclause" there means 6.10.1, then they presumably shouldn't
appear in #embed in 6.10.3, but __has_embed itself is in 6.10.1...
But 6.10.3.2/3 says that the limit argument should be parsed according to
the 6.10.1 rules. I haven't included tests like
#if __has_embed (__FILE__ limit (__has_embed (__FILE__ limit (1))))
or
#embed __FILE__ limit (__has_include (__FILE__))
into the testsuite because of these doubts, but I think the patch should
already handle those correctly.
I've used the Magna Carta text in some of the testcases because I hope it
isn't copyrighted after all these centuries, I'd strongly prefer not to
have binary blobs in git after the xz backdoor lesson, and I wanted
something larger that doesn't change all the time.
Oh, BTW, I see in C23 draft 6.10.3.2 in Example 4
if (f_source == NULL);
return 1;
(note the spurious semicolon after the closing paren); has that been fixed
already?
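The effect of that stray semicolon is easy to demonstrate in isolation (check_buggy and check_fixed are hypothetical names): the `;` is the entire body of the if, so the return below it runs unconditionally. GCC flags the pattern with -Wempty-body, which -Wextra enables.

```c
#include <stdio.h>

static int
check_buggy (FILE *f_source)
{
  if (f_source == NULL); /* stray ';' ends the if statement here */
    return 1;            /* ...so this always executes */
  return 0;              /* unreachable */
}

/* Intended version.  */
static int
check_fixed (FILE *f_source)
{
  if (f_source == NULL)
    return 1;
  return 0;
}
```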
Like the thephd.dev and clang implementations, the patch always
macro-expands the whole #embed and __has_embed directives except for the
embed keyword. That is most likely not what C23 says. My limited
understanding right now is that for #embed one needs to parse the whole
directive line with macro expansion disabled and check whether it satisfies
the grammar: if not, the whole directive is macro-expanded; if yes, only
the limit parameter argument is macro-expanded, and the prefix/suffix/if_empty
arguments are perhaps macro-expanded when actually used (and not at all if
unused). And I think the rules for __has_embed macro expansion conflict
with each other.
2024-09-12  Jakub Jelinek  <jakub@redhat.com>

	PR c/105863
libcpp/
	* include/cpplib.h: Implement C23 N3017 #embed - a scannable,
	tooling-friendly binary resource inclusion mechanism paper.
	(struct cpp_options): Add embed member.
	(enum cpp_builtin_type): Add BT_HAS_EMBED.
	(cpp_set_include_chains): Add another cpp_dir * argument to
	the declaration.
	* internal.h (enum include_type): Add IT_EMBED.
	(struct cpp_reader): Add embed_include member.
	(struct cpp_embed_params_tokens): New type.
	(struct cpp_embed_params): New type.
	(_cpp_get_token_no_padding): Declare.
	(enum _cpp_find_file_kind): Add _cpp_FFK_EMBED and _cpp_FFK_HAS_EMBED.
	(_cpp_stack_embed): Declare.
	(_cpp_parse_expr): Change return type to cpp_num_part instead of
	bool, change second argument from bool to const char * and add third
	argument.
	(_cpp_parse_embed_params): Declare.
	* directives.cc (DIRECTIVE_TABLE): Add embed entry.
	(end_directive): Don't call skip_rest_of_line for T_EMBED directive.
	(_cpp_handle_directive): Return 2 rather than 1 for T_EMBED in
	directives-only mode.
	(parse_include): Don't call check_eol for T_EMBED directive.
	(skip_balanced_token_seq): New function.
	(EMBED_PARAMS): Define.
	(enum embed_param_kind): New type.
	(embed_params): New variable.
	(_cpp_parse_embed_params): New function.
	(do_embed): New function.
	(do_if): Adjust _cpp_parse_expr caller.
	(do_elif): Likewise.
	* expr.cc (parse_defined): Diagnose defined in #embed or __has_embed
	parameters.
	(_cpp_parse_expr): Change return type to cpp_num_part instead of
	bool, change second argument from bool to const char * and add third
	argument. Adjust function comment. For #embed/__has_embed parameters
	add an artificial CPP_OPEN_PAREN. Use the second argument DIR
	directly instead of string literals conditional on IS_IF.
	For #embed/__has_embed parameter, stop on reaching CPP_CLOSE_PAREN
	matching the artificial one. Diagnose negative or too large embed
	parameter operands.
	(num_binary_op): Use #embed instead of #if for diagnostics if inside
	#embed/__has_embed parameter.
	(num_div_op): Likewise.
	* files.cc (struct _cpp_file): Add limit member and embed bitfield.
	(search_cache): Add IS_EMBED argument, formatting fix. Skip over
	files with different file->embed from the argument.
	(find_file_in_dir): Don't call pch_open_file if file->embed.
	(_cpp_find_file): Handle _cpp_FFK_EMBED and _cpp_FFK_HAS_EMBED.
	(read_file_guts): Formatting fix.
	(has_unique_contents): Ignore file->embed files.
	(search_path_head): Handle IT_EMBED type.
	(_cpp_stack_embed): New function.
	(_cpp_get_file_stat): Formatting fix.
	(cpp_set_include_chains): Add embed argument, save it to
	pfile->embed_include and compute lens for the chain.
	* init.cc (struct lang_flags): Add embed member.
	(lang_defaults): Add embed initializers.
	(cpp_set_lang): Initialize CPP_OPTION (pfile, embed).
	(builtin_array): Add __has_embed entry.
	(cpp_init_builtins): Predefine __STDC_EMBED_NOT_FOUND__,
	__STDC_EMBED_FOUND__ and __STDC_EMBED_EMPTY__.
	* lex.cc (cpp_directive_only_process): Handle #embed.
	* macro.cc (cpp_get_token_no_padding): Rename to ...
	(_cpp_get_token_no_padding): ... this. No longer static.
	(builtin_has_include_1): New function.
	(builtin_has_include): Use it. Use _cpp_get_token_no_padding
	instead of cpp_get_token_no_padding.
	(builtin_has_embed): New function.
	(_cpp_builtin_macro_text): Handle BT_HAS_EMBED.
gcc/
	* doc/cppdiropts.texi (--embed-dir=): Document.
	* doc/cpp.texi (Binary Resource Inclusion): New chapter.
	(__has_embed): Document.
	* doc/invoke.texi (Directory Options): Mention --embed-dir=.
	* gcc.cc (cpp_unique_options): Add %{-embed*}.
	* genmatch.cc (main): Adjust cpp_set_include_chains caller.
	* incpath.h (enum incpath_kind): Add INC_EMBED.
	* incpath.cc (merge_include_chains): Handle INC_EMBED.
	(register_include_chains): Adjust cpp_set_include_chains caller.
gcc/c-family/
	* c.opt (-embed-dir=): New option.
	(-embed-directory): New alias.
	(-embed-directory=): New alias.
	* c-opts.cc (c_common_handle_option): Handle OPT__embed_dir_.
gcc/testsuite/
	* c-c++-common/cpp/embed-1.c: New test.
	* c-c++-common/cpp/embed-2.c: New test.
	* c-c++-common/cpp/embed-3.c: New test.
	* c-c++-common/cpp/embed-4.c: New test.
	* c-c++-common/cpp/embed-5.c: New test.
	* c-c++-common/cpp/embed-6.c: New test.
	* c-c++-common/cpp/embed-7.c: New test.
	* c-c++-common/cpp/embed-8.c: New test.
	* c-c++-common/cpp/embed-9.c: New test.
	* c-c++-common/cpp/embed-10.c: New test.
	* c-c++-common/cpp/embed-11.c: New test.
	* c-c++-common/cpp/embed-12.c: New test.
	* c-c++-common/cpp/embed-13.c: New test.
	* c-c++-common/cpp/embed-14.c: New test.
	* c-c++-common/cpp/embed-25.c: New test.
	* c-c++-common/cpp/embed-26.c: New test.
	* c-c++-common/cpp/embed-dir/embed-1.inc: New test.
	* c-c++-common/cpp/embed-dir/embed-3.c: New test.
	* c-c++-common/cpp/embed-dir/embed-4.c: New test.
	* c-c++-common/cpp/embed-dir/magna-carta.txt: New test.
	* gcc.dg/cpp/embed-1.c: New test.
	* gcc.dg/cpp/embed-2.c: New test.
	* gcc.dg/cpp/embed-3.c: New test.
	* gcc.dg/cpp/embed-4.c: New test.
	* g++.dg/cpp/embed-1.C: New test.
	* g++.dg/cpp/embed-2.C: New test.
	* g++.dg/cpp/embed-3.C: New test.
1015 lines
33 KiB
C++
/* Part of CPP library.
|
|
Copyright (C) 1997-2024 Free Software Foundation, Inc.
|
|
|
|
This program is free software; you can redistribute it and/or modify it
|
|
under the terms of the GNU General Public License as published by the
|
|
Free Software Foundation; either version 3, or (at your option) any
|
|
later version.
|
|
|
|
This program is distributed in the hope that it will be useful,
|
|
but WITHOUT ANY WARRANTY; without even the implied warranty of
|
|
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
|
GNU General Public License for more details.
|
|
|
|
You should have received a copy of the GNU General Public License
|
|
along with this program; see the file COPYING3. If not see
|
|
<http://www.gnu.org/licenses/>. */
|
|
|
|
/* This header defines all the internal data structures and functions
|
|
that need to be visible across files. It should not be used outside
|
|
cpplib. */
|
|
|
|
#ifndef LIBCPP_INTERNAL_H
|
|
#define LIBCPP_INTERNAL_H
|
|
|
|
#include "symtab.h"
|
|
#include "cpplib.h"
|
|
#include "rich-location.h"
|
|
|
|
#if HAVE_ICONV
|
|
#include <iconv.h>
|
|
#else
|
|
#define HAVE_ICONV 0
|
|
typedef int iconv_t; /* dummy */
|
|
#endif
|
|
|
|
#ifdef __cplusplus
|
|
extern "C" {
|
|
#endif
|
|
|
|
struct directive; /* Deliberately incomplete. */
|
|
struct pending_option;
|
|
struct op;
|
|
struct _cpp_strbuf;
|
|
|
|
typedef bool (*convert_f) (iconv_t, const unsigned char *, size_t,
|
|
struct _cpp_strbuf *);
|
|
struct cset_converter
|
|
{
|
|
convert_f func;
|
|
iconv_t cd;
|
|
int width;
|
|
const char* from;
|
|
const char* to;
|
|
};
|
|
|
|
#define BITS_PER_CPPCHAR_T (CHAR_BIT * sizeof (cppchar_t))
|
|
|
|
/* Test if a sign is valid within a preprocessing number. */
|
|
#define VALID_SIGN(c, prevc) \
|
|
(((c) == '+' || (c) == '-') && \
|
|
((prevc) == 'e' || (prevc) == 'E' \
|
|
|| (((prevc) == 'p' || (prevc) == 'P') \
|
|
&& CPP_OPTION (pfile, extended_numbers))))
|
|
|
|
#define DIGIT_SEP(c) ((c) == '\'' && CPP_OPTION (pfile, digit_separators))
|
|
|
|
#define CPP_OPTION(PFILE, OPTION) ((PFILE)->opts.OPTION)
|
|
#define CPP_BUFFER(PFILE) ((PFILE)->buffer)
|
|
#define CPP_BUF_COLUMN(BUF, CUR) ((CUR) - (BUF)->line_base)
|
|
#define CPP_BUF_COL(BUF) CPP_BUF_COLUMN(BUF, (BUF)->cur)
|
|
|
|
#define CPP_INCREMENT_LINE(PFILE, COLS_HINT) do { \
|
|
const class line_maps *line_table = PFILE->line_table; \
|
|
const struct line_map_ordinary *map = \
|
|
LINEMAPS_LAST_ORDINARY_MAP (line_table); \
|
|
linenum_type line = SOURCE_LINE (map, line_table->highest_line); \
|
|
linemap_line_start (PFILE->line_table, line + 1, COLS_HINT); \
|
|
} while (0)
|
|
|
|
/* Host alignment handling. */
|
|
struct dummy
|
|
{
|
|
char c;
|
|
union
|
|
{
|
|
double d;
|
|
int *p;
|
|
} u;
|
|
};
|
|
|
|
#define DEFAULT_ALIGNMENT offsetof (struct dummy, u)
|
|
#define CPP_ALIGN2(size, align) (((size) + ((align) - 1)) & ~((align) - 1))
|
|
#define CPP_ALIGN(size) CPP_ALIGN2 (size, DEFAULT_ALIGNMENT)
|
|
|
|
#define _cpp_mark_macro_used(NODE) \
|
|
(cpp_user_macro_p (NODE) ? (NODE)->value.macro->used = 1 : 0)
|
|
|
|
/* A generic memory buffer, and operations on it. */
|
|
typedef struct _cpp_buff _cpp_buff;
|
|
struct _cpp_buff
|
|
{
|
|
struct _cpp_buff *next;
|
|
unsigned char *base, *cur, *limit;
|
|
};
|
|
|
|
extern _cpp_buff *_cpp_get_buff (cpp_reader *, size_t);
|
|
extern void _cpp_release_buff (cpp_reader *, _cpp_buff *);
|
|
extern void _cpp_extend_buff (cpp_reader *, _cpp_buff **, size_t);
|
|
extern _cpp_buff *_cpp_append_extend_buff (cpp_reader *, _cpp_buff *, size_t);
|
|
extern void _cpp_free_buff (_cpp_buff *);
|
|
extern unsigned char *_cpp_aligned_alloc (cpp_reader *, size_t);
|
|
extern unsigned char *_cpp_unaligned_alloc (cpp_reader *, size_t);
|
|
|
|
#define BUFF_ROOM(BUFF) (size_t) ((BUFF)->limit - (BUFF)->cur)
|
|
#define BUFF_FRONT(BUFF) ((BUFF)->cur)
|
|
#define BUFF_LIMIT(BUFF) ((BUFF)->limit)
|
|
|
|
/* #include types. */
|
|
enum include_type
|
|
{
|
|
/* Directive-based including mechanisms. */
|
|
IT_INCLUDE, /* #include */
|
|
IT_INCLUDE_NEXT, /* #include_next */
|
|
IT_IMPORT, /* #import */
|
|
IT_EMBED, /* #embed */
|
|
|
|
/* Non-directive including mechanisms. */
|
|
IT_CMDLINE, /* -include */
|
|
IT_DEFAULT, /* forced header */
|
|
IT_MAIN, /* main, start on line 1 */
|
|
IT_PRE_MAIN, /* main, but there will be a preamble before line
|
|
1 */
|
|
|
|
IT_DIRECTIVE_HWM = IT_IMPORT + 1, /* Directives below this. */
|
|
IT_HEADER_HWM = IT_DEFAULT + 1 /* Header files below this. */
|
|
};
|
|
|
|
union utoken
|
|
{
|
|
const cpp_token *token;
|
|
const cpp_token **ptoken;
|
|
};
|
|
|
|
/* A "run" of tokens; part of a chain of runs. */
|
|
typedef struct tokenrun tokenrun;
|
|
struct tokenrun
|
|
{
|
|
tokenrun *next, *prev;
|
|
cpp_token *base, *limit;
|
|
};
|
|
|
|
/* Accessor macros for struct cpp_context. */
|
|
#define FIRST(c) ((c)->u.iso.first)
|
|
#define LAST(c) ((c)->u.iso.last)
|
|
#define CUR(c) ((c)->u.trad.cur)
|
|
#define RLIMIT(c) ((c)->u.trad.rlimit)
|
|
|
|
/* This describes some additional data that is added to the macro
|
|
token context of type cpp_context, when -ftrack-macro-expansion is
|
|
on. */
|
|
typedef struct
|
|
{
|
|
/* The node of the macro we are referring to. */
|
|
cpp_hashnode *macro_node;
|
|
/* This buffer contains an array of virtual locations. The virtual
|
|
location at index 0 is the virtual location of the token at index
|
|
0 in the current instance of cpp_context; similarly for all the
|
|
other virtual locations. */
|
|
location_t *virt_locs;
|
|
/* This is a pointer to the current virtual location. This is used
|
|
to iterate over the virtual locations while we iterate over the
|
|
tokens they belong to. */
|
|
location_t *cur_virt_loc;
|
|
} macro_context;
|
|
|
|
/* The kind of tokens carried by a cpp_context. */
|
|
enum context_tokens_kind {
|
|
/* This is the value of cpp_context::tokens_kind if u.iso.first
|
|
contains an instance of cpp_token **. */
|
|
TOKENS_KIND_INDIRECT,
|
|
/* This is the value of cpp_context::tokens_kind if u.iso.first
|
|
contains an instance of cpp_token *. */
|
|
TOKENS_KIND_DIRECT,
|
|
/* This is the value of cpp_context::tokens_kind when the token
|
|
context contains tokens resulting from macro expansion. In that
|
|
case struct cpp_context::macro points to an instance of struct
|
|
macro_context. This is used only when the
|
|
-ftrack-macro-expansion flag is on. */
|
|
TOKENS_KIND_EXTENDED
|
|
};
|
|
|
|
typedef struct cpp_context cpp_context;
|
|
struct cpp_context
|
|
{
|
|
/* Doubly-linked list. */
|
|
cpp_context *next, *prev;
|
|
|
|
union
|
|
{
|
|
/* For ISO macro expansion. Contexts other than the base context
|
|
are contiguous tokens. e.g. macro expansions, expanded
|
|
argument tokens. */
|
|
struct
|
|
{
|
|
union utoken first;
|
|
union utoken last;
|
|
} iso;
|
|
|
|
/* For traditional macro expansion. */
|
|
struct
|
|
{
|
|
const unsigned char *cur;
|
|
const unsigned char *rlimit;
|
|
} trad;
|
|
} u;
|
|
|
|
/* If non-NULL, a buffer used for storage related to this context.
|
|
When the context is popped, the buffer is released. */
|
|
_cpp_buff *buff;
|
|
|
|
/* If tokens_kind is TOKEN_KIND_EXTENDED, then (as we thus are in a
|
|
macro context) this is a pointer to an instance of macro_context.
|
|
Otherwise if tokens_kind is *not* TOKEN_KIND_EXTENDED, then, if
|
|
we are in a macro context, this is a pointer to an instance of
|
|
cpp_hashnode, representing the name of the macro this context is
|
|
for. If we are not in a macro context, then this is just NULL.
|
|
Note that when tokens_kind is TOKEN_KIND_EXTENDED, the memory
|
|
used by the instance of macro_context pointed to by this member
|
|
is de-allocated upon de-allocation of the instance of struct
|
|
cpp_context. */
|
|
union
|
|
{
|
|
macro_context *mc;
|
|
cpp_hashnode *macro;
|
|
} c;
|
|
|
|
/* This determines the type of tokens held by this context. */
|
|
enum context_tokens_kind tokens_kind;
|
|
};
|
|
|
|
struct lexer_state
|
|
{
|
|
/* 1 if we're handling a directive. 2 if it's an include-like
|
|
directive. */
|
|
unsigned char in_directive;
|
|
|
|
/* Nonzero if in a directive that will handle padding tokens itself.
|
|
#include needs this to avoid problems with computed include and
|
|
spacing between tokens. */
|
|
unsigned char directive_wants_padding;
|
|
|
|
/* True if we are skipping a failed conditional group. */
|
|
unsigned char skipping;
|
|
|
|
/* Nonzero if in a directive that takes angle-bracketed headers. */
|
|
unsigned char angled_headers;
|
|
|
|
/* Nonzero if in a #if or #elif directive. */
|
|
unsigned char in_expression;
|
|
|
|
/* Nonzero to save comments. Turned off if discard_comments, and in
|
|
all directives apart from #define. */
|
|
unsigned char save_comments;
|
|
|
|
/* Nonzero if lexing __VA_ARGS__ and __VA_OPT__ are valid. */
|
|
unsigned char va_args_ok;
|
|
|
|
/* Nonzero if lexing poisoned identifiers is valid. */
|
|
unsigned char poisoned_ok;
|
|
|
|
/* Nonzero to prevent macro expansion. */
|
|
unsigned char prevent_expansion;
|
|
|
|
/* Nonzero when parsing arguments to a function-like macro. */
|
|
unsigned char parsing_args;
|
|
|
|
/* Nonzero if prevent_expansion is true only because output is
|
|
being discarded. */
|
|
unsigned char discarding_output;
|
|
|
|
/* Nonzero to skip evaluating part of an expression. */
|
|
unsigned int skip_eval;
|
|
|
|
/* Nonzero when tokenizing a deferred pragma. */
|
|
unsigned char in_deferred_pragma;
|
|
|
|
/* Count to token that is a header-name. */
|
|
unsigned char directive_file_token;
|
|
|
|
/* Nonzero if the deferred pragma being handled allows macro expansion. */
|
|
unsigned char pragma_allow_expansion;
|
|
|
|
/* Nonzero if _Pragma should not be interpreted. */
|
|
unsigned char ignore__Pragma;
|
|
};
|
|
|
|
/* Special nodes - identifiers with predefined significance. */
|
|
struct spec_nodes
|
|
{
|
|
cpp_hashnode *n_defined; /* defined operator */
|
|
cpp_hashnode *n_true; /* C++ keyword true */
|
|
cpp_hashnode *n_false; /* C++ keyword false */
|
|
cpp_hashnode *n__VA_ARGS__; /* C99 vararg macros */
|
|
cpp_hashnode *n__VA_OPT__; /* C++ vararg macros */
|
|
|
|
enum {M_EXPORT, M_MODULE, M_IMPORT, M__IMPORT, M_HWM};
|
|
|
|
/* C++20 modules, only set when module_directives is in effect.
|
|
incoming variants [0], outgoing ones [1] */
|
|
cpp_hashnode *n_modules[M_HWM][2];
|
|
};
|
|
|
|
typedef struct _cpp_line_note _cpp_line_note;
|
|
struct _cpp_line_note
|
|
{
|
|
/* Location in the clean line the note refers to. */
|
|
const unsigned char *pos;
|
|
|
|
/* Type of note. The 9 'from' trigraph characters represent those
|
|
trigraphs, '\\' an escaped newline, ' ' an escaped newline with
|
|
intervening space, 0 represents a note that has already been handled,
|
|
and anything else is invalid. */
|
|
unsigned int type;
|
|
};
|
|
|
|
/* Tail padding required by search_line_fast alternatives. */
|
|
#ifdef HAVE_SSSE3
|
|
#define CPP_BUFFER_PADDING 64
|
|
#else
|
|
#define CPP_BUFFER_PADDING 16
|
|
#endif
|
|
|
|
/* Represents the contents of a file cpplib has read in. */
|
|
struct cpp_buffer
|
|
{
|
|
const unsigned char *cur; /* Current location. */
|
|
const unsigned char *line_base; /* Start of current physical line. */
|
|
const unsigned char *next_line; /* Start of to-be-cleaned logical line. */
|
|
|
|
const unsigned char *buf; /* Entire character buffer. */
|
|
const unsigned char *rlimit; /* Writable byte at end of file. */
|
|
const unsigned char *to_free; /* Pointer that should be freed when
|
|
popping the buffer. */
|
|
|
|
_cpp_line_note *notes; /* Array of notes. */
|
|
unsigned int cur_note; /* Next note to process. */
|
|
unsigned int notes_used; /* Number of notes. */
|
|
unsigned int notes_cap; /* Size of allocated array. */
|
|
|
|
struct cpp_buffer *prev;
|
|
|
|
/* Pointer into the file table; non-NULL if this is a file buffer.
|
|
Used for include_next and to record control macros. */
|
|
struct _cpp_file *file;
|
|
|
|
/* Saved value of __TIMESTAMP__ macro - date and time of last modification
|
|
of the assotiated file. */
|
|
const unsigned char *timestamp;
|
|
|
|
/* Value of if_stack at start of this file.
|
|
Used to prohibit unmatched #endif (etc) in an include file. */
|
|
struct if_stack *if_stack;
|
|
|
|
/* True if we need to get the next clean line. */
|
|
bool need_line : 1;
|
|
|
|
/* True if we have already warned about C++ comments in this file.
|
|
The warning happens only for C89 extended mode with -pedantic on,
|
|
or for -Wtraditional, and only once per file (otherwise it would
|
|
be far too noisy). */
|
|
bool warned_cplusplus_comments : 1;
|
|
|
|
/* True if we don't process trigraphs and escaped newlines. True
|
|
for preprocessed input, command line directives, and _Pragma
|
|
buffers. */
|
|
bool from_stage3 : 1;
|
|
|
|
/* At EOF, a buffer is automatically popped. If RETURN_AT_EOF is
|
|
true, a CPP_EOF token is then returned. Otherwise, the next
|
|
token from the enclosing buffer is returned. */
|
|
bool return_at_eof : 1;
|
|
|
|
/* One for a system header, two for a C system header file that therefore
|
|
needs to be extern "C" protected in C++, and zero otherwise. */
|
|
unsigned char sysp;
|
|
|
|
/* The directory of the this buffer's file. Its NAME member is not
|
|
allocated, so we don't need to worry about freeing it. */
|
|
struct cpp_dir dir;
|
|
|
|
/* Descriptor for converting from the input character set to the
|
|
source character set. */
|
|
struct cset_converter input_cset_desc;
|
|
};
|
|
|
|
/* The list of saved macros by push_macro pragma. */
|
|
struct def_pragma_macro {
|
|
/* Chain element to previous saved macro. */
|
|
struct def_pragma_macro *next;
|
|
/* Name of the macro. */
|
|
char *name;
|
|
/* The stored macro content. */
|
|
unsigned char *definition;
|
|
|
|
/* Definition line number. */
|
|
location_t line;
|
|
/* If macro defined in system header. */
|
|
unsigned int syshdr : 1;
|
|
/* Nonzero if it has been expanded or had its existence tested. */
|
|
unsigned int used : 1;
|
|
|
|
/* Mark if we save an undefined macro. */
|
|
unsigned int is_undef : 1;
|
|
/* Nonzero if it was a builtin macro. */
|
|
unsigned int is_builtin : 1;
|
|
};
|
|
|
|
/* A cpp_reader encapsulates the "state" of a pre-processor run.
|
|
Applying cpp_get_token repeatedly yields a stream of pre-processor
|
|
tokens. Usually, there is only one cpp_reader object active. */
|
|
struct cpp_reader
|
|
{
|
|
/* Top of buffer stack. */
|
|
cpp_buffer *buffer;
|
|
|
|
/* Overlaid buffer (can be different after processing #include). */
|
|
cpp_buffer *overlaid_buffer;
|
|
|
|
/* Lexer state. */
|
|
struct lexer_state state;
|
|
|
|
/* Source line tracking. */
|
|
class line_maps *line_table;
|
|
|
|
/* The line of the '#' of the current directive. */
|
|
location_t directive_line;
|
|
|
|
/* Memory buffers. */
|
|
_cpp_buff *a_buff; /* Aligned permanent storage. */
|
|
_cpp_buff *u_buff; /* Unaligned permanent storage. */
|
|
_cpp_buff *free_buffs; /* Free buffer chain. */
|
|
|
|
/* Context stack. */
|
|
struct cpp_context base_context;
|
|
struct cpp_context *context;
|
|
|
|
/* If in_directive, the directive if known. */
|
|
const struct directive *directive;
|
|
|
|
/* Token generated while handling a directive, if any. */
|
|
cpp_token directive_result;
|
|
|
|
/* When expanding a macro at top-level, this is the location of the
|
|
macro invocation. */
|
|
location_t invocation_location;
|
|
|
|
/* This is the node representing the macro being expanded at
|
|
top-level. The value of this data member is valid iff
|
|
cpp_in_macro_expansion_p() returns TRUE. */
|
|
cpp_hashnode *top_most_macro_node;
|
|
|
|
/* Nonzero if we are about to expand a macro. Note that if we are
|
|
really expanding a macro, the function macro_of_context returns
|
|
the macro being expanded and this flag is set to false. Client
|
|
code should use the function cpp_in_macro_expansion_p to know if we
|
|
are either about to expand a macro, or are actually expanding
|
|
one. */
|
|
bool about_to_expand_macro_p;
|
|
|
|
  /* Search paths for include files. */
  struct cpp_dir *quote_include; /* "" */
  struct cpp_dir *bracket_include; /* <> */
  struct cpp_dir no_search_path; /* No path. */
  struct cpp_dir *embed_include; /* #embed <> */

  /* Chain of all hashed _cpp_file instances. */
  struct _cpp_file *all_files;

  struct _cpp_file *main_file;

  /* File and directory hash table. */
  struct htab *file_hash;
  struct htab *dir_hash;
  struct file_hash_entry_pool *file_hash_entries;

  /* Negative path lookup hash table. */
  struct htab *nonexistent_file_hash;
  struct obstack nonexistent_file_ob;

  /* Nonzero means don't look for #include "foo" in the source-file
     directory. */
  bool quote_ignores_source_dir;

  /* Nonzero if any file has contained #pragma once or #import has
     been used. */
  bool seen_once_only;

  /* Multiple include optimization. */
  const cpp_hashnode *mi_cmacro;
  const cpp_hashnode *mi_ind_cmacro;
  bool mi_valid;

  /* Lexing. */
  cpp_token *cur_token;
  tokenrun base_run, *cur_run;
  unsigned int lookaheads;

  /* Nonzero prevents the lexer from re-using the token runs. */
  unsigned int keep_tokens;

  /* Buffer to hold macro definition string. */
  unsigned char *macro_buffer;
  unsigned int macro_buffer_len;

  /* Descriptor for converting from the source character set to the
     execution character set. */
  struct cset_converter narrow_cset_desc;

  /* Descriptor for converting from the source character set to the
     UTF-8 execution character set. */
  struct cset_converter utf8_cset_desc;

  /* Descriptor for converting from the source character set to the
     UTF-16 execution character set. */
  struct cset_converter char16_cset_desc;

  /* Descriptor for converting from the source character set to the
     UTF-32 execution character set. */
  struct cset_converter char32_cset_desc;

  /* Descriptor for converting from the source character set to the
     wide execution character set. */
  struct cset_converter wide_cset_desc;

  /* Date and time text. Calculated together if either is requested. */
  const unsigned char *date;
  const unsigned char *time;

  /* Time stamp, set lazily and idempotently. */
  time_t time_stamp;
  int time_stamp_kind; /* Or errno. */

  /* A token forcing paste avoidance, and one demarcating macro arguments. */
  cpp_token avoid_paste;
  cpp_token endarg;

  /* Opaque handle to the dependencies of mkdeps.cc. */
  class mkdeps *deps;

  /* Obstack holding all macro hash nodes. This never shrinks.
     See identifiers.cc. */
  struct obstack hash_ob;

  /* Obstack holding buffer and conditional structures. This is a
     real stack. See directives.cc. */
  struct obstack buffer_ob;

  /* Pragma table - dynamic, because a library user can add to the
     list of recognized pragmas. */
  struct pragma_entry *pragmas;

  /* Callbacks to cpplib client. */
  struct cpp_callbacks cb;

  /* Identifier hash table. */
  struct ht *hash_table;

  /* Identifier ancillary data hash table. */
  struct ht *extra_hash_table;

  /* Expression parser stack. */
  struct op *op_stack, *op_limit;

  /* User visible options. */
  struct cpp_options opts;

  /* Special nodes - identifiers with predefined significance to the
     preprocessor. */
  struct spec_nodes spec_nodes;

  /* Whether cpplib owns the hashtable. */
  bool our_hashtable, our_extra_hashtable;

  /* Traditional preprocessing output buffer (a logical line). */
  struct
  {
    unsigned char *base;
    unsigned char *limit;
    unsigned char *cur;
    location_t first_line;
  } out;

  /* Used for buffer overlays by traditional.cc. */
  const unsigned char *saved_cur, *saved_rlimit, *saved_line_base;

  /* A saved list of the defined macros, for dependency checking
     of precompiled headers. */
  struct cpp_savedstate *savedstate;

  /* Next value of __COUNTER__ macro. */
  unsigned int counter;

  /* Table of comments, when state.save_comments is true. */
  cpp_comment_table comments;

  /* List of saved macros by push_macro. */
  struct def_pragma_macro *pushed_macros;

  /* If non-zero, the lexer will use this location for the next token
     instead of getting a location from the linemap. */
  location_t forced_token_location;

  /* Location identifying the main source file -- intended to be line
     zero of said file. */
  location_t main_loc;

  /* Returns true iff we should warn about UTF-8 bidirectional control
     characters. */
  bool warn_bidi_p () const
  {
    return (CPP_OPTION (this, cpp_warn_bidirectional)
            & (bidirectional_unpaired|bidirectional_any));
  }
};

/* Lists of tokens for #embed/__has_embed prefix/suffix/if_empty
   parameters. */
struct cpp_embed_params_tokens
{
  cpp_token *cur_token;
  tokenrun base_run, *cur_run;
  size_t count;
};

/* #embed and __has_embed parameters. */
struct cpp_embed_params
{
  location_t loc;
  bool has_embed;
  cpp_num_part limit;
  cpp_embed_params_tokens prefix, suffix, if_empty;
};

/* Character classes. Based on the more primitive macros in safe-ctype.h.
   If the definition of `numchar' looks odd to you, please look up the
   definition of a pp-number in the C standard [section 6.4.8 of C99].

   In the unlikely event that characters other than \r and \n enter
   the set is_vspace, the macro handle_newline() in lex.cc must be
   updated. */
#define _dollar_ok(x) ((x) == '$' && CPP_OPTION (pfile, dollars_in_ident))

#define is_idchar(x) (ISIDNUM(x) || _dollar_ok(x))
#define is_numchar(x) ISIDNUM(x)
#define is_idstart(x) (ISIDST(x) || _dollar_ok(x))
#define is_numstart(x) ISDIGIT(x)
#define is_hspace(x) ISBLANK(x)
#define is_vspace(x) IS_VSPACE(x)
#define is_nvspace(x) IS_NVSPACE(x)
#define is_space(x) IS_SPACE_OR_NUL(x)

#define SEEN_EOL() (pfile->cur_token[-1].type == CPP_EOF)

/* This table is constant if it can be initialized at compile time,
   which is the case if cpp was compiled with GCC >=2.7, or another
   compiler that supports C99. */
#if HAVE_DESIGNATED_INITIALIZERS
extern const unsigned char _cpp_trigraph_map[UCHAR_MAX + 1];
#else
extern unsigned char _cpp_trigraph_map[UCHAR_MAX + 1];
#endif

#if !defined (HAVE_UCHAR) && !defined (IN_GCC)
typedef unsigned char uchar;
#endif

#define UC (const uchar *) /* Intended use: UC"string" */

/* Accessors. */

inline int
_cpp_in_system_header (cpp_reader *pfile)
{
  return pfile->buffer ? pfile->buffer->sysp : 0;
}
#define CPP_PEDANTIC(PF) CPP_OPTION (PF, cpp_pedantic)
#define CPP_WTRADITIONAL(PF) CPP_OPTION (PF, cpp_warn_traditional)

/* Return true if we're in the main file (unless it's considered to be
   an include file in its own right). */
inline int
_cpp_in_main_source_file (cpp_reader *pfile)
{
  return (!CPP_OPTION (pfile, main_search)
          && pfile->buffer->file == pfile->main_file);
}

/* True if NODE is a macro for the purposes of ifdef, defined etc. */
inline bool _cpp_defined_macro_p (cpp_hashnode *node)
{
  /* Do not treat conditional macros as being defined. This is due to
     the powerpc port using conditional macros for 'vector', 'bool',
     and 'pixel' to act as conditional keywords. This messes up tests
     like #ifndef bool. */
  return cpp_macro_p (node) && !(node->flags & NODE_CONDITIONAL);
}

/* In macro.cc */
extern bool _cpp_notify_macro_use (cpp_reader *pfile, cpp_hashnode *node,
                                   location_t);
inline bool _cpp_maybe_notify_macro_use (cpp_reader *pfile, cpp_hashnode *node,
                                         location_t loc)
{
  if (!(node->flags & NODE_USED))
    return _cpp_notify_macro_use (pfile, node, loc);
  return true;
}
extern cpp_macro *_cpp_new_macro (cpp_reader *, cpp_macro_kind, void *);
extern void _cpp_free_definition (cpp_hashnode *);
extern bool _cpp_create_definition (cpp_reader *, cpp_hashnode *, location_t);
extern void _cpp_pop_context (cpp_reader *);
extern void _cpp_push_text_context (cpp_reader *, cpp_hashnode *,
                                    const unsigned char *, size_t);
extern bool _cpp_save_parameter (cpp_reader *, unsigned, cpp_hashnode *,
                                 cpp_hashnode *);
extern void _cpp_unsave_parameters (cpp_reader *, unsigned);
extern bool _cpp_arguments_ok (cpp_reader *, cpp_macro *, const cpp_hashnode *,
                               unsigned int);
extern const unsigned char *_cpp_builtin_macro_text (cpp_reader *,
                                                     cpp_hashnode *,
                                                     location_t = 0);
extern const cpp_token *_cpp_get_token_no_padding (cpp_reader *);
extern int _cpp_warn_if_unused_macro (cpp_reader *, cpp_hashnode *, void *);
extern void _cpp_push_token_context (cpp_reader *, cpp_hashnode *,
                                     const cpp_token *, unsigned int);
extern void _cpp_backup_tokens_direct (cpp_reader *, unsigned int);

/* In identifiers.cc */
extern void
_cpp_init_hashtable (cpp_reader *, cpp_hash_table *, cpp_hash_table *);
extern void _cpp_destroy_hashtable (cpp_reader *);

/* In files.cc */
enum _cpp_find_file_kind
  { _cpp_FFK_NORMAL, _cpp_FFK_FAKE, _cpp_FFK_PRE_INCLUDE, _cpp_FFK_HAS_INCLUDE,
    _cpp_FFK_EMBED, _cpp_FFK_HAS_EMBED };
extern _cpp_file *_cpp_find_file (cpp_reader *, const char *, cpp_dir *,
                                  int angle, _cpp_find_file_kind, location_t);
extern bool _cpp_find_failed (_cpp_file *);
extern void _cpp_mark_file_once_only (cpp_reader *, struct _cpp_file *);
extern const char *_cpp_find_header_unit (cpp_reader *, const char *file,
                                          bool angle_p, location_t);
extern int _cpp_stack_embed (cpp_reader *, const char *, bool,
                             cpp_embed_params *);
extern void _cpp_fake_include (cpp_reader *, const char *);
extern bool _cpp_stack_file (cpp_reader *, _cpp_file*, include_type, location_t);
extern bool _cpp_stack_include (cpp_reader *, const char *, int,
                                enum include_type, location_t);
extern int _cpp_compare_file_date (cpp_reader *, const char *, int);
extern void _cpp_report_missing_guards (cpp_reader *);
extern void _cpp_init_files (cpp_reader *);
extern void _cpp_cleanup_files (cpp_reader *);
extern void _cpp_pop_file_buffer (cpp_reader *, struct _cpp_file *,
                                  const unsigned char *);
extern bool _cpp_save_file_entries (cpp_reader *pfile, FILE *f);
extern bool _cpp_read_file_entries (cpp_reader *, FILE *);
extern const char *_cpp_get_file_name (_cpp_file *);
extern struct stat *_cpp_get_file_stat (_cpp_file *);
extern bool _cpp_has_header (cpp_reader *, const char *, int,
                             enum include_type);

/* In expr.cc */
extern cpp_num_part _cpp_parse_expr (cpp_reader *, const char *,
                                     const cpp_token *);
extern struct op *_cpp_expand_op_stack (cpp_reader *);

/* In lex.cc */
extern void _cpp_process_line_notes (cpp_reader *, int);
extern void _cpp_clean_line (cpp_reader *);
extern bool _cpp_get_fresh_line (cpp_reader *);
extern bool _cpp_skip_block_comment (cpp_reader *);
extern cpp_token *_cpp_temp_token (cpp_reader *);
extern const cpp_token *_cpp_lex_token (cpp_reader *);
extern cpp_token *_cpp_lex_direct (cpp_reader *);
extern unsigned char *_cpp_spell_ident_ucns (unsigned char *, cpp_hashnode *);
extern int _cpp_equiv_tokens (const cpp_token *, const cpp_token *);
extern void _cpp_init_tokenrun (tokenrun *, unsigned int);
extern cpp_hashnode *_cpp_lex_identifier (cpp_reader *, const char *);
extern int _cpp_remaining_tokens_num_in_context (cpp_context *);
extern void _cpp_init_lexer (void);
static inline void *_cpp_reserve_room (cpp_reader *pfile, size_t have,
                                       size_t extra)
{
  if (BUFF_ROOM (pfile->a_buff) < (have + extra))
    _cpp_extend_buff (pfile, &pfile->a_buff, extra);
  return BUFF_FRONT (pfile->a_buff);
}
extern void *_cpp_commit_buff (cpp_reader *pfile, size_t size);

/* In init.cc. */
extern void _cpp_maybe_push_include_file (cpp_reader *);
extern const char *cpp_named_operator2name (enum cpp_ttype type);
extern void _cpp_restore_special_builtin (cpp_reader *pfile,
                                          struct def_pragma_macro *);

/* In directives.cc */
extern int _cpp_test_assertion (cpp_reader *, unsigned int *);
extern int _cpp_handle_directive (cpp_reader *, bool);
extern void _cpp_define_builtin (cpp_reader *, const char *);
extern char ** _cpp_save_pragma_names (cpp_reader *);
extern void _cpp_restore_pragma_names (cpp_reader *, char **);
extern int _cpp_do__Pragma (cpp_reader *, location_t);
extern void _cpp_init_directives (cpp_reader *);
extern void _cpp_init_internal_pragmas (cpp_reader *);
extern bool _cpp_parse_embed_params (cpp_reader *, struct cpp_embed_params *);
extern void _cpp_do_file_change (cpp_reader *, enum lc_reason, const char *,
                                 linenum_type, unsigned int);
extern void _cpp_pop_buffer (cpp_reader *);
extern char *_cpp_bracket_include (cpp_reader *);

/* In errors.cc */
extern location_t cpp_diagnostic_get_current_location (cpp_reader *);

/* In traditional.cc. */
extern bool _cpp_scan_out_logical_line (cpp_reader *, cpp_macro *, bool);
extern bool _cpp_read_logical_line_trad (cpp_reader *);
extern void _cpp_overlay_buffer (cpp_reader *pfile, const unsigned char *,
                                 size_t);
extern void _cpp_remove_overlay (cpp_reader *);
extern cpp_macro *_cpp_create_trad_definition (cpp_reader *);
extern bool _cpp_expansions_different_trad (const cpp_macro *,
                                            const cpp_macro *);
extern unsigned char *_cpp_copy_replacement_text (const cpp_macro *,
                                                  unsigned char *);
extern size_t _cpp_replacement_text_len (const cpp_macro *);

/* In charset.cc. */

/* The normalization state at this point in the sequence.
   It starts initialized to all zeros, and at the end
   'level' is the normalization level of the sequence. */

struct normalize_state
{
  /* The previous starter character. */
  cppchar_t previous;
  /* The combining class of the previous character (whether or not a
     starter). */
  unsigned char prev_class;
  /* The lowest normalization level so far. */
  enum cpp_normalize_level level;
};
#define INITIAL_NORMALIZE_STATE { 0, 0, normalized_KC }
#define NORMALIZE_STATE_RESULT(st) ((st)->level)

/* We saw a character C that matches ISIDNUM(); update a
   normalize_state appropriately. */
#define NORMALIZE_STATE_UPDATE_IDNUM(st, c) \
  ((st)->previous = (c), (st)->prev_class = 0)

extern bool _cpp_valid_ucn (cpp_reader *, const unsigned char **,
                            const unsigned char *, int,
                            struct normalize_state *state,
                            cppchar_t *,
                            source_range *char_range,
                            cpp_string_location_reader *loc_reader);

extern bool _cpp_valid_utf8 (cpp_reader *pfile,
                             const uchar **pstr,
                             const uchar *limit,
                             int identifier_pos,
                             struct normalize_state *nst,
                             cppchar_t *cp);

extern void _cpp_destroy_iconv (cpp_reader *);
extern unsigned char *_cpp_convert_input (cpp_reader *, const char *,
                                          unsigned char *, size_t, size_t,
                                          const unsigned char **, off_t *);
extern const char *_cpp_default_encoding (void);
extern cpp_hashnode * _cpp_interpret_identifier (cpp_reader *pfile,
                                                 const unsigned char *id,
                                                 size_t len);

/* Utility routines and macros. */
#define DSC(str) (const unsigned char *)str, sizeof str - 1

/* These are inline functions instead of macros so we can get type
   checking. */
static inline int ustrcmp (const unsigned char *, const unsigned char *);
static inline int ustrncmp (const unsigned char *, const unsigned char *,
                            size_t);
static inline size_t ustrlen (const unsigned char *);
static inline const unsigned char *uxstrdup (const unsigned char *);
static inline const unsigned char *ustrchr (const unsigned char *, int);
static inline int ufputs (const unsigned char *, FILE *);

/* Use a const char for the second parameter since it is usually a literal. */
static inline int ustrcspn (const unsigned char *, const char *);

static inline int
ustrcmp (const unsigned char *s1, const unsigned char *s2)
{
  return strcmp ((const char *)s1, (const char *)s2);
}

static inline int
ustrncmp (const unsigned char *s1, const unsigned char *s2, size_t n)
{
  return strncmp ((const char *)s1, (const char *)s2, n);
}

static inline int
ustrcspn (const unsigned char *s1, const char *s2)
{
  return strcspn ((const char *)s1, s2);
}

static inline size_t
ustrlen (const unsigned char *s1)
{
  return strlen ((const char *)s1);
}

static inline const unsigned char *
uxstrdup (const unsigned char *s1)
{
  return (const unsigned char *) xstrdup ((const char *)s1);
}

static inline const unsigned char *
ustrchr (const unsigned char *s1, int c)
{
  return (const unsigned char *) strchr ((const char *)s1, c);
}

static inline int
ufputs (const unsigned char *s, FILE *f)
{
  return fputs ((const char *)s, f);
}

/* In line-map.cc. */

/* Create and return a virtual location for a token that is part of a
   macro expansion-list at a macro expansion point. See the comment
   inside struct line_map_macro for what exactly an expansion-list
   is.

   A call to this function must come after a call to
   linemap_enter_macro.

   MAP is the map into which the source location is created. TOKEN_NO
   is the index of the token in the macro replacement-list, starting
   at number 0.

   ORIG_LOC is the location of the token outside of this macro
   expansion. If the token comes originally from the macro
   definition, it is the locus in the macro definition; otherwise it
   is a location in the context of the caller of this macro expansion
   (a virtual location if the caller is itself a macro expansion,
   a source location otherwise).

   MACRO_DEFINITION_LOC is the location in the macro definition,
   either of the token itself or of a macro parameter that it
   replaces. */
location_t linemap_add_macro_token (const line_map_macro *,
                                    unsigned int,
                                    location_t,
                                    location_t);

/* Return the source line number corresponding to source location
   LOCATION. SET is the line map set LOCATION comes from. If
   LOCATION is the location of a token that is part of the
   expansion-list of a macro expansion, return the line number of the
   macro expansion point. */
int linemap_get_expansion_line (const line_maps *,
                                location_t);

/* Return the path of the file corresponding to source code location
   LOCATION.

   If LOCATION is the location of a token that is part of the
   replacement-list of a macro expansion, return the file path of the
   macro expansion point.

   SET is the line map set LOCATION comes from. */
const char *linemap_get_expansion_filename (const line_maps *,
                                            location_t);

/* A subclass of rich_location for emitting a diagnostic
   at the current location of the reader, but flagging
   it with set_escape_on_output (true). */
class encoding_rich_location : public rich_location
{
 public:
  encoding_rich_location (cpp_reader *pfile)
  : rich_location (pfile->line_table,
                   cpp_diagnostic_get_current_location (pfile))
  {
    set_escape_on_output (true);
  }

  encoding_rich_location (cpp_reader *pfile, location_t loc)
  : rich_location (pfile->line_table, loc)
  {
    set_escape_on_output (true);
  }
};

#ifdef __cplusplus
}
#endif

#endif /* ! LIBCPP_INTERNAL_H */
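The `cpp_embed_params` structure declared above records the `limit`, `prefix`, `suffix` and `if_empty` parameters of an `#embed` (or `__has_embed`) directive. The following sketch is not libcpp's implementation. It only illustrates how, under the C23 rules, those parameters combine when a resource is expanded. The `render_embed` helper, its argument order, and the convention that a negative `limit` means "no limit parameter" are all hypothetical.

```cpp
#include <stddef.h>
#include <string>

// Hypothetical helper (not part of libcpp): return, as text, what
//   #embed <resource> limit(LIMIT) prefix(PREFIX) suffix(SUFFIX) if_empty(IF_EMPTY)
// expands to under the C23 rules.  LIMIT < 0 stands for "no limit parameter".
static std::string
render_embed (const unsigned char *data, size_t len, long limit,
              const std::string &prefix, const std::string &suffix,
              const std::string &if_empty)
{
  // limit(N) caps the number of bytes taken from the resource.
  size_t n = len;
  if (limit >= 0 && static_cast<size_t> (limit) < n)
    n = static_cast<size_t> (limit);

  // An empty resource (including one emptied by limit(0)) expands to the
  // if_empty tokens alone; prefix and suffix have no effect.
  if (n == 0)
    return if_empty;

  // Otherwise: prefix, then the bytes as comma-separated integers, then suffix.
  std::string out = prefix;
  for (size_t i = 0; i < n; i++)
    {
      if (i != 0)
        out += ',';
      out += std::to_string (static_cast<unsigned> (data[i]));
    }
  out += suffix;
  return out;
}
```

For example, with the three bytes `H`, `i` and `!`, the parameters `limit(2) prefix(0, ) suffix(, 0)` give `0, 72,105, 0`. With `limit(0)` the resource counts as empty, so only the `if_empty` tokens remain. The spacing of the text is illustrative only, because the real preprocessor produces tokens rather than this exact string.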
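The `UC` and `DSC` macros and the `ustr*` wrappers declared above allow `unsigned char` strings to be passed to the standard `char`-based string functions while keeping type checking. The following self-contained sketch reproduces `UC`, `DSC`, `ustrlen` and `ustrcmp` locally. The `name_is` function is a hypothetical consumer of the (pointer, length) pair that `DSC` produces; it is not a libcpp function.

```cpp
#include <stddef.h>
#include <string.h>

typedef unsigned char uchar;

// Local copies of the UC and DSC macros from the header above.
#define UC (const uchar *)
#define DSC(str) (const unsigned char *)str, sizeof str - 1

// Local copies of two of the header's type-checked wrappers.
static inline size_t
ustrlen (const unsigned char *s1)
{
  return strlen ((const char *)s1);
}

static inline int
ustrcmp (const unsigned char *s1, const unsigned char *s2)
{
  return strcmp ((const char *)s1, (const char *)s2);
}

// Hypothetical consumer taking a (pointer, length) pair, the shape DSC
// expands to: true if the NUL-terminated NAME is exactly KW[0..KWLEN).
static bool
name_is (const unsigned char *name, const unsigned char *kw, size_t kwlen)
{
  return ustrlen (name) == kwlen && memcmp (name, kw, kwlen) == 0;
}
```

For instance, `name_is (UC"pragma", DSC ("pragma"))` is true because `DSC ("pragma")` expands to the pointer plus `sizeof "pragma" - 1`, which is 6. Because `DSC` relies on `sizeof`, it is only meaningful for string literals or arrays. Applied to a pointer, it would yield the pointer size minus one.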
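`INITIAL_NORMALIZE_STATE`, `NORMALIZE_STATE_UPDATE_IDNUM` and `NORMALIZE_STATE_RESULT` above track how well normalized an identifier is while it is being lexed. The following sketch copies those macros and `struct normalize_state`. It also supplies stand-ins for `cppchar_t` and `enum cpp_normalize_level`, which libcpp takes from cpplib.h; the enumerators here are assumed to match. Finally, it adds a hypothetical `note_level` helper that keeps the lowest level seen so far, assuming that a larger enumerator means a weaker normalization guarantee.

```cpp
// Stand-ins (assumptions) for types libcpp gets from cpplib.h.
typedef unsigned int cppchar_t;
enum cpp_normalize_level {
  normalized_KC = 0,
  normalized_C,
  normalized_identifier_C,
  normalized_none
};

// Copied from the header above.
struct normalize_state
{
  cppchar_t previous;
  unsigned char prev_class;
  enum cpp_normalize_level level;
};
#define INITIAL_NORMALIZE_STATE { 0, 0, normalized_KC }
#define NORMALIZE_STATE_RESULT(st) ((st)->level)
#define NORMALIZE_STATE_UPDATE_IDNUM(st, c) \
  ((st)->previous = (c), (st)->prev_class = 0)

// Hypothetical helper: record that the sequence is at best LEVEL, keeping
// the lowest (largest-enumerator) level seen so far.
static void
note_level (struct normalize_state *st, enum cpp_normalize_level level)
{
  if (level > st->level)
    st->level = level;
}

// Scan an ASCII identifier: each character matches ISIDNUM and is a
// starter of combining class 0, so the level stays at normalized_KC.
static enum cpp_normalize_level
scan_ascii_identifier (const char *id)
{
  struct normalize_state st = INITIAL_NORMALIZE_STATE;
  for (; *id; id++)
    NORMALIZE_STATE_UPDATE_IDNUM (&st, (cppchar_t) (unsigned char) *id);
  return NORMALIZE_STATE_RESULT (&st);
}
```

Plain ASCII identifier characters never change `level`, because `NORMALIZE_STATE_UPDATE_IDNUM` only records the new starter and resets `prev_class`. In this sketch, only an explicit call such as `note_level` can lower the result.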