1 /-- This set of tests is for UTF-8 support but not Unicode property support,
2 and is relevant only to the 8-bit library. --/
69 Failed: invalid UTF-8 string at offset 1
72 Failed: invalid UTF-8 string at offset 0
75 Failed: invalid UTF-8 string at offset 0
78 ------------------------------------------------------------------
83 ------------------------------------------------------------------
84 Capturing subpattern count = 0
85 Options: no_auto_possessify utf no_utf_check
91 Error -10 (bad UTF-8 string) offset=0 reason=1
93 Error -10 (bad UTF-8 string) offset=0 reason=2
95 Error -10 (bad UTF-8 string) offset=0 reason=1
97 Error -10 (bad UTF-8 string) offset=0 reason=3
99 Error -10 (bad UTF-8 string) offset=0 reason=2
101 Error -10 (bad UTF-8 string) offset=0 reason=1
103 Error -10 (bad UTF-8 string) offset=0 reason=4
105 Error -10 (bad UTF-8 string) offset=0 reason=3
107 Error -10 (bad UTF-8 string) offset=0 reason=2
109 Error -10 (bad UTF-8 string) offset=0 reason=1
111 Error -10 (bad UTF-8 string) offset=0 reason=5
113 Error -10 (bad UTF-8 string) offset=0 reason=4
115 Error -10 (bad UTF-8 string) offset=0 reason=3
117 Error -10 (bad UTF-8 string) offset=0 reason=2
119 Error -10 (bad UTF-8 string) offset=0 reason=1
121 Error -10 (bad UTF-8 string) offset=0 reason=6
123 Error -10 (bad UTF-8 string) offset=0 reason=6
125 Error -10 (bad UTF-8 string) offset=0 reason=7
127 Error -10 (bad UTF-8 string) offset=0 reason=6
129 Error -10 (bad UTF-8 string) offset=0 reason=7
131 Error -10 (bad UTF-8 string) offset=0 reason=8
133 Error -10 (bad UTF-8 string) offset=0 reason=6
135 Error -10 (bad UTF-8 string) offset=0 reason=7
137 Error -10 (bad UTF-8 string) offset=0 reason=8
139 Error -10 (bad UTF-8 string) offset=0 reason=9
140 \xfd\x7f\x80\x80\x80\x80
141 Error -10 (bad UTF-8 string) offset=0 reason=6
142 \xfd\x80\x7f\x80\x80\x80
143 Error -10 (bad UTF-8 string) offset=0 reason=7
144 \xfd\x80\x80\x7f\x80\x80
145 Error -10 (bad UTF-8 string) offset=0 reason=8
146 \xfd\x80\x80\x80\x7f\x80
147 Error -10 (bad UTF-8 string) offset=0 reason=9
148 \xfd\x80\x80\x80\x80\x7f
149 Error -10 (bad UTF-8 string) offset=0 reason=10
151 Error -10 (bad UTF-8 string) offset=0 reason=14
153 Error -10 (bad UTF-8 string) offset=0 reason=15
155 Error -10 (bad UTF-8 string) offset=0 reason=16
157 Error -10 (bad UTF-8 string) offset=0 reason=17
159 Error -10 (bad UTF-8 string) offset=0 reason=18
160 \xfc\x80\x80\x80\x80\x8f
161 Error -10 (bad UTF-8 string) offset=0 reason=19
163 Error -10 (bad UTF-8 string) offset=0 reason=20
165 Error -10 (bad UTF-8 string) offset=0 reason=21
167 Error -10 (bad UTF-8 string) offset=0 reason=21
171 Error -10 (bad UTF-8 string) offset=0 reason=11
172 \xfd\x80\x80\x80\x80\x80
173 Error -10 (bad UTF-8 string) offset=0 reason=12
175 Error -10 (bad UTF-8 string) offset=0 reason=13
179 Error -25 (short UTF-8 string) offset=0 reason=1
181 Error -25 (short UTF-8 string) offset=0 reason=2
183 Error -25 (short UTF-8 string) offset=0 reason=1
185 Error -25 (short UTF-8 string) offset=0 reason=3
187 Error -25 (short UTF-8 string) offset=0 reason=2
189 Error -25 (short UTF-8 string) offset=0 reason=1
191 Error -25 (short UTF-8 string) offset=0 reason=4
193 Error -25 (short UTF-8 string) offset=0 reason=3
195 Error -25 (short UTF-8 string) offset=0 reason=2
197 Error -25 (short UTF-8 string) offset=0 reason=1
199 Error -25 (short UTF-8 string) offset=0 reason=5
201 Error -25 (short UTF-8 string) offset=0 reason=4
203 Error -25 (short UTF-8 string) offset=0 reason=3
205 Error -25 (short UTF-8 string) offset=0 reason=2
206 \P\P\xfd\x80\x80\x80\x80
207 Error -25 (short UTF-8 string) offset=0 reason=1
211 Error -10 (bad UTF-8 string) offset=0 reason=15
213 Error -10 (bad UTF-8 string) offset=0 reason=15
215 Error -10 (bad UTF-8 string) offset=0 reason=16
217 Error -10 (bad UTF-8 string) offset=0 reason=17
219 Error -10 (bad UTF-8 string) offset=0 reason=18
220 \xfc\x83\x80\x80\x80\x80
221 Error -10 (bad UTF-8 string) offset=0 reason=19
222 \xfe\x80\x80\x80\x80\x80
223 Error -10 (bad UTF-8 string) offset=0 reason=21
224 \xff\x80\x80\x80\x80\x80
225 Error -10 (bad UTF-8 string) offset=0 reason=21
237 Error -10 (bad UTF-8 string) offset=0 reason=11
239 Error -10 (bad UTF-8 string) offset=0 reason=11
240 \xfc\x84\x80\x80\x80\x80
241 Error -10 (bad UTF-8 string) offset=0 reason=12
242 \xfd\x83\x80\x80\x80\x80
243 Error -10 (bad UTF-8 string) offset=0 reason=12
244 \?\xf8\x88\x80\x80\x80
246 \?\xf9\x87\x80\x80\x80
248 \?\xfc\x84\x80\x80\x80\x80
250 \?\xfd\x83\x80\x80\x80\x80
254 ------------------------------------------------------------------
259 ------------------------------------------------------------------
260 Capturing subpattern count = 0
266 ------------------------------------------------------------------
271 ------------------------------------------------------------------
272 Capturing subpattern count = 0
278 ------------------------------------------------------------------
283 ------------------------------------------------------------------
284 Capturing subpattern count = 0
290 ------------------------------------------------------------------
295 ------------------------------------------------------------------
296 Capturing subpattern count = 0
302 ------------------------------------------------------------------
307 ------------------------------------------------------------------
308 Capturing subpattern count = 0
314 ------------------------------------------------------------------
319 ------------------------------------------------------------------
320 Capturing subpattern count = 0
326 ------------------------------------------------------------------
331 ------------------------------------------------------------------
332 Capturing subpattern count = 0
338 ------------------------------------------------------------------
343 ------------------------------------------------------------------
344 Capturing subpattern count = 0
350 ------------------------------------------------------------------
355 ------------------------------------------------------------------
356 Capturing subpattern count = 0
361 /\x{D55c}\x{ad6d}\x{C5B4}/DZ8
362 ------------------------------------------------------------------
364 \x{d55c}\x{ad6d}\x{c5b4}
367 ------------------------------------------------------------------
368 Capturing subpattern count = 0
372 \x{D55c}\x{ad6d}\x{C5B4}
373 0: \x{d55c}\x{ad6d}\x{c5b4}
375 /\x{65e5}\x{672c}\x{8a9e}/DZ8
376 ------------------------------------------------------------------
378 \x{65e5}\x{672c}\x{8a9e}
381 ------------------------------------------------------------------
382 Capturing subpattern count = 0
386 \x{65e5}\x{672c}\x{8a9e}
387 0: \x{65e5}\x{672c}\x{8a9e}
390 ------------------------------------------------------------------
395 ------------------------------------------------------------------
396 Capturing subpattern count = 0
402 ------------------------------------------------------------------
407 ------------------------------------------------------------------
408 Capturing subpattern count = 0
414 ------------------------------------------------------------------
419 ------------------------------------------------------------------
420 Capturing subpattern count = 0
426 ------------------------------------------------------------------
431 ------------------------------------------------------------------
432 Capturing subpattern count = 0
438 ------------------------------------------------------------------
443 ------------------------------------------------------------------
444 Capturing subpattern count = 0
449 /-- This one is here not because it's different to Perl, but because of the
450 way the captured single byte is displayed. (In Perl it becomes a character, and
451 you can't tell the difference.) --/
463 /-- This one is here because Perl gives out a grumbly error message (quite
464 correctly, but that messes up comparisons). --/
473 ------------------------------------------------------------------
475 [\x00-`c-\xbf\xf1-\xff] (neg)
478 ------------------------------------------------------------------
479 Capturing subpattern count = 0
483 Subject length lower bound = 1
484 Starting chars: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a
485 \x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19
486 \x1a \x1b \x1c \x1d \x1e \x1f \x20 ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4
487 5 6 7 8 9 : ; < = > ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y
488 Z [ \ ] ^ _ ` c d e f g h i j k l m n o p q r s t u v w x y z { | } ~ \x7f
489 \xc2 \xc3 \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd \xce \xcf \xd0
490 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xdd \xde \xdf
491 \xe0 \xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb \xec \xed \xee
492 \xef \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa \xfb \xfc \xfd
510 ------------------------------------------------------------------
516 ------------------------------------------------------------------
517 Capturing subpattern count = 0
521 Subject length lower bound = 3
522 No starting char list
523 \x{100}\x{100}\x{100}\x{100\x{100}
524 0: \x{100}\x{100}\x{100}
527 ------------------------------------------------------------------
536 ------------------------------------------------------------------
537 Capturing subpattern count = 1
541 Subject length lower bound = 1
542 Starting chars: x \xc4
545 ------------------------------------------------------------------
555 ------------------------------------------------------------------
556 Capturing subpattern count = 1
560 Subject length lower bound = 1
561 Starting chars: a x \xc4
563 /(\x{100}{0,2}a|x)/8SDZ
564 ------------------------------------------------------------------
574 ------------------------------------------------------------------
575 Capturing subpattern count = 1
579 Subject length lower bound = 1
580 Starting chars: a x \xc4
582 /(\x{100}{1,2}a|x)/8SDZ
583 ------------------------------------------------------------------
594 ------------------------------------------------------------------
595 Capturing subpattern count = 1
599 Subject length lower bound = 1
600 Starting chars: x \xc4
603 ------------------------------------------------------------------
608 ------------------------------------------------------------------
609 Capturing subpattern count = 0
614 /a\x{100}\x{101}*/8DZ
615 ------------------------------------------------------------------
621 ------------------------------------------------------------------
622 Capturing subpattern count = 0
627 /a\x{100}\x{101}+/8DZ
628 ------------------------------------------------------------------
634 ------------------------------------------------------------------
635 Capturing subpattern count = 0
641 ------------------------------------------------------------------
646 ------------------------------------------------------------------
647 Capturing subpattern count = 0
653 ------------------------------------------------------------------
658 ------------------------------------------------------------------
659 Capturing subpattern count = 0
673 ------------------------------------------------------------------
678 ------------------------------------------------------------------
679 Capturing subpattern count = 0
687 ------------------------------------------------------------------
692 ------------------------------------------------------------------
693 Capturing subpattern count = 0
698 /\x{100}abc(xyz(?1))/8DZ
699 ------------------------------------------------------------------
708 ------------------------------------------------------------------
709 Capturing subpattern count = 1
719 Capturing subpattern count = 0
729 ------------------------------------------------------------------
735 ------------------------------------------------------------------
736 Capturing subpattern count = 0
742 ------------------------------------------------------------------
748 ------------------------------------------------------------------
749 Capturing subpattern count = 0
755 Failed: missing terminating ] for character class at offset 15
757 /-- This tests the stricter UTF-8 check according to RFC 3629. --/
761 Error -10 (bad UTF-8 string) offset=0 reason=14
765 Error -10 (bad UTF-8 string) offset=0 reason=14
769 Error -10 (bad UTF-8 string) offset=0 reason=14
773 Error -10 (bad UTF-8 string) offset=0 reason=13
777 Error -10 (bad UTF-8 string) offset=0 reason=11
781 Error -10 (bad UTF-8 string) offset=0 reason=12
789 /(*CRLF)(*UTF)(*BSR_UNICODE)a\Rb/I
790 Capturing subpattern count = 0
791 Options: bsr_unicode utf
792 Forced newline sequence: CRLF
797 Capturing subpattern count = 0
801 Subject length lower bound = 1
802 Starting chars: \x09 \x20 \xc2 \xe1 \xe2 \xe3
823 Capturing subpattern count = 0
827 Subject length lower bound = 1
828 Starting chars: \x0a \x0b \x0c \x0d \xc2 \xe2
843 Capturing subpattern count = 0
847 Subject length lower bound = 1
848 Starting chars: \x09 \x20 A \xc2 \xe1 \xe2 \xe3
853 Capturing subpattern count = 0
857 Subject length lower bound = 2
858 Starting chars: \x0a \x0b \x0c \x0d \xc2 \xe2
861 Capturing subpattern count = 0
865 Subject length lower bound = 4
866 Starting chars: \x09 \x0a \x0b \x0c \x0d \x20 x
869 Capturing subpattern count = 0
873 Subject length lower bound = 5
874 Starting chars: \x09 \x0a \x0b \x0c \x0d \x20 \xc2
881 Capturing subpattern count = 0
885 Subject length lower bound = 3
886 Starting chars: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0e \x0f
887 \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x1b \x1c \x1d \x1e
888 \x1f ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @ A B C
889 D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c d e f g h
890 i j k l m n o p q r s t u v w x y z { | } ~ \x7f \xc0 \xc1 \xc2 \xc3 \xc4
891 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd \xce \xcf \xd0 \xd1 \xd2 \xd3
892 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xdd \xde \xdf \xe0 \xe1 \xe2
893 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb \xec \xed \xee \xef \xf0 \xf1
894 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa \xfb \xfc \xfd \xfe \xff
904 Error -11 (bad UTF-8 offset)
912 Error -24 (bad offset value)
915 Capturing subpattern count = 0
916 Options: caseless utf
919 Subject length lower bound = 1
923 Capturing subpattern count = 0
924 Options: caseless utf
927 Subject length lower bound = 1
931 Capturing subpattern count = 0
932 Options: caseless utf
935 Subject length lower bound = 1
939 Capturing subpattern count = 0
940 Options: caseless utf
943 Subject length lower bound = 2
947 ------------------------------------------------------------------
952 ------------------------------------------------------------------
953 Capturing subpattern count = 0
959 ------------------------------------------------------------------
965 ------------------------------------------------------------------
966 Capturing subpattern count = 0
972 Capturing subpattern count = 0
976 Subject length lower bound = 1
977 Starting chars: \x0a \x0b \x0c \x0d \xc2 \xe2
980 ------------------------------------------------------------------
985 ------------------------------------------------------------------
986 Capturing subpattern count = 0
992 ------------------------------------------------------------------
998 ------------------------------------------------------------------
1003 ------------------------------------------------------------------
1009 ------------------------------------------------------------------
1014 ------------------------------------------------------------------
1020 ------------------------------------------------------------------
1025 ------------------------------------------------------------------
1031 ------------------------------------------------------------------
1036 ------------------------------------------------------------------
1042 ------------------------------------------------------------------
1047 ------------------------------------------------------------------
1053 ------------------------------------------------------------------
1058 ------------------------------------------------------------------
1065 ------------------------------------------------------------------
1070 ------------------------------------------------------------------
1077 ------------------------------------------------------------------
1082 ------------------------------------------------------------------
1088 ------------------------------------------------------------------
1093 ------------------------------------------------------------------
1099 ------------------------------------------------------------------
1104 ------------------------------------------------------------------
1111 ------------------------------------------------------------------
1116 ------------------------------------------------------------------
1123 ------------------------------------------------------------------
1129 ** Character \x{ff000041} is greater than 0x7fffffff and so cannot be converted to UTF-8
1131 Error -10 (bad UTF-8 string) offset=0 reason=12
1134 Failed: setting UTF is disabled by the application at offset 0
1137 Failed: setting UTF is disabled by the application at offset 0
1139 /-- End of testinput15 --/