                                  _   _ ____  _
                              ___| | | |  _ \| |
                             / __| | | | |_) | |
                            | (__| |_| |  _ <| |___
                             \___|\___/|_| \_\_____|


The Art Of Scripting HTTP Requests Using Curl

 1. HTTP Scripting
 1.1 Background
 1.2 The HTTP Protocol
 1.3 See the Protocol
 1.4 See the Timing
 1.5 See the Response
 2. URL
 2.1 Spec
 2.2 Host
 2.3 Port number
 2.4 User name and password
 2.5 Path part
 3. Fetch a page
 3.1 GET
 3.2 HEAD
 3.3 Multiple URLs in a single command line
 3.4 Multiple HTTP methods in a single command line
 4. HTML forms
 4.1 Forms explained
 4.2 GET
 4.3 POST
 4.4 File Upload POST
 4.5 Hidden Fields
 4.6 Figure Out What A POST Looks Like
 5. HTTP upload
 5.1 PUT
 6. HTTP Authentication
 6.1 Basic Authentication
 6.2 Other Authentication
 6.3 Proxy Authentication
 6.4 Hiding credentials
 7. More HTTP Headers
 7.1 Referer
 7.2 User Agent
 8. Redirects
 8.1 Location header
 8.2 Other redirects
 9. Cookies
 9.1 Cookie Basics
 9.2 Cookie options
 10. HTTPS
 10.1 HTTPS is HTTP secure
 10.2 Certificates
 11. Custom Request Elements
 11.1 Modify method and headers
 11.2 More on changed methods
 12. Web Login
 12.1 Some login tricks
 13. Debug
 13.1 Some debug tricks
 14. References
 14.1 Standards
 14.2 Sites

==============================================================================

1. HTTP Scripting

 1.1 Background

 This document assumes that you're familiar with HTML and general networking.

 The increasing number of applications moving to the web has made "HTTP
 Scripting" more frequently requested and wanted. Being able to automatically
 extract information from the web, to fake users, and to post or upload data
 to web servers are all important tasks today.

 Curl is a command line tool for doing all sorts of URL manipulations and
 transfers, but this particular document will focus on how to use it when
 doing HTTP requests for fun and profit. I'll assume that you know how to
 invoke 'curl --help' or 'curl --manual' to get basic information about it.

 Curl is not written to do everything for you. It makes the requests, it gets
 the data, it sends data and it retrieves the information. You probably need
 to glue everything together using some kind of script language or repeated
 manual invocations.

 1.2 The HTTP Protocol

 HTTP is the protocol used to fetch data from web servers. It is a very simple
 protocol that is built upon TCP/IP. The protocol also allows information to
 get sent to the server from the client using a few different methods, as will
 be shown here.

 HTTP is plain ASCII text lines being sent by the client to a server to
 request a particular action, and then the server replies with a few text
 lines before the actual requested content is sent to the client.

 The client, curl, sends an HTTP request. The request contains a method (like
 GET, POST, HEAD etc), a number of request headers and sometimes a request
 body. The HTTP server responds with a status line (indicating if things went
 well), response headers and most often also a response body. The "body" part
 is the plain data you requested, like the actual HTML or the image etc.

 1.3 See the Protocol

  Using curl's option --verbose (-v as a short option) will display what kind
  of commands curl sends to the server, as well as a few other informational
  texts.

  --verbose is the single most useful option when it comes to debugging or
  even understanding the curl<->server interaction.

  Sometimes even --verbose is not enough. Then --trace and --trace-ascii offer
  even more details as they show EVERYTHING curl sends and receives. Use it
  like this:

      curl --trace-ascii debugdump.txt http://www.example.com/
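
  As a minimal sketch (not something curl ships), the small shell function
  below pulls the request lines out of a saved --verbose log. It relies on
  curl marking the lines it sends with "> " in verbose output. The function
  name sent_lines and the file name verbose.log are made up for this example.

```shell
# sent_lines: print only what curl sent, reading a --verbose log on stdin.
# In verbose output, lines curl sends start with "> ", lines it receives
# start with "< " and informational lines start with "* ".
sent_lines() {
    # keep the "> " lines, drop the prefix and any carriage returns
    tr -d '\r' | sed -n 's/^> //p'
}

# Typical use (--verbose writes to stderr, so redirect that to a file):
#   curl --verbose http://www.example.com/ 2> verbose.log
#   sent_lines < verbose.log
```

  Changing the "> " in the pattern to "< " would show the response headers
  instead.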

 1.4 See the Timing

  Many times you may wonder what exactly is taking all the time, or you just
  want to know the number of milliseconds between two points in a
  transfer. For those, and other similar situations, the --trace-time option
  is what you need. It'll prepend the time to each trace output line:

      curl --trace-ascii d.txt --trace-time http://example.com/

 1.5 See the Response

  By default curl sends the response to stdout. To avoid that, you need to
  redirect it somewhere, which is most often done with -o (write to a file
  you name) or -O (write to a file named like the remote one).

2. URL

 2.1 Spec

 The Uniform Resource Locator format is how you specify the address of a
 particular resource on the Internet. You know these, you've seen URLs like
 https://curl.haxx.se or https://yourbank.com a million times. RFC 3986 is the
 canonical spec. And yeah, the formal name is not URL, it is URI.

 2.2 Host

 The host name is usually resolved using DNS or your /etc/hosts file to an IP
 address and that's what curl will communicate with. Alternatively you specify
 the IP address directly in the URL instead of a name.

 For development and other testing situations, you can point to a different
 IP address for a host name than what would otherwise be used, by using curl's
 --resolve option:

      curl --resolve www.example.org:80:127.0.0.1 http://www.example.org/

 2.3 Port number

 Each protocol curl supports operates on a default port number, be it over TCP
 or in some cases UDP. Normally you don't have to take that into
 consideration, but at times you run test servers on other ports or
 similar. Then you can specify the port number in the URL with a colon and a
 number immediately following the host name. Like when doing HTTP to port
 1234:

      curl http://www.example.org:1234/

 The port number you specify in the URL is the number that the server uses to
 offer its services. Sometimes you may use a local proxy, and then you may
 need to specify that proxy's port number separately for what curl needs to
 connect to locally. Like when using an HTTP proxy on port 4321:

      curl --proxy http://proxy.example.org:4321 http://remote.example.org/

 2.4 User name and password

 Some services are set up to require HTTP authentication and then you need to
 provide name and password which is then transferred to the remote site in
 various ways depending on the exact authentication protocol used.

 You can opt to either insert the user and password in the URL or you can
 provide them separately:

      curl http://user:password@example.org/

 or

      curl -u user:password http://example.org/

 You need to pay attention that this kind of HTTP authentication is not what
 is usually done and requested by user-oriented web sites these days. They
 tend to use forms and cookies instead.

 2.5 Path part

 The path part is just sent off to the server to request that it sends back
 the associated response. The path is what is to the right side of the slash
 that follows the host name and possibly port number.
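
 To make the parts described in this chapter concrete, here is a rough shell
 sketch that splits a simple URL into scheme, host, port and path. It is an
 illustration only, not how curl parses URLs: it assumes the plain form
 scheme://host[:port][/path] with no user name, password or IPv6 address, and
 the function name split_url is invented for the example.

```shell
# split_url: print scheme, host, port and path of a URL, one per line.
# Handles only scheme://host[:port][/path].
split_url() {
    url=$1
    scheme=${url%%://*}          # text before "://"
    rest=${url#*://}             # everything after "://"
    hostport=${rest%%/*}         # up to the first slash
    case $rest in
        */*) path=/${rest#*/} ;; # the slash and what follows it
        *)   path=/ ;;           # no path given: the root is requested
    esac
    host=${hostport%%:*}
    case $hostport in
        *:*) port=${hostport##*:} ;;
        *)   port= ;;            # empty: the protocol's default port applies
    esac
    printf '%s\n' "$scheme" "$host" "$port" "$path"
}

split_url http://www.example.org:1234/when/birth.html
```

 For the URL above this prints http, www.example.org, 1234 and
 /when/birth.html on separate lines.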

3. Fetch a page

 3.1 GET

 The simplest and most common request/operation made using HTTP is to GET a
 URL. The URL could itself refer to a web page, an image or a file. The client
 issues a GET request to the server and receives the document it asked for.
 If you issue the command line

        curl https://curl.haxx.se

 you get a web page returned in your terminal window. The entire HTML document
 that that URL holds.

 All HTTP replies contain a set of response headers that are normally hidden;
 use curl's --include (-i) option to display them as well as the rest of the
 document.

 3.2 HEAD

 You can ask the remote server for ONLY the headers by using the --head (-I)
 option which will make curl issue a HEAD request. In some special cases
 servers deny the HEAD method while others still work, which is a particular
 kind of annoyance.

 The HEAD method is defined and made so that the server returns the headers
 exactly the way it would do for a GET, but without a body. It means that you
 may see a Content-Length: in the response headers, but there must not be an
 actual body in the HEAD response.

 3.3 Multiple URLs in a single command line

 A single curl command line may involve one or many URLs. The most common case
 is probably to just use one, but you can specify any number of URLs. Yes
 any. No limits. You'll then get requests repeated over and over for all the
 given URLs.

 Example, send two GETs:

    curl http://url1.example.com http://url2.example.com

 If you use --data to POST to the URL, using multiple URLs means that you send
 that same POST to all the given URLs.

 Example, send two POSTs:

    curl --data name=curl http://url1.example.com http://url2.example.com

 3.4 Multiple HTTP methods in a single command line

 Sometimes you need to operate on several URLs in a single command line and do
 different HTTP methods on each. For this, you'll enjoy the --next option. It
 is basically a separator that separates a bunch of options from the next. All
 the URLs before --next will get the same method and will get all the POST
 data merged into one.

 When curl reaches the --next on the command line, it'll sort of reset the
 method and the POST data and allow a new set.

 Perhaps this is best shown with a few examples. To send first a HEAD and then
 a GET:

   curl -I http://example.com --next http://example.com

 To first send a POST and then a GET:

   curl -d score=10 http://example.com/post.cgi --next http://example.com/results.html

4. HTML forms

 4.1 Forms explained

 Forms are the general way a web site can present an HTML page with fields for
 the user to enter data in, and then press some kind of 'OK' or 'Submit'
 button to get that data sent to the server. The server then typically uses
 the posted data to decide how to act. Like using the entered words to search
 in a database, adding the info to a bug tracking system, displaying the
 entered address on a map or using the info as a login-prompt verifying that
 the user is allowed to see what it is about to see.

 Of course there has to be some kind of program on the server end to receive
 the data you send. You cannot just invent something out of the air.

 4.2 GET

  A GET-form uses the method GET, as specified in HTML like:

        <form method="GET" action="junk.cgi">
          <input type=text name="birthyear">
          <input type=submit name=press value="OK">
        </form>

  In your favorite browser, this form will appear with a text box to fill in
  and a press-button labeled "OK". If you fill in '1905' and press the OK
  button, your browser will then create a new URL to get for you. The URL will
  get "junk.cgi?birthyear=1905&press=OK" appended to the path part of the
  previous URL.

  If the original form was seen on the page "www.hotmail.com/when/birth.html",
  the second page you'll get will become
  "www.hotmail.com/when/junk.cgi?birthyear=1905&press=OK".

  Most search engines work this way.

  To make curl do the GET form post for you, just enter the URL the browser
  would create:

        curl "http://www.hotmail.com/when/junk.cgi?birthyear=1905&press=OK"
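
  The URL construction described above can be sketched in a few lines of
  shell. This is only an illustration under simplifying assumptions: the
  form's action is a plain file name relative to the page, and the field
  values are already URL encoded. The function name form_get_url is made up.

```shell
# form_get_url: print the URL a browser would request for a GET form.
#   $1     the URL of the page holding the form
#   $2     the form's action, a file name relative to that page
#   $3...  already URL-encoded name=value pairs
form_get_url() {
    page=$1
    action=$2
    shift 2
    base=${page%/*}/                      # page URL up to its last slash
    query=
    for pair in "$@"; do
        query="${query:+$query&}$pair"    # join the pairs with '&'
    done
    printf '%s%s?%s\n' "$base" "$action" "$query"
}

form_get_url http://www.hotmail.com/when/birth.html junk.cgi birthyear=1905 press=OK
```

  The printed URL is the same one passed to curl in the example above, and
  can be handed to curl directly, remembering to quote it so the shell does
  not interpret the '&'.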

 4.3 POST

  The GET method makes all input field names get displayed in the URL field of
  your browser. That's generally a good thing when you want to be able to
  bookmark that page with your given data, but it is an obvious disadvantage
  if you entered secret information in one of the fields or if there are a
  large number of fields creating a very long and unreadable URL.

  The HTTP protocol then offers the POST method. This way the client sends the
  data separated from the URL and thus you won't see any of it in the URL
  address field.

  The form would look very similar to the previous one:

        <form method="POST" action="junk.cgi">
          <input type=text name="birthyear">
          <input type=submit name=press value=" OK ">
        </form>

  And to use curl to post this form with the same data filled in as before, we
  could do it like:

        curl --data "birthyear=1905&press=%20OK%20" \
        http://www.example.com/when.cgi

  This kind of POST will use the Content-Type
  application/x-www-form-urlencoded and is the most widely used POST kind.

  The data you send to the server with --data MUST already be properly
  encoded, curl will not do that for you. For example, if you want the data to
  contain a space, you need to replace that space with %20 etc. Failing to
  comply with this will most likely cause your data to be received wrongly and
  messed up.

  Recent curl versions can in fact url-encode POST data for you if you use
  --data-urlencode instead, like this:

        curl --data-urlencode "name=I am Daniel" http://www.example.com

  If you repeat --data several times on the command line, curl will
  concatenate all the given data pieces - and put a '&' symbol between each
  data segment.
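
  To show what "properly encoded" means, here is a rough shell sketch of
  percent-encoding. It is an illustration only and not how curl implements
  --data-urlencode, which remains the practical choice. The sketch assumes
  the common od and tr utilities are available, and the function name
  urlencode is invented for the example.

```shell
# urlencode: percent-encode the string given as $1.
# Letters, digits and - . _ ~ pass through unchanged; every other byte
# becomes %XX, so a space turns into %20.
urlencode() {
    # od prints each input byte as a three-digit octal number
    printf '%s' "$1" | od -An -v -to1 | tr -s ' ' '\n' |
    while read -r oct; do
        [ -n "$oct" ] || continue
        c=$(printf "\\$oct")                  # the byte itself
        case $c in
            [A-Za-z0-9._~-]) printf '%s' "$c" ;;
            *) printf '%%%02X' "0$oct" ;;     # leading 0: read as octal
        esac
    done
    echo
}

urlencode "I am Daniel"
```

  This prints I%20am%20Daniel, the value part curl would send for the
  --data-urlencode example above.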

 4.4 File Upload POST

  Back in late 1995 they defined an additional way to post data over HTTP. It
  is documented in RFC 1867, which is why this method is sometimes referred
  to as RFC1867-posting.

  This method is mainly designed to better support file uploads. A form that
  allows a user to upload a file could be written like this in HTML:

    <form method="POST" enctype='multipart/form-data' action="upload.cgi">
      <input type=file name=upload>
      <input type=submit name=press value="OK">
    </form>

  This clearly shows that the Content-Type about to be sent is
  multipart/form-data.

  To post to a form like this with curl, you enter a command line like:

        curl --form upload=@localfilename --form press=OK [URL]

 4.5 Hidden Fields

  A very common way for HTML based applications to pass state information
  between pages is to add hidden fields to the forms. Hidden fields are
  already filled in, they aren't displayed to the user and they get passed
  along just as all the other fields.

  A similar example form with one visible field, one hidden field and one
  submit button could look like:

    <form method="POST" action="foobar.cgi">
      <input type=text name="birthyear">
      <input type=hidden name="person" value="daniel">
      <input type=submit name="press" value="OK">
    </form>

  To POST this with curl, you won't have to think about if the fields are
  hidden or not. To curl they're all the same:

        curl --data "birthyear=1905&press=OK&person=daniel" [URL]

 4.6 Figure Out What A POST Looks Like

  When you're about to fill in a form and send it to a server by using curl
  instead of a browser, you're of course very interested in sending a POST
  exactly the way your browser does.

  An easy way to get to see this, is to save the HTML page with the form on
  your local disk, modify the 'method' to a GET, and press the submit button
  (you could also change the action URL if you want to).

  You will then clearly see the data get appended to the URL, separated with a
  '?' character as GET forms are supposed to.

5. HTTP upload

 5.1 PUT

 Perhaps the best way to upload data to an HTTP server is to use PUT. Then
 again, this of course requires that someone put a program or script on the
 server end that knows how to receive an HTTP PUT stream.

 Put a file to an HTTP server with curl:

        curl --upload-file uploadfile http://www.example.com/receive.cgi

6. HTTP Authentication

 6.1 Basic Authentication

 HTTP Authentication is the ability to tell the server your username and
 password so that it can verify that you're allowed to do the request you're
 doing. The Basic authentication used in HTTP (which is the type curl uses by
 default) is *plain* *text* based, which means it sends username and password
 only slightly obfuscated, but still fully readable by anyone that sniffs on
 the network between you and the remote server.

 To tell curl to use a user and password for authentication:

        curl --user name:password http://www.example.com
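
 The "slight obfuscation" is base64 encoding of name:password, sent in an
 Authorization: header. The sketch below shows how little protection that is.
 It assumes a base64 command line utility that accepts -d for decoding (as
 the GNU one does), and the function names are made up for the example.

```shell
# basic_header: print the Authorization header that Basic authentication
# uses for the given user name ($1) and password ($2).
basic_header() {
    printf 'Authorization: Basic %s\n' "$(printf '%s:%s' "$1" "$2" | base64)"
}

# decode_basic: turn the base64 part of such a header back into name:password
decode_basic() {
    printf '%s' "$1" | base64 -d
    echo
}

basic_header name password
decode_basic bmFtZTpwYXNzd29yZA==
```

 The second call recovers name:password from the encoded string, which is
 all an eavesdropper would need to do.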

 6.2 Other Authentication

 The site might require a different authentication method (check the headers
 returned by the server), and then --ntlm, --digest, --negotiate or even
 --anyauth might be options that suit you.

 6.3 Proxy Authentication

 Sometimes your HTTP access is only available through the use of an HTTP
 proxy. This seems to be especially common at various companies. An HTTP proxy
 may require its own user and password to allow the client to get through to
 the Internet. To specify those with curl, run something like:

        curl --proxy-user proxyuser:proxypassword curl.haxx.se

 If your proxy requires the authentication to be done using the NTLM method,
 use --proxy-ntlm, if it requires Digest use --proxy-digest.

 If you use any one of these user+password options but leave out the password
 part, curl will prompt for the password interactively.

 6.4 Hiding credentials

 Do note that when a program is run, its parameters might be possible to see
 when listing the running processes of the system. Thus, other users may be
 able to watch your passwords if you pass them as plain command line
 options. There are ways to circumvent this, such as leaving out the password
 so that curl prompts for it, or putting the options in a file that curl
 reads with --config (-K).

 It is worth noting that while this is how HTTP Authentication works, very
 many web sites will not use this concept when they provide logins etc. See
 the Web Login chapter further below for more details on that.

7. More HTTP Headers

 7.1 Referer

 An HTTP request may include a 'referer' field (yes it is misspelled), which
 can be used to tell from which URL the client got to this particular
 resource. Some programs/scripts check the referer field of requests to verify
 that this wasn't arriving from an external site or an unknown page. While
 this is a stupid way to check something so easily forged, many scripts still
 do it. Using curl, you can put anything you want in the referer-field and
 thus more easily be able to fool the server into serving your request.

 Use curl to set the referer field with:

        curl --referer http://www.example.com http://www.example.com

 7.2 User Agent

 Very similar to the referer field, all HTTP requests may set the User-Agent
 field. It names which user agent (client) is being used. Many
 applications use this information to decide how to display pages. Silly web
 programmers try to make different pages for users of different browsers to
 make them look the best possible for their particular browsers. They usually
 also do different kinds of javascript, vbscript etc.

 At times, you will see that getting a page with curl will not return the same
 page that you see when getting the page with your browser. Then you know it
 is time to set the User Agent field to fool the server into thinking you're
 one of those browsers.

 To make curl look like Internet Explorer 5 on a Windows 2000 box:

  curl --user-agent "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)" [URL]

 Or why not look like you're using Netscape 4.73 on an old Linux box:

  curl --user-agent "Mozilla/4.73 [en] (X11; U; Linux 2.2.15 i686)" [URL]

8. Redirects

 8.1 Location header

 When a resource is requested from a server, the reply from the server may
 include a hint about where the browser should go next to find this page, or a
 new page keeping newly generated output. The header that tells the browser
 to redirect is Location:.

 Curl does not follow Location: headers by default, but will simply display
 such pages in the same manner it displays all HTTP replies. It does however
 feature an option that will make it attempt to follow the Location: pointers.

 To tell curl to follow a Location:

        curl --location http://www.example.com

 If you use curl to POST to a site that immediately redirects you to another
 page, you can safely use --location (-L) and --data/--form together. Curl will
 only use POST in the first request, and then revert to GET in the following
 operations.
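
 If you would rather inspect the redirect yourself than have curl follow it,
 one possible approach is to save the response headers and pick out the
 Location: value in a script. The sketch below assumes the headers were saved
 to a file first; the function name redirect_target and the file name
 headers.txt are made up for the example.

```shell
# redirect_target: print the value of the first Location: header, if any,
# from response headers read on stdin. Header names are case-insensitive.
redirect_target() {
    tr -d '\r' |
    sed -n 's/^[Ll][Oo][Cc][Aa][Tt][Ii][Oo][Nn]:[[:space:]]*//p' |
    head -n 1
}

# Typical use, saving the headers with --dump-header first:
#   curl --silent --output /dev/null --dump-header headers.txt http://www.example.com
#   redirect_target < headers.txt
```

 Note that the value may be a relative URL, in which case it has to be
 combined with the URL that was requested before it can be fetched.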

 8.2 Other redirects

 Browsers typically support at least two other ways of redirects that curl
 doesn't: first the html may contain a meta refresh tag that asks the browser
 to load a specific URL after a set number of seconds, or it may use
 javascript to do it.

9. Cookies

 9.1 Cookie Basics

 The way the web browsers do "client side state control" is by using
 cookies. Cookies are just names with associated contents. The cookies are
 sent to the client by the server. The server tells the client for what path
 and host name it wants the cookie sent back, and it also sends an expiration
 date and a few more properties.

 When a client communicates with a server with a name and path as previously
 specified in a received cookie, the client sends back the cookies and their
 contents to the server, unless of course they are expired.

 Many applications and servers use this method to connect a series of requests
 into a single logical session. To be able to use curl on such occasions, we
 must be able to record and send back cookies the way the web application
 expects them. The same way browsers deal with them.

 9.2 Cookie options

 The simplest way to send a few cookies to the server when getting a page with
 curl is to add them on the command line like:

        curl --cookie "name=Daniel" http://www.example.com

 Cookies are sent as common HTTP headers. This is practical as it allows curl
 to record cookies simply by recording headers. Record cookies with curl by
 using the --dump-header (-D) option like:

        curl --dump-header headers_and_cookies http://www.example.com

 (Take note that the --cookie-jar option described below is a better way to
 store cookies.)

 Curl has a full blown cookie parsing engine built-in that comes in use if you
 want to reconnect to a server and use cookies that were stored from a
 previous connection (or hand-crafted manually to fool the server into
 believing you had a previous connection). To use previously stored cookies,
 you run curl like:

        curl --cookie stored_cookies_in_file http://www.example.com

 Curl's "cookie engine" gets enabled when you use the --cookie option. If you
 only want curl to understand received cookies, use --cookie with a file that
 doesn't exist. Example, if you want to let curl understand cookies from a
 page and follow a location (and thus possibly send back cookies it received),
 you can invoke it like:

        curl --cookie nada --location http://www.example.com

 Curl has the ability to read and write cookie files that use the same file
 format that Netscape and Mozilla once used. It is a convenient way to share
 cookies between scripts or invocations. The --cookie (-b) switch
 automatically detects if a given file is such a cookie file and parses it,
 and by using the --cookie-jar (-c) option you'll make curl write a new cookie
 file at the end of an operation:

        curl --cookie cookies.txt --cookie-jar newcookies.txt \
        http://www.example.com
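
 Because the cookie file is plain text, a script can read it too. As a small
 sketch, the function below lists the cookies in such a file. It assumes the
 usual layout of that format, one cookie per line with seven tab-separated
 fields, and the function name list_cookies is made up for the example.

```shell
# list_cookies: print name=value for each cookie in a Netscape-format
# cookie file read on stdin. Each cookie is one line of seven
# tab-separated fields: domain, subdomain flag, path, secure flag,
# expiry time, name and value. Lines starting with '#' are skipped here,
# which also skips cookies stored with a "#HttpOnly_" prefix.
list_cookies() {
    awk -F '\t' '!/^#/ && NF == 7 { print $6 "=" $7 }'
}

# Typical use:
#   curl --cookie-jar newcookies.txt http://www.example.com
#   list_cookies < newcookies.txt
```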

10. HTTPS

 10.1 HTTPS is HTTP secure

 There are a few ways to do secure HTTP transfers. By far the most common
 protocol for doing this is what is generally known as HTTPS, HTTP over
 SSL. SSL encrypts all the data that is sent and received over the network and
 thus makes it harder for attackers to spy on sensitive information.

 SSL (or TLS as the latest version of the standard is called) offers a
 truckload of advanced features to allow all those encryptions and key
 infrastructure mechanisms encrypted HTTP requires.

 Curl supports encrypted fetches when built to use a TLS library and it can be
 built to use one out of a fairly large set of libraries - "curl -V" will show
 which one your curl was built to use (if any!). To get a page from an HTTPS
 server, simply run curl like:

        curl https://secure.example.com

 10.2 Certificates

  In the HTTPS world, you use certificates to validate that you are the one
  you claim to be, as an addition to normal passwords. Curl supports
  client-side certificates. All certificates are locked with a pass phrase,
  which you need to enter before the certificate can be used by curl. The pass
  phrase can be specified on the command line or if not, entered interactively
  when curl queries for it. Use a certificate with curl on an HTTPS server
  like:

        curl --cert mycert.pem https://secure.example.com

  curl also tries to verify that the server is who it claims to be, by
  verifying the server's certificate against a locally stored CA cert
  bundle. Failing the verification will cause curl to deny the connection. You
  must then use --insecure (-k) in case you want to tell curl to ignore that
  the server can't be verified.

  More about server certificate verification and ca cert bundles can be read
  in the SSLCERTS document, available online here:

        https://curl.haxx.se/docs/sslcerts.html

  At times you may end up with your own CA cert store and then you can tell
  curl to use that to verify the server's certificate:

        curl --cacert ca-bundle.pem https://example.com/

11. Custom Request Elements

 11.1 Modify method and headers

 Doing fancy stuff, you may need to add or change elements of a single curl
 request.

 For example, you can change the POST request to a PROPFIND and send the data
 as "Content-Type: text/xml" (instead of the default Content-Type) like this:

         curl --data "<xml>" --header "Content-Type: text/xml" \
              --request PROPFIND url.com

 You can delete a default header by providing one without content. Like you
 can ruin the request by chopping off the Host: header:

        curl --header "Host:" http://www.example.com

 You can add headers the same way. Your server may want a "Destination:"
 header, and you can add it:

        curl --header "Destination: http://nowhere" http://example.com

 11.2 More on changed methods

 It should be noted that curl selects which methods to use on its own
 depending on what action to ask for. -d will do POST, -I will do HEAD and so
 on. If you use the --request / -X option you can change the method keyword
 curl selects, but you will not modify curl's behavior. This means that if you
 for example use -d "data" to do a POST, you can modify the method to a
 PROPFIND with -X and curl will still think it sends a POST. You can change
 the normal GET to a POST method by simply adding -X POST in a command line
 like:

        curl -X POST http://example.org/

 ... but curl will still think and act as if it sent a GET so it won't send any
 request body etc.

12. Web Login

 12.1 Some login tricks

 While not strictly just HTTP related, it still causes a lot of people problems
 so here's the executive run-down of how the vast majority of all login forms
 work and how to log in to them using curl.

 It can also be noted that to do this properly in an automated fashion, you
 will most certainly need to script things and do multiple curl invocations
 etc.

 First, servers mostly use cookies to track the logged-in status of the
 client, so you will need to capture the cookies you receive in the
 responses. Then, many sites also set a special cookie on the login page (to
 make sure you got there through their login page) so you should make a habit
 of first getting the login-form page to capture the cookies set there.

 Some web-based login systems feature various amounts of javascript, and
 sometimes they use such code to set or modify cookie contents. Possibly they
 do that to prevent programmed logins, like this manual describes how to...
 Anyway, if reading the code isn't enough to let you repeat the behavior
 manually, capturing the HTTP requests done by your browsers and analyzing the
 sent cookies is usually a working method to work out how to shortcut the
 javascript need.

 In the actual <form> tag for the login, lots of sites fill in random/session
 or otherwise secretly generated hidden tags and you may need to first capture
 the HTML code for the login form and extract all the hidden fields to be able
 to do a proper login POST. Remember that the contents need to be URL encoded
 when sent in a normal POST.
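
 The steps above could be glued together roughly like this. It is a sketch
 under loud assumptions, not a general solution: the form has one <input>
 per line with double-quoted name and value attributes (name first), the
 values contain no whitespace or shell wildcard characters, and no javascript
 is involved. The function names and the file names cookies.txt and
 login.html are made up for the example.

```shell
# hidden_fields: print name=value for every hidden <input> found on stdin.
# Assumes one <input> per line, double-quoted name and value attributes,
# and name appearing before value.
hidden_fields() {
    grep -i 'type="\{0,1\}hidden' |
    sed -n 's/.*name="\([^"]*\)".*value="\([^"]*\)".*/\1=\2/p'
}

# login: fetch the login page, then POST the form with its hidden fields.
#   $1 login page URL, $2 form action URL, $3... visible name=value fields
login() {
    page=$1
    action=$2
    shift 2
    # 1. get the form page and keep the cookies it sets
    curl --silent --cookie-jar cookies.txt --output login.html "$page" || return
    # 2. add the hidden fields, then put --data-urlencode before each field
    #    so that curl URL encodes the values and joins them with '&'
    set -- "$@" $(hidden_fields < login.html)
    for field in "$@"; do
        set -- "$@" --data-urlencode "$field"
        shift
    done
    # 3. send the POST with the cookies from step 1, saving any new ones
    curl --silent --cookie cookies.txt --cookie-jar cookies.txt "$@" "$action"
}

# Typical use (URLs and field names are placeholders):
#   login http://www.example.com/login.html http://www.example.com/login.cgi \
#         user=daniel password=secret
```

 Later requests in the same session would then pass --cookie cookies.txt to
 send the login cookies back.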

13. Debug

 13.1 Some debug tricks

 Many times when you run curl on a site, you'll notice that the site doesn't
 seem to respond the same way to your curl requests as it does to your
 browser's.

 Then you need to start making your curl requests more similar to your
 browser's requests:

 * Use the --trace-ascii option to store fully detailed logs of the requests
   for easier analyzing and better understanding

 * Make sure you check for and use cookies when needed (both reading with
   --cookie and writing with --cookie-jar)

 * Set the user-agent to one that a recent popular browser uses

 * Set the referer like it is set by the browser

 * If you use POST, make sure you send all the fields and in the same order as
   the browser does it.

 A very good helper to make sure you do this right, is the LiveHTTPHeader tool
 that lets you view all headers you send and receive with Mozilla/Firefox
 (even when using HTTPS). Chrome features similar functionality out of the box
 among the developer's tools.

 A more raw approach is to capture the HTTP traffic on the network with tools
 such as Wireshark (formerly ethereal) or tcpdump and check which headers were
 sent and received by the browser. (HTTPS makes this technique ineffective,
 since the traffic is encrypted.)
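
 Put together, the checklist above might end up as a small wrapper like the
 one below. It is only a sketch: the User-Agent string is the old example
 from chapter 7 and should be replaced with what your browser really sends,
 and the function name browser_like and the file names are made up.

```shell
# browser_like: fetch $1 with tracing, cookies, a browser's User-Agent
# and the referer given as $2.
browser_like() {
    curl --trace-ascii trace.txt \
         --cookie cookies.txt --cookie-jar cookies.txt \
         --user-agent "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)" \
         --referer "$2" \
         "$1"
}

# Typical use:
#   browser_like http://www.example.com/results.html http://www.example.com/
```

 After a run, trace.txt holds the full exchange for comparison with what the
 browser sent.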

14. References

 14.1 Standards

 RFC 7230 is a must-read if you want in-depth understanding of the HTTP
 protocol

 RFC 3986 explains the URL syntax

 RFC 1867 defines the HTTP post upload format

 RFC 6265 defines how HTTP cookies work

 14.2 Sites

 https://curl.haxx.se is the home of the curl project